GenBankFactory

Data extraction and normalization pipeline used for bi-monthly GenBank data dumps.

Javadocs should be kept up to date here.

Dependencies:

1. Import the project into an IDE as "Existing Maven Project".
2. Create a GenBankFactory.local.properties file in the src folder with your SQL and Lucene details. Use GenBankFactory.local.properties.template as a reference.
3. Run the build.sh script.

The build should run successfully and generate a runnable jar in the target folder.
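
The properties step can be scripted so that an existing local file is never overwritten. The following is a minimal sketch, not part of the repository: the function name `init_local_properties` is hypothetical, and it assumes the template sits in the same folder as the local properties file, which this README does not state.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create GenBankFactory.local.properties from the
# template if it does not exist yet. Assumes the template lives in the
# same folder as the target file (default: src).
init_local_properties() {
  local src_dir="${1:-src}"
  local template="$src_dir/GenBankFactory.local.properties.template"
  local target="$src_dir/GenBankFactory.local.properties"

  if [ ! -f "$template" ]; then
    echo "template not found: $template" >&2
    return 1
  fi

  if [ -f "$target" ]; then
    # Never overwrite existing SQL and Lucene settings.
    echo "keeping existing $target"
  else
    cp "$template" "$target"
    echo "created $target; fill in your SQL and Lucene details before building"
  fi
}
```

After calling `init_local_properties` and editing the file, run build.sh as described in the steps above.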

Always ~~double~~ triple check parameters before building and running the .jar, as it may delete databases.
Any changes to GenBankFactory.local.properties will only be reflected after running a new build.
Allocate at least 6 GB of RAM, preferably 8 GB.
Typical usage scenario commands:
- Fresh data dump: `nohup java -Xms4G -Xmx8G -jar target/zoophy-genbank-factory-1.x.x-jar-with-dependencies.jar dump create -f gbvrl &`
- Re-run data dump: `nohup java -Xms4G -Xmx8G -jar target/zoophy-genbank-factory-1.x.x-jar-with-dependencies.jar dump clean -f gbvrl &`
- Rebuild index: `nohup java -Xms4G -Xmx8G -jar target/zoophy-genbank-factory-1.x.x-jar-with-dependencies.jar index &`
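
Because these commands can delete databases, it may help to print the exact command for review before running it. The sketch below is a hypothetical helper, not part of the repository: the function name `factory_cmd` and the scenario labels `fresh`, `rerun`, and `reindex` are made up for illustration. The java arguments are copied from the commands listed above, and the jar path is passed in by the caller because the version in the file name varies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (but do not execute) the java command for one
# of the three usage scenarios listed above, so it can be reviewed first.
factory_cmd() {
  local jar="$1" scenario="$2" file="${3:-gbvrl}"
  local base="java -Xms4G -Xmx8G -jar $jar"

  case "$scenario" in
    fresh)   echo "$base dump create -f $file" ;;  # fresh data dump
    rerun)   echo "$base dump clean -f $file" ;;   # re-run data dump
    reindex) echo "$base index" ;;                 # rebuild index
    *)       echo "unknown scenario: $scenario" >&2; return 1 ;;
  esac
}
```

For example, `factory_cmd path/to/the.jar fresh` prints the fresh-dump command. Once the parameters have been checked, the printed command can be run under `nohup` with a trailing `&` as shown in the list above.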
