Data extraction and normalization pipeline used for bi-monthly GenBank data dumps.
Keep the Javadocs in this project up to date.

Dependencies:
- JDK 1.8.x
- Maven 3.x
- PostgreSQL 9.x (SQL database)
- Lucene 5.5.x (Lucene index)
- A Java IDE (Spring Tool Suite is recommended)
Setup:
- Import the project into your IDE as an "Existing Maven Project".
- Create a GenBankFactory.local.properties file in the src folder with your SQL and Lucene details. Refer to GenBankFactory.local.properties.template.
- Run the build.sh script.
- The build should complete successfully and generate a runnable jar in the target folder.
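Before building, it can help to confirm that the properties file is complete. The sketch below is a hypothetical standalone check, not part of this project: the key names in REQUIRED_KEYS are placeholders (use the ones listed in GenBankFactory.local.properties.template), and it assumes the build packages the file from src onto the classpath, which would explain why edits only take effect after a rebuild.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class LocalPropertiesCheck {

    // Hypothetical key names for illustration only; replace them with the
    // keys from GenBankFactory.local.properties.template.
    static final List<String> REQUIRED_KEYS = Arrays.asList(
            "db.url", "db.user", "db.password", "lucene.index.path");

    /** Returns the required keys that are absent or blank in the given properties. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the file is on the classpath (copied there by the build).
        try (InputStream in = LocalPropertiesCheck.class
                .getResourceAsStream("/GenBankFactory.local.properties")) {
            if (in == null) {
                System.err.println("GenBankFactory.local.properties not found on classpath");
                System.exit(1);
            }
            Properties props = new Properties();
            props.load(in);
            List<String> missing = missingKeys(props);
            if (!missing.isEmpty()) {
                System.err.println("Missing or blank properties: " + missing);
                System.exit(1);
            }
            System.out.println("All required properties are set.");
        }
    }
}
```

Treating blank values as missing catches the common case of a template copied without being filled in.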
Notes:
- Always double- and triple-check parameters before building and running the .jar, as it may delete databases.
- Changes to GenBankFactory.local.properties only take effect after a new build.
- Allocate at least 6 GB of RAM, preferably 8 GB.
- Typical usage commands:
  - Fresh data dump:
    nohup java -Xms4G -Xmx8G -jar target/zoophy-genbank-factory-1.x.x-jar-with-dependencies.jar dump create -f gbvrl &
  - Re-run data dump:
    nohup java -Xms4G -Xmx8G -jar target/zoophy-genbank-factory-1.x.x-jar-with-dependencies.jar dump clean -f gbvrl &
  - Rebuild index:
    nohup java -Xms4G -Xmx8G -jar target/zoophy-genbank-factory-1.x.x-jar-with-dependencies.jar index &
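To confirm that the JVM actually received enough heap, a small hypothetical check like the one below can be run with the same -Xms/-Xmx flags. It is not part of this project. Runtime.maxMemory() can report somewhat less than -Xmx because some garbage collectors exclude a survivor space, so the sketch assumes a 10% allowance when comparing against the 6 GB minimum and 8 GB preferred thresholds.

```java
public class HeapCheck {

    private static final long GIB = 1024L * 1024 * 1024;
    // Thresholds from the notes above: at least 6 GB, preferably 8 GB.
    static final long MIN_BYTES = 6 * GIB;
    static final long PREFERRED_BYTES = 8 * GIB;
    // Assumed allowance: maxMemory() may report a little less than -Xmx.
    static final double SLACK = 0.9;

    /** Classifies a maximum heap size against the recommended thresholds. */
    static String classify(long maxHeapBytes) {
        if (maxHeapBytes >= PREFERRED_BYTES * SLACK) {
            return "PREFERRED";
        }
        if (maxHeapBytes >= MIN_BYTES * SLACK) {
            return "MINIMUM";
        }
        return "TOO_LOW";
    }

    public static void main(String[] args) {
        // Reflects the -Xmx setting (Long.MAX_VALUE if there is no limit).
        long max = Runtime.getRuntime().maxMemory();
        String verdict = classify(max);
        System.out.printf("Max heap: %.1f GiB (%s)%n", max / (double) GIB, verdict);
        if ("TOO_LOW".equals(verdict)) {
            System.exit(1);
        }
    }
}
```

For example, compile it and run `java -Xms4G -Xmx8G HeapCheck` on the target machine before starting a long nohup job.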