There are a number of ETL requirements to support other sub-projects within Phenomantics. They are described here.
- hpo_687_130220.obo
- genes_to_phenotype_26_1302201.txt
For more information, see HPO Downloads.
The etl/hpo_ingest/hpo_file_converters.scala file is an ETL script that takes two inputs, the HPO ontology file (hpo_version_date.obo) and the HPO-terms-to-Entrez-genes annotation file (genes_to_phenotype_version_date.txt), and converts them into properly formatted resource files that can be used by the phenomantics API application.
The output files are:
- ENTREZ.txt
- HPO_ALT_IDS.txt
- HPO_TERMS.txt
- ENTREZ_HPO_ANNOTATIONS.txt
- HPO_ANCESTORS.txt
The output files should be moved to apps/api/src/main/resources
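As a rough illustration of the kind of conversion described above, the sketch below reads the `[Term]` stanzas of an OBO file and derives each term's ancestors by following `is_a` links. It is not the project's hpo_file_converters.scala: the names `OboSketch`, `Term`, `parseTerms` and `ancestors` are hypothetical, the layout of the real output files is not shown here, and whether HPO_ANCESTORS.txt includes the term itself is not known from this document (this sketch excludes it).

```scala
// Hypothetical sketch; not the project's hpo_file_converters.scala.
object OboSketch {
  // One [Term] stanza, reduced to the fields the output file names suggest.
  final case class Term(id: String, name: String, altIds: List[String], parents: List[String])

  // Parse the [Term] stanzas of an OBO document supplied line by line.
  def parseTerms(lines: Iterator[String]): List[Term] = {
    val terms = List.newBuilder[Term]
    var current: Option[Term] = None
    def flush(): Unit = {
      current.filter(_.id.nonEmpty).foreach(terms += _)
      current = None
    }
    for (raw <- lines; line = raw.trim) {
      if (line.startsWith("[")) {
        // Any stanza header closes the previous stanza; only [Term] opens a new one.
        flush()
        if (line == "[Term]") current = Some(Term("", "", Nil, Nil))
      } else current.foreach { t =>
        line.split(": ", 2) match {
          case Array("id", v)     => current = Some(t.copy(id = v))
          case Array("name", v)   => current = Some(t.copy(name = v))
          case Array("alt_id", v) => current = Some(t.copy(altIds = t.altIds :+ v))
          // is_a values look like "HP:0000118 ! Phenotypic abnormality"; keep the id only.
          case Array("is_a", v)   => current = Some(t.copy(parents = t.parents :+ v.takeWhile(_ != ' ')))
          case _                  => () // other tags are ignored in this sketch
        }
      }
    }
    flush()
    terms.result()
  }

  // All ancestors of a term, found by following is_a links transitively.
  def ancestors(id: String, parents: Map[String, List[String]]): Set[String] = {
    @annotation.tailrec
    def loop(frontier: List[String], seen: Set[String]): Set[String] = frontier match {
      case Nil => seen
      case head :: tail =>
        val next = parents.getOrElse(head, Nil).filterNot(seen)
        loop(next ::: tail, seen ++ next)
    }
    loop(List(id), Set.empty)
  }
}
```

A parent map for `ancestors` can be built from the parsed terms with `terms.map(t => t.id -> t.parents).toMap`.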
- DataExpress
- hpo_BUILD_DATE.obo
- genes_to_phenotype_BUILD_DATE.txt
The genesis application generates sample similarity scores for all genes for a randomly selected set of phenotype queries of length k. This data is output to value-delimited files. The etl/gp_dist_ingest/genesis.scala script imports and processes the raw similarity scores to create CDF values, then stores them in a database with the proper configuration for use with the phenomantics API application. To run the ingest script as is:
- Move the genesis output data files to etl/gp_dist_ingest/data/SIMFUNC_k_##, where SIMFUNC is one of ASymSim1, SymSim1, SymSim2 and ## is the number of query terms used in generating the data (add a leading 0 if k<10, e.g. 01, 02).
- Create the etl/gp_dist_ingest/conf/postgres.properties file with the following content (values in CAPS must be replaced with appropriate values):
    driverClassName=org.postgresql.Driver
    jdbcUri=jdbc:postgresql://IP:PORT/dbName
    user=USERNAME
    password=PASSWORD
- Ensure the postgresql service is running and has appropriate access privileges for the user specified in the postgres.properties file above.
- Add dataexpress.jar to etl/gp_dist_ingest/lib.
- Run the ingest from within the etl/gp_dist_ingest directory with:
    scala -cp "lib/*" genesis.scala
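The naming and configuration steps above are easy to get subtly wrong, so a small pre-flight check may help. The sketch below builds the SIMFUNC_k_## directory name with the two-digit padding and flags properties that are missing or still contain an unreplaced placeholder. The object and method names are hypothetical, and it assumes postgres.properties is a standard Java properties file; how genesis.scala or DataExpress actually reads the file is not shown in this document.

```scala
import java.io.Reader
import java.util.Properties

// Hypothetical pre-flight helpers; not part of genesis.scala.
object IngestSetupSketch {
  // Similarity function names listed in the steps above.
  val simFuncs = Set("ASymSim1", "SymSim1", "SymSim2")

  // Keys expected in conf/postgres.properties.
  val requiredKeys = List("driverClassName", "jdbcUri", "user", "password")

  // Directory name for one genesis run: SIMFUNC_k_## with k zero-padded to two digits.
  def dataDirName(simFunc: String, k: Int): String = {
    require(simFuncs(simFunc), s"unknown similarity function: $simFunc")
    require(k > 0, "k must be positive")
    f"${simFunc}_k_$k%02d"
  }

  // Load properties from any Reader (for example a java.io.FileReader on the conf file).
  def load(reader: Reader): Properties = {
    val props = new Properties()
    props.load(reader)
    props
  }

  // Keys that are absent, empty, or still hold one of the CAPS placeholders.
  def problems(props: Properties): List[String] =
    requiredKeys.filter { key =>
      val value = Option(props.getProperty(key)).map(_.trim).getOrElse("")
      value.isEmpty ||
      Set("USERNAME", "PASSWORD").contains(value) ||
      value.contains("//IP:PORT/")
    }
}
```

For example, `IngestSetupSketch.dataDirName("SymSim1", 5)` yields `SymSim1_k_05`, and `problems` returns an empty list once every placeholder has been replaced.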
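To make the "raw similarity scores to CDF values" step more concrete, here is a minimal sketch of an empirical CDF over a sample of scores. It is only an illustration under assumptions: genesis.scala may bin scores, use the complementary form P(X >= s), or store the values differently, and the names `CdfSketch`, `empiricalCdf` and `cdfAt` are hypothetical.

```scala
// Hypothetical sketch of an empirical CDF; not the logic of genesis.scala.
object CdfSketch {
  // For each distinct score s (ascending), the fraction of samples <= s.
  def empiricalCdf(scores: Seq[Double]): Vector[(Double, Double)] = {
    require(scores.nonEmpty, "need at least one score")
    val n = scores.size.toDouble
    // Count occurrences of each distinct score, ordered by score.
    val counts = scores.groupBy(identity).map { case (s, xs) => s -> xs.size }.toVector.sortBy(_._1)
    // Running total of counts; drop the initial 0 produced by scanLeft.
    val cumulative = counts.scanLeft(0)(_ + _._2).tail
    counts.map(_._1).zip(cumulative.map(_ / n))
  }

  // CDF value at x: the fraction of samples <= x (0.0 below the smallest score).
  def cdfAt(cdf: Vector[(Double, Double)], x: Double): Double =
    cdf.takeWhile(_._1 <= x).lastOption.map(_._2).getOrElse(0.0)
}
```

With the sample `Seq(0.5, 0.2, 0.9, 0.5)`, `empiricalCdf` returns `Vector((0.2, 0.25), (0.5, 0.75), (0.9, 1.0))`: one of four scores is at or below 0.2, three of four at or below 0.5, and all four at or below 0.9.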