ETL

There are a number of ETL requirements that support other sub-projects within Phenomantics. They are described here.

HPO Ingest

Last executed

hpo_687_130220.obo
genes_to_phenotype_26_1302201.txt

Obtaining Source Files

hpo_latest.obo

genes_to_phenotypes.txt

For more information, see HPO Downloads.

Description

The etl/hpo_ingest/hpo_file_converters.scala file is an ETL script that converts two inputs, the HPO ontology file (hpo_version_date.obo) and the annotation file mapping HPO terms to Entrez genes (genes_to_phenotype_version_date.txt), into properly formatted resource files for use by the phenomantics API application.

The output files are: ENTREZ.txt, HPO_ALT_IDS.txt, HPO_TERMS.txt, ENTREZ_HPO_ANNOTATIONS.txt, and HPO_ANCESTORS.txt.

The output files should be moved to apps/api/src/main/resources.
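
To illustrate the kind of conversion described above, here is a minimal sketch of reading [Term] stanzas from an OBO file and deriving ancestor sets from is_a links. It is not the logic of hpo_file_converters.scala: the object name OboSketch, the Term record, and both function names are hypothetical, and the layout of the real output files is not specified here, so the sketch stops at in-memory values. Obsolete terms and other OBO tags are not handled.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical sketch; names and structure are assumptions, not the project's code.
object OboSketch {
  // The fields of one [Term] stanza that this sketch keeps.
  final case class Term(id: String, name: String, altIds: List[String], parents: List[String])

  // Collect tag/value pairs per stanza and keep only [Term] stanzas.
  def parseTerms(lines: Iterator[String]): List[Term] = {
    val terms = ListBuffer.empty[Term]
    var inTerm = false
    var fields = List.empty[(String, String)] // newest pair first

    // Turn the pairs collected so far into a Term, if we were inside a [Term] stanza.
    def flush(): Unit = {
      if (inTerm) {
        val ordered = fields.reverse
        def values(tag: String): List[String] = ordered.filter(_._1 == tag).map(_._2)
        values("id").headOption.foreach { id =>
          terms += Term(id, values("name").headOption.getOrElse(""), values("alt_id"), values("is_a"))
        }
      }
      fields = Nil
    }

    lines.map(_.trim).foreach { line =>
      if (line.startsWith("[")) { // stanza header such as [Term] or [Typedef]
        flush()
        inTerm = line == "[Term]"
      } else if (inTerm && line.contains(":")) {
        val (tag, rest) = line.span(_ != ':')
        // Drop the leading ':' and any trailing " ! comment" (as in "is_a: HP:0000001 ! All").
        val cut = rest.indexOf(" ! ")
        val value = (if (cut >= 0) rest.substring(1, cut) else rest.drop(1)).trim
        fields = (tag, value) :: fields
      }
    }
    flush()
    terms.toList
  }

  // All terms reachable from `id` by following is_a links (the term itself is excluded).
  def ancestors(id: String, byId: Map[String, Term]): Set[String] = {
    @scala.annotation.tailrec
    def loop(frontier: List[String], seen: Set[String]): Set[String] = frontier match {
      case Nil => seen
      case head :: tail =>
        val parents = byId.get(head).map(_.parents).getOrElse(Nil).filterNot(seen)
        loop(parents ::: tail, seen ++ parents)
    }
    loop(List(id), Set.empty)
  }
}
```

With the terms indexed by id, for example terms.map(t => t.id -> t).toMap, ancestors can be called once per term to produce the kind of data that a file like HPO_ANCESTORS.txt would presumably hold.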

Dependencies

DataExpress
hpo_BUILD_DATE.obo
genes_to_phenotype_BUILD_DATE.txt

Genesis Similarity Scores to CDF Ingest

The genesis application generates sample similarity scores for all genes for a randomly selected set of phenotype queries of length k. This data is written to delimited files. The etl/gp_dist_ingest/genesis.scala script imports and processes the raw similarity scores to create CDF values and stores them in a database configured for use with the phenomantics API application. To run the ingest script as is:

Move the genesis output data files to etl/gp_dist_ingest/data/SIMFUNC_k_##, where SIMFUNC is one of ASymSim1, SymSim1, or SymSim2 and ## is the number of query terms used in generating the data (add a leading 0 if k<10, e.g. 01, 02).
Create the etl/gp_dist_ingest/conf/postgres.properties file with the following content (values in CAPS must be replaced with appropriate values):

driverClassName=org.postgresql.Driver
jdbcUri=jdbc:postgresql://IP:PORT/dbName
user=USERNAME
password=PASSWORD
Ensure the postgresql service is running and that the user specified in the postgres.properties file above has appropriate access privileges.
Add dataexpress.jar to etl/gp_dist_ingest/lib.
Run the ingest from within the etl/gp_dist_ingest directory with:

scala -cp "lib/*" genesis.scala
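
Before running the ingest, the directory name and the properties file from the steps above can be sanity-checked. The following is a minimal sketch, not part of genesis.scala: the object IngestPreflight and its function names are hypothetical, and it assumes that SIMFUNC_k_## means the similarity function name, a literal _k_, and the zero-padded query length. It also assumes a PostgreSQL JDBC URI, which starts with jdbc:postgresql://.

```scala
import java.io.StringReader
import java.util.Properties

// Hypothetical helper; names are assumptions, not part of the ingest script.
object IngestPreflight {
  // Similarity functions named in the steps above.
  val simFuncs: Set[String] = Set("ASymSim1", "SymSim1", "SymSim2")

  // Keys listed in the postgres.properties template.
  val requiredKeys: List[String] = List("driverClassName", "jdbcUri", "user", "password")

  // Build the data directory name, e.g. SymSim1_k_05 for k = 5.
  def dataDirName(simFunc: String, k: Int): String = {
    require(simFuncs(simFunc), s"unknown similarity function: $simFunc")
    require(k > 0, "k must be positive")
    "%s_k_%02d".format(simFunc, k)
  }

  // Parse properties text in the standard java.util.Properties format.
  def load(text: String): Properties = {
    val props = new Properties()
    props.load(new StringReader(text))
    props
  }

  // Report required keys that are missing or empty, and a jdbcUri with an unexpected prefix.
  def problems(props: Properties): List[String] =
    requiredKeys.flatMap { key =>
      Option(props.getProperty(key)).map(_.trim) match {
        case None | Some("") => Some(s"$key is missing or empty")
        case Some(v) if key == "jdbcUri" && !v.startsWith("jdbc:postgresql://") =>
          Some(s"jdbcUri does not start with jdbc:postgresql://: $v")
        case _ => None
      }
    }
}
```

An empty result from problems means every key in the template is present and the URI has the expected prefix. It does not verify that the database is reachable or that the user has the required privileges.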

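The text above says that raw similarity scores are turned into CDF values but does not say how. As one plausible reading, the sketch below computes an empirical CDF: for each distinct score, the fraction of sampled scores less than or equal to it. The object CdfSketch and its function names are hypothetical, and the actual computation in genesis.scala may differ.

```scala
// Hypothetical sketch of an empirical CDF; not taken from genesis.scala.
object CdfSketch {
  // For each distinct score s (ascending), the fraction of samples <= s.
  def empiricalCdf(scores: Seq[Double]): Vector[(Double, Double)] = {
    val n = scores.length.toDouble
    // Count occurrences of each distinct score, then sort by score.
    val counts: Vector[(Double, Int)] =
      scores.groupBy(identity).map { case (s, xs) => (s, xs.length) }.toVector.sortBy(_._1)
    // Running total of counts; drop the initial 0 produced by scanLeft.
    val cumulative: Vector[Int] = counts.scanLeft(0)((acc, pair) => acc + pair._2).tail
    counts.map(_._1).zip(cumulative.map(_ / n))
  }

  // Look up the CDF value at x: the value at the largest score <= x, or 0.0 if there is none.
  def cdfAt(cdf: Vector[(Double, Double)], x: Double): Double =
    cdf.takeWhile(_._1 <= x).lastOption.map(_._2).getOrElse(0.0)
}
```

Under this reading, a CDF value close to 1.0 for an observed score means that few of the randomly sampled queries of the same length scored higher.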