Arnold Kuzniar edited this page May 15, 2019 · 9 revisions

!!!UPDATE!!!

Objective: Generate a prioritized list of candidate genes from a QTL region based on phenotype information.

Output: Semantic integration of plant genomic and phenotypic data to enable ranking of candidate genes associated with fruit ripening in tomatoes (Solanum lycopersicum).

Workflow
Fig 1. Biological entities and data flow. TODO: Add potato graphs.

Platform: Virtuoso Universal Server (OSE). The installation and deployment instructions can be found here.

Data sources:

Features:

  • web-based Faceted Browser on Linked Data sets
    • (Google-like) Text Search (e.g. fruit quality, Myb 12, SGN-M6466)
    • Entity Label Lookup including genome, chromosome/location (chromosome 11), QTL (QTL:PMC4321030_4_1_54), trait (fruit ripening), genetic marker (variation gene231_0-i11), gene symbol/ID (gene Solyc11g008770.1), protein accession/ID (K4D5D7), GO term/ID (GO:0009835), pathway (carotenoid biosynthesis)
    • Entity URI Lookup (e.g. http://purl.obolibrary.org/obo/TO_0002728)
  • programmatic data access via a SPARQL endpoint, including some example queries & output
  • Docker-ized Virtuoso server for easy on-premise deployment
  • automated data ingest & reconciliation procedures, which can aid in future updates of the platform when new releases of data sources become available
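Programmatic access through the SPARQL endpoint amounts to an HTTP request that carries the query as a parameter. The following is a minimal sketch using only the Python standard library. It assumes Virtuoso's default endpoint location (`http://localhost:8890/sparql`) and a plain `rdfs:label` text match; the actual host, port and graph layout of the deployed platform may differ, and the query is illustrative rather than one of the platform's example queries.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Virtuoso's default HTTP port (8890) and /sparql path;
# change this to wherever the Docker container is actually exposed.
ENDPOINT = "http://localhost:8890/sparql"

# Illustrative label search; the platform's real vocabulary may differ.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?entity ?label
WHERE {
  ?entity rdfs:label ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), "fruit ripening"))
}
LIMIT 10
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET URL (SPARQL 1.1 Protocol 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


def run_query(endpoint: str, query: str) -> dict:
    """Send the query and parse SPARQL JSON results (requires a running server)."""
    request = Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only builds the URL; call run_query(ENDPOINT, QUERY) against a live server.
    print(build_query_url(ENDPOINT, QUERY))
```

Requesting `application/sparql-results+json` keeps the result parsing to a single `json.load`; each row then appears under `results` → `bindings`.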

Current issues & limitations

  • see this list of open (or closed) issues
  • making tomato SGN data (in GFF) and QTLs from literature (in CSV) available in RDF requires manual effort aided by OpenRefine and a custom script in Python
  • (non-)RDF data quality & curation (e.g. some Ensembl links to other resources)
  • data licensing & re-use by private partners
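Part of the manual GFF-to-RDF conversion can be scripted. The sketch below converts single GFF3 feature lines into N-Triples. It is not the project's custom script: the base namespace `http://example.org/sgn/` and the predicates minted under it (`seqid`, `start`, `end`, `strand`) are hypothetical placeholders, and only the nine-column GFF3 layout and the standard `rdf:type`, `rdfs:label` and `xsd:integer` IRIs are taken as given.

```python
from urllib.parse import unquote

# Hypothetical namespace and predicates, for illustration only; the
# platform's actual URIs and ontology terms are not specified here.
BASE = "http://example.org/sgn/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
XSD_INT = "<http://www.w3.org/2001/XMLSchema#integer>"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def parse_attributes(column9: str) -> dict[str, str]:
    """Split GFF3 column 9 (key=value;key=value) into a dict of decoded values."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = unquote(value)
    return attrs


def gff3_line_to_ntriples(line: str) -> list[str]:
    """Convert one GFF3 feature line to N-Triples; skip comments and ID-less features."""
    if not line.strip() or line.startswith("#"):
        return []
    seqid, _source, ftype, start, end, _score, strand, _phase, column9 = (
        line.rstrip("\n").split("\t")
    )
    attrs = parse_attributes(column9)
    if "ID" not in attrs:
        return []  # no stable identifier to mint a URI from
    # The ID is used verbatim in the IRI; real data may need IRI escaping.
    subj = f"<{BASE}feature/{attrs['ID']}>"
    triples = [
        f"{subj} {RDF_TYPE} <{BASE}type/{ftype}> .",
        f"{subj} <{BASE}seqid> {literal(seqid)} .",
        f"{subj} <{BASE}start> \"{int(start)}\"^^{XSD_INT} .",
        f"{subj} <{BASE}end> \"{int(end)}\"^^{XSD_INT} .",
        f"{subj} <{BASE}strand> {literal(strand)} .",
    ]
    if "Name" in attrs:
        triples.append(f"{subj} {RDFS_LABEL} {literal(attrs['Name'])} .")
    return triples
```

Emitting N-Triples line by line keeps the converter streaming-friendly, and the output can be bulk-loaded into Virtuoso; reconciliation against external identifiers (the step currently done in OpenRefine) is left out.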

Possible extensions:

  • couple the Linked Data platform with one or more algorithms to score/rank (candidate) genes associated with the trait of interest
  • web interface including data visualization tailored to domain scientists
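To illustrate what a scoring/ranking step might look like, here is a deliberately naive sketch. It assumes gene records (chromosome, coordinates, GO terms) have already been retrieved from the platform, keeps the genes whose interval overlaps a QTL region, and ranks them by how many GO terms they share with a set of trait-related terms (for example GO:0009835). The `Gene`/`QTL` records and the count-based score are hypothetical choices for illustration, not an algorithm adopted by the project.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; in practice filled from SPARQL results."""
    gene_id: str
    chromosome: str
    start: int
    end: int
    go_terms: frozenset[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class QTL:
    """Hypothetical QTL region on one chromosome (inclusive coordinates)."""
    chromosome: str
    start: int
    end: int


def overlaps(gene: Gene, qtl: QTL) -> bool:
    """True if the gene interval intersects the QTL interval on the same chromosome."""
    return (
        gene.chromosome == qtl.chromosome
        and gene.start <= qtl.end
        and qtl.start <= gene.end
    )


def rank_candidates(
    genes: list[Gene], qtl: QTL, trait_terms: set[str]
) -> list[tuple[int, str]]:
    """Return (score, gene_id) for genes in the QTL, highest score first.

    Score = number of the gene's GO terms that are in trait_terms (naive count);
    ties are broken by gene ID so the ordering is deterministic.
    """
    scored = [
        (len(gene.go_terms & trait_terms), gene.gene_id)
        for gene in genes
        if overlaps(gene, qtl)
    ]
    return sorted(scored, key=lambda item: (-item[0], item[1]))
```

A real prioritization would likely weight evidence (e.g. ontology distance, marker proximity, expression data) rather than simply count shared terms; the overlap filter and sorting would stay the same.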