Arnold Kuzniar edited this page May 15, 2019 · 9 revisions


Objective: Generate a prioritized list of candidate genes from a QTL region based on phenotype information.

Output: Semantic integration of plant genomic and phenotypic data to enable ranking of candidate genes associated with fruit ripening in tomatoes (Solanum lycopersicum).

Fig 1. Biological entities and data flow. TODO: Add potato graphs.

Platform: Virtuoso Universal Server (OSE). The installation and deployment instructions can be found here.

Data sources:


  • web-based Faceted Browser on Linked Data sets
    • (Google-like) Text Search (e.g. fruit quality, Myb 12, SGN-M6466)
    • Entity Label Lookup including genome, chromosome/location (chromosome 11), QTL (QTL:PMC4321030_4_1_54), trait (fruit ripening), genetic marker (variation gene231_0-i11), gene symbol/ID (gene Solyc11g008770.1), protein accession/ID (K4D5D7), GO term/ID (GO:0009835), pathway (carotenoid biosynthesis)
    • Entity URI Lookup (e.g.
  • programmatic data access via SPARQL endpoint including some example queries & output
  • Dockerized Virtuoso server to ease on-premises deployment
  • automated data ingest & reconciliation procedures, which can aid in future updates of the platform when new releases of data sources become available
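
As a rough illustration of the programmatic access listed above, the sketch below builds a SPARQL 1.1 Protocol GET request using only the Python standard library. The endpoint URL is an assumption (Virtuoso's default local address; substitute the platform's actual endpoint), and the query assumes that entities carry an `rdfs:label` containing the gene ID, which may not match the platform's actual data model.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a local Virtuoso instance on its default port.
ENDPOINT = "http://localhost:8890/sparql"

# Assumption: entities are labelled with rdfs:label; adjust to the real schema.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity ?label
WHERE {
  ?entity rdfs:label ?label .
  FILTER(CONTAINS(STR(?label), "Solyc11g008770"))
}
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build a GET request that asks the endpoint for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)

# To run against a live server:
#   import json
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       rows = json.load(response)["results"]["bindings"]
```

Building the request separately from sending it keeps the query easy to inspect before it touches the server.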

Current issues & limitations

  • see this list of open (or closed) issues
  • making tomato SGN data (in GFF) and QTLs from literature (in CSV) available in RDF requires manual effort aided by OpenRefine and a custom script in Python
  • (non-)RDF data quality & curation (e.g. some Ensembl links to other resources)
  • data licensing & re-use by private partners
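
To give a sense of what the CSV-to-RDF step mentioned above involves, here is a minimal sketch that turns CSV rows into N-Triples. It is not the project's actual script: the column names (`qtl_id`, `trait`, `chromosome`), the `http://example.org/qtl/` namespace, and the sample row are all hypothetical.

```python
import csv
import io

BASE = "http://example.org/qtl/"  # hypothetical namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rows_to_ntriples(csv_text):
    """Emit two triples per CSV row: a trait label and a chromosome."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<" + BASE + row["qtl_id"] + ">"
        trait = escape_literal(row["trait"])
        chromosome = escape_literal(row["chromosome"])
        triples.append(f'{subject} <{RDFS_LABEL}> "{trait}" .')
        triples.append(f'{subject} <{BASE}chromosome> "{chromosome}" .')
    return triples


# Made-up sample row, for illustration only.
SAMPLE = "qtl_id,trait,chromosome\nexample_qtl_1,fruit ripening,11\n"

for line in rows_to_ntriples(SAMPLE):
    print(line)
```

A real conversion would map columns to terms from established ontologies rather than an ad hoc namespace.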

Possible extensions:

  • couple the Linked Data platform with one or more algorithms to score/rank candidate genes associated with the trait of interest
  • web interface including data visualization tailored to domain scientists
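
One naive way such a scoring step could look is sketched below: each gene is scored by how many of its annotation terms overlap with a set of trait-related terms. This is only a toy illustration, not a method the project has adopted; the gene names and their annotations are hypothetical, and only the GO term and pathway name are taken from the examples on this page.

```python
def rank_genes(gene_annotations, trait_terms):
    """Rank genes by the number of annotation terms shared with the trait.

    Ties are broken alphabetically by gene name so the order is stable.
    """
    scores = {
        gene: len(set(terms) & trait_terms)
        for gene, terms in gene_annotations.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Terms assumed to be relevant to the trait of interest.
TRAIT_TERMS = {"GO:0009835", "carotenoid biosynthesis"}

# Hypothetical genes in a QTL region with hypothetical annotations.
GENES = {
    "geneA": ["GO:0009835", "carotenoid biosynthesis"],
    "geneB": ["GO:0009835"],
    "geneC": ["unrelated term"],
}

ranking = rank_genes(GENES, TRAIT_TERMS)
for gene, score in ranking:
    print(gene, score)
```

In practice the annotations would come from SPARQL queries against the platform, and the score would likely weight evidence types differently.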