Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
FlyBase Chado query guide
FlyBase maintains a very large sets of annotation of Drosophila phenotypes using the Drosophila phenotype ontology. It also uses other terms in sub-ontologies of its controlled vocabularies (fbcv) as qualifiers for phenotype and expression assertions and to curate a range of other information into FlyBase including publication metadata (which fbcv maps to MeSH terms) and to record which chemical mutagen was used to generate specific alleles (fbcv links these to CheBI).
One way to access this information is via pre-computed tsv and XML files hosted on FlyBase (see links under 'alleles and stocks'). Please note that the tsv files do not contain ontology IDs and in some cases term names are embedded in longer strings following a standard syntax.
It is also possible to access this data via direct queries of an open FlyBase CHADO database. This has the advantage that individual terms can be parsed out separately along with their identifiers. This page provides SQL queries for pulling this data from FlyBase CHADO.
Accessing FlyBase CHADO
Representation of ontologies in FlyBase
Bulk download options
FlyBase hosts pre-computed tsv and XML files containing phenotype data (see links under 'alleles and stocks'). Please note that the tsv files do not contain ontology IDs and the ontology term names they contain are embedded in phenotype strings consisting of a primary term (phenotype, anatomy or GO cell component), potentially followed by multiple qualifying terms, where each term is delimited from the next by ' | '. In order to do any analyses that take advantage of ontology structure, these will, of course, need to be parsed and matched to names in the version of the ontology used in the FlyBase release matching the pre-computed files.
FlyBase CHADO queries
:::sql -- genotype (as text string only) to phenotype ontology SELECT g.uniquename as genotype, pccv.name as aname, pccv.name as pcidp, pcdbx.accession as pcid FROM cvterm pccv JOIN phenotype p on (pccv.cvterm_id = p.cvalue_id) JOIN phenstatement ps on (ps.phenotype_id=p.phenotype_id) JOIN genotype g on (ps.genotype_id = g.genotype_id) JOIN dbxref pcdbx on (pccv.dbxref_id=pcdbx.dbxref_id) JOIN db pcdb on (pcdbx.db_id=pcdb.db_id) WHERE pcdb.name = 'FBcv' AND pccv.name != 'unspecified' -- Note - this is not strictly necessary given the restriction to FBcv -- pub + feature to phenotype ontology (note pretty straighforward to pull out pubmed id for refs where present) SELECT pub.uniquename, pub.miniref, f.name as fname, f.uniquename as feature_id, acv.name as aname, adb.name obs_id, adbx.accession as obs_acc FROM cvterm pccv JOIN phenotype p on (pccv.cvterm_id = p.cvalue_id) JOIN phenstatement ps on (ps.phenotype_id=p.phenotype_id) JOIN pub on (ps.pub_id=pub.pub_id) JOIN feature_genotype fg on (ps.genotype_id = fg.genotype_id) JOIN feature f on (fg.feature_id=f.feature_id) JOIN dbxref adbx on (acv.dbxref_id=adbx.dbxref_id) JOIN db adb on (adbx.db_id=adb.db_id) WHERE pcdb.name = 'FBcv' AND pccv.name != 'unspecified'; -- Note - this is not strictly necessary given the restriction to FBcv