FlyBase Chado query guide

Marta Costa edited this page Jun 12, 2015 · 1 revision

FlyBase maintains a very large sets of annotation of Drosophila phenotypes using the Drosophila phenotype ontology. It also uses other terms in sub-ontologies of its controlled vocabularies (fbcv) as qualifiers for phenotype and expression assertions and to curate a range of other information into FlyBase including publication metadata (which fbcv maps to MeSH terms) and to record which chemical mutagen was used to generate specific alleles (fbcv links these to CheBI).

One way to access this information is via pre-computed tsv and XML files hosted on FlyBase (see links under 'alleles and stocks'). Please note that the tsv files do not contain ontology IDs and in some cases term names are embedded in longer strings following a standard syntax.

It is also possible to access this data via direct queries of an open FlyBase CHADO database. This has the advantage that individual terms can be parsed out separately along with their identifiers. This page provides SQL queries for pulling this data from FlyBase CHADO.

Accessing FlyBase CHADO

Details for connecting an open server hosting the latest version of FLyBase chado can be found here. Copies of the FlyBase CHADO database can be downloaded from here.

Representation of ontologies in FlyBase

TBA

Phenotypes

Bulk download options

FlyBase hosts pre-computed tsv and XML files containing phenotype data (see links under 'alleles and stocks'). Please note that the tsv files do not contain ontology IDs and the ontology term names they contain are embedded in phenotype strings consisting of a primary term (phenotype, anatomy or GO cell component), potentially followed by multiple qualifying terms, where each term is delimited from the next by ' | '. In order to do any analyses that take advantage of ontology structure, these will, of course, need to be parsed and matched to names in the version of the ontology used in the FlyBase release matching the pre-computed files.

FlyBase CHADO queries

:::sql

-- genotype (as text string only) to phenotype ontology
SELECT g.uniquename as genotype, pccv.name as aname, pccv.name as pcidp, pcdbx.accession as pcid
FROM cvterm pccv 
JOIN phenotype p on (pccv.cvterm_id = p.cvalue_id)
JOIN phenstatement ps on (ps.phenotype_id=p.phenotype_id)
JOIN genotype g on (ps.genotype_id = g.genotype_id)
JOIN dbxref pcdbx on (pccv.dbxref_id=pcdbx.dbxref_id)
JOIN db pcdb on (pcdbx.db_id=pcdb.db_id)
WHERE pcdb.name = 'FBcv'
AND pccv.name != 'unspecified' -- Note - this is not strictly necessary given the restriction to FBcv


-- pub + feature to phenotype ontology (note pretty straighforward to pull out pubmed id for refs where present)
SELECT pub.uniquename, pub.miniref, f.name as fname, f.uniquename as feature_id, acv.name as aname, adb.name obs_id, adbx.accession as obs_acc
FROM cvterm pccv 
JOIN phenotype p on (pccv.cvterm_id = p.cvalue_id)
JOIN phenstatement ps on (ps.phenotype_id=p.phenotype_id)
JOIN pub on (ps.pub_id=pub.pub_id)
JOIN feature_genotype fg on (ps.genotype_id = fg.genotype_id)
JOIN feature f on (fg.feature_id=f.feature_id)
JOIN dbxref adbx on (acv.dbxref_id=adbx.dbxref_id)
JOIN db adb on (adbx.db_id=adb.db_id)
WHERE pcdb.name = 'FBcv'
AND pccv.name != 'unspecified'; -- Note - this is not strictly necessary given the restriction to FBcv

You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
Press h to open a hovercard with more details.