Warning
In the future the analysis methods may migrate from the AssociationSet class to dedicated analysis engine classes.
See the Notebook example
OntoBio allows for generalized gene set enrichment: given a set of annotations that map genes to descriptor terms, and an input set of genes, and a background set, find what terms are enriched in the input set compared to the background.
With OntoBio, enrichment tests work for any annotation corpus, not just gene-oriented ones; disease-phenotype associations are one example. However, take care with the underlying assumptions when working with non-gene sets.
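To make the idea concrete, here is a minimal, standard-library-only sketch of an over-representation test based on the hypergeometric distribution. It is not OntoBio's implementation: the function names and the ``annotations`` dict shape are hypothetical, and the sketch neither propagates annotations up the ontology graph nor corrects for multiple testing.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N items without replacement from M items,
    n of which carry the term of interest."""
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total


def simple_enrichment(annotations, sample, background, threshold=0.05):
    """Return sorted (p_value, term) pairs for terms over-represented in sample.

    annotations: dict mapping an entity ID to a set of term IDs
                 (a hypothetical shape chosen for this sketch).
    """
    sample = set(sample)
    background = set(background) | sample  # the sample must lie in the background
    M, N = len(background), len(sample)

    # Only terms that annotate at least one sample member can be enriched.
    candidate_terms = set()
    for entity in sample:
        candidate_terms |= annotations.get(entity, set())

    results = []
    for term in candidate_terms:
        n = sum(1 for e in background if term in annotations.get(e, set()))
        k = sum(1 for e in sample if term in annotations.get(e, set()))
        p = hypergeom_sf(k, M, n, N)
        if p <= threshold:
            results.append((p, term))
    return sorted(results)
```

Because nothing in the sketch is specific to genes, the same function works for any entity-to-term mapping, which mirrors the point above about non-gene corpora.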
Before running an enrichment analysis, you first need to fetch both an Ontology object and an AssociationSet object. These can come from any mix of local files and remote services or databases. See :ref:`inputs` for details.
Assume that we are using a remote ontology and local GAF:
from ontobio import OntologyFactory
from ontobio import AssociationSetFactory
ofactory = OntologyFactory()
afactory = AssociationSetFactory()
ont = ofactory.create('go')
aset = afactory.create_from_gaf('my.gaf', ontology=ont)
Assume also that we have a set of sample gene IDs and a set of background gene IDs. The test is then:
enr = aset.enrichment_test(subjects=gene_ids, background=background_gene_ids, threshold=0.00005, labels=True)
This returns a list of dicts (TODO - decide if we want to make this an object and follow a standard class model)
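As a rough illustration of consuming that list, the sketch below sorts results by p-value and formats one line per term. The key names ``'p'`` (p-value), ``'c'`` (term ID) and ``'l'`` (label) are an assumption about the dict layout and should be checked against the API docs; ``format_enrichment`` is a hypothetical helper, and the sample rows reuse values from the command-line output shown in this document.

```python
def format_enrichment(results):
    """Return one 'p-value term-id label' line per result, best p-value first.

    ASSUMPTION: each result is a dict with keys 'p', 'c' and 'l'.
    Verify these key names against the enrichment_test API docs.
    """
    ordered = sorted(results, key=lambda r: r["p"])
    return [f"{r['p']:.3g} {r['c']} {r['l']}" for r in ordered]


# Stand-in for the value returned by aset.enrichment_test(...)
sample_results = [
    {"p": 2.87e-11, "c": "GO:0044456", "l": "synapse part"},
    {"p": 2.48e-12, "c": "GO:0045202", "l": "synapse"},
]

for line in format_enrichment(sample_results):
    print(line)
```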
NOTE: the input gene IDs must be the same ones used in the AssociationSet. If you load from a GAF, these are the IDs formed by combining col1 and col2, separated by a ":". E.g. UniProtKB:P123456
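The ID convention can be sketched in a few lines of plain Python. This is an illustration only, not part of the OntoBio API: both helper names are hypothetical, and the sketch relies only on GAF being tab-separated with comment lines that start with ``!``.

```python
def gaf_subject_id(line):
    """Join column 1 (database) and column 2 (object ID) of a GAF row with ':'."""
    cols = line.rstrip("\n").split("\t")
    return f"{cols[0]}:{cols[1]}"


def gaf_subject_ids(lines):
    """Collect the distinct subject IDs from an iterable of GAF lines."""
    ids = set()
    for line in lines:
        # Skip header/comment lines (leading '!') and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        ids.add(gaf_subject_id(line))
    return ids
```

One possible use is a sanity check before running the test: read ``my.gaf`` with ``gaf_subject_ids(open('my.gaf'))`` and confirm that every ID in your sample set appears in the returned set.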
What if you have different IDs, or just a list of gene symbols? In that case you will need to map these names or IDs, which is the subject of the next section.
For reproducible analyses, use a versioned PURL for the ontology.
You can use the ontobio-assoc command to run enrichment analyses. Some examples:
Create a gene set for all genes in "regulation of bone development" (GO:1903010). Find other terms for which this is enriched (in human)
# find all mouse genes that have 'abnormal synaptic transmission' phenotype
# (using remote SPARQL service for MP, and default (Monarch) for associations)
ontobio-assoc.py -v -r mp -T NCBITaxon:10090 -C gene phenotype query -q MP:0003635 > genes.txt
# get IDs
cut -f1 -d ' ' genes.txt > genes.ids
# enrichment, using GO
ontobio-assoc.py -r go -T NCBITaxon:10090 -C gene function enrichment -s genes.ids
# resulting GO terms are not very surprising...
2.48e-12 GO:0045202 synapse
2.87e-11 GO:0044456 synapse part
3.66e-08 GO:0007270 neuron-neuron synaptic transmission
3.95e-08 GO:0098793 presynapse
1.65e-07 GO:0099537 trans-synaptic signaling
1.65e-07 GO:0007268 chemical synaptic transmission
For API docs, see enrichment_test in AssociationSet model
TODO
TODO
To follow progress, see this PR
TODO
TODO
See the lexmap API docs
You can also use the command line:
ontobio-lexmap.py ont1.json ont2.json > mappings.tsv
The inputs can be any kind of handle - a local ontology file or a remote ontology accessed via services.
For example, this will work:
ontobio-lexmap.py mp hp wbphenotype > mappings.tsv
See :ref:`inputs` for more details.
For examples of lexical mapping pipelines, see:
- https://github.com/cmungall/sweet-obo-alignment
- `<https://github.com/monarch-initiative/monarch-disease-ontology/tree/master/src/icd10>`_
These include examples of customizing the configuration using a YAML file.