This is a Matlab toolbox for performing gene-category enrichment analysis relative to two different types of null models:
- Random-gene nulls, in which categories are assessed relative to categories of the same size annotated by random genes. This follows the permutation-based method of Gene Score Resampling (as implemented in ermineJ).
- Ensemble-based nulls, in which categories are assessed relative to an ensemble of randomized phenotypes.
The toolbox was introduced in our paper:
- Fulcher et al. Nature Communications (2021) 📗 'Overcoming false-positive gene-category enrichment in the analysis of spatially resolved transcriptomic brain atlas data'.
Instructions for performing the basic functions of these analyses are in the wiki 📓.
The package is currently set up to perform enrichment on Gene Ontology (GO) Biological Process annotations, but could be modified straightforwardly to use other types of GO annotations, or even to use other annotation systems like KEGG.
Pull requests to improve the functionality and clarity of documentation are very welcome!
Note that this repository is no longer in active development, but the same null-testing procedure has been re-implemented in other packages. I would recommend investigating these alternatives:
- ABAnnotate (Matlab)
- The Imaging-Transcriptomics toolbox (Python).
The package is organized into directories as follows:
- RawData: all data downloaded from external sources (like GO, MouseMine, etc.).
- ProcessedData: raw data processed into Matlab-readable files.
- DataProcessing: code required to process raw data.
- GeneScoreResampling, EnsembleEnrichment: code to run both random-gene and randomized-phenotype enrichment analysis.
- ResultsComparison: code to compare GSEA results to ermineJ.
- Peripheral: additional code files.
To initialize this toolbox, all of these subdirectories should be added to the Matlab path by running the startup script.
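For readers unfamiliar with Matlab path handling, the snippet below is a minimal sketch of what adding a folder and all of its subfolders to the path looks like. It is not the content of the repository's startup script, and `toolboxRoot` is a placeholder that you would point at your own clone.

```matlab
% Placeholder location of the cloned repository (assumption: adjust to your setup).
toolboxRoot = pwd;

% genpath builds a path string containing the folder and all of its subfolders;
% addpath then adds every entry of that string to the Matlab search path.
addpath(genpath(toolboxRoot));
```

Running the provided startup script remains the recommended route, since it reflects the directory layout the authors intended.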
A summary of how to run an enrichment analysis with this package is described here, but please read the wiki 📓 for more detailed instructions.
NOTE: This package relied on MySQL downloads of the GO data, but GO no longer provides its ontology in this format. As a workaround, the directory oboConversion has been added, which includes instructions and code for converting recent GO releases (available as go-basic.obo files) into an SQLite database, and the DataProcessing scripts have been updated with SQLite commands to bypass the need for a MySQL connection.
The first step in running an enrichment analysis is defining the set of gene categories, and the genes annotated to each category. Results of this, using hierarchy-propagated gene-to-category annotations corresponding to GO biological processes (processed on 2019-04-17), can be downloaded from this partner Zenodo data repository.
Code in this repository also allows you to reprocess these annotations from raw data from GO, as described on this wiki page.
You can test this pipeline using the term and term2term tables from a MySQL download of the GO term data on 2019-04-17, which are also available in the associated Zenodo data repository.
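To illustrate what "hierarchy-propagated" annotations mean, here is a small sketch on a toy hierarchy: every gene annotated to a category is also assigned to all of that category's ancestors. The category IDs, gene IDs, and variable names are hypothetical, and this is not the repository's processing code, which works from the GO tables described above.

```matlab
% Toy hierarchy (hypothetical IDs): category 1 is the root, 2 and 3 are its
% children, and 4 is a child of 2. Each row is [child, parent].
childParent = [2 1; 3 1; 4 2];
numCategories = 4;

% Direct annotations: genes annotated to each category before propagation.
directGenes = {[], 101, 102, [103 104]};

% isParent(p,c) is true when category p is a direct parent of category c.
isParent = false(numCategories);
for k = 1:size(childParent, 1)
    isParent(childParent(k, 2), childParent(k, 1)) = true;
end

% Transitive closure (Warshall's algorithm): isAncestor(a,c) is true when
% a is c itself or any ancestor of c.
isAncestor = isParent | logical(eye(numCategories));
for k = 1:numCategories
    viaK = (double(isAncestor(:, k)) * double(isAncestor(k, :))) > 0;
    isAncestor = isAncestor | viaK;
end

% Each category inherits the genes of itself and all of its descendants.
propagatedGenes = cell(1, numCategories);
for a = 1:numCategories
    descendants = find(isAncestor(a, :));
    propagatedGenes{a} = unique([directGenes{descendants}]);
end
```

In this toy case the root category ends up with all four genes, and category 2 gains the genes directly annotated to its child, category 4.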
All parameters are set using GiveMeDefaultEnrichmentParams, as described in the wiki.
The Gene Score Resampling method assesses significance relative to a 'random-gene null', and is implemented in the SingleEnrichment function.
Instructions to implement this are in the wiki.
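The idea behind a random-gene null can be sketched in a few lines. The example below does not call SingleEnrichment; it is a standalone illustration that assumes the category score is the mean of its gene scores and uses made-up scores and gene indices.

```matlab
% Toy data (assumptions): 200 genes with increasing scores, and one category
% annotated by the 20 highest-scoring genes.
numGenes = 200;
geneScores = (1:numGenes)' / numGenes;
categoryGenes = 181:200;
numNullSamples = 1000;

% Observed category score: mean score of the genes annotated to the category.
observedScore = mean(geneScores(categoryGenes));
categorySize = numel(categoryGenes);

% Null distribution: scores of categories of the same size made of random genes.
nullScores = zeros(numNullSamples, 1);
for i = 1:numNullSamples
    randomGenes = randperm(numGenes, categorySize);
    nullScores(i) = mean(geneScores(randomGenes));
end

% Permutation p-value: fraction of null scores at least as large as observed.
pValue = mean(nullScores >= observedScore);
```

Because the toy category contains the top-scoring genes, almost no random gene set matches its score and the estimated p-value is close to zero.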
Ensemble enrichment computes the enrichment of a given phenotype relative to an ensemble of randomized phenotypes, as described in our paper.
This proceeds across ComputeAllCategoryNulls (precompute category nulls) and EnsembleEnrichment (evaluate significance relative to these nulls), as described in the wiki.
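The two-stage logic can be sketched as follows. This is a conceptual illustration rather than the toolbox's implementation: it does not call ComputeAllCategoryNulls or EnsembleEnrichment, and it assumes that gene scores are Pearson correlations with the phenotype, that the category score is their mean, and that randomized phenotypes are independent Gaussian maps. All data and names are made up.

```matlab
% Toy data (assumptions): 50 regions, 100 genes, one category of 10 genes.
numRegions = 50;
numGenes = 100;
numNullSamples = 500;
expression = randn(numRegions, numGenes);
categoryGenes = 1:10;

% Z-score columns so that Pearson correlation reduces to a matrix product.
zscoreCols = @(X) bsxfun(@rdivide, bsxfun(@minus, X, mean(X, 1)), std(X, 0, 1));
zExpression = zscoreCols(expression);

% Category score: mean correlation between a phenotype and the category's genes.
categoryScore = @(y) mean(zExpression(:, categoryGenes)' * zscoreCols(y)) / (numRegions - 1);

% Stage 1: precompute the category's null scores from randomized phenotypes.
% These do not depend on the phenotype being tested, so they can be reused.
nullScores = zeros(numNullSamples, 1);
for i = 1:numNullSamples
    nullPhenotype = randn(numRegions, 1);
    nullScores(i) = categoryScore(nullPhenotype);
end

% Stage 2: score a phenotype of interest and compare it with the null.
% Here the phenotype is built to track the category's genes.
phenotype = mean(expression(:, categoryGenes), 2);
observedScore = categoryScore(phenotype);
pValue = mean(nullScores >= observedScore);
```

Separating the stages means the expensive null computation is done once per category and then reused for every phenotype that is tested against the same expression data.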