Ensemble enrichment

Ben Fulcher edited this page Aug 14, 2020 · 3 revisions

Running Enrichment Relative to an Ensemble of Null Phenotypes

Ensemble enrichment computes the enrichment of a given phenotype relative to an ensemble of randomized phenotypes. The approach is described in this bioRxiv preprint.

It proceeds through three steps:

  1. Compute the null phenotype ensemble.
  2. Compute the null distribution corresponding to this ensemble using ComputeAllCategoryNulls.
  3. Perform enrichment for a phenotype of interest relative to this pre-computed null ensemble using EnsembleEnrichment.

Step 2 is the most computationally expensive step.

Step 1: Defining a null phenotype ensemble

There are a couple of common choices for null phenotype ensembles:

  1. Independent random maps
  2. Spatially autocorrelated random maps, e.g., fitted to a phenotype of interest and then generated using brainSMASH.

Independent random phenotypic maps

In case (1), you can straightforwardly set enrichmentParams.whatEnsemble = 'randomMap' in GiveMeDefaultEnrichmentParams and you're good to go.

Custom null phenotype ensembles

For any custom ensemble (such as an ensemble of spatially autocorrelated maps), you will need to make different modifications to GiveMeDefaultEnrichmentParams. First, set enrichmentParams.whatEnsemble = 'customEnsemble'. Then specify the .mat file containing the null phenotypes, e.g., enrichmentParams.dataFileSurrogate = 'myNullPhenotypes.mat' (note the quotes: the file name is a string). This file should contain a matrix, nullMaps, in the form region x map. For example, nullMaps would be a 100 x 1000 matrix for 1000 null phenotypes defined across 100 brain regions (matching the 100 rows of the gene-expression data).
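As a rough sketch of the file layout described above, the following builds a region x map matrix named nullMaps and saves it to a .mat file. The random values are placeholders only; in practice each column would be one of your own null phenotypes (e.g., a spatially autocorrelated surrogate map). The variable names numRegions and numMaps are illustrative and not part of the toolbox.

```matlab
% Illustrative sizes only (assumed for this sketch)
numRegions = 100;   % must match the number of rows of the gene-expression data
numMaps = 1000;     % number of null phenotypes in the ensemble

% Placeholder values: replace with your own null maps, one map per column
nullMaps = randn(numRegions,numMaps);

% Save under the variable name 'nullMaps', as described above
save('myNullPhenotypes.mat','nullMaps');

% Quick check that the saved matrix is region x map
loaded = load('myNullPhenotypes.mat');
assert(isequal(size(loaded.nullMaps),[numRegions,numMaps]));
```

The file name saved here is the one you would then give as enrichmentParams.dataFileSurrogate.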

Before proceeding to Step 2, check that the settings in enrichmentParams look sensible.

Step 2: Computing category-score null distribution for every gene category

Using the parameters set in Step 1, you are then ready to compute null distributions for your category scores relative to the specified phenotype ensemble. The resulting null distributions are saved to the .mat file named in enrichmentParams.fileNameOut (check that this file name looks ok before running).

You also need to set up geneDataStruct so that this function can match genes to their expression data. It should be a MATLAB structure containing two fields: expressionMatrix (a region x gene expression matrix) and entrezIDs (a vector of Entrez IDs labeling the columns of the expression matrix, used to match genes to their category annotations).
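A minimal sketch of a structure with this layout is shown below. The expression values and Entrez IDs are made-up placeholders, and numRegions and numGenes are illustrative names; substitute your own expression data and gene identifiers.

```matlab
% Illustrative sizes only (assumed for this sketch)
numRegions = 100;
numGenes = 500;

geneDataStruct = struct();
% region x gene expression matrix (placeholder values)
geneDataStruct.expressionMatrix = randn(numRegions,numGenes);
% one Entrez ID per column of expressionMatrix (placeholder IDs)
geneDataStruct.entrezIDs = (1:numGenes)';

% Each column of the expression matrix needs exactly one Entrez ID
assert(size(geneDataStruct.expressionMatrix,2) == numel(geneDataStruct.entrezIDs));
```

The number of rows (regions) here should also match the number of rows of any custom nullMaps matrix from Step 1.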

ComputeAllCategoryNulls(geneDataStruct,enrichmentParams,[],true,true);

Step 3: Compute enrichment of a given phenotype

Now that we have a null distribution for every gene category, we can assess the significance of the scores obtained for a given phenotype relative to these (precomputed and saved) nulls. The results from Step 2 are saved in enrichmentParams.fileNameOut, so you can pass this file name, together with your phenotype of interest, to compute the enrichment results as a table:

GOTablePhenotype = EnsembleEnrichment(enrichmentParams.fileNameOut,phenotypeVector);
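To illustrate the idea of assessing a score against a precomputed null, here is a conceptual sketch for a single gene category. It is not the toolbox's implementation: the numbers are invented, the variable names are hypothetical, and the right-tailed comparison is an assumption made for this example.

```matlab
% Hypothetical null category scores for one gene category,
% one score per null phenotype in the ensemble
nullScores = [0.02 0.10 -0.05 0.07 0.01 0.12 -0.03 0.04 0.09 0.00];

% Hypothetical category score obtained for the phenotype of interest
categoryScore = 0.095;

% Empirical p-value: fraction of null scores at least as large as the
% observed score (right-tailed test, assumed for this sketch)
pValue = mean(nullScores >= categoryScore);
fprintf('Empirical p-value: %.2f\n',pValue);
```

With only a handful of nulls the p-value is coarse (here it can only take multiples of 0.1), which is one reason a large null ensemble is used in practice.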