Compute enrichment of gene sets in our predictions #6

tmmurali · 2020-04-02T01:02:17Z

We have a ranked list of predictions coming from network propagation or from host-virus PPI prediction. This issue is relevant mainly for human proteins. We also have a set of gene sets, e.g., from https://amp.pharm.mssm.edu/covid19/. We want to assess to what extent each gene set is enriched in our list of predictions.

There are two approaches I suggest:

1. For each set of top-k predictions, use Fisher's exact test (hypergeometric test) to compute the p-value of the intersection of the top-k predictions with a gene set. Plot the absolute value of the logarithm of the p-value as we increase k. Alternatively, plot the size of the overlap and colour each point based on whether the overlap is statistically significant. There is no need to try all values of k; increments of 10, 50, or 100 may be sufficient. The increment can be a parameter to the code.
2. Use an enrichment method such as GSEA that can consider the entire ranked list of predictions.
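
A minimal sketch of the first approach, using only the Python standard library. The function names (`hypergeom_pvalue`, `topk_enrichment`) and the `step` parameter are hypothetical and not taken from the project's code. The sketch assumes the ranked list contains each gene once and that the ranked list itself is the background universe.

```python
from math import comb


def hypergeom_pvalue(overlap, top_k, set_size, universe_size):
    """One-sided p-value P(X >= overlap) for drawing top_k genes from a
    universe of universe_size genes, set_size of which are in the gene set."""
    max_overlap = min(top_k, set_size)
    total = comb(universe_size, top_k)
    tail = sum(
        comb(set_size, i) * comb(universe_size - set_size, top_k - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


def topk_enrichment(ranked, gene_set, step=50):
    """Return a list of (k, overlap, p_value) for k = step, 2*step, ...
    and for the full list. Assumes `ranked` has no duplicate genes."""
    universe = set(ranked)
    # Only gene-set members that appear in the universe can ever overlap.
    members = set(gene_set) & universe
    results = []
    overlap = 0
    for k, gene in enumerate(ranked, start=1):
        overlap += gene in members
        if k % step == 0 or k == len(ranked):
            p = hypergeom_pvalue(overlap, k, len(members), len(universe))
            results.append((k, overlap, p))
    return results
```

The returned tuples can be plotted directly, e.g. k against the negative logarithm of the p-value, or k against the overlap size with points coloured by significance.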

We must correct for testing multiple hypotheses.
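
The thread does not say which correction to use. As one common option, here is a sketch of the Benjamini-Hochberg procedure, which controls the false discovery rate. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A gene set would then be called significant when its adjusted p-value falls below the chosen threshold (for example 0.05).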

tmmurali · 2020-04-02T01:10:42Z

Let us catalogue gene sets here. We need to download each one (see #5) and add it to the enrichment analysis.

COVID-19 Crowd Generated Gene and Drug Set Library. Ignore all the gene sets with Krogan in the name, since we are already using them for prediction.
Gene expression datasets
Protein expression datasets

jlaw9 · 2020-04-08T04:33:35Z

Currently the downloadable gmt file available for the COVID-19 Crowd Generated Gene sets does not have the main descriptor text of the gene set in the file, making most gene sets unidentifiable.

I made an issue on their repo (#82) asking them to fix it.
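
For context, a GMT file has one gene set per line with tab-separated fields: the set name, a description, and then the genes. Here is a minimal parsing sketch, where the function name `parse_gmt` is hypothetical. It makes it easy to check whether the description field is filled in.

```python
def parse_gmt(lines):
    """Parse GMT-formatted lines into {set_name: (description, genes)}.

    Each line is: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, description, genes = fields[0], fields[1], fields[2:]
        # Drop empty entries left by trailing tabs.
        gene_sets[name] = (description, {g for g in genes if g})
    return gene_sets
```

With a file on disk this could be called as `parse_gmt(open(path))`, since iterating over a file object yields its lines.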

jlaw9 · 2020-04-08T04:55:18Z

Just found out that besides running GSEA, GSEApy also has an enrichr module, which lets you run Enrichr's analysis using its API. This could be very useful, as Enrichr has tons of gene sets!

jlaw9 · 2020-04-09T05:40:45Z

They fixed the gmt file for the COVID-19 Crowd Generated Gene sets!


tmmurali · 2020-05-22T17:50:58Z

@jlaw9 @n-tasnina what is the status of running our enrichment pipeline on the COVID-19 gene sets?

jlaw9 · 2020-05-22T18:27:39Z

We have the COVID-19 gene sets in GMT format; we just need to update our scripts to test them for enrichment. Here's the clusterProfiler documentation on using our own gene sets: https://guangchuangyu.github.io/2015/05/use-clusterprofiler-as-an-universal-enrichment-analysis-tool/
@n-tasnina can you add a function for that in our enrichment.py?

n-tasnina · 2020-05-22T18:56:25Z

Yeah, sure. I will add a function in enrichment.py to do this.

n-tasnina · 2021-06-24T21:07:17Z

We can close this issue as well. Here is the link to the Python script where we did the enrichment analysis.
https://github.com/Murali-group/SARS-CoV-2-network-analysis/blob/enrichment/src/Enrichment/fss_enrichment.py

tmmurali assigned n-tasnina Apr 2, 2020

jlaw9 added a commit that referenced this issue May 7, 2020

Added scripts to test for gene set enrichment of the top predictions of each algorithm, and of any given gene set. Currently only tests GO BP, MF, and CC. Issue #6

d52a138

n-tasnina pushed a commit that referenced this issue May 16, 2020

Added scripts to test for gene set enrichment of the top predictions of each algorithm, and of any given gene set. Currently only tests GO BP, MF, and CC. Issue #6

e2295b3

n-tasnina closed this as completed Jun 24, 2021
