[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JinmiaoChenLab/DISCOtoolkit_py/blob/main/docs/scEnrichment.ipynb)
# scEnrichment

This tutorial is a quick guide to applying the scEnrichment feature of discotoolkit to a list of differentially expressed genes (DEGs). It has two steps:

1. Load the DEGs from Example 1, available on the [DISCO](http://www.immunesinglecell.org/) website.
2. Convert the data into a pandas DataFrame and pass it to the `dt.CELLiD_enrichment` function, which returns the enrichment analysis results.
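
Step 1 can be sketched as follows, assuming the DEG table has been saved as comma-separated text with `gene` and `fc` columns. The exact download format of the DISCO website is an assumption here; the three rows are copied from the example used later in this tutorial, and `io.StringIO` stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV export of a DEG table; the column names "gene" and "fc"
# mirror the example DataFrame used in this tutorial.
csv_text = """gene,fc
PRSS1,5.236605052
CTRB2,5.179753462
CELA3A,4.724315075
"""

# pd.read_csv accepts any file-like object, so StringIO replaces a file path.
deg_from_csv = pd.read_csv(io.StringIO(csv_text))

print(deg_from_csv.shape)
print(deg_from_csv.dtypes)
```

With a real file, the `io.StringIO(csv_text)` argument would simply be replaced by the path to that file.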

In [7]:
# for google colab
# pip install discotoolkit
# import package
import discotoolkit as dt
import scanpy as sc
import pandas as pd
import numpy as np
import io

%load_ext autoreload
%autoreload 2



The `dt.CELLiD_enrichment` function accepts either a plain gene list or a gene list with log fold changes.
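
The two input forms might look like this. It is a minimal sketch: the column names `gene` and `fc` follow the example in this tutorial, and `check_deg_input` is a hypothetical helper, not part of discotoolkit, that only illustrates the expected shape of each form.

```python
import pandas as pd

genes = ["PRSS1", "CTRB2", "CELA3A"]

# Form 1: gene names only.
genes_only = pd.DataFrame({"gene": genes})

# Form 2: gene names plus log fold change.
genes_with_fc = pd.DataFrame(
    {"gene": genes, "fc": [5.236605052, 5.179753462, 4.724315075]}
)


def check_deg_input(df: pd.DataFrame) -> str:
    """Hypothetical helper: report which of the two input forms `df` matches."""
    if list(df.columns) == ["gene"]:
        return "gene list"
    if list(df.columns) == ["gene", "fc"] and pd.api.types.is_numeric_dtype(df["fc"]):
        return "gene list with log fold change"
    raise ValueError("expected columns ['gene'] or ['gene', 'fc']")


print(check_deg_input(genes_only))
print(check_deg_input(genes_with_fc))
```

A check like this is optional; it only catches a misnamed or non-numeric column before the longer enrichment run starts.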

Example of the DEGs:

In [8]:
# Test DEGs: acinar cell DEGs (Peng, Junya et al.)
# The reference is listed on the DISCO website

test_genes = {"gene":["PRSS1", "CTRB2", "CELA3A", "CTRB1", "REG1B", "CLPS", "CPB1", "CPA1", "PLA2G1B", "REG3A", "CTRC"],
              "fc":[5.236605052, 5.179753462, 4.724315075, 4.702706704, 4.65145949, 4.513887613, 4.351968886, 4.311988687, 4.272185339, 4.253882194, 4.208933992]
              }

deg_df = pd.DataFrame(test_genes)

deg_df.head()

|   | gene | fc |
|---|------|----|
| 0 | PRSS1 | 5.236605 |
| 1 | CTRB2 | 5.179753 |
| 2 | CELA3A | 4.724315 |
| 3 | CTRB1 | 4.702707 |
| 4 | REG1B | 4.651459 |


Now that we have our data ready, we can proceed to run the function.

In [9]:
# get enrichment analysis result using dt.CELLiD_enrichment
results = dt.CELLiD_enrichment(deg_df)

INFO:root:Comparing the ranked gene list to reference gene sets...
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done 1364 tasks      | elapsed:  1.0min
[Parallel(n_jobs=10)]: Done 3942 tasks      | elapsed:  3.4min
[Parallel(n_jobs=10)]: Done 3999 out of 3999 | elapsed:  4.4min finished


In [10]:
# get the top 10 results
results.head(10)

|   | pval | or | name | gene | background | overlap | geneset |
|---|------|----|------|------|------------|---------|---------|
| 109 | 0.0 | 30492.704 | Acinar cell vs All others in PDAC | CTRC,CELA3A,CPB1,CLPS,CPA1,PRSS1,PLA2G1B,CTRB2... | 9711 | 11 | 235 |
| 81 | 0.0 | 27138.739 | Acinar cell vs CCL2+ pancreatic ductal cell in... | PRSS1,CELA3A,CTRC,CTRB1,CPB1,CPA1,CTRB2,CLPS,P... | 9711 | 11 | 234 |
| 52 | 0.0 | 23270.64 | Acinar cell vs All others in pancreas | CTRC,CELA3A,REG1B,REG3A,CPB1,CLPS,CPA1,PRSS1,P... | 5783 | 11 | 91 |
| 80 | 0.0 | 21845.335 | Acinar cell vs Pancreatic ductal cell in PDAC | PRSS1,CTRB1,CTRC,CELA3A,CTRB2,CPB1,CPA1,CLPS,P... | 9711 | 11 | 229 |
| 82 | 0.0 | 20201.577 | Acinar cell vs Ductal/EC doublet like cell in ... | CTRB2,PRSS1,CPB1,CPA1,CTRC,CTRB1,CELA3A,PLA2G1... | 9711 | 11 | 164 |
| 169 | 0.0 | 19699.378 | Acinar cell vs HSP+ pancreatic ductal cell in ... | CTRC,CTRB2,CTRB1,CELA3A,CPB1,PRSS1,CPA1,CLPS,P... | 9711 | 11 | 224 |
| 39 | 0.0 | 13920.853 | Acinar cell vs TUBA1A+ ductal cell in pancreas | CTRB2,CTRB1,CELA3A,PRSS1,PLA2G1B,CTRC,CPB1,CPA... | 5783 | 11 | 129 |
| 18 | 0.0 | 1953.086 | Paneth cell vs Goblet cell in intestine | REG3A,REG1B | 11184 | 2 | 50 |
| 19 | 0.0 | 1139.309 | Paneth cell vs Intestinal stem cell in intestine | REG3A,REG1B | 11184 | 2 | 62 |
| 23 | 0.0 | 803.051 | Paneth cell vs All others in intestine | REG3A,REG1B | 11184 | 2 | 92 |
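
Since the result supports `.head()` and displays like a pandas DataFrame, it can presumably be filtered with ordinary pandas operations. The sketch below rebuilds three rows of the output above by hand, so that it runs without contacting DISCO, and keeps only the gene sets that overlap all 11 input genes. The threshold and the names `results_demo` and `full_overlap` are illustrative choices, and reading the `or` column as an odds ratio is an assumption.

```python
import pandas as pd

# Three rows copied from the output above (some columns omitted for brevity).
results_demo = pd.DataFrame(
    {
        "pval": [0.0, 0.0, 0.0],
        "or": [30492.704, 23270.64, 1953.086],
        "name": [
            "Acinar cell vs All others in PDAC",
            "Acinar cell vs All others in pancreas",
            "Paneth cell vs Goblet cell in intestine",
        ],
        "overlap": [11, 11, 2],
        "geneset": [235, 91, 50],
    },
    index=[109, 52, 18],
)

n_input_genes = 11  # number of genes in deg_df

# Keep gene sets that contain every input gene, ranked by the "or" column
# (assumed here to be an odds ratio).
full_overlap = (
    results_demo[results_demo["overlap"] == n_input_genes]
    .sort_values("or", ascending=False)
)

print(full_overlap["name"].tolist())
```

In this example the two acinar cell gene sets remain, while the Paneth cell gene set, which shares only `REG3A` and `REG1B` with the input, is dropped.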
