# scEnrichment

This tutorial is a quick guide to applying the scEnrichment feature of discotoolkit to a list of DEGs (differentially expressed genes). Follow these steps:

1. First, we load the DEGs from Example 1, available on the [DISCO](http://www.immunesinglecell.org/) website.
2. Finally, we convert the loaded data into a pandas DataFrame and pass it to the `dt.CELLiD_enrichment` function, which returns the enrichment analysis results.

In [1]:
# import package
import discotoolkit as dt
import scanpy as sc
import pandas as pd
import numpy as np
import io

%load_ext autoreload
%autoreload 2

The `dt.CELLiD_enrichment` function accepts either a plain gene list or a gene list with log fold change values.
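As a minimal sketch of the two input shapes described above, the following builds both with pandas. The column names `gene` and `fc` follow the example in this tutorial; whether a gene-only DataFrame must use exactly this column name is an assumption, so check the discotoolkit documentation before relying on it.

```python
import pandas as pd

# Shape 1: gene symbols only (the column name "gene" is an assumption)
genes_only = pd.DataFrame({"gene": ["PRSS1", "CTRB2", "CELA3A"]})

# Shape 2: gene symbols with log fold change, as used in this tutorial
genes_with_fc = pd.DataFrame(
    {
        "gene": ["PRSS1", "CTRB2", "CELA3A"],
        "fc": [5.236605052, 5.179753462, 4.838744463],
    }
)

print(genes_only.shape)
print(genes_with_fc.shape)
```

Either DataFrame would then be passed to `dt.CELLiD_enrichment` in place of `deg_df`.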

Example of the DEGs:

In [2]:
# Test input: DEGs of acinar cells (Peng, Junya et al.)
# The reference is listed on the DISCO website

test_genes = """PRSS1	5.236605052; CTRB2	5.179753462; CELA3A	4.838744463; CTRB1	4.724315075; REG1B	4.702706704; CLPS	4.65145949; CPB1    4.513887613; CPA1	4.351968886; PLA2G1B	4.311988687; REG3A	4.272185339; REG1A	4.253882194; CTRC	4.208933992"""

# Split the string into gene-fc pairs
pairs = test_genes.split(";")

# Split each pair into gene and fc values
gene_fc_list = [pair.split() for pair in pairs]

# Create the DataFrame
deg_df = pd.DataFrame(gene_fc_list, columns=["gene", "fc"])

# Convert fc column to numeric type
deg_df["fc"] = pd.to_numeric(deg_df["fc"])
deg_df.head()

| | gene | fc |
|---|---|---|
| 0 | PRSS1 | 5.236605 |
| 1 | CTRB2 | 5.179753 |
| 2 | CELA3A | 4.838744 |
| 3 | CTRB1 | 4.724315 |
| 4 | REG1B | 4.702707 |
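The splitting steps above assume that every `;`-separated entry holds exactly one gene and one value. A slightly more defensive version, sketched here in plain Python, skips empty entries (for example after a trailing `;`) and fails loudly on malformed ones. The helper name `parse_gene_fc` is hypothetical and not part of discotoolkit.

```python
def parse_gene_fc(text: str) -> list[tuple[str, float]]:
    """Parse 'GENE fc; GENE fc; ...' into (gene, fold-change) tuples."""
    rows = []
    for entry in text.split(";"):
        fields = entry.split()  # splits on tabs or runs of spaces
        if not fields:
            continue  # ignore empty entries, e.g. after a trailing ';'
        if len(fields) != 2:
            raise ValueError(f"expected 'GENE value', got {entry!r}")
        gene, value = fields
        rows.append((gene, float(value)))
    return rows


rows = parse_gene_fc("PRSS1\t5.236605052; CTRB2 5.179753462; ")
print(rows)
```

The resulting list can be passed to `pd.DataFrame(rows, columns=["gene", "fc"])`, which yields a numeric `fc` column without a separate `pd.to_numeric` step.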


Now that we have our data ready, we can proceed to run the function.

In [3]:
# get enrichment analysis result using dt.CELLiD_enrichment
results = dt.CELLiD_enrichment(deg_df)

INFO:root:Comparing the ranked gene list to reference gene sets...
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done 1364 tasks      | elapsed:  1.1min
[Parallel(n_jobs=10)]: Done 3942 tasks      | elapsed:  3.4min
[Parallel(n_jobs=10)]: Done 3999 out of 3999 | elapsed:  4.4min finished


In [4]:
# get the top 10 results
results.head(10)

| | pval | or | name | gene | background | overlap | geneset |
|---|---|---|---|---|---|---|---|
| 125 | 0.0 | 33575.635 | Acinar cell vs All others in PDAC | CTRC,CELA3A,REG1A,CPB1,CLPS,CPA1,PRSS1,PLA2G1B... | 9711 | 12 | 235 |
| 92 | 0.0 | 29803.353 | Acinar cell vs CCL2+ pancreatic ductal cell in... | PRSS1,CELA3A,CTRC,CTRB1,CPB1,CPA1,CTRB2,CLPS,R... | 9711 | 12 | 234 |
| 61 | 0.0 | 27745.849 | Acinar cell vs All others in pancreas | CTRC,CELA3A,REG1B,REG1A,REG3A,CPB1,CLPS,CPA1,P... | 5783 | 12 | 91 |
| 93 | 0.0 | 24141.299 | Acinar cell vs Ductal/EC doublet like cell in ... | CTRB2,PRSS1,REG1A,CPB1,CPA1,CTRC,CTRB1,CELA3A,... | 9711 | 12 | 164 |
| 91 | 0.0 | 23880.166 | Acinar cell vs Pancreatic ductal cell in PDAC | PRSS1,CTRB1,CTRC,CELA3A,CTRB2,CPB1,CPA1,CLPS,R... | 9711 | 12 | 229 |
| 197 | 0.0 | 22596.12 | Acinar cell vs HSP+ pancreatic ductal cell in ... | CTRC,CTRB2,CTRB1,CELA3A,CPB1,PRSS1,REG1A,CPA1,... | 9711 | 12 | 224 |
| 46 | 0.0 | 16146.113 | Acinar cell vs TUBA1A+ ductal cell in pancreas | CTRB2,REG1A,CTRB1,CELA3A,PRSS1,PLA2G1B,CTRC,CP... | 5783 | 12 | 129 |
| 22 | 0.0 | 1863.359 | Paneth cell vs Goblet cell in intestine | REG3A,REG1A,REG1B | 11184 | 3 | 50 |
| 146 | 0.0 | 1039.143 | CXCL2/3+ surface mucous cell vs GKN1/2+ surfac... | REG1A,REG3A | 5884 | 2 | 342 |
| 29 | 0.0 | 740.408 | Paneth cell vs All others in intestine | REG1A,REG3A,REG1B | 11184 | 3 | 92 |
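
To make the result table easier to read, one option is to add the fraction of input genes that overlap each reference gene set and keep only rows above a chosen cutoff. The sketch below uses a few rows copied from the output above rather than a live call to `dt.CELLiD_enrichment`; the 12-gene input size comes from the example DEG list, and the 0.5 cutoff is an arbitrary illustration, not a discotoolkit default.

```python
import pandas as pd

# A few rows copied from the results shown above
results_subset = pd.DataFrame(
    {
        "name": [
            "Acinar cell vs All others in PDAC",
            "Acinar cell vs All others in pancreas",
            "Paneth cell vs Goblet cell in intestine",
            "Paneth cell vs All others in intestine",
        ],
        "or": [33575.635, 27745.849, 1863.359, 740.408],
        "overlap": [12, 12, 3, 3],
    }
)

n_input_genes = 12  # size of the example DEG list

# Fraction of the input genes found in each reference gene set
results_subset["overlap_frac"] = results_subset["overlap"] / n_input_genes

# Keep sets that cover at least half of the input genes (arbitrary cutoff)
strong_hits = results_subset[results_subset["overlap_frac"] >= 0.5]
print(strong_hits[["name", "overlap_frac"]])
```

With these rows, only the two acinar cell comparisons pass the cutoff, which matches the origin of the example DEGs.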
