# scEnrichment

This tutorial is a quick guide to applying the scEnrichment feature of discotoolkit to differentially expressed genes (DEGs). It follows two steps:

1. First, we load the DEGs from Example 1, available on the [DISCO](http://www.immunesinglecell.org/) website.
2. Next, we convert the retrieved data into a pandas DataFrame, which serves as the input to the `dt.CELLiD_enrichment` function. Running this function returns the enrichment analysis results.
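If you have saved the DEGs to a local file instead of pasting them into the notebook, a small helper like the one below could load them into the same `gene`/`value` layout used in this tutorial. This is a sketch under assumptions: the function `load_deg_table` and the file name `acinar_degs.tsv` are hypothetical, and the file is assumed to be header-less with one gene and one value per line, like the example pasted later in this tutorial.

```python
import os
import tempfile

import pandas as pd


def load_deg_table(path):
    """Hypothetical helper: read a two-column DEG file (gene, log fold change).

    Assumes no header row and whitespace-separated columns (tabs or spaces).
    """
    df = pd.read_csv(path, sep=r"\s+", header=None, names=["gene", "value"])
    # Force the fold changes to be numeric; unparsable entries become NaN.
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    return df


# Demo: write a tiny file (two rows copied from the example DEG list) and load it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "acinar_degs.tsv")  # hypothetical file name
    with open(path, "w") as fh:
        fh.write("PRSS1\t5.236605052\nCTRB2\t5.179753462\n")
    deg_df = load_deg_table(path)

print(deg_df)
```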

```python
# Import packages
import discotoolkit as dt
import scanpy as sc
import pandas as pd
import numpy as np
import io

# IPython magics: automatically reload edited modules (notebook only)
%load_ext autoreload
%autoreload 2
```

The `dt.CELLiD_enrichment` function accepts either a plain gene list or a gene list with log fold changes as input.
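The two input forms are related: a table of genes with log fold changes can be reduced to a plain gene list. The sketch below uses only pandas to rank genes by fold change and keep the names. It is an illustration under an assumption: this tutorial does not show whether `dt.CELLiD_enrichment` expects the gene-only input as a Python list or as a one-column DataFrame, so check the discotoolkit documentation before passing `gene_list` to it.

```python
import pandas as pd

# A small gene + log-fold-change table in this tutorial's layout
# (three values copied from the example DEG list).
deg_df = pd.DataFrame(
    {"gene": ["CTRL", "PRSS1", "CEL"], "value": [1.851992458, 5.236605052, 3.188155718]}
)

# Rank genes by log fold change, largest first.
ranked = deg_df.sort_values("value", ascending=False).reset_index(drop=True)

# Gene-only form: just the names, kept in ranked order.
gene_list = ranked["gene"].tolist()
print(gene_list)  # ['PRSS1', 'CEL', 'CTRL']
```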

Example of the DEGs:

```python
# Test DEGs: acinar-cell DEGs from Peng, Junya et al.
# (the reference is listed on the DISCO website)
# Each line holds a gene symbol and its log fold change.

test_genes = """PRSS1	5.236605052
CTRB2	5.179753462
CELA3A	4.838744463
CTRB1	4.724315075
REG1B	4.702706704
CLPS	4.65145949
CPB1	4.513887613
CPA1	4.351968886
PLA2G1B	4.311988687
REG3A	4.272185339
REG1A	4.253882194
CTRC	4.208933992
PNLIP	4.06135454
CELA3B	4.00238189
PRSS3	3.846159863
SYCN	3.779047838
CELA2A	3.540312596
CPA2	3.500611205
AMY2A	3.44786795
REG3G	3.195345533
CEL	3.188155718
GP2	3.163436945
SPINK1	2.676256023
PNLIPRP1	2.521229193
KLK1	2.393377839
CELA2B	2.178530503
MT1G	2.032090542
GATM	2.010975328
CTRL	1.851992458
AMY2B	1.633008632
CUZD1	1.630544859
LGALS2	1.169203136
GSTA1	1.088434738
PDIA2	1.013342677
ERP27	0.987826475
SERPINI2	0.905250021
GSTA2	0.886794909
TMEM97	0.831745861
AQP8	0.81411901
ALB	0.804637248
MT1H	0.755453305
FGL1	0.532791166
EPB41L4B	0.396768756
SLC30A2	0.371914878
KIAA1324	0.345214203
CBS	0.341016631
GNMT	0.3173318
FSCN2	0.290973002
TMEM52	0.256893286
AC078941.1	0.250852721
RP11-320N7.2	0.250044449
ANPEP	0.780142301
SLC39A5	0.292027399
SLC43A1	0.305606072
RP11-986E7.7	0.520641843
GAMT	0.670413262
RBP1	1.121586543
SERPINA4	0.395926247
IMPA2	0.517676734
FAM3B	0.551741858
CTC-479C5.12	0.576434405
MT1F	0.909068755
NR5A2	0.322338819
HOMER2	0.338742532
STXBP6	0.384391781
C2CD4B	0.331035247
TCEA3	0.457396612
BNIP3	0.306867407
XBP1	0.641992282
CITED4	0.477171588
ERO1LB	0.295500017
PDCD4	0.688726035
SCTR	0.371977531
CXCL17	0.271652803
SHC2	0.312286381
TPST2	0.32511141
BCAT1	0.282068598
SFRP5	0.279222723
COMTD1	0.507151197
AZGP1	0.293662407
DNAJC12	0.299908523
OLFM4	1.495365703
PABPC4	0.560516603
RARRES2	0.465440218
IGFBP2	0.48294206
RNASE1	0.702768855
MT1X	0.935334158
UBD	0.62226754
SLC39A14	0.374960606
CLDN10	0.334070011
FXYD2	0.369492216
CYB5A	0.576170379
GRB10	0.255664073
C6orf222	0.258015249
P4HB	0.48785023
MT1E	0.606397194
PGM1	0.449130613
EIF4EBP1	0.43589467
CCDC64B	0.278301065
C3	0.499001765
CFTR	0.423427223
NUPR1	0.409424066
RAMP1	0.258903283
PBX1	0.363952564
TC2N	0.270830998
DEFB1	0.269673492
RHOBTB3	0.288911416
IL32	0.891390415
SORBS2	0.407243496
SEL1L	0.324710347
ZNF503	0.255591773
SH3YL1	0.355056154
MT-ND3	0.57463692
SOD2	0.71505403
GDF15	0.324365014
SLC4A4	0.384404047
PPP1R14B	0.325403613
RPL36	0.262022338
RPL12	0.263090644
RPS12	0.259250287
YBX3	0.299771927
EEF2	0.282220265
FSTL3	0.306826657
GLTSCR2	0.271820906
SERPINB1	0.310268005
MT-ND1	0.339195423
"""

# Parse into a two-column DataFrame; sep=r"\s+" splits on any whitespace
# (tabs or spaces), so a stray space between gene and value does not break a row.
deg_df = pd.read_csv(io.StringIO(test_genes), sep=r"\s+", header=None, names=["gene", "value"])
deg_df.head()
```


|   | gene | value |
|---|------|-------|
| 0 | PRSS1 | 5.236605 |
| 1 | CTRB2 | 5.179753 |
| 2 | CELA3A | 4.838744 |
| 3 | CTRB1 | 4.724315 |
| 4 | REG1B | 4.702707 |
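Before running the enrichment, it can be worth checking the parsed table. A stray space in place of a tab (easy to introduce when copying tables from a web page) makes a strict `sep="\t"` parse fold the value into the gene name and leave a missing value. The helper `check_degs` below is a hypothetical sketch that flags missing values and duplicate genes; it does not call discotoolkit.

```python
import io

import pandas as pd

# Two rows in the tutorial's format; the second uses a space instead of a tab.
raw = "PRSS1\t5.236605052\nUBD 0.62226754\n"

# Strict tab parsing cannot split the second row, so its value becomes NaN.
strict = pd.read_csv(io.StringIO(raw), sep="\t", header=None, names=["gene", "value"])
# Whitespace parsing handles tabs and spaces alike.
robust = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, names=["gene", "value"])


def check_degs(df):
    """Hypothetical helper: return a list of problems found in a gene/value table."""
    problems = []
    if df["value"].isna().any():
        problems.append("missing or unparsed fold changes")
    if df["gene"].duplicated().any():
        problems.append("duplicate gene names")
    return problems


print(check_degs(strict))  # ['missing or unparsed fold changes']
print(check_degs(robust))  # []
```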


With the data ready, we can run the enrichment function. In this example, the run took about 4.4 minutes using 10 parallel workers.

```python
# Get the enrichment analysis results using dt.CELLiD_enrichment
results = dt.CELLiD_enrichment(deg_df)
```

```text
INFO:root:Comparing the ranked gene list to reference gene sets...
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done 1364 tasks      | elapsed:  1.1min
[Parallel(n_jobs=10)]: Done 3942 tasks      | elapsed:  3.4min
[Parallel(n_jobs=10)]: Done 3999 out of 3999 | elapsed:  4.4min finished
```
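The log shows the input list being compared against reference gene sets, 3999 comparisons in this run. The tutorial does not document the statistic used; output columns such as `pval`, `or`, `overlap`, and `geneset` are typical of an over-representation test, but that mapping is an assumption. As a generic illustration only, not discotoolkit's implementation, a one-sided hypergeometric test for a single gene set can be written with the standard library (`math.comb` needs Python 3.8+):

```python
from math import comb


def hypergeom_enrichment(background, geneset, query, overlap):
    """Generic over-representation test (hypergeometric upper tail).

    background: number of genes in the universe
    geneset:    number of genes in the reference gene set
    query:      number of genes in the input list
    overlap:    number of input genes that fall in the gene set
    Returns (p_value, odds_ratio).
    """
    total = comb(background, query)
    # P(X >= overlap): sum the hypergeometric probabilities of all larger overlaps.
    tail = sum(
        comb(geneset, i) * comb(background - geneset, query - i)
        for i in range(overlap, min(query, geneset) + 1)
    )
    p_value = tail / total

    # 2x2 table: (in query, in set), (in query, not in set),
    #            (not in query, in set), (not in query, not in set)
    a = overlap
    b = query - overlap
    c = geneset - overlap
    d = background - geneset - query + overlap
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return p_value, odds_ratio


# Toy example: all 3 query genes fall in a 3-gene set within a 10-gene universe.
p, odds = hypergeom_enrichment(background=10, geneset=3, query=3, overlap=3)
print(p, odds)  # p = 1/120, odds ratio = inf
```

Repeating a test like this for every reference gene set is independent work per set, which is the kind of job that parallelizes well across workers.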


```python
# Show the top 10 results
results.head(10)
```

|   | pval | or | name | gene | background | overlap | geneset |
|---|------|----|------|------|------------|---------|---------|
| 547 | 0.0 | 13975.3 | Acinar cell vs All others in PDAC | CTRC,CELA2A,CELA2B,CELA3B,CELA3A,AMY2B,AMY2A,N... | 9711 | 89 | 235 |
| 445 | 0.0 | 12622.148 | Acinar cell vs All others in pancreas | CTRC,CELA2A,CELA2B,CELA3B,CELA3A,AMY2B,AMY2A,R... | 5783 | 66 | 91 |
| 494 | 0.0 | 5720.149 | Acinar cell vs CCL2+ pancreatic ductal cell in... | PRSS1,CELA3A,CTRC,CTRB1,CPB1,CELA3B,CPA1,CTRB2... | 9711 | 71 | 234 |
| 493 | 0.0 | 5251.534 | Acinar cell vs Pancreatic ductal cell in PDAC | PRSS1,CTRB1,SPINK1,CTRC,CELA3A,CPA2,CTRB2,CPB1... | 9711 | 76 | 229 |
| 495 | 0.0 | 5057.762 | Acinar cell vs Ductal/EC doublet like cell in ... | CTRB2,SPINK1,PRSS1,REG1A,CPB1,GP2,CPA1,CTRC,CT... | 9711 | 71 | 164 |
| 428 | 0.0 | 4470.656 | Acinar cell vs TUBA1A+ ductal cell in pancreas | SPINK1,PNLIPRP1,CTRB2,CELA3B,CPA2,REG1A,CTRB1,... | 5783 | 68 | 129 |
| 917 | 0.0 | 4293.675 | Acinar cell vs HSP+ pancreatic ductal cell in ... | CTRC,CTRB2,CTRB1,SPINK1,GP2,CELA3A,CPB1,CPA2,P... | 9711 | 72 | 224 |
| 242 | 0.0 | 286.575 | Paneth cell vs Goblet cell in intestine | REG3A,REG1A,AZGP1,IGFBP2,REG1B,OLFM4,PDIA2 | 11184 | 7 | 50 |
| 243 | 0.0 | 142.972 | Paneth cell vs Intestinal stem cell in intestine | REG3A,DNAJC12,AZGP1,PDIA2,REG1B | 11184 | 5 | 62 |
| 264 | 0.0 | 138.46 | Paneth cell vs All others in intestine | REG1A,REG3A,AZGP1,SPINK1,OLFM4,KLK1,KIAA1324,R... | 11184 | 18 | 92 |
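The `name` column in this output appears to follow a "<cell type> vs <comparison group> in <tissue>" pattern. That pattern is inferred from the rows above, not from discotoolkit documentation. Assuming it holds, a short pandas sketch can split the names into columns and keep the strongest hit (highest `or`) per tissue. The rows below are copied from the table above, limited to columns whose values are shown in full.

```python
import pandas as pd

# Rows copied from the top-10 output above ("or" and full "name" values only).
results = pd.DataFrame(
    {
        "or": [13975.3, 12622.148, 286.575, 142.972, 138.46],
        "name": [
            "Acinar cell vs All others in PDAC",
            "Acinar cell vs All others in pancreas",
            "Paneth cell vs Goblet cell in intestine",
            "Paneth cell vs Intestinal stem cell in intestine",
            "Paneth cell vs All others in intestine",
        ],
    }
)

# Assumed pattern: "<cell type> vs <comparison group> in <tissue>".
# The lazy first group stops at the first " vs "; the greedy second group
# makes the split happen at the last " in ".
pattern = r"^(?P<cell_type>.+?) vs (?P<versus>.+) in (?P<tissue>.+)$"
results = results.join(results["name"].str.extract(pattern))

# Keep the row with the highest odds ratio within each tissue.
best = results.loc[
    results.groupby("tissue")["or"].idxmax(),
    ["tissue", "cell_type", "versus", "or"],
]
print(best)
```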
