## GSEA
Prepare single-cell data for GSEA.  
**Prerequisites**  
Perform batch correction on Shiraishi *et al* data. See batch-correction-scgen.ipynb.  

The dataset contains 3808 gnp, 6911 pnc, and 3279 tumor cells, totalling 13998 cells across 14044 genes.  
GSEA can in principle handle more than two phenotypes, but I ran into Java heap space errors, so it was simpler to split the data into pairwise datasets.
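
As a rough illustration of the pairwise split, the sketch below enumerates every unordered pair of phenotypes and keeps only the cells belonging to that pair. The toy labels and the `pairwise_subsets` helper are assumptions for illustration, not part of this notebook or of `gsea_converters`.

```python
from itertools import combinations

import pandas as pd

# Toy per-cell phenotype labels standing in for data.obs['sample'] (made-up values).
labels = pd.Series(
    ["gnp", "pnc", "tumor", "pnc", "gnp", "tumor"],
    index=[f"cell{i}" for i in range(6)],
    name="sample",
)


def pairwise_subsets(labels):
    """Yield ((a, b), subset) for every unordered pair of phenotypes (hypothetical helper)."""
    phenotypes = sorted(labels.unique())
    for a, b in combinations(phenotypes, 2):
        # Keep only the cells labelled with one of the two phenotypes.
        yield (a, b), labels[labels.isin([a, b])]


pairs = dict(pairwise_subsets(labels))
for pair, subset in pairs.items():
    print(pair, len(subset))
```

Three phenotypes give three pairs, which matches the three sections that follow.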

In [None]:
import anndata as ad
import sys
sys.path.append('/Users/ochapman/projects/oscutils/')
import gsea_converters

## pnc vs tumor

In [None]:
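path = '../single-cell/out/shiraishi_merge.h5ad'
data = ad.read_h5ad(path)
# Keep proliferative and differentiated cells from the pnc and tumor samples only.
cell_types = data.obs['annotation'].isin(['ProliferativeCells', 'DifferentiatedCells'])
samples = data.obs['sample'].isin(['pnc', 'tumor'])
data = data[cell_types & samples].copy()
data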

In [None]:
#data[data.obs['sample'] == 'pnc']
#data[data.obs['sample'] == 'tumor']

In [None]:
gsea_converters.exp2gct(
    df = data.to_df().T,
    outfile = 'data/scgen_batch_tumor_pnc.gct'
)
gsea_converters.labels2cls(
    series = data.obs['sample'],
    outfile = 'data/scgen_batch_tumor_pnc.cls'
)
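
For reference, the two files written above follow GSEA's GCT (expression matrix) and CLS (categorical phenotype labels) text layouts. The sketch below writes both for a toy genes x cells matrix. `write_gct` and `write_cls` are hypothetical stand-ins, not the `gsea_converters` implementation, and the gene and cell names are made up.

```python
import io

import pandas as pd


def write_gct(df, handle):
    """Write a genes x samples DataFrame in GCT v1.2 layout (hypothetical helper)."""
    n_genes, n_samples = df.shape
    handle.write("#1.2\n")                       # format version line
    handle.write(f"{n_genes}\t{n_samples}\n")    # matrix dimensions
    out = df.copy()
    out.insert(0, "Description", "na")           # GCT requires a description column
    out.index.name = "NAME"                      # header label for the gene column
    out.to_csv(handle, sep="\t")


def write_cls(series, handle):
    """Write per-sample class labels in categorical CLS layout (hypothetical helper)."""
    labels = series.astype(str)
    classes = list(labels.unique())              # order of first appearance
    handle.write(f"{len(labels)} {len(classes)} 1\n")
    handle.write("# " + " ".join(classes) + "\n")
    handle.write(" ".join(labels) + "\n")


# Toy data: 2 genes x 3 cells, mirroring data.to_df().T and data.obs['sample'].
expr = pd.DataFrame(
    [[1.0, 2.5, 0.0], [0.5, 0.0, 3.0]],
    index=["GeneA", "GeneB"],
    columns=["c1", "c2", "c3"],
)
pheno = pd.Series(["pnc", "tumor", "pnc"], index=expr.columns, name="sample")

gct_buf, cls_buf = io.StringIO(), io.StringIO()
write_gct(expr, gct_buf)
write_cls(pheno, cls_buf)
gct_text, cls_text = gct_buf.getvalue(), cls_buf.getvalue()
print(gct_text)
print(cls_text)
```

The labels in the CLS file must appear in the same order as the sample columns of the GCT file, which is why both are derived from the same filtered object.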


## gnp vs pnc

In [None]:
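path = '../single-cell/out/shiraishi_merge.h5ad'
data = ad.read_h5ad(path)
# Keep proliferative and differentiated cells from the pnc and gnp samples only.
cell_types = data.obs['annotation'].isin(['ProliferativeCells', 'DifferentiatedCells'])
samples = data.obs['sample'].isin(['pnc', 'gnp'])
data = data[cell_types & samples].copy()
data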

In [None]:
# Cell counts per sample after filtering (a cell only displays its last expression).
data.obs['sample'].value_counts()

In [None]:
gsea_converters.exp2gct(
    df = data.to_df().T,
    outfile = 'data/scgen_batch_pnc_gnp.gct'
)
gsea_converters.labels2cls(
    series = data.obs['sample'],
    outfile = 'data/scgen_batch_pnc_gnp.cls'
)


## gnp vs tumor

In [None]:
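path = '../single-cell/out/shiraishi_merge.h5ad'
data = ad.read_h5ad(path)
# Keep proliferative and differentiated cells from the tumor and gnp samples only.
cell_types = data.obs['annotation'].isin(['ProliferativeCells', 'DifferentiatedCells'])
samples = data.obs['sample'].isin(['tumor', 'gnp'])
data = data[cell_types & samples].copy()
data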

In [None]:
gsea_converters.exp2gct(
    df = data.to_df().T,
    outfile = 'data/scgen_batch_tumor_gnp.gct'
)
gsea_converters.labels2cls(
    series = data.obs['sample'],
    outfile = 'data/scgen_batch_tumor_gnp.cls'
)


## gene sets

In [None]:
genes = ['Cacna1e','Cntnap4','Grin2b','Kcnk1','Neurod1','Samd12','Scrt2','Tex14','Tll1','Zmat4']
comments = ['chd7-kmt2c-targets','Chd7 and Kmt2c targets nominated by differential expression of Ptch1+/- vs DKO mice.']
gsea_converters.iterable2grp(genes,'data/chd7-kmt2c-targets.grp',comment=comments)
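
A GRP file holds a single gene set as plain text, one gene per line, with `#` marking comment lines. The sketch below shows that layout on a short list. `write_grp` is a hypothetical helper and may differ from what `gsea_converters.iterable2grp` actually does.

```python
import io


def write_grp(genes, handle, comments=()):
    """Write one gene per line, preceded by '#' comment lines (hypothetical helper)."""
    for line in comments:
        handle.write(f"# {line}\n")
    for gene in genes:
        handle.write(f"{gene}\n")


buf = io.StringIO()
write_grp(["Neurod1", "Grin2b"], buf, comments=["toy-set"])
grp_text = buf.getvalue()
print(grp_text)
```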