# Differential expression
Which genes are upregulated in tumor vs. pnc hyperplastic cells?  
**Prerequisites**  
Perform batch correction on the Shiraishi *et al.* data first; see `batch-correction-scgen.ipynb`.

## Introduction
Concordance across differential gene expression (DEG) approaches is low, and pseudobulk analyses outperform cell-level analyses ([source](https://www.sc-best-practices.org/conditions/differential_gene_expression.html)).  
However, we have only one scRNA-seq sample per phenotype (gnp, pnc, tumor), so a pseudobulk analysis would have no replicates per group. Therefore, we have to test at the cell level.  
To estimate concordance, we will run several analyses:
- T-test (https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.rank_genes_groups.html)
- diffxpy (https://diffxpy.readthedocs.io/en/latest/tutorials.html)
- GLM on bulk RNA-seq (edgeR, to be implemented in a new notebook).
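
One simple way to quantify concordance between two of these analyses is the Jaccard index of their top-ranked genes. The sketch below is only an illustration: the helper `top_n_jaccard`, the cutoff `n`, and the gene names are hypothetical and not part of this notebook's pipeline.

```python
def top_n_jaccard(ranked_a, ranked_b, n=100):
    """Jaccard index between the top-n entries of two ranked gene lists."""
    top_a, top_b = set(ranked_a[:n]), set(ranked_b[:n])
    union = top_a | top_b
    if not union:
        return 0.0  # two empty lists share nothing
    return len(top_a & top_b) / len(union)


# Toy rankings with made-up gene names, most significant gene first
ttest_ranked = ["GeneA", "GeneB", "GeneC", "GeneD"]
other_ranked = ["GeneB", "GeneA", "GeneE", "GeneC"]

overlap = top_n_jaccard(ttest_ranked, other_ranked, n=3)
print(f"Top-3 Jaccard overlap: {overlap:.2f}")
```

A value near 1 means both methods prioritize largely the same genes; a value near 0 means they mostly disagree. The result depends on the chosen `n`, so it is worth checking more than one cutoff.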


In [None]:
import scanpy as sc
import anndata as ad

In [None]:
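path = 'out/shiraishi_merge.h5ad'
data = ad.read_h5ad(path)
# Keep only proliferative and differentiated cells from the pnc and tumor samples
cell_types = data.obs['annotation'].isin(['ProliferativeCells', 'DifferentiatedCells'])
samples = data.obs['sample'].isin(['pnc', 'tumor'])
data = data[cell_types & samples].copy()
data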

In [None]:
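# Rank genes per sample with a t-test; with only pnc and tumor cells left, the default reference ('rest') is the other sample
# Note: scanpy expects log-normalized expression values in data.X for this test
sc.tl.rank_genes_groups(data, groupby='sample', groups=['pnc', 'tumor'], method='t-test')
sc.pl.rank_genes_groups(data, n_genes=30, sharey=False, save='_tumor_pnc_deg.png')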

In [None]:
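import os
df1 = sc.get.rank_genes_groups_df(data, group='tumor')  # per-gene statistics for the tumor group
os.makedirs('out/deg', exist_ok=True)  # to_csv does not create missing directories
df1.to_csv('out/deg/tumor_pnc_ttest_deg.tsv', sep='\t', index=False)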
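
To answer the opening question from this table, the tumor results can be filtered for significantly upregulated genes. The sketch below runs on a toy table with the column names that `sc.get.rank_genes_groups_df` returns (`names`, `logfoldchanges`, `pvals_adj`). The helper `upregulated`, its thresholds, and the toy values are assumptions for illustration, not choices made in this analysis.

```python
import pandas as pd


def upregulated(deg, max_padj=0.05, min_logfc=1.0):
    """Return genes passing both cutoffs, largest log fold change first.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    keep = (deg["pvals_adj"] < max_padj) & (deg["logfoldchanges"] > min_logfc)
    return deg.loc[keep].sort_values("logfoldchanges", ascending=False)


# Toy stand-in for df1, with made-up gene names and values
toy = pd.DataFrame({
    "names": ["GeneA", "GeneB", "GeneC", "GeneD"],
    "logfoldchanges": [2.5, 0.3, 1.8, -2.0],
    "pvals_adj": [0.001, 0.0001, 0.2, 0.01],
})

up = upregulated(toy)
print(up["names"].tolist())
```

In the notebook, the same helper could be applied to `df1` directly, for example `upregulated(df1)`. Because cell-level tests treat every cell as an independent observation, the adjusted p-values tend to be very small, so the fold-change cutoff usually does most of the filtering.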