__Author:__ Bram Van de Sande
   
__Date:__ 1 FEB 2018

__Outline:__ Characterize the different cells in a single-cell transcriptomics experiment by the enrichment of their regulomes. The enrichment of a regulome is measured as the AUC (area under the curve) of the recovery curve of the genes that define this regulome.

In [12]:
import pickle
import os
import pandas as pd
from collections import defaultdict
from pyscenic.aucell import create_rankings, enrichment

In [5]:
RESOURCES_FOLDER="/Users/bramvandesande/Projects/lcb/resources"
DATA_FOLDER="/Users/bramvandesande/Projects/lcb/tmp"

Load and rank the expression profiles from the single-cell experiment.

In [6]:
ex_mtx = pd.read_csv(os.path.join(RESOURCES_FOLDER, 'GSE60361_C1-3005-Expression.txt'), sep='\t', header=0, index_col=0)

In [7]:
rnk_mtx = create_rankings(ex_mtx)
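
To make the ranking step concrete, here is a minimal sketch of what ranking an expression matrix can look like. It is not the implementation of `create_rankings`. The helper `rank_genes_per_cell` is hypothetical, and the sketch assumes genes as rows, cells as columns, rank 0 for the most highly expressed gene, and random tie-breaking.

```python
import numpy as np
import pandas as pd


def rank_genes_per_cell(ex_mtx: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: rank genes within every cell (column).

    Rank 0 is assigned to the most highly expressed gene of a cell.
    Assumes a unique gene index and no missing values.
    """
    rng = np.random.default_rng(seed)
    # Shuffle the gene order so that method='first' breaks ties at random
    # instead of by position in the input file.
    shuffled = ex_mtx.iloc[rng.permutation(len(ex_mtx))]
    ranks = shuffled.rank(axis=0, ascending=False, method="first").astype(int) - 1
    # Restore the original gene order.
    return ranks.loc[ex_mtx.index]


# Toy matrix: 3 genes x 2 cells (values are made up for illustration).
toy = pd.DataFrame(
    {"cell_1": [5.0, 0.0, 2.0], "cell_2": [0.0, 3.0, 1.0]},
    index=["GeneA", "GeneB", "GeneC"],
)
toy_ranks = rank_genes_per_cell(toy)
print(toy_ranks)
```

Random tie-breaking matters for single-cell data because many genes have zero counts in a given cell. Without it, the order of the zero-count genes would simply reflect their order in the input.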

Load the regulomes discovered in the previous phase.

In [8]:
with open(os.path.join(DATA_FOLDER, 'regulomes.pickle'), 'rb') as f:
    regulomes = pickle.load(f)

Calculate the enrichment of the regulomes in each cell as an AUC. A normalized enrichment score (NES) is not valid here because the AUC values are not normally distributed.
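
The AUC of a recovery curve can be illustrated for a single cell with the sketch below. It is not the code behind `enrichment`. The function `recovery_auc`, the `rank_threshold` parameter, and the normalization by the area of a perfect recovery are all assumptions made for this example.

```python
import numpy as np


def recovery_auc(ranked_genes, gene_set, rank_threshold):
    """Hypothetical helper: AUC of the recovery curve for one cell.

    ranked_genes   -- genes of one cell, most highly expressed first
    gene_set       -- genes that define the regulome
    rank_threshold -- only the top `rank_threshold` genes are considered
    """
    gene_set = set(gene_set)
    top = ranked_genes[:rank_threshold]
    # 1.0 where a top-ranked gene belongs to the regulome, else 0.0.
    hits = np.array([gene in gene_set for gene in top], dtype=float)
    # Recovery curve: number of regulome genes recovered at each rank.
    recovery = np.cumsum(hits)
    # Assumed normalization: area obtained when all regulome genes
    # occupy the very top of the ranking.
    n_ideal = min(len(gene_set), len(top))
    if n_ideal == 0:
        raise ValueError("gene_set and ranking must not be empty")
    ideal = np.zeros(len(top))
    ideal[:n_ideal] = 1.0
    return recovery.sum() / np.cumsum(ideal).sum()


ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
auc_partial = recovery_auc(ranking, {"g1", "g3"}, rank_threshold=4)  # 6/7
auc_none = recovery_auc(ranking, {"g5", "g6"}, rank_threshold=4)     # 0.0
auc_perfect = recovery_auc(ranking, {"g1", "g2"}, rank_threshold=4)  # 1.0
print(auc_partial, auc_none, auc_perfect)
```

With this definition, a cell in which none of the regulome's genes rank above the threshold gets an AUC of exactly 0.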

It is best to calculate the enrichment for one subset of the regulomes at a time, i.e. per database and regulome definition.

In [16]:
context2regulomes = defaultdict(set)
for regulome in regulomes:
    context2regulomes[regulome.context].add(regulome)

In [17]:
context = ('mm9-500bp-upstream-7species', 'target weight >= 0.00')
df = pd.concat([enrichment(rnk_mtx, regulome) for regulome in context2regulomes[context]]).unstack("Regulome")
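
The `pd.concat(...).unstack("Regulome")` pattern turns one long table per regulome into a single cell-by-regulome matrix. The sketch below reproduces that reshaping on toy data. It assumes each per-regulome result is a DataFrame indexed by (`Cell`, `Regulome`) with one `AUC` column. The helper `toy_enrichment`, the regulome names, and the values are hypothetical.

```python
import pandas as pd


def toy_enrichment(regulome_name, aucs):
    """Hypothetical stand-in for a per-regulome enrichment result.

    Returns a long-format DataFrame indexed by (Cell, Regulome)
    with a single AUC column.
    """
    index = pd.MultiIndex.from_product(
        [list(aucs.keys()), [regulome_name]], names=["Cell", "Regulome"]
    )
    return pd.DataFrame({"AUC": list(aucs.values())}, index=index)


# One long table per regulome, stacked on top of each other.
long_df = pd.concat(
    [
        toy_enrichment("RegA", {"cell_1": 0.10, "cell_2": 0.00}),
        toy_enrichment("RegB", {"cell_1": 0.05, "cell_2": 0.20}),
    ]
)

# Move the Regulome index level into the columns: one row per cell,
# one (AUC, regulome) column per regulome.
wide_df = long_df.unstack("Regulome")
print(wide_df)
```

The wide layout has a two-level column index, with `AUC` on the first level and the regulome name on the second.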

In [18]:
df

| Cell | AUC (Regulome: Alx1) |
|:---------------|---------:|
| 1772058148_A01 | 0.000342 |
| 1772058148_A03 | 0.000344 |
| 1772058148_A04 | 0.000000 |
| 1772058148_A05 | 0.000000 |
| 1772058148_A06 | 0.000000 |
| 1772058148_A07 | 0.000304 |
| 1772058148_A09 | 0.000229 |
| 1772058148_A10 | 0.000382 |
| 1772058148_A11 | 0.000493 |
| 1772058148_A12 | 0.000089 |
