In [1]:
import wot


# Notebook 8: Predictive Transcription Factors

In this notebook we identify key transcription factors regulating cell fates. For a given cell set of interest, we ask:

*Which transcription factors are enriched in cells fated for this cell set?*

We begin by loading the data, defining a target cell set (iPSCs on day 17), and computing a fate matrix.

In [7]:
FULL_DS_PATH = 'data/ExprMatrix.h5ad'
CELL_DAYS_PATH = 'data/cell_days.txt'
CELL_SETS_PATH = 'data/major_cell_sets.gmt'
TFS_PATH = 'data/TFs.txt'

#####
# Load expression dataset and subset to transcription factors
adata = wot.io.read_dataset(FULL_DS_PATH, obs=[CELL_DAYS_PATH], var_filter=TFS_PATH)
# Load transport map model and cell sets
tmap_model = wot.tmap.TransportMapModel.from_directory('tmaps/serum')
major_cell_sets = wot.io.read_sets(CELL_SETS_PATH, as_dict=True)

#####
# Create an indicator vector for the IPS cell set at day 17
target_cell_set = tmap_model.population_from_cell_sets({'IPS': major_cell_sets['IPS']}, at_time=17)
# Compute the fate matrix for the IPS cell set
fate_ds = tmap_model.fates(target_cell_set)
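
To make the idea of a fate matrix concrete, here is a minimal conceptual sketch in plain NumPy. It is not wot's implementation. The toy transport map, the cell counts, and all variable names are assumptions made for illustration. The sketch pulls the target-set indicator back through a single transport map and normalises each row, so every earlier cell gets a probability of ending up in the target set or in "Other".

```python
import numpy as np

# Hypothetical toy transport map: rows are 3 cells at the earlier day,
# columns are 4 cells at the later day. Entry (i, j) is the mass that
# early cell i sends to late cell j.
transport = np.array([
    [0.20, 0.10, 0.00, 0.00],
    [0.05, 0.05, 0.10, 0.10],
    [0.00, 0.00, 0.20, 0.20],
])

# Indicator columns for the late-day cells: column 0 marks membership in the
# target set ("IPS"), column 1 marks every other cell ("Other").
in_target = np.array([1.0, 1.0, 0.0, 0.0])
indicators = np.column_stack([in_target, 1.0 - in_target])

# Pull the indicators back through the transport map, then normalise each row
# so that every early cell's fate probabilities sum to 1.
mass_to_each_set = transport @ indicators
fates = mass_to_each_set / mass_to_each_set.sum(axis=1, keepdims=True)

print(fates)
```

In this toy example the first early cell sends all of its mass to the target set, the last sends none, and the middle cell splits its mass one third to two thirds.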

We now search for transcription factors that are differentially expressed in the cells most fated to transition to iPSCs.

In [9]:
# Find differentially expressed genes
results = wot.tmap.diff_exp(adata[adata.obs['day'] == 14.0], fate_ds, nperm=1000)
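
The `nperm` argument suggests a permutation-based significance test. The sketch below shows one generic way such a test can be set up for a single gene: compare a fate-weighted mean against the complement-weighted mean, then shuffle the fate probabilities to build a null distribution. Whether `wot.tmap.diff_exp` computes its statistics this way is an assumption, not something this notebook confirms. The simulated data, the helper `weighted_mean_difference`, and the `(k + 1) / (nperm + 1)` p-value estimator are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 cells, one gene, one fate probability per cell.
n_cells = 200
fate_prob = rng.uniform(0.0, 1.0, size=n_cells)
# Simulated expression that rises with the fate probability, plus noise.
expression = 2.0 * fate_prob + rng.normal(0.0, 0.5, size=n_cells)


def weighted_mean_difference(expr, weights):
    """Fate-weighted mean minus complement-weighted mean of one gene."""
    mean_target = np.average(expr, weights=weights)
    mean_other = np.average(expr, weights=1.0 - weights)
    return mean_target - mean_other


observed = weighted_mean_difference(expression, fate_prob)

# Null distribution: shuffle fate probabilities across cells and recompute.
nperm = 1000
null = np.empty(nperm)
for i in range(nperm):
    null[i] = weighted_mean_difference(expression, rng.permutation(fate_prob))

# k counts permutations at least as extreme as the observed statistic.
k = int(np.sum(null >= observed))
# Conservative one-sided empirical p-value (never exactly zero).
p_value = (k + 1) / (nperm + 1)

print(f"observed difference = {observed:.3f}, k = {k}, p = {p_value:.4f}")
```

With more permutations the smallest attainable p-value shrinks, at the cost of proportionally more computation.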

Finally, we visualize the results as a table.

In [10]:
df = results['IPS_Other']
df.sort_values('fraction_expressed_ratio_14.0_14.0', inplace=True, ascending=False)
df.head(10)

| index | mean_IPS_14.0 | fraction_expressed_IPS_14.0 | mean_Other_14.0 | fraction_expressed_Other_14.0 | fold_change_14.0_14.0 | fraction_expressed_ratio_14.0_14.0 | p_value_14.0_14.0 | fdr_14.0_14.0 | k_14.0_14.0 |
|---|---|---|---|---|---|---|---|---|---|
| Spic | 0.140514 | 0.220444 | 0.000316 | 0.000486 | 0.140198 | 376.457458 | 0.0 | 0.0 | 0.0 |
| Olig2 | 0.032737 | 0.064083 | 6.6e-05 | 0.000128 | 0.032672 | 281.184235 | 0.0 | 0.0 | 0.0 |
| Sohlh2 | 0.130946 | 0.220173 | 0.000472 | 0.000777 | 0.130474 | 251.296478 | 0.0 | 0.0 | 0.0 |
| Obox6 | 0.157129 | 0.261668 | 0.000741 | 0.001104 | 0.156388 | 217.468735 | 0.0 | 0.0 | 0.0 |
| Dmrt1 | 0.063819 | 0.121083 | 0.000312 | 0.000581 | 0.063506 | 178.027695 | 0.0 | 0.0 | 0.0 |
| Msc | 0.198591 | 0.278676 | 0.001327 | 0.001717 | 0.197264 | 153.449982 | 0.0 | 0.0 | 0.0 |
| Hesx1 | 0.314609 | 0.414081 | 0.002154 | 0.003312 | 0.312454 | 121.401947 | 0.0 | 0.0 | 0.0 |
| Olig1 | 0.076545 | 0.138964 | 0.000837 | 0.001154 | 0.075707 | 110.916389 | 0.0 | 0.0 | 0.0 |
| Hnf4a | 0.007906 | 0.011647 | 1.6e-05 | 2.4e-05 | 0.00789 | 95.100861 | 0.0 | 0.0 | 0.0 |
| Nanog | 1.269369 | 0.90651 | 0.008559 | 0.009716 | 1.26081 | 92.357635 | 0.0 | 0.0 | 0.0 |
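
Sorting by the expression-fraction ratio alone can rank genes highly even when few fated cells express them. In the table above, Hnf4a is detected in only about 1% of IPS-fated cells. One possible refinement is to also require a minimum expressed fraction and a maximum FDR before ranking. The sketch below uses three rows copied from the table. The thresholds `MIN_FRACTION` and `MAX_FDR` are hypothetical values chosen for illustration, not recommendations from the original analysis.

```python
import pandas as pd

# Three rows copied from the results table, with columns restricted for brevity.
df = pd.DataFrame(
    {
        "fraction_expressed_IPS_14.0": [0.220444, 0.011647, 0.906510],
        "fraction_expressed_ratio_14.0_14.0": [376.457458, 95.100861, 92.357635],
        "fdr_14.0_14.0": [0.0, 0.0, 0.0],
    },
    index=["Spic", "Hnf4a", "Nanog"],
)

MIN_FRACTION = 0.1  # hypothetical: gene must be detected in >= 10% of fated cells
MAX_FDR = 0.05      # hypothetical significance cut-off

# Keep genes passing both filters, then rank by the expression-fraction ratio.
keep = (df["fraction_expressed_IPS_14.0"] >= MIN_FRACTION) & (df["fdr_14.0_14.0"] <= MAX_FDR)
filtered = df[keep].sort_values("fraction_expressed_ratio_14.0_14.0", ascending=False)

print(filtered.index.tolist())
```

With these thresholds Hnf4a is dropped, while Spic and Nanog are kept in ratio order.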


The same analysis can be run from the command line with the following two commands. The first command computes a fate matrix for iPS cells and stores it in the file IPS_fates.txt. The second command loads that fate matrix and looks for TFs that are differentially expressed in the cells fated to become iPSCs.

```
wot fates \
--tmap tmaps/serum \
--cell_set data/major_cell_sets.gmt \
--day 17 \
--cell_set_filter IPS \
--out IPS \
--verbose
```

```
wot diff_exp \
--matrix data/ExprMatrix.h5ad \
--cell_days data/cell_days.txt \
--fates IPS_fates.txt \
--nperm 1000 \
--gene_filter data/TFs.txt \
--cell_day_filter 14 \
--verbose
```