## Co-culture interaction experiment | Part 2

<br>
<b>Description</b> : In this notebook we run Intergram on the colon data.<br>
<b>Author</b> : Alma Andersson (andera29@gene.com)<br>
<b>Date</b> : 08/14/2024

In [1]:
%load_ext autoreload
%autoreload 2

Import relevant packages

In [2]:
import telegraph as tg
import anndata as ad
import numpy as np
import scanpy as sc
import os.path as osp
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt

import os
import shutil

from scipy.stats import hypergeom

## Helper Functions

In [3]:
def read_h5ad_uniqify(path,tag = None):
    adata = ad.read_h5ad(path)
    adata.obs_names_make_unique()
    adata.var_names_make_unique()
    if tag is not None:
        adata.obs_names  = [f'{tag}_{x}' for x in adata.obs_names]
    return adata

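def pp_adata(ad_sc,ad_sp = None):
    # flag genes by name prefix: mitochondrial-style ("mt-", "Mt") and unnamed Ensembl IDs ("ENSMUSG")
    is_mt = ad_sc.var_names.str.startswith(("mt-","Mt","ENSMUSG"))
    # flag ribosomal-style names (note: the "rp" prefix already covers the other three)
    is_rp = ad_sc.var_names.str.startswith(("rps", "rpl","rp-","rp"))
    # flag RIKEN clone genes (names ending in "Rik")
    is_rik = ad_sc.var_names.str.endswith("Rik")
    keep_genes = (~is_mt) & (~is_rp) & (~is_rik)
    ad_sc = ad_sc[:,keep_genes].copy()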

    sc.pp.filter_cells(ad_sc,min_counts=300)
    
    sc.pp.filter_genes(ad_sc,min_counts=10)

    if ad_sp is not None:
        sc.pp.filter_cells(ad_sp,min_counts=100)
        sc.pp.filter_genes(ad_sp,min_counts=10)
  

    ad_sc.layers['raw'] = ad_sc.X.copy()
    sc.pp.normalize_total(ad_sc,1e4)
    sc.pp.log1p(ad_sc)
    sc.pp.highly_variable_genes(ad_sc,n_top_genes=5000)
    ad_sc.layers['norm'] = ad_sc.X.copy()
    ad_sc.X = ad_sc.layers['raw'].copy()

    if ad_sp is not None:
        return ad_sc,ad_sp
    return ad_sc
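
The normalization step in `pp_adata` scales each cell to a total of 1e4 counts and then applies log(1 + x). The sketch below reproduces that arithmetic for a single toy cell in plain Python. The function name `normalize_log1p` and the counts are hypothetical and only serve as illustration; the notebook itself relies on `sc.pp.normalize_total` and `sc.pp.log1p`.

```python
import math

def normalize_log1p(counts, target_sum=1e4):
    """Scale one cell's counts to `target_sum`, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        # toy guard: an empty cell stays all-zero
        return [0.0 for _ in counts]
    return [math.log1p(c / total * target_sum) for c in counts]

# hypothetical raw counts for four genes in one cell
cell = [3, 1, 0, 6]
norm = normalize_log1p(cell)
print(norm)
```

Undoing the log transform recovers values that sum to the target, which is a quick sanity check that the scaling was applied per cell.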


## Data and Processing
Both the single-cell and spatial data were downloaded from: https://cellxgene.cziscience.com/collections/4fa07e63-f712-4d8c-b885-2c515b5e2743.
<br>
For the scRNA-seq data we used the "Global" data object; for the spatial data we took all samples except the anti_IL10r ones.

Set data directory and filenames

In [3]:
DATA_DIR = '../data/imod/mouse_immune/'

with open('OUTPUT_DIR.txt','r') as f:
    # the first line holds the output root; strip the trailing newline
    OUTPUT_ROOT = f.readline().strip()

OUTPUT_DIR = osp.join(OUTPUT_ROOT, 'coculture')
os.makedirs(OUTPUT_DIR, exist_ok = True)

In [5]:
sp_files = [
    '1A_Hh.h5ad',
    '2A_Hh.h5ad',
    '3A_Hh.h5ad',
    '4A_Hh.h5ad',
]

Load data and concatenate spatial data

In [6]:
ad_sp =  ad.concat([read_h5ad_uniqify(osp.join(DATA_DIR,fn)) for fn in sp_files])

In [7]:
ad_sp = ad_sp[ad_sp.obs['in_tissue'].values == 1]

Load single cell data

In [8]:
ad_sc = ad.read_h5ad(osp.join(DATA_DIR,'sc.h5ad'))

Convert ENSEMBL IDs to Gene Names

In [9]:
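# lookup table: Ensembl ID (index) -> gene name
ensg_to_hgnc = ad_sc.var[['feature_name']].copy()
ad_sc.var_names = ensg_to_hgnc.loc[ad_sc.var_names].values.flatten()
# keep only the spatial genes that have an entry in the lookup table
inter = ad_sp.var_names.intersection(ensg_to_hgnc.index)
# copy so that var_names is set on a real object rather than a view
ad_sp = ad_sp[:,inter].copy()
ad_sp.var_names = ensg_to_hgnc.loc[ad_sp.var_names].values.flatten()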
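
As a minimal plain-Python sketch of the conversion above: restrict the spatial gene IDs to those present in the lookup table, then rename them. All IDs and gene names here are made-up placeholders, not values from the dataset.

```python
# hypothetical lookup table: gene ID -> gene name
id_to_name = {
    "ID_0001": "GeneA",
    "ID_0002": "GeneB",
    "ID_0003": "GeneC",
}

# hypothetical spatial gene IDs; the last one has no entry in the table
spatial_ids = ["ID_0003", "ID_0001", "ID_9999"]

# keep only IDs that can be translated, then translate them
shared = [gene_id for gene_id in spatial_ids if gene_id in id_to_name]
names = [id_to_name[gene_id] for gene_id in shared]
print(shared, names)
```

Genes without an entry in the table are dropped rather than kept under their ID, which matches the intersection step in the cell above.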

We want to look at the interaction between CD4+ T cells and dendritic cells, so we merge some of the finer subtypes into a single T-cell class and a single DC class, respectively.

In [10]:
cd4_t_cells = [
    'naive thymus-derived CD4-positive, alpha-beta T cell',
    'regulatory T cell',
    'CD4-positive, alpha-beta memory T cell',
    'T-helper 17 cell'
]

cd11c_dendritic_cells = [
    'conventional dendritic cell',
    'CD103-positive dendritic cell',
    'CD8_alpha-positive CD11b-negative dendritic cell',
    'dendritic cell'
]

In [11]:
ct_map = {key:key for key in np.unique(ad_sc.obs['cell_type'].values) }
ct_map = {key:(val  if key not in cd4_t_cells else 'Tcell_CD4+') for key,val in ct_map.items() }
ct_map = {key:(val  if key not in cd11c_dendritic_cells else 'DC') for key,val in ct_map.items() }

Add the curated cell type labels to the data

In [12]:
ad_sc.obs['curated_celltypes'] = ad_sc.obs['cell_type'].map(ct_map)
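
The relabeling can be checked on a toy example. This is a plain-Python sketch with shortened subtype lists; the helper `curate` is hypothetical and stands in for the `ct_map` dictionary built above.

```python
# shortened, illustrative subtype lists
cd4_t_cells = {"regulatory T cell", "T-helper 17 cell"}
dendritic_cells = {"conventional dendritic cell", "dendritic cell"}

def curate(label):
    """Collapse fine subtypes into 'Tcell_CD4+' or 'DC'; keep other labels as-is."""
    if label in cd4_t_cells:
        return "Tcell_CD4+"
    if label in dendritic_cells:
        return "DC"
    return label

labels = ["regulatory T cell", "B cell", "dendritic cell", "T-helper 17 cell"]
curated = [curate(x) for x in labels]
print(curated)
```

Labels outside the two lists pass through unchanged, so every cell still receives a curated label.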

## Run Mapping Method and Interaction Model

Define label to use

In [13]:
label_col = 'curated_celltypes'

Preprocess data

In [14]:
ad_sc,ad_sp = pp_adata(ad_sc,ad_sp)

Prepare `state_dict` input for telegraph

In [15]:
hvg_genes = ad_sc.var_names[ad_sc.var.highly_variable.values].tolist()

input_dict_1 = tg.met.utils.adatas_to_input(
    {'from':ad_sc, 'to':ad_sp}, # provide the data to be used
    categorical_labels={'from':[label_col]}, # include cluster labels in the design matrix
)

Run mapping method

In [16]:

tg.met.pp.StandardTangramV2.run(input_dict_1)

map_res_1 = tg.met.map_methods.TangramV2Map.run(
    input_dict_1,
    num_epochs = 1000,
    genes = hvg_genes,
)

input_dict_1.update(map_res_1)


INFO:root:Allocate tensors for mapping.
INFO:root:Begin training with 4489 genes and rna_count_based density_prior in clusters mode...
INFO:root:Printing scores every 100 epochs.


Set Solid Seed
Set Solid Seed
Score: 0.557, KL reg: 3.431, Entropy reg: -12.214
Score: 0.705, KL reg: 3.219, Entropy reg: -9.825
Score: 0.706, KL reg: 3.219, Entropy reg: -9.816
Score: 0.706, KL reg: 3.219, Entropy reg: -9.809
Score: 0.706, KL reg: 3.219, Entropy reg: -9.805
Score: 0.706, KL reg: 3.219, Entropy reg: -9.803
Score: 0.706, KL reg: 3.219, Entropy reg: -9.802
Score: 0.706, KL reg: 3.219, Entropy reg: -9.802
Score: 0.706, KL reg: 3.219, Entropy reg: -9.802
Score: 0.706, KL reg: 3.219, Entropy reg: -9.801


INFO:root:Renormalizing Single cell data
INFO:root:Begin training with 4489 genes and rna_count_based density_prior in cells mode after renormalization


Set Solid Seed


INFO:root:Printing scores every 100 epochs.


Set Solid Seed
Score: 0.690, KL reg: 0.182, Entropy reg: -97380.273
Score: 0.769, KL reg: 0.002, Entropy reg: -88910.836
Score: 0.775, KL reg: 0.001, Entropy reg: -77517.508
Score: 0.777, KL reg: 0.000, Entropy reg: -68511.109
Score: 0.778, KL reg: 0.000, Entropy reg: -64167.742
Score: 0.779, KL reg: 0.000, Entropy reg: -60468.164
Score: 0.779, KL reg: 0.000, Entropy reg: -56704.879
Score: 0.780, KL reg: 0.000, Entropy reg: -53021.523
Score: 0.780, KL reg: 0.000, Entropy reg: -49411.641
Score: 0.780, KL reg: 0.000, Entropy reg: -46150.680


INFO:root:Saving results..


Run Interaction Model

In [17]:
tg.met.pp.StandardScanpy.run(input_dict_1,target_objs = ['X_from'])

inter_res = tg.dev.imod.methods.InteractionModel.run(input_dict_1,
                                                     n_epochs = 1000,
                                                     learning_rate = 0.01)


GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA A100-SXM4-80GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name         | Type | Params | Mode
---------------------------------------------
  | other params | n/a  | 326 K  | n/a 
---------------------------------------------
326 K     Trainable params
0         Non-trainable params
326 K     Total params
1.304     Total estimated model params size (MB)


`Trainer.fit` stopped: `max_epochs=1000` reached.


In [24]:
inter_res.to_netcdf(osp.join(OUTPUT_DIR,'inter_res.netcdf'))

## Downstream Analysis

Get model coefficients

In [31]:
label = 'DC_scanpy'
corr_res = dict()
overlap_res = dict()
alpha_sig = 0.05

# iterate over each interaction
for inter in beta.index.values:

    # get beta coefs. for given interaction
    pred_df = pd.DataFrame(beta.loc[inter]['beta'])
    pred_df.columns = ['beta']

    # get the reference genes
    true_df = true[label].copy()

    # only keep genes that were present in the sc data
    loc_inter = true_df.index.intersection(pred_df.index)

    # make sure pred and reference share genes
    pred_df = pred_df.loc[loc_inter]
    true_df = true_df.loc[loc_inter]

    # only keep the reference genes with a significant p-value
    true_df = true_df.iloc[true_df.pvals_adj.values <= alpha_sig]
    # get up-regulated reference genes
    true_df = true_df[true_df.logfoldchanges.values > 1]

    # sort the predicted and reference values
    top_pred =  pred_df.sort_values('beta',ascending = False).index
    top_true =  true_df.sort_values('logfoldchanges',ascending = False).index

    # compare the overlap
    overlap =  top_true.intersection(top_pred[0:len(top_true)])

    # register results
    # (n_pred equals n_true by construction: only the top len(top_true) predictions are compared)
    overlap_res[inter] = dict(n_overlap = len(overlap),
                              n_pred = len(top_true),
                              n_true = len(top_true),
                              n_background = len(loc_inter),
                              overlap = overlap,
                              top_pred = top_pred,
                              top_true = top_true,
                              background = loc_inter )
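
The overlap logic in the loop can be followed on a toy example: rank the predicted genes by beta, take as many as there are reference genes, and intersect the two sets. This is a plain-Python sketch; the gene names and values are made up.

```python
# hypothetical beta coefficients for five genes
pred_beta = {"g1": 2.1, "g2": -0.3, "g3": 1.4, "g4": 0.9, "g5": 0.1}
# hypothetical reference genes that already passed the p-value and fold-change filters
true_lfc = {"g1": 3.0, "g4": 2.2, "g2": 1.5}

# rank both lists from highest to lowest value
top_true = sorted(true_lfc, key=true_lfc.get, reverse=True)
top_pred = sorted(pred_beta, key=pred_beta.get, reverse=True)

# compare the reference set with an equally sized set of top predictions
k = len(top_true)
top_k_pred = set(top_pred[:k])
overlap = [g for g in top_true if g in top_k_pred]
print(overlap)
```

Because the predicted set is cut to the size of the reference set, the number of predictions and the number of reference genes are always equal in this comparison.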

In [32]:
obj = overlap_res[sel_inter]
pval = hypergeometric_test(**obj)
top_number = len(obj['top_true'])

In [33]:
print('[IM] OVERLAP: {}/{} | PVAL: {:0.2E}'.format(obj['n_overlap'],top_number,pval))

[IM] OVERLAP: 387/863 | PVAL: 9.62E-274
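
The definition of `hypergeometric_test` is not shown in this section. A minimal sketch of what such a test could look like is given below, assuming it computes the upper-tail probability of seeing at least `n_overlap` shared genes when `n_pred` genes are drawn from a background of `n_background` genes that contains `n_true` reference genes. The signature is an assumption chosen to accept the keys of `overlap_res`; it is written with the standard library only, and `scipy.stats.hypergeom.sf(n_overlap - 1, n_background, n_true, n_pred)` expresses the same tail probability.

```python
from math import comb

def hypergeometric_test(n_overlap, n_pred, n_true, n_background, **_ignored):
    """Return P(X >= n_overlap) under a hypergeometric null (hypothetical helper)."""
    upper = min(n_pred, n_true)
    # number of ways to draw n_pred genes from the background
    total = comb(n_background, n_pred)
    # number of draws that contain at least n_overlap reference genes
    tail = sum(
        comb(n_true, k) * comb(n_background - n_true, n_pred - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# toy numbers: 10 background genes, 4 reference genes, 4 predictions, 3 shared
p = hypergeometric_test(n_overlap=3, n_pred=4, n_true=4, n_background=10)
print("{:0.2E}".format(p))
```

Extra keyword arguments are ignored so that the function can be called as `hypergeometric_test(**obj)` with the full result dictionary.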


In [28]:
# save the beta coefs. for both directions of the DC / CD4+ T-cell interaction
for inter_name in ['DC_vs_Tcell_CD4+','Tcell_CD4+_vs_DC']:
    pred_df = pd.DataFrame(beta.loc[inter_name]['beta'])
    pred_df.columns = ['beta']
    pred_df.to_csv(osp.join(OUTPUT_DIR, 'beta_{}.csv'.format(inter_name)))


Compare with Highly Variable Genes

In [34]:
pred_df = tg.methods.dea.HVGFeatureDEA.run_with_adata(input_dict_1['X_from'],
                                                      subset_col=label_col, 
                                                      subset_labels=('DC' if label.startswith('DC') else 'Tcell_CD4+'))

pred_df.index =  pd.Index([x.lower() for x in pred_df.index])

pred_df.to_csv(osp.join(OUTPUT_DIR,'hvg_{}.csv'.format(label.split('_')[0])))

[HVG] OVERLAP: 180/863 | PVAL: 1.36E-49
