# DeepCOLOR analysis on the colocalization matrix

Here, the dataset A1-1 from the Resolve data is used.

## Import libraries

In [1]:
import importlib
import numpy as np
import pandas as pd
import scanpy as sc
import torch
from matplotlib import pyplot as plt
import deepcolor
# Fix the random seeds for reproducibility
np.random.seed(1)
torch.manual_seed(1)

## Load data

Load the scRNA-seq data and the Resolve spatial data, for which the spatial distribution has already been estimated.

In [2]:
# scRNA-seq trained data
sc_adata = sc.read_h5ad('data/deepcolor_mouseStSt.h5ad')
# Spatial trained data
sp_adata = sc.read_h5ad('data/deepcolor_A1-1.h5ad')

## Analyse the colocalization matrix

The colocalization matrix from the autoencoder is analyzed in this section.

In [None]:
# Row-normalize the mapping matrix so that each single cell's weights over the spatial cells sum to 1
p_mat = sc_adata.obsm['map2sp'] / np.sum(sc_adata.obsm['map2sp'], axis=1).reshape((-1, 1))
# Optional: calculate the colocalization matrix and log2-transform the values
# coloc_mat = p_mat @ p_mat.transpose()
# coloc_mat = np.log2(coloc_mat) + np.log2(p_mat.shape[1])
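
The row normalization and the commented-out log2 colocalization step above can be illustrated on a toy matrix. This is a minimal sketch, not the DeepCOLOR implementation: the array `map2sp` and its values are made up for illustration, and only the arithmetic mirrors the cell above.

```python
import numpy as np

# Toy mapping matrix: 3 single cells (rows) x 4 spatial cells (columns).
# The values are made up for illustration only.
map2sp = np.array([
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [1.0, 1.0, 1.0, 1.0],
])

# Row-normalize so that each single cell's weights sum to 1.
p_toy = map2sp / map2sp.sum(axis=1, keepdims=True)

# Colocalization of two single cells: overlap of their spatial distributions.
coloc_toy = p_toy @ p_toy.T

# Rescale by the number of spatial cells and take log2. A cell spread
# uniformly over all columns scores 0 against any other cell, and cells
# with disjoint distributions give -inf.
with np.errstate(divide="ignore"):
    log_coloc_toy = np.log2(coloc_toy) + np.log2(p_toy.shape[1])

print(np.round(log_coloc_toy, 2))
```

Positive values indicate more overlap than a uniform spread would give, negative values less.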

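Convert the probability matrix (precursor to the colocalization matrix) into a dataframe whose rows are labeled with the single-cell annotations and whose columns are labeled with the spatial annotations.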

In [4]:
df = pd.DataFrame(p_mat, index=sc_adata.obs['annot_fine_zonated'], columns=sp_adata.obs['annotationSave'])

In [11]:
df

annot_fine_zonated (rows) / annotationSave (columns),Mesothelial cells,Capsule fibroblasts,Mesothelial cells,Mesothelial cells,Fibroblast,Hepatocytes_central,Fibroblast,Mesothelial cells,Hepatocytes_portal,Mesothelial cells,...,Hepatocytes_portal,Hepatocytes_central,Hepatocytes_central,Stellate cells_portal,Hepatocytes_central,KCs,Hepatocytes_portal,KCs,Capsule fibroblasts,Hepatocytes_portal
pDCs,6.314774e-06,0.000058,0.000014,4.488190e-06,1.713820e-06,2.152569e-06,3.018133e-06,0.000012,0.000005,3.347167e-06,...,0.000005,0.000004,2.729505e-06,2.585096e-06,2.214163e-06,9.322229e-07,0.000006,1.356230e-06,3.109811e-06,0.000005
KCs,1.031437e-06,0.000510,0.000005,8.706317e-07,5.317781e-07,1.533023e-06,1.111178e-06,0.000005,0.000004,5.054299e-07,...,0.000003,0.000003,2.318773e-06,2.264081e-06,1.519652e-06,9.755092e-06,0.000004,1.002687e-05,7.319547e-06,0.000004
T cells,1.086982e-05,0.000052,0.000013,6.871969e-06,1.870607e-06,1.729039e-06,2.951568e-06,0.000013,0.000004,6.003113e-06,...,0.000004,0.000003,2.284149e-06,1.735308e-06,1.888853e-06,8.384326e-07,0.000004,1.105916e-06,2.916881e-06,0.000004
pDCs,1.099171e-05,0.000028,0.000012,7.668319e-06,1.826093e-06,1.449094e-06,2.358452e-06,0.000011,0.000003,7.020060e-06,...,0.000003,0.000003,1.691284e-06,1.127798e-06,1.502512e-06,4.614550e-07,0.000004,7.245042e-07,1.341984e-06,0.000003
Monocytes,1.984784e-07,0.000186,0.000003,1.685255e-07,2.224741e-07,9.904396e-07,5.503195e-07,0.000002,0.000003,8.591471e-08,...,0.000002,0.000002,1.430834e-06,1.938570e-06,9.428663e-07,3.491571e-06,0.000003,3.533847e-06,7.557001e-06,0.000002
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
cDC1s,1.352499e-05,0.000006,0.000012,9.845427e-06,1.446495e-06,1.368862e-06,1.718909e-06,0.000013,0.000003,9.081251e-06,...,0.000002,0.000003,1.398715e-06,9.446662e-07,1.635072e-06,2.317924e-07,0.000004,4.448569e-07,6.482821e-07,0.000002
cDC1s,9.510376e-06,0.000005,0.000008,7.650667e-06,1.207297e-06,6.201688e-07,1.338949e-06,0.000008,0.000001,7.748529e-06,...,0.000001,0.000001,7.118774e-07,4.766329e-07,7.312581e-07,1.010120e-07,0.000002,2.027027e-07,2.510417e-07,0.000001
cDC1s,6.408325e-06,0.000029,0.000010,4.370406e-06,1.374683e-06,1.257555e-06,1.936367e-06,0.000009,0.000003,4.019570e-06,...,0.000003,0.000002,1.672651e-06,1.013902e-06,1.267876e-06,3.746607e-07,0.000003,5.454931e-07,1.132873e-06,0.000003
cDC2s,4.510120e-06,0.000037,0.000013,3.266477e-06,1.433462e-06,1.926854e-06,2.407534e-06,0.000011,0.000005,2.512553e-06,...,0.000004,0.000004,2.474251e-06,2.125662e-06,1.930906e-06,5.841701e-07,0.000005,8.667373e-07,2.387290e-06,0.000004


For each cell in the single-cell data, obtain the cell types of the three spatial cells with the highest colocalization probability.

In [5]:
# For each row, take the column labels of the three highest probabilities
top3 = pd.DataFrame(df.apply(lambda x: list(df.columns[np.array(x).argsort()[::-1][:3]]), axis=1).to_list(), columns=['Top1', 'Top2', 'Top3'], index=df.index)

In [6]:
top3

annot_fine_zonated,Top1,Top2,Top3
pDCs,pDCs,T cells,B cells
KCs,ILC1s,Neutrophils,Neutrophils
T cells,T cells,pDCs,pDCs
pDCs,ILC1s,pDCs,cDC1s
Monocytes,Neutrophils,Neutrophils,B cells
...,...,...,...
cDC1s,cDC1s,ILC1s,cDC1s
cDC1s,cDC1s,ILC1s,cDC1s
cDC1s,pDCs,pDCs,ILC1s
cDC2s,ILC1s,LSECs_central,cDC1s
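
The top-3 lookup above can also be written without a row-wise `apply`. The following is a minimal sketch on a toy table; the helper name `top_k_labels` and the toy labels are hypothetical. Ties may be ordered differently than in the cell above.

```python
import numpy as np
import pandas as pd

# Toy probability table; row and column labels are hypothetical cell types.
toy = pd.DataFrame(
    [[0.1, 0.6, 0.3, 0.0],
     [0.5, 0.1, 0.15, 0.3]],
    index=["KCs", "T cells"],
    columns=["KCs", "T cells", "B cells", "KCs"],  # duplicate labels are allowed
)

def top_k_labels(df, k=3):
    """Return the column labels of the k largest values in each row."""
    # Sort each row in descending order and keep the first k column positions.
    order = np.argsort(-df.to_numpy(), axis=1)[:, :k]
    labels = df.columns.to_numpy()[order]
    return pd.DataFrame(labels, index=df.index,
                        columns=[f"Top{i + 1}" for i in range(k)])

toy_top3 = top_k_labels(toy, k=3)
print(toy_top3)
```

Sorting the whole array at once avoids one Python call per row, which may matter for matrices with many thousands of cells.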


Compute the fraction of cells in the single-cell data whose top-ranked spatial cell (the one with the highest colocalization probability) has the same cell-type annotation. About 30% of the cells in the single-cell data are matched to a spatial cell of the same cell type.

In [8]:
top3[(top3.index==top3['Top1'])].count()/top3.count()

Top1    0.298569
Top2    0.298569
Top3    0.298569
dtype: float64

In [None]:
# The line below computes the same fraction for a match anywhere in the top three highest probabilities (result: 53.6%)
# top3[(top3.index==top3['Top1'])|(top3.index==top3['Top2'])|(top3.index==top3['Top3'])].count()/top3.count()

Top1    0.535769
Top2    0.535769
Top3    0.535769
dtype: float64
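
The top-1 and top-3 agreement computed above can be expressed with one helper. This is a sketch under assumptions: the function name `match_fraction` and the toy labels are made up, and the table is assumed to have the same layout as `top3` (index = own cell type, columns = ranked cell types).

```python
import pandas as pd

# Toy top-3 table in the same layout as `top3`; the labels are made up.
toy_top3 = pd.DataFrame(
    {"Top1": ["KCs", "B cells", "T cells", "pDCs"],
     "Top2": ["T cells", "T cells", "KCs", "KCs"],
     "Top3": ["B cells", "KCs", "pDCs", "B cells"]},
    index=["KCs", "T cells", "pDCs", "cDC1s"],
)

def match_fraction(top, k):
    """Fraction of rows whose index label appears among the first k columns."""
    own_label = top.index.to_numpy()[:, None]
    hits = (top.iloc[:, :k].to_numpy() == own_label).any(axis=1)
    return hits.mean()

print(match_fraction(toy_top3, 1))  # top-1 agreement
print(match_fraction(toy_top3, 3))  # top-3 agreement
```

The helper returns a single number rather than one identical value per column.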

## Synthetic colocalization analysis

A synthetic colocalization matrix is generated by assigning equal probability to cell pairs with the same cell-type annotation and zero probability to cell pairs with different annotations.

In [18]:
# Dictionary of cell types for each cell
sc_dict = sc_adata.obs['annot_fine_zonated'].to_dict()
sp_dict = sp_adata.obs['annotationSave'].to_dict()

In [13]:
# Create a zero-filled DataFrame (single cells x spatial cells) for the probability matrix
pmat = pd.DataFrame(np.zeros((len(sc_adata.obs),len(sp_adata.obs))), index=sc_adata.obs_names, columns=sp_adata.obs_names)

In [50]:
# Assign a value of 1 where the two cells share the same cell-type annotation
for key, value in sp_dict.items():
    pmat[key] = np.where(pmat.index.map(sc_dict)==value,1,pmat[key])
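
The column-by-column loop above can be slow when there are many spatial cells. As an alternative, the same indicator matrix can be built in one broadcasted comparison. This is a minimal sketch on toy annotations; `sc_types`, `sp_types`, and the cell IDs are hypothetical stand-ins for the two annotation columns.

```python
import numpy as np
import pandas as pd

# Hypothetical annotations: cell ID -> cell type, for both modalities.
sc_types = pd.Series(["KCs", "T cells", "KCs"], index=["sc_a", "sc_b", "sc_c"])
sp_types = pd.Series(["T cells", "KCs", "LSECs", "KCs"],
                     index=["65", "66", "67", "129"])

# Compare every single-cell label against every spatial label at once.
same_type = sc_types.to_numpy()[:, None] == sp_types.to_numpy()[None, :]

# 1.0 where the annotations agree, 0.0 elsewhere.
toy_pmat = pd.DataFrame(same_type.astype(float),
                        index=sc_types.index, columns=sp_types.index)
print(toy_pmat)
```

In this notebook the two series would presumably be `sc_adata.obs['annot_fine_zonated']` and `sp_adata.obs['annotationSave']`. That substitution has not been run here.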

The precursor to the colocalization matrix is generated and shown below.

In [51]:
pmat

cells,65,66,67,129,130,192,193,195,256,257,...,32329,32393,32521,32585,32649,32713,32777,32841,32905,32969
AAACCTGAGAGCCCAA-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AAACCTGAGCTAGTCT-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
AAACCTGAGTTCGCGC-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AAACCTGCAACACCCG-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AAACCTGGTCAGTGGA-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
TTTGGTTGTCAGAATA-46,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
TTTGTCAAGAAAGTGG-46,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
TTTGTCACAGTATGCT-46,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
TTTGTCAGTTGATTGC-46,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Store the synthetic matrix in `sc_adata.obsm['map2sp']`, then row-normalize it and store the resulting probabilities in `sc_adata.obsm['p_mat']`.

In [52]:
sc_adata.obsm['map2sp'] = pmat.values

In [54]:
sc_adata.obsm['p_mat'] = sc_adata.obsm['map2sp'] / np.sum(sc_adata.obsm['map2sp'], axis=1).reshape((-1, 1))
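
One caveat with the normalization above: a single-cell type with no counterpart in the spatial annotation gives an all-zero row, and dividing by a zero row sum yields NaN. Whether such rows occur in this dataset is not checked here. The sketch below shows one possible guard; the helper `row_normalize` is hypothetical, and leaving such rows at zero is an assumption, not something DeepCOLOR is known to expect.

```python
import numpy as np

def row_normalize(mat):
    """Divide each row by its sum; rows that sum to zero stay all-zero."""
    mat = np.asarray(mat, dtype=float)
    row_sums = mat.sum(axis=1, keepdims=True)
    out = np.zeros_like(mat)
    # Only divide where the row sum is positive; other entries keep the 0 fill.
    np.divide(mat, row_sums, out=out, where=row_sums > 0)
    return out

toy = np.array([[1.0, 0.0, 1.0],
                [0.0, 0.0, 0.0],   # no matching spatial cell
                [0.0, 1.0, 0.0]])
print(row_normalize(toy))
```

A quick check such as `np.isnan(sc_adata.obsm['p_mat']).any()` would show whether the guard is needed for this dataset.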

### Proximity analysis

With the new colocalization matrix, perform the DeepCOLOR analysis as normal.

First, load the NicheNet ligand-target matrix. This matrix is taken from NicheNet v2 rather than from the matrix distributed with DeepCOLOR.

In [None]:
#! wget -O data/ligand_target_df.csv https://www.dropbox.com/s/2z7ogbks4504iya/ligand_target_df.csv?dl=0
#lt_df = pd.read_csv('data/ligand_target_df.csv', index_col=0)
lt_df = pd.read_csv('data/ligand_target_matrix.csv', index_col=0)

Set KCs, LAM (MoMac1), and central vein and capsule macrophages (MoMac2) as receivers.

The figure below shows the full result.

In [None]:
importlib.reload(deepcolor)
# KCs, MoMac1 & 2
fig, coexp_cc_df = deepcolor.calculate_proximal_cell_communications(sc_adata, 'annot_fine_zonated', lt_df, ["KCs", 'MoMac1', 'MoMac2'], celltype_sample_num=500, ntop_genes=4000, each_display_num=3, role="receiver", edge_thresh=1)
fig

