### Notebook for Intercellular Context Factorization using `LIANA` and `Tensor-Cell2cell`

#### Environment: LIANA

- **Developed by:** Alexandra Cirnu
- **Modified by:** Alexandra Cirnu
- **Würzburg Institute for Systems Immunology & Julius-Maximilian-Universität Würzburg**
- **Date of creation:** 240426
- **Date of modification:** 240426

`LIANA` expects log1p-transformed, normalized counts and uses **all genes** with sufficient counts.

### Load in required modules

In [1]:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import scanpy as sc
import plotnine as p9
import liana as li
import cell2cell as c2c
import decoupler as dc # needed for pathway enrichment
import muon as mu
from muon import atac as ac
from muon import prot as pt
from scipy.sparse import csr_matrix

import warnings
warnings.filterwarnings('ignore')
from collections import defaultdict

%matplotlib inline

In [2]:
# NOTE: to use CPU instead of GPU, set use_gpu = False
use_gpu = True

if use_gpu:
    import torch
    import tensorly as tl

    device = "cuda" if torch.cuda.is_available() else "cpu"
    if device == "cuda":
        tl.set_backend('pytorch')
else:
    device = "cpu"

device

'cuda'

In [3]:
sc.settings.verbosity = 3             # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_versions()

sc.settings.set_figure_params(dpi = 300, color_map = 'RdPu', dpi_save = 300, vector_friendly = True, format = 'svg')

-----
anndata     0.10.6
scanpy      1.9.8
-----
PIL                 10.2.0
asttokens           NA
cell2cell           0.7.3
colorama            0.4.6
comm                0.2.2
cycler              0.12.1
cython_runtime      NA
dateutil            2.9.0
debugpy             1.8.1
decorator           5.1.1
decoupler           1.6.0
docrep              0.3.2
exceptiongroup      1.2.0
executing           2.0.1
h5py                3.10.0
ipykernel           6.29.3
ipywidgets          8.1.2
jedi                0.19.1
joblib              1.3.2
kiwisolver          1.4.5
kneed               0.8.5
liana               1.0.5
llvmlite            0.42.0
matplotlib          3.8.3
matplotlib_inline   0.1.6
mizani              0.11.0
mpl_toolkits        NA
mudata              0.2.3
muon                0.1.6
natsort             8.4.0
networkx            3.2.1
numba               0.59.0
numpy               1.26.4
packaging           24.0
pandas              2.2.1
parso               0.8.3
patsy           

### Load in the data set

In [4]:
# Use a descriptive name so the built-in input() is not shadowed
input_path = '/home/acirnu/data/ACM_cardiac_leuco/5_Leiden_clustering_and_annotation/ACM_myeloids_clustered_muon_ac240415.raw.h5mu'
mdata = mu.read_h5mu(input_path)
mdata

In [5]:
adata = mdata.mod["rna"]

In [6]:
X_data = adata.X.copy()
X_data_sparse = csr_matrix(X_data)
X_data_df = pd.DataFrame.sparse.from_spmatrix(X_data_sparse, index=adata.obs.index, columns=adata.var.index)
print("Shape of counts DataFrame:", X_data_df.shape)
print(X_data_df)

Shape of counts DataFrame: (34482, 29378)
                       Xkr4  Gm1992  Gm19938  Gm37381  Rp1  Sox17  Gm37587  \
AAACGCTGTTGTGTTG-1-A1     0       0        0        0    0      0        0   
AAACGCTTCTCGCTCA-1-A1     0       0        0        0    0      0        0   
AAAGGTACAGAACATA-1-A1     0       0        0        0    0      0        0   
AAAGTCCAGGGACACT-1-A1     0       0        0        0    0      0        0   
AAAGTCCCAGTAGGAC-1-A1     0       0        0        0    0      0        0   
...                     ...     ...      ...      ...  ...    ...      ...   
TTTGTTGAGGTTAGTA-1-B2     0       0        0        0    0      0        0   
TTTGTTGCAAGCTCTA-1-B2     0       0        0        0    0      0        0   
TTTGTTGGTACAGGTG-1-B2     0       0        0        0    0      0        0   
TTTGTTGTCCCAGGAC-1-B2     0       0        0        0    0      0        0   
TTTGTTGTCCGGGACT-1-B2     0       0        0        0    0      0        0   

                     

In [7]:
adata_raw = adata.copy()

### Normalize count matrix

In [8]:
sc.pp.normalize_total(adata, target_sum = 1e6, exclude_highly_expressed = True)
sc.pp.log1p(adata)

normalizing counts per cell The following highly-expressed genes are not considered during normalization factor computation:
['Mapkapk2', 'Il1b', 'Fabp5', 'Fabp4', 'S100a8', 'S100a9', 'Prdx1', 'Cxcl2', 'Spp1', 'Myl2', 'Actb', 'Igkc', 'Apoe', 'Ftl1', 'Hbb-bt', 'Hbb-bs', 'Camp', 'Ngp', 'Slc16a10', 'Lyz2', 'Hba-a1', 'Hba-a2', 'Ccl8', 'Ccl3', 'Ccl4', 'Ctla2a', 'Cma1', 'Mcpt4', 'Retnla', 'Retnlg', 'Cmss1', 'Gm26917', 'Gm42418', 'Cd74', 'Malat1', 'Fth1', 'Tmsb4x']
    finished (0:00:00)
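
To make the two preprocessing steps concrete, here is a minimal sketch on an invented 2 x 3 count matrix. It only illustrates the arithmetic of scaling each cell to `target_sum` and then applying `log1p`. It deliberately ignores `exclude_highly_expressed=True`, which, according to the log above, leaves the listed genes out of the normalization factor, so it is not a reimplementation of the scanpy call.

```python
import numpy as np

# Toy raw count matrix: 2 cells x 3 genes (values are made up).
counts = np.array([[10.0, 0.0, 90.0],
                   [ 5.0, 5.0,  0.0]])

target_sum = 1e6  # same target as in the cell above (counts per million)

# Scale every cell so that its counts sum to target_sum ...
per_cell_total = counts.sum(axis=1, keepdims=True)
cpm = counts / per_cell_total * target_sum

# ... then apply log(1 + x), which keeps zeros at zero.
log_cpm = np.log1p(cpm)
print(log_cpm.round(3))
```

Zeros stay zero after both steps, so the sparsity of the matrix is preserved.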


In [9]:
X_data = adata.X.copy()
X_data_sparse = csr_matrix(X_data)
X_data_df = pd.DataFrame.sparse.from_spmatrix(X_data_sparse, index=adata.obs.index, columns=adata.var.index)
print("Shape of counts DataFrame:", X_data_df.shape)
print(X_data_df)

Shape of counts DataFrame: (34482, 29378)
                       Xkr4  Gm1992  Gm19938  Gm37381  Rp1  Sox17  Gm37587  \
AAACGCTGTTGTGTTG-1-A1     0       0        0        0    0      0        0   
AAACGCTTCTCGCTCA-1-A1     0       0        0        0    0      0        0   
AAAGGTACAGAACATA-1-A1     0       0        0        0    0      0        0   
AAAGTCCAGGGACACT-1-A1     0       0        0        0    0      0        0   
AAAGTCCCAGTAGGAC-1-A1     0       0        0        0    0      0        0   
...                     ...     ...      ...      ...  ...    ...      ...   
TTTGTTGAGGTTAGTA-1-B2     0       0        0        0    0      0        0   
TTTGTTGCAAGCTCTA-1-B2     0       0        0        0    0      0        0   
TTTGTTGGTACAGGTG-1-B2     0       0        0        0    0      0        0   
TTTGTTGTCCCAGGAC-1-B2     0       0        0        0    0      0        0   
TTTGTTGTCCGGGACT-1-B2     0       0        0        0    0      0        0   

                     

### Run `LIANA` Ligand-Receptor Inference by Sample

Before decomposing cell-cell communication (CCC) patterns across contexts (samples) with `Tensor-cell2cell`, we need to run `LIANA` on each sample separately. `Tensor-cell2cell` uses the per-sample `LIANA` output to build a 4D tensor, which is later decomposed into CCC patterns.

In [10]:
li.mt.rank_aggregate.by_sample(
    adata, 
    groupby= 'classification',
    sample_key= 'sample',
    expr_prop= 0.1, 
    verbose= True, 
    use_raw= False, 
    resource_name= 'mouseconsensus',
    return_all_lrs=True, # return all LR values
    n_perms=None # exclude permutations for speed
    )

Now running: Pkp2_Ctr_MCMV_1:   0%|          | 0/30 [00:00<?, ?it/s]

... as `zero_center=True`, sparse input is densified and may lead to large memory consumption


Now running: Pkp2_Ctr_MCMV_2:   3%|▎         | 1/30 [00:39<19:03, 39.44s/it]
Now running: Pkp2_Ctr_MCMV_3:   7%|▋         | 2/30 [01:29<21:27, 45.97s/it]
Now running: Pkp2_Ctr_MCMV_4:  10%|█         | 3/30 [01:51<15:35, 34.66s/it]
Now running: Pkp2_Ctr_MCMV_5:  13%|█▎        | 4/30 [02:05<11:35, 26.74s/it]
Now running: Pkp2_Ctr_MCMV_6:  17%|█▋        | 5/30 [02:34<11:22, 27.29s/it]
Now running: Pkp2_Ctr_noninf_1:  20%|██        | 6/30 [02:51<09:34, 23.94s/it]
Now running: Pkp2_Ctr_noninf_2:  23%|██▎       | 7/30 [03:13<08:54, 23.22s/it]
Now running: Pkp2_Ctr_noninf_3:  27%|██▋       | 8/30 [03:50<10:08, 27.66s/it]
Now running: Pkp2_Ctr_noninf_4:  30%|███       | 9/30 [04:07<08:30, 24.32s/it]
Now running: Pkp2_HetKO_MCMV_1:  33%|███▎      | 10/30 [04:41<09:07, 27.38s/it]
Now running: Pkp2_HetKO_MCMV_2:  37%|███▋      | 11/30 [05:33<11:02, 34.87s/it]
Now running: Pkp2_HetKO_MCMV_3:  40%|████      | 12/30 [06:38<13:13, 44.09s/it]
Now running: Pkp2_HetKO_MCMV_4:  43%|████▎     | 13/30 [07:33<13:22, 47.21s/it]
Now running: Pkp2_HetKO_MCMV_5:  47%|████▋     | 14/30 [08:42<14:21, 53.83s/it]
Now running: Pkp2_HetKO_MCMV_6:  50%|█████     | 15/30 [09:32<13:10, 52.67s/it]
Now running: Pkp2_HetKO_noninf_1:  53%|█████▎    | 16/30 [10:35<13:03, 55.96s/it]
Now running: Pkp2_HetKO_noninf_2:  57%|█████▋    | 17/30 [11:20<11:24, 52.62s/it]
Now running: Pkp2_HetKO_noninf_3:  60%|██████    | 18/30 [12:15<10:39, 53.32s/it]
Now running: Pkp2_HetKO_noninf_4:  63%|██████▎   | 19/30 [12:40<08:14, 44.92s/it]
Now running: Ttn_Ctr_MCMV_1:  67%|██████▋   | 20/30 [13:33<07:51, 47.13s/it]
Now running: Ttn_Ctr_MCMV_2:  70%|███████   | 21/30 [13:57<06:02, 40.23s/it]
Now running: Ttn_Ctr_MCMV_3:  73%|███████▎  | 22/30 [14:27<04:57, 37.16s/it]
Now running: Ttn_Ctr_noninf_1:  77%|███████▋  | 23/30 [15:05<04:23, 37.60s/it]
Now running: Ttn_Ctr_noninf_2:  80%|████████  | 24/30 [15:40<03:40, 36.77s/it]
Now running: Ttn_HetKO_MCMV_1:  83%|████████▎ | 25/30 [16:21<03:10, 38.00s/it]
Now running: Ttn_HetKO_MCMV_2:  87%|████████▋ | 26/30 [17:18<02:54, 43.59s/it]
Now running: Ttn_HetKO_MCMV_3:  90%|█████████ | 27/30 [17:49<01:59, 39.85s/it]
Now running: Ttn_HetKO_noninf_1:  93%|█████████▎| 28/30 [18:05<01:05, 32.59s/it]
Now running: Ttn_HetKO_noninf_2:  97%|█████████▋| 29/30 [18:15<00:25, 25.84s/it]
Now running: Ttn_HetKO_noninf_2: 100%|██████████| 30/30 [19:00<00:00, 38.02s/it]


In [11]:
adata.uns['liana_res'].sort_values("magnitude_rank").head(60)

| index | sample | source | target | ligand_complex | receptor_complex | lr_means | expr_prod | scaled_weight | lr_logfc | spec_weight | lrscore | magnitude_rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18047878 | Pkp2_HetKO_MCMV_4 | MØ_general_7 | Neutrophils_6 | Apoe | Sorl1 | 9.068289 | 80.198822 | 1.657782 | 3.914356 | 0.005981 | 0.9214 | 4.752047e-12 |
| 39048110 | Ttn_HetKO_MCMV_1 | Neutrophils_6 | Neutrophils_6 | Il1b | Il1r2_Il1rap | 8.947151 | 79.825264 | 2.268436 | 5.892633 | 0.026617 | 0.919997 | 5.189425e-12 |
| 22604911 | Pkp2_HetKO_MCMV_6 | Neutrophils_6 | Neutrophils_6 | Il1b | Il1r2_Il1rap | 8.94261 | 79.948402 | 2.516556 | 5.291747 | 0.034903 | 0.918767 | 5.782592e-12 |
| 15847188 | Pkp2_HetKO_MCMV_3 | Neutrophils_6 | Neutrophils_6 | Il1b | Il1r2_Il1rap | 9.028346 | 81.37429 | 2.256882 | 4.831636 | 0.016583 | 0.912198 | 5.958788e-12 |
| 11420436 | Pkp2_HetKO_MCMV_1 | Neutrophils_6 | Neutrophils_6 | Il1b | Il1r2_Il1rap | 9.210358 | 84.768929 | 2.181037 | 4.959343 | 0.020368 | 0.912264 | 6.500215e-12 |
| 20549311 | Pkp2_HetKO_MCMV_5 | Neutrophils_6 | Neutrophils_6 | Il1b | Il1r2_Il1rap | 9.101649 | 82.581749 | 2.154657 | 4.530683 | 0.01932 | 0.919631 | 7.047306e-12 |
| 29903317 | Pkp2_HetKO_noninf_4 | Neutrophils_6 | Neutrophils_6 | Il1b | Il1r2_Il1rap | 9.063356 | 81.929428 | 2.076731 | 5.015485 | 0.02624 | 0.921231 | 7.831931e-12 |
| 43716660 | Ttn_HetKO_noninf_2 | Neutrophils_6 | Neutrophils_6 | Il1b | Il1r2_Il1rap | 9.114004 | 82.844658 | 2.521034 | 6.255741 | 0.032972 | 0.92332 | 8.117814e-12 |
| 24863261 | Pkp2_HetKO_noninf_1 | Neutrophils_6 | Neutrophils_6 | Il1b | Il1r2_Il1rap | 8.985146 | 80.322868 | 2.05056 | 4.67807 | 0.018244 | 0.912923 | 8.139352e-12 |
| 0 | Pkp2_Ctr_MCMV_1 | Neutrophils_6 | Neutrophils_6 | Il1b | Il1r2_Il1rap | 9.16678 | 83.807648 | 2.503216 | 5.436026 | 0.025397 | 0.915417 | 1.117201e-11 |
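
Because the result table holds one row per sample and interaction, a quick way to see which interactions recur is to count the samples in which each one passes a rank cutoff. The sketch below does this on an invented stand-in for `adata.uns['liana_res']`. The column names follow the table above, while the values and the `threshold` of 0.05 are assumptions chosen for illustration.

```python
import pandas as pd

# Toy stand-in for adata.uns['liana_res']; all values are invented.
liana_res = pd.DataFrame({
    "sample": ["s1", "s1", "s2", "s2", "s3"],
    "source": ["A", "B", "A", "B", "A"],
    "target": ["B", "B", "B", "B", "B"],
    "ligand_complex": ["Il1b", "Apoe", "Il1b", "Apoe", "Il1b"],
    "receptor_complex": ["Il1r2_Il1rap", "Sorl1", "Il1r2_Il1rap", "Sorl1", "Il1r2_Il1rap"],
    "magnitude_rank": [1e-6, 0.2, 5e-5, 0.8, 0.03],
})

threshold = 0.05  # hypothetical cutoff, not taken from the notebook
keys = ["source", "target", "ligand_complex", "receptor_complex"]

# Keep highly ranked rows, then count distinct samples per interaction.
top = liana_res[liana_res["magnitude_rank"] <= threshold]
recurrence = (
    top.groupby(keys)["sample"].nunique()
    .sort_values(ascending=False)
    .rename("n_samples")
    .reset_index()
)
print(recurrence)
```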


### Building a Tensor

Before we can decompose the tensor, we need to build it with LIANA's `to_tensor_c2c` function. It reads the per-sample results that `rank_aggregate.by_sample` stored in `adata.uns['liana_res']` and returns a `cell2cell.tensor.PrebuiltTensor` object, which holds the tensor along with useful utility functions.

In [12]:
tensor = li.multi.to_tensor_c2c(adata,
                                sample_key= "sample",
                                score_key='magnitude_rank', # can be any score from liana
                                how='outer_cells' # how to join the samples
                                )

100%|██████████| 30/30 [23:51<00:00, 47.70s/it]


In [13]:
tensor.tensor.shape

torch.Size([30, 891, 43, 43])
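
The shape reads as contexts x ligand-receptor pairs x sender cells x receiver cells, which here would mean 30 samples, 891 interactions, and 43 cell groups on each side. The following sketch shows, on invented records, how a long-format score table maps onto such a 4D array. It is a plain NumPy illustration of the layout, not the code `to_tensor_c2c` runs. Missing combinations are simply left as NaN here, whereas the real function has its own fill and join options.

```python
import numpy as np

# Invented long-format scores: (sample, interaction, source, target, score).
records = [
    ("s1", "Il1b^Il1r2_Il1rap", "Neutrophils", "Neutrophils", 0.9),
    ("s1", "Apoe^Sorl1",        "Macrophages", "Neutrophils", 0.7),
    ("s2", "Il1b^Il1r2_Il1rap", "Neutrophils", "Neutrophils", 0.8),
]

def index_of(values):
    """Map each distinct label to a position along one tensor axis."""
    return {label: i for i, label in enumerate(sorted(set(values)))}

samples = index_of(r[0] for r in records)
lr_pairs = index_of(r[1] for r in records)
cells = index_of([r[2] for r in records] + [r[3] for r in records])

# Axes: contexts x LR pairs x senders x receivers; NaN marks missing entries.
toy = np.full((len(samples), len(lr_pairs), len(cells), len(cells)), np.nan)
for sample, lr, source, target, score in records:
    toy[samples[sample], lr_pairs[lr], cells[source], cells[target]] = score

print(toy.shape)
```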

In [14]:
# Save the tensor to disk
c2c.io.export_variable_with_pickle(tensor, "/home/acirnu/data/ACM_cardiac_leuco/Cell2cell/Tensor_myeloids_20240426.pkl")

/home/acirnu/data/ACM_cardiac_leuco/Cell2cell/Tensor_myeloids_20240426.pkl  was correctly saved.
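
Saving the tensor is only useful if it can be restored in a later session. The sketch below shows a generic pickle round trip with the standard library on a small stand-in object. The file name and the `payload` dictionary are hypothetical. For the real tensor, cell2cell's `load_variable_with_pickle` should be the matching loader, although this notebook does not call it.

```python
import os
import pickle
import tempfile

# Small stand-in object; the real notebook stores the tensor object instead.
payload = {"shape": (30, 891, 43, 43), "score_key": "magnitude_rank"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_tensor.pkl")  # hypothetical file name

    # Write the object to disk ...
    with open(path, "wb") as handle:
        pickle.dump(payload, handle, protocol=pickle.HIGHEST_PROTOCOL)

    # ... and read it back to confirm nothing was lost.
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(restored == payload)
```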


In [None]:
# Build metadata: map each sample to its condition
context_dict = adata.obs[['sample', 'condition']].drop_duplicates()
context_dict = dict(zip(context_dict['sample'], context_dict['condition']))
context_dict = defaultdict(lambda: 'Unknown', context_dict)

tensor_meta = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor,
                                                  metadata_dicts=[context_dict, None, None, None],
                                                  fill_with_order_elements=True
                                                  )
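
The metadata cell relies on a `defaultdict` so that any sample without a known condition is labelled `'Unknown'` instead of raising a `KeyError`. A minimal sketch of that lookup, using invented sample and condition labels (the real `condition` values are not shown in this notebook):

```python
from collections import defaultdict

# Invented per-cell annotations: (sample, condition) pairs, one per cell.
cell_annotations = [
    ("Pkp2_Ctr_MCMV_1", "Pkp2_Ctr_MCMV"),
    ("Pkp2_Ctr_MCMV_1", "Pkp2_Ctr_MCMV"),
    ("Ttn_HetKO_noninf_2", "Ttn_HetKO_noninf"),
]

# Collapse to one entry per sample, similar to drop_duplicates() + dict(zip(...)).
sample_to_condition = dict(cell_annotations)

# Unknown samples fall back to 'Unknown' instead of raising KeyError.
context_lookup = defaultdict(lambda: "Unknown", sample_to_condition)

print(context_lookup["Pkp2_Ctr_MCMV_1"])
print(context_lookup["not_in_the_data"])
```

Note that looking up a missing key in a `defaultdict` also stores that key with the default value, so the lookup table can grow as it is queried.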

Plot the most relevant interactions, ordered by the `magnitude_rank` computed by `rank_aggregate`.

In [None]:
sc.set_figure_params(dpi =100)
li.pl.dotplot(adata = adata,
              colour='magnitude_rank',
              inverse_colour=True,              
              size='specificity_rank',
              inverse_size=True,
              source_labels=['DOCK4+MØ_11', 'MØ_general_7', 'Neutrophils_6', 'LYVE1+MØ_5'],
              target_labels=['Monocytes_9', 'Neutrophils_6', 'Monocytes_1', 'MØ_general_11', 'Monocytes_4', 'Monocytes_2', 'LYVE1+MØ_8', 'LYVE1+MØ_4', 'LYVE1+MØ_9', 'MØ_general_4'],
              top_n=10,
              orderby='magnitude_rank',
              orderby_ascending=True,
              figure_size=(15, 8),
              cmap= "inferno"
             )

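Similarly, we can treat the ranks provided by robust rank aggregation (RRA) as a probability distribution and use them to filter interactions by how robustly and highly they are ranked across the different methods. The filter below relies on a `specificity_rank` column, which does not appear among the `liana_res` columns shown earlier, so it may need to be computed first.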

In [None]:
my_plot = li.pl.dotplot(adata = adata,
                        colour='magnitude_rank',
                        inverse_colour=True,
                        size='specificity_rank',
                        inverse_size=True,
                        source_labels=['DOCK4+MØ_11', 'MØ_general_7', 'Neutrophils_6', 'LYVE1+MØ_5'],
                        target_labels=['Monocytes_9', 'Neutrophils_6', 'Monocytes_1', 'MØ_general_11', 'Monocytes_4', 'Monocytes_2', 'LYVE1+MØ_8', 'LYVE1+MØ_4', 'LYVE1+MØ_9', 'MØ_general_4'],
                        filter_fun=lambda x: x['specificity_rank'] <= 0.01,
                        figure_size=(20, 25),
                        cmap= "inferno"
                       )
my_plot
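
To see what a `filter_fun` of this form selects, the sketch below applies the same rule to an invented table with pandas. It assumes the function is evaluated row by row and returns a boolean. How `li.pl.dotplot` applies it internally is not shown in this notebook, so treat this only as an illustration of the rule itself.

```python
import pandas as pd

# Invented aggregated ranks; smaller means more consistently top-ranked.
toy_res = pd.DataFrame({
    "ligand_complex": ["Il1b", "Apoe", "Spp1"],
    "receptor_complex": ["Il1r2_Il1rap", "Sorl1", "Cd44"],
    "magnitude_rank": [1e-8, 0.004, 0.3],
    "specificity_rank": [0.002, 0.2, 0.009],
})

# Same rule as in the cell above: keep rows with specificity_rank <= 0.01.
filter_fun = lambda x: x["specificity_rank"] <= 0.01

# Evaluate the rule per row and keep only the rows that pass.
kept = toy_res[toy_res.apply(filter_fun, axis=1)]
print(kept[["ligand_complex", "receptor_complex"]])
```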