### Notebook for inferring cell-cell communication in COPD-IAV data using `LIANA+`

- **Developed by**: Carlos Talavera-López
- **Würzburg Institute for Systems Immunology, Faculty of Medicine, Julius-Maximilians-Universität Würzburg**
- **Created**: 231109
- **Latest version**: 240508

### Import required modules

In [1]:
import anndata
import numpy as np
import liana as li
import pandas as pd
import scanpy as sc

from liana.method import singlecellsignalr, connectome, cellphonedb, natmi, logfc, cellchat, geometric_mean

OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.


### Set up working environment

In [2]:
sc.settings.verbosity = 3
sc.logging.print_versions()
sc.settings.set_figure_params(dpi = 180, color_map = 'magma_r', dpi_save = 300, vector_friendly = True, format = 'svg')

-----
anndata     0.10.7
scanpy      1.10.1
-----
PIL                 9.3.0
appnope             0.1.4
asttokens           NA
comm                0.2.2
cycler              0.12.1
cython_runtime      NA
dateutil            2.9.0.post0
debugpy             1.8.1
decorator           5.1.1
docrep              0.3.2
executing           2.0.1
h5py                3.11.0
ipykernel           6.29.4
ipywidgets          8.1.2
jedi                0.19.1
joblib              1.4.2
kiwisolver          1.4.5
legacy_api_wrap     NA
liana               1.1.0
llvmlite            0.42.0
matplotlib          3.8.4
mizani              0.11.2
mpl_toolkits        NA
mudata              0.2.3
natsort             8.4.0
numba               0.59.1
numpy               1.26.4
packaging           24.0
pandas              2.2.2
parso               0.8.4
patsy               0.5.6
platformdirs        4.2.1
plotnine            0.13.5
prompt_toolkit      3.0.43
psutil              5.9.8
pure_eval           0.2.2
pydev_ipyth

### Read in data

In [3]:
adata_all = sc.read_h5ad('../../../data/Marburg_cell_states_locked_ctl240504.raw.h5ad') 
adata_all

AnnData object with n_obs × n_vars = 97573 × 27208
    obs: 'sex', 'age', 'ethnicity', 'PaCO2', 'donor', 'infection', 'disease', 'SMK', 'illumina_stimunr', 'bd_rhapsody', 'n_genes', 'doublet_scores', 'predicted_doublets', 'batch', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'percent_mt2', 'n_counts', 'percent_chrY', 'XIST-counts', 'S_score', 'G2M_score', 'condition', 'sample_group', 'IAV_score', 'group', 'Viral_score', 'cell_type', 'cell_states', 'leiden', 'cell_compartment', '_scvi_batch', '_scvi_labels', 'C_scANVI', 'viral_counts', 'infected_status', 'seed_labels', 'batch-scANVI'
    var: 'mt', 'ribo'
    uns: 'cell_compartment_colors', 'cell_states_colors', 'disease_colors', 'group_colors', 'infection_colors'
    obsm: 'X_scANVI', 'X_umap'
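
LIANA+ methods expect log-normalized expression in `.X`, and the file name here (`.raw.h5ad`) suggests that the object may hold raw counts. The sketch below is a minimal illustration, on a toy dense matrix, of how one might check for integer counts and apply a counts-per-10k plus `log1p` transform. The helper names `looks_like_raw_counts` and `log_normalize` are hypothetical and not part of scanpy or LIANA; in scanpy the usual equivalents are `sc.pp.normalize_total(adata, target_sum = 1e4)` followed by `sc.pp.log1p(adata)`.

```python
import numpy as np

def looks_like_raw_counts(X, tol=1e-8):
    """Return True when every value is a non-negative integer (typical of raw counts)."""
    X = np.asarray(X, dtype=float)
    return bool(np.all(X >= 0) and np.all(np.abs(X - np.round(X)) < tol))

def log_normalize(X, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log1p."""
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(X / totals * target_sum)

# Toy dense matrix (2 cells x 3 genes); values are made up for illustration.
# For a sparse `.X`, the same check could be applied to its `.data` array.
toy_counts = np.array([[10, 0, 90], [5, 5, 0]])
toy_norm = log_normalize(toy_counts)

print(looks_like_raw_counts(toy_counts))  # raw integer counts
print(looks_like_raw_counts(toy_norm))    # after log-normalization
```

If the check returns `True` for the real matrix, normalizing before calling any LIANA+ method is probably the safer route.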

### Check the methods available in `LIANA+`

In [4]:
li.mt.show_methods()

| Method Name | Magnitude Score | Specificity Score | Reference |
|---|---|---|---|
| CellPhoneDB | lr_means | cellphone_pvals | Efremova, M., Vento-Tormo, M., Teichmann, S.A.... |
| Connectome | expr_prod | scaled_weight | Raredon, M.S.B., Yang, J., Garritano, J., Wang... |
| log2FC | | lr_logfc | Dimitrov, D., Türei, D., Garrido-Rodriguez, M.... |
| NATMI | expr_prod | spec_weight | Hou, R., Denisenko, E., Ong, H.T., Ramilowski,... |
| SingleCellSignalR | lrscore | | Cabello-Aguilar, S., Alame, M., Kon-Sun-Tack, ... |
| Rank_Aggregate | magnitude_rank | specificity_rank | Dimitrov, D., Türei, D., Garrido-Rodriguez, M.... |
| Geometric Mean | lr_gmeans | gmean_pvals | CellPhoneDBv2's permutation approach applied t... |
| scSeqComm | inter_score | | Baruzzo, G., Cesaro, G., Di Camillo, B. 2022. ... |
| CellChat | lr_probs | cellchat_pvals | Jin, S., Guerrero-Juarez, C.F., Zhang, L., Cha... |


### Trial run with `CellPhoneDB`

In [5]:
cellphonedb(adata_all, groupby = 'cell_states', 
            expr_prop = 0.1, 
            resource_name = 'consensus', 
            verbose = True, 
            key_added = 'cpdb_res',
            use_raw = False)
adata_all.uns['cpdb_res'].head()

Using `.X`!
Make sure that normalized counts are passed!
['NC_026431.1', 'NC_026432.1', 'NC_026433.1', 'NC_026434.1', 'NC_026435.1', 'NC_026436.1', 'NC_026437.1', 'NC_026438.1'] contain `_`. Consider replacing those!
Using resource `consensus`.
0.09 of entities in the resource are missing from the data.


Generating ligand-receptor stats for 97573 samples and 1693 features


100%|██████████| 1000/1000 [02:44<00:00,  6.07it/s]


| | ligand | ligand_complex | ligand_means | ligand_props | receptor | receptor_complex | receptor_means | receptor_props | source | target | lr_means | cellphone_pvals |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 961483 | SLPI | SLPI | 1292.280884 | 1.0 | PLSCR1 | PLSCR1 | 36.042686 | 1.0 | mixed_Goblet2 | ifn_Goblet | 664.161804 | 0.0 |
| 734339 | SLPI | SLPI | 1292.280884 | 1.0 | PLSCR1 | PLSCR1 | 27.638779 | 0.998639 | mixed_Goblet2 | SERPINE1+Basal | 659.959839 | 0.0 |
| 324490 | SLPI | SLPI | 1292.280884 | 1.0 | PLSCR1 | PLSCR1 | 24.531784 | 0.976589 | mixed_Goblet2 | KRT16+SupraB | 658.406311 | 0.0 |
| 385296 | SLPI | SLPI | 1292.280884 | 1.0 | PLSCR1 | PLSCR1 | 21.717438 | 0.998129 | mixed_Goblet2 | MHCII+Club | 656.999146 | 0.0 |
| 87863 | SLPI | SLPI | 1292.280884 | 1.0 | PLSCR1 | PLSCR1 | 20.917444 | 0.998349 | mixed_Goblet2 | DHRS9+Club | 656.599182 | 0.0 |
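
The `lr_means` column is CellPhoneDB's magnitude score. In the rows above it matches the arithmetic mean of `ligand_means` and `receptor_means`, which the short check below reproduces using values copied from that output (the small differences are consistent with single-precision rounding). Note also that means above 1000 are far outside the range expected for log-normalized data, which fits the "Make sure that normalized counts are passed!" message.

```python
import math

# (ligand_means, receptor_means, reported lr_means), copied from the output above
rows = [
    (1292.280884, 36.042686, 664.161804),
    (1292.280884, 27.638779, 659.959839),
    (1292.280884, 24.531784, 658.406311),
    (1292.280884, 21.717438, 656.999146),
    (1292.280884, 20.917444, 656.599182),
]

def lr_mean(ligand_mean, receptor_mean):
    """Arithmetic mean of the ligand and receptor group means."""
    return (ligand_mean + receptor_mean) / 2

# Compare the recomputed value with the reported one, allowing for float32 rounding
matches = [
    math.isclose(lr_mean(lig, rec), reported, abs_tol=1e-3)
    for lig, rec, reported in rows
]
print(matches)
```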


In [6]:
adata_all.obs['cell_states'].cat.categories

Index(['APOD+Ciliated', 'CCDC3+Basal1', 'DHRS9+Club', 'FB-like_Basal',
       'IGFBP6+Basal', 'IGFBP+Basal', 'ImmuneClub', 'Ionocyte',
       'KRT14+AQP1+Secretory', 'KRT14+Goblet', 'KRT16+SupraB', 'KRT17+Goblet',
       'MHCII+Club', 'MKI67+pBasal', 'MUC5B+Goblet', 'NOTCH3+SupraB',
       'NOTCH+Basal2', 'OASiav_Ciliated', 'OMG+Ciliated', 'RARRES1+lip_Goblet',
       'S100A2+Basal', 'SCGB1+KRT5-FOXA1+iav_Club', 'SCGB1A1+Deutero',
       'SCGB1A1+Goblet', 'SERPINE1+Basal', 'SERPINE2+Basal', 'TCN1+Club',
       'TNC+Basal', 'iav-lip_Club', 'iavAPC_Epi', 'iav_Goblet', 'ifn_Basal',
       'ifn_Goblet', 'mixed_Goblet1', 'mixed_Goblet2', 'p53_Ciliated'],
      dtype='object')

### Trial run with `CellChat`

In [None]:
cellchat(adata_all, groupby = 'cell_states', 
            expr_prop = 0.1, 
            resource_name = 'consensus', 
            verbose = True, 
            key_added = 'ccdb_res',
            use_raw = False)
adata_all.uns['ccdb_res'].head()

In [None]:
li.pl.dotplot(adata = adata_all,
              colour = 'lr_probs',
              size = 'cellchat_pvals',
              inverse_size = True, 
              source_labels = ['SERPINE1+Basal', 'SERPINE2+Basal', 'iavAPC_Epi', 'MHCII+Club', 'TNC+Basal'],
              target_labels = ['SERPINE1+Basal', 'SERPINE2+Basal', 'iavAPC_Epi', 'MHCII+Club', 'TNC+Basal'],
              figure_size = (25, 50),
              filterby = 'cellchat_pvals',
              filter_lambda = lambda x: x <= 0.05,
              uns_key = 'ccdb_res',
              cmap = 'magma'
             )
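
The `filterby` / `filter_lambda` pair in the dot plot call keeps only interactions with `cellchat_pvals <= 0.05` among the chosen source and target labels. As a rough equivalent on the results table itself, the sketch below applies the same selection with pandas. The toy rows and the helper `select_interactions` are hypothetical; a real table would come from `adata_all.uns['ccdb_res']`.

```python
import pandas as pd

labels = ['SERPINE1+Basal', 'SERPINE2+Basal', 'iavAPC_Epi', 'MHCII+Club', 'TNC+Basal']

# Hypothetical toy rows that mimic the layout of a LIANA results table
toy_res = pd.DataFrame({
    'source': ['SERPINE1+Basal', 'TNC+Basal', 'Ionocyte', 'MHCII+Club'],
    'target': ['MHCII+Club', 'iavAPC_Epi', 'TNC+Basal', 'SERPINE2+Basal'],
    'ligand_complex': ['L1', 'L2', 'L3', 'L4'],
    'receptor_complex': ['R1', 'R2', 'R3', 'R4'],
    'lr_probs': [0.20, 0.05, 0.30, 0.10],
    'cellchat_pvals': [0.01, 0.20, 0.00, 0.03],
})

def select_interactions(res, sources, targets, pval_col, alpha=0.05):
    """Keep rows between the given groups whose p-value is at most `alpha`."""
    keep = (
        res['source'].isin(sources)
        & res['target'].isin(targets)
        & (res[pval_col] <= alpha)
    )
    return res.loc[keep].sort_values(pval_col).reset_index(drop=True)

selected = select_interactions(toy_res, labels, labels, 'cellchat_pvals')
print(selected[['source', 'target', 'ligand_complex', 'cellchat_pvals']])
```

Inspecting the filtered table first can help decide whether a figure as large as `(25, 50)` is actually needed.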

In [None]:
my_plot = li.pl.tileplot(adata = adata_all,
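                         # NOTE: fill & label are column suffixes that must exist
                         # for both ligand_ and receptor_ columns; 'trimean' assumes
                         # CellChat reports ligand_trimean / receptor_trimean
                         # (check adata_all.uns['ccdb_res'].columns before running)
                         fill = 'trimean',
                         label = 'props',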
                         label_fun = lambda x: f'{x:.2f}',
                         top_n = 10,
                         orderby = 'cellchat_pvals',
                         orderby_ascending = True,
                         figure_size = (20, 10),
                         source_labels = ['SERPINE1+Basal', 'SERPINE2+Basal', 'iavAPC_Epi', 'MHCII+Club', 'TNC+Basal'],
                         target_labels = ['SERPINE1+Basal', 'SERPINE2+Basal', 'iavAPC_Epi', 'MHCII+Club', 'TNC+Basal'],
                         uns_key = 'ccdb_res'
                         )
my_plot

### Run the rank-aggregated estimate

In [23]:
li.mt.rank_aggregate(adata_all, groupby = 'cell_states', expr_prop = 0.1, verbose = True, use_raw = False)
adata_all.uns['liana_res'].head()

Using `.X`!
Make sure that normalized counts are passed!
['NC_026431.1', 'NC_026432.1', 'NC_026433.1', 'NC_026434.1', 'NC_026435.1', 'NC_026436.1', 'NC_026437.1', 'NC_026438.1'] contain `_`. Consider replacing those!
Using resource `consensus`.
0.09 of entities in the resource are missing from the data.


Generating ligand-receptor stats for 97573 samples and 1693 features
... as `zero_center=True`, sparse input is densified and may lead to large memory consumption




Assuming that counts were `natural` log-normalized!




Running CellPhoneDB


100%|██████████| 1000/1000 [02:48<00:00,  5.94it/s]
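
Both runs report that the feature names starting with `NC_0264` contain `_` and suggest replacing it. These look like sequence accessions (presumably the viral segments, given the IAV data), and a simple way to silence the message is to swap the underscore for a hyphen before running LIANA+. The sketch below shows the idea on a small pandas `Index`; applying it to `adata_all.var_names` is an assumption about how one would use it here, not something done in this notebook.

```python
import pandas as pd

# Two of the accessions from the LIANA message, plus two ordinary gene symbols
var_names = pd.Index(['NC_026431.1', 'NC_026432.1', 'SLPI', 'PLSCR1'])

# Literal (non-regex) replacement of "_" with "-"
renamed = var_names.str.replace('_', '-', regex=False)

# Renaming must not create duplicate feature names
assert renamed.is_unique

print(list(renamed))

# On the real object this would be (untested here):
# adata_all.var_names = adata_all.var_names.str.replace('_', '-', regex=False)
```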


In [None]:
li.mt.rank_aggregate.describe()
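
`rank_aggregate` combines the scores of the individual methods into the `magnitude_rank` and `specificity_rank` columns listed in the method table. For intuition only, the sketch below ranks three made-up interactions under three magnitude scores and averages the ranks. This plain mean rank is a simplified stand-in and should not be read as LIANA's actual aggregation procedure; all values and interaction names are hypothetical.

```python
import pandas as pd

# Hypothetical magnitude scores for three interactions (higher = stronger)
toy = pd.DataFrame({
    'interaction': ['A->B', 'C->D', 'E->F'],
    'lr_means': [5.0, 3.0, 1.0],
    'expr_prod': [20.0, 30.0, 2.0],
    'lrscore': [0.9, 0.8, 0.1],
}).set_index('interaction')

# Rank each score column separately; rank 1 is the strongest interaction
ranks = toy.rank(ascending=False, method='average')

# Simplified consensus: the mean of the per-method ranks
toy['mean_rank'] = ranks.mean(axis=1)
consensus = toy.sort_values('mean_rank')

print(consensus)
```

An interaction that ranks highly under most methods ends up near the top even when one method disagrees, which is the general motivation for a consensus score.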