### Notebook for Intercellular Context Factorization using `LIANA` and `Tensor-Cell2cell`

#### Environment: LIANA

- **Developed by:** Alexandra Cirnu
- **Modified by:** Alexandra Cirnu
- **Würzburg Institute for Systems Immunology & Julius-Maximilian-Universität Würzburg**
- **Date of creation:** 240426
- **Date of modification:** 240426

`LIANA` works with log1p-transformed counts and uses **all genes** (with enough counts).

### Load in required modules

In [None]:
import cell2cell as c2c
import liana as li

import pandas as pd
import decoupler as dc # needed for pathway enrichment
import scanpy as sc
import numpy as np

import plotnine as p9
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import muon as mu
from muon import atac as ac
from muon import prot as pt
from scipy.sparse import csr_matrix

import warnings
warnings.filterwarnings('ignore')
from collections import defaultdict

In [None]:
# NOTE: to use CPU instead of GPU, set use_gpu = False
use_gpu = True

if use_gpu:
    import torch
    import tensorly as tl

    device = "cuda:1" if torch.cuda.is_available() else "cpu"
    if device == "cuda:1":
        tl.set_backend('pytorch')
else:
    device = "cpu"

device

In [None]:
sc.settings.verbosity = 3             # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_versions()

sc.settings.set_figure_params(dpi = 300, color_map = 'RdPu', dpi_save = 300, vector_friendly = True, format = 'svg')

### Load in the data set

In [None]:
input_folder = '/home/acirnu/data/ACM_cardiac_leuco/5_Leiden_clustering_and_annotation/'
output_folder = '/home/acirnu/data/ACM_cardiac_leuco/Cell2cell/'

In [None]:
input_file = input_folder + 'ACM_myeloids_clustered_muon_ac240415.raw.h5mu'  # named input_file to avoid shadowing the built-in input()
mdata = mu.read_h5mu(input_file)
mdata

In [None]:
adata = mdata.mod["rna"]

In [None]:
X_data = adata.X.copy()
X_data_sparse = csr_matrix(X_data)
X_data_df = pd.DataFrame.sparse.from_spmatrix(X_data_sparse, index=adata.obs.index, columns=adata.var.index)
print("Shape of counts DataFrame:", X_data_df.shape)
print(X_data_df)

In [None]:
adata_raw = adata.copy()

### Normalize count matrix

In [None]:
sc.pp.normalize_total(adata, target_sum = 1e6, exclude_highly_expressed = True)
sc.pp.log1p(adata)
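
As a rough illustration of what the two calls above do, the following NumPy sketch scales each cell to a fixed total and then applies `log1p`. It is a simplified stand-in rather than Scanpy's implementation: the toy matrix `counts` is made up, and the `exclude_highly_expressed` logic is ignored.

```python
import numpy as np

# Toy count matrix (hypothetical values): 3 cells x 4 genes
counts = np.array([[10, 0, 5, 5],
                   [ 1, 1, 1, 1],
                   [ 0, 8, 0, 2]], dtype=float)

target_sum = 1e6  # counts per million, as in the cell above

# Scale every cell (row) so that its counts add up to target_sum
per_cell_total = counts.sum(axis=1, keepdims=True)
cpm = counts / per_cell_total * target_sum

# Natural-log transform with a pseudocount of 1
log_cpm = np.log1p(cpm)

print(cpm.sum(axis=1))
print(log_cpm.round(2))
```

Zero counts stay at zero after the transformation, which keeps the matrix sparse.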

In [None]:
X_data = adata.X.copy()
X_data_sparse = csr_matrix(X_data)
X_data_df = pd.DataFrame.sparse.from_spmatrix(X_data_sparse, index=adata.obs.index, columns=adata.var.index)
print("Shape of counts DataFrame:", X_data_df.shape)
print(X_data_df)

### Run `LIANA` Ligand-Receptor Inference by Sample

Before we decompose the CCC patterns across contexts/samples with Tensor-cell2cell, we need to run LIANA on each sample. Tensor-cell2cell uses LIANA's per-sample output to build a 4D tensor, which is later decomposed into CCC patterns.

In [None]:
li.mt.rank_aggregate.by_sample(
    adata,
    resource_name= 'mouseconsensus',
    groupby= 'classification',
    sample_key= 'sample', 
    use_raw= False,
    verbose= True, # use 'full' to show all verbose information
    n_perms= None, # exclude permutations for speed
    return_all_lrs= True, # return all LR values
    )

In [None]:
adata.uns['liana_res'].sort_values("magnitude_rank").head(30)

**My cmap parameter is ignored**

In [None]:
plot = li.pl.dotplot_by_sample(
    adata=adata,
    colour='magnitude_rank',
    size='lrscore',
    source_labels=["MØ_general_4", "LYVE1+MØ_11", "Monocytes_10", "DOCK4+MØ_7", "Monocytes_3"],
    target_labels=["DC_14", "LYVE1+MØ_9", "DC_12", "MØ_general_1", "MØ_general_8"],
    ligand_complex=['Apoe', 'App'],
    receptor_complex=['Cd74'],
    sample_key='sample',
    inverse_colour=True,
    inverse_size=False,
    figure_size=(25, 10),
    size_range=(1, 6),
    cmap="magma"
)

plot.save(output_folder + '/Dotplot-by-sample.pdf', height=9, width=9)
plot


In [None]:
adata.uns['liana_res'].to_csv(output_folder + 'LIANA_by_sample_20240429.csv', index=False)

In [None]:
adata.write_h5ad(output_folder + 'adata_with_lr_interactions_20240429.h5ad')

### Building a Tensor

Before we can decompose the tensor, we need to build it. To do so, we will use the to_tensor_c2c function from liana. This function takes as input the pandas.DataFrame with the results from liana.by_sample, and returns a cell2cell.tensor.PrebuiltTensor object. This object contains the tensor, as well as other useful utility functions.

#### Reorder the samples, as this is later on needed for the tensor

In [None]:
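# Collect all sample names from the AnnData object (useful for checking the manual list below)
sorted_samples = sorted(adata.obs['sample'].unique())
# Manually define the sample order used for the tensor contexts; this overrides the alphabetical list above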
sorted_samples = [  'Pkp2_Ctr_noninf_1', 'Pkp2_Ctr_noninf_2', 'Pkp2_Ctr_noninf_3', 'Pkp2_Ctr_noninf_4',
                    'Pkp2_HetKO_noninf_1', 'Pkp2_HetKO_noninf_2', 'Pkp2_HetKO_noninf_3', 'Pkp2_HetKO_noninf_4',
                    'Pkp2_Ctr_MCMV_1', 'Pkp2_Ctr_MCMV_2', 'Pkp2_Ctr_MCMV_3', 'Pkp2_Ctr_MCMV_4', 'Pkp2_Ctr_MCMV_5', 'Pkp2_Ctr_MCMV_6',
                    'Pkp2_HetKO_MCMV_1', 'Pkp2_HetKO_MCMV_2', 'Pkp2_HetKO_MCMV_3', 'Pkp2_HetKO_MCMV_4', 'Pkp2_HetKO_MCMV_5', 'Pkp2_HetKO_MCMV_6',
                    'Ttn_Ctr_noninf_1', 'Ttn_Ctr_noninf_2',
                    'Ttn_HetKO_noninf_1', 'Ttn_HetKO_noninf_2',
                    'Ttn_Ctr_MCMV_1', 'Ttn_Ctr_MCMV_2', 'Ttn_Ctr_MCMV_3',
                    'Ttn_HetKO_MCMV_1', 'Ttn_HetKO_MCMV_2', 'Ttn_HetKO_MCMV_3']

In [None]:
# Convert the 'sample' column to a categorical type with the order specified in sorted_samples
adata.obs['sample'] = pd.Categorical(adata.obs['sample'], categories=sorted_samples, ordered=True)

# Reorder the whole AnnData object by 'sample'; sorting only adata.obs would misalign the metadata with the expression matrix
adata = adata[adata.obs.sort_values('sample').index].copy()

In [None]:
adata.obs.head(15)

Pass the communication scores from LIANA to build a 3D tensor for each sample, then concatenate these to obtain the 4D tensor.

In [None]:
tensor = li.multi.to_tensor_c2c(liana_res=adata.uns['liana_res'], # LIANA's dataframe containing results
                                sample_key='sample', # Column name of the samples
                                source_key='source', # Column name of the sender cells
                                target_key='target', # Column name of the receiver cells
                                ligand_key='ligand_complex', # Column name of the ligands
                                receptor_key='receptor_complex', # Column name of the receptors
                                score_key='magnitude_rank', # Column name of the communication scores to use
                                inverse_fun=lambda x: 1 - x, # Transformation function
                                how='outer', # What to include across all samples. 'outer_cells' would keep only LR pairs present in all samples; since LR pairs may differ between Ttn and Pkp2, all LR pairs are kept instead
                                outer_fraction=1/3., # Fraction of samples as threshold to include cells and LR pairs.
                                context_order=sorted_samples, # Order to store the contexts in the tensor
                               )

In [None]:
tensor.tensor.shape
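
To make the structure of the tensor more concrete, here is a minimal NumPy sketch that fills a 4D array (contexts x LR pairs x senders x receivers) from a long-format table. The records, sample names, and cell types are hypothetical, and this is not how `to_tensor_c2c` is implemented; it only shows the idea, including the `1 - x` inversion of `magnitude_rank`.

```python
import numpy as np

# Hypothetical long-format results: (sample, source, target, LR pair, magnitude_rank)
records = [
    ("s1", "A", "B", "L1^R1", 0.10),
    ("s1", "B", "A", "L1^R1", 0.80),
    ("s2", "A", "B", "L1^R1", 0.30),
    ("s2", "A", "B", "L2^R2", 0.05),
]

samples = ["s1", "s2"]  # context order, fixed by the user
lr_names = sorted({r[3] for r in records})
cells = sorted({r[1] for r in records} | {r[2] for r in records})

# Dimensions: contexts x LR pairs x senders x receivers; NaN marks missing values
tensor4d = np.full((len(samples), len(lr_names), len(cells), len(cells)), np.nan)

for sample, source, target, lr, rank in records:
    idx = (samples.index(sample), lr_names.index(lr),
           cells.index(source), cells.index(target))
    tensor4d[idx] = 1 - rank  # same transformation as inverse_fun above

print(tensor4d.shape)
```

Combinations that were never scored stay `NaN` in this sketch; how missing values are handled in the real tensor depends on the arguments passed to `to_tensor_c2c`.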

#### Create metadata

In [None]:
context_dict = adata.obs.sort_values(by='sample') \
                        .set_index('sample')['condition'] \
                        .to_dict()

In [None]:
dimensions_dict = [context_dict, None, None, None]
meta_tensor = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor,
                                                  metadata_dicts=dimensions_dict,
                                                  fill_with_order_elements=True
                                                 )

##### Export the tensor and its metadata

In [None]:
c2c.io.export_variable_with_pickle(variable=tensor, filename=output_folder + 'Tensor_20240429.pkl')
c2c.io.export_variable_with_pickle(variable=meta_tensor, filename=output_folder + '/Tensor-Metadata_20240429.pkl')

##### Load the saved tensor with metadata

In [None]:
tensor = c2c.io.read_data.load_tensor(output_folder + 'Tensor_20240429.pkl')
meta_tensor = c2c.io.load_variable_with_pickle(output_folder + '/Tensor-Metadata_20240429.pkl')

### Run Tensor-cell2cell Factorization

In [None]:
%%time
tensor2 = c2c.analysis.run_tensor_cell2cell_pipeline(tensor,
                                                    meta_tensor,
                                                    copy_tensor=True, # Whether to output a new tensor or modify the original
                                                    rank= None, # Number of factors to perform the factorization. If None, it is automatically determined by an elbow analysis.
                                                    tf_optimization='regular', # To define how robust we want the analysis to be.
                                                    random_state=0, # Random seed for reproducibility
                                                    device=device, # Device to use. If using GPU and PyTorch, use 'cuda'. For CPU use 'cpu'
                                                    #elbow_metric='error', # Metric to use in the elbow analysis.
                                                    #smooth_elbow=False, # Whether smoothing the metric of the elbow analysis.
                                                    #upper_rank=30, # Max number of factors to try in the elbow analysis
                                                    #tf_init='random', # Initialization method of the tensor factorization
                                                    #tf_svd='numpy_svd', # Type of SVD to use if the initialization is 'svd'
                                                    #cmaps=None, # Color palettes to use in color each of the dimensions. Must be a list of palettes.
                                                    #sample_col='Element', # Columns containing the elements in the tensor metadata
                                                    #group_col='Category', # Columns containing the major groups in the tensor metadata
                                                    output_fig=True, # Whether to output the figures. If False, figures won't be saved as files even if a folder was passed in output_folder.
                                                    output_folder= output_folder
                                                    )

Export Tensor and its metadata

In [None]:
meta_tensor2 = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor2,
                                                  metadata_dicts=dimensions_dict,
                                                  fill_with_order_elements=True
                                                 )

In [None]:
c2c.io.export_variable_with_pickle(variable=tensor2, filename=output_folder + 'Tensor_Factorized_20240429.pkl')
c2c.io.export_variable_with_pickle(variable=meta_tensor2, filename=output_folder + '/Tensor_Factorized-Metadata_20240429.pkl')

Load Tensor and its metadata

In [None]:
tensor2 = c2c.io.read_data.load_tensor(output_folder + 'Tensor_Factorized_20240429.pkl')
meta_tensor2 = c2c.io.load_variable_with_pickle(output_folder + '/Tensor_Factorized-Metadata_20240429.pkl')

In [None]:
tensor2.factors.keys()

In [None]:
tensor2.factors['Contexts'].head(10)
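
The loading tables in `tensor2.factors` can be read as the building blocks of a CP-style (canonical polyadic) approximation of the tensor: each factor contributes the outer product of one loading vector per dimension. The sketch below illustrates this with random, hypothetical loading matrices; it is not Tensor-cell2cell's own code.

```python
import numpy as np

rng = np.random.default_rng(0)
rank = 3  # number of factors (hypothetical)

# Non-negative loading matrices, one per tensor dimension (rows = elements, columns = factors)
contexts = rng.random((4, rank))
lr_loads = rng.random((6, rank))
senders = rng.random((5, rank))
receivers = rng.random((5, rank))

# Sum over factors of the outer product of the four loading vectors
reconstructed = np.einsum('cr,lr,sr,tr->clst', contexts, lr_loads, senders, receivers)

# The same result, built factor by factor
manual = np.zeros_like(reconstructed)
for r in range(rank):
    manual += np.multiply.outer(
        np.multiply.outer(contexts[:, r], lr_loads[:, r]),
        np.multiply.outer(senders[:, r], receivers[:, r]),
    )

print(reconstructed.shape)
```

A high loading in every dimension of a factor therefore marks the samples, LR pairs, senders, and receivers that jointly drive that communication pattern.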

##### Compare pairs within conditions with **boxplots** and statistical tests

In [None]:
groups_order = ['Pkp2_Ctr_noninf', 'Pkp2_HetKO_noninf', 'Pkp2_Ctr_MCMV', 'Pkp2_HetKO_MCMV','Ttn_Ctr_noninf', 'Ttn_HetKO_noninf', 'Ttn_Ctr_MCMV', 'Ttn_HetKO_MCMV' ]
fig_filename = output_folder + '/Conditions_Boxplots_20240429.pdf'

_ = c2c.plotting.context_boxplot(context_loadings=tensor2.factors['Contexts'],
                                 metadict=context_dict,
                                 nrows=3,
                                 figsize=(5, 10),
                                 group_order=groups_order,
                                 statistical_test='Kruskal', #'t-test_ind', 't-test_welch', 't-test_paired', 'Mann-Whitney', 'Mann-Whitney-gt', 'Mann-Whitney-ls', 'Levene', 'Wilcoxon', 'Kruskal'
                                 pval_correction='bonferroni', #'bonferroni', 'bonf', 'Bonferroni', 'holm-bonferroni', 'HB', 'Holm-Bonferroni', 'holm', 'benjamini-hochberg', 'BH', 'fdr_bh', 'Benjamini-Hochberg', 'fdr_by', 'Benjamini-Yekutieli', 'BY', None
                                 cmap='tab20',
                                 filename=fig_filename, # save the figure to the path defined above
                                 verbose=True
                                )
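
For reference, the Bonferroni correction selected above can be sketched in a few lines. The p-values are hypothetical, and how `context_boxplot` groups the tests before correcting them is not shown here.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    n_tests = len(pvalues)
    return [min(p * n_tests, 1.0) for p in pvalues]

raw_pvalues = [0.004, 0.03, 0.2, 0.6]  # hypothetical p-values
adjusted = bonferroni(raw_pvalues)
print(adjusted)
```

Bonferroni is conservative; with many comparisons, one of the less strict options listed in the comment above (for example Benjamini-Hochberg) may be preferable.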

##### Heatmaps for the LR pairs with loadings above a certain threshold

In [None]:
fig_filename = output_folder + '/Clustermap_LRs_20240429.pdf'

_ = c2c.plotting.loading_clustermap(loadings=tensor2.factors['Ligand-Receptor Pairs'],
                                    loading_threshold=0.1,
                                    use_zscore=False,
                                    figsize=(10, 3),
                                    filename=fig_filename,
                                    row_cluster=False,
                                    tick_fontsize=12,
                                    dendrogram_ratio=0.15,
                                   )

### Overall CCI potential

Define a threshold to indicate which pairs of cells are interacting. To do so, we need all outer products between the loadings of the sender and receiver cell dimensions across all factors.

In [None]:
# Get all outer products as adjacency matrices, one per factor
networks = c2c.analysis.tensor_downstream.get_factor_specific_ccc_networks(tensor2.factors,
                                                                           sender_label='Sender Cells',
                                                                           receiver_label='Receiver Cells',
                                                                           )

In [None]:
# Then, flatten the adjacency matrices
network_by_factors = c2c.analysis.tensor_downstream.flatten_factor_ccc_networks(networks, orderby='receivers')

# And we can plot the distributions of the weights for each factor-specific network
_ = plt.hist(network_by_factors.values.flatten(), bins = 50)

Chosen threshold = 0.042

In [None]:
threshold = 0.042
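
The idea behind the threshold can be illustrated for a single factor: the outer product of the sender and receiver loadings gives one weight per sender-receiver pair, and only pairs above the threshold are kept. The loadings and cell type names below are hypothetical, and the use of a strict `>` comparison is an assumption.

```python
import numpy as np

cell_types = ["A", "B", "C"]                     # hypothetical cell types
sender_loadings = np.array([0.9, 0.1, 0.4])      # one factor, sender dimension
receiver_loadings = np.array([0.2, 0.8, 0.05])   # same factor, receiver dimension

# Rows = senders, columns = receivers
adjacency = np.outer(sender_loadings, receiver_loadings)

threshold = 0.042
for i, j in zip(*np.where(adjacency > threshold)):
    print(f"{cell_types[i]} -> {cell_types[j]}: {adjacency[i, j]:.3f}")
```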

##### Heatmap of sender-receiver cell pairs

Evaluate the overall interactions between sender-receiver cell pairs that are predominant in a given factor/interaction program.

X-axis = receiver cells, Y-axis = sender cells

In [None]:
selected_factor = 'Factor 3'

In [None]:
loading_product = c2c.analysis.tensor_downstream.get_joint_loadings(tensor2.factors,
                                                                    dim1='Sender Cells',
                                                                    dim2='Receiver Cells',
                                                                    factor=selected_factor,
                                                                   )

In [None]:
lprod_cm = c2c.plotting.loading_clustermap(loading_product.T, # Remove .T to transpose the axes
                                           use_zscore=False, # Whether standardizing the loadings across factors
                                           figsize=(8, 8),
                                           filename=output_folder + '/Clustermap-CC-Pairs_20240429.pdf',
                                           cbar_label='Loading Product',
                                          )

##### Interaction network of sender-receiver cell pairs

In [None]:
c2c.plotting.ccc_networks_plot(tensor2.factors,
                               included_factors= None,
                               ccc_threshold=threshold, # Only important communication
                               nrows=1,
                               panel_size=(12,12), # This changes the size of each figure panel.
                               node_label_size=30,
                               filename=output_folder + '/Factor-Networks_20240429.pdf',
                              )

### Pathway Enrichment Analysis: Interpreting the context-driven communication

##### Classical Pathway Enrichment with `KEGG Pathways`

In [None]:
lr_loadings = tensor2.factors['Ligand-Receptor Pairs']

lr_pairs = li.resource.select_resource('mouseconsensus')

# Generate list with ligand-receptors pairs in DB
lr_list = ['^'.join(row) for idx, row in lr_pairs.iterrows()]

# Specify the organism and pathway database to use for building the LR set
organism = "mouse"
pathwaydb = "KEGG"

# Generate ligand-receptor gene sets
lr_set = c2c.external.generate_lr_geneset(lr_list,
                                          complex_sep='_',
                                          lr_sep='^',
                                          organism=organism,
                                          pathwaydb=pathwaydb,
                                          readable_name=True,
                                          output_folder=output_folder
                                         )

In [None]:
pvals, scores, gsea_df = c2c.external.run_gsea(loadings=lr_loadings,
                                               lr_set=lr_set,
                                               output_folder=output_folder,
                                               weight=1,
                                               min_size=15,
                                               permutations=999,
                                               processes=6,
                                               random_state=6,
                                               significance_threshold=0.05,
                                              )
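
As a conceptual aid, the running-sum statistic behind pre-ranked GSEA can be sketched as follows. This is a simplified illustration under the assumption that `run_gsea` follows the standard pre-ranked procedure; it omits the permutation step that yields NES and p-values, and the LR pairs and loadings are hypothetical. The set is assumed to be a non-empty, proper subset of the ranked items.

```python
def enrichment_score(ranked_items, scores, item_set, weight=1.0):
    """Signed maximum deviation of the weighted running sum from zero."""
    hits = [item in item_set for item in ranked_items]
    hit_total = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    n_miss = len(ranked_items) - sum(hits)
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up at members of the set, step down otherwise
        running += abs(s) ** weight / hit_total if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical LR pairs, already sorted by decreasing loading
pairs = ["a^x", "b^y", "c^z", "d^w"]
loadings = [4.0, 3.0, 2.0, 1.0]

es_top = enrichment_score(pairs, loadings, {"a^x", "b^y"})
es_bottom = enrichment_score(pairs, loadings, {"c^z", "d^w"})
print(es_top, es_bottom)
```

A set concentrated at the top of the ranking gets a positive score and a set concentrated at the bottom a negative one, which corresponds to the enriched and depleted pathways distinguished by the sign of NES.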

The enriched pathways are:

In [None]:
gsea_df.loc[(gsea_df['Adj. P-value'] <= 0.05) & (gsea_df['NES'] > 0.)]

The depleted pathways are:

In [None]:
gsea_df.loc[(gsea_df['Adj. P-value'] <= 0.05) & (gsea_df['NES'] < 0.)]

In [None]:
pathway_label = '{} Annotations'.format(pathwaydb)
fig_filename = output_folder + '/GSEA-Dotplot_20240429.pdf'

with sns.axes_style("darkgrid"):
    dotplot = c2c.plotting.pval_plot.generate_dot_plot(pval_df=pvals,
                                                      score_df=scores,
                                                      significance=0.05,
                                                      xlabel='',
                                                      ylabel=pathway_label,
                                                      cbar_title='NES',
                                                      cmap='PuOr',
                                                      figsize=(8,12),
                                                      label_size=24,
                                                      title_size=24,
                                                      tick_size=20,
                                                      filename=fig_filename
                                                      )

#### Footprint enrichment

Footprint enrichment analysis builds upon classic gene set enrichment analysis: instead of considering the genes involved in a biological activity, it considers the genes affected by that activity, in other words the genes that change downstream of it.

In [None]:
# We first load the PROGENy gene sets (mouse, to match the 'mouseconsensus' resource used above)
net = dc.get_progeny(organism='mouse', top=5000)

# Then convert them to sets with weighted ligand-receptor pairs
lr_progeny = li.rs.generate_lr_geneset(lr_pairs, net, lr_sep="^")

In [None]:
estimate, pvals =  dc.run_mlm(lr_loadings.transpose(),
                              lr_progeny,
                              source="source",
                              target="interaction",
                              use_raw=False)
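
The idea of a multivariate linear model can be sketched with ordinary least squares: the loadings of the LR pairs in one factor are regressed on their pathway weights, with all pathways fitted at once. The weights and loadings below are hypothetical, and the sketch only returns raw coefficients; it is not decoupler's implementation, which reports a test statistic per pathway.

```python
import numpy as np

# Hypothetical weights of 5 LR pairs (rows) in 2 pathways (columns)
weights = np.array([[1.0,  0.0],
                    [0.5,  0.0],
                    [0.0,  1.0],
                    [0.0, -1.0],
                    [0.2,  0.3]])

# Loadings of the same 5 LR pairs in one factor, simulated without noise
true_activity = np.array([2.0, -1.0])
loadings = 0.1 + weights @ true_activity

# Design matrix with an intercept column, then ordinary least squares
design = np.column_stack([np.ones(len(weights)), weights])
coef, *_ = np.linalg.lstsq(design, loadings, rcond=None)

print(coef.round(3))  # intercept followed by one coefficient per pathway
```

Because all pathways are fitted jointly, each coefficient reflects the contribution of its pathway after accounting for the others.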

In [None]:

fig_filename = output_folder + '/PROGENy_20240429.pdf'
cg = sns.clustermap(estimate, xticklabels=estimate.columns, cmap='coolwarm', z_score=0) # z_score=0 standardizes rows; seaborn documents only 0 (rows) or 1 (columns)

cg.ax_heatmap.set_xticklabels(cg.ax_heatmap.get_xmajorticklabels(), fontsize = 16)
cg.ax_heatmap.set_yticklabels(cg.ax_heatmap.get_ymajorticklabels(), fontsize = 16, rotation=0)

plt.savefig(fig_filename, dpi=300, bbox_inches='tight')

In [None]:
selected_factor = 'Factor 5'
fig_filename = output_folder + '/PROGENy-{}_20240429.pdf'.format(selected_factor.replace(' ', '-'))

dc.plot_barplot(estimate,
                selected_factor,
                vertical=True,
                cmap='coolwarm',
                save=fig_filename)