# Cell-Cell Interaction Analysis with Xenium Data

This tutorial runs through stLearn CCI analysis on a very large dataset: more than 166,000 single cells with spatially resolved gene expression.

To speed up computation, we group the cells into a grid and perform stLearn CCI analysis on the grid spots, taking into account the proportion of each cell type detected in each spot.

### Environment setup

In [None]:
import matplotlib.pyplot as plt
import stlearn as st
import pathlib

st.settings.set_figure_params(dpi=120)

# Ignore all warnings
import warnings

warnings.filterwarnings("ignore")

### Loading the data
For the purposes of this tutorial, we only run a quick Louvain clustering rather than a detailed clustering analysis. For a more thorough clustering workflow, please follow the stLearn clustering tutorial or the relevant part of the spatial trajectory inference tutorial.

In [None]:
# Setup download directory and get data
st.settings.datasetdir = pathlib.Path.cwd().parent / "data"
library_id = "Xenium_FFPE_Human_Breast_Cancer_Rep1"
data_dir = st.settings.datasetdir / library_id

In [None]:
st.datasets.xenium_sge(library_id=library_id, include_hires_tiff=True)

In [None]:
adata = st.ReadXenium(feature_cell_matrix_file=data_dir / "cell_feature_matrix.h5",
                      cell_summary_file=data_dir / "cells.csv.gz",
                      library_id=library_id,
                      image_path=data_dir / "he_image.ome.tif",
                      scale=1,
                      spot_diameter_fullres=15,
                      alignment_matrix_file=data_dir / "he_imagealignment.csv",
                      experiment_xenium_file=data_dir / "experiment.xenium",
                      )

In [None]:
# QC: keep only genes and cells with at least 10 counts
st.pp.filter_genes(adata, min_counts=10)
st.pp.filter_cells(adata, min_counts=10)
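
Conceptually, the two QC calls above keep only the genes and cells whose total counts reach the threshold. Below is a minimal NumPy sketch on a toy dense matrix, filtering genes first and then cells as in the cell above. The helper name `filter_by_counts` is hypothetical; stLearn's functions operate on the AnnData object directly.

```python
import numpy as np

def filter_by_counts(counts, min_counts=10):
    """Hypothetical helper: keep genes, then cells, with total counts >= min_counts.

    counts: dense array of shape (n_cells, n_genes).
    Returns the filtered matrix plus boolean masks for genes and cells.
    """
    counts = np.asarray(counts)
    # Genes first: column sums over all cells
    gene_mask = counts.sum(axis=0) >= min_counts
    counts = counts[:, gene_mask]
    # Then cells: row sums over the genes that remain
    cell_mask = counts.sum(axis=1) >= min_counts
    return counts[cell_mask], gene_mask, cell_mask

# Toy counts: 3 cells x 3 genes (gene totals are 12, 1 and 13)
toy = np.array([
    [8, 0, 6],
    [4, 1, 6],
    [0, 0, 1],
])
filtered, gene_mask, cell_mask = filter_by_counts(toy, min_counts=10)
print(gene_mask, cell_mask)
print(filtered)
```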

In [None]:
adata.X.toarray()  # Inspect the raw counts as a dense array (memory-heavy for large datasets)

In [None]:
# Store the raw data for later use with PSTS
adata.raw = adata

In [None]:
# Run PCA, neighbors and clustering.
st.em.run_pca(adata, n_comps=50, random_state=0)
st.pp.neighbors(adata, n_neighbors=25, use_rep='X_pca', random_state=0)
st.tl.clustering.louvain(adata, random_state=0)

In [None]:
st.pl.cluster_plot(adata, use_label="louvain", image_alpha=0, size=4, figsize=(10, 10))

## Note on normalisation

We do not apply log1p or any other transformation that shrinks genes into a similar expression range. For calling hotspots, we want genes to remain well separated in expression, because background genes with similar expression levels are selected to detect hotspots.
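
To illustrate why, the sketch below applies a library-size normalisation and then compares a lowly and a highly expressed gene with and without log1p. We assume stLearn's `st.pp.normalize_total` follows Scanpy's default of scaling every cell to the median total count when no target sum is given; the toy counts are made up for illustration.

```python
import numpy as np

def normalize_total(counts):
    """Scale each cell (row) so its total equals the median cell total."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    target = np.median(totals)
    return counts / totals * target

# Toy data: 3 cells x 2 genes; gene 0 is lowly expressed, gene 1 highly expressed
counts = np.array([
    [2, 198],
    [1, 99],
    [4, 396],
])
norm = normalize_total(counts)

# Ratio of mean expression between the high and the low gene
ratio_linear = norm[:, 1].mean() / norm[:, 0].mean()
ratio_log = np.log1p(norm[:, 1]).mean() / np.log1p(norm[:, 0]).mean()
print(f"high/low ratio without log1p: {ratio_linear:.1f}")
print(f"high/low ratio with log1p:    {ratio_log:.1f}")
```

Without log1p the two genes stay roughly two orders of magnitude apart, whereas log1p compresses the gap to a few-fold, which would make genes with very different expression levels look like suitable background genes for each other.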

In [None]:
# Library-size normalisation (no log1p, see the note above)
st.pp.normalize_total(adata)

In [None]:
adata.X.toarray()  # Inspect the normalised values

## Gridding

Next, we grid the cells. The chosen resolution may affect the results: the higher the resolution, the more closely the grid represents the single-cell data, but the longer the computation takes.

To summarise gene expression across the cells in each grid spot, we sum their library-size normalised expression. Summing reflects the fact that a single spot can contain multiple cells.
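
The gridding step can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions (equal-width bins spanning the bounding box of the cell coordinates, integer cell type labels), not the implementation of st.tl.cci.grid, and the function name `grid_cells` is hypothetical. It sums normalised expression per spot, records the proportion of each cell type per spot, picks the dominant type and drops spots that contain no cells.

```python
import numpy as np

def grid_cells(xy, expr, labels, n_row, n_col, n_types):
    """Bin cells into an n_row x n_col grid (hypothetical helper).

    xy: (n_cells, 2) x/y coordinates; expr: (n_cells, n_genes) normalised
    expression; labels: (n_cells,) integer cell type labels in [0, n_types).
    Returns summed expression, cell type proportions and the dominant
    label for every non-empty spot.
    """
    xy = np.asarray(xy, dtype=float)
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    x_edges = np.linspace(xy[:, 0].min(), xy[:, 0].max(), n_col + 1)
    y_edges = np.linspace(xy[:, 1].min(), xy[:, 1].max(), n_row + 1)
    # np.digitize is 1-based; clip so cells on the maximum edge land in the last bin
    col = np.clip(np.digitize(xy[:, 0], x_edges) - 1, 0, n_col - 1)
    row = np.clip(np.digitize(xy[:, 1], y_edges) - 1, 0, n_row - 1)
    spot = row * n_col + col

    n_spots = n_row * n_col
    summed = np.zeros((n_spots, expr.shape[1]))
    np.add.at(summed, spot, expr)                 # sum expression per spot
    type_counts = np.zeros((n_spots, n_types))
    np.add.at(type_counts, (spot, labels), 1)     # count cells of each type per spot

    n_cells = type_counts.sum(axis=1)
    keep = n_cells > 0                            # drop spots with 0 cells
    props = type_counts[keep] / n_cells[keep][:, None]
    dominant = props.argmax(axis=1)
    return summed[keep], props, dominant

# Toy example: 4 cells, 2 genes, 2 cell types, 2 x 2 grid
xy = [[0, 0], [1, 1], [9, 9], [10, 10]]
expr = [[1, 0], [2, 0], [0, 3], [0, 4]]
labels = [0, 1, 1, 1]
summed, props, dominant = grid_cells(xy, expr, labels, n_row=2, n_col=2, n_types=2)
print(summed, props, dominant, sep="\n")
```

Here two of the four grid spots are empty and are dropped, which mirrors why the gridded object below has slightly fewer spots than n_ * n_.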

In [None]:
### Calculating the number of grid spots we will generate
n_ = 125
print(f'{n_} by {n_} has this many spots:\n', n_ * n_)

By passing 'use_label' to the function below, the cell type proportions in each spot are stored as deconvolution information, together with the dominant cell type annotation per spot. This lets us run stLearn CCI with the cell type information.

In [None]:
### Gridding.
grid = st.tl.cci.grid(adata, n_row=n_, n_col=n_, use_label='louvain')
print(grid.shape)  # Slightly fewer spots than calculated above, since spots with 0 cells are filtered out.

### Checking the gridding

Here we compare the gridded data to the original single-cell data to make sure the result is sensible.

We recommend visualising the dominant cell type per spot, in order to gauge whether the tissue structure is adequately maintained after gridding (i.e. to make sure the resolution is not too low).
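
Besides plotting, one rough numeric check is the per-spot "purity": the proportion of the dominant cell type in each spot. If most spots are dominated by a single type, the grid is probably fine enough to preserve tissue structure; a low median purity suggests trying a higher resolution. This is a heuristic of our own, not an stLearn function, and the toy proportions are made up. In this notebook the per-spot proportions are stored in grid.uns['louvain'] (indexed by cell type, as in the plotting loop below), so the commented line at the end shows how it might be applied there.

```python
import numpy as np

def spot_purity(props):
    """Per-spot proportion of the dominant cell type.

    props: (n_spots, n_types) array of cell type proportions (rows sum to 1).
    """
    props = np.asarray(props, dtype=float)
    return props.max(axis=1)

# Toy proportions for 4 spots and 3 cell types
props = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.0, 1.0],
])
purity = spot_purity(props)
print("median purity:", np.median(purity))
print("fraction of spots with purity >= 0.8:", (purity >= 0.8).mean())

# With the notebook objects, something like:
# purity = spot_purity(grid.uns['louvain'].values)
```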

In [None]:
fig, axes = plt.subplots(ncols=2, figsize=(20, 8))
st.pl.cluster_plot(grid, use_label='louvain', size=10, ax=axes[0], show_plot=False)
st.pl.cluster_plot(adata, use_label='louvain', ax=axes[1], show_plot=False)
axes[0].set_title('Grid louvain dominant spots')
axes[1].set_title('Cell louvain labels')
plt.show()

In [None]:
groups = list(grid.obs['louvain'].cat.categories)
for group in groups[0:2]:
    fig, axes = plt.subplots(ncols=3, figsize=(20, 8))
    group_props = grid.uns['louvain'][group].values
    grid.obs['group'] = group_props
    st.pl.feat_plot(grid, feature='group', ax=axes[0], show_plot=False, vmax=1, show_color_bar=False)
    st.pl.cluster_plot(grid, use_label='louvain', list_clusters=[group], ax=axes[1], show_plot=False)
    st.pl.cluster_plot(adata, use_label='louvain', list_clusters=[group], ax=axes[2], show_plot=False)
    axes[0].set_title(f'Grid {group} proportions (max = 1)')
    axes[1].set_title(f'Grid {group} max spots')
    axes[2].set_title(f'Individual cell {group}')
    plt.show()

In [None]:
fig, axes = plt.subplots(ncols=2, figsize=(20, 5))
st.pl.gene_plot(grid, gene_symbols='CXCL12', ax=axes[0], show_color_bar=False, show_plot=False)
st.pl.gene_plot(adata, gene_symbols='CXCL12', ax=axes[1], show_color_bar=False, show_plot=False, vmax=80)
axes[0].set_title('Grid CXCL12 expression')
axes[1].set_title('Cell CXCL12 expression')
plt.show()

## LR Permutation Test

We run the ligand-receptor (LR) permutation test to identify regions of high LR co-expression.
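
As a rough intuition for the test: each spot gets an LR score from ligand and receptor expression in its neighbourhood, and that score is compared against scores from randomly drawn gene pairs whose expression levels are similar to the ligand and receptor (n_pairs below sets how many random pairs are drawn). The sketch is a simplified illustration under our own assumptions (score = mean ligand expression x mean receptor expression over a spot and its neighbours; background genes chosen by closest mean expression). It is not stLearn's exact scoring function, and all helper names are hypothetical.

```python
import numpy as np

def neighbourhood_score(expr_l, expr_r, neighbours):
    """Toy per-spot score: mean ligand x mean receptor over a spot and its neighbours."""
    scores = np.empty(len(neighbours))
    for i, nbrs in enumerate(neighbours):
        idx = [i] + list(nbrs)
        scores[i] = expr_l[idx].mean() * expr_r[idx].mean()
    return scores

def matched_background(expr, gene, n_close, exclude):
    """Indices of the n_close genes whose mean expression is closest to `gene`."""
    means = expr.mean(axis=0)
    order = np.argsort(np.abs(means - means[gene]))
    return [int(g) for g in order if g not in exclude][:n_close]

def lr_permutation_pvals(expr, lig, rec, neighbours, n_pairs=1000, n_close=5, seed=0):
    """Empirical per-spot p-values for one LR pair vs. expression-matched random pairs."""
    rng = np.random.default_rng(seed)
    observed = neighbourhood_score(expr[:, lig], expr[:, rec], neighbours)
    bg_l = matched_background(expr, lig, n_close, exclude={lig, rec})
    bg_r = matched_background(expr, rec, n_close, exclude={lig, rec})
    exceed = np.zeros(expr.shape[0])
    for _ in range(n_pairs):
        l, r = rng.choice(bg_l), rng.choice(bg_r)
        null = neighbourhood_score(expr[:, l], expr[:, r], neighbours)
        exceed += null >= observed
    # Add-one correction so that p-values are never exactly zero
    return (exceed + 1) / (n_pairs + 1)

# Toy data: 6 spots along a line, 12 genes
expr = np.ones((6, 12))
expr[:, 2:] += np.arange(10) * 0.1   # background genes, flat across spots
expr[:, 0] = [5, 5, 0, 0, 0, 0]      # "ligand", high in spots 0-1
expr[:, 1] = [5, 5, 0, 0, 0, 0]      # "receptor", high in spots 0-1
neighbours = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
pvals = lr_permutation_pvals(expr, lig=0, rec=1, neighbours=neighbours, n_pairs=200)
print(pvals)
```

Matching background genes on expression level is what makes the normalisation choice above matter: if expression ranges were compressed, the matched background would be much less discriminating.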

In [None]:
# Loading the LR databases available within stlearn (from NATMI)
lrs = st.tl.cci.load_lrs(['connectomeDB2020_lit'], species='human')
print(len(lrs))

In [None]:
# Running the analysis #
st.tl.cci.run(grid, lrs,
              min_spots=20,  # Filter out LR pairs that have scores in fewer than min_spots spots
              distance=250,  # None defaults to spot+immediate neighbours; distance=0 for within-spot mode
              n_pairs=1000,  # Number of random pairs to generate; low as example, recommend ~10,000
              n_cpus=None,   # Number of CPUs for parallel processing. If None, detects & uses all available.
              )

In [None]:
lr_info = grid.uns['lr_summary']  # A dataframe detailing the LR pairs ranked by number of significant spots.
print(lr_info.shape)
print(lr_info)

In [None]:
# Showing the rankings of the LR from a global and local perspective.
# Ranking based on number of significant hotspots.
st.pl.lr_summary(grid, n_top=500)
st.pl.lr_summary(grid, n_top=50, figsize=(10, 3))

In [None]:
# Optionally, adjust the significance thresholds.
st.tl.cci.adj_pvals(grid, correct_axis='spot',
                    pval_adj_cutoff=0.05, adj_method='fdr_bh')
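
adj_method='fdr_bh' refers to the Benjamini-Hochberg false discovery rate correction. The procedure itself can be sketched in NumPy as below; the helper name `bh_adjust` is ours, and we expect its output to match the 'fdr_bh' method of statsmodels' multipletests, but treat that as an assumption rather than a description of stLearn's internals.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

pvals = np.array([0.01, 0.04, 0.03, 0.20])
adj = bh_adjust(pvals)
significant = adj < 0.05   # analogous to pval_adj_cutoff=0.05
print(adj, significant)
```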

### Downstream visualisations from LR analysis

For more downstream visualisations, please see the stLearn CCI tutorial:

https://stlearn.readthedocs.io/en/latest/tutorials/stLearn-CCI.html

In [None]:
best_lr = grid.uns['lr_summary'].index.values[0]  # Just choosing one of the top from lr_summary

In [None]:
stats = ['lr_scores', '-log10(p_adjs)', 'lr_sig_scores']
fig, axes = plt.subplots(ncols=len(stats), figsize=(12, 6))
for i, stat in enumerate(stats):
    st.pl.lr_result_plot(grid, use_result=stat, use_lr=best_lr, show_color_bar=False, ax=axes[i])
    axes[i].set_title(f'{best_lr} {stat}')

## Predicting Significant Cell-Cell Interactions

For this analysis, we use within-spot mode together with the deconvolution information stored in grid.uns, which was determined automatically by st.tl.cci.grid. The cell type proportions associated with each spot are permuted to build a background, which is used to identify spots in LR hotspots where certain cell types are over-represented.
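
A simplified sketch of this idea, under our own assumptions: a spot "has" a cell type if its proportion exceeds a cutoff (as with cell_prop_cutoff below), for one LR pair we count the hotspot spots in which each pair of cell types co-occurs, and we compare that count against counts obtained after shuffling the proportion rows across spots. The helper names and toy data are hypothetical; this is not stLearn's run_cci implementation.

```python
import numpy as np

def celltype_pair_counts(props, hotspot, cutoff=0.1):
    """Count hotspot spots containing each (type_a, type_b) pair.

    props: (n_spots, n_types) cell type proportions.
    hotspot: (n_spots,) boolean mask of significant LR spots.
    """
    present = (props > cutoff) & hotspot[:, None]   # (n_spots, n_types)
    present = present.astype(int)
    return present.T @ present                      # (n_types, n_types)

def permute_celltype_pvals(props, hotspot, cutoff=0.1, n_perms=100, seed=0):
    """Empirical p-values from shuffling cell type proportions across spots."""
    rng = np.random.default_rng(seed)
    observed = celltype_pair_counts(props, hotspot, cutoff)
    exceed = np.zeros_like(observed)
    for _ in range(n_perms):
        shuffled = props[rng.permutation(len(props))]
        exceed += celltype_pair_counts(shuffled, hotspot, cutoff) >= observed
    return observed, (exceed + 1) / (n_perms + 1)

# Toy data: 10 spots, 2 cell types; type 0 sits exactly in the 3 hotspot spots
props = np.zeros((10, 2))
props[:3, 0] = 1.0
props[3:, 1] = 1.0
hotspot = np.zeros(10, dtype=bool)
hotspot[:3] = True
observed, pvals = permute_celltype_pvals(props, hotspot, cutoff=0.1, n_perms=100)
print(observed)
print(pvals)
```

Because the shuffle keeps each spot's full proportion vector intact, frequent cell types stay frequent in the background, which is how this kind of permutation controls for cell type frequency.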

In [None]:
grid

In [None]:
st.tl.cci.run_cci(grid, 'louvain',  # Spot cell information either in data.obs or data.uns
                  min_spots=2,  # Minimum number of spots for LR to be tested.
                  spot_mixtures=True,  # If True will use the deconvolution data,
                  # so spots can have multiple cell types if score>cell_prop_cutoff
                  cell_prop_cutoff=0.1,  # Spot considered to have cell type if score>0.1
                  sig_spots=True,  # Only consider neighbourhoods of spots which had significant LR scores.
                  n_perms=100,  # Permutations of cell information to get background, recommend ~1000
                  n_cpus=None,
                  )

## Diagnostic: checking for interaction and cell type frequency correlation

There should be little to no correlation, indicating that the permutation has adequately controlled for cell type frequency.
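
Our reading of this diagnostic is a correlation, across cell types, between how many interactions each type is involved in and how frequent that type is. A minimal sketch with made-up numbers (st.pl.cci_check produces the corresponding plot for the real data):

```python
import numpy as np

# Made-up per-cell-type summaries, for illustration only
celltype_freq = np.array([0.40, 0.25, 0.20, 0.10, 0.05])   # fraction of cells per type
n_interactions = np.array([12, 30, 8, 25, 15])              # interactions involving each type

r = np.corrcoef(celltype_freq, n_interactions)[0, 1]
print(f"Pearson r = {r:.2f}")
```

A value near zero, as here, is the pattern we hope to see; a strong positive correlation would suggest that abundant cell types are being called as interacting simply because they are abundant.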

In [None]:
st.pl.cci_check(grid, 'louvain', figsize=(16, 5))

## CCI Visualisations

### CCI Network Plot

In [None]:
# Visualising the no. of interactions between cell types across all LR pairs #
pos_1 = st.pl.ccinet_plot(grid, 'louvain', return_pos=True, min_counts=30)

# Just examining the cell type interactions between selected pairs #
lrs = grid.uns['lr_summary'].index.values[0:3]
for best_lr in lrs[0:2]:
    st.pl.ccinet_plot(grid, 'louvain', best_lr, min_counts=2,
                      figsize=(10, 7.5), pos=pos_1,
                      )
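
The overall network above summarises, for each pair of cell types, the number of interactions across all LR pairs. A minimal sketch of that aggregation on hypothetical sender x receiver count matrices, assuming that min_counts hides edges with fewer interactions than the threshold (our own illustration; the LR names are placeholders and this is not how ccinet_plot is implemented internally):

```python
import numpy as np

# Hypothetical sender x receiver interaction counts for two LR pairs and three cell types
per_lr_counts = {
    "LIG1_REC1": np.array([[0, 20, 1], [5, 0, 0], [0, 12, 3]]),
    "LIG2_REC2": np.array([[2, 15, 0], [0, 1, 40], [0, 4, 0]]),
}

def aggregate_edges(per_lr_counts, min_counts=30):
    """Sum counts over LR pairs; return (sender, receiver, count) edges >= min_counts."""
    total = sum(per_lr_counts.values())
    senders, receivers = np.nonzero(total >= min_counts)
    return [(int(s), int(r), int(total[s, r])) for s, r in zip(senders, receivers)]

edges = aggregate_edges(per_lr_counts, min_counts=30)
print(edges)
```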

### CCI Chord Plot

In [None]:
st.pl.lr_chord_plot(grid, 'louvain')

for lr in lrs[0:2]:
    st.pl.lr_chord_plot(grid, 'louvain', lr)

### For additional visualisations and visualisation tips, please see:

https://stlearn.readthedocs.io/en/latest/tutorials/stLearn-CCI.html

<b>Tutorial by [Brad Balderson](https://github.com/BradBalderson)</b>

### Acknowledgement
We would like to thank Soo Hee Lee and the 10X Genomics team for their support, contributions and feedback on this tutorial.