## Example integrated analysis of seqFISH data and force inference & morphometrics for mouse E8.5 embryo dorsal region

Import required modules

In [1]:
import sys
sys.path.append('../')

import matplotlib.colors
import numpy as np
import matplotlib.pyplot as plt
import skimage
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale, StandardScaler
import seaborn as sns
import umap
import scanpy as sc
import os

First, import the data: `vmsi_res` contains the force inference and morphometrics results, `gex` contains
the normalised and log-transformed expression values derived from seqFISH, and `neighbours` is a cell-cell adjacency graph
in which non-zero values denote junctions between cells, with each value giving the tension at that junction.

Note: the cells in `vmsi_res` do not correspond exactly to the force inference output from `mouse_dorsal_seqfish_inferece.ipynb`,
as some cells were filtered out to maintain consistency with the seqFISH data.

In [2]:
curr_wd = os.getcwd()

gex = pd.read_csv(f'{curr_wd}/../example_data/mouse_dorsal_seqFISH/gex.csv', index_col=0)
neighbours = pd.read_csv(f'{curr_wd}/../example_data/mouse_dorsal_seqFISH/neighbours.csv', index_col=0)
vmsi_res = pd.read_csv(f'{curr_wd}/../example_data/mouse_dorsal_seqFISH/vmsi_res_merged.csv', index_col=0)
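
Because the three tables are combined cell by cell later on, it can be worth confirming that they describe the same cells in the same order. The sketch below is a minimal check on toy data, not part of the original analysis. It assumes that `gex` is stored as genes x cells (suggested by the later `gex.T`), that `vmsi_res` and `neighbours` are indexed by cell, and that the adjacency matrix is symmetric. The helper `check_alignment`, the cell names, and the feature names are all hypothetical.

```python
import pandas as pd

# Toy stand-ins for the real tables (hypothetical cell, gene and feature names).
cells = ["cell_1", "cell_2", "cell_3"]
gex_toy = pd.DataFrame([[0.1, 1.2, 0.0], [2.3, 0.0, 0.7]],
                       index=["GeneA", "GeneB"], columns=cells)    # genes x cells (assumed layout)
vmsi_toy = pd.DataFrame({"area": [10.0, 12.5, 9.1], "pressure": [1.0, 0.9, 1.1]},
                        index=cells)                               # cells x features
neighbours_toy = pd.DataFrame([[0.0, 0.8, 0.0], [0.8, 0.0, 1.1], [0.0, 1.1, 0.0]],
                              index=cells, columns=cells)           # tension-weighted adjacency


def check_alignment(gex, vmsi_res, neighbours):
    """Return True if all three tables list the same cells in the same order."""
    ref = list(vmsi_res.index.astype(str))
    same_cells = list(gex.columns.astype(str)) == ref
    same_graph = (list(neighbours.index.astype(str)) == ref
                  and list(neighbours.columns.astype(str)) == ref)
    return same_cells and same_graph


aligned = check_alignment(gex_toy, vmsi_toy, neighbours_toy)
# Each junction appears twice in a symmetric adjacency matrix (assumed symmetric here).
n_junctions = int((neighbours_toy.values != 0).sum() // 2)
print(aligned, n_junctions)
```

With the real data, the same helper could be called as `check_alignment(gex, vmsi_res, neighbours)`, provided the layout assumptions above hold.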

#### Integrated clustering with MUSE

Now that we have both force inference/morphometrics and gene expression, we would like to see whether
combining these modalities results in different cell-type annotations compared to annotations from gene
expression alone.

We can do this using MUSE (*Bao et al., 2022*), which combines these modalities into a single joint latent representation
from which clusters can be obtained.

MUSE requires initial cluster labels for both modalities as input to its neural network. We derive these with `phenograph`,
which is comparatively insensitive to its hyperparameter values.

To perform clustering, we first perform PCA on both modalities separately.

In [3]:
gex_norm = StandardScaler().fit_transform(gex.T)
pca = PCA()
gex_pca = pca.fit_transform(gex_norm)

vmsi_res_norm = StandardScaler().fit_transform(vmsi_res.drop(['centroid_x','centroid_y','celltype','celltype_conf'], axis=1))
pca = PCA()
vmsi_pca = pca.fit_transform(vmsi_res_norm)
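
The notebook keeps all principal components. As an optional illustration of what the standardise-then-PCA step produces, the sketch below runs the same two calls on synthetic data and inspects the cumulative explained variance. The synthetic matrix, the 90% threshold, and the names `n_keep` and `X_reduced` are assumptions made for this example only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic cells x features matrix: 200 cells, 10 features driven by 2 latent factors plus noise.
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 10))
X = latent @ loadings + 0.1 * rng.normal(size=(200, 10))

X_norm = StandardScaler().fit_transform(X)   # zero mean, unit variance per feature
pca = PCA()
X_pca = pca.fit_transform(X_norm)            # same shape as X_norm when all components are kept

# Cumulative fraction of variance explained by the leading components.
cum_var = np.cumsum(pca.explained_variance_ratio_)
# Smallest number of components whose cumulative explained variance reaches 90%.
n_keep = int(np.searchsorted(cum_var, 0.90) + 1)
X_reduced = X_pca[:, :n_keep]
print(n_keep, X_reduced.shape)
```

Truncating to the leading components is a common way to reduce noise before graph-based clustering, but whether it helps here would need to be checked on the real data.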

Then, we obtain cluster labels with `phenograph`.

In [4]:
import phenograph

# assign initial cluster labels to morphometric data using phenograph
vmsi_labels, _, _ = phenograph.cluster(vmsi_pca)
# assign initial cluster labels to gene expression data using phenograph
gex_labels, _, _ = phenograph.cluster(gex_pca)
# alternative (disabled): use gex clusters from Nat Biotech as initial cluster labels
#gex_labels = gex_metadata['cluster'].values

Finding 30 nearest neighbors using minkowski metric and 'auto' algorithm
Neighbors computed in 0.19954323768615723 seconds
Jaccard graph constructed in 2.349684238433838 seconds
Wrote graph to binary file in 0.09792399406433105 seconds
Running Louvain modularity optimization
After 1 runs, maximum modularity is Q = 0.672642
After 6 runs, maximum modularity is Q = 0.67372
After 22 runs, maximum modularity is Q = 0.674926
Louvain completed 42 runs in 1.9907920360565186 seconds
Sorting communities by size, please wait ...
PhenoGraph completed in 5.68855094909668 seconds
Finding 30 nearest neighbors using minkowski metric and 'auto' algorithm
Neighbors computed in 0.22296905517578125 seconds
Jaccard graph constructed in 1.7323558330535889 seconds
Wrote graph to binary file in 0.29431581497192383 seconds
Running Louvain modularity optimization
After 1 runs, maximum modularity is Q = 0.77979
Louvain completed 21 runs in 0.8665270805358887 seconds
Sorting communities by size, please wait ...
P
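
Before combining the modalities, it can be informative to see how much the two initial labelings already agree. One option, not used in the original notebook, is the adjusted Rand index from scikit-learn, which compares two partitions of the same cells irrespective of how the clusters are numbered. The label arrays below are hypothetical; with the real data one would pass `vmsi_labels` and `gex_labels`.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

# Hypothetical cluster labels for the same six cells.
labels_a = np.array([0, 0, 1, 1, 2, 2])
labels_b = np.array([1, 1, 0, 0, 2, 2])   # same partition as labels_a, different label names
labels_c = np.array([0, 1, 0, 1, 0, 1])   # a different partition

# Both label vectors must refer to the same cells in the same order.
assert len(labels_a) == len(labels_b) == len(labels_c)

ari_same = adjusted_rand_score(labels_a, labels_b)  # identical partitions give 1.0
ari_diff = adjusted_rand_score(labels_a, labels_c)  # values near or below 0 indicate chance-level agreement
print(ari_same, ari_diff)
```

A low score between the morphometric and expression labelings would suggest that the two modalities carry complementary structure, which is the situation a joint representation is meant to exploit.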

Finally, run MUSE:

In [None]:
import muse_sc as muse

muse_features, reconstruct_x, reconstruct_y, latent_x, latent_y = muse.muse_fit_predict(
    vmsi_res_norm, gex_norm,
    vmsi_labels, gex_labels,
    latent_dim=100, n_epochs=500,
    lambda_regul=5, lambda_super=5,
)

Instructions for updating:
non-resource variables are not supported in the long term
Instructions for updating:
Use `tf.cast` instead.
++++++++++ MUSE for multi-modality single-cell analysis ++++++++++
MUSE initialization


2022-07-30 19:33:31.923328: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


epoch: 0, 	 total loss: 9353.91211,	 reconstruction loss: 8357.23340,	 sparse penalty: 199.33569
epoch: 50, 	 total loss: 6246.11719,	 reconstruction loss: 5334.14258,	 sparse penalty: 182.39494
epoch: 100, 	 total loss: 5967.86133,	 reconstruction loss: 5144.58252,	 sparse penalty: 164.65573
epoch: 150, 	 total loss: 5757.92383,	 reconstruction loss: 5020.92822,	 sparse penalty: 147.39917
epoch: 0, 	 total loss: 5721.95166,	 reconstruction loss: 4954.50049,	 sparse penalty: 131.06985,	 x triplet: 13.04897,	 y triplet: 9.37152
epoch: 50, 	 total loss: 5518.62402,	 reconstruction loss: 4864.42090,	 sparse penalty: 115.98343,	 x triplet: 9.08492,	 y triplet: 5.77231
epoch: 100, 	 total loss: 5382.58838,	 reconstruction loss: 4802.82959,	 sparse penalty: 101.65481,	 x triplet: 8.84021,	 y triplet: 5.45672
epoch: 150, 	 total loss: 5261.72559,	 reconstruction loss: 4750.02832,	 sparse penalty: 88.30347,	 x triplet: 8.73445,	 y triplet: 5.30148
Finding 30 nearest neighbors using minkowski m

We can re-cluster with phenograph to obtain new clusters from the joint features computed by MUSE:

In [None]:
# Cluster on MUSE features
muse_labels,_,_ = phenograph.cluster(muse_features)
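
A quick look at cluster sizes helps when interpreting the new labels. PhenoGraph returns an integer label per row and assigns `-1` to rows it treats as outliers, so those are worth counting separately. The sketch below uses a hypothetical label array in place of `muse_labels`.

```python
import numpy as np

# Hypothetical labels standing in for muse_labels; -1 marks outliers.
labels = np.array([0, 0, 1, 2, 1, 0, -1, 2, 0])

# Count how many cells fall in each cluster.
ids, counts = np.unique(labels, return_counts=True)
cluster_sizes = dict(zip(ids.tolist(), counts.tolist()))

n_outliers = cluster_sizes.get(-1, 0)                    # cells not assigned to any cluster
n_clusters = sum(1 for cluster_id in cluster_sizes if cluster_id != -1)
print(cluster_sizes, n_outliers, n_clusters)
```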

Compare these MUSE-derived clusters with the original gene expression-derived celltypes:

In [None]:
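# attach the MUSE cluster label of each cell to the results table
vmsi_res['muse_labels'] = muse_labels

# contingency table: rows are MUSE clusters, columns are GeX-derived celltypes
cm = pd.DataFrame(np.zeros((len(np.unique(vmsi_res['muse_labels'])), len(np.unique(vmsi_res['celltype'].values)))))
cm.index = np.unique(vmsi_res['muse_labels'])
cm.columns = np.unique(vmsi_res['celltype'].values)

# count the cells falling in each (MUSE cluster, celltype) pair
for i in range(len(vmsi_res['muse_labels'])):
    cm.loc[vmsi_res['muse_labels'].values[i], vmsi_res['celltype'].values[i]] += 1
# normalise each row so it gives the fraction of a MUSE cluster assigned to each celltype
cm = cm.divide(cm.sum(axis=1), axis=0)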

g = sns.heatmap(cm, cbar_kws={'label': 'Overlap fraction'})
g.set(xlabel='GeX celltype', ylabel='MUSE label')
g.set_xticklabels(g.get_xticklabels(), rotation=45, horizontalalignment='right')
plt.show()
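
The row-normalised overlap table can also be built in a single call with `pd.crosstab`, which avoids the explicit loop. The sketch below demonstrates this on a toy table; the cluster labels and celltype names are hypothetical, and in the notebook the two columns would come from `vmsi_res`.

```python
import pandas as pd

# Toy table with hypothetical labels; in the notebook these columns live in vmsi_res.
toy = pd.DataFrame({
    "muse_labels": [0, 0, 0, 1, 1, 2],
    "celltype": ["Gut", "Gut", "Mesoderm", "Mesoderm", "Mesoderm", "Gut"],
})

# normalize="index" divides each row by its total, so every row sums to 1.
overlap = pd.crosstab(toy["muse_labels"], toy["celltype"], normalize="index")
print(overlap)
```

The resulting DataFrame can be passed to `sns.heatmap` in the same way as `cm`.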