# Learning Spatial Relationships with MISTy

While Moran's R provides a sound summary of spatial clustering, it is limited to two variables at a time and is therefore not suited to **complex or non-linear spatial relationships** between variables.

Here, we show how to use LIANA's implementation of [MISTy](https://github.com/saezlab/mistyR), a framework presented in [Tanevski et al., 2022](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02663-5).

**MISTy** is a tool that helps us better understand how different features, such as genes or cell types, interact with each other in space. MISTy does so by learning both **intra-** and **extracellular** relationships - i.e. those that occur within and between cells/spots. **A major advantage of MISTy is its flexibility**. It can model different perspectives, or "views," each describing a different way markers are related to each other. Each of these views can describe a different spatial context, i.e. define a relationship among the observed expressions of the markers, such as intracellular regulation or paracrine regulation.

**MISTy has only one fixed view** - i.e. the **intraview**, which contains the target (dependent) variables. The other views we refer to as extra views, and they contain the independent variables used to predict the intra view. **MISTy can fit any number of extra views, and each extra view can contain any number of variables.** The extra views can thus simultaneously **learn the dependencies of target variables across different modalities**, such as cell type proportions, pathways, or genes, etc.

MISTy represents each view as a potential source of variation in the measurements of the target variables in the intra view. MISTy further analyzes each view to determine how it contributes to the overall expression or abundance of each target variable. It explains this contribution by identifying the interactions between measurements that led to the observed results.

### Import packages

In [None]:
import scanpy as sc
import decoupler as dc
import plotnine as p9
import liana as li
import os

datadir = '../../datasets/Hands_on_2_LIANA_MistY/'

### Import functions needed to create MISTy objects.

In [None]:
from liana.method import MistyData, genericMistyData, lrMistyData

### Import Pre-defined Single view models

In [None]:
from liana.method.sp import RandomForestModel, LinearModel, RobustLinearModel

### Load and Normalize Data

We continue to use `kuppe_heart19.h5ad`:

In [None]:
adata = sc.read(os.path.join(datadir, "kuppe_heart19.h5ad"))

In [None]:
adata.layers['counts'] = adata.X.copy()
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
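
To make the two normalization steps concrete, here is a minimal pure-Python sketch of what scaling to a target sum followed by a log(1 + x) transform does to a single spot. The counts are made up for illustration, and the helper `normalize_and_log` is hypothetical, not part of scanpy.

```python
import math

def normalize_and_log(counts, target_sum=1e4):
    """Scale one spot's counts so they sum to `target_sum`, then apply log(1 + x)."""
    total = sum(counts)
    scaled = [c / total * target_sum for c in counts]
    return [math.log1p(v) for v in scaled]

# Toy spot with three genes (made-up counts)
toy_counts = [10, 30, 60]
normalized = normalize_and_log(toy_counts)
print([round(v, 3) for v in normalized])
```

After this transformation, spots with different sequencing depths are on a comparable scale, and the log compresses the range of highly expressed genes.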

##### Extract Cell type Composition
This slide comes with cell type proportions estimated using cell2location; see [Kuppe et al., 2022](https://www.nature.com/articles/s41586-022-05060-x). Let's extract them from `.obsm` into an independent AnnData object.

In [None]:
# Rename to more informative names
full_names = {'Adipo': 'Adipocytes',
              'CM': 'Cardiomyocytes',
              'Endo': 'Endothelial',
              'Fib': 'Fibroblasts',
              'PC': 'Pericytes',
              'prolif': 'Proliferating',
              'vSMCs': 'Vascular_SMCs',
              }
# but only for the ones that are in the data
adata.obsm['compositions'].columns = [full_names.get(c, c) for c in adata.obsm['compositions'].columns]
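
The list comprehension above relies on `dict.get(key, default)`, which falls back to the original name when no full name is defined. A minimal sketch with hypothetical column names (the actual columns in `adata.obsm['compositions']` may differ):

```python
full_names = {'CM': 'Cardiomyocytes', 'Fib': 'Fibroblasts'}

# Hypothetical column names; 'Myeloid' has no entry in the mapping
columns = ['CM', 'Fib', 'Myeloid']

# Names found in the mapping are replaced, all others are kept unchanged
renamed = [full_names.get(c, c) for c in columns]
print(renamed)
```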

In [None]:
comps = li.ut.obsm_to_adata(adata, 'compositions')

In [None]:
comps.var

In [None]:
# check key cell types
sc.pl.spatial(comps,
              color=['Vascular_SMCs','Cardiomyocytes',
                     'Endothelial', 'Fibroblasts'],
              size=1.3, ncols=2, alpha_img=0
              )

## Formatting & Running MISTy


The implementation of MISTy in LIANA relies on [MuData](https://github.com/scverse/mudata) objects [(Bredikhin et al., 2022)](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02577-8) and extends them to a very simple child class we call **"MistyData"**. 
To make it easier to use, we provide functions to construct "MistyData" objects that transform the data into a format that MISTy can use.

Briefly, a **"MistyData"** object is just a MuData object with **intra** as one of the modalities - this is the view in which the (**target**) variables explained by all other views are stored. 
MISTy is flexible to any other view that is appended, provided it also contains a spatial neighbors graph.


Let's use `genericMistyData` to construct a MistyData object with the cell type proportions as the intra view.
It additionally builds a 'juxta' view for the spots that are direct neighbors of each other, and a 'para' view for all surrounding spots within a certain radius, or bandwidth.
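
To give an intuition for `bandwidth` and `cutoff`, the sketch below weights surrounding spots with a Gaussian-like radial kernel and zeroes out weights that fall below the cutoff. This is an illustration under stated assumptions: the kernel formula, the helper `para_weights`, and the distances are hypothetical, and LIANA's actual kernel may differ in its details.

```python
import math

def para_weights(distances, bandwidth=200.0, cutoff=0.05):
    """Radial-kernel weights; weights below `cutoff` are set to 0."""
    weights = []
    for d in distances:
        w = math.exp(-(d ** 2) / (bandwidth ** 2))  # assumed kernel form
        weights.append(w if w >= cutoff else 0.0)
    return weights

# Hypothetical distances from one spot to four others
distances = [0.0, 100.0, 200.0, 400.0]
weights = para_weights(distances)
print([round(w, 3) for w in weights])
```

In this sketch, a larger bandwidth lets more distant spots contribute, while a higher cutoff discards weak, far-away contributions.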

In [None]:
misty = genericMistyData(intra=comps, cutoff=0.05, bandwidth=200, n_neighs=6)

In [None]:
misty

## Learn Relationships with MISTy


In [None]:

misty(model=LinearModel, k_cv=10, seed=1337, verbose = True)

# Alternatively, use RandomForestModel (takes longer to run):
# misty(model=RandomForestModel, n_jobs=-1, verbose=True)

Specifically, we use the `LinearModel` to fit a linear model for each target in the intra view, using the juxta and para views as predictors. It is more simplistic than a random forest, but much faster and more interpretable.
You can also use `RandomForestModel`, but it takes longer to run.

MISTy returns two DataFrames:
* `target_metrics` - the metrics that describe the target variables from the intra view, including R-squared across different views as well as the estimated contributions to the predictive performance of each view per target.
* `interactions` - feature importances per view

If `inplace` is `True` (the default), these are appended to the MuData object.


First, check the variance explained when predicting each target variable in the intra view from the other variables (predictors) in the intra view itself:


In [None]:
misty.uns['target_metrics']

R² (coefficient of determination) measures how well the model explains the variability of the target variable.

* `intra_R2` - R² of the intra-view model
* `multi_R2` - R² of the multi-view (full) model
* `gain_R2` - `multi_R2 - intra_R2`, i.e. the performance gain when we consider the other views in addition to intra
* `intra`, `juxta`, `para` - relative importance (weight) of the intra, juxta, and para views
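
As a worked example of these metrics, the sketch below computes R² for two sets of predictions and derives the gain. The target values and predictions are made up, and `r_squared` is a hypothetical helper; MISTy evaluates its models on held-out predictions, a step this sketch skips.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up target values and predictions from two hypothetical models
y = [1.0, 2.0, 3.0, 4.0, 5.0]
intra_pred = [1.5, 2.5, 2.5, 3.5, 4.5]  # intra view only
multi_pred = [1.2, 2.1, 2.9, 3.8, 4.9]  # all views combined

intra_R2 = r_squared(y, intra_pred)
multi_R2 = r_squared(y, multi_pred)
gain_R2 = multi_R2 - intra_R2  # improvement from adding the other views
print(round(intra_R2, 3), round(multi_R2, 3), round(gain_R2, 3))
```

A positive gain indicates that the spatial context improves the prediction beyond what the intra view alone achieves.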

In [None]:
li.pl.target_metrics(misty, stat='intra_R2', return_fig=True)

When we look at the variance explained by the other views in `gain_R2`, we see that they explain less (as expected). There is only a slight gain in Adipocytes.

In [None]:
li.pl.target_metrics(misty, stat='gain_R2', return_fig=True)

We can also check the contribution to the predictive performance of each view per target:

In [None]:
li.pl.contributions(misty, return_fig=True)

Using the information above we know which cell types are best explained by our model, and we know which view explains them best.

### Add pathway activities as extra view

We can also use estimated pathway activities as extra views to make the data a bit more interpretable.
We will use [decoupler-py](https://academic.oup.com/bioinformaticsadvances/article/2/1/vbac016/6544613) with pathway gene sets from [PROGENy](https://www.nature.com/articles/s41467-017-02391-6). See [this tutorial](https://decoupler-py.readthedocs.io/en/latest/notebooks/spatial.html) for details.

In [None]:
# obtain genesets
progeny = dc.op.progeny(organism='human', top=100)

In [None]:
# use univariate linear model to estimate activity
dc.mt.ulm(
    adata,
    net=progeny,
    tmin=5,
    verbose=True,
    bsize = 256,
    raw=False
)

In [None]:
# extract progeny activities as an AnnData object
acts_progeny = li.ut.obsm_to_adata(adata, 'score_ulm')

In [None]:
# Check what the pathway activities look like
sc.pl.spatial(acts_progeny, color=['Hypoxia', 'JAK-STAT'], cmap='RdBu_r', size=1.3)

In this case, we will use cell type compositions per spot as the intra view, and we will use the PROGENy pathway activities to define the juxta and para views:

In [None]:
misty = genericMistyData(intra=comps, extra=acts_progeny, cutoff=0.05, bandwidth=200, n_neighs=6)

In [None]:
misty

#### Run MISTy with pathway views

This time, we will **bypass** predicting the targets from the other features within the intra view (`bypass_intra`).
This allows us to see how well the other views explain the intra view, excluding the intra view itself.

In [None]:
misty(model=LinearModel, k_cv=10, seed=1337, bypass_intra=True, verbose = True)

Let's check the R-squared gained from the juxta and para views:

In [None]:
li.pl.target_metrics(misty, stat='gain_R2', return_fig=True)

Using the information above, we know which variables are best explained by our model and which view explains them best.
We can now also look at the specific variables that explain each target best:

In [None]:
# this information is stored here:
misty.uns['interactions'].head()

and their contributions per target:

In [None]:
li.pl.contributions(misty, return_fig=True)

Since this is a linear model, the coefficients would not be directly comparable (as are importances in a Random Forest). Thus, we use the coefficients' t-values, as calculated by Ordinary Least Squares, which are signed and directly comparable.
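
To illustrate why t-values are easier to compare than raw coefficients, the sketch below fits a simple one-predictor OLS model twice, once with the predictor rescaled. The data are made up and `slope_and_t` is a hypothetical helper, not LIANA's implementation: the coefficient changes with the units of the predictor, while the t-value does not.

```python
import math

def slope_and_t(x, y):
    """Simple OLS of y on x; returns (slope, t-value of the slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / se

# Made-up predictor and target values
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]

slope, t = slope_and_t(x, y)
# Same predictor expressed in different units (multiplied by 1000)
slope_scaled, t_scaled = slope_and_t([xi * 1000 for xi in x], y)

print(slope, slope_scaled)  # the coefficients differ by a factor of 1000
print(t, t_scaled)          # the t-values agree up to rounding
```

The sign of the t-value follows the sign of the coefficient, so it also indicates the direction of the relationship.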

Let's explore the t-values for each target-predictor interaction:

In [None]:
(
    li.pl.interactions(misty, view='juxta', return_fig=True, figure_size=(7,5)) + 
    p9.scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0)
)

<div class="alert alert-info">
    
<h4> Feature importances </h4>

By default, we use a random forest, so the feature importances are the mean decrease in impurity of the features (for continuous targets, this is a reduction in variance rather than Gini impurity). On the other hand, when we use a linear model, the feature importances are the t-values of the model coefficients.


</div>  

## Build Custom MISTy Views

As we previously mentioned, one can build any view structure that they deem relevant for their data.
So, let's explore how to build custom views.
Here, we will just use two distinct prior knowledge sources to check which one achieves better predictive performance.

So, let's also estimate Transcription Factor activities with decoupler:

In [None]:
# get TF prior knowledge
net = dc.op.collectri(organism='human', remove_complexes=True, license='academic', verbose=False)

In [None]:
# Estimate activities
dc.mt.ulm(
    data=adata,
    net=net,
    bsize=128,  
    tmin = 50,
    verbose=True,
    raw=False
)

In [None]:
# extract TF activities (the second ulm run overwrote the PROGENy scores in 'score_ulm')
acts_tfs = li.ut.obsm_to_adata(adata, 'score_ulm')

In [None]:
# or load acts_tf
#acts_tfs = sc.read('acts_tfs.h5ad')

In [None]:
# Calculate spatial neighbors
li.ut.spatial_neighbors(acts_tfs, cutoff=0.1, bandwidth=200, set_diag=True) # you can also set set_diag=False
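
The effect of `set_diag` can be pictured as whether a spot's own value enters its spatially weighted feature. The sketch below is a simplified illustration with made-up weights and activities, a hypothetical helper `spatially_weighted`, and an assumed self-weight of 1; it is not LIANA's implementation.

```python
def spatially_weighted(values, W):
    """Weighted sum of values per spot (each row of W times `values`)."""
    return [sum(w * v for w, v in zip(row, values)) for row in W]

# Hypothetical activity of one TF in three spots
tf_activity = [2.0, 0.0, 4.0]

# Hypothetical spatial weights between the three spots (no self-weights)
W_no_diag = [
    [0.0, 0.8, 0.0],
    [0.8, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]
# Same weights, but each spot also counts itself (assumed weight of 1)
W_diag = [[1.0 if i == j else w for j, w in enumerate(row)]
          for i, row in enumerate(W_no_diag)]

no_self = spatially_weighted(tf_activity, W_no_diag)
with_self = spatially_weighted(tf_activity, W_diag)
print(no_self)
print(with_self)
```

Without self-weights, a view describes only the surroundings of each spot; with them, it mixes the spot's own measurement with that of its neighborhood.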

Visualize the weights for a specific spot:

In [None]:
li.pl.connectivity(acts_tfs, idx=0, figure_size=(6,5))

In [None]:
# transfer spatial information to progeny activities
# NOTE: spatial connectivities can differ between views, but in this case we will use the same
acts_progeny.obsm['spatial'] = acts_tfs.obsm['spatial']
acts_progeny.obsp['spatial_connectivities'] = acts_tfs.obsp['spatial_connectivities']

Build an object with custom views:

In [None]:
misty = MistyData(data={"intra": comps, "TFs": acts_tfs, "Pathways": acts_progeny})

In [None]:
misty

Run MISTy as before:

In [None]:
misty(model=LinearModel, k_cv=5, seed=1337, bypass_intra=True, verbose = True)

We can see that Cardiomyocytes and Fibroblasts are relatively well explained by TFs & Pathways.

In [None]:
li.pl.target_metrics(misty, stat='gain_R2')

We also see that both views explain the targets (cell types) to some extent.

In [None]:
li.pl.contributions(misty, return_fig=True)

Plot cell type x transcription factor interactions:

In [None]:
(
    li.pl.interactions(misty, view='TFs', top_n=20) + 
    p9.labs(x='Transcription Factor', y='Cell type') +
    p9.theme_bw(base_size=14) +
    p9.theme(axis_text_x=p9.element_text(rotation=90, size=13)) +
    # change to blue-red
    p9.scale_fill_gradient2(low='blue', mid='white', high='red')+
    p9.theme(figure_size=(8, 5)) 
)


# Practice

Use the `kuppe_heart19.h5ad` dataset to explore the spatial relationships between cell types and ligand–receptor (LR) interactions, and investigate how different spatial connectivity patterns affect the results.
