# Part 4: Spatial Statistics with `squidpy`

**Tutor:** Anthony Christidis
**Time:** 45 minutes

---

In the previous notebook, we identified several distinct cell clusters. Now, we want to move beyond simple visualization and ask quantitative questions about their spatial organization. 

For this, we will use `squidpy`, a powerful library for spatial analysis that is part of the `scverse` ecosystem.

**Goals:**
1.  Identify genes that are spatially organized using **Moran's I**.
2.  Analyze cell-cell interactions by calculating **neighborhood enrichment**.
3.  Quantify how often different cell types appear near each other using **co-occurrence** analysis.

### Setup

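We will start by loading the original `SpatialData` object. To keep this notebook self-contained, we then quickly re-run the filtering, normalization, and clustering steps from the previous notebook. In a real workflow, you would instead save the **fully processed and clustered `AnnData` object** once and reload it here, which lets you jump straight into the spatial analysis.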

In [None]:
%load_ext jupyter_black

import spatialdata as sd
import anndata as ad
import scanpy as sc
import squidpy as sq
import spatialdata_plot as sdp

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=UserWarning)

# Load the original SpatialData object to get access to the images
sdata = sd.read_zarr("../data/xenium_lung_cancer_subset.zarr")

# Load the processed AnnData object we saved from the last notebook
# Note: In a real workflow, you would save this with adata.write('processed_adata.h5ad')
# For this workshop, we will just re-run the processing steps quickly.
adata = sdata.tables["table"].copy()
sc.pp.filter_cells(adata, min_counts=50)
sc.pp.filter_cells(adata, min_genes=20)
adata = adata[adata.obs.pct_counts_mt < 25, :].copy()
sc.pp.filter_genes(adata, min_cells=10)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.leiden(adata, resolution=0.5, key_added="clusters")

### Building the Spatial Graph

The graph-based tools in `squidpy`, including Moran's I and neighborhood enrichment used below, rely on a **spatial neighborhood graph**. This graph connects cells that are physically close to each other in the tissue. We build it once from the cell coordinates, and then we can reuse it for several analyses.

In [None]:
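import numpy as np

# First, get the centroid coordinates of the cells that survived filtering.
# This assumes the index of 'cell_circles' matches adata.obs_names.
centroids = sdata.shapes["cell_circles"].loc[adata.obs_names].centroid
# Store the coordinates as an (n_cells, 2) numeric array, as squidpy expects
adata.obsm["spatial"] = np.column_stack([centroids.x, centroids.y])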

# Now, build the graph
sq.gr.spatial_neighbors(adata, coord_type="generic", delaunay=True)

# We can visualize the graph on top of the cells
sq.pl.spatial_scatter(
    adata, 
    color="clusters", 
    library_id="spatial",
    shape=None,
    connectivity_key="spatial_connectivities", # tells squidpy to plot the graph
    size=10,
    figsize=(8,8)
)
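
To make the idea of a spatial neighborhood graph concrete, here is a minimal, dependency-free sketch. It is not how `squidpy` builds its graph: the cell above uses Delaunay triangulation, whereas this toy example swaps in a simpler fixed-radius rule. The function name `radius_graph` and the coordinates are hypothetical, chosen only for illustration.

```python
from math import dist


def radius_graph(coords, radius):
    """Link every pair of points closer than `radius` (toy O(n^2) version)."""
    neighbors = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if dist(coords[i], coords[j]) <= radius:
                # The graph is undirected, so record the edge both ways
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


# Hypothetical cell centroids: three cells close together, one far away
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
graph = radius_graph(coords, radius=1.5)
print(graph)
```

The three nearby cells end up connected to each other, while the distant cell has no neighbors. Real datasets need a spatial index or triangulation rather than this all-pairs loop.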

### Analysis 1: Spatially Variable Genes (Moran's I)

Our first biological question is: **Which genes' expression levels are not random, but are clustered in specific regions?**

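We can answer this by calculating Moran's I, a measure of spatial autocorrelation. Values close to 1 indicate that neighboring cells have similar expression (the gene is spatially clustered), values near 0 indicate no spatial pattern, and negative values indicate that neighboring cells tend to differ.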
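
As a rough illustration of what this score measures, the sketch below computes Moran's I by hand on a toy example. It assumes binary weights (1 for neighbors, 0 otherwise) and a hypothetical row of six cells; `squidpy`'s implementation may weight and normalize the graph differently, so treat this as a sketch of the formula rather than a reproduction of the library.

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights (w_ij = 1 if i and j are neighbors)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum the products of deviations over every directed neighbor pair
    cross = sum(dev[i] * dev[j] for i in neighbors for j in neighbors[i])
    total_weight = sum(len(nbrs) for nbrs in neighbors.values())
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Hypothetical tissue: six cells in a row, each linked to its adjacent cells
chain = {i: {j for j in (i - 1, i + 1) if 0 <= j < 6} for i in range(6)}

clustered = [1, 1, 1, 0, 0, 0]    # expression confined to one side
alternating = [1, 0, 1, 0, 1, 0]  # neighbors always differ

i_clustered = morans_i(clustered, chain)
i_alternating = morans_i(alternating, chain)
print(i_clustered, i_alternating)
```

The clustered pattern gives a positive score (about 0.6), while the alternating pattern gives a strongly negative one (about -1.0).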

In [None]:
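# Calculate Moran's I using the graph we just built.
# This can be slow, so we restrict it to highly variable genes.
# The data are already log-normalized, so we use the "seurat" flavor
# ("seurat_v3" expects raw counts).
sc.pp.highly_variable_genes(adata, n_top_genes=1000, flavor="seurat")
# `genes` expects gene names, not a boolean mask
hvg_names = adata.var_names[adata.var["highly_variable"]].tolist()
sq.gr.spatial_autocorr(adata, mode="moran", genes=hvg_names, n_perms=100, n_jobs=4)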

# Let's see the top spatially variable genes
top_genes_df = adata.uns["moranI"].sort_values(by="I", ascending=False)
top_genes_df.head(5)

Now, let's create a high-quality static plot to visualize the expression of the top-scoring gene directly on the H&E image using `spatialdata-plot`.

In [None]:
top_gene = top_genes_df.index[0]

# We need to update our sdata object with the processed adata
sdata_processed = sd.SpatialData(images=sdata.images, shapes=sdata.shapes, tables={'table': adata})

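# Importing spatialdata_plot adds the `.pl` accessor to SpatialData objects.
# Coloring by a gene assumes the table annotates the 'cell_boundaries' element.
sdata_processed.pl.render_images().pl.render_shapes(
    element="cell_boundaries",
    color=top_gene,
    fill_alpha=0.7,
).pl.show(title=f"Expression of {top_gene} (Spatially Variable)", figsize=(8, 8))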

### Analysis 2: Neighborhood Enrichment

Let's move from genes to cell types. A powerful question to ask is: **Which cell types are enriched or depleted in each other's neighborhoods?** `squidpy` can calculate an enrichment score to see which cluster adjacencies occur more or less often than expected by chance.

In [None]:
sq.gr.nhood_enrichment(adata, cluster_key="clusters")

The result is a z-score indicating the strength of the enrichment. We can visualize this as a heatmap, where bright yellow indicates strong co-localization (they are neighbors) and dark purple indicates avoidance.
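
The idea behind such a z-score can be sketched with a small permutation test: count how many graph edges join two clusters, then compare that count with the counts obtained after randomly shuffling the cluster labels. The example below is a simplified illustration with hypothetical names (`count_pairs`, `enrichment_z`) and toy data, not `squidpy`'s actual implementation.

```python
import random
from statistics import mean, stdev


def count_pairs(labels, edges, a, b):
    """Count edges whose two endpoints carry labels a and b (either order)."""
    return sum(1 for i, j in edges if {labels[i], labels[j]} == {a, b})


def enrichment_z(labels, edges, a, b, n_perms=1000, seed=0):
    """Z-score of the observed a-b edge count against shuffled labels."""
    rng = random.Random(seed)
    observed = count_pairs(labels, edges, a, b)
    shuffled = list(labels)
    null = []
    for _ in range(n_perms):
        rng.shuffle(shuffled)  # break any link between label and position
        null.append(count_pairs(shuffled, edges, a, b))
    return (observed - mean(null)) / stdev(null)


# Hypothetical tissue: eight cells in a row, linked to their adjacent cells
edges = [(i, i + 1) for i in range(7)]
mixed = ["A", "B", "A", "B", "A", "B", "A", "B"]
segregated = ["A", "A", "A", "A", "B", "B", "B", "B"]

z_mixed = enrichment_z(mixed, edges, "A", "B")
z_segregated = enrichment_z(segregated, edges, "A", "B")
print(z_mixed, z_segregated)
```

When the two clusters are interleaved, the A-B z-score is positive (enrichment). When they occupy separate halves of the tissue, it is negative (depletion).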

In [None]:
sq.pl.nhood_enrichment(adata, cluster_key="clusters", method="ward", cmap="viridis")

### Analysis 3: Co-occurrence

Finally, we can ask a similar question in a different way. The co-occurrence score compares the probability of finding a cluster near a given cluster of interest with that cluster's overall probability in the tissue, and it does so across increasing physical distances.

Let's investigate the relationship between two clusters that showed strong enrichment in the plot above (e.g., find a bright yellow square and pick those two clusters).

In [None]:
# This can be slow, so let's run it on a small subsample of the data
adata_subsample = sc.pp.subsample(adata, fraction=0.2, copy=True)
# co_occurrence works directly on the coordinates in .obsm["spatial"],
# so we do not need to rebuild the neighbor graph for the subsample

# Calculate co-occurrence across different distances
sq.gr.co_occurrence(adata_subsample, cluster_key="clusters")

# Plot the result conditioned on one example cluster (here, cluster "0")
sq.pl.co_occurrence(
    adata_subsample,
    cluster_key="clusters",
    clusters="0",
    figsize=(7, 5)
)

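This plot shows, for each distance, how much more or less likely each cluster is to be found near our cluster of interest (`0`) than its overall frequency would predict. A score above 1 means the two clusters occur together more often than expected by chance, and a score below 1 means less often.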
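
To see where a ratio like this comes from, the sketch below computes it for a single distance threshold on toy data. It is a simplified illustration under stated assumptions: `squidpy` evaluates the score over a series of distance intervals, whereas this hypothetical `co_occurrence_ratio` function uses one fixed radius and made-up coordinates.

```python
from math import dist


def co_occurrence_ratio(coords, labels, cond, target, radius):
    """Ratio p(target | near cond) / p(target) for one distance threshold."""
    # Labels of all cells lying within `radius` of any cell from `cond`
    nearby = [
        labels[j]
        for i in range(len(coords))
        if labels[i] == cond
        for j in range(len(coords))
        if j != i and dist(coords[i], coords[j]) <= radius
    ]
    if not nearby:
        return float("nan")  # no cell lies near the conditioning cluster
    p_target_given_cond = nearby.count(target) / len(nearby)
    p_target = labels.count(target) / len(labels)
    return p_target_given_cond / p_target


# Hypothetical tissue: A and B cells interleaved, C cells far away
xs = [0, 1, 2, 3, 20, 21, 22, 23]
coords = [(float(x), 0.0) for x in xs]
labels = ["A", "B", "A", "B", "C", "C", "C", "C"]

ratio_b = co_occurrence_ratio(coords, labels, cond="A", target="B", radius=1.5)
ratio_c = co_occurrence_ratio(coords, labels, cond="A", target="C", radius=1.5)
print(ratio_b, ratio_c)
```

Here every cell near an A cell is a B cell, although B makes up only a quarter of the tissue, so the ratio for B is 4.0. No C cell lies near an A cell, so the ratio for C is 0.0.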

### Workshop Conclusion

Congratulations! You have now completed a full spatial analysis workflow, from loading data and identifying cell types to quantifying their complex spatial relationships. You are now equipped with the core `scverse` tools to begin exploring your own spatial datasets.