# Notebook 4: Spatial Statistics with Squidpy

**Tutor:** Anthony Christidis  
**Time:** 45 minutes

Welcome to spatial statistics analysis! In this notebook, we'll use Squidpy to ask quantitative questions about spatial organization in Visium data.

Spatial statistics go beyond simple visualization - they quantify patterns, test hypotheses, and reveal biological insights about tissue organization.

## Goals:
- Identify spatially variable genes using Moran's I spatial autocorrelation
- Analyze neighborhood enrichment to understand tissue organization
- Perform spatial co-occurrence analysis

In [None]:
import spatialdata as sd
import spatialdata_plot as sdp
import scanpy as sc
import squidpy as sq
import matplotlib.pyplot as plt
from pathlib import Path

# For cleaner output
import warnings
warnings.filterwarnings("ignore")

# Define the path to our data directory
# Note: This path is relative to the repository's root directory
_DATA_DIR_PATH = Path("../data/")
_VISIUM_PATH = _DATA_DIR_PATH / "visium_glioblastoma_subset.zarr"
_XENIUM_PATH = _DATA_DIR_PATH / "xenium_lung_cancer_subset.zarr"

# Print versions for reproducibility
for p in [sd, sdp, sc, sq]:
    print(f"{p.__name__}: {p.__version__}")

## Loading Visium Glioblastoma Data

In [None]:
sdata_visium = sd.read_zarr(_VISIUM_PATH)

# adata_visium = sdata_visium.tables["table"].copy()

print(f"Loaded {sdata_visium.tables['table'].n_obs} spots with {sdata_visium.tables['table'].n_vars} genes")

## Part 1: Data Preparation and Basic Processing

Before diving into spatial statistics, we need to preprocess our data and run basic clustering to identify tissue regions.

In [None]:
# Calculate QC metrics
sc.pp.calculate_qc_metrics(sdata_visium.tables["table"], percent_top=(20, 50), inplace=True)

# Filter low-quality spots and rare genes
print(f"Starting with: {sdata_visium.tables['table'].n_obs} spots")
sc.pp.filter_cells(sdata_visium.tables["table"], min_counts=500)
sc.pp.filter_genes(sdata_visium.tables["table"], min_cells=10)
print(f"After filtering: {sdata_visium.tables['table'].n_obs} spots")

# Standard preprocessing pipeline
sc.pp.normalize_total(sdata_visium.tables["table"], inplace=True)
sc.pp.log1p(sdata_visium.tables["table"])
sc.pp.highly_variable_genes(sdata_visium.tables["table"])
sc.pp.pca(sdata_visium.tables["table"], use_highly_variable=True)
sc.pp.neighbors(sdata_visium.tables["table"])
sc.tl.leiden(sdata_visium.tables["table"], key_added="leiden_clusters")
sc.tl.umap(sdata_visium.tables["table"])

n_clusters = len(sdata_visium.tables["table"].obs['leiden_clusters'].unique())
print(f"Identified {n_clusters} tissue regions/clusters")

In [None]:
(
    sdata_visium
    .pl.render_shapes(color="leiden_clusters", shape="visium_hex")
    .pl.show("downscaled_hires", title="Leiden clusters")
)

## Part 2: Spatial Neighbors Graph

The foundation of spatial analysis is building a graph that connects spatially adjacent spots. This graph structure allows us to quantify spatial relationships and define "neighborhoods" in tissue space. Each spot is connected to its immediate spatial neighbors, creating a network that preserves the tissue's geometric structure.
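
To make the idea concrete, here is a minimal NumPy sketch on a hypothetical 3 x 3 toy grid (not the workshop data). It links every pair of spots within a fixed radius. The helper `radius_graph` and the toy coordinates are illustrative assumptions, not Squidpy's internal implementation.

```python
import numpy as np

def radius_graph(coords, radius):
    """Return a symmetric 0/1 adjacency matrix linking points within `radius`."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist <= radius).astype(int)
    np.fill_diagonal(adj, 0)  # a spot is not its own neighbor
    return adj

# Hypothetical 3 x 3 square grid of spots with unit spacing
xs, ys = np.meshgrid(np.arange(3), np.arange(3))
toy_coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)

toy_adj = radius_graph(toy_coords, radius=1.0)
toy_degree = toy_adj.sum(axis=1)
print("Neighbors per spot:", toy_degree)
print("Total connections (nonzero entries):", toy_adj.sum())
```

Corner spots end up with 2 neighbors, edge spots with 3, and the center spot with 4. The same border effect explains why the average number of neighbors per spot in real tissue is usually below the theoretical maximum.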

In [None]:
# Build spatial neighborhood graph
# This connects each spot to its spatial neighbors
sq.gr.spatial_neighbors(sdata_visium.tables["table"])

print(f"Built spatial graph with {sdata_visium.tables['table'].obsp['spatial_connectivities'].nnz} connections")
print(f"Average neighbors per spot: {sdata_visium.tables['table'].obsp['spatial_connectivities'].nnz / sdata_visium.tables['table'].n_obs:.1f}")

## Part 3: Spatially Variable Genes - Moran's I Analysis

**Biological Question:** Which genes show non-random spatial expression patterns?

**Moran's I** is a spatial autocorrelation statistic that measures how similar neighboring spots are in their gene expression. It typically ranges from about -1 to +1:

- **High positive values**: Genes with spatially coherent expression (neighboring spots have similar expression)
- **Values near 0**: Random spatial distribution
- **Negative values**: Spatially anti-correlated expression (checkerboard-like patterns)

Genes with high Moran's I often mark anatomical structures, functional domains, or pathological regions.
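
The sketch below computes the textbook global Moran's I with plain NumPy on a hypothetical 4 x 4 toy grid, contrasting a two-block pattern with a checkerboard. The function `morans_i`, the grid, and both patterns are illustrative assumptions. Squidpy's own implementation may differ in details such as weight normalization and the permutation test.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for one feature, given a spot-by-spot weight matrix."""
    z = values - values.mean()  # center the expression values
    return (values.size / weights.sum()) * (z @ weights @ z) / (z @ z)

# Hypothetical 4 x 4 grid; spots are neighbors if they share an edge
side = 4
rows, cols = np.meshgrid(np.arange(side), np.arange(side), indexing="ij")
grid = np.column_stack([rows.ravel(), cols.ravel()])
manhattan = np.abs(grid[:, None, :] - grid[None, :, :]).sum(axis=-1)
toy_weights = (manhattan == 1).astype(float)

# Two toy "genes": left/right blocks versus an alternating checkerboard
blocks = (grid[:, 1] >= side // 2).astype(float)
checkerboard = (grid.sum(axis=1) % 2).astype(float)

i_blocks = morans_i(blocks, toy_weights)
i_checker = morans_i(checkerboard, toy_weights)
print(f"Two blocks:   I = {i_blocks:.3f}")
print(f"Checkerboard: I = {i_checker:.3f}")
```

On this toy grid the block pattern gives I = 2/3 (spatially coherent) and the checkerboard gives I = -1 (perfectly anti-correlated), matching the interpretation of positive and negative values above.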

In [None]:
# Calculate Moran's I for spatially variable gene detection
# We'll test highly variable genes for computational efficiency
hvg_genes = sdata_visium.tables["table"].var_names[sdata_visium.tables["table"].var['highly_variable']]

sq.gr.spatial_autocorr(
    sdata_visium.tables["table"],
    mode="moran",
    genes=hvg_genes,
    n_perms=100,  # Number of permutations for statistical testing
    n_jobs=4      # Parallel processing
)

# Display the top spatially variable genes
moran_results = sdata_visium.tables["table"].uns["moranI"].sort_values(by="I", ascending=False)
print("\nTop 10 spatially variable genes:")
print(moran_results.head(10)[['I', 'pval_sim']].round(4))

print("\nBottom 10 genes by Moran's I (least spatially coherent):")
print(moran_results.tail(10)[['I', 'pval_sim']].round(4))

In [None]:
# Visualize top spatially variable genes
top_genes = moran_results.head(3).index.tolist()
bottom_genes = moran_results.tail(3).index.tolist()
genes_to_plot = top_genes + bottom_genes

fig, axs = plt.subplots(2, 3, figsize=(18, 12))
axs = axs.flatten()

for idx, gene in enumerate(genes_to_plot):
    moran_score = moran_results.loc[gene, 'I']
    p_value = moran_results.loc[gene, 'pval_sim']
    
    # Get gene expression values
    # gene_expr = adata_visium[:, gene].X.toarray().flatten() if hasattr(adata_visium.X, 'toarray') else adata_visium[:, gene].X.flatten()

    (
        sdata_visium
        .pl.render_shapes(
            color=gene,
            shape="visium_hex",
        )
        .pl.show(
            "downscaled_hires",
            title=f'{gene}\nMoran\'s I = {moran_score:.3f} (p = {p_value:.3f})',
            ax=axs[idx],
            colorbar=False,
        )
    )

fig.suptitle('Spatially Variable Genes\nTop row: High spatial coherence, Bottom row: Random patterns', fontsize=16)
fig.tight_layout()
plt.show()

print("High Moran's I (top row) = genes with spatially coherent expression")
print("Low Moran's I (bottom row) = genes with spatially random expression")

## Part 4: Neighborhood Enrichment Analysis

**Biological Question:** Which tissue regions tend to be spatially adjacent to each other?

**Neighborhood enrichment** reveals the "social network" of tissue regions by comparing observed spatial adjacencies to what we'd expect by random chance. This analysis uses permutation testing to calculate Z-scores that indicate:

- **Positive enrichment**: Regions are neighbors more often than expected (spatial attraction)
- **Negative enrichment**: Regions avoid each other spatially (spatial segregation)
- **No enrichment**: Random spatial association

This can reveal important biological patterns like tumor-stroma boundaries or immune cell exclusion zones.
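
As a rough illustration of the permutation logic, the sketch below counts label pairs across graph edges on a hypothetical 6 x 6 toy grid, then compares the observed counts with counts obtained after shuffling the labels. The helpers `count_pairs` and `enrichment_zscores` are illustrative assumptions, not Squidpy functions, and the toy labels stand in for Leiden clusters.

```python
import numpy as np

def count_pairs(adj, labels, n_labels):
    """counts[a, b] = number of edges going from a spot labeled a to a spot labeled b."""
    onehot = np.eye(n_labels)[labels]
    return onehot.T @ adj @ onehot

def enrichment_zscores(adj, labels, n_labels, n_perms=200, seed=0):
    """Z-score of observed pair counts against a label-shuffling null."""
    rng = np.random.default_rng(seed)
    observed = count_pairs(adj, labels, n_labels)
    null = np.stack([
        count_pairs(adj, rng.permutation(labels), n_labels)
        for _ in range(n_perms)
    ])
    return (observed - null.mean(axis=0)) / null.std(axis=0)

# Hypothetical 6 x 6 grid; spots are neighbors if they share an edge
n_side = 6
rr, cc = np.meshgrid(np.arange(n_side), np.arange(n_side), indexing="ij")
spots = np.column_stack([rr.ravel(), cc.ravel()])
toy_adj = (np.abs(spots[:, None, :] - spots[None, :, :]).sum(axis=-1) == 1).astype(float)

# Two toy "regions": left half is label 0, right half is label 1
toy_labels = (spots[:, 1] >= n_side // 2).astype(int)

toy_z = enrichment_zscores(toy_adj, toy_labels, n_labels=2)
print(np.round(toy_z, 2))
```

Because each toy region is one compact block, the diagonal entries (same-region neighbors) come out positive and the off-diagonal entries negative. The exact values depend on the random seed and the number of permutations.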

In [None]:
# Calculate neighborhood enrichment between tissue regions
sq.gr.nhood_enrichment(sdata_visium.tables["table"], cluster_key="leiden_clusters")

# Visualize the enrichment matrix
fig, ax = plt.subplots(figsize=(10, 8))
sq.pl.nhood_enrichment(
    sdata_visium.tables["table"], 
    cluster_key="leiden_clusters",
    method="ward",  # Hierarchical clustering to group similar patterns
    cmap="RdBu_r",  # Red-blue colormap (red=enriched, blue=depleted)
    ax=ax
)
plt.title('Neighborhood Enrichment Between Tissue Regions')
plt.show()

print("\nInterpretation:")
print("Red (positive Z-score): Regions are neighbors more often than expected by chance")
print("Blue (negative Z-score): Regions avoid each other spatially") 
print("White (Z-score ≈ 0): Random spatial association")

## Part 5: Co-occurrence Analysis

**Biological Question:** How does the spatial association between regions change with distance?

**Co-occurrence analysis** extends beyond immediate neighbors to examine spatial relationships across multiple distance scales. It calculates the conditional probability of finding specific tissue regions together at increasing distances, revealing:

- **Short-range interactions**: Adjacent spots, consistent with direct cell-cell contacts or local signaling
- **Medium-range patterns**: Tissue architecture and zoning
- **Long-range organization**: Organ-level structure

This helps distinguish between direct cellular interactions and broader architectural organization.
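
The following sketch shows one way such a distance-dependent score can be computed, assuming it is defined as the ratio p(exp | cond) / p(exp) within each distance bin. The function `co_occurrence_ratio`, the one-dimensional toy layout, and the bin edges are illustrative assumptions rather than Squidpy's implementation.

```python
import numpy as np

def co_occurrence_ratio(coords, labels, cond, exp, bin_edges):
    """For each distance bin, return p(exp | near a cond spot) / p(exp)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    is_cond = labels == cond
    is_exp = labels == exp
    p_exp = is_exp.mean()  # overall frequency of the `exp` label
    ratios = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        # Spots whose distance to each `cond` spot falls in (lo, hi]
        in_bin = ((dist > lo) & (dist <= hi))[is_cond]
        n_total = in_bin.sum()
        n_exp = in_bin[:, is_exp].sum()
        ratios.append(n_exp / n_total / p_exp if n_total > 0 else np.nan)
    return np.array(ratios)

# Hypothetical layout: 8 spots on a line, first half region 0, second half region 1
toy_xy = np.column_stack([np.arange(8), np.zeros(8)]).astype(float)
toy_regions = np.array([0, 0, 0, 0, 1, 1, 1, 1])
toy_bins = np.array([0.0, 1.0, 3.0, 7.0])

toy_ratios = co_occurrence_ratio(toy_xy, toy_regions, cond=0, exp=0, bin_edges=toy_bins)
print(np.round(toy_ratios, 3))
```

A ratio above 1 means the region is found near itself more often than its overall frequency would suggest. In this toy layout the ratio falls from about 1.71 at the shortest distances to 0 at the longest, which is the kind of distance-dependent trend the co-occurrence plot displays.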

In [None]:
# Calculate co-occurrence across spatial dimensions
sq.gr.co_occurrence(sdata_visium.tables["table"], cluster_key="leiden_clusters")

# Visualize co-occurrence for the most abundant cluster
cluster_counts = sdata_visium.tables["table"].obs['leiden_clusters'].value_counts()
most_abundant_cluster = cluster_counts.index[0]

# Let squidpy create the figure itself; an extra plt.figure() call would only leave an empty figure behind
sq.pl.co_occurrence(
    sdata_visium.tables["table"],
    cluster_key="leiden_clusters", 
    clusters=most_abundant_cluster,
    figsize=(12, 6)  # Use figsize parameter instead of ax
)
plt.suptitle(f'Co-occurrence Analysis: Region {most_abundant_cluster}')
plt.show()

print(f"Analyzed co-occurrence for region {most_abundant_cluster} ({cluster_counts[most_abundant_cluster]} spots)")
print("Co-occurrence score = conditional probability of finding regions together at different distances")

## Part 6: Interactive Spatial Analysis (Instructor Demo)

**Note:** This section demonstrates interactive analysis using napari. For Docker users, this requires graphics setup, so follow along on the instructor's screen.

**Instructor will demonstrate:**
1. Loading spatial data in napari
2. Overlaying gene expression on tissue images
3. Interactive exploration of spatially variable genes
4. Manual annotation of regions of interest

In [None]:
# Interactive napari demonstration (instructor will run this live)
# Uncomment for live interactive session:
# import napari_spatialdata as nsd
# viewer = nsd.Interactive(sdata_visium)
# print("Napari viewer launched - follow along on instructor's screen")

## Part 7: Workshop Summary and Key Takeaways

### 🔬 Spatial Statistics Methods Learned:
- **Spatial Neighbors Graph**: Foundation for all spatial analyses
- **Moran's I**: Identifies genes with spatial expression patterns
- **Neighborhood Enrichment**: Reveals tissue region adjacency preferences
- **Co-occurrence Analysis**: Quantifies spatial associations across distances

### 🧬 Biological Insights:
- Spatial gene expression patterns reveal tissue architecture
- Tissue regions show non-random spatial organization
- Quantitative spatial analysis complements visual inspection
- Spatial patterns can suggest hypotheses about underlying molecular mechanisms

### 🚀 Next Steps for Your Research:
- Apply these methods to your own spatial datasets
- Compare spatial organization between conditions (healthy vs. disease)
- Integrate with single-cell RNA-seq for deeper cellular insights
- Explore advanced methods like spatial domain detection

### 📚 Resources:
- [Squidpy documentation](https://squidpy.readthedocs.io/)
- [SpatialData ecosystem](https://spatialdata.scverse.org/)
- [scverse community](https://discourse.scverse.org/)