# Notebook 4: Xenium Analysis and Spatial Statistics

**Tutor:** Anthony Christidis
**Time:** 45 minutes

---

In this notebook, we will apply our analysis skills to the high-resolution **Xenium data**. We will perform a full analysis from QC to clustering, and then use `squidpy` to ask advanced biological questions about the spatial organization of genes and cell types.

**Goals:**
1.  Perform QC and clustering on the Xenium dataset.
2.  Identify spatially organized genes using **Moran's I**.
3.  Analyze cell community structures using **neighborhood enrichment**.

### Part 1: Data Preparation and QC

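We will start by loading the raw Xenium `SpatialData` object and computing per-cell QC metrics. Rather than filtering right away, we create a boolean mask that flags the high-quality cells. The mask has one entry per cell in the original order. We can therefore apply the same mask to both the expression table and the cell shapes, which keeps them aligned as long as both store the cells in the same order.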

```python
%load_ext jupyter_black

import spatialdata as sd
import scanpy as sc
import squidpy as sq
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from matplotlib.lines import Line2D

import warnings
warnings.filterwarnings("ignore")

print("--- Loading and Preparing Xenium Data ---")
# 1. Load the original data: the expression table and the cell-circle shapes
sdata_xenium = sd.read_zarr("../data/xenium_lung_cancer_subset.zarr")
adata_full = sdata_xenium.tables["table"].copy()
shapes_full = sdata_xenium.shapes['cell_circles'].copy()

# 2. Calculate QC metrics on the full dataset
# (adds n_genes_by_counts and total_counts, among others, to adata_full.obs)
sc.pp.calculate_qc_metrics(adata_full, percent_top=(20, 50), inplace=True)

# 3. Create a boolean mask based on QC thresholds
# The mask has one entry per original cell; the thresholds are dataset-specific.
cells_passing_qc_mask = (
    (adata_full.obs.n_genes_by_counts >= 20) &
    (adata_full.obs.total_counts >= 50)
)
print(f"Identified {cells_passing_qc_mask.sum()} high-quality cells out of {adata_full.n_obs}.")
```
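To make the two QC metrics concrete, here is a minimal NumPy sketch of what `n_genes_by_counts` and `total_counts` measure and how the boolean mask combines them. The toy count matrix and the scaled-down thresholds are hypothetical; in the notebook, the real values come from `sc.pp.calculate_qc_metrics`.

```python
import numpy as np

# Hypothetical toy count matrix: 4 cells (rows) x 5 genes (columns)
counts = np.array([
    [10, 0, 5, 40, 0],
    [1, 1, 0, 0, 0],
    [30, 20, 10, 5, 1],
    [0, 0, 0, 60, 0],
])

# Per-cell metrics, mirroring the meaning of scanpy's QC columns
n_genes_by_counts = (counts > 0).sum(axis=1)  # genes with at least one count
total_counts = counts.sum(axis=1)             # total transcripts per cell

# Hypothetical thresholds scaled down for the toy data
# (the notebook uses 20 genes and 50 counts)
min_genes, min_counts = 3, 50
mask = (n_genes_by_counts >= min_genes) & (total_counts >= min_counts)

print(n_genes_by_counts.tolist())  # [3, 2, 5, 1]
print(total_counts.tolist())       # [55, 2, 66, 60]
print(mask.tolist())               # [True, False, True, False]
```

Because `mask` has one entry per cell, indexing any row-aligned object with it selects the same cells, which is how the notebook reuses its mask on the cell shapes.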

### Part 2: Analysis of Filtered Data

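Now we apply our boolean mask to create a new, clean `AnnData` object and run the standard `scanpy` workflow: normalization, log transformation, highly variable gene selection, PCA, a neighbor graph, Leiden clustering, and UMAP.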

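```python
# 4. Filter the AnnData object using the boolean mask
adata_xenium = adata_full[cells_passing_qc_mask].copy()

# 5. Run the clustering workflow
sc.pp.normalize_total(adata_xenium)  # scale every cell to the same total count
sc.pp.log1p(adata_xenium)  # log(1 + x) transform
# Xenium uses a targeted panel: if it has fewer than 2000 genes, scanpy keeps essentially all of them
sc.pp.highly_variable_genes(adata_xenium, n_top_genes=2000, flavor='seurat')
sc.pp.pca(adata_xenium, use_highly_variable=True)
sc.pp.neighbors(adata_xenium)  # kNN graph in PCA (expression) space, not physical space
sc.tl.leiden(adata_xenium, key_added="clusters")  # labels stored in adata_xenium.obs["clusters"]
sc.tl.umap(adata_xenium)

print("Processing and clustering complete.")
```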
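The first two preprocessing steps are simple arithmetic. With its default settings, `sc.pp.normalize_total` rescales each cell so that its counts sum to the median of the per-cell totals, and `sc.pp.log1p` then applies log(1 + x). Here is a minimal NumPy sketch of those two steps on a hypothetical toy matrix; it ignores scanpy options such as `target_sum` and `exclude_highly_expressed`.

```python
import numpy as np

# Hypothetical toy count matrix: 3 cells x 3 genes
counts = np.array([
    [2.0, 2.0, 0.0],   # total 4
    [5.0, 3.0, 2.0],   # total 10
    [10.0, 6.0, 4.0],  # total 20
])

totals = counts.sum(axis=1)
target = np.median(totals)  # default target: median of the per-cell totals (10 here)

# Scale each row so that it sums to the target
normalized = counts / totals[:, None] * target

# Natural log of (1 + x), as in sc.pp.log1p
logged = np.log1p(normalized)

print(target)
print(normalized)
```

The second and third toy cells have the same composition at different total counts, so they become identical after normalization; removing that depth effect is the purpose of the step.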

#### Visualization of Clusters

First, we visualize the clusters in UMAP space.

```python
sc.pl.umap(adata_xenium, color="clusters", title="Xenium Cell Clusters (UMAP)")
```

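Next, we add the spatial coordinates to our processed `AnnData` object. Because the mask is positional, applying it to the original shapes selects the coordinates of exactly the cells that passed QC, provided the shapes are stored in the same order as the table rows.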

```python
# Apply the same positional mask to the shapes (assumes shapes and table rows share one order).
shapes_filtered = shapes_full[cells_passing_qc_mask.to_numpy()]

# Stack centroid x/y into an (n_cells, 2) float array; a 1-D array of [x, y] lists cannot be indexed as coords[:, 0].
centroids = shapes_filtered.centroid
adata_xenium.obsm['spatial'] = np.column_stack([centroids.x.to_numpy(), centroids.y.to_numpy()])
assert adata_xenium.obsm['spatial'].shape == (adata_xenium.n_obs, 2)

print("Spatial coordinates added to the processed AnnData object.")
```
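The positional mask assumes that row *i* of the table and row *i* of `cell_circles` describe the same cell. A more defensive variant aligns the coordinates to the filtered table by cell ID. Below is a minimal pandas sketch of that idea. It assumes, hypothetically, that the IDs are available both as the shapes' index and as a `cell_id` column in `obs`, and it uses a plain `DataFrame` with `x`/`y` columns as a stand-in for the centroids.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the cell centroids, indexed by cell ID
centroids = pd.DataFrame(
    {"x": [0.0, 10.0, 20.0, 30.0], "y": [5.0, 15.0, 25.0, 35.0]},
    index=["cellA", "cellB", "cellC", "cellD"],
)

# Hypothetical stand-in for the filtered adata.obs: a subset of cells in a different order
obs = pd.DataFrame({"cell_id": ["cellD", "cellB"]}, index=["3", "1"])

# Look up each table row's centroid by ID rather than by position
aligned = centroids.loc[obs["cell_id"]]

# Stack into the (n_cells, 2) float array that obsm["spatial"] expects
spatial = aligned[["x", "y"]].to_numpy()
print(spatial)
```

Label-based lookup with `.loc` also fails loudly with a `KeyError` if a table cell has no matching shape, instead of silently pairing a cell with the wrong coordinates.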

Now we can visualize our clusters in physical space by plotting each cell's centroid directly with `matplotlib`.

```python
# Spatial scatter plot: one point per cell centroid, colored by Leiden cluster
coords = adata_xenium.obsm['spatial']
clusters = adata_xenium.obs['clusters'].astype('category')
categories = clusters.cat.categories  # same order as the category codes
cmap = plt.cm.tab20
palette = [cmap(i % cmap.N) for i in range(len(categories))]  # colors repeat beyond 20 clusters
point_colors = np.array(palette)[clusters.cat.codes.to_numpy()]

fig, ax = plt.subplots(1, 1, figsize=(10, 8))
ax.scatter(coords[:, 0], coords[:, 1], c=point_colors, s=5, alpha=0.8)
ax.set_title('Xenium Cell Clusters (Spatial View)')
ax.set_aspect('equal')

# Build the legend from the same palette and category order so colors match the points
legend_elements = [Line2D([0], [0], marker='o', color='w', markerfacecolor=palette[i],
                          markersize=8, label=f'Cluster {cluster}')
                   for i, cluster in enumerate(categories)]
ax.legend(handles=legend_elements, bbox_to_anchor=(1.05, 1), loc='upper left')

plt.tight_layout()
plt.show()
```

### Part 3: Spatial Statistics with `squidpy`

With our processed, clustered, and spatially aware `AnnData` object, we can now run the `squidpy` spatial analyses. Both rely on a spatial neighborhood graph, which we build first from the cell centroids.

```python
# Build the spatial neighborhood graph from the centroids in obsm['spatial'].
# delaunay=True connects cells that share an edge in the Delaunay triangulation.
print("Building spatial graph...")
sq.gr.spatial_neighbors(adata_xenium, coord_type="generic", delaunay=True)
print("Spatial graph built successfully.")
```
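With `delaunay=True`, `squidpy` links cells that share an edge in the Delaunay triangulation of their centroids and stores the graph as sparse matrices in `adata.obsp` (`spatial_connectivities` and `spatial_distances`). The sketch below is an illustration rather than squidpy's own code: it builds the corresponding binary adjacency matrix with `scipy.spatial.Delaunay` for five hypothetical points, the corners of a unit square plus its center.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial import Delaunay

# Hypothetical cell centroids: the corners of a unit square plus its center
coords = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [0.5, 0.5],
])

tri = Delaunay(coords)

# Any two vertices of the same triangle share a Delaunay edge; the set removes
# duplicates for edges that belong to two triangles.
edges = set()
for simplex in tri.simplices:
    for a in simplex:
        for b in simplex:
            if a != b:
                edges.add((int(a), int(b)))

rows, cols = zip(*sorted(edges))
n_cells = len(coords)
# Symmetric binary adjacency matrix, similar in spirit to obsp["spatial_connectivities"]
adj = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n_cells, n_cells))

degrees = np.asarray(adj.sum(axis=1)).ravel()
print(degrees.tolist())
```

The center ends up connected to all four corners, while each corner connects only to its two adjacent corners and the center; the square's diagonals are not edges. On real tissue, a Delaunay triangulation can also join cells across empty regions, which is worth remembering when interpreting the graph.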

#### Analysis 1: Neighborhood Enrichment
**Biological Question:** Which clusters (putative cell types) are found next to each other more often than expected by chance?

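```python
# Permutation test: z-score of observed vs. expected neighbor counts for every pair of clusters
sq.gr.nhood_enrichment(adata_xenium, cluster_key="clusters")
# Heatmap of the z-scores, with clusters ordered by Ward hierarchical clustering
sq.pl.nhood_enrichment(adata_xenium, cluster_key="clusters", method="ward", cmap="RdBu_r")
```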
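Conceptually, `sq.gr.nhood_enrichment` counts how many graph edges connect each pair of clusters and compares those counts with counts obtained after randomly permuting the cluster labels. It reports one z-score per pair, stored under `adata.uns["clusters_nhood_enrichment"]`. Positive z-scores mean two clusters are neighbors more often than expected by chance; negative ones mean they tend to avoid each other. Below is a minimal NumPy sketch of that permutation test on a hypothetical toy graph. It is not squidpy's implementation, and the function name `nhood_enrichment_zscore` is made up for illustration.

```python
import numpy as np


def nhood_enrichment_zscore(adj, labels, n_clusters, n_perms=1000, seed=0):
    """Hypothetical helper: z-scores of cluster-pair edge counts vs. label permutations.

    adj: dense (n_cells, n_cells) symmetric 0/1 adjacency matrix
    labels: integer cluster code per cell, in range(n_clusters)
    """
    rng = np.random.default_rng(seed)

    def pair_counts(lab):
        onehot = np.eye(n_clusters)[lab]  # (n_cells, n_clusters) indicator matrix
        return onehot.T @ adj @ onehot    # edges between every pair of clusters

    observed = pair_counts(labels)
    perms = np.stack([pair_counts(rng.permutation(labels)) for _ in range(n_perms)])
    mean, std = perms.mean(axis=0), perms.std(axis=0)
    # Avoid dividing by zero for pairs whose count never varies across permutations
    z = np.divide(observed - mean, std, out=np.zeros_like(observed), where=std > 0)
    return z, observed


# Toy tissue: 10 cells in a chain; cells 0-4 form cluster 0 and cells 5-9 form cluster 1
n = 10
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
labels = np.array([0] * 5 + [1] * 5)

z, observed = nhood_enrichment_zscore(adj, labels, n_clusters=2)
print(observed)
print(np.round(z, 2))
```

Here the two clusters are spatially segregated, so the within-cluster z-scores come out positive and the between-cluster z-score negative. One-hot encoding the labels turns the pair count into the single product `onehot.T @ adj @ onehot`; a dense adjacency is fine for a toy graph, but a real Xenium graph needs a sparse matrix.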