# Notebook 3: The `scanpy` Workflow & Spatial QC with Visium

**Tutor:** Anthony Christidis
**Time:** 45 minutes

---

Welcome to the computational analysis part of the workshop! Before we dive into our high-resolution Xenium data, let's learn the fundamental analysis workflow on a classic spatial technology: **10x Visium**.

In this notebook, we'll use `scanpy` for clustering and `squidpy` to demonstrate a best-practice Quality Control (QC) workflow that leverages spatial information. We will then apply a streamlined version of this workflow to our Xenium data to prepare it for the next notebook.

**Goals:**
1.  Perform a standard unsupervised clustering workflow on Visium data (`scanpy`).
2.  Visualize QC metrics (like total counts) in their physical space (`squidpy`).
3.  Visualize the final spot clusters on the tissue image.
4.  Apply a streamlined version of the workflow to Xenium data and save it for Notebook 4.

### Setup and Data Loading

First, let's import our libraries and load the Visium Glioblastoma dataset.

In [None]:
%load_ext jupyter_black

import spatialdata as sd
import scanpy as sc
import squidpy as sq

import warnings
warnings.filterwarnings("ignore")

sdata_visium = sd.read_zarr("../data/visium_glioblastoma_subset.zarr")
adata_visium = sdata_visium.tables["table"]

### Part 1: Visium Analysis - Spatial Quality Control

For spot-based data, visualizing QC metrics spatially is a critical first step. It can reveal technical issues like tissue detachment or slide artifacts.

First, we calculate the standard QC metrics using `scanpy`.

In [None]:
sc.pp.calculate_qc_metrics(adata_visium, percent_top=(20, 50), inplace=True)
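For intuition, here is a minimal NumPy sketch of three per-spot metrics that `calculate_qc_metrics` adds to `adata.obs`: `total_counts`, `n_genes_by_counts`, and the percentage of counts held by each spot's top-n genes (what `percent_top` controls). It assumes a small dense toy count matrix. scanpy itself also handles sparse matrices and computes per-gene metrics in `adata.var`.

```python
import numpy as np


def qc_metrics(X: np.ndarray, top_n: int = 20):
    """Per-spot QC metrics for a dense (spots x genes) count matrix."""
    total_counts = X.sum(axis=1)
    n_genes_by_counts = (X > 0).sum(axis=1)
    # Sort each spot's counts in descending order and keep the top_n largest
    top_counts = -np.sort(-X, axis=1)[:, :top_n]
    # Guard against empty spots to avoid dividing by zero
    safe_totals = np.where(total_counts > 0, total_counts, 1)
    pct_top = 100 * top_counts.sum(axis=1) / safe_totals
    return total_counts, n_genes_by_counts, pct_top


# Toy matrix: 3 spots x 4 genes
X = np.array([
    [5, 0, 3, 2],
    [0, 0, 1, 0],
    [10, 4, 0, 6],
])
total, n_genes, pct_top2 = qc_metrics(X, top_n=2)
print(total, n_genes, pct_top2)
```

In this toy example, the first spot has 10 counts spread over 3 genes, and 80% of those counts fall in its top two genes.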

Now we can use `squidpy`'s `spatial_scatter` function to plot these metrics on the tissue. It works well with Visium data because it reads the tissue image and scale factors stored in the `AnnData` object's metadata (`adata.uns["spatial"]`), so each spot is drawn at the correct position and size over the image.
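It helps to see this metadata lookup once. The following is a minimal sketch that assumes the layout scanpy uses for Visium: full-resolution pixel coordinates in `adata.obsm["spatial"]` and scale factors under `adata.uns["spatial"][library_id]["scalefactors"]`. The library id and all numbers below are hypothetical. Multiplying by `tissue_hires_scalef` maps spot centers onto the high-resolution image, and the same factor converts `spot_diameter_fullres` into an on-image diameter, which the `size` argument then scales.

```python
import numpy as np

# Hypothetical metadata following the layout scanpy uses for Visium:
# full-resolution pixel coordinates plus per-library scale factors.
uns_spatial = {
    "library_1": {  # hypothetical library id
        "scalefactors": {
            "tissue_hires_scalef": 0.08,
            "spot_diameter_fullres": 89.0,
        }
    }
}
spatial_fullres = np.array([[1000.0, 2000.0], [1500.0, 2500.0]])


def to_image_coords(coords, uns_spatial, library_id, img_res="hires"):
    """Map full-resolution spot coordinates and diameter onto a downscaled image."""
    scalefactors = uns_spatial[library_id]["scalefactors"]
    scale = scalefactors[f"tissue_{img_res}_scalef"]
    xy = coords * scale
    spot_diameter = scalefactors["spot_diameter_fullres"] * scale
    return xy, spot_diameter


xy, diameter = to_image_coords(spatial_fullres, uns_spatial, "library_1")
print(xy)  # spot centers in hires-image pixels
print(diameter)  # spot diameter in hires-image pixels (about 7.1)
```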

In [None]:
sq.pl.spatial_scatter(
    adata_visium,
    color=["total_counts", "n_genes_by_counts"],
    cmap="viridis",
    size=0.8,  # Scales the drawn spot diameter (0.8 = 80% of the true spot size)
    ncols=2,
    figsize=(12, 5),
)

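These plots are the first thing to check. In the left panel, patches of low `total_counts` can point to poor tissue quality or tissue detachment. In the right panel, `n_genes_by_counts` shows gene complexity (the number of genes detected per spot) across the tissue. Regions where both metrics are low are candidates for filtering.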

### Part 2: Visium Analysis - `scanpy` Clustering

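Based on our QC, let's filter out low-quality spots and rarely detected genes, then run the standard `scanpy` workflow to find transcriptionally distinct groups of spots, which should correspond to different tissue regions. The thresholds below (`min_counts=500`, `min_cells=10`) are starting points; adjust them after inspecting the QC plots for your own data.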

In [None]:
print(f"Spots before filtering: {adata_visium.n_obs}")
sc.pp.filter_cells(adata_visium, min_counts=500)
sc.pp.filter_genes(adata_visium, min_cells=10)
print(f"Spots after filtering: {adata_visium.n_obs}")
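Both filters reduce to boolean masks. The sketch below follows the same `>=` semantics: spots need at least `min_counts` total counts, and genes must be detected (count > 0) in at least `min_cells` spots. The thresholds in the toy example are hypothetical and much smaller than those used above. Because spots are filtered first, gene detection is counted only over the spots that survive.

```python
import numpy as np


def filter_masks(X: np.ndarray, min_counts: int, min_cells: int):
    """Boolean masks mirroring filter_cells(min_counts=...) then filter_genes(min_cells=...)."""
    keep_spots = X.sum(axis=1) >= min_counts
    # Genes are evaluated only on spots that survived the first filter
    detected_in = (X[keep_spots] > 0).sum(axis=0)
    keep_genes = detected_in >= min_cells
    return keep_spots, keep_genes


# Toy matrix: 4 spots x 3 genes, with small hypothetical thresholds
X = np.array([
    [3, 2, 0],
    [1, 0, 0],
    [0, 4, 1],
    [0, 3, 3],
])
keep_spots, keep_genes = filter_masks(X, min_counts=5, min_cells=2)
X_filtered = X[keep_spots][:, keep_genes]
print(keep_spots, keep_genes)
print(X_filtered.shape)  # (3, 2)
```

The second spot fails the count threshold. Once it is removed, gene 0 is detected in only one spot, so that gene is dropped as well.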

In [None]:
# Normalization and log-transformation
sc.pp.normalize_total(adata_visium, inplace=True)
sc.pp.log1p(adata_visium)
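By default, `normalize_total` rescales every spot so that its counts sum to the median total count across spots, and `log1p` then applies log(1 + x). Here is a minimal NumPy sketch of that arithmetic. It assumes a toy matrix with no empty spots. scanpy also handles sparse input and zero-count spots.

```python
import numpy as np


def normalize_log1p(X: np.ndarray, target_sum=None) -> np.ndarray:
    """Scale each spot to a common total, then apply log(1 + x)."""
    totals = X.sum(axis=1, keepdims=True).astype(float)
    if target_sum is None:
        # Default behaviour sketched here: use the median per-spot total
        target_sum = np.median(totals)
    return np.log1p(X / totals * target_sum)


X = np.array([
    [2, 2],
    [6, 2],
    [1, 3],
])
Y = normalize_log1p(X)
# Totals are 4, 8 and 4, so the median target is 4 and the second spot is halved to [3, 1]
print(np.expm1(Y))
```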

In [None]:
# Find highly variable genes
sc.pp.highly_variable_genes(adata_visium)
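With its default `seurat` flavor, `highly_variable_genes` looks for genes whose dispersion (variance / mean) is high compared with genes of similar mean expression. The sketch below is deliberately simplified. It undoes `log1p`, computes each gene's dispersion, and ranks genes by that value. It omits the binning by mean expression and the within-bin normalization that the real implementation applies, so treat it as intuition only.

```python
import numpy as np


def top_dispersion_genes(X_log: np.ndarray, n_top: int) -> np.ndarray:
    """Rank genes by dispersion (variance / mean); simplified, no mean binning."""
    X = np.expm1(X_log)  # undo log1p to recover normalized counts
    mean = X.mean(axis=0)
    var = X.var(axis=0, ddof=1)
    # All-zero genes get a dispersion of 0 instead of dividing by zero
    dispersion = np.divide(var, mean, out=np.zeros_like(var), where=mean > 0)
    return np.argsort(-dispersion)[:n_top]


counts = np.array([
    [1.0, 0.0, 5.0],
    [1.0, 10.0, 5.0],
    [1.0, 0.0, 5.0],
    [1.0, 10.0, 6.0],
])
hvg_idx = top_dispersion_genes(np.log1p(counts), n_top=2)
print(hvg_idx)  # gene 1 (on/off pattern) ranks first; the constant gene 0 ranks last
```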

In [None]:
# Run PCA on the highly variable genes
# (scanpy >= 1.10 deprecates `use_highly_variable` in favour of `mask_var="highly_variable"`)
sc.pp.pca(adata_visium, use_highly_variable=True)
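PCA here is a truncated SVD of the gene-centered expression matrix. The following minimal NumPy sketch shows what ends up in `adata.obsm["X_pca"]` (spot scores), `adata.varm["PCs"]` (gene loadings) and `adata.uns["pca"]["variance_ratio"]`. scanpy uses optimized solvers rather than a full SVD, so component signs may differ from this sketch.

```python
import numpy as np


def pca_svd(X: np.ndarray, n_comps: int):
    """PCA via SVD of the gene-centered matrix."""
    Xc = X - X.mean(axis=0)  # center each gene (no scaling to unit variance)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_comps] * S[:n_comps]  # analogous to obsm["X_pca"]
    loadings = Vt[:n_comps].T  # analogous to varm["PCs"]
    variance = S**2 / (X.shape[0] - 1)
    # Fraction of total variance per component, analogous to uns["pca"]["variance_ratio"]
    variance_ratio = variance[:n_comps] / variance.sum()
    return scores, loadings, variance_ratio


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))  # 50 toy spots x 10 genes
scores, loadings, variance_ratio = pca_svd(X, n_comps=3)
print(scores.shape, loadings.shape, variance_ratio)
```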

In [None]:
# Build neighborhood graph and run Leiden clustering
sc.pp.neighbors(adata_visium)
sc.tl.leiden(adata_visium, key_added="clusters")
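`sc.pp.neighbors` connects each spot to its nearest neighbors in PCA space (15 by default), and Leiden then partitions that graph into communities. The sketch below builds only the first part: a symmetric, binary k-nearest-neighbor graph computed from exact Euclidean distances. scanpy instead stores weighted, UMAP-style connectivities in `adata.obsp["connectivities"]` and can use approximate neighbor search, so this is a simplification.

```python
import numpy as np


def knn_adjacency(X_pca: np.ndarray, n_neighbors: int = 15) -> np.ndarray:
    """Symmetric binary kNN graph from exact Euclidean distances."""
    diff = X_pca[:, None, :] - X_pca[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude each spot from its own neighbor list
    nearest = np.argsort(dist, axis=1)[:, :n_neighbors]
    adjacency = np.zeros(dist.shape, dtype=bool)
    rows = np.repeat(np.arange(X_pca.shape[0]), n_neighbors)
    adjacency[rows, nearest.ravel()] = True
    return adjacency | adjacency.T  # keep an edge if either spot picked the other


# Two well-separated toy groups of three spots in a 2-D "PCA space"
X_pca = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])
A = knn_adjacency(X_pca, n_neighbors=2)
print(A.astype(int))
```

With two well-separated groups, the graph splits into two disconnected components, which a community-detection method such as Leiden would report as two clusters.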

In [None]:
# Compute UMAP for visualization in abstract space
sc.tl.umap(adata_visium)

### Part 3: Visualizing the Visium Results

Let's visualize the clusters we found, both in UMAP space and back on the tissue.

In [None]:
sc.pl.umap(adata_visium, color="clusters", title="Spot Clusters (UMAP)")

In [None]:
# Use squidpy again to plot the clusters spatially, overlaid on the tissue image
sq.pl.spatial_scatter(
    adata_visium,
    color="clusters",
    size=0.8, # This makes the spots slightly smaller than their true size to see borders
    title="Spot Clusters (Spatial View)",
    frameon=False
)

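Excellent! We have now run a full analysis on Visium data. Compare the spatial cluster map with the underlying histology: if the clustering has worked, the clusters `scanpy` found should form spatially coherent domains that line up with distinct histological regions.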
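One way to put a number on "spatially coherent" is to check how often a spot's nearest physical neighbors share its cluster label. The function below is a hypothetical helper written for illustration, not a scanpy or squidpy API. The default `k=6` reflects the hexagonal Visium grid, where each interior spot has six immediate neighbors. Values near 1 indicate contiguous domains. Values close to what random labels would give (roughly 1/K for K equally sized clusters) suggest spatially scattered labels.

```python
import numpy as np


def spatial_label_agreement(coords, labels, k: int = 6) -> np.ndarray:
    """Hypothetical helper: per-spot fraction of k nearest spatial neighbors sharing its label."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a spot is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    return (labels[nearest] == labels[:, None]).mean(axis=1)


# Toy line of six spots: a cluster boundary sits between spots 1 and 2
coords = [[0, 0], [1, 0], [2, 0], [10, 0], [11, 0], [12, 0]]
labels = ["a", "a", "b", "b", "b", "b"]
agreement = spatial_label_agreement(coords, labels, k=2)
print(agreement, agreement.mean())
```

On the real data, you might call `spatial_label_agreement(adata_visium.obsm["spatial"], adata_visium.obs["clusters"])`, assuming the spot coordinates live in `obsm["spatial"]`. The full pairwise distance matrix grows quadratically with the number of spots, so for large datasets a KD-tree (for example `scipy.spatial.cKDTree`) is a better choice.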

### Part 4: Preparing the Xenium Data for the Next Notebook

Now that we've walked through the workflow, we will apply the same steps to our Xenium data in a single cell. We will then save the processed object so we can go straight into advanced spatial statistics in the next notebook.

In [None]:
print("--- Processing Xenium Data ---")

# 1. Load Data
sdata_xenium = sd.read_zarr("../data/xenium_lung_cancer_subset.zarr")
adata_xenium = sdata_xenium.tables["table"].copy()

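# 2. Filter low-quality cells and rarely detected genes
# (Xenium cells carry far fewer transcripts than Visium spots, so min_counts is much lower; tune it for your data)
sc.pp.filter_cells(adata_xenium, min_counts=10)
sc.pp.filter_genes(adata_xenium, min_cells=10)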

# 3. Normalize and Log-transform
sc.pp.normalize_total(adata_xenium)
sc.pp.log1p(adata_xenium)

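# 4. HVGs, PCA, Neighbors, Leiden, UMAP
# Xenium measures a targeted gene panel; if it holds fewer than 2000 genes, most or all may be flagged as HVGs
sc.pp.highly_variable_genes(adata_xenium, n_top_genes=2000, flavor="seurat")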
sc.pp.pca(adata_xenium, use_highly_variable=True)
sc.pp.neighbors(adata_xenium)
sc.tl.leiden(adata_xenium, key_added="clusters")
sc.tl.umap(adata_xenium)

print(f"Xenium data processed. Found {len(adata_xenium.obs['clusters'].unique())} clusters.")

In [None]:
# 5. Save the processed AnnData object
import os
os.makedirs("../data/processed", exist_ok=True)

print("Saving processed Xenium AnnData object...")
adata_xenium.write("../data/processed/adata_xenium_processed.h5ad")
print("Done. We are ready for Notebook 4.")