# Part 4: Cell Typing and Spatial Community Analysis

**Tutor:** Anthony Christidis
**Time:** 40 minutes

---

Now that we've seen how to explore and annotate our data, let's perform a complete downstream analysis. Our goal is to take the raw gene counts from a high-resolution Xenium experiment and turn them into meaningful biological insights: cell type clusters and their spatial organization.

**Goals:**
1. Perform a standard unsupervised clustering workflow using `scanpy`.
2. Visualize the identified cell clusters in both UMAP space and physical space.
3. Use `squidpy` to analyze the spatial organization of these clusters (e.g., neighborhood enrichment).

In [None]:
%load_ext jupyter_black

import spatialdata as sd
import spatialdata_plot as sdp  # importing this registers the `.pl` plotting accessor
import scanpy as sc
import squidpy as sq
import matplotlib.pyplot as plt

data_path = "../data/"
sdata_xenium = sd.read_zarr(data_path + "xenium_lung_cancer_subset.zarr")

### Step 1: Unsupervised Clustering with `scanpy`

We'll start with the `AnnData` table from our Xenium `SpatialData` object. We will apply a standard `scanpy` workflow to cluster the cells based on their gene expression profiles. This is the same workflow you would use for non-spatial single-cell RNA-seq data.

In [None]:
adata = sdata_xenium.tables["table"]

# 1. QC and Filtering
# In a real analysis, we would do more extensive QC. For the workshop, we'll use pre-filtered data.
# But let's calculate some metrics for inspection.
sc.pp.calculate_qc_metrics(adata, percent_top=(20, 50), inplace=True)

# 2. Normalization and Log-transformation
sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)

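# 3. Feature selection (optional)
# Targeted Xenium panels usually contain only a few hundred genes. If this panel
# has fewer than 2,000 genes, the call below keeps essentially every gene.
sc.pp.highly_variable_genes(adata, flavor="seurat", n_top_genes=2000)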

# 4. Dimensionality Reduction
sc.pp.pca(adata)

# 5. Neighbor graph, Leiden clustering, and UMAP embedding
sc.pp.neighbors(adata)
sc.tl.leiden(adata, resolution=0.5)
sc.tl.umap(adata)
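
To make the normalization step less of a black box, here is a minimal NumPy sketch of what `normalize_total` followed by `log1p` does to a count matrix. The toy `counts` array is invented for illustration, and the sketch assumes scanpy's default of scaling each cell to the median total count when no `target_sum` is passed.

```python
import numpy as np

# Hypothetical toy matrix: 3 cells (rows) x 2 genes (columns)
counts = np.array([[1.0, 3.0], [2.0, 2.0], [10.0, 10.0]])

# Total counts per cell, kept as a column so it broadcasts across genes
totals = counts.sum(axis=1, keepdims=True)

# Assumed default target: the median of the per-cell totals
target = np.median(totals)

# Rescale every cell so that its counts sum to the target
normalized = counts / totals * target

# log(1 + x) compresses the dynamic range while keeping zeros at zero
logged = np.log1p(normalized)

print(normalized.sum(axis=1))
```

After rescaling, every cell has the same total, so differences in sequencing depth or cell size no longer dominate the distances used for PCA and clustering.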

### Step 2: Visualizing the Clusters

Now that we have our `leiden` clusters, let's visualize them. First, we'll look at them in the abstract UMAP space.

In [None]:
sc.pl.umap(adata, color="leiden", legend_loc="on data")

This is useful, but the real power comes from projecting these clusters back into physical space. We can use `spatialdata-plot` to color the cell shapes by our new `leiden` annotation.

In [None]:
# The leiden clusters are now in adata.obs, which is sdata_xenium.tables["table"].obs
# We can directly use this column to color the cell shapes.

sdata_xenium.pl.render_shapes(
    element="cell_boundaries",
    color="leiden",
    fill_alpha=0.6,
).pl.show(figsize=(8, 8))

### Step 3: Analyzing Spatial Organization with `squidpy`

We can clearly see that the clusters are not randomly distributed. Some clusters form distinct neighborhoods. `squidpy` provides tools to quantify these relationships.

#### Neighborhood Enrichment
A key question is: **"Are certain cell types more likely to be neighbors than others?"** We can answer this by calculating the neighborhood enrichment score.

In [None]:
# First, we need to build a spatial graph, just like we did for the Visium data
# Here, we connect cells based on their physical coordinates.
sq.gr.spatial_neighbors(adata, coord_type="generic", delaunay=True)

# Calculate the neighborhood enrichment score
sq.gr.nhood_enrichment(adata, cluster_key="leiden")

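Let's visualize the result as a heatmap. Each entry is a z-score that compares the observed number of neighbor links between two clusters with the number obtained when cluster labels are randomly permuted. A high positive score means two clusters are found together more often than expected by chance (they are 'enriched' neighbors). A negative score means they are neighbors less often than expected (they 'avoid' each other).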
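
The idea behind such an enrichment score can be sketched with a small permutation test in plain NumPy. This is an illustrative simplification, not squidpy's implementation: the helper `nhood_zscore`, the toy `edges` and `labels`, and the number of permutations are all assumptions made up for this example.

```python
import numpy as np


def nhood_zscore(edges, labels, n_clusters, n_perms=200, seed=0):
    """Permutation z-score of cluster-pair edge counts (illustrative only)."""
    rng = np.random.default_rng(seed)

    def pair_counts(lab):
        counts = np.zeros((n_clusters, n_clusters))
        # np.add.at accumulates correctly when index pairs repeat
        np.add.at(counts, (lab[edges[:, 0]], lab[edges[:, 1]]), 1)
        return counts + counts.T  # count each undirected edge in both directions

    observed = pair_counts(labels)
    # Null distribution: same graph, cluster labels shuffled across cells
    permuted = np.stack(
        [pair_counts(rng.permutation(labels)) for _ in range(n_perms)]
    )
    return (observed - permuted.mean(axis=0)) / permuted.std(axis=0)


# Hypothetical toy tissue: 20 cells on a line, each linked to the next one.
# The first 10 cells belong to cluster 0, the last 10 to cluster 1.
labels = np.repeat([0, 1], 10)
edges = np.column_stack([np.arange(19), np.arange(1, 20)])

z = nhood_zscore(edges, labels, n_clusters=2)
print(np.round(z, 1))
```

Because the two toy clusters sit in separate blocks, the diagonal entries come out positive (cells mostly neighbor their own cluster) and the off-diagonal entries negative.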

In [None]:
sq.pl.nhood_enrichment(adata, cluster_key="leiden", method="ward")

#### Co-occurrence

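Another way to look at spatial organization is co-occurrence. It measures, over a range of distances, how much more (or less) likely we are to see one cluster near another than we would expect from that cluster's overall frequency. A ratio above 1 indicates co-occurrence at that distance; a ratio below 1 indicates depletion.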
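
As a rough sketch of the quantity involved, the snippet below computes a co-occurrence style ratio at a single radius in plain NumPy. It is a simplification under stated assumptions: squidpy evaluates the score over a series of distance intervals, whereas the hypothetical helper `co_occurrence_ratio` uses one fixed `radius`, and the coordinates and labels are invented toy data.

```python
import numpy as np


def co_occurrence_ratio(coords, labels, cond, radius):
    """Ratio p(cluster | near a `cond` cell) / p(cluster) for every cluster."""
    # Pairwise Euclidean distances between all cells
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    near = (dist <= radius) & ~np.eye(len(coords), dtype=bool)  # exclude self
    # Labels of all cells lying within `radius` of any cell from `cond`
    neighbour_labels = labels[np.nonzero(near[labels == cond])[1]]
    return {
        int(c): float(np.mean(neighbour_labels == c) / np.mean(labels == c))
        for c in np.unique(labels)
    }


# Hypothetical toy tissue: two well-separated blobs, one per cluster
rng = np.random.default_rng(0)
coords = np.vstack(
    [rng.normal(0, 1, (30, 2)), rng.normal(0, 1, (30, 2)) + [20, 0]]
)
labels = np.repeat([0, 1], 30)

ratios = co_occurrence_ratio(coords, labels, cond=0, radius=3.0)
print(ratios)
```

In this toy layout the ratio for cluster 0 is above 1 and the ratio for cluster 1 is below 1, because cells of cluster 0 only have other cluster 0 cells within the chosen radius.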

Let's see how often cluster `3` co-occurs with other clusters.

In [None]:
sq.gr.co_occurrence(adata, cluster_key="leiden")

sq.pl.co_occurrence(adata, cluster_key="leiden", clusters="3")

We've now gone from raw counts to cell clusters, which serve as candidate cell types, and have started to quantify their spatial organization.

In the final notebook, we will tackle one of the most powerful applications of `SpatialData`: integrating data from two completely different technologies.