# Class 1 - Notebook 3 Part 2: Spatial Transcriptomics

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/duttaprat/BMI_503/blob/main/Class1_Genomics/notebook3_part2_spatial.ipynb)

**Course**: BMI 503  
**Instructors**: Prof. Ramana Davuluri & Prof. Fusheng Wang  
**Institution**: Stony Brook University

## Learning Objectives
1. Understand spatial transcriptomics technology
2. Load and explore Visium data
3. Visualize gene expression spatially
4. Identify spatial domains

## What is Spatial Transcriptomics?

### The Problem with Traditional RNA-seq

```
Tissue → Grind up → Extract RNA → Sequence
           ❌ LOSE SPATIAL INFO!
```

We don't know:
- WHERE genes were expressed
- HOW cells interact spatially
- WHAT spatial patterns exist

### The Solution

```
Tissue → Place on Array → Image + Sequence
              ↓
    PRESERVE LOCATION!
```

## 10x Genomics Visium

**How it works:**
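1. A tissue section is placed on a slide with a barcoded capture area
2. The section is stained with H&E and imaged
3. The capture area holds ~5,000 spots, each with a unique spatial barcode
4. Each spot is 55 μm in diameter (~1-10 cells)
5. mRNA released from the tissue binds to the barcoded probes on the spot beneath it
6. Sequencing reads carry the spot barcode, so each transcript maps back to a location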

**Result**: Gene expression + location + image
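
The spot layout can be pictured as a staggered (hexagonal) grid. The sketch below is illustrative only. It assumes the commonly described Visium layout of 78 rows with 64 spots each and a 100 μm center-to-center spacing, neither of which is stated in this notebook. `make_spot_grid` is a hypothetical helper, not a library function.

```python
import numpy as np

SPOT_SPACING_UM = 100.0  # assumed center-to-center distance between neighboring spots


def make_spot_grid(n_rows=78, n_per_row=64, spacing=SPOT_SPACING_UM):
    """Return an (n_spots, 2) array of x/y spot centers on a staggered grid."""
    coords = []
    for row in range(n_rows):
        # Odd rows are shifted by half a spacing, which makes the grid hexagonal
        x_offset = spacing / 2 if row % 2 else 0.0
        y = row * spacing * np.sqrt(3) / 2
        for col in range(n_per_row):
            coords.append((col * spacing + x_offset, y))
    return np.asarray(coords)


grid = make_spot_grid()
print(f"Spots in the toy grid: {len(grid)}")

# Distance from the first spot to its nearest neighbor
d = np.linalg.norm(grid[1:] - grid[0], axis=1)
print(f"Nearest-neighbor distance: {d.min():.1f} um")
```

Under these assumptions the grid has 78 × 64 = 4,992 positions, which is where the "~5,000 spots" figure comes from.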

In [None]:
!pip install scanpy squidpy -q
print("✅ Installed!")

In [None]:
import warnings
warnings.filterwarnings('ignore')
import scanpy as sc
import squidpy as sq
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sc.set_figure_params(dpi=80, facecolor='white')
print("📦 Libraries loaded!")

## 1. Load Data

In [None]:
adata = sq.datasets.visium_hne_adata()
print("📦 Loaded!\n")
print(f"Spots: {adata.n_obs}")
print(f"Genes: {adata.n_vars}")
print(f"Tissue: Mouse brain")

## 2. Data Structure

In [None]:
print("🔍 Data components:\n")
print(f"adata.X: Expression matrix ({adata.X.shape})")
print(f"adata.obs: Spot metadata ({adata.obs.shape})")
print(f"adata.var: Gene metadata ({adata.var.shape})")
print(f"adata.obsm['spatial']: Coordinates ({adata.obsm['spatial'].shape})")
print(f"\nFirst 3 spots:")
print(adata.obs.head(3))
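
To make the layout concrete, here is a toy stand-in built from plain NumPy arrays with made-up values. `toy` is a hypothetical dictionary that only mirrors the attribute names printed above; it is not an AnnData object.

```python
import numpy as np

rng = np.random.default_rng(0)
n_spots, n_genes = 5, 3

toy = {
    # Expression matrix: one row per spot, one column per gene
    "X": rng.poisson(2.0, size=(n_spots, n_genes)),
    # Spot barcodes (row labels) and gene names (column labels), made up here
    "obs_names": [f"spot_{i}" for i in range(n_spots)],
    "var_names": ["GeneA", "GeneB", "GeneC"],
    # Image coordinates of each spot, one (x, y) pair per row of X
    "spatial": rng.uniform(0, 1000, size=(n_spots, 2)),
}

# Rows of X, obs_names and spatial all refer to the same spots, in the same order
assert toy["X"].shape == (len(toy["obs_names"]), len(toy["var_names"]))
assert toy["spatial"].shape == (n_spots, 2)

# Look up expression and position for one spot/gene pair
i = toy["obs_names"].index("spot_2")
j = toy["var_names"].index("GeneB")
print(f"spot_2 at {toy['spatial'][i].round(1)} has GeneB count {toy['X'][i, j]}")
```

The key point is the shared row order: the k-th row of the expression matrix and the k-th row of the coordinate array describe the same spot.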

## 3. Visualize Tissue

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(16, 8))

sq.pl.spatial_scatter(adata, img=True, size=0, ax=axes[0])
axes[0].set_title('H&E Image', fontsize=14, fontweight='bold')

sq.pl.spatial_scatter(adata, img=True, size=1.5, ax=axes[1])
axes[1].set_title('H&E + Spots', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()
print("🧠 Mouse brain coronal section")

## 4. Quality Control

In [None]:
sc.pp.calculate_qc_metrics(adata, inplace=True)

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

sq.pl.spatial_scatter(adata, color='total_counts', size=1.5, ax=axes[0])
axes[0].set_title('Total UMI Counts')

sq.pl.spatial_scatter(adata, color='n_genes_by_counts', size=1.5, ax=axes[1])
axes[1].set_title('Number of Genes')

axes[2].scatter(adata.obs['total_counts'], adata.obs['n_genes_by_counts'], 
                alpha=0.5, s=10)
axes[2].set_xlabel('Total UMI')
axes[2].set_ylabel('Number of Genes')
axes[2].set_title('QC Scatter')

plt.tight_layout()
plt.show()
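
The two metrics plotted above reduce to simple row-wise summaries of the count matrix. The sketch below recomputes them on a toy matrix with NumPy. The filtering thresholds at the end are hypothetical and chosen only for this example; the notebook itself does not filter spots.

```python
import numpy as np

# Toy count matrix: 4 spots (rows) x 5 genes (columns)
counts = np.array([
    [10, 0, 3, 0, 1],
    [0, 0, 0, 0, 2],
    [5, 5, 5, 5, 5],
    [0, 1, 0, 0, 0],
])

total_counts = counts.sum(axis=1)             # UMIs per spot
n_genes_by_counts = (counts > 0).sum(axis=1)  # genes with at least one count per spot

print("total_counts:     ", total_counts)
print("n_genes_by_counts:", n_genes_by_counts)

# Hypothetical thresholds, chosen only for this toy example
keep = (total_counts >= 2) & (n_genes_by_counts >= 2)
print("spots kept:", np.flatnonzero(keep))
```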

## 5. Preprocessing

In [None]:
sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, flavor='seurat', n_top_genes=2000)
print(f"✅ Normalized")
print(f"✅ Log-transformed")
print(f"✅ Found {adata.var['highly_variable'].sum()} variable genes")
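
The two transforms above can be sketched in NumPy. This is a simplified illustration, not scanpy's implementation: it assumes the target library size is the median of the per-spot totals, and it adds a hypothetical helper, `looks_like_raw_counts`, as a rough check that a matrix still holds raw counts before it is normalized. Some packaged datasets ship already normalized, and applying these steps twice would distort the values.

```python
import numpy as np

# Toy count matrix: 3 spots x 3 genes (no all-zero spots, so no division by zero)
counts = np.array([
    [10.0, 0.0, 30.0],
    [1.0, 1.0, 2.0],
    [0.0, 5.0, 15.0],
])


def looks_like_raw_counts(x):
    """Heuristic: raw UMI counts are non-negative whole numbers."""
    return bool(np.all(x >= 0) and np.allclose(x, np.round(x)))


def normalize_and_log(x):
    """Scale every spot to the median library size, then apply log(1 + x)."""
    totals = x.sum(axis=1, keepdims=True)
    target = np.median(totals)
    return np.log1p(x / totals * target)


print("raw counts before transform?", looks_like_raw_counts(counts))
normed = normalize_and_log(counts)
print("raw counts after transform? ", looks_like_raw_counts(normed))

# Undoing the log shows that every spot now has the same total
print("per-spot totals:", np.expm1(normed).sum(axis=1))
```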

## 6. Visualize Gene Expression

In [None]:
genes = ['Gfap', 'Snap25', 'Mbp', 'Hbb-bs']

fig, axes = plt.subplots(2, 2, figsize=(16, 16))
axes = axes.flatten()

for i, gene in enumerate(genes):
    sq.pl.spatial_scatter(
        adata, 
        color=gene, 
        size=1.5,
        ax=axes[i],
        title=f'{gene} Expression'
    )

plt.tight_layout()
plt.show()

print("🧬 Gene markers:")
print("  Gfap: Astrocytes")
print("  Snap25: Neurons")
print("  Mbp: Oligodendrocytes")
print("  Hbb-bs: Red blood cells (highlights blood vessels)")

## 7. Dimensionality Reduction

In [None]:
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

sc.pl.umap(adata, color='total_counts', ax=axes[0], show=False)
axes[0].set_title('UMAP: UMI Counts')

sc.pl.umap(adata, color='n_genes_by_counts', ax=axes[1], show=False)
axes[1].set_title('UMAP: Gene Counts')

plt.tight_layout()
plt.show()
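
As a rough picture of what the PCA step does, the sketch below runs PCA on a toy matrix using a singular value decomposition in NumPy. It is a minimal illustration under made-up data, not scanpy's implementation, which adds options such as restricting to highly variable genes and choosing a solver.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 50 spots x 10 genes, with one strong shared axis of variation
latent = rng.normal(size=(50, 1))
X = latent @ rng.normal(size=(1, 10)) + 0.1 * rng.normal(size=(50, 10))

Xc = X - X.mean(axis=0)                        # center each gene
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S                                 # spot coordinates in PC space
explained = S**2 / np.sum(S**2)                # fraction of variance per component

print("variance explained by PC1:", round(float(explained[0]), 3))
print("PC scores shape:", scores[:, :2].shape)
```

The neighbor graph and UMAP are then built on the leading columns of `scores` rather than on the full gene matrix.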

## 8. Spatial Clustering

In [None]:
sc.tl.leiden(adata, resolution=0.5)

fig, axes = plt.subplots(1, 2, figsize=(16, 8))

sq.pl.spatial_scatter(adata, color='leiden', size=1.5, ax=axes[0])
axes[0].set_title('Spatial Domains', fontsize=14, fontweight='bold')

sc.pl.umap(adata, color='leiden', ax=axes[1], show=False)
axes[1].set_title('UMAP Clusters', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

print(f"🎯 Found {adata.obs['leiden'].nunique()} expression clusters (candidate spatial domains)")
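
Leiden clustering here uses the expression-based neighbor graph, not spot positions, so the clusters count as spatial domains only to the extent that they turn out to be contiguous on the tissue. One simple way to quantify that is the fraction of adjacent spots that share a label. The sketch below is a toy example on a square grid (Visium spots actually sit on a hexagonal grid), and `same_label_fraction` is a hypothetical helper.

```python
import numpy as np


def same_label_fraction(labels_2d):
    """Fraction of horizontally/vertically adjacent grid cells with equal labels."""
    labels_2d = np.asarray(labels_2d)
    horiz = labels_2d[:, :-1] == labels_2d[:, 1:]
    vert = labels_2d[:-1, :] == labels_2d[1:, :]
    return (horiz.sum() + vert.sum()) / (horiz.size + vert.size)


# Two contiguous domains (left half vs right half)
contiguous = np.array([[0, 0, 1, 1]] * 4)
# The same two labels, scattered in a checkerboard
scattered = np.indices((4, 4)).sum(axis=0) % 2

print("contiguous:", same_label_fraction(contiguous))
print("scattered: ", same_label_fraction(scattered))
```

Values close to 1 indicate compact regions; values close to 0 indicate labels that are interleaved across the tissue.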

## 9. Find Marker Genes

In [None]:
sc.tl.rank_genes_groups(adata, 'leiden', method='wilcoxon')
sc.pl.rank_genes_groups(adata, n_genes=10, sharey=False)
plt.show()

print("🔬 Top markers for cluster 0:")
markers = sc.get.rank_genes_groups_df(adata, group='0')
print(markers.head())
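
The Wilcoxon rank-sum test asks whether a gene's values in one cluster tend to rank higher than in the remaining spots. Its statistic is equivalent to the Mann-Whitney U statistic, which the sketch below computes for a single gene on made-up values and rescales to the 0-1 range. This is an illustration of the idea only: it computes no p-value and is not how scanpy implements the test. `rank_sum_auc` is a hypothetical helper.

```python
import numpy as np


def rank_sum_auc(in_group, rest):
    """Mann-Whitney U statistic scaled to [0, 1]; ties count as one half."""
    in_group = np.asarray(in_group, dtype=float)[:, None]
    rest = np.asarray(rest, dtype=float)[None, :]
    u = (in_group > rest).sum() + 0.5 * (in_group == rest).sum()
    return u / (in_group.size * rest.size)


# Toy log-expression values for one gene
cluster_0 = [2.1, 2.5, 3.0, 2.8]
other_spots = [0.0, 0.4, 0.0, 1.1, 0.2]

auc = rank_sum_auc(cluster_0, other_spots)
mean_diff = np.mean(cluster_0) - np.mean(other_spots)
print(f"AUC = {auc:.2f}, mean difference = {mean_diff:.2f}")
```

A value of 1 means every spot in the cluster expresses the gene more highly than every other spot; 0.5 means no separation.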

## 10. Spatial Patterns

In [None]:
sq.gr.spatial_neighbors(adata)
sq.gr.spatial_autocorr(adata, mode='moran')

top_spatial = adata.uns['moranI'].head(10)
print("📊 Top spatially variable genes:")
print(top_spatial[['I', 'pval_norm_fdr_bh']])

fig, axes = plt.subplots(2, 3, figsize=(18, 12))
axes = axes.flatten()

for i, gene in enumerate(top_spatial.index[:6]):
    sq.pl.spatial_scatter(
        adata,
        color=gene,
        size=1.5,
        ax=axes[i],
        title=f"{gene} (Moran's I={top_spatial.loc[gene, 'I']:.3f})"
    )

plt.tight_layout()
plt.show()
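
Moran's I compares each spot's deviation from the mean with the deviations of its neighbors: I = (N / W) · Σᵢⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where z is the mean-centered expression, wᵢⱼ the spatial weights, and W their sum. The sketch below evaluates this formula on a toy 4 × 4 square grid with binary neighbor weights. These are simplifying assumptions for illustration; `morans_i` and `grid_adjacency` are hypothetical helpers, not squidpy functions.

```python
import numpy as np


def morans_i(values, weights):
    """Moran's I for a 1-D array of values and a square spatial weight matrix."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    w = np.asarray(weights, dtype=float)
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)


def grid_adjacency(n):
    """Binary weights linking horizontally/vertically adjacent cells of an n x n grid."""
    w = np.zeros((n * n, n * n))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:
                w[i, i + 1] = w[i + 1, i] = 1
            if r + 1 < n:
                w[i, i + n] = w[i + n, i] = 1
    return w


n = 4
w = grid_adjacency(n)
rows, cols = np.indices((n, n))
gradient = cols.ravel()                     # expression rises from left to right
checkerboard = ((rows + cols) % 2).ravel()  # neighbors always differ

print(f"gradient:     I = {morans_i(gradient, w):.3f}")
print(f"checkerboard: I = {morans_i(checkerboard, w):.3f}")
```

The smooth gradient gives a positive I (neighbors resemble each other), while the checkerboard gives I = -1 (neighbors are always opposite).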

## Summary

### What We Learned
1. ✅ Spatial transcriptomics preserves tissue context
2. ✅ Visium technology captures ~5,000 spots
3. ✅ Can visualize gene expression spatially
4. ✅ Find spatial domains and patterns
5. ✅ Identify marker genes for regions

### Applications
- 🧠 Brain anatomy and function
- 🩺 Cancer tumor microenvironment
- 🔬 Tissue development
- 🧬 Disease progression

### Key Concepts
- **Spatial domains**: Regions with similar gene expression
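- **Moran's I**: Measure of spatial autocorrelation; positive values mean neighboring spots have similar expression, values near 0 mean no spatial pattern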
- **Marker genes**: Genes defining specific regions

## Exercises

1. **Explore different genes**: Pick 4 new genes and visualize
2. **Change clustering**: Try `resolution=0.3` and `resolution=1.0`
3. **Compare clusters**: Use `sc.tl.rank_genes_groups()` with `groups=['0']` and `reference='1'` to compare cluster 0 vs 1
4. **Spatial neighborhoods**: Use `sq.gr.nhood_enrichment(adata, cluster_key='leiden')` to find which clusters neighbor each other
5. **Co-expression**: Plot two genes side by side using `sq.pl.spatial_scatter(adata, color=[gene1, gene2])`