# Notebook 7: Biomedical Image Analysis

**Pathology and radiology image analysis with Python**

Prerequisites: Notebooks 1-6 (sequences, genomics, transcriptomics, protein structure, RNAseq, clinical)

This notebook builds:
1. Digital pathology fundamentals (IHC staining, stain deconvolution)
2. Nuclei detection and counting
3. Texture features (GLCM, Haralick)
4. Radiology image concepts (MRI windowing, tissue segmentation)
5. Image segmentation (thresholding, watershed)
6. Feature extraction for ML classification
7. Simulated whole-slide image analysis pipeline
8. ROI (Region of Interest) analysis

**Data sources**: a real immunohistochemistry (IHC) colon tissue image (`skimage.data.immunohistochemistry()`) for stain deconvolution and nuclear segmentation; a real human mitosis image (`skimage.data.human_mitosis()`) for cell counting and watershed segmentation; a real brain MRI volume (`skimage.data.brain()`) for windowing and tissue segmentation; and a synthetic WSI for the tumor/normal tiling walkthrough (real WSIs are gigapixel-scale).

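**Data setup**: No manual downloads required -- all sample images come from `scikit-image`. Install: `pip install scikit-image`. Depending on the installed version, some samples (such as `brain` and `human_mitosis`) may be fetched on first use, which needs network access and the optional `pooch` package.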

Estimated runtime: ~3 minutes on a laptop

**Key learning outcomes:**
1. Understand IHC staining and digital pathology workflows
2. Segment and count cell nuclei computationally
3. Apply MRI windowing to visualize different tissue types
4. Extract texture and shape features for machine learning
5. Build a complete image analysis pipeline from pixels to features

---

## Section 0: Setup

We use **scikit-image** for image processing (including built-in real tissue and brain images), **scipy** for morphological operations, and **matplotlib** for visualization. Real images from `skimage.data` are used for pathology, cell segmentation, and brain MRI; a synthetic phantom illustrates WSI tiling concepts.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import ndimage
from scipy.spatial.distance import cdist
from skimage import data, filters, segmentation, measure, morphology, feature, color
from skimage.draw import disk, ellipse
from skimage.feature import graycomatrix, graycoprops
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import warnings
warnings.filterwarnings('ignore')
print("Ready -- numpy, pandas, scipy, scikit-image, sklearn, matplotlib")

---

## Section 1: Digital Pathology Fundamentals

Digital pathology converts glass slides into high-resolution digital images.
Common staining protocols include:

- **H&E** (Hematoxylin & Eosin): the gold standard for general histology
- **IHC** (Immunohistochemistry): hematoxylin counterstain + DAB for specific proteins

In both protocols, **hematoxylin** (blue/purple) stains nuclei. In H&E, **eosin** (pink) stains
cytoplasm and stroma; in IHC, **DAB** (brown) marks a specific protein target. We use a
**real IHC colon tissue image** from scikit-image (`skimage.data.immunohistochemistry()`) and
apply **HED color deconvolution** to separate the hematoxylin, eosin, and DAB stain components
computationally.
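
As a rough illustration of what color deconvolution does under the hood, the sketch below follows the Beer-Lambert idea: convert RGB to optical density, then unmix with the inverse of a stain matrix. The stain vectors are the commonly quoted Ruifrok and Johnston values and are an assumption here, as are the helper names `stains_to_rgb` and `rgb_to_stains`; `color.rgb2hed` may differ in scaling and numerical details.

```python
import numpy as np

# Assumed stain vectors (rows: hematoxylin, eosin, DAB; columns: R, G, B).
# These are the commonly quoted Ruifrok & Johnston values, used here for illustration.
STAIN_TO_RGB = np.array([
    [0.65, 0.70, 0.29],   # hematoxylin
    [0.07, 0.99, 0.11],   # eosin
    [0.27, 0.57, 0.78],   # DAB
])
RGB_TO_STAIN = np.linalg.inv(STAIN_TO_RGB)

def stains_to_rgb(amounts):
    """Forward model: stain amounts -> optical density -> transmitted RGB in (0, 1]."""
    optical_density = amounts @ STAIN_TO_RGB
    return np.exp(-optical_density)

def rgb_to_stains(rgb, eps=1e-6):
    """Inverse model: RGB in [0, 1] -> optical density -> per-stain amounts."""
    optical_density = -np.log(np.maximum(rgb, eps))
    return optical_density @ RGB_TO_STAIN

# A hypothetical pixel: strong hematoxylin, no eosin, some DAB
true_amounts = np.array([0.8, 0.0, 0.3])
pixel_rgb = stains_to_rgb(true_amounts)
recovered = rgb_to_stains(pixel_rgb)

print("RGB:", pixel_rgb.round(3))
print("Recovered stain amounts:", recovered.round(3))
```

Because the forward and inverse models use the same matrix, the recovered amounts match the ones used to synthesize the pixel. On real tissue the match is only approximate, since actual stain colors vary between labs and scanners.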

A typical whole-slide image is 50,000 x 100,000 pixels at 40x magnification.
Analysis happens on tiles (patches) extracted from regions of interest.

Reference [[Hierarchical Composition]] -- tissue has a multi-scale hierarchy:
molecules -> organelles -> cells -> tissues -> organs.

The computational challenge: extract biologically meaningful features from pixels.
This bridges [[Information Compression in Biology]] -- reducing millions of pixels
to a handful of diagnostic features.

In [None]:
# Load real IHC-stained colon tissue image from scikit-image
ihc_rgb = data.immunohistochemistry()  # Real IHC image (512x512x3, uint8)
image = ihc_rgb / 255.0  # Normalize to [0, 1] float
size = image.shape[0]

# Display the real tissue image
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
ax.imshow(image)
ax.set_title(f"Real IHC Colon Tissue ({size}x{size})")
ax.axis('off')
plt.tight_layout()
plt.show()

print(f"Image shape: {image.shape}")
print(f"Source: scikit-image built-in (skimage.data.immunohistochemistry)")
print(f"Staining: Hematoxylin (blue nuclei) + DAB (brown protein marker)")

In [None]:
# Stain deconvolution using HED color space (standard method for IHC)
# HED separates three stain components from the RGB image
ihc_hed = color.rgb2hed(image)
hematoxylin = ihc_hed[:, :, 0]  # Nuclei (blue/purple)
eosin = ihc_hed[:, :, 1]        # Cytoplasm/stroma (pink)
dab = ihc_hed[:, :, 2]          # DAB protein marker (brown)

fig, axes = plt.subplots(1, 4, figsize=(18, 4))
axes[0].imshow(image)
axes[0].set_title("Original IHC")
axes[0].axis('off')

axes[1].imshow(hematoxylin, cmap='Blues')
axes[1].set_title("Hematoxylin (nuclei)")
axes[1].axis('off')

axes[2].imshow(eosin, cmap='Reds')
axes[2].set_title("Eosin (cytoplasm)")
axes[2].axis('off')

axes[3].imshow(dab, cmap='Oranges')
axes[3].set_title("DAB (protein marker)")
axes[3].axis('off')

plt.suptitle("HED Stain Deconvolution (Real IHC Tissue)", fontsize=14)
plt.tight_layout()
plt.show()

In [None]:
# Threshold the hematoxylin channel to find nuclei (Otsu's method)
threshold = filters.threshold_otsu(hematoxylin)
nuclei_mask = hematoxylin > threshold
nuclei_mask = morphology.remove_small_objects(nuclei_mask, min_size=30)
nuclei_mask = ndimage.binary_fill_holes(nuclei_mask)
labeled = measure.label(nuclei_mask)
regions = measure.regionprops(labeled, intensity_image=hematoxylin)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
axes[0].imshow(hematoxylin, cmap='Blues')
axes[0].set_title(f"Hematoxylin (threshold={threshold:.2f})")
axes[0].axis('off')

axes[1].imshow(nuclei_mask, cmap='gray')
axes[1].set_title("Binary nuclei mask")
axes[1].axis('off')

axes[2].imshow(image)
axes[2].imshow(labeled > 0, cmap='Greens', alpha=0.4)
axes[2].set_title(f"Detected nuclei ({labeled.max()} objects)")
axes[2].axis('off')

plt.suptitle("Nuclear Segmentation on Real IHC Tissue", fontsize=14)
plt.tight_layout()
plt.show()

print(f"Otsu threshold: {threshold:.3f}")
print(f"Nuclei detected: {labeled.max()}")

---

## Section 2: Nuclear Morphometry

Nuclear morphometry measures the size, shape, and texture of cell nuclei.
Abnormal nuclear features are hallmarks of cancer:

- Enlarged nuclei (increased DNA content)
- Irregular shape (pleomorphism)
- Increased N:C ratio (nucleus:cytoplasm)

Key features used in pathology grading:

$$\text{Circularity} = \frac{4\pi \cdot \text{Area}}{\text{Perimeter}^2}$$

A perfect circle has circularity = 1.0. Irregular (pleomorphic) nuclei, common in cancer, typically score lower.
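
To make the formula concrete, here is a small worked example on ideal shapes with known area and perimeter. The helper name `circularity` and the chosen dimensions are illustrative assumptions, not part of the notebook's pipeline.

```python
import math

def circularity(area, perimeter):
    """Circularity = 4*pi*Area / Perimeter^2 (1.0 for an ideal circle)."""
    return 4 * math.pi * area / perimeter ** 2

r = 10.0
circle = circularity(math.pi * r ** 2, 2 * math.pi * r)
square = circularity(10.0 * 10.0, 4 * 10.0)
# Elongated rectangle with the same area as the square (4 x 25)
rectangle = circularity(4.0 * 25.0, 2 * (4.0 + 25.0))

print(f"circle:    {circle:.3f}")
print(f"square:    {square:.3f}")
print(f"rectangle: {rectangle:.3f}")
```

The more elongated or irregular the outline, the lower the score. On pixelated masks the perimeter is only an estimate, so measured values can deviate from these ideals, and very small objects can even score slightly above 1.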

Reference [[Quality Control in Living Systems]] -- normal cells maintain strict size
and shape constraints; cancer cells lose this quality control.

In [None]:
# Extract features for each detected nucleus
features_list = []
for region in regions:
    features_list.append({
        'area': region.area,
        'perimeter': region.perimeter,
        'eccentricity': region.eccentricity,
        'solidity': region.solidity,
        'mean_intensity': region.mean_intensity,
        'circularity': 4 * np.pi * region.area / (region.perimeter ** 2 + 1e-10),
    })

nuc_df = pd.DataFrame(features_list)
print(f"Detected {len(nuc_df)} nuclei")
if len(nuc_df) > 0:
    print(f"\nNuclear morphometry summary:")
    print(nuc_df.describe().round(3).to_string())
else:
    print("No nuclei detected -- check the threshold or the image loading step.")

---

## Section 3: Radiology Image Analysis

Radiology images (CT, MRI, X-ray) use different physics than pathology:

- **CT**: X-ray attenuation in Hounsfield Units (HU). Air = -1000, water = 0, dense bone roughly +1000 or higher
- **MRI**: Magnetic resonance signal intensity (arbitrary units, tissue-dependent contrast)
- **X-ray**: Projection imaging (2D shadow of 3D anatomy)

"Windowing" adjusts contrast to visualize specific tissues. The same raw data
reveals completely different anatomy depending on the window settings. We demonstrate
this with **real brain MRI data** from `skimage.data.brain()` (T1-weighted, 10 axial slices).

Reference [[Signal Processing in Biological Systems]] -- windowing is a form of
signal filtering that extracts task-relevant information from a shared data source.
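
The center/width arithmetic is easiest to see on a handful of values. The sketch below uses CT-style Hounsfield Units with a brain-like window (center 40, width 80); the helper name `window_intensity` and the sample values are assumptions chosen for illustration.

```python
import numpy as np

def window_intensity(values, center, width):
    """Clip to [center - width/2, center + width/2] and rescale to [0, 1]."""
    low = center - width / 2
    high = center + width / 2
    return (np.clip(values, low, high) - low) / (high - low)

# Air, water, mid-window tissue, upper window edge, dense bone
hu = np.array([-1000.0, 0.0, 40.0, 80.0, 1000.0])
display = window_intensity(hu, center=40, width=80)

for value, shade in zip(hu, display):
    print(f"{value:7.0f} HU -> display {shade:.2f}")
```

Everything at or below the lower edge (0 HU here) renders black and everything at or above the upper edge (80 HU) renders white, so the full display range is spent on the narrow band in between.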

In [None]:
def apply_window(img, center, width):
    """Apply intensity window (center/width) to radiological image.
    Works for both CT (Hounsfield Units) and MRI (signal intensity)."""
    low = center - width / 2
    high = center + width / 2
    windowed = np.clip(img, low, high)
    windowed = (windowed - low) / (high - low)
    return windowed

# Load real brain MRI from scikit-image
brain_volume = data.brain()  # T1-weighted MRI, (10, 256, 256), uint16
mri_slice = brain_volume[5].astype(float)  # Representative axial slice
size_mri = mri_slice.shape[0]

print(f"Brain MRI volume: {brain_volume.shape} (dtype: {brain_volume.dtype})")
print(f"Selected slice: {mri_slice.shape}")
print(f"Intensity range: [{mri_slice.min():.0f}, {mri_slice.max():.0f}]")
print(f"Source: scikit-image built-in (skimage.data.brain)")

In [None]:
# MRI windowing: same concept as CT windowing, different intensity ranges
# Adjusting center/width reveals different brain structures
max_val = mri_slice.max()

mri_windows = {
    'Full Range': (max_val * 0.5, max_val),
    'Brain Tissue': (max_val * 0.35, max_val * 0.4),
    'CSF/Ventricles': (max_val * 0.08, max_val * 0.15),
    'High Contrast': (max_val * 0.45, max_val * 0.25),
}

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, (name, (center, width)) in zip(axes, mri_windows.items()):
    windowed = apply_window(mri_slice, center, width)
    ax.imshow(windowed, cmap='gray')
    ax.set_title(f"{name}\nC={center:.0f}, W={width:.0f}")
    ax.axis('off')
plt.suptitle("MRI Windowing: Same Data, Different Views (Real Brain)", fontsize=14)
plt.tight_layout()
plt.show()

print(f"MRI image shape: {mri_slice.shape}")
print(f"Key insight: same raw data, 4 completely different views")
print(f"Windowing selects which intensity range to stretch across the display")

In [None]:
# Tissue classification from MRI intensities
# Multi-Otsu thresholding segments brain into tissue classes
brain_mask = mri_slice > mri_slice.max() * 0.03  # Exclude background
brain_pixels = mri_slice[brain_mask]
thresholds = filters.threshold_multiotsu(brain_pixels, classes=3)

tissue_map_mri = np.zeros(mri_slice.shape, dtype=int)
tissue_map_mri[~brain_mask] = 0                                        # Background
tissue_map_mri[brain_mask & (mri_slice <= thresholds[0])] = 1          # CSF / dark
tissue_map_mri[brain_mask & (mri_slice > thresholds[0]) & (mri_slice <= thresholds[1])] = 2  # Gray matter
tissue_map_mri[brain_mask & (mri_slice > thresholds[1])] = 3           # White matter

tissue_names = ['Background', 'CSF / Dark', 'Gray Matter', 'White Matter']

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].imshow(apply_window(mri_slice, mri_slice.max()*0.35, mri_slice.max()*0.5), cmap='gray')
axes[0].set_title("Brain MRI (windowed)")
axes[0].axis('off')

cmap_tissue = plt.get_cmap('tab10', 4)  # plt.cm.get_cmap was removed in recent Matplotlib releases
im = axes[1].imshow(tissue_map_mri, cmap=cmap_tissue, vmin=0, vmax=3)
axes[1].set_title("Tissue Classification (Multi-Otsu)")
axes[1].axis('off')
cbar = plt.colorbar(im, ax=axes[1], ticks=[0, 1, 2, 3])
cbar.ax.set_yticklabels(tissue_names)

plt.tight_layout()
plt.show()

# Print pixel counts per tissue type
total = mri_slice.size
for i, name in enumerate(tissue_names):
    count = (tissue_map_mri == i).sum()
    print(f"  {name}: {count:,} pixels ({100*count/total:.1f}%)")
print(f"\nMulti-Otsu thresholds: {thresholds[0]:.0f}, {thresholds[1]:.0f}")

---

## Section 4: Image Segmentation Pipeline

Segmentation separates structures of interest from background. Methods:

- **Thresholding**: simple intensity cutoff (Otsu's method)
- **Watershed**: separates touching objects by treating the image (or its distance transform) as a topographic surface
- **Deep learning**: U-Net, nnU-Net (not covered here, requires GPU)

The watershed algorithm treats the image as a topographic surface. It "floods"
from local minima (markers) and builds barriers where different flood basins meet.

Reference [[Figure-Ground Decomposition]] -- segmentation is the computational
analog of perceptual figure-ground separation.
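
Otsu's method, used for thresholding throughout this notebook, picks the cutoff that maximizes the variance between the two resulting classes. The following is a minimal NumPy sketch of that criterion on synthetic bimodal data; the function name `otsu_threshold`, the bin count, and the simulated intensities are assumptions, and `filters.threshold_otsu` may differ in binning details.

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Return the histogram bin center that maximizes between-class variance."""
    counts, edges = np.histogram(values, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    weights = counts / counts.sum()

    # For every candidate split: class 0 = bins up to k, class 1 = bins above k
    w0 = np.cumsum(weights)
    w1 = 1.0 - w0
    cum_mean = np.cumsum(weights * centers)
    total_mean = cum_mean[-1]

    # Skip splits that leave one class (numerically) empty
    valid = (w0 > 1e-12) & (w1 > 1e-12)
    mu0 = cum_mean[valid] / w0[valid]
    mu1 = (total_mean - cum_mean[valid]) / w1[valid]
    between = np.zeros(nbins)
    between[valid] = w0[valid] * w1[valid] * (mu0 - mu1) ** 2
    return centers[np.argmax(between)]

# Synthetic intensities: many dark background pixels, fewer bright foreground pixels
rng = np.random.default_rng(0)
background = rng.normal(0.2, 0.03, 5000)
foreground = rng.normal(0.7, 0.05, 1000)
t = otsu_threshold(np.concatenate([background, foreground]))

print(f"Otsu threshold: {t:.3f}")
print(f"Background below threshold: {(background < t).mean():.3f}")
print(f"Foreground above threshold: {(foreground > t).mean():.3f}")
```

With two well-separated modes, the chosen threshold falls in the gap between them. When the modes overlap, as in real tissue, the same criterion still gives a cutoff but some pixels are misassigned, which is one reason the cells below add morphological cleanup and watershed.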

In [None]:
# Load real human mitosis fluorescence microscopy image
mitosis_image = data.human_mitosis()  # Grayscale, uint8, 512x512
mitosis_float = mitosis_image / 255.0

fig, ax = plt.subplots(1, 1, figsize=(6, 6))
ax.imshow(mitosis_float, cmap='gray')
ax.set_title(f"Real Human Mitosis ({mitosis_image.shape[0]}x{mitosis_image.shape[1]})")
ax.axis('off')
plt.tight_layout()
plt.show()

print(f"Image shape: {mitosis_image.shape}")
print(f"Source: scikit-image built-in (skimage.data.human_mitosis)")
print(f"Content: Fluorescence microscopy -- human cells undergoing mitosis")

In [None]:
# Watershed segmentation on real human mitosis image
binary = mitosis_float > filters.threshold_otsu(mitosis_float)
binary = morphology.remove_small_objects(binary, min_size=50)
binary = ndimage.binary_fill_holes(binary)
distance = ndimage.distance_transform_edt(binary)
local_max = feature.peak_local_max(distance, min_distance=7, labels=binary)
markers = np.zeros_like(binary, dtype=int)
for i, (y_m, x_m) in enumerate(local_max):
    markers[y_m, x_m] = i + 1
labels_ws = segmentation.watershed(-distance, markers, mask=binary)

# Visualize the watershed result
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
axes[0].imshow(mitosis_float, cmap='gray')
axes[0].set_title("Original (human mitosis)")
axes[0].axis('off')

axes[1].imshow(distance, cmap='hot')
axes[1].set_title("Distance transform")
axes[1].axis('off')

axes[2].imshow(markers > 0, cmap='gray')
axes[2].set_title(f"Markers ({markers.max()} seeds)")
axes[2].axis('off')

axes[3].imshow(labels_ws, cmap='nipy_spectral')
axes[3].set_title(f"Watershed ({labels_ws.max()} cells)")
axes[3].axis('off')

plt.suptitle("Watershed Segmentation (Real Cell Image)", fontsize=14)
plt.tight_layout()
plt.show()

print(f"Cells detected by watershed: {labels_ws.max()}")

In [None]:
# Candidate CSF segmentation: start from the darkest multi-Otsu class
# Ventricles are low-intensity regions surrounded by brain tissue, but this class
# can also include other dark areas (e.g. the rim around the brain)
csf_mask = (tissue_map_mri == 1)  # CSF / dark class from multi-Otsu
# Clean up: remove small noise and fill holes (interior structures are not isolated here)
csf_clean = morphology.remove_small_objects(csf_mask, min_size=100)
csf_clean = ndimage.binary_fill_holes(csf_clean)

# Label connected components (ventricle structures)
ventricle_labels = measure.label(csf_clean)
ventricle_regions = measure.regionprops(ventricle_labels, intensity_image=mri_slice)

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
axes[0].imshow(apply_window(mri_slice, mri_slice.max()*0.35, mri_slice.max()*0.5), cmap='gray')
axes[0].set_title("Brain MRI")
axes[0].axis('off')

axes[1].imshow(csf_clean, cmap='gray')
axes[1].set_title("CSF Regions (cleaned)")
axes[1].axis('off')

axes[2].imshow(apply_window(mri_slice, mri_slice.max()*0.35, mri_slice.max()*0.5), cmap='gray')
axes[2].imshow(np.ma.masked_where(ventricle_labels == 0, ventricle_labels),
              cmap='hot', alpha=0.6)
axes[2].set_title(f"Detected Structures ({ventricle_labels.max()} regions)")
axes[2].axis('off')

plt.suptitle("Brain Structure Segmentation (Real MRI)", fontsize=14)
plt.tight_layout()
plt.show()

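# Show the 5 largest regions (regionprops returns regions in label order, so sort by area)
largest_regions = sorted(ventricle_regions, key=lambda r: r.area, reverse=True)[:5]
for i, region in enumerate(largest_regions):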
    print(f"  Region {i+1}: area={region.area} px, "
          f"mean intensity={region.mean_intensity:.0f}, "
          f"centroid=({region.centroid[0]:.0f}, {region.centroid[1]:.0f})")

---

## Section 5: Feature Extraction for ML

Machine learning on medical images requires features. Two approaches:

1. **Handcrafted**: texture (GLCM), shape (morphometry), intensity statistics -- interpretable
2. **Learned**: CNN features from pretrained networks -- more powerful but less interpretable

The Gray-Level Co-occurrence Matrix (GLCM) captures texture by counting how often
pairs of pixel intensities occur at specific spatial relationships. Haralick features:

- **Contrast**: intensity difference between neighboring pixels
- **Homogeneity**: closeness of pixel pair distribution to the GLCM diagonal
- **Energy**: square root of the sum of squared GLCM elements (the sum itself is the angular second moment, ASM); measures uniformity
- **Correlation**: linear dependency between neighboring pixels

Reference [[Information Compression in Biology]] -- feature extraction compresses
images into biologically meaningful representations.
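
To show what the co-occurrence counts look like, here is a from-scratch sketch for a single horizontal offset on two tiny 2-level patches. The helper names `glcm_horizontal` and `glcm_features` are illustrative assumptions; energy is taken as the square root of the sum of squares, which matches the convention documented for `graycoprops`.

```python
import numpy as np

def glcm_horizontal(image, levels):
    """Symmetric, normalized co-occurrence matrix for the offset (0, 1)."""
    left = image[:, :-1].ravel()
    right = image[:, 1:].ravel()
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (left, right), 1)   # count each (left, right) neighbor pair
    counts += counts.T                    # symmetric: also count (right, left)
    return counts / counts.sum()

def glcm_features(p):
    """Contrast, homogeneity, and energy computed directly from a GLCM."""
    i, j = np.indices(p.shape)
    return {
        'contrast': float(np.sum(p * (i - j) ** 2)),
        'homogeneity': float(np.sum(p / (1.0 + (i - j) ** 2))),
        'energy': float(np.sqrt(np.sum(p ** 2))),
    }

flat = np.zeros((4, 4), dtype=int)             # uniform patch
stripes = np.tile(np.array([0, 1]), (4, 2))    # alternating columns: 0 1 0 1

flat_features = glcm_features(glcm_horizontal(flat, levels=2))
stripe_features = glcm_features(glcm_horizontal(stripes, levels=2))

print("flat:   ", flat_features)
print("stripes:", stripe_features)
```

The uniform patch puts all probability mass on the GLCM diagonal (zero contrast, maximal homogeneity and energy), while the striped patch puts it entirely off the diagonal, so contrast rises and homogeneity drops.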

In [None]:
def compute_texture_features(patch):
    """Compute GLCM-based texture features from a grayscale patch."""
    patch_uint8 = (patch * 255).astype(np.uint8)
    glcm = graycomatrix(patch_uint8, distances=[1, 3],
                        angles=[0, np.pi / 4, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    features = {}
    for prop in ['contrast', 'dissimilarity', 'homogeneity', 'energy', 'correlation']:
        values = graycoprops(glcm, prop)
        features[prop] = values.mean()
    return features

# Compute texture features on patches from the real IHC tissue image
gray_image = np.mean(image, axis=2)  # Convert IHC to grayscale
patch_size = 64
texture_results = []
# `+ 1` in the range bound so the last full row and column of patches are included
for y_p in range(0, size - patch_size + 1, patch_size):
    for x_p in range(0, size - patch_size + 1, patch_size):
        patch = gray_image[y_p:y_p + patch_size, x_p:x_p + patch_size]
        tex = compute_texture_features(patch)
        tex['y'] = y_p
        tex['x'] = x_p
        texture_results.append(tex)

tex_df = pd.DataFrame(texture_results)
print("GLCM texture features across real IHC tissue patches:")
print(tex_df[['contrast', 'homogeneity', 'energy', 'correlation']].describe().round(4).to_string())

---

## Section 6: Simulated Whole-Slide Image Analysis

Whole-slide images (WSI) are too large to process at once (50,000 x 100,000 pixels).
The standard approach:

1. **Tissue detection**: find tissue regions at low magnification
2. **Tiling**: extract patches (optionally overlapping) at high magnification
3. **Per-tile analysis**: run feature extraction or deep learning on each tile
4. **Aggregation**: combine tile-level results into slide-level predictions

This is a natural application of [[Hierarchical Composition]] -- information flows
from pixels to patches to slide-level diagnosis.

Reference [[Tissue Topology]] -- the spatial arrangement of cell types within
tissue carries diagnostic information beyond individual cell features.
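
The tiling step comes down to simple index arithmetic. The sketch below counts full tiles for a given tile size and stride; the helper names `tile_origins` and `count_tiles` and the 256-pixel tile size for the full-resolution slide are assumptions for illustration.

```python
def tile_origins(length, tile_size, stride):
    """Start offsets of every full tile along one axis (partial edge tiles are dropped)."""
    return list(range(0, length - tile_size + 1, stride))

def count_tiles(height, width, tile_size, stride):
    """Number of full tiles in a height x width image."""
    rows = len(tile_origins(height, tile_size, stride))
    cols = len(tile_origins(width, tile_size, stride))
    return rows * cols

small_no_overlap = count_tiles(512, 512, tile_size=64, stride=64)
small_half_overlap = count_tiles(512, 512, tile_size=64, stride=32)
full_slide = count_tiles(50_000, 100_000, tile_size=256, stride=256)

print(f"512x512, stride 64:        {small_no_overlap} tiles")
print(f"512x512, stride 32:        {small_half_overlap} tiles")
print(f"50,000x100,000, stride 256: {full_slide:,} tiles")
```

A stride equal to the tile size gives non-overlapping tiles; halving the stride gives 50% overlap and roughly four times as many tiles, which is the main cost of overlapping extraction.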

In [None]:
# Generate a synthetic whole-slide image (WSI) with tumor and normal regions
# Real WSIs are 50,000+ pixels per side; we use 512x512 as a pedagogical stand-in
np.random.seed(77)
wsi_size = 512
wsi_image = np.zeros((wsi_size, wsi_size, 3))

# Normal tissue background (light pink)
wsi_image[:, :, 0] = 0.90 + np.random.normal(0, 0.02, (wsi_size, wsi_size))
wsi_image[:, :, 1] = 0.80 + np.random.normal(0, 0.02, (wsi_size, wsi_size))
wsi_image[:, :, 2] = 0.82 + np.random.normal(0, 0.02, (wsi_size, wsi_size))

# Tumor region (denser, darker -- upper-left quadrant)
tumor_rr, tumor_cc = ellipse(150, 150, 100, 120, shape=(wsi_size, wsi_size))
wsi_image[tumor_rr, tumor_cc, 0] = 0.75 + np.random.normal(0, 0.03, len(tumor_rr))
wsi_image[tumor_rr, tumor_cc, 1] = 0.55 + np.random.normal(0, 0.03, len(tumor_rr))
wsi_image[tumor_rr, tumor_cc, 2] = 0.65 + np.random.normal(0, 0.03, len(tumor_rr))

# Scatter nuclei across the WSI (more densely in tumor region)
nuc_mask_wsi = np.zeros((wsi_size, wsi_size), dtype=bool)
tumor_mask = np.zeros((wsi_size, wsi_size), dtype=bool)
tumor_mask[tumor_rr, tumor_cc] = True

# Normal region nuclei (sparse)
for _ in range(200):
    cy, cx = np.random.randint(10, wsi_size - 10, 2)
    if not tumor_mask[cy, cx]:
        r = np.random.randint(2, 5)
        rr, cc = disk((cy, cx), r, shape=(wsi_size, wsi_size))
        nuc_mask_wsi[rr, cc] = True
        intensity = np.random.uniform(0.2, 0.4)
        wsi_image[rr, cc, 0] = intensity
        wsi_image[rr, cc, 1] = intensity * 0.4
        wsi_image[rr, cc, 2] = intensity * 1.3

# Tumor region nuclei (dense; drawn as disks, slightly larger and darker on average)
for _ in range(600):
    cy = np.random.randint(50, 250)
    cx = np.random.randint(30, 270)
    if tumor_mask[min(cy, wsi_size-1), min(cx, wsi_size-1)]:
        r = np.random.randint(2, 6)
        rr, cc = disk((cy, cx), r, shape=(wsi_size, wsi_size))
        nuc_mask_wsi[rr, cc] = True
        intensity = np.random.uniform(0.15, 0.30)
        wsi_image[rr, cc, 0] = intensity
        wsi_image[rr, cc, 1] = intensity * 0.3
        wsi_image[rr, cc, 2] = intensity * 1.4

wsi_image = np.clip(wsi_image, 0, 1)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].imshow(wsi_image)
axes[0].set_title(f"Synthetic WSI ({wsi_size}x{wsi_size})")
axes[0].axis('off')

axes[1].imshow(nuc_mask_wsi, cmap='gray')
axes[1].set_title("Nuclear mask")
axes[1].axis('off')

plt.tight_layout()
plt.show()

print(f"WSI shape: {wsi_image.shape}")
print(f"Nuclear pixels: {nuc_mask_wsi.sum():,} ({100*nuc_mask_wsi.mean():.2f}%)")

In [None]:
# Tile the WSI and compute nuclear density per tile
tile_size = 64
stride = 64
tile_features = []

for y_tile in range(0, wsi_size - tile_size + 1, stride):
    for x_tile in range(0, wsi_size - tile_size + 1, stride):
        tile_mask = nuc_mask_wsi[y_tile:y_tile + tile_size, x_tile:x_tile + tile_size]
        tile_rgb = wsi_image[y_tile:y_tile + tile_size, x_tile:x_tile + tile_size]
        nuc_density = tile_mask.mean()
        mean_r = tile_rgb[:, :, 0].mean()
        mean_g = tile_rgb[:, :, 1].mean()
        mean_b = tile_rgb[:, :, 2].mean()
        tile_features.append({
            'y': y_tile, 'x': x_tile,
            'nuc_density': nuc_density,
            'mean_r': mean_r, 'mean_g': mean_g, 'mean_b': mean_b,
        })

tile_df = pd.DataFrame(tile_features)

# Create a heatmap of nuclear density across tiles
n_tiles = wsi_size // tile_size
density_map = tile_df['nuc_density'].values.reshape(n_tiles, n_tiles)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].imshow(wsi_image)
# Draw grid lines for tiles
for i in range(0, wsi_size, tile_size):
    axes[0].axhline(i, color='yellow', linewidth=0.5, alpha=0.5)
    axes[0].axvline(i, color='yellow', linewidth=0.5, alpha=0.5)
axes[0].set_title(f"WSI with tile grid ({n_tiles}x{n_tiles} = {len(tile_features)} tiles)")
axes[0].axis('off')

im = axes[1].imshow(density_map, cmap='hot', interpolation='nearest')
axes[1].set_title("Nuclear density per tile")
axes[1].set_xlabel("Tile column")
axes[1].set_ylabel("Tile row")
plt.colorbar(im, ax=axes[1], label="Density")

plt.suptitle("Whole-Slide Image Tiling Pipeline", fontsize=14)
plt.tight_layout()
plt.show()

# Summary statistics
print(f"Total tiles: {len(tile_features)}")
print(f"Mean nuclear density: {tile_df['nuc_density'].mean():.4f}")
print(f"Max nuclear density:  {tile_df['nuc_density'].max():.4f}")
high_density = tile_df[tile_df['nuc_density'] > tile_df['nuc_density'].quantile(0.9)]
print(f"High-density tiles (top 10%): {len(high_density)} tiles")

---

## Section 7: ROI Analysis and Feature Aggregation

Region of Interest (ROI) analysis combines spatial information with feature
extraction. Aggregation strategies for slide-level prediction:

- **Mean pooling**: average features across all tiles
- **Max pooling**: take the "worst" (most abnormal) tile
- **Multiple Instance Learning (MIL)**: learn which tiles matter

This connects to [[Selective Catabolism]] -- the system selectively processes
the most informative regions rather than treating all tissue equally.
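
The difference between the pooling strategies shows up clearly on a slide where only a few tiles are abnormal. The tile scores below are made-up values for illustration, and the top-k average is an extra variant added here as a middle ground; MIL is not shown because it requires a trained model.

```python
import numpy as np

# Hypothetical per-tile abnormality scores for one slide:
# 60 normal-looking tiles and a small cluster of 4 highly abnormal tiles
tile_scores = np.array([0.05] * 60 + [0.90, 0.95, 0.85, 0.92])

mean_pooled = tile_scores.mean()               # diluted by the many normal tiles
max_pooled = tile_scores.max()                 # driven by the single worst tile
top_k_pooled = np.sort(tile_scores)[-4:].mean()  # average of the 4 highest tiles

print(f"Mean pooling:  {mean_pooled:.3f}")
print(f"Max pooling:   {max_pooled:.3f}")
print(f"Top-4 pooling: {top_k_pooled:.3f}")
```

Mean pooling can hide a small lesion behind a large amount of normal tissue, while max pooling is sensitive to a single noisy tile. That trade-off is the motivation for learned aggregation such as MIL.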

---

## Summary

| Concept | What you built | Why it matters |
|---------|---------------|----------------|
| Real IHC tissue | skimage.data.immunohistochemistry() | [[Hierarchical Composition]] in tissue |
| Stain deconvolution | HED color space separation | Isolate nuclear vs cytoplasmic signal |
| Nuclear segmentation | Otsu + morphology on real tissue | Count and characterize cells |
| Morphometry | Area, shape, intensity features | Cancer grading criteria |
| MRI windowing | Real brain MRI (skimage.data.brain) | Same data, different views |
| Tissue classification | Multi-Otsu brain segmentation | Automated anatomy parsing |
| Watershed | Real cell separation (human mitosis) | [[Figure-Ground Decomposition]] |
| GLCM texture | Statistical texture features | ML input for classification |
| WSI pipeline | Tile-based analysis (synthetic) | Scalable digital pathology |
| ROI analysis | Feature aggregation | [[Selective Catabolism]] in diagnostics |

**The complete series:**
- [[01_Sequence_Analysis_Fundamentals]] -- Biopython, the Central Dogma
- [[02_Genomic_Variant_Analysis]] -- Population genetics, GWAS
- [[03_Single_Cell_Transcriptomics]] -- scanpy, cell type discovery
- [[04_Protein_Structure_Drug_Discovery]] -- Structure and drug design
- [[05_Bulk_RNAseq_Differential_Expression]] -- Differential expression analysis
- [[06_Clinical_Biomedical_Informatics]] -- Clinical data and survival analysis
- [[07_Biomedical_Image_Analysis]] -- Pathology and radiology (this notebook)

**Next**: [[08_Plant_Biology_Agricultural_Genomics]] -- crop science and agricultural applications