# Medical Image Analysis Tutorial: Dimensionality Reduction, Clustering & Outlier Detection

This tutorial embeds images from the SLAKE medical VQA dataset with BiomedCLIP, then compares clustering, dimensionality reduction, and outlier detection methods on those embeddings.

## Overview
1. **Dataset**: SLAKE medical VQA dataset
2. **Embeddings**: BiomedCLIP (medical vision-language model)
3. **Clustering**: DBSCAN, HDBSCAN, K-means, FINCH
4. **Dimensionality Reduction**: PCA, t-SNE, UMAP, h-NNE
5. **Visualization**: Comparative analysis of methods
6. **Outlier Detection**: DBSCAN/HDBSCAN noise points, Isolation Forest

## 1. Setup & Installation

Install all required packages for this tutorial.

In [1]:
# Install required packages
!pip install -q open-clip-torch torch pillow numpy scikit-learn
!pip install -q hdbscan umap-learn hnne finch-clust
!pip install -q matplotlib seaborn
!pip install -q gdown  # For Google Drive downloads
!pip install -q huggingface_hub  # For downloading from HuggingFace

In [2]:
# Import libraries
import os
import json
import numpy as np
from pathlib import Path
from PIL import Image
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.auto import tqdm
import warnings
warnings.filterwarnings('ignore')

# Deep learning
import torch
import open_clip
from huggingface_hub import hf_hub_download

# Clustering
from sklearn.cluster import DBSCAN, KMeans
import hdbscan
try:
    from finch import FINCH
except ImportError:
    !pip install -q finch-clust
    from finch import FINCH

# Dimensionality reduction
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap
from hnne import HNNE

# Set random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

Using device: cpu


## 2. Dataset Download & Loading

Download the SLAKE dataset from Google Drive and parse its structure.

In [3]:
# Download SLAKE dataset from Google Drive
# Note: You may need to manually download if gdown fails due to Google Drive restrictions
# Link: https://drive.google.com/file/d/1EZ0WpO5Z6BJUqC3iPBQJJS1INWSMsh7U/view

dataset_path = Path('./Slake1.0')

if not dataset_path.exists():
    print("Downloading SLAKE dataset...")
    !gdown --fuzzy 'https://drive.google.com/file/d/1EZ0WpO5Z6BJUqC3iPBQJJS1INWSMsh7U/view'
    
    # Extract the archive
    import zipfile
    with zipfile.ZipFile('Slake1.0.zip', 'r') as zip_ref:
        zip_ref.extractall('.')
    print("Dataset downloaded and extracted!")
else:
    print("Dataset already exists!")

Dataset already exists!


In [4]:
# Parse dataset structure
def load_slake_dataset(dataset_path):
    """
    Load SLAKE dataset structure.
    Returns: List of dicts with image_path, qa_pairs, and sample_id
    """
    imgs_path = dataset_path / 'imgs'
    samples = []
    
    for sample_dir in sorted(imgs_path.iterdir()):
        if not sample_dir.is_dir():
            continue
            
        source_img = sample_dir / 'source.jpg'
        question_file = sample_dir / 'question.json'
        
        if source_img.exists() and question_file.exists():
            # SLAKE is bilingual (English/Chinese), so read the JSON explicitly as UTF-8
            with open(question_file, 'r', encoding='utf-8') as f:
                qa_data = json.load(f)
            
            samples.append({
                'sample_id': sample_dir.name,
                'image_path': str(source_img),
                'qa_pairs': qa_data
            })
    
    return samples

# Load dataset
slake_data = load_slake_dataset(dataset_path)
print(f"Loaded {len(slake_data)} samples from SLAKE dataset")

# Display sample
print("\nSample data structure:")
print(f"Sample ID: {slake_data[0]['sample_id']}")
print(f"Image path: {slake_data[0]['image_path']}")
print(f"Number of QA pairs: {len(slake_data[0]['qa_pairs'])}")

Loaded 642 samples from SLAKE dataset

Sample data structure:
Sample ID: xmlab0
Image path: Slake1.0/imgs/xmlab0/source.jpg
Number of QA pairs: 20


## 3. Embedding Generation with BiomedCLIP

We'll use the BiomedCLIP model to generate embeddings for:
- **Images**: Using the vision encoder
- **Text (QA pairs)**: Using the text encoder

Embeddings will be saved as numpy files for quick reloading.

In [5]:
# Load BiomedCLIP model using open_clip
print("Loading BiomedCLIP model...")

# BiomedCLIP checkpoint on the HuggingFace Hub
hf_repo = "microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"

# Optional: pre-fetch the weights into the local HuggingFace cache.
# open_clip also resolves the 'hf-hub:' prefix itself, so the returned path is not used below.
model_path = hf_hub_download(repo_id=hf_repo, filename="open_clip_pytorch_model.bin")

# Load model, preprocessing transforms, and tokenizer
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms(f"hf-hub:{hf_repo}")
tokenizer = open_clip.get_tokenizer(f"hf-hub:{hf_repo}")

model = model.to(device)
model.eval()

print("Model loaded successfully!")
print(f"Model architecture: {model.__class__.__name__}")
print(f"Image input size: {model.visual.image_size if hasattr(model.visual, 'image_size') else 'N/A'}")

Loading BiomedCLIP model...
Model loaded successfully!
Model architecture: CustomTextCLIP
Image input size: (224, 224)


In [6]:
# Create output directory for embeddings
embeddings_dir = Path('./embeddings')
embeddings_dir.mkdir(exist_ok=True)

image_embeddings_path = embeddings_dir / 'image_embeddings.npy'
text_embeddings_path = embeddings_dir / 'text_embeddings.npy'
metadata_path = embeddings_dir / 'metadata.json'
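
The "load if the file exists, otherwise compute and save" pattern used for these files can be factored into a small helper. The following is a minimal sketch, not part of the original notebook: `cached_array`, `compute_fn`, and `fake_embed` are hypothetical names introduced for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np


def cached_array(path, compute_fn):
    """Load an array from `path` if it exists; otherwise compute it and save it.

    `compute_fn` is a hypothetical zero-argument callable that returns the array.
    """
    path = Path(path)
    if path.exists():
        return np.load(path)
    array = np.asarray(compute_fn())
    path.parent.mkdir(parents=True, exist_ok=True)
    np.save(path, array)
    return array


# Demo in a temporary directory, with a stand-in for the expensive embedding step
calls = []


def fake_embed():
    calls.append(1)  # count how often the expensive step actually runs
    return np.ones((4, 8), dtype=np.float32)


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "embeddings" / "demo.npy"
    first = cached_array(cache_path, fake_embed)   # computes and saves
    second = cached_array(cache_path, fake_embed)  # loads from disk

print(f"compute calls: {len(calls)}, shape: {first.shape}")
```

In the notebook, `compute_fn` could be a lambda wrapping `generate_image_embeddings(...)`; delete the `.npy` file whenever the model or preprocessing changes, since the helper cannot detect stale caches.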

In [7]:
# Generate image embeddings
def generate_image_embeddings(data, model, preprocess, device, batch_size=32):
    """
    Generate embeddings for all images in the dataset using open_clip.
    """
    embeddings = []
    
    with torch.no_grad():
        for i in tqdm(range(0, len(data), batch_size), desc="Generating image embeddings"):
            batch = data[i:i+batch_size]
            
            # Load and process images
            images = []
            for sample in batch:
                img = Image.open(sample['image_path']).convert('RGB')
                img_tensor = preprocess(img)
                images.append(img_tensor)
            
            # Stack images into batch
            images_batch = torch.stack(images).to(device)
            
            # Get image embeddings
            image_features = model.encode_image(images_batch)
            # Normalize embeddings
            image_features = image_features / image_features.norm(dim=-1, keepdim=True)
            embeddings.append(image_features.cpu().numpy())
    
    return np.vstack(embeddings)

# Generate or load image embeddings
if image_embeddings_path.exists():
    print("Loading existing image embeddings...")
    image_embeddings = np.load(image_embeddings_path)
else:
    print("Generating image embeddings...")
    image_embeddings = generate_image_embeddings(slake_data, model, preprocess_val, device)
    np.save(image_embeddings_path, image_embeddings)
    print(f"Image embeddings saved to {image_embeddings_path}")

print(f"Image embeddings shape: {image_embeddings.shape}")

Generating image embeddings...


Generating image embeddings: 100%|██████████| 21/21 [00:42<00:00,  2.01s/it]

Image embeddings saved to embeddings/image_embeddings.npy
Image embeddings shape: (642, 512)
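
Because the embeddings are L2-normalized, the dot product of two rows is their cosine similarity, which makes a quick nearest-neighbor sanity check cheap. The sketch below runs on random stand-in vectors rather than the real `image_embeddings`; `top_k_neighbors` is a hypothetical helper, not a function from the notebook or from open_clip.

```python
import numpy as np


def top_k_neighbors(embeddings, query_idx, k=5):
    """Return (indices, similarities) of the k rows most similar to row `query_idx`.

    Assumes every row has unit L2 norm, so the dot product is the cosine similarity.
    """
    sims = embeddings @ embeddings[query_idx]
    sims[query_idx] = -np.inf  # exclude the query itself
    order = np.argsort(-sims)[:k]
    return order, sims[order]


# Stand-in for image_embeddings: random unit vectors with one planted duplicate
rng = np.random.default_rng(0)
demo = rng.normal(size=(100, 512))
demo /= np.linalg.norm(demo, axis=1, keepdims=True)
demo[7] = demo[3]  # row 7 is an exact copy of row 3

neighbors, scores = top_k_neighbors(demo, query_idx=3, k=5)
print("nearest rows:", neighbors)
print("cosine similarities:", np.round(scores, 3))
```

With the real data, `slake_data[i]['image_path']` for the returned indices shows which images the model considers closest to the query.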





In [None]:
# Generate text embeddings from QA pairs
def generate_text_embeddings(data, model, tokenizer, device, batch_size=32):
    """
    Generate embeddings for all QA pairs in the dataset using open_clip.
    Each image may have multiple QA pairs - we'll create separate embeddings.
    """
    all_texts = []
    text_to_image_map = []  # Maps text embedding index to image index
    
    # Collect all QA pairs
    for img_idx, sample in enumerate(data):
        for qa in sample['qa_pairs']:
            question = qa.get('question', '')
            answer = qa.get('answer', '')
            # Combine question and answer
            text = f"{question} {answer}"
            all_texts.append(text)
            text_to_image_map.append(img_idx)
    
    embeddings = []
    
    with torch.no_grad():
        for i in tqdm(range(0, len(all_texts), batch_size), desc="Generating text embeddings"):
            batch_texts = all_texts[i:i+batch_size]
            
            # Tokenize texts
            text_tokens = tokenizer(batch_texts).to(device)
            
            # Get text embeddings
            text_features = model.encode_text(text_tokens)
            # Normalize embeddings
            text_features = text_features / text_features.norm(dim=-1, keepdim=True)
            embeddings.append(text_features.cpu().numpy())
    
    return np.vstack(embeddings), text_to_image_map

# Generate or load text embeddings
if text_embeddings_path.exists():
    print("Loading existing text embeddings...")
    text_embeddings = np.load(text_embeddings_path)
    with open(metadata_path, 'r') as f:
        metadata = json.load(f)
    text_to_image_map = metadata['text_to_image_map']
else:
    print("Generating text embeddings...")
    text_embeddings, text_to_image_map = generate_text_embeddings(slake_data, model, tokenizer, device)
    np.save(text_embeddings_path, text_embeddings)
    
    # Save metadata
    metadata = {'text_to_image_map': text_to_image_map}
    with open(metadata_path, 'w') as f:
        json.dump(metadata, f)
    print(f"Text embeddings saved to {text_embeddings_path}")

print(f"Text embeddings shape: {text_embeddings.shape}")
print(f"Number of QA pairs: {len(text_to_image_map)}")

Generating text embeddings...


Generating text embeddings:  69%|██████▉   | 304/439 [20:04<10:42,  4.76s/it]

## 4. Clustering Methods

We'll apply four different clustering algorithms to the **image embeddings**:
1. **DBSCAN**: Density-based clustering
2. **HDBSCAN**: Hierarchical DBSCAN
3. **K-means**: Centroid-based clustering
4. **FINCH**: Parameter-free hierarchical clustering

In [None]:
# Use image embeddings for clustering
X = image_embeddings.copy()

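# Re-normalize to unit length. The embeddings were already L2-normalized when generated,
# so this is an idempotent safeguard that keeps Euclidean distance monotonic in cosine distance.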
from sklearn.preprocessing import normalize
X_normalized = normalize(X, norm='l2')

print(f"Working with {X_normalized.shape[0]} image embeddings of dimension {X_normalized.shape[1]}")

In [None]:
# 1. DBSCAN Clustering
print("Running DBSCAN...")
dbscan = DBSCAN(eps=0.5, min_samples=5, metric='euclidean')
dbscan_labels = dbscan.fit_predict(X_normalized)

n_clusters_dbscan = len(set(dbscan_labels)) - (1 if -1 in dbscan_labels else 0)
n_noise_dbscan = list(dbscan_labels).count(-1)
print(f"DBSCAN: {n_clusters_dbscan} clusters, {n_noise_dbscan} noise points")
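
On unit-length vectors, Euclidean and cosine distance are tied together by the identity ||a - b||² = 2 - 2·cos(a, b), so the `eps=0.5` used above can be read as a cosine-similarity threshold. The sketch below works through that conversion; the two helper names are hypothetical.

```python
import numpy as np


def eps_to_min_cosine(eps):
    """Euclidean radius between unit vectors -> minimum cosine similarity inside that radius."""
    return 1.0 - eps ** 2 / 2.0


def min_cosine_to_eps(cos_sim):
    """Minimum cosine similarity -> equivalent Euclidean radius between unit vectors."""
    return np.sqrt(2.0 - 2.0 * cos_sim)


# eps = 0.5 means two points are neighbors when their cosine similarity is at least 0.875
print(eps_to_min_cosine(0.5))
print(min_cosine_to_eps(0.875))

# Numerical check of the identity on two random unit vectors
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 512))
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)
lhs = np.linalg.norm(a - b) ** 2
rhs = 2.0 - 2.0 * (a @ b)
print(np.isclose(lhs, rhs))
```

This is why running DBSCAN with `metric='euclidean'` on normalized embeddings behaves like a cosine-based neighborhood query, as long as `eps` is chosen with the conversion in mind.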

In [None]:
# 2. HDBSCAN Clustering
print("Running HDBSCAN...")
hdbscan_clusterer = hdbscan.HDBSCAN(min_cluster_size=5, min_samples=3)
hdbscan_labels = hdbscan_clusterer.fit_predict(X_normalized)

n_clusters_hdbscan = len(set(hdbscan_labels)) - (1 if -1 in hdbscan_labels else 0)
n_noise_hdbscan = list(hdbscan_labels).count(-1)
print(f"HDBSCAN: {n_clusters_hdbscan} clusters, {n_noise_hdbscan} noise points")

In [None]:
# 3. K-means Clustering
print("Running K-means...")
n_clusters_kmeans = 8  # Default reasonable value
kmeans = KMeans(n_clusters=n_clusters_kmeans, random_state=42, n_init=10)
kmeans_labels = kmeans.fit_predict(X_normalized)

print(f"K-means: {n_clusters_kmeans} clusters")
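
The value `n_clusters_kmeans = 8` is a fixed guess. One common heuristic for choosing k (not used in the original notebook) is to scan a range of values and keep the one with the highest silhouette score. The sketch below does this on synthetic data with four planted clusters; with the real data, `demo` would be replaced by `X_normalized`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in: 4 well-separated clusters in 16 dimensions
rng = np.random.default_rng(0)
centers = 10.0 * np.eye(4, 16)
demo = np.vstack([c + 0.5 * rng.normal(size=(50, 16)) for c in centers])

# Scan candidate values of k and record the mean silhouette score for each
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(demo)
    scores[k] = silhouette_score(demo, labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k:", best_k)
```

The silhouette score is only a heuristic: on real embeddings the curve is often flat, in which case domain knowledge (for example the number of imaging modalities or body regions) may be a better guide.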

In [None]:
# 4. FINCH Clustering
print("Running FINCH...")
finch_labels, _, _ = FINCH(X_normalized, verbose=False)
# FINCH returns one label column per hierarchy level; column 0 is the first (finest) partition
finch_labels = finch_labels[:, 0]

n_clusters_finch = len(set(finch_labels))
print(f"FINCH: {n_clusters_finch} clusters")

In [None]:
# Summary of clustering results
print("\n" + "="*60)
print("CLUSTERING SUMMARY")
print("="*60)
print(f"DBSCAN:   {n_clusters_dbscan:3d} clusters, {n_noise_dbscan:3d} noise points")
print(f"HDBSCAN:  {n_clusters_hdbscan:3d} clusters, {n_noise_hdbscan:3d} noise points")
print(f"K-means:  {n_clusters_kmeans:3d} clusters")
print(f"FINCH:    {n_clusters_finch:3d} clusters")
print("="*60)
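
Cluster counts alone do not show whether two methods agree on which images belong together. A hedged way to quantify that is the Adjusted Rand Index (ARI), which is invariant to how clusters are numbered. The toy label arrays below are stand-ins for `kmeans_labels`, `finch_labels`, and so on.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

# Toy stand-ins for the label arrays produced above
labels_a = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
labels_b = np.array([2, 2, 2, 0, 0, 0, 1, 1, 1])  # same partition, clusters renamed
labels_c = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2])  # a genuinely different grouping

ari_same = adjusted_rand_score(labels_a, labels_b)
ari_diff = adjusted_rand_score(labels_a, labels_c)

print(f"identical partitions: ARI = {ari_same:.3f}")
print(f"different partitions: ARI = {ari_diff:.3f}")
```

An ARI near 1 means the two partitions match, and a value near 0 means agreement is no better than chance. Note that the DBSCAN/HDBSCAN noise label `-1` is treated here as just another cluster, which may or may not be what you want.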

## 5. Dimensionality Reduction

Reduce the image embeddings to 2D using four different methods:
1. **PCA**: Linear dimensionality reduction
2. **t-SNE**: Non-linear, preserves local structure
3. **UMAP**: Non-linear, preserves local structure and typically more global structure than t-SNE
4. **h-NNE**: Hierarchical nearest neighbor embedding

In [None]:
# 1. PCA
print("Running PCA...")
pca = PCA(n_components=2, random_state=42)
pca_embeddings = pca.fit_transform(X_normalized)
print(f"PCA explained variance: {pca.explained_variance_ratio_.sum():.2%}")
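
Two components usually explain only part of the variance of a 512-dimensional embedding. To see how many components a given variance target would need, the cumulative explained-variance curve can be computed directly from the singular values. This is a NumPy-only sketch on synthetic data; `components_for_variance` is a hypothetical helper.

```python
import numpy as np


def components_for_variance(X, target=0.90):
    """Smallest number of principal components whose cumulative explained variance reaches `target`."""
    Xc = X - X.mean(axis=0)  # PCA operates on centered data
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    ratios = singular_values ** 2 / np.sum(singular_values ** 2)
    cumulative = np.cumsum(ratios)
    n_needed = int(np.searchsorted(cumulative, target) + 1)
    return n_needed, cumulative


# Synthetic stand-in: three dominant directions plus small isotropic noise in 20-D
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3)) * np.array([10.0, 5.0, 2.0])
demo = 0.1 * rng.normal(size=(200, 20))
demo[:, :3] += latent

n_components, cumulative = components_for_variance(demo, target=0.90)
print("components needed for 90% variance:", n_components)
print("cumulative ratios (first 4):", np.round(cumulative[:4], 3))
```

Calling `components_for_variance(X_normalized)` on the real embeddings indicates how lossy the 2D PCA plot is.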

In [None]:
# 2. t-SNE
print("Running t-SNE...")
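# The iteration count is left at its default of 1000: the keyword was renamed from
# n_iter to max_iter in scikit-learn 1.5, so passing either name breaks on some versions.
tsne = TSNE(n_components=2, random_state=42, perplexity=30)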
tsne_embeddings = tsne.fit_transform(X_normalized)
print("t-SNE completed!")

In [None]:
# 3. UMAP
print("Running UMAP...")
umap_reducer = umap.UMAP(n_components=2, random_state=42, n_neighbors=15, min_dist=0.1)
umap_embeddings = umap_reducer.fit_transform(X_normalized)
print("UMAP completed!")

In [None]:
# 4. h-NNE
print("Running h-NNE...")
hnne_reducer = HNNE(n_components=2)
hnne_embeddings = hnne_reducer.fit_transform(X_normalized)
print("h-NNE completed!")

## 6. Visualization: Clustering Comparison on t-SNE

Compare the four clustering methods by overlaying their labels on t-SNE embeddings in a 2x2 grid.

In [None]:
# Create 2x2 grid comparing clustering methods on t-SNE
fig, axes = plt.subplots(2, 2, figsize=(16, 14))
fig.suptitle('Clustering Methods Comparison (t-SNE Embeddings)', fontsize=16, fontweight='bold')

clustering_results = [
    (dbscan_labels, 'DBSCAN', n_clusters_dbscan),
    (hdbscan_labels, 'HDBSCAN', n_clusters_hdbscan),
    (kmeans_labels, 'K-means', n_clusters_kmeans),
    (finch_labels, 'FINCH', n_clusters_finch)
]

for idx, (labels, method_name, n_clusters) in enumerate(clustering_results):
    ax = axes[idx // 2, idx % 2]
    
    # Use different colors for each cluster
    unique_labels = sorted(set(labels))  # sorted so colors and legend order are stable
    colors = plt.cm.tab20(np.linspace(0, 1, len(unique_labels)))
    
    for label, color in zip(unique_labels, colors):
        mask = labels == label
        if label == -1:
            # Noise points in black
            ax.scatter(tsne_embeddings[mask, 0], tsne_embeddings[mask, 1], 
                      c='black', s=20, alpha=0.3, label='Noise')
        else:
            ax.scatter(tsne_embeddings[mask, 0], tsne_embeddings[mask, 1], 
                      c=[color], s=30, alpha=0.6, label=f'Cluster {label}')
    
    ax.set_title(f'{method_name} ({n_clusters} clusters)', fontsize=14, fontweight='bold')
    ax.set_xlabel('t-SNE 1', fontsize=11)
    ax.set_ylabel('t-SNE 2', fontsize=11)
    ax.grid(True, alpha=0.3)
    
    # Only show legend if not too many clusters
    if len(unique_labels) <= 10:
        ax.legend(loc='best', fontsize=8, ncol=2)

plt.tight_layout()
plt.savefig('clustering_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print("Clustering comparison visualization saved as 'clustering_comparison.png'")

## 7. Sample Images per Cluster (K-means)

Display 5 sample images from each K-means cluster. If a cluster has fewer than 5 images, fill with black placeholders.

In [None]:
def create_cluster_sample_grid(labels, image_paths, n_samples=5, img_size=(64, 64)):
    """
    Create a grid showing sample images from each cluster.
    
    Args:
        labels: Cluster labels for each image
        image_paths: List of paths to images
        n_samples: Number of samples to show per cluster
        img_size: Size to resize images to
    """
    unique_labels = sorted(set(labels))
    # Filter out noise label if present
    unique_labels = [l for l in unique_labels if l != -1]
    
    n_clusters = len(unique_labels)
    
    fig, axes = plt.subplots(n_clusters, n_samples, figsize=(n_samples * 2, n_clusters * 2))
    fig.suptitle(f'Sample Images per Cluster (K-means, {n_clusters} clusters)', 
                 fontsize=14, fontweight='bold')
    
    # Handle case of single cluster
    if n_clusters == 1:
        axes = axes.reshape(1, -1)
    
    for cluster_idx, label in enumerate(unique_labels):
        # Get indices of images in this cluster
        cluster_mask = labels == label
        cluster_indices = np.where(cluster_mask)[0]
        
        # Sample up to n_samples images
        n_available = len(cluster_indices)
        sampled_indices = np.random.choice(cluster_indices, 
                                          size=min(n_samples, n_available), 
                                          replace=False)
        
        for sample_idx in range(n_samples):
            ax = axes[cluster_idx, sample_idx]
            
            if sample_idx < len(sampled_indices):
                # Load and display image
                img_path = image_paths[sampled_indices[sample_idx]]
                img = Image.open(img_path).convert('RGB')
                img = img.resize(img_size)
                ax.imshow(img)
            else:
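                # Black filler image (NumPy expects (height, width); img_size is PIL's (width, height))
                ax.imshow(np.zeros((img_size[1], img_size[0], 3), dtype=np.uint8))
            
            # Hide ticks and frame but keep the axis itself:
            # ax.axis('off') would also hide the row label set below
            ax.set_xticks([])
            ax.set_yticks([])
            for spine in ax.spines.values():
                spine.set_visible(False)
            
            # Add cluster label on the first image of each row
            if sample_idx == 0:
                ax.set_ylabel(f'Cluster {label}\n({n_available} imgs)', 
                            fontsize=10, fontweight='bold')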
    
    plt.tight_layout()
    plt.savefig('cluster_samples.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    print("Cluster sample grid saved as 'cluster_samples.png'")

# Create image paths list
image_paths = [sample['image_path'] for sample in slake_data]

# Generate grid
create_cluster_sample_grid(kmeans_labels, image_paths, n_samples=5)

## 8. Dimensionality Reduction Comparison

Compare all four dimensionality reduction methods in a 2x2 grid, colored by K-means labels.

In [None]:
# Create 2x2 grid comparing dimensionality reduction methods (colored by K-means)
fig, axes = plt.subplots(2, 2, figsize=(16, 14))
fig.suptitle('Dimensionality Reduction Methods Comparison (K-means Labels)', 
             fontsize=16, fontweight='bold')

reduction_results = [
    (pca_embeddings, 'PCA'),
    (tsne_embeddings, 't-SNE'),
    (umap_embeddings, 'UMAP'),
    (hnne_embeddings, 'h-NNE')
]

# Use same colors for K-means across all plots
unique_kmeans = sorted(set(kmeans_labels))
colors = plt.cm.tab10(np.linspace(0, 1, len(unique_kmeans)))

for idx, (embeddings, method_name) in enumerate(reduction_results):
    ax = axes[idx // 2, idx % 2]
    
    # Plot each cluster with consistent colors
    for label, color in zip(unique_kmeans, colors):
        mask = kmeans_labels == label
        ax.scatter(embeddings[mask, 0], embeddings[mask, 1], 
                  c=[color], s=30, alpha=0.6, label=f'Cluster {label}')
    
    ax.set_title(f'{method_name}', fontsize=14, fontweight='bold')
    ax.set_xlabel(f'{method_name} 1', fontsize=11)
    ax.set_ylabel(f'{method_name} 2', fontsize=11)
    ax.grid(True, alpha=0.3)
    ax.legend(loc='best', fontsize=8, ncol=2)

plt.tight_layout()
plt.savefig('dimensionality_reduction_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print("Dimensionality reduction comparison saved as 'dimensionality_reduction_comparison.png'")
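
The visual comparison can be complemented with a number. One option (not used in the original notebook) is scikit-learn's `trustworthiness`, which measures how well the nearest neighbors in the 2D plot are also neighbors in the original space. The sketch below uses synthetic data where the correct answer is known.

```python
import numpy as np
from sklearn.manifold import trustworthiness

# Synthetic stand-in: points on a 2-D plane embedded isometrically in 10-D
rng = np.random.default_rng(0)
plane = rng.normal(size=(200, 2))
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))  # 10x2 matrix with orthonormal columns
high_dim = plane @ basis.T                          # shape (200, 10), distances preserved

faithful = plane                    # a perfect 2-D embedding
scrambled = rng.permutation(plane)  # rows shuffled: neighborhoods destroyed

t_faithful = trustworthiness(high_dim, faithful, n_neighbors=10)
t_scrambled = trustworthiness(high_dim, scrambled, n_neighbors=10)

print(f"faithful embedding:  {t_faithful:.3f}")
print(f"scrambled embedding: {t_scrambled:.3f}")
```

For the notebook's data, calling `trustworthiness(X_normalized, embeddings, n_neighbors=10)` for each entry of `reduction_results` gives one score per method; values closer to 1 indicate better-preserved local neighborhoods.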

## 9. Outlier Detection (Bonus)

We can leverage the clustering results for outlier detection:
- **DBSCAN/HDBSCAN**: Noise points (label = -1) are outliers
- **K-means**: Points far from cluster centers are potential outliers
- **Isolation Forest**: Dedicated outlier detection method
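
The K-means variant in the list above can be sketched as follows: compute each point's distance to the centroid of its own cluster and flag the largest distances. This is an illustrative NumPy version, not the notebook's code. `centroid_distance_outliers` is a hypothetical helper, and the 95% quantile is an arbitrary assumption.

```python
import numpy as np


def centroid_distance_outliers(X, labels, quantile=0.95):
    """Flag points whose distance to their own cluster centroid exceeds the given quantile.

    Centroids are recomputed as the mean of each label group, which matches
    K-means cluster centers once the algorithm has converged.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    distances = np.empty(len(X))
    for label in np.unique(labels):
        mask = labels == label
        centroid = X[mask].mean(axis=0)
        distances[mask] = np.linalg.norm(X[mask] - centroid, axis=1)
    threshold = np.quantile(distances, quantile)
    return distances > threshold, distances


# Synthetic stand-in: two tight blobs plus one stray point assigned to cluster 0
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.3, size=(100, 2))
blob_b = rng.normal(loc=5.0, scale=0.3, size=(100, 2))
stray = np.array([[0.0, 4.0]])
demo = np.vstack([blob_a, blob_b, stray])
demo_labels = np.array([0] * 100 + [1] * 100 + [0])

is_outlier, dist = centroid_distance_outliers(demo, demo_labels, quantile=0.95)
print("flagged points:", int(is_outlier.sum()))
print("most distant index:", int(np.argmax(dist)))
```

In the notebook this would be called as `centroid_distance_outliers(X_normalized, kmeans_labels)`. A single global quantile favors large, diffuse clusters; a per-cluster threshold is a reasonable alternative.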

In [None]:
from sklearn.ensemble import IsolationForest

# Isolation Forest for outlier detection
print("Running Isolation Forest...")
iso_forest = IsolationForest(contamination=0.1, random_state=42)
outlier_labels = iso_forest.fit_predict(X_normalized)
# fit_predict already returns binary labels: -1 (outlier) and 1 (inlier)

n_outliers_iso = list(outlier_labels).count(-1)
print(f"Isolation Forest detected {n_outliers_iso} outliers")

# Visualize outliers on t-SNE
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
fig.suptitle('Outlier Detection Methods', fontsize=16, fontweight='bold')

# 1. DBSCAN outliers
ax = axes[0]
inliers = dbscan_labels != -1
outliers = dbscan_labels == -1
ax.scatter(tsne_embeddings[inliers, 0], tsne_embeddings[inliers, 1], 
          c='lightblue', s=30, alpha=0.5, label='Inliers')
ax.scatter(tsne_embeddings[outliers, 0], tsne_embeddings[outliers, 1], 
          c='red', s=50, alpha=0.8, label='Outliers', marker='x')
ax.set_title(f'DBSCAN ({n_noise_dbscan} outliers)', fontsize=12, fontweight='bold')
ax.set_xlabel('t-SNE 1')
ax.set_ylabel('t-SNE 2')
ax.legend()
ax.grid(True, alpha=0.3)

# 2. HDBSCAN outliers
ax = axes[1]
inliers = hdbscan_labels != -1
outliers = hdbscan_labels == -1
ax.scatter(tsne_embeddings[inliers, 0], tsne_embeddings[inliers, 1], 
          c='lightblue', s=30, alpha=0.5, label='Inliers')
ax.scatter(tsne_embeddings[outliers, 0], tsne_embeddings[outliers, 1], 
          c='red', s=50, alpha=0.8, label='Outliers', marker='x')
ax.set_title(f'HDBSCAN ({n_noise_hdbscan} outliers)', fontsize=12, fontweight='bold')
ax.set_xlabel('t-SNE 1')
ax.set_ylabel('t-SNE 2')
ax.legend()
ax.grid(True, alpha=0.3)

# 3. Isolation Forest outliers
ax = axes[2]
inliers = outlier_labels == 1
outliers = outlier_labels == -1
ax.scatter(tsne_embeddings[inliers, 0], tsne_embeddings[inliers, 1], 
          c='lightblue', s=30, alpha=0.5, label='Inliers')
ax.scatter(tsne_embeddings[outliers, 0], tsne_embeddings[outliers, 1], 
          c='red', s=50, alpha=0.8, label='Outliers', marker='x')
ax.set_title(f'Isolation Forest ({n_outliers_iso} outliers)', fontsize=12, fontweight='bold')
ax.set_xlabel('t-SNE 1')
ax.set_ylabel('t-SNE 2')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('outlier_detection.png', dpi=300, bbox_inches='tight')
plt.show()

print("Outlier detection visualization saved as 'outlier_detection.png'")
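
The three panels show how many points each method flags, but not whether they flag the same points. A simple way to check is the Jaccard index between the boolean outlier masks. The masks below are toy stand-ins for `dbscan_labels == -1`, `hdbscan_labels == -1`, and `outlier_labels == -1`; `jaccard` is a hypothetical helper.

```python
import numpy as np


def jaccard(mask_a, mask_b):
    """Jaccard index of two boolean outlier masks (1.0 means identical sets)."""
    mask_a = np.asarray(mask_a, dtype=bool)
    mask_b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 1.0  # neither method flags anything: treat as full agreement
    return float(np.logical_and(mask_a, mask_b).sum() / union)


# Toy stand-ins for the three outlier masks
demo_masks = {
    "DBSCAN": np.array([1, 1, 0, 0, 0, 0], dtype=bool),
    "HDBSCAN": np.array([1, 1, 1, 0, 0, 0], dtype=bool),
    "Isolation Forest": np.array([0, 1, 1, 1, 0, 0], dtype=bool),
}

names = list(demo_masks)
overlaps = {}
for i, a in enumerate(names):
    for b in names[i + 1:]:
        overlaps[(a, b)] = jaccard(demo_masks[a], demo_masks[b])
        print(f"{a} vs {b}: {overlaps[(a, b)]:.2f}")
```

Points flagged by all methods are the strongest outlier candidates and are worth inspecting by eye first.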

## 10. Summary & Conclusions

This tutorial demonstrated:

### Embeddings
- Generated medical image embeddings using BiomedCLIP
- Created separate text embeddings for QA pairs
- Saved embeddings as numpy files for reusability

### Clustering Methods
- **DBSCAN**: Density-based, identifies noise points
- **HDBSCAN**: Hierarchical extension of DBSCAN; no global eps to tune and better suited to clusters of varying density
- **K-means**: Simple, fast, requires predefined number of clusters
- **FINCH**: Parameter-free, hierarchical

### Dimensionality Reduction
- **PCA**: Fast, linear, good for initial exploration
- **t-SNE**: Excellent for visualization, preserves local structure
- **UMAP**: Usually faster than t-SNE on larger datasets, tends to preserve more global structure
- **h-NNE**: Hierarchical, fast, structure-aware

### Outlier Detection
- Leveraged clustering noise points (DBSCAN/HDBSCAN)
- Applied Isolation Forest for dedicated outlier detection

### Key Takeaways
1. Different methods reveal different aspects of data structure
2. Clustering results vary significantly between methods
3. Dimensionality reduction choice affects visual interpretation
4. Medical imaging benefits from domain-specific embeddings (BiomedCLIP)

## Additional Experiments

Try these experiments to deepen your understanding:

1. **Different clustering parameters**: Adjust eps, min_samples for DBSCAN
2. **Different K for K-means**: Try k=5, 10, 15 and compare
3. **Text embeddings analysis**: Repeat clustering/reduction on QA embeddings
4. **Combined embeddings**: Concatenate image + text embeddings
5. **Other metrics**: Try clustering with cosine distance
6. **Hierarchical visualization**: Explore FINCH hierarchy levels
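
For the combined-embeddings experiment, note that there are many QA embeddings per image, so the text side has to be pooled to one vector per image before concatenation. The sketch below shows one possible approach (mean pooling followed by re-normalization) on toy arrays. `pool_text_per_image` is a hypothetical helper, and mean pooling is an assumption rather than the tutorial's method.

```python
import numpy as np


def pool_text_per_image(text_embeddings, text_to_image_map, n_images):
    """Average the QA embeddings belonging to each image, then re-normalize to unit length."""
    idx = np.asarray(text_to_image_map)
    pooled = np.zeros((n_images, text_embeddings.shape[1]), dtype=np.float64)
    np.add.at(pooled, idx, text_embeddings)           # sum the rows of each image
    counts = np.bincount(idx, minlength=n_images)[:, None]
    pooled = pooled / np.maximum(counts, 1)           # images without QA pairs stay zero
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.maximum(norms, 1e-12)


def unit_rows(a):
    """L2-normalize each row."""
    return a / np.linalg.norm(a, axis=1, keepdims=True)


# Toy stand-ins: 3 images, 5 QA texts, 4-D embeddings
rng = np.random.default_rng(0)
img = unit_rows(rng.normal(size=(3, 4)))
txt = unit_rows(rng.normal(size=(5, 4)))
mapping = [0, 0, 1, 2, 2]  # same role as text_to_image_map

pooled = pool_text_per_image(txt, mapping, n_images=3)
combined = np.hstack([img, pooled])  # one row per image: image part + pooled text part
print(combined.shape)
```

With the real arrays, the call would be `pool_text_per_image(text_embeddings, text_to_image_map, len(slake_data))`. Scaling either half before concatenation changes how much each modality influences distance-based clustering.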