# Dimensionality Reduction

High-dimensional data is common in machine learning but difficult to visualize and understand. Dimensionality reduction techniques project high-dimensional data into lower dimensions while preserving important structure. We'll explore three popular methods: metric MDS, t-SNE, and UMAP.

Each method makes different tradeoffs:
- **Metric MDS**: Preserves pairwise distances as closely as possible
- **t-SNE**: Reveals local clusters and neighborhoods
- **UMAP**: Balances local and global structure preservation

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_swiss_roll, make_circles
from sklearn.manifold import MDS, TSNE
from sklearn.preprocessing import StandardScaler
import umap

plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (12, 4)
np.random.seed(42)

## Metric MDS (Multidimensional Scaling)

Metric MDS finds a low-dimensional embedding that preserves pairwise distances as closely as possible. Given a distance matrix, it minimizes stress:

$$\text{Stress} = \sqrt{\frac{\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} d_{ij}^2}}$$

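where $d_{ij}$ are the original distances and $\hat{d}_{ij}$ are the distances in the embedding. Note that for metric MDS, scikit-learn's `stress_` attribute reports the raw (unnormalized) sum of squared differences by default, so its value is not on the 0 to 1 scale of this formula.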
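
To make the stress formula concrete, here is a minimal NumPy sketch that evaluates it for two toy embeddings. The helper names (`pairwise_distances`, `normalized_stress`) and the random `X_demo` array are illustrative assumptions, not part of scikit-learn or of this notebook's pipeline.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def normalized_stress(X_high, X_low):
    """Stress as defined above, summed over pairs with i < j."""
    d = pairwise_distances(X_high)       # original distances d_ij
    d_hat = pairwise_distances(X_low)    # embedded distances
    iu = np.triu_indices(len(X_high), k=1)  # index pairs with i < j
    return np.sqrt(((d[iu] - d_hat[iu]) ** 2).sum() / (d[iu] ** 2).sum())

# Hypothetical toy data: 50 random points in 3D
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))

stress_identity = normalized_stress(X_demo, X_demo)          # same points, no distortion
stress_drop_axis = normalized_stress(X_demo, X_demo[:, :2])  # discard the third coordinate

print(f"identical configuration: {stress_identity:.3f}")
print(f"dropping one axis:       {stress_drop_axis:.3f}")
```

An embedding that reproduces every distance scores zero, and any distortion pushes the value up, which is why lower stress is read as better distance preservation.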

In [None]:
# Generate 3D Swiss roll data
X_swiss, color_swiss = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

# Apply Metric MDS
mds = MDS(n_components=2, metric=True, random_state=42)
X_mds = mds.fit_transform(X_swiss)

# Create an empty figure; the 3D and 2D axes are added individually below
fig = plt.figure(figsize=(12, 5))

# Original 3D data
ax1 = fig.add_subplot(121, projection='3d')
ax1.scatter(X_swiss[:, 0], X_swiss[:, 1], X_swiss[:, 2], c=color_swiss, cmap='viridis', alpha=0.7)
ax1.set_title('Original Swiss Roll (3D)')
ax1.set_xlabel('X')
ax1.set_ylabel('Y')
ax1.set_zlabel('Z')

# MDS projection
ax2 = fig.add_subplot(122)
scatter = ax2.scatter(X_mds[:, 0], X_mds[:, 1], c=color_swiss, cmap='viridis', alpha=0.7)
ax2.set_title(f'Metric MDS (raw stress: {mds.stress_:.1f})')
ax2.set_xlabel('Component 1')
ax2.set_ylabel('Component 2')

plt.tight_layout()
plt.show()

print(f"Raw stress reported by scikit-learn: {mds.stress_:.1f}")
print("Lower stress indicates better distance preservation")

## t-SNE (t-Distributed Stochastic Neighbor Embedding)

t-SNE converts similarities between data points to joint probabilities and tries to minimize the KL divergence between the joint probabilities of the low-dimensional embedding and the high-dimensional data. It's particularly good at revealing clusters and local structure.
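
The objective described above can be sketched in a few lines of NumPy. This is a simplified illustration, not scikit-learn's implementation: it assumes a single shared Gaussian bandwidth `sigma` instead of tuning one per point from the perplexity, and the helper names (`joint_p`, `joint_q`, `kl_divergence`) and the toy data are hypothetical.

```python
import numpy as np

def squared_distances(X):
    """Squared Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=-1)

def joint_p(X, sigma=1.0):
    """High-dimensional joint probabilities with one shared Gaussian bandwidth.

    Simplification: t-SNE proper tunes a separate bandwidth per point
    so that each conditional distribution matches the target perplexity.
    """
    logits = -squared_distances(X) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)            # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    cond = np.exp(logits)
    cond /= cond.sum(axis=1, keepdims=True)      # rows hold p_{j|i}
    return (cond + cond.T) / (2.0 * len(X))      # symmetrize into p_ij

def joint_q(Y):
    """Low-dimensional joint probabilities from a Student-t kernel (1 degree of freedom)."""
    kernel = 1.0 / (1.0 + squared_distances(Y))
    np.fill_diagonal(kernel, 0.0)
    return kernel / kernel.sum()

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), skipping entries where P is zero."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

# Hypothetical toy data: two well-separated clusters in 5 dimensions
rng = np.random.default_rng(0)
cluster_a = rng.normal(size=(30, 5))
cluster_b = rng.normal(size=(30, 5))
cluster_b[:, 0] += 10.0
X_demo = np.vstack([cluster_a, cluster_b])

P = joint_p(X_demo)
Y_kept = X_demo[:, :2]                             # keeps the cluster separation
Y_shuffled = Y_kept[rng.permutation(len(Y_kept))]  # same points, neighborhoods scrambled

kl_kept = kl_divergence(P, joint_q(Y_kept))
kl_shuffled = kl_divergence(P, joint_q(Y_shuffled))
print(f"KL, clusters kept:      {kl_kept:.3f}")
print(f"KL, neighbors shuffled: {kl_shuffled:.3f}")
```

The embedding that keeps neighbors together yields the lower divergence. t-SNE itself goes further and moves the low-dimensional points by gradient descent to reduce this quantity.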

In [None]:
# Apply t-SNE to Swiss roll
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X_swiss)

# Second dataset: two noisy concentric circles (already 2D, used to see how t-SNE treats clusters)
X_circles, y_circles = make_circles(n_samples=800, noise=0.1, factor=0.3, random_state=42)
X_tsne_circles = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X_circles)

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Swiss roll t-SNE
axes[0,0].scatter(X_tsne[:, 0], X_tsne[:, 1], c=color_swiss, cmap='viridis', alpha=0.7)
axes[0,0].set_title('t-SNE: Swiss Roll')
axes[0,0].set_xlabel('t-SNE 1')
axes[0,0].set_ylabel('t-SNE 2')

# Original circles
axes[0,1].scatter(X_circles[:, 0], X_circles[:, 1], c=y_circles, cmap='tab10', alpha=0.7)
axes[0,1].set_title('Original Circles (2D)')
axes[0,1].set_xlabel('Feature 1')
axes[0,1].set_ylabel('Feature 2')

# Circles t-SNE
axes[1,0].scatter(X_tsne_circles[:, 0], X_tsne_circles[:, 1], c=y_circles, cmap='tab10', alpha=0.7)
axes[1,0].set_title('t-SNE: Circles')
axes[1,0].set_xlabel('t-SNE 1')
axes[1,0].set_ylabel('t-SNE 2')

# Remove bottom right subplot
axes[1,1].axis('off')

plt.tight_layout()
plt.show()

print("t-SNE excels at separating clusters and preserving local neighborhoods")
print("Perplexity sets the effective neighborhood size: low values favor local detail, high values broader structure")

## UMAP (Uniform Manifold Approximation and Projection)

UMAP constructs a topological representation of the data (in practice, a weighted nearest-neighbor graph) and searches for a low-dimensional embedding whose representation is as close to it as possible. It is typically faster than t-SNE and often preserves more global structure while still revealing local patterns.
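
The topological representation mentioned above is, in practice, a weighted nearest-neighbor graph. The sketch below loosely follows the construction described in the UMAP paper and is a simplified assumption, not umap-learn's code: it uses brute-force distances, and the function name `fuzzy_neighbor_graph` and its parameters are hypothetical.

```python
import numpy as np

def fuzzy_neighbor_graph(X, n_neighbors=5, n_iter=64):
    """Simplified sketch of a UMAP-style weighted k-nearest-neighbor graph.

    Returns a symmetric (n, n) matrix of edge weights in [0, 1].
    Brute-force distances are used here; umap-learn relies on
    approximate nearest-neighbor search instead.
    """
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)                   # exclude self-matches
    knn_idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    knn_dist = np.take_along_axis(dist, knn_idx, axis=1)

    target = np.log2(n_neighbors)
    directed = np.zeros((n, n))
    for i in range(n):
        rho = knn_dist[i, 0]                         # distance to the nearest neighbor
        shifted = np.maximum(knn_dist[i] - rho, 0.0)
        lo, hi = 1e-6, 1e6
        for _ in range(n_iter):                      # geometric bisection on sigma
            sigma = np.sqrt(lo * hi)
            if np.exp(-shifted / sigma).sum() > target:
                hi = sigma                           # weights too large: shrink sigma
            else:
                lo = sigma
        directed[i, knn_idx[i]] = np.exp(-shifted / sigma)

    # Fuzzy union: keep an edge if either endpoint assigns it weight
    return directed + directed.T - directed * directed.T

# Hypothetical toy data: 40 random points in 3D
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
W = fuzzy_neighbor_graph(X_demo, n_neighbors=5)
print(f"edges with nonzero weight: {int((W > 0).sum() // 2)}")
```

Each point's nearest neighbor always receives weight 1, and weights decay with distance at a rate adapted to the local density. The embedding step, not shown here, then places points in low dimensions so that a graph built there resembles this one.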

In [None]:
# Apply UMAP to both datasets
umap_swiss = umap.UMAP(n_components=2, random_state=42)
X_umap_swiss = umap_swiss.fit_transform(X_swiss)

umap_circles = umap.UMAP(n_components=2, random_state=42)
X_umap_circles = umap_circles.fit_transform(X_circles)

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Swiss roll UMAP
axes[0,0].scatter(X_umap_swiss[:, 0], X_umap_swiss[:, 1], c=color_swiss, cmap='viridis', alpha=0.7)
axes[0,0].set_title('UMAP: Swiss Roll')
axes[0,0].set_xlabel('UMAP 1')
axes[0,0].set_ylabel('UMAP 2')

# Circles UMAP
axes[0,1].scatter(X_umap_circles[:, 0], X_umap_circles[:, 1], c=y_circles, cmap='tab10', alpha=0.7)
axes[0,1].set_title('UMAP: Circles')
axes[0,1].set_xlabel('UMAP 1')
axes[0,1].set_ylabel('UMAP 2')

# Remove bottom subplots
axes[1,0].axis('off')
axes[1,1].axis('off')

plt.tight_layout()
plt.show()

print("UMAP aims to preserve local neighborhoods while retaining some global structure")
print("Generally faster than t-SNE; results still depend on random_state and n_neighbors")

## Method Comparison

Let's compare all three methods on the same dataset to see their different strengths.
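
One rough way to put numbers on this comparison is to check how many of each point's nearest neighbors survive the projection. The sketch below is an illustration under stated assumptions: `knn_indices` and `neighbor_preservation` are hypothetical helpers, and the score covers only local structure, not global layout. scikit-learn's `sklearn.manifold.trustworthiness` offers a related, more refined measure.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of every row of X (brute force)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)   # a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def neighbor_preservation(X_high, X_low, k=10):
    """Mean fraction of each point's k nearest neighbors kept by the embedding."""
    nn_high = knn_indices(X_high, k)
    nn_low = knn_indices(X_low, k)
    overlap = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return float(np.mean(overlap)) / k

# Hypothetical toy data: 200 random points in 3D
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))

score_same = neighbor_preservation(X_demo, X_demo)          # perfect embedding
score_proj = neighbor_preservation(X_demo, X_demo[:, :2])   # drop one axis
score_rand = neighbor_preservation(X_demo, rng.normal(size=(200, 2)))  # unrelated layout

print(f"identical: {score_same:.2f}, projection: {score_proj:.2f}, random: {score_rand:.2f}")
```

In this notebook the same function could be applied as `neighbor_preservation(X_swiss, X_mds)`, `neighbor_preservation(X_swiss, X_tsne)`, and `neighbor_preservation(X_swiss, X_umap_swiss)` once those arrays exist.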

In [None]:
# Use Swiss roll for comparison
# We already have X_mds, X_tsne, X_umap_swiss from above

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# MDS
axes[0].scatter(X_mds[:, 0], X_mds[:, 1], c=color_swiss, cmap='viridis', alpha=0.7)
axes[0].set_title('Metric MDS\n(Distance Preservation)')
axes[0].set_xlabel('Component 1')
axes[0].set_ylabel('Component 2')

# t-SNE
axes[1].scatter(X_tsne[:, 0], X_tsne[:, 1], c=color_swiss, cmap='viridis', alpha=0.7)
axes[1].set_title('t-SNE\n(Local Structure)')
axes[1].set_xlabel('t-SNE 1')
axes[1].set_ylabel('t-SNE 2')

# UMAP
axes[2].scatter(X_umap_swiss[:, 0], X_umap_swiss[:, 1], c=color_swiss, cmap='viridis', alpha=0.7)
axes[2].set_title('UMAP\n(Local + Global)')
axes[2].set_xlabel('UMAP 1')
axes[2].set_ylabel('UMAP 2')

plt.tight_layout()
plt.show()

print("Key Differences:")
print("• MDS: Targets all pairwise distances, so global geometry dominates the layout")
print("• t-SNE: Non-linear, emphasizes local neighborhoods and clusters")
print("• UMAP: Non-linear, aims to balance local clustering with global structure")