# **AI TECH INSTITUTE** · *Intermediate AI & Data Science*
### Dimensionality Reduction - Visualizing High-Dimensional Data
**Instructor:** Amir Charkhi | **Dataset:** scikit-learn Digits (8×8 MNIST-style handwritten digits)

---

## 📚 What You'll Learn

**Theory:**
- The curse of dimensionality
- Principal Component Analysis (PCA)
- t-SNE for visualization
- Manifold learning (Isomap, LLE)
- When to use each method

**Practice:**
- Reduce 64 dimensions to 2D
- Interactive Plotly visualizations
- Compare all methods
- Real-world applications

---

## 1. Theory: The Curse of Dimensionality

### ❓ What is the Problem?

**Imagine you have data with 784 features, as in a full-size MNIST image...**
```
Each image = 28×28 pixels = 784 dimensions

Problem:
- Can't visualize 784 dimensions!
- Models become slow
- Need huge amounts of data
- Distance metrics break down
```

---

### 📊 Why Does This Happen?

**The Volume Problem:**
```
1D: Line of length 10
    Volume = 10

2D: Square 10×10
    Volume = 100

3D: Cube 10×10×10
    Volume = 1,000

784D: Hypercube with side 10
    Volume = 10^784 (astronomical!)

Result: Data becomes extremely sparse
```

**The Distance Problem:**
```
In high dimensions:
- All points seem equally far
- Near and far become meaningless
- Clustering becomes difficult
```
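The distance problem can be checked numerically. The sketch below is an illustration rather than part of the original notebook: it draws uniform random points (an assumption, real data is rarely uniform) and measures how much farther the farthest point is than the nearest one. The helper name `distance_contrast` is made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(n_dims, n_points=500):
    """Relative gap between the farthest and nearest point from one query point."""
    points = rng.random((n_points, n_dims))   # uniform samples in the unit hypercube
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

contrasts = {d: distance_contrast(d) for d in (2, 10, 100, 784)}
for d, c in contrasts.items():
    print(f"{d:>4} dims: (max - min) / min distance = {c:.2f}")
```

As the number of dimensions grows, the ratio shrinks toward zero: the nearest and farthest points end up at almost the same distance, which is why "near" and "far" lose meaning.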

---

### 💡 The Solution: Dimensionality Reduction

**Goal: Compress information**
```
784 dimensions → 2 or 3 dimensions

While preserving:
- Important patterns
- Relationships
- Structure
```

**Benefits:**
- ✅ Visualize data
- ✅ Faster training
- ✅ Remove noise
- ✅ Often better model performance (less overfitting)
- ✅ Easier interpretation

---

## 2. Setup and Data Loading

In [None]:
# Core libraries
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Plotly for visualizations
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

In [None]:
# Dimensionality reduction methods
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, Isomap, LocallyLinearEmbedding
from sklearn.datasets import load_digits

### 📊 Load the Digits Dataset

**Dataset:** scikit-learn's built-in handwritten digits (0-9), a small MNIST-style dataset
- 8×8 pixel images = 64 features
- 1,797 samples
- Small enough to embed quickly, which makes it well suited to visualization

In [None]:
# Load data
digits = load_digits()
X = digits.data
y = digits.target

print(f"📊 Dataset loaded: {X.shape[0]} samples, {X.shape[1]} features")

In [None]:
# What do the images look like?
print(f"Feature range: {X.min():.1f} to {X.max():.1f}")
print(f"Labels: {np.unique(y)}")

### 🖼️ Visualize Sample Images

In [None]:
# Show first 10 digits
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for i, ax in enumerate(axes.flat):
    ax.imshow(digits.images[i], cmap='gray')
    ax.set_title(f'Label: {y[i]}')
    ax.axis('off')
plt.tight_layout()
plt.show()

### 🔧 Prepare Data

In [None]:
# Scale features (important for distance-based methods!)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("✅ Data scaled (mean=0, std=1)")

In [None]:
# Use subset for faster computation (optional)
n_samples = 1000  # Use 1000 samples
rng = np.random.default_rng(42)  # fixed seed so the same subset is drawn on every run
indices = rng.choice(len(X_scaled), n_samples, replace=False)
X_subset = X_scaled[indices]
y_subset = y[indices]

print(f"✅ Working with {n_samples} samples")

---
## 3. Method 1: PCA (Principal Component Analysis)

### 📖 Theory: How PCA Works

**Goal:** Find directions of maximum variance

**The Idea:**
```
Imagine a 3D cloud of points:

      ●
    ● ● ●
  ● ● ● ● ●
    ● ● ●
      ●

PCA finds:
PC1: Direction with most spread →
PC2: Second most spread (perpendicular) ↑
PC3: Third most spread (perpendicular to both)
```

---

### 🎯 What Are Principal Components?

**Principal Components = New axes**
```
PC1: Captures most variance
PC2: Captures second most
PC3: Captures third most
...

All components together: capture 100% of the variance
```

**Key Properties:**
- Perpendicular (orthogonal)
- Ordered by importance
- Linear combinations of original features
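The properties listed above can be verified on a small example. This is a minimal sketch that computes PCA by hand with NumPy's SVD on synthetic data; the toy data and variable names (`X_toy`, `components`, `scores`) are assumptions made for illustration, and scikit-learn's `PCA` may differ in details such as sign conventions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 200 points in 3D, stretched most along the first axis
X_toy = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])

X_centered = X_toy - X_toy.mean(axis=0)            # PCA works on centered data
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

components = Vt                                    # rows are PC1, PC2, PC3
explained_variance = S**2 / (len(X_toy) - 1)       # variance along each component
explained_ratio = explained_variance / explained_variance.sum()
scores = X_centered @ components.T                 # data expressed in the new axes

print("Variance ratio per component:", np.round(explained_ratio, 3))
print("PC1 . PC2 =", round(float(components[0] @ components[1]), 10))
```

The components come out orthogonal and sorted by variance, and each one is a weighted sum of the original three features (the entries of a row of `components`).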

---

### 💪 PCA Advantages

✅ **Fast** - Linear algebra only

✅ **Interpretable** - Loadings show which original features drive each component

✅ **Scalable** - Works on large datasets

✅ **Removes correlation** - New features are uncorrelated

❌ **Linear only** - Can't capture non-linear structure

---

### 🎯 Practice: Apply PCA

In [None]:
# Create PCA model (reduce to 2D)
pca = PCA(n_components=2, random_state=42)

print("✅ PCA model created (2 components)")

In [None]:
# Fit and transform
X_pca = pca.fit_transform(X_subset)

print(f"✅ Reduced from {X_subset.shape[1]}D to {X_pca.shape[1]}D")

In [None]:
# How much variance captured?
variance_explained = pca.explained_variance_ratio_

print(f"PC1 captures: {variance_explained[0]:.1%} of variance")
print(f"PC2 captures: {variance_explained[1]:.1%} of variance")
print(f"Total: {variance_explained.sum():.1%}")

### 📊 Visualize PCA Results

In [None]:
# Create DataFrame for plotting
df_pca = pd.DataFrame({
    'PC1': X_pca[:, 0],
    'PC2': X_pca[:, 1],
    'Digit': y_subset.astype(str)
})

In [None]:
# Interactive PCA plot
fig = px.scatter(
    df_pca,
    x='PC1',
    y='PC2',
    color='Digit',
    title=f'PCA: 64D → 2D ({variance_explained.sum():.1%} variance preserved)',
    labels={'PC1': f'PC1 ({variance_explained[0]:.1%})', 
            'PC2': f'PC2 ({variance_explained[1]:.1%})'},
    template='plotly_white',
    width=900,
    height=700,
    color_discrete_sequence=px.colors.qualitative.Set3
)

fig.update_traces(marker=dict(size=8, opacity=0.7))
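fig.show()

💡 **Interpretation:**
- Samples of the same digit tend to land near each other
- PCA is a linear projection, so several digit classes overlap
- Classes blend into one another gradually instead of forming sharply separated clusters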

---

### 📈 Scree Plot: How Many Components?

In [None]:
# Fit PCA with all components
pca_full = PCA()
pca_full.fit(X_subset)

print("✅ Full PCA computed")

In [None]:
# Calculate cumulative variance
cumulative_variance = np.cumsum(pca_full.explained_variance_ratio_)

# Find number of components for 95% variance
n_components_95 = np.argmax(cumulative_variance >= 0.95) + 1

print(f"Components needed for 95% variance: {n_components_95}")

In [None]:
# Plot scree plot
fig = go.Figure()

# Individual variance
fig.add_trace(go.Bar(
    x=list(range(1, 21)),
    y=pca_full.explained_variance_ratio_[:20],
    name='Individual',
    marker_color='lightblue'
))

# Cumulative variance
fig.add_trace(go.Scatter(
    x=list(range(1, 21)),
    y=cumulative_variance[:20],
    name='Cumulative',
    mode='lines+markers',
    line=dict(color='red', width=3),
    marker=dict(size=8),
    yaxis='y2'
))

fig.update_layout(
    title='Scree Plot: Variance Explained by Component',
    xaxis_title='Principal Component',
    yaxis_title='Variance Explained',
    yaxis2=dict(title='Cumulative Variance', overlaying='y', side='right', range=[0, 1]),
    template='plotly_white',
    width=900,
    height=500,
    showlegend=True
)

fig.show()

💡 **How to use:**
- Look for "elbow" in the curve
- Or choose components to reach target variance (e.g., 95%)
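The target-variance rule can also be handed to scikit-learn directly: passing a float between 0 and 1 as `n_components` asks `PCA` to keep just enough components to reach that share of variance. The sketch below reloads and rescales the digits so it runs on its own; the names `X_demo` and `pca_95` are illustrative only.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_demo = StandardScaler().fit_transform(load_digits().data)

# A float in (0, 1) means "keep enough components to explain this share of variance"
pca_95 = PCA(n_components=0.95)
X_demo_95 = pca_95.fit_transform(X_demo)

print(f"Components kept: {pca_95.n_components_}")
print(f"Variance retained: {pca_95.explained_variance_ratio_.sum():.1%}")
```

This gives the same kind of answer as the `np.argmax(cumulative_variance >= 0.95) + 1` calculation, without fitting the full model first.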

---

## 4. Method 2: t-SNE

### 📖 Theory: t-Distributed Stochastic Neighbor Embedding

**Goal:** Preserve local structure (keep similar points close)

**The Idea:**
```
High-dimensional space:    Low-dimensional space:
    A─B                        A─B
    │ │                        │ │
    C D                        C D
    ↓ ↓                        (same relationships!)
    E F

Keep neighbors together
Distant points can move
```

---

### 🎯 How t-SNE Works

**Step 1:** Measure similarity in high-D
```
"How similar is point A to point B?"
Use Gaussian distribution
```

**Step 2:** Create an initial low-D layout
```
Start with random positions
(scikit-learn ≥ 1.2 starts from a PCA projection by default)
```

**Step 3:** Iteratively adjust
```
Move points to preserve similarities
Minimize difference between high-D and low-D similarities
Use gradient descent
```

**Step 4:** Use t-distribution in low-D
```
Heavy tails → Better separation of clusters
```
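The four steps above can be sketched in NumPy. This is a simplified illustration, not scikit-learn's implementation: it uses one fixed Gaussian width `sigma` for every point (real t-SNE tunes a width per point from the perplexity) and only evaluates the cost for two candidate layouts instead of running gradient descent. All function names are made up for this example.

```python
import numpy as np

def pairwise_sq_dists(Z):
    """Squared Euclidean distance between every pair of rows in Z."""
    sq = np.sum(Z**2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2 * Z @ Z.T, 0.0)

def high_dim_similarities(X, sigma=1.0):
    """Step 1: Gaussian similarities P (fixed sigma is a simplification)."""
    logits = -pairwise_sq_dists(X) / (2 * sigma**2)
    np.fill_diagonal(logits, -np.inf)              # a point is not its own neighbor
    cond = np.exp(logits - logits.max(axis=1, keepdims=True))
    cond /= cond.sum(axis=1, keepdims=True)        # conditional p(j | i)
    return (cond + cond.T) / (2 * len(X))          # symmetrized joint P

def low_dim_similarities(Y):
    """Step 4: heavy-tailed Student-t similarities Q in the embedding."""
    inv = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(inv, 0.0)
    return inv / inv.sum()

def kl_divergence(P, Q, eps=1e-12):
    """Step 3: the mismatch between P and Q that gradient descent minimizes."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))

rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(0, 0.3, (20, 5)), rng.normal(4, 0.3, (20, 5))])  # two blobs
P = high_dim_similarities(X_toy)

Y_random = rng.normal(size=(40, 2))                # Step 2: a random starting layout
Y_faithful = X_toy[:, :2]                          # a layout that keeps the blobs apart

cost_random = kl_divergence(P, low_dim_similarities(Y_random))
cost_faithful = kl_divergence(P, low_dim_similarities(Y_faithful))
print(f"KL cost, random layout:   {cost_random:.3f}")
print(f"KL cost, faithful layout: {cost_faithful:.3f}")
```

The layout that keeps neighbors together scores a lower cost, which is the direction the optimizer pushes the points during Step 3.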

---

### 💪 t-SNE Advantages

✅ **Non-linear** - Captures complex patterns

✅ **Beautiful visualizations** - Clear clusters

✅ **Preserves local structure** - Neighbors stay close

❌ **Slow** - Computationally expensive

❌ **Non-deterministic** - Different random seeds give different layouts (fix `random_state` for repeatability)

❌ **Can't transform new data** - No `transform` method in scikit-learn; new points require refitting

---

### ⚙️ Key Parameters

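**perplexity:**
```
"How many neighbors to consider?"
Typical range: 5-50 (default 30); must be smaller than the number of samples
Larger dataset → Higher perplexity
```

**learning_rate:**
```
Step size for optimization
Typical range: 10-1000
Default: 'auto' in scikit-learn ≥ 1.2 (200 in older versions)
```

**n_iter (renamed max_iter in scikit-learn 1.5):**
```
Number of optimization steps
Default: 1000
More iterations help until the layout converges (but run slower)
```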
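Because the iteration keyword differs between scikit-learn releases, a notebook that has to run on several versions can pick the keyword at runtime. The following is one possible workaround, not an official recipe; `iter_key` and `tsne_kwargs` are names invented for this sketch.

```python
import inspect
from sklearn.manifold import TSNE

tsne_params = inspect.signature(TSNE.__init__).parameters
# Use whichever iteration keyword the installed scikit-learn accepts
iter_key = "max_iter" if "max_iter" in tsne_params else "n_iter"

tsne_kwargs = {
    "n_components": 2,
    "perplexity": 30,
    iter_key: 1000,
    "random_state": 42,
}
tsne_demo = TSNE(**tsne_kwargs)
print(f"Iteration keyword in this version: {iter_key}")
```

If only the default of 1000 iterations is needed, simply omitting the argument is the simpler option.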

---

### 🎯 Practice: Apply t-SNE

In [None]:
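# Create t-SNE model
# The iteration count is left at its default of 1000: the keyword is n_iter in
# older scikit-learn releases and max_iter from 1.5 onward, so omitting it keeps
# this cell portable.
tsne = TSNE(
    n_components=2,
    perplexity=30,
    learning_rate=200,
    random_state=42
)

print("✅ t-SNE model created")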

In [None]:
# Fit and transform (this takes a moment...)
print("🔄 Running t-SNE (this takes ~30 seconds)...")
X_tsne = tsne.fit_transform(X_subset)

print("✅ t-SNE complete!")

### 📊 Visualize t-SNE Results

In [None]:
# Create DataFrame
df_tsne = pd.DataFrame({
    'Dim1': X_tsne[:, 0],
    'Dim2': X_tsne[:, 1],
    'Digit': y_subset.astype(str)
})

In [None]:
# Interactive t-SNE plot
fig = px.scatter(
    df_tsne,
    x='Dim1',
    y='Dim2',
    color='Digit',
    title='t-SNE: 64D → 2D (Non-linear)',
    labels={'Dim1': 't-SNE Dimension 1', 'Dim2': 't-SNE Dimension 2'},
    template='plotly_white',
    width=900,
    height=700,
    color_discrete_sequence=px.colors.qualitative.Set3
)

fig.update_traces(marker=dict(size=8, opacity=0.7))
fig.show()

💡 **Notice the difference from PCA:**
- Much clearer clusters!
- Better separation between digits
- Non-linear transformation preserves local structure

---

## 5. Method 3: Isomap (Manifold Learning)

### 📖 Theory: Isometric Mapping

**Goal:** Preserve geodesic distances (distances along manifold)

**The Manifold Hypothesis:**
```
High-dimensional data often lies on
a lower-dimensional curved surface
(a manifold)

Example: Swiss Roll
    ┌─────┐
    │     │
    └─────┘  ← Unroll to 2D!
```

---

### 🎯 How Isomap Works

**Step 1:** Build neighborhood graph
```
Connect each point to K nearest neighbors
```

**Step 2:** Compute shortest paths
```
Find geodesic distances along graph
(not straight-line distances!)
```

**Step 3:** Classical MDS
```
Create low-D embedding that preserves distances
```
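The three steps above can be written out for a tiny dataset. This is a teaching sketch using dense matrices and Floyd-Warshall shortest paths, which is only practical for small inputs and is not how scikit-learn's `Isomap` is implemented. The function name `isomap_sketch` and the spiral test data are assumptions for illustration.

```python
import numpy as np

def isomap_sketch(X, n_neighbors=5, n_components=1):
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))             # straight-line distances

    # Step 1: k-nearest-neighbor graph (inf = no edge), made symmetric
    graph = np.full((n, n), np.inf)
    nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]   # column 0 is the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    graph[rows, nearest.ravel()] = dist[rows, nearest.ravel()]
    graph = np.minimum(graph, graph.T)
    np.fill_diagonal(graph, 0.0)

    # Step 2: shortest paths along the graph (Floyd-Warshall) approximate geodesics
    geo = graph.copy()
    for k in range(n):
        geo = np.minimum(geo, geo[:, [k]] + geo[[k], :])

    # Step 3: classical MDS on the geodesic distances
    J = np.eye(n) - np.ones((n, n)) / n                # centering matrix
    B = -0.5 * J @ (geo**2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(eigvals[top]), geo

# Points along a 2D spiral: the underlying structure is the 1D position along the curve
t = np.linspace(0, 3 * np.pi, 120)
spiral = np.column_stack([t * np.cos(t), t * np.sin(t)])

embedding, geo = isomap_sketch(spiral)
corr = abs(np.corrcoef(embedding[:, 0], t)[0, 1])
print(f"|correlation| between Isomap coordinate and position along spiral: {corr:.3f}")
```

The single recovered coordinate tracks position along the spiral, meaning the curve has been "unrolled" because distances were measured along the graph rather than straight across the gaps between arms.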

---

### 💪 Isomap Advantages

✅ **Non-linear** - Follows curved manifolds

✅ **Global structure** - Preserves overall geometry

✅ **Deterministic** - Same result every time

❌ **Sensitive to neighbors** - K matters a lot (too large creates shortcuts across the manifold, too small disconnects the graph)

❌ **Doesn't handle holes** - Needs a connected manifold without holes

---

### 🎯 Practice: Apply Isomap

In [None]:
# Create Isomap model
isomap = Isomap(n_components=2, n_neighbors=10)

print("✅ Isomap model created (10 neighbors)")

In [None]:
# Fit and transform
print("🔄 Running Isomap...")
X_isomap = isomap.fit_transform(X_subset)

print("✅ Isomap complete!")

### 📊 Visualize Isomap Results

In [None]:
# Create DataFrame
df_isomap = pd.DataFrame({
    'Dim1': X_isomap[:, 0],
    'Dim2': X_isomap[:, 1],
    'Digit': y_subset.astype(str)
})

In [None]:
# Interactive Isomap plot
fig = px.scatter(
    df_isomap,
    x='Dim1',
    y='Dim2',
    color='Digit',
    title='Isomap: 64D → 2D (Manifold Learning)',
    labels={'Dim1': 'Isomap Dimension 1', 'Dim2': 'Isomap Dimension 2'},
    template='plotly_white',
    width=900,
    height=700,
    color_discrete_sequence=px.colors.qualitative.Set3
)

fig.update_traces(marker=dict(size=8, opacity=0.7))
fig.show()

💡 **Isomap characteristics:**
- Preserves global structure
- Curved paths respected
- Middle ground between PCA and t-SNE

---

## 6. Method 4: LLE (Locally Linear Embedding)

### 📖 Theory: Local Linear Structure

**Goal:** Preserve local linear relationships

**The Idea:**
```
Each point can be reconstructed from neighbors

Point X = w1×A + w2×B + w3×C
          (weighted sum of neighbors)

Find low-D embedding with same weights!
```

---

### 🎯 How LLE Works

**Step 1:** Find K nearest neighbors
```
For each point, identify local neighborhood
```

**Step 2:** Compute reconstruction weights
```
How to express point as weighted sum of neighbors?
```

**Step 3:** Find low-D embedding
```
Keep the same weights in lower dimensions
Solve eigenvalue problem
```
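A compact NumPy version of these three steps is sketched below. It is a dense, small-scale illustration rather than scikit-learn's `LocallyLinearEmbedding`; the function name `lle_sketch`, the regularization constant `reg`, and the half-circle test data are assumptions. Two neighbors per point are used because the example manifold is a one-dimensional curve.

```python
import numpy as np

def lle_sketch(X, n_neighbors=2, n_components=1, reg=1e-3):
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]   # Step 1 (skip the point itself)

    # Step 2: weights that best reconstruct each point from its neighbors
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]                     # neighbors relative to point i
        G = Z @ Z.T                                    # local Gram matrix
        G += reg * np.trace(G) * np.eye(n_neighbors)   # regularize: G can be near-singular
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, neighbors[i]] = w / w.sum()               # weights sum to 1

    # Step 3: bottom eigenvectors of M = (I - W)^T (I - W) give the embedding
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    eigvals, eigvecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    return eigvecs[:, 1:n_components + 1], W           # skip the constant eigenvector

t = np.linspace(0, np.pi, 100)
arc = np.column_stack([np.cos(t), np.sin(t)])          # half circle: a curved 1D manifold in 2D

Y, W = lle_sketch(arc)
corr = abs(np.corrcoef(Y[:, 0], t)[0, 1])
print(f"Weights per point sum to 1: {np.allclose(W.sum(axis=1), 1.0)}")
print(f"|correlation| between LLE coordinate and position on the arc: {corr:.3f}")
```

The regularization term is what the "instabilities" caveat refers to: without it, the local Gram matrix can be singular and the weights undefined.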

---

### 💪 LLE Advantages

✅ **Non-linear** - Handles curved manifolds

✅ **Preserves local geometry** - Neighborhood structure

✅ **Single optimization** - One eigenvalue problem, no gradient descent iterations

❌ **Sensitive to neighbors** - K choice critical

❌ **Can have instabilities** - Regularization needed

---

### 🎯 Practice: Apply LLE

In [None]:
# Create LLE model
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)

print("✅ LLE model created (10 neighbors)")

In [None]:
# Fit and transform
print("🔄 Running LLE...")
X_lle = lle.fit_transform(X_subset)

print("✅ LLE complete!")

### 📊 Visualize LLE Results

In [None]:
# Create DataFrame
df_lle = pd.DataFrame({
    'Dim1': X_lle[:, 0],
    'Dim2': X_lle[:, 1],
    'Digit': y_subset.astype(str)
})

In [None]:
# Interactive LLE plot
fig = px.scatter(
    df_lle,
    x='Dim1',
    y='Dim2',
    color='Digit',
    title='LLE: 64D → 2D (Locally Linear Embedding)',
    labels={'Dim1': 'LLE Dimension 1', 'Dim2': 'LLE Dimension 2'},
    template='plotly_white',
    width=900,
    height=700,
    color_discrete_sequence=px.colors.qualitative.Set3
)

fig.update_traces(marker=dict(size=8, opacity=0.7))
fig.show()

---
## 7. Compare All Methods

### 📊 Side-by-Side Visualization

In [None]:
# Create 2x2 subplot comparison
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=('PCA (Linear)', 't-SNE (Non-linear)', 
                    'Isomap (Manifold)', 'LLE (Local Linear)'),
    horizontal_spacing=0.1,
    vertical_spacing=0.15
)

# PCA
for digit in range(10):
    mask = df_pca['Digit'] == str(digit)
    fig.add_trace(
        go.Scatter(
            x=df_pca[mask]['PC1'],
            y=df_pca[mask]['PC2'],
            mode='markers',
            name=str(digit),
            marker=dict(size=5),
            showlegend=False
        ),
        row=1, col=1
    )

# t-SNE
for digit in range(10):
    mask = df_tsne['Digit'] == str(digit)
    fig.add_trace(
        go.Scatter(
            x=df_tsne[mask]['Dim1'],
            y=df_tsne[mask]['Dim2'],
            mode='markers',
            name=str(digit),
            marker=dict(size=5),
            showlegend=False
        ),
        row=1, col=2
    )

# Isomap
for digit in range(10):
    mask = df_isomap['Digit'] == str(digit)
    fig.add_trace(
        go.Scatter(
            x=df_isomap[mask]['Dim1'],
            y=df_isomap[mask]['Dim2'],
            mode='markers',
            name=str(digit),
            marker=dict(size=5),
            showlegend=False
        ),
        row=2, col=1
    )

# LLE
for digit in range(10):
    mask = df_lle['Digit'] == str(digit)
    fig.add_trace(
        go.Scatter(
            x=df_lle[mask]['Dim1'],
            y=df_lle[mask]['Dim2'],
            mode='markers',
            name=str(digit),
            marker=dict(size=5),
            showlegend=False
        ),
        row=2, col=2
    )

fig.update_layout(
    title_text="Dimensionality Reduction Methods Comparison",
    height=900,
    width=1200,
    template='plotly_white',
    showlegend=False
)

fig.show()

💡 **Notice the differences:**
- **PCA:** Linear, some overlap
- **t-SNE:** Best cluster separation
- **Isomap:** Preserves global structure
- **LLE:** Local linear relationships
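A visual comparison can be backed up with a number. One option, offered here as a suggestion rather than part of the original lesson, is scikit-learn's `trustworthiness` score, which measures how well each point's neighbors in the 2D embedding match its neighbors in the original space. The sketch uses a 500-sample slice and only PCA and Isomap to keep the runtime short; `X_demo`, `embeddings`, and `scores` are illustrative names.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, trustworthiness
from sklearn.preprocessing import StandardScaler

X_demo = StandardScaler().fit_transform(load_digits().data[:500])

embeddings = {
    "PCA": PCA(n_components=2, random_state=42).fit_transform(X_demo),
    "Isomap": Isomap(n_components=2, n_neighbors=10).fit_transform(X_demo),
}

# Scores lie in [0, 1]; higher means low-D neighbors were also neighbors in high-D
scores = {
    name: trustworthiness(X_demo, emb, n_neighbors=10)
    for name, emb in embeddings.items()
}
for name, score in scores.items():
    print(f"{name:<7} trustworthiness: {score:.3f}")
```

The same dictionary pattern extends to the t-SNE and LLE embeddings computed earlier, at the cost of a longer run.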

---

## 8. Method Comparison Table

### 📋 When to Use Each Method

In [None]:
# Create comparison DataFrame
comparison = pd.DataFrame({
    'Method': ['PCA', 't-SNE', 'Isomap', 'LLE'],
    'Type': ['Linear', 'Non-linear', 'Non-linear', 'Non-linear'],
    'Speed': ['Fast', 'Slow', 'Medium', 'Medium'],
    'Deterministic': ['Yes', 'No (unless seeded)', 'Yes', 'Yes'],
    'Preserves': ['Global variance', 'Local structure', 'Geodesic distance', 'Local linearity'],
    'Best For': ['Quick EDA', 'Visualization', 'Manifolds', 'Local structure']
})

comparison

---
## 9. Choosing the Right Method

### 🎯 Decision Guide

**Use PCA when:**
```
✅ Need interpretable components
✅ Want to remove correlation
✅ Have large dataset (scalable)
✅ Need to transform new data
✅ Linear relationships sufficient
✅ Want to know variance explained
```

**Use t-SNE when:**
```
✅ Creating visualizations only
✅ Want clear cluster separation
✅ Local structure matters most
✅ Have time for computation
✅ Don't need to transform new data
```

**Use Isomap when:**
```
✅ Data lies on curved manifold
✅ Global structure important
✅ Need deterministic results
✅ Have well-connected data
```

**Use LLE when:**
```
✅ Local geometry critical
✅ Data has local linear structure
✅ Need single optimization
```

---

### 💼 Real-World Workflow

**Step 1:** Start with PCA
```
- Fast baseline
- See variance explained
- Understand data structure
```

**Step 2:** If PCA is unsatisfactory → Try t-SNE
```
- Better cluster visualization
- Validate PCA results
```

**Step 3:** For manifold data → Try Isomap/LLE
```
- When data has known structure
- Swiss roll, S-curve, etc.
```

**Step 4:** Compare multiple methods
```
- Different methods reveal different aspects
- No single "best" method
```

---

## 10. Practical Applications

### 🌍 Where Is This Used?

**1. Data Visualization**
```
Problem: 1000-dimensional data
Solution: Reduce to 2D/3D for plotting
Example: Gene expression analysis
```

**2. Feature Engineering**
```
Problem: Too many correlated features
Solution: PCA to create independent components
Example: Image preprocessing
```

**3. Noise Reduction**
```
Problem: Noisy measurements
Solution: Keep only top principal components
Example: Signal processing
```

**4. Preprocessing for ML**
```
Problem: Curse of dimensionality
Solution: Reduce dimensions before training
Example: Text classification (TF-IDF → PCA, or TruncatedSVD for sparse matrices)
```

**5. Anomaly Detection**
```
Problem: Outliers in high dimensions
Solution: Project to low-D, find outliers
Example: Fraud detection
```

**6. Exploratory Data Analysis**
```
Problem: Understanding data structure
Solution: Visualize in 2D/3D
Example: Customer segmentation
```
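Two of the applications above, noise reduction and anomaly detection, can be shown on synthetic data with a few lines of NumPy. This is a sketch under stated assumptions: the data is generated to lie on a 2D plane inside 20 dimensions, and the anomaly score used here is reconstruction error (distance from the PCA subspace), which is one common variant of "project to low-D, find outliers" rather than the only one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 300 samples on a 2D plane embedded in 20D, plus measurement noise
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 20))
X_clean = latent @ mixing
X_noisy = X_clean + rng.normal(scale=0.3, size=X_clean.shape)

# PCA via SVD, keeping only the top-k directions
k = 2
mean = X_noisy.mean(axis=0)
_, _, Vt = np.linalg.svd(X_noisy - mean, full_matrices=False)
top = Vt[:k]
X_denoised = (X_noisy - mean) @ top.T @ top + mean     # project down, then back up

err_before = np.mean((X_noisy - X_clean) ** 2)
err_after = np.mean((X_denoised - X_clean) ** 2)
print(f"MSE vs clean signal, before: {err_before:.4f}  after: {err_after:.4f}")

# Anomaly score: how poorly the k components reconstruct each sample
outlier = rng.normal(scale=3.0, size=(1, 20))          # a point far off the plane
X_all = np.vstack([X_noisy, outlier])
centered = X_all - mean
residual = centered - centered @ top.T @ top
scores = np.linalg.norm(residual, axis=1)
print(f"Index with highest reconstruction error: {scores.argmax()} (outlier was appended at 300)")
```

Dropping the low-variance components removes most of the noise because the noise is spread across all 20 directions while the signal occupies only two.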

---

## 11. Key Takeaways

### ✅ What We Learned

**1. The Curse of Dimensionality:**
- High dimensions → Sparse data
- Distance metrics break down
- Need massive amounts of data
- Solution: Dimensionality reduction

**2. Four Key Methods:**

**PCA (Principal Component Analysis):**
- Linear transformation
- Finds directions of maximum variance
- Fast and interpretable
- Good for initial exploration

**t-SNE (t-Distributed Stochastic Neighbor Embedding):**
- Non-linear, preserves local structure
- Beautiful cluster visualizations
- Slow, non-deterministic
- Best for visualization only

**Isomap (Isometric Mapping):**
- Non-linear, preserves geodesic distances
- Good for manifold data
- Deterministic, moderate speed
- Preserves global structure

**LLE (Locally Linear Embedding):**
- Non-linear, preserves local linearity
- Single optimization step
- Good for locally linear manifolds

**3. Choosing Method:**
- Start with PCA (fast baseline)
- Use t-SNE for visualization
- Try Isomap/LLE for manifolds
- Compare multiple methods

**4. Practical Tips:**
- Always scale features first
- PCA for features, t-SNE for visualization
- No single "best" method
- Domain knowledge crucial

---

### 💡 Best Practices

**Always:**
- ✅ Scale features (StandardScaler)
- ✅ Try PCA first
- ✅ Visualize results
- ✅ Check explained variance (PCA)
- ✅ Tune hyperparameters

**Never:**
- ❌ Skip scaling
- ❌ Use t-SNE for feature engineering
- ❌ Trust one method only
- ❌ Ignore original features
- ❌ Over-interpret distances in t-SNE

**Remember:**
```
PCA = Fast exploration
t-SNE = Beautiful plots
Isomap/LLE = Manifold data

Different methods = Different insights!
```

---

**Congratulations! You can now visualize high-dimensional data!** 🎉

**Next Steps:**
- Apply to your own data
- Try different parameters
- Combine with clustering
- Explore UMAP (modern alternative)

---

**AI Tech Institute** | *Building Tomorrow's AI Engineers Today*