# LDA, t-SNE, and UMAP for Dimensionality Reduction and Visualization

## 📚 Learning Objectives

By completing this notebook, you will:
- Implement Linear Discriminant Analysis (LDA) for classification improvement
- Visualize high-dimensional data using t-SNE
- Apply UMAP for dimensionality reduction and visualization
- Compare LDA, t-SNE, and UMAP with PCA

## 🔗 Prerequisites

- ✅ Understanding of PCA and dimensionality reduction
- ✅ Python 3.8+ installed

---

## Official Structure Reference

This notebook covers practical activities from **Course 04, Unit 4**:
- Implementing LDA for improving classification performance
- Visualizing transformed data using tools like t-SNE and UMAP
- **Source:** `DETAILED_UNIT_DESCRIPTIONS.md` - Unit 4 Practical Content

---

## Introduction

- **LDA**: Supervised, linear dimensionality reduction that maximizes the separation between classes
- **t-SNE**: Non-linear technique for visualizing data in 2D/3D
- **UMAP**: Non-linear dimensionality reduction that preserves local structure and much of the global structure
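
To make "maximizes class separation" concrete, the standard Fisher criterion is sketched here. The symbols ($S_W$, $S_B$, $\mu_c$, $\mu$, $N_c$, $C$) are introduced for illustration and do not appear elsewhere in this notebook. LDA looks for a projection direction $w$ that makes the between-class scatter large relative to the within-class scatter:

$$
J(w) = \frac{w^\top S_B\, w}{w^\top S_W\, w},
\qquad
S_W = \sum_{c=1}^{C} \sum_{i:\, y_i = c} (x_i - \mu_c)(x_i - \mu_c)^\top,
\qquad
S_B = \sum_{c=1}^{C} N_c\, (\mu_c - \mu)(\mu_c - \mu)^\top
$$

Here $\mu_c$ is the mean of class $c$, $\mu$ is the overall mean, and $N_c$ is the number of samples in class $c$. The maximizing directions are the leading eigenvectors of $S_W^{-1} S_B$. Because $S_B$ has rank at most $C - 1$, LDA yields at most $C - 1$ useful components (two for the three-class Iris data).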


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification, load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Try importing t-SNE and UMAP
try:
    from sklearn.manifold import TSNE
    HAS_TSNE = True
except ImportError:
    HAS_TSNE = False
    print("⚠️  t-SNE not available (install scikit-learn)")

try:
    import umap
    HAS_UMAP = True
except ImportError:
    HAS_UMAP = False
    print("⚠️  UMAP not available (install with: pip install umap-learn)")

print("✅ Libraries imported!")


## Part 1: Linear Discriminant Analysis (LDA)


In [None]:
# Load Iris dataset for LDA demonstration
iris = load_iris()
X, y = iris.data, iris.target

# Compare: PCA vs LDA for classification
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# PCA (unsupervised): fit on the training split only, then transform the test split
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

# LDA (supervised - uses class labels)
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.transform(X_test)

# Train classifiers on reduced dimensions
rf_pca = RandomForestClassifier(random_state=42)
rf_pca.fit(X_train_pca, y_train)
pca_acc = accuracy_score(y_test, rf_pca.predict(X_test_pca))

rf_lda = RandomForestClassifier(random_state=42)
rf_lda.fit(X_train_lda, y_train)
lda_acc = accuracy_score(y_test, rf_lda.predict(X_test_lda))

print("=" * 60)
print("LDA vs PCA for Classification:")
print("=" * 60)
print(f"PCA Accuracy: {pca_acc:.4f}")
print(f"LDA Accuracy: {lda_acc:.4f}")
print(f"LDA Improvement: {lda_acc - pca_acc:.4f}")
print("\nNote: LDA maximizes class separation, often better for classification")
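
The choice of `n_components=2` in the cell above is not arbitrary: scikit-learn's LDA accepts at most `min(n_features, n_classes - 1)` components. The following is a minimal sketch of that limit on Iris. The names `max_components` and `too_many_ok` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# LDA can produce at most min(n_features, n_classes - 1) components
n_classes = len(np.unique(y))
max_components = min(X.shape[1], n_classes - 1)

lda = LinearDiscriminantAnalysis(n_components=max_components)
X_lda = lda.fit_transform(X, y)

print(f"Max LDA components: {max_components}")
print(f"Projected shape: {X_lda.shape}")
# Share of between-class variance captured by each discriminant axis
print(f"Explained variance ratio: {lda.explained_variance_ratio_}")

# Asking for more components than the limit is rejected at fit time
try:
    LinearDiscriminantAnalysis(n_components=3).fit(X, y)
    too_many_ok = True
except ValueError as exc:
    too_many_ok = False
    print(f"n_components=3 rejected: {exc}")
```

If the first ratio is close to 1, a single discriminant axis already carries most of the class separation, and a 1D projection may be enough for the classifier.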


## Part 2: t-SNE for Visualization


In [None]:
if HAS_TSNE:
    # t-SNE for visualization
    # Note: t-SNE is computationally expensive, use on smaller datasets
    tsne = TSNE(n_components=2, random_state=42, perplexity=30)
    X_tsne = tsne.fit_transform(X)
    print("=" * 60)
    print("t-SNE Visualization:")
    print("=" * 60)
    print(f"Original shape: {X.shape}")
    print(f"t-SNE shape: {X_tsne.shape}")
    print("✅ t-SNE preserves local structure (nearby points stay nearby)")
else:
    print("Note: Install scikit-learn for t-SNE functionality")
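
The cell above computes the embedding but only prints its shape. One way to actually view it is a scatter plot colored by class, sketched below. The figure size, colormap, and axis labels are illustrative choices, not taken from the course material.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

iris = load_iris()
X, y = iris.data, iris.target

# perplexity should be smaller than the number of samples (150 here)
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X)

# Scatter plot of the 2D embedding, one color per species
fig, ax = plt.subplots(figsize=(6, 5))
scatter = ax.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y, cmap="viridis", s=25)

# Build one legend entry per class from the scatter's color mapping
handles, _ = scatter.legend_elements()
ax.legend(handles, list(iris.target_names), title="Species")

ax.set_xlabel("t-SNE dimension 1")
ax.set_ylabel("t-SNE dimension 2")
ax.set_title("Iris embedded with t-SNE (perplexity=30)")
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a script, call `plt.show()` or `fig.savefig(...)`. When reading the plot, treat only neighborhood relationships as meaningful: the axes have no units, and the distances between clusters change with `perplexity` and the random seed.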


## Part 3: UMAP for Dimensionality Reduction


In [None]:
if HAS_UMAP:
    # UMAP: typically faster than t-SNE; preserves local structure and much of the global structure
    reducer = umap.UMAP(n_components=2, random_state=42)
    X_umap = reducer.fit_transform(X)
    print("=" * 60)
    print("UMAP Dimensionality Reduction:")
    print("=" * 60)
    print(f"Original shape: {X.shape}")
    print(f"UMAP shape: {X_umap.shape}")
    print("✅ UMAP preserves local structure and much of the global structure")
    print("✅ Typically faster than t-SNE, scales better to large datasets")
else:
    print("Note: Install umap-learn for UMAP: pip install umap-learn")
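
UMAP's balance between local and global structure is governed mainly by `n_neighbors` (and, for how tightly points pack, `min_dist`). The sketch below compares a few `n_neighbors` settings. The values tried and the standardization step are illustrative assumptions, and the block skips itself if `umap-learn` is not installed.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

try:
    import umap
    HAS_UMAP = True
except ImportError:
    HAS_UMAP = False

X, y = load_iris(return_X_y=True)

# Standardize features so no single measurement dominates the distances
X_scaled = StandardScaler().fit_transform(X)

embeddings = {}
if HAS_UMAP:
    # Small n_neighbors emphasizes local detail; larger values emphasize overall layout
    for n_neighbors in (5, 15, 50):
        reducer = umap.UMAP(
            n_components=2,
            n_neighbors=n_neighbors,
            min_dist=0.1,
            random_state=42,
        )
        embeddings[n_neighbors] = reducer.fit_transform(X_scaled)
        print(f"n_neighbors={n_neighbors}: embedding shape {embeddings[n_neighbors].shape}")
else:
    print("umap-learn not installed; skipping the n_neighbors comparison")
```

Plotting each entry of `embeddings` side by side is a quick way to judge which setting suits a given dataset.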


## Summary

### Key Concepts:
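1. **LDA**: Supervised, linear dimensionality reduction that maximizes class separation; yields at most (number of classes − 1) components and is often a better fit for classification than PCA
2. **t-SNE**: Non-linear visualization technique that preserves local structure (good for exploration); distances between clusters in the plot are not reliable
3. **UMAP**: Non-linear technique that preserves local structure and much of the global structure; typically faster than t-SNE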

### When to Use:
- **LDA**: When you have class labels and want better classification
- **t-SNE**: For visualization of high-dimensional data (small datasets)
- **UMAP**: For visualization and dimensionality reduction (scales better)
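
Whether LDA really gives "better classification" than PCA on a given dataset is easier to judge with cross-validation than with a single 30-sample test split. The following is a minimal sketch, assuming 5-fold stratified cross-validation and the same random forest as above. Wrapping the reducer and classifier in a pipeline refits the projection inside each fold, so no test-fold information leaks into it.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

reducers = {
    "PCA": PCA(n_components=2),
    "LDA": LinearDiscriminantAnalysis(n_components=2),
}

scores = {}
for name, reducer in reducers.items():
    # The pipeline fits the reducer on each training fold only
    model = make_pipeline(reducer, RandomForestClassifier(random_state=42))
    scores[name] = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: {scores[name].mean():.3f} +/- {scores[name].std():.3f}")
```

On a small, well-separated dataset such as Iris the two scores may be close; the gap tends to matter more when the directions of highest variance are not the directions that separate the classes.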

**Reference:** Course 04, Unit 4: "Implementing LDA" and "Visualizing transformed data using t-SNE and UMAP"
