# 🧠 Session 6: Dimensionality Reduction & Wrap-Up

## 🕒 00:00–00:15 — PCA Explained
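**Principal Component Analysis (PCA)** is a method for reducing the dimensionality of a dataset while preserving as much of its variance as possible. It does this by projecting the data onto a small number of new, uncorrelated axes (the principal components).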

**Why use PCA?**
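- To reduce noise, which often sits in the low-variance directions that PCA discards
- To visualize high-dimensional data in 2D or 3D
- To speed up ML models and remove redundancy among correlated features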

**Steps in PCA:**
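1. Standardize the data (zero mean, unit variance per feature)
2. Compute the covariance matrix
3. Compute its eigenvectors and eigenvalues
4. Select the top-k eigenvectors (those with the largest eigenvalues) as principal components
5. Project the data onto those components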
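
The steps above, ending with the projection onto the selected components, can be sketched by hand in NumPy. This is a minimal illustration rather than how scikit-learn implements PCA internally; the variable names (`X_np`, `X_manual`, `explained_ratio`) are chosen for this example only.

```python
import numpy as np
from sklearn.datasets import load_iris

X_np = load_iris().data  # plain NumPy array, shape (150, 4)

# 1. Standardize: zero mean and unit variance for every feature
X_std = (X_np - X_np.mean(axis=0)) / X_np.std(axis=0)

# 2. Covariance matrix of the features (columns are variables)
cov = np.cov(X_std, rowvar=False)

# 3. Eigen-decomposition; eigh suits symmetric matrices and
#    returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort by descending eigenvalue and keep the top-k eigenvectors
k = 2
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:k]]

# Project the standardized data onto the selected components
X_manual = X_std @ components

# Share of total variance captured by each component
explained_ratio = eigvals[order] / eigvals.sum()
print(X_manual.shape)
print(explained_ratio[:k])
```

The projection may differ from scikit-learn's output by a sign flip per component, since the direction of an eigenvector is arbitrary.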

In [None]:
# Load the Iris features as a DataFrame (the species labels in iris.target are not used here)
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
X = iris.data
X.head()

In [None]:
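# Standardize, then apply PCA to reduce to 2 dimensions
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# PCA is scale-sensitive, so give every feature zero mean and unit variance first
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)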
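
The explained variance ratio can also guide the choice of how many components to keep. The sketch below fits PCA with all components and finds the smallest k whose cumulative ratio reaches a threshold. The 95% threshold and the names `pca_full`, `cumulative`, and `k_needed` are assumptions made for illustration, not values taken from this session.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the features, then fit PCA keeping every component
X_demo = StandardScaler().fit_transform(load_iris().data)
pca_full = PCA().fit(X_demo)

# Running total of the variance explained by the first 1, 2, ... components
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

threshold = 0.95  # assumed target; pick whatever suits the task
# Smallest number of components whose cumulative ratio reaches the threshold
k_needed = int(np.searchsorted(cumulative, threshold) + 1)

print("Cumulative explained variance:", cumulative)
print("Components needed:", k_needed)
```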

In [None]:
# Plot PCA result
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1])
plt.title('PCA Projection of Iris Data')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.grid(True)
plt.show()

## 🕒 00:15–00:25 — Anomaly Detection Basics
**Anomaly Detection** is the process of identifying data points that deviate significantly from the majority.

**When is it useful?**
- Fraud detection
- Server monitoring
- Defect detection

Common algorithms:
- Isolation Forest
- Local Outlier Factor (LOF)
- One-Class SVM
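
As a quick taste of one of the listed algorithms, here is a minimal sketch of Local Outlier Factor on the Iris features. The settings `n_neighbors=20` and `contamination=0.05` are illustrative assumptions, not tuned values.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import LocalOutlierFactor

X_lof = load_iris().data

# LOF compares each point's local density with that of its neighbors;
# points in much sparser regions than their neighbors are flagged
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
lof_labels = lof.fit_predict(X_lof)  # -1 = outlier, 1 = inlier

n_outliers = int((lof_labels == -1).sum())
print("Points flagged as outliers:", n_outliers)
```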

In [None]:
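# Optional: Try Isolation Forest on Iris data
from sklearn.ensemble import IsolationForest

# contamination=0.05 assumes roughly 5% of the samples are anomalous
clf = IsolationForest(contamination=0.05, random_state=42)
outliers = clf.fit_predict(X)  # -1 = anomaly, 1 = normal

# Visualize the predicted labels on the PCA projection
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=outliers, cmap='coolwarm')
plt.title('Anomaly Detection using Isolation Forest')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.grid(True)
plt.show()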
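
`fit_predict` only returns a label per point. When a ranking is more useful, `score_samples` gives a continuous score. The sketch below is self-contained and uses names (`forest`, `scores`, `most_anomalous`) chosen for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import IsolationForest

X_if = load_iris().data

forest = IsolationForest(contamination=0.05, random_state=42).fit(X_if)

# score_samples returns one value per row; lower means more anomalous
scores = forest.score_samples(X_if)
labels = forest.predict(X_if)  # -1 = anomaly, 1 = normal

# Row indices of the five most anomalous samples
most_anomalous = np.argsort(scores)[:5]
print("Most anomalous rows:", most_anomalous)
print("Their scores:", scores[most_anomalous])
```

Sorting by score makes it possible to review the strangest points first instead of relying on a fixed `contamination` cutoff.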

## 🕒 00:25–00:40 — Hands-On: PCA + Clustering (Optional)

In [None]:
# Apply KMeans after PCA (just for fun)
from sklearn.cluster import KMeans

# Setting n_init explicitly keeps results consistent across scikit-learn versions
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
clusters = kmeans.fit_predict(X_pca)

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=clusters, cmap='Set1')
plt.title('Clustering on PCA-Reduced Iris Data')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.grid(True)
plt.show()
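
One way to check whether reducing dimensions helped is to compare the clusters with the known species labels. The sketch below uses the adjusted Rand index (1.0 means perfect agreement, values near 0 mean chance-level agreement) on both the raw features and a 2D PCA projection. The helper `cluster_ari` is a hypothetical name introduced here, and the Iris labels are used only for evaluation, not for fitting.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

data = load_iris()
X_raw, y_true = data.data, data.target

# Standardize, then project onto the first two principal components
X_2d = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X_raw))


def cluster_ari(features):
    """Cluster with KMeans and score agreement with the true species."""
    labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(features)
    return adjusted_rand_score(y_true, labels)


ari_raw = cluster_ari(X_raw)
ari_pca = cluster_ari(X_2d)
print("ARI on raw features:", round(ari_raw, 3))
print("ARI on PCA-reduced features:", round(ari_pca, 3))
```

Neither version is guaranteed to score higher. On a small, low-dimensional dataset like Iris the reduction mainly helps with plotting, while the benefit for clustering tends to show up on data with many noisy or correlated features.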

## 🕒 00:40–00:45 — Recap & Exit Ticket
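- ✅ PCA helps visualize and simplify data by keeping the directions with the most variance
- ✅ Anomalies = rare data points that deviate from the majority and often matter most
- ✅ Combining techniques (e.g. PCA before clustering or anomaly detection) can reveal structure that a single method may miss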

**Reflection Prompt:**
What concept today was most surprising or useful to you?

**Mini Quiz:**
1. What does PCA aim to preserve?
2. Name one use case of anomaly detection.
3. Why reduce dimensions before clustering?