# Lesson 15 - PCA via SVD and Variance Explained


## Objectives
- Compute PCA using SVD.
- Visualize principal components.
- Measure variance explained by top components.


## From the notes

**PCA**
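- Find orthonormal directions that successively maximize the variance of the projected data.
- The SVD of the centered data matrix yields the principal directions (the right singular vectors, i.e. the rows of `Vt`); the squared singular values are proportional to the variance along each direction.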
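
As a quick sanity check of the two points above, the sketch below verifies on synthetic data that the rows of `Vt` are orthonormal and that no randomly chosen unit direction has more projected variance than the first one. The dataset, the seed, and the names `Xc`, `var_pc1`, and `max_random_var` are assumptions made for this illustration only.

```python
import numpy as np

# Illustrative data (assumed): 500 samples from a correlated 2D Gaussian.
rng = np.random.default_rng(0)
cov = np.array([[2.0, 1.2], [1.2, 0.8]])
X = rng.multivariate_normal(np.zeros(2), cov, size=500)
Xc = X - X.mean(axis=0)  # center each feature

_, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal directions; Vt @ Vt.T should be the identity.
is_orthonormal = np.allclose(Vt @ Vt.T, np.eye(2))

# Sample variance of the data projected onto the first direction.
var_pc1 = np.var(Xc @ Vt[0], ddof=1)

# Projected variance along 1000 random unit directions, for comparison.
angles = rng.uniform(0.0, np.pi, size=1000)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])
max_random_var = np.var(Xc @ dirs.T, axis=0, ddof=1).max()

print("orthonormal:", is_orthonormal)
print("variance along PC1:", var_pc1)
print("best random direction:", max_random_var)
```

The variance along the first direction also equals `S[0]**2 / (n - 1)`, which is why the squared singular values can stand in for variances.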

_TODO: Validate PCA derivation in the CS229 main notes PDF._


## Intuition
PCA rotates the centered data into a new coordinate system whose axes are ordered by variance. Keeping only the first k axes gives the k-dimensional linear projection with the smallest squared reconstruction error.
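
One way to see the rotation is to express the data in the new coordinates and inspect its covariance. The following is a minimal sketch on assumed synthetic data (the names `Z`, `cov_Z`, and `variances` are introduced here for illustration): in the rotated frame the features are uncorrelated, and the variances appear in decreasing order.

```python
import numpy as np

# Illustrative data (assumed): 400 samples from a correlated 2D Gaussian.
rng = np.random.default_rng(1)
cov = np.array([[2.0, 1.2], [1.2, 0.8]])
X = rng.multivariate_normal(np.zeros(2), cov, size=400)
Xc = X - X.mean(axis=0)

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Coordinates of each sample in the principal-axis frame (the "scores").
Z = Xc @ Vt.T

# Covariance in the rotated frame: off-diagonal entries vanish.
cov_Z = np.cov(Z, rowvar=False)

# Variance along each principal axis, from the singular values.
variances = S**2 / (len(Xc) - 1)

print(np.round(cov_Z, 6))
print(variances)
```

Because the change of basis is orthonormal, the total variance (the trace of the covariance matrix) is the same before and after the rotation; PCA only redistributes it across axes.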


## Data
We draw 200 samples from a zero-mean 2D Gaussian with correlated features, center them, and compute the SVD.


In [None]:
import numpy as np
import matplotlib.pyplot as plt

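np.random.seed(42)  # fixed seed for reproducibility

# 200 samples from a zero-mean Gaussian with positively correlated features
mean = np.array([0, 0])
cov = np.array([[2.0, 1.2], [1.2, 0.8]])
X = np.random.multivariate_normal(mean, cov, 200)
# PCA assumes zero-mean features, so subtract the sample mean
X_centered = X - X.mean(axis=0)

# Thin SVD: X_centered = U @ np.diag(S) @ Vt
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
components = Vt  # rows are the principal directions (unit vectors)
# Fraction of the total variance captured by each component
explained = (S**2) / np.sum(S**2)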


## Experiments


In [None]:
explained


## Visualizations


In [None]:
plt.figure(figsize=(6,4))
plt.scatter(X_centered[:,0], X_centered[:,1], alpha=0.6)
# Draw each principal direction from the origin, scaled by 3 for visibility
for comp in components[:2]:
    plt.arrow(0, 0, comp[0]*3, comp[1]*3, color="red", head_width=0.1)
plt.title("PCA directions")
plt.xlabel("x1")
plt.ylabel("x2")
plt.axis("equal")
plt.show()

plt.figure(figsize=(6,4))
plt.bar([1,2], explained[:2])
plt.title("Variance explained")
plt.xlabel("component")
plt.ylabel("fraction")
plt.show()


## Takeaways
- PCA finds orthogonal directions of maximum variance.
- Variance explained helps decide how many components to keep.
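
A common way to act on the second takeaway is to keep the smallest number of components whose cumulative variance explained reaches a chosen threshold. The sketch below assumes a hypothetical 5-feature dataset driven by two latent factors and a 95% threshold; both are illustrative choices, not values from the lesson.

```python
import numpy as np

# Hypothetical data: 5 observed features generated from 2 latent factors
# plus a small amount of independent noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(300, 2)) * np.array([3.0, 1.5])
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(300, 5))
Xc = X - X.mean(axis=0)

# Only the singular values are needed for variance explained.
S = np.linalg.svd(Xc, compute_uv=False)
explained = S**2 / np.sum(S**2)
cumulative = np.cumsum(explained)

# Smallest k whose cumulative variance explained reaches the threshold.
threshold = 0.95  # assumed cutoff; pick one that suits the application
k = int(np.searchsorted(cumulative, threshold) + 1)

print("explained:", np.round(explained, 4))
print("cumulative:", np.round(cumulative, 4))
print("components to keep:", k)
```

The threshold is a judgment call: a scree plot of `explained` or downstream task performance can be used alongside it.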


## Explain it in an interview
- Describe how PCA relates to the covariance matrix.
- Explain why centering the data matters.
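
A short derivation that may help with both prompts, using symbols assumed here: $X_c$ is the centered $n \times d$ data matrix, $X_c = U S V^\top$ is its thin SVD, and $C$ is the sample covariance matrix. Since $U^\top U = I$,

$$
C = \frac{1}{n-1} X_c^\top X_c
  = \frac{1}{n-1} V S U^\top U S V^\top
  = V \, \frac{S^2}{n-1} \, V^\top ,
$$

so the columns of $V$ are eigenvectors of $C$ with eigenvalues $\lambda_i = s_i^2 / (n-1)$, and the fraction of variance explained is $s_i^2 / \sum_j s_j^2 = \lambda_i / \sum_j \lambda_j$.

For centering, note that for raw samples $x_i$ with mean $\bar{x}$,

$$
\sum_{i=1}^{n} x_i x_i^\top
  = \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top + n \, \bar{x} \bar{x}^\top .
$$

Without centering, the SVD therefore decomposes the covariance plus a rank-one term from the mean, and when the mean is large the leading direction tends to point toward $\bar{x}$ rather than along the direction of greatest spread.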


## Exercises
- Reconstruct the data using only the first principal component.
- Compare PCA via eigen-decomposition vs SVD.
- Try PCA on non-centered data and observe changes.
