# Day 32 — "Principal Component Analysis (PCA): Learning the Most Informative Directions"

PCA finds the orthogonal directions along which the data varies most. Dimensionality reduction follows as a consequence: keep the leading directions and drop the rest.


In [1]:
# Ensure repo root is on sys.path for local imports
import sys
from pathlib import Path

repo_root = Path.cwd()
if not (repo_root / "days").exists():
    for parent in Path.cwd().resolve().parents:
        if (parent / "days").exists():
            repo_root = parent
            break

sys.path.insert(0, str(repo_root))
print(f"Using repo root: {repo_root}")


Using repo root: /media/abdul-aziz/sdb7/masters_research/math_course_dlcv


## 1. Core Intuition

PCA chooses directions that maximize projected variance. The first component captures the dominant trend; each later component captures the largest remaining variance among directions orthogonal to the earlier ones.
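The maximum-variance claim can be checked numerically. The sketch below builds correlated 2-D data (a synthetic assumption, not the course's dataset), takes the top eigenvector of the sample covariance, and compares its projected variance against many random unit directions. The helper `projected_variance` is a name introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated 2-D data: the second feature depends on the first.
x0 = rng.normal(0, 1, size=500)
X = np.column_stack([x0, 0.7 * x0 + 0.3 * rng.normal(0, 1, size=500)])
Xc = X - X.mean(axis=0)  # center each feature

# Eigen-decomposition of the sample covariance (eigh returns ascending eigenvalues).
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
v1 = eigvecs[:, -1]  # unit direction with the largest eigenvalue


def projected_variance(v):
    """Sample variance of the centered data projected onto unit vector v."""
    return np.var(Xc @ v, ddof=1)


# Many random unit directions in the plane for comparison.
angles = rng.uniform(0, np.pi, size=1000)
random_dirs = np.column_stack([np.cos(angles), np.sin(angles)])
random_vars = np.array([projected_variance(v) for v in random_dirs])

print("variance along first component:", projected_variance(v1))
print("best random direction:         ", random_vars.max())
print("largest eigenvalue:            ", eigvals[-1])
```

No random direction should beat the first component, and the variance along it should match the largest eigenvalue up to floating-point error.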


## 2. Mathematical Goal

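Given centered data $X \in \mathbb{R}^{n \times d}$ (every column has zero mean), PCA solves:

$$\max_{\|v\| = 1} \operatorname{Var}(Xv)$$

which becomes an eigenvalue problem for the covariance matrix: the maximizer is the eigenvector with the largest eigenvalue.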
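The step from the variance objective to an eigenvalue problem can be spelled out. This is a short derivation, assuming the sample covariance uses the $1/(n-1)$ normalization (the text does not fix this choice). The symbols $C$ (covariance matrix) and $\lambda$ (Lagrange multiplier) are introduced here.

$$
\begin{aligned}
C &= \frac{1}{n-1} X^\top X, \qquad \operatorname{Var}(Xv) = \frac{1}{n-1}\|Xv\|^2 = v^\top C v \\
\mathcal{L}(v, \lambda) &= v^\top C v - \lambda\,(v^\top v - 1) \\
\nabla_v \mathcal{L} &= 2Cv - 2\lambda v = 0 \;\Longrightarrow\; Cv = \lambda v \\
\operatorname{Var}(Xv) &= v^\top C v = \lambda
\end{aligned}
$$

Every stationary point is therefore a unit eigenvector of $C$, the variance it attains equals its eigenvalue, and the maximizer is the eigenvector with the largest eigenvalue.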


## 3. PCA and SVD

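If the centered data has the SVD $X = U \Sigma V^\top$, then $X^\top X = V \Sigma^2 V^\top$. The PCA directions are the columns of $V$, and the covariance eigenvalues are $\sigma_i^2 / (n - 1)$, assuming the usual $1/(n-1)$ covariance normalization.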
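The two routes can be compared directly in NumPy. This is a minimal sketch on synthetic data, assuming the $1/(n-1)$ covariance normalization. Eigenvectors are only defined up to sign, so the directions are compared through absolute dot products.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 3-D data with unequal spread along mixed directions.
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.1, 0.2, 0.3]])
X = rng.normal(size=(200, 3)) @ mixing
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Route 1: eigen-decomposition of the covariance matrix, sorted descending.
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: thin SVD of the centered data (singular values come sorted descending).
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vals = S**2 / (n - 1)

# Eigenvalues should match the squared singular values divided by (n - 1).
print("eigenvalues match:", np.allclose(eigvals, svd_vals))

# Matching directions give |dot product| = 1 on the diagonal and 0 elsewhere.
overlap = np.abs(eigvecs.T @ Vt.T)
print("directions match up to sign:", np.allclose(overlap, np.eye(3), atol=1e-6))
```

In practice the SVD route is often preferred because it avoids forming $X^\top X$ explicitly, which squares the condition number of the problem.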


## 4. Python — PCA from Scratch

`days/day32/code/pca_demo.py` computes PCA and explained variance from correlated data.


In [2]:
from days.day32.code.pca_demo import pca_fit, explained_variance_ratio
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(300, 2))
X[:, 1] = 0.7 * X[:, 0] + 0.3 * rng.normal(0, 1, size=300)

mean, eigvals, eigvecs = pca_fit(X)
print("Eigenvalues:", eigvals)
print("Explained variance ratio:", explained_variance_ratio(eigvals))


Eigenvalues: [1.54055372 0.05073284]
Explained variance ratio: [0.96811835 0.03188165]
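The source of `pca_demo.py` is not shown in this notebook. The following is a hypothetical sketch of what functions with the same call shape could look like, not the course's actual implementation. The names `pca_fit_sketch` and `explained_variance_ratio_sketch` are invented here, and the $1/(n-1)$ normalization and descending eigenvalue order are assumptions.

```python
import numpy as np


def pca_fit_sketch(X):
    """Hypothetical stand-in for pca_fit: returns (mean, eigvals, eigvecs).

    Eigenvalues are sorted in descending order; eigvecs[:, i] is the
    unit direction paired with eigvals[i].
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                 # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)            # assumed 1/(n-1) normalization
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]             # reorder to descending
    return mean, eigvals[order], eigvecs[:, order]


def explained_variance_ratio_sketch(eigvals):
    """Fraction of total variance carried by each component."""
    eigvals = np.asarray(eigvals, dtype=float)
    return eigvals / eigvals.sum()


# Same data recipe as the cell above.
rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(300, 2))
X[:, 1] = 0.7 * X[:, 0] + 0.3 * rng.normal(0, 1, size=300)

mean, eigvals, eigvecs = pca_fit_sketch(X)
ratio = explained_variance_ratio_sketch(eigvals)
print("Eigenvalues:", eigvals)
print("Explained variance ratio:", ratio)
```

Whether this reproduces the printed numbers exactly depends on choices inside the real module (for example the covariance normalization), so treat any agreement as a consistency check rather than a guarantee.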


## 5. Visualization — PCA Directions & Reconstruction

`days/day32/code/visualizations.py` renders PCA axes, reconstruction, and explained variance plots.


In [3]:
from days.day32.code.visualizations import (
    plot_pca_directions,
    plot_reconstruction,
    plot_explained_variance,
)

RUN_FIGURES = False

if RUN_FIGURES:
    plot_pca_directions()
    plot_reconstruction()
    plot_explained_variance()
else:
    print("Set RUN_FIGURES = True to regenerate Day 32 figures inside days/day32/outputs/.")


Set RUN_FIGURES = True to regenerate Day 32 figures inside days/day32/outputs/.


## 6. Projection & Reconstruction

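Project onto the top $k$ components and reconstruct:

$$Z = X V_k, \qquad \hat{X} = Z V_k^\top$$

Here $X$ is centered and $V_k$ holds the first $k$ columns of $V$; add the mean back to $\hat{X}$ to return to the original coordinates. When most of the variance sits in the leading components, a small $k$ keeps the dominant structure and discards the low-variance directions, which are often dominated by noise.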
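A short sketch of projection and reconstruction, using synthetic 5-D data that lies near a 2-D subspace (an assumption chosen to make the effect visible). The helper `reconstruct` is a name introduced here. The total squared reconstruction error for a given k should equal the sum of the discarded squared singular values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 2-D latent signal embedded in 5-D, plus small isotropic noise.
latent = rng.normal(size=(400, 2)) * np.array([3.0, 1.5])
basis = np.linalg.qr(rng.normal(size=(5, 2)))[0]  # 5x2 matrix with orthonormal columns
X = latent @ basis.T + 0.1 * rng.normal(size=(400, 5))

mean = X.mean(axis=0)
Xc = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)


def reconstruct(k):
    """Project centered data onto the top-k directions, then map back."""
    V_k = Vt[:k].T          # (d, k): first k principal directions as columns
    Z = Xc @ V_k            # (n, k): coordinates in the principal subspace
    return Z @ V_k.T + mean  # (n, d): back in the original coordinates


# Total squared reconstruction error for k = 1..5.
errors = [np.sum((X - reconstruct(k)) ** 2) for k in range(1, 6)]
for k, err in enumerate(errors, start=1):
    print(f"k={k}: total squared error = {err:.4f}, "
          f"discarded sigma^2 = {np.sum(S[k:] ** 2):.4f}")
```

With this setup the error should drop sharply once k reaches 2, the dimension of the planted signal, and shrink only slightly afterwards as the remaining components mostly fit noise.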


## 7. Why PCA Matters in DL

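- Reveals low-rank structure in embeddings: a few components often explain most of the variance.
- Exposes correlated input directions, which make the loss surface ill-conditioned and slow gradient-based optimization.
- Explains why whitening or normalization helps: rescaling along the principal directions evens out the variance.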
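The whitening point can be illustrated with a small sketch. It rotates centered data onto the principal axes and rescales each axis to unit variance (PCA whitening), then compares the condition number of the covariance before and after. The data is synthetic, and `eps` is a small stabilizer added here as an assumption to avoid dividing by a near-zero eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)

# Strongly correlated 2-D inputs (synthetic).
x0 = rng.normal(size=1000)
X = np.column_stack([x0, 0.7 * x0 + 0.3 * rng.normal(size=1000)])
Xc = X - X.mean(axis=0)

cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# PCA whitening: rotate onto principal axes, then divide by each axis's std.
eps = 1e-8  # stabilizer (assumption); matters only when an eigenvalue is near zero
X_white = (Xc @ eigvecs) / np.sqrt(eigvals + eps)
cov_white = np.cov(X_white, rowvar=False)

print("condition number before whitening:", np.linalg.cond(cov))
print("condition number after whitening: ", np.linalg.cond(cov_white))
print("whitened covariance:\n", np.round(cov_white, 4))
```

For a linear least-squares model the Hessian is proportional to the input covariance, so a condition number near 1 means gradient descent sees roughly equal curvature in every direction. Deep networks are more complicated, but this is the usual intuition behind input normalization.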


## 8. Mini Exercises

1. Plot explained variance ratio and find the elbow.
2. Add noise and see which components absorb it.
3. Compare PCA vs random projection for classification.
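As a possible starting point for exercise 1, the sketch below computes the cumulative explained variance and the smallest k that reaches a chosen threshold. A threshold rule is a swapped-in, simpler alternative to locating an elbow by eye, not the same criterion. The function name `components_for_threshold` is invented here; the eigenvalues are the ones printed in section 4.

```python
import numpy as np


def components_for_threshold(eigvals, threshold=0.95):
    """Return (k, cumulative): smallest k whose cumulative ratio >= threshold."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First index where the cumulative ratio reaches the threshold, as a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(eigvals)), cumulative  # clip in case of rounding at 1.0


# Eigenvalues printed by the demo cell in section 4.
demo_eigvals = [1.54055372, 0.05073284]

k95, cumulative = components_for_threshold(demo_eigvals, threshold=0.95)
k99, _ = components_for_threshold(demo_eigvals, threshold=0.99)
print("cumulative explained variance:", cumulative)
print("components for 95%:", k95)
print("components for 99%:", k99)
```

Plotting `cumulative` against the component index gives the curve the exercise asks about; with only two components there is little of an elbow to find, so the exercise is more informative on higher-dimensional data.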


## 9. Key Takeaways

- PCA finds maximum-variance directions.
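- PCA is equivalent to an SVD of the centered data matrix.
- Large components usually carry signal; small components often capture noise, though low variance does not guarantee a direction is unimportant.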
