# HW4 — PCA (from scratch) + Collaborative Representation Classification (CRC) on LFW

**Starter Notebook** — Fill in the TODO blocks. Do not use `sklearn.decomposition.PCA`.

## What you will implement
1. Load LFW and apply a stratified **60/15/25** train/val/test split (both provided).
2. PCA **from scratch** via SVD (TODO):
   - Fit on **train only**.
   - Choose **k** via a target variance ratio.
   - Project train/val/test consistently.
3. CRC (ridge-regularized collaborative representation) (TODO):
   - Precompute projection matrix $P=(D^T D + \lambda I)^{-1} D^T$.
   - Predict using per-class reconstruction residuals.
4. Experiments:
   - CRC without PCA: tune $\lambda$ on **val**, then retrain on **train+val**, test once.
   - CRC **with PCA**: tune (variance ratio, $\lambda$) on **val**, retrain on **train+val**, test once.
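
As a rough illustration of step 3, the sketch below precomputes $P=(D^T D + \lambda I)^{-1} D^T$ and classifies by the smallest per-class reconstruction residual. It assumes the dictionary `D` stores training samples as columns (that is, `D = X_train.T`). The names `crc_fit`, `crc_predict`, and the toy data are hypothetical, not part of the starter code. It uses the plain residual; some CRC formulations also divide by the norm of the class coefficients, which is not done here.

```python
import numpy as np

def crc_fit(D, lam):
    """D: (d, n) dictionary, one training sample per column. Returns P: (n, d)."""
    n = D.shape[1]
    # Solve (D^T D + lam*I) P = D^T instead of forming the inverse explicitly.
    return np.linalg.solve(D.T @ D + lam * np.eye(n), D.T)

def crc_predict(P, D, labels, X):
    """X: (m, d) queries as rows. Returns one predicted label per query."""
    classes = np.unique(labels)
    A = P @ X.T                                  # (n, m) ridge coefficients
    residuals = np.empty((len(classes), X.shape[0]))
    for i, c in enumerate(classes):
        mask = labels == c
        recon = D[:, mask] @ A[mask]             # reconstruction from class c atoms only
        residuals[i] = np.linalg.norm(X.T - recon, axis=0)
    return classes[np.argmin(residuals, axis=0)]

# Toy data (assumption): two classes, two atoms each, in a 3-dimensional space.
D = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9],
              [0.0, 0.0, 0.0, 0.0]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[1.0, 0.05, 0.0],
                    [0.05, 1.0, 0.0]])

P = crc_fit(D, lam=0.01)
preds = crc_predict(P, D, labels, queries)
print(preds)
```

Because `P` depends only on `D` and $\lambda$, it can be computed once per $\lambda$ and reused for every validation or test sample.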

**Rules**
- Use the provided split; do **not** change it when comparing baselines.
- Fit PCA on **train only**; apply the same transform to val/test.
- Tune on **validation only**. Use **test once** at the very end with the best hyperparameters.
- You may use NumPy and scikit-learn utilities for metrics and splits, but not scikit-learn's PCA.
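
To make the "fit on train only" rule concrete, here is a minimal sketch of PCA via SVD, under the assumption that samples are stored as rows. The mean and components come from the training data and are reused unchanged for other splits. The function names `pca_fit` and `pca_transform` and the synthetic data are hypothetical; the notebook's TODO blocks may use a different interface.

```python
import numpy as np

def pca_fit(X_train, var_ratio=0.95):
    """Return (mean, components) keeping the smallest k that reaches var_ratio."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    # First index where the cumulative ratio reaches the target, as a count.
    k = int(np.searchsorted(np.cumsum(explained), var_ratio)) + 1
    k = min(k, len(S))                      # guard against float round-off at 1.0
    return mean, Vt[:k]

def pca_transform(X, mean, components):
    """Center with the TRAIN mean and project onto the TRAIN components."""
    return (X - mean) @ components.T

# Synthetic data (assumption): one dominant direction of variance.
rng = np.random.default_rng(0)
scales = np.array([10.0, 1.0, 0.1, 0.1, 0.1])
X_train = rng.normal(size=(200, 5)) * scales
X_val = rng.normal(size=(50, 5)) * scales

mean, components = pca_fit(X_train, var_ratio=0.9)
Z_train = pca_transform(X_train, mean, components)
Z_val = pca_transform(X_val, mean, components)   # no refitting on val
print(components.shape, Z_train.shape, Z_val.shape)
```

Note that `Z_val` is centered with the training mean, so its own mean is generally not exactly zero; that is expected and is what keeps the projection consistent across splits.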

In [3]:
import time
import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)
plt.rcParams['figure.dpi'] = 120
print('Imports ok.')

## 1) Load LFW (min_faces_per_person=50) and flatten
- We use `resize=0.4` (as in the scikit-learn example).
- This starter keeps only identities with **at least 50 images** (handled by `min_faces_per_person`).
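
For intuition about "flatten": each row of the flattened matrix is one image raveled in row-major order, so a row (or later a principal component) can be reshaped back to `(h, w)` for display. The sketch below uses a tiny synthetic array in place of LFW; the sizes are hypothetical.

```python
import numpy as np

# Hypothetical tiny stand-in for lfw.images: 4 images of size 3x2.
n, h, w = 4, 3, 2
images = np.arange(n * h * w, dtype=np.float32).reshape(n, h, w)

X_flat = images.reshape(n, h * w)        # one row per image, h*w features
restored = X_flat.reshape(n, h, w)       # inverse of the flattening

print(X_flat.shape, restored.shape)
```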

In [None]:
lfw = fetch_lfw_people(min_faces_per_person=50, resize=0.4, color=False)
X_images = lfw.images                 # (n_samples, h, w)
X = lfw.data.astype(np.float32)       # (n_samples, h*w)
y = lfw.target                        # integer labels
target_names = lfw.target_names       # label -> name mapping
h, w = lfw.images.shape[1:3]

print('Images:', X_images.shape, '| Flattened:', X.shape, '| Labels:', y.shape)
print('Num classes:', len(target_names), 'Names:', list(target_names))

## 2) Visualize a few samples (optional sanity check)
Uncomment the last line to preview a small grid of faces.

In [None]:
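def plot_faces(images, labels, label_names, n_row=2, n_col=6, title=None):
    """Show up to n_row*n_col images in a grid, each titled with its class name."""
    plt.figure(figsize=(1.6 * n_col, 2.0 * n_row))
    if title:
        plt.suptitle(title)
    # Stop early if fewer images than grid cells are passed.
    for i in range(min(n_row * n_col, len(images))):
        ax = plt.subplot(n_row, n_col, i + 1)
        ax.imshow(images[i], cmap='gray')
        ax.set_title(str(label_names[labels[i]]), fontsize=8)
        ax.set_xticks([])
        ax.set_yticks([])
    plt.tight_layout(rect=[0, 0, 1, 0.95])  # leave room for the suptitle
    plt.show()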

# plot_faces(X_images[:12], y[:12], target_names, n_row=2, n_col=6, title='LFW samples')