# Math Lab — FAANG-Level Mixed Problems

This lab is a *problem set + mini-verification* notebook.

---

In [2]:
import numpy as np

def check(name: str, cond: bool):
    if not cond:
        raise AssertionError(f'Failed: {name}')
    print(f'OK: {name}')

rng = np.random.default_rng(0)

## Problem 1 — Projection Matrix Properties

Let P be a projection matrix onto a subspace.
1) Show that P^2 = P (idempotent).
2) For an orthogonal projection, show that P = P^T (symmetric).

### TODO (code): Construct projection onto span(u) and verify properties
# HINT:
P = u u^T / (u^T u)


In [3]:
u = rng.standard_normal(5)          # generate a random 5-dimensional vector

P = np.outer(u, u) / (u @ u)        # orthogonal projector onto span(u): u u^T / (u^T u)
check('idempotent', np.allclose(P @ P, P, atol=1e-8))   # verify idempotence property
check('symmetric', np.allclose(P, P.T, atol=1e-8))      # verify symmetry property

OK: idempotent
OK: symmetric
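
The same two properties hold for a projection onto a higher-dimensional subspace, not only onto a single direction. The sketch below is one way to check this, assuming the subspace is the column space of a hypothetical random matrix `B`; the names `rng_demo`, `B`, `Q`, and `P_sub` are illustrative and not part of the lab. It builds the projector from an orthonormal basis obtained by a reduced QR factorization.

```python
import numpy as np

rng_demo = np.random.default_rng(1)      # separate generator, so the lab's `rng` stream is untouched

B = rng_demo.standard_normal((6, 3))     # hypothetical basis: 3 random vectors in R^6
Q, _ = np.linalg.qr(B)                   # reduced QR: columns of Q are orthonormal and span col(B)
P_sub = Q @ Q.T                          # orthogonal projector onto col(B)

assert np.allclose(P_sub @ P_sub, P_sub)     # idempotent: projecting twice changes nothing
assert np.allclose(P_sub, P_sub.T)           # symmetric: the projection is orthogonal
assert np.allclose(P_sub @ B, B)             # vectors already in the subspace are left fixed
assert np.isclose(np.trace(P_sub), 3.0)      # trace equals the dimension of the subspace
print('subspace projector checks passed')
```

Using `Q @ Q.T` avoids forming `B (B^T B)^{-1} B^T` explicitly, which would need a matrix inverse.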


## Problem 2 — PSD Matrix Check

Show that for any matrix X, the matrix A = X^T X is positive semidefinite (PSD).

### TODO (code): sample random X, build A, verify v^T A v >= 0 for random v
# HINT:
v^T X^T X v = ||Xv||^2 >= 0


In [4]:
X = rng.standard_normal((10, 4))            # data matrix: 10 samples, 4 features
A = X.T @ X                                 # Gram matrix, symmetric and PSD by construction
for _ in range(100):
    v = rng.standard_normal(4)              # random test vector
    val = float(v @ A @ v)                  # quadratic form v^T A v (v is 1-D, so no transpose is needed)
    if val < -1e-8:                         # tolerate tiny negative round-off, reject anything larger
        raise AssertionError('Not PSD?')    # a clearly negative value would contradict PSD
print('PSD check passed')

PSD check passed
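
Random test vectors can only fail to find a counterexample; they cannot certify PSD. A complementary check, sketched here with illustrative names (`rng_demo`, `X_demo`, `A_demo`, `X_wide` are assumptions, not part of the lab), is to look at the eigenvalues directly: a symmetric matrix is PSD exactly when all of its eigenvalues are nonnegative, up to a round-off tolerance.

```python
import numpy as np

rng_demo = np.random.default_rng(2)          # separate generator, independent of the lab's `rng`

X_demo = rng_demo.standard_normal((10, 4))
A_demo = X_demo.T @ X_demo

eigs = np.linalg.eigvalsh(A_demo)            # eigenvalues of a symmetric matrix, in ascending order
tol = 1e-10 * max(1.0, eigs.max())           # tolerance scaled to the largest eigenvalue
is_psd = bool(eigs.min() >= -tol)
print('min eigenvalue:', eigs.min(), 'PSD:', is_psd)

# Rank-deficient case: a 3x5 matrix gives a 5x5 Gram matrix of rank at most 3,
# so at least two eigenvalues are zero in exact arithmetic and may come out
# slightly negative numerically. This is why the comparison uses a tolerance.
X_wide = rng_demo.standard_normal((3, 5))
eigs_wide = np.linalg.eigvalsh(X_wide.T @ X_wide)
tol_wide = 1e-10 * max(1.0, eigs_wide.max())
is_psd_wide = bool(eigs_wide.min() >= -tol_wide)
print('smallest eigenvalues (wide case):', eigs_wide[:2], 'PSD:', is_psd_wide)
```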


## Problem 3 — Least Squares Derivation

Derive the normal equations for minimizing ||Xw - y||^2.

### TODO (code): compare w_hat from solve vs np.linalg.lstsq
# HINT:
w = (X^T X)^{-1} X^T y
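
One way to carry out the derivation asked for above is to expand the squared norm and set its gradient to zero. This sketch assumes X has full column rank, so that X^T X is invertible; the hat on w marks the minimizer and matches `w_hat` in the code.

```latex
\begin{aligned}
f(w) &= \lVert Xw - y \rVert^2 = w^\top X^\top X w - 2\, y^\top X w + y^\top y, \\
\nabla f(w) &= 2\, X^\top X w - 2\, X^\top y, \\
\nabla f(\hat{w}) = 0 &\iff X^\top X \hat{w} = X^\top y
\iff \hat{w} = (X^\top X)^{-1} X^\top y .
\end{aligned}
```

Because the Hessian 2 X^T X is positive semidefinite, f is convex and the stationary point is a global minimum.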


In [5]:
n, d = 200, 5                               # number of samples (n) and features (d)
X = rng.standard_normal((n, d))             # random feature matrix
w_true = rng.standard_normal(d)             # ground-truth weight vector
y = X @ w_true + 0.1 * rng.standard_normal(n)   # targets: linear signal X @ w_true plus Gaussian noise (std 0.1)

w_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solve the normal equations X^T X w = X^T y (no explicit inverse)

w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)         # reference solution from NumPy's built-in least-squares solver
check('close', np.allclose(w_hat, w_lstsq, atol=1e-6))  # verify that both methods agree

OK: close
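
The normal equations say that the gradient of ||Xw - y||^2, which is 2 X^T (Xw - y), vanishes at the solution. Equivalently, the residual is orthogonal to every column of X. The sketch below checks both facts numerically on freshly generated data; all names ending in `_demo` are illustrative assumptions.

```python
import numpy as np

rng_demo = np.random.default_rng(3)          # separate generator, independent of the lab's `rng`
n_demo, d_demo = 200, 5
X_demo = rng_demo.standard_normal((n_demo, d_demo))
y_demo = X_demo @ rng_demo.standard_normal(d_demo) + 0.1 * rng_demo.standard_normal(n_demo)

w_demo = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

residual = X_demo @ w_demo - y_demo
grad = 2 * X_demo.T @ residual               # gradient of ||Xw - y||^2 evaluated at w_demo
assert np.allclose(grad, 0, atol=1e-8)       # stationary point: residual is orthogonal to col(X)

def loss(w):
    """Squared residual norm ||Xw - y||^2 for the demo data."""
    r = X_demo @ w - y_demo
    return float(r @ r)

# The objective is convex, so a small random step away from w_demo should increase the loss.
delta = 1e-3 * rng_demo.standard_normal(d_demo)
assert loss(w_demo + delta) > loss(w_demo)
print('gradient and minimality checks passed')
```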


## Problem 4 — Bayes + Base Rate (Derivation)

Re-derive P(D|+) for the disease test scenario and explain the base-rate fallacy in 2-3 sentences.

### TODO (code): simulate and compare to analytic


In [None]:
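# The base-rate fallacy is the error of ignoring how common a condition is (its prior probability)
# when interpreting new evidence. People overestimate the probability of disease after a positive test
# because they focus on the test's accuracy and forget that the disease itself is rare.
# Here the prior is 1%, so false positives from the healthy 99% outnumber true positives
# about five to one, and P(D|+) is only about 17%.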

In [6]:
P_D = 0.01                  # prior P(D): base rate of the disease
P_pos_given_D = 0.99        # sensitivity P(+|D)
P_pos_given_notD = 0.05     # false-positive rate P(+|not D)
P_D_given_pos = (P_pos_given_D*P_D) / (P_pos_given_D*P_D + P_pos_given_notD*(1-P_D))  # Bayes’ theorem: P(D|+) = P(+|D) P(D) / P(+)
print(P_D_given_pos)

N = 200000                              # number of samples
disease = rng.random(N) < P_D           # simulate disease occurrence using prior probability
test_pos = np.empty(N, dtype=bool)      # allocate array to store test outcomes
test_pos[disease] = rng.random(disease.sum()) < P_pos_given_D           # positive tests for diseased individuals
test_pos[~disease] = rng.random((~disease).sum()) < P_pos_given_notD    # false positives for healthy individuals
est = disease[test_pos].mean()          # Estimate P(D|+)
print('analytic', P_D_given_pos, 'sim', est)
check('close', abs(est - P_D_given_pos) < 0.01)

0.16666666666666669
analytic 0.16666666666666669 sim 0.16930459673345682
OK: close
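
The analytic value can also be reached by counting people instead of multiplying probabilities, which makes the base-rate effect easier to see. The sketch below uses the same three rates as the cell above with exact rational arithmetic; the population size of 10,000 and the variable names are illustrative assumptions.

```python
from fractions import Fraction

p_d = Fraction(1, 100)        # prior P(D)
sens = Fraction(99, 100)      # sensitivity P(+|D)
fpr = Fraction(5, 100)        # false-positive rate P(+|not D)

population = 10_000                       # hypothetical cohort size
diseased = population * p_d               # 100 people have the disease
true_pos = diseased * sens                # 99 of them test positive
healthy = population - diseased           # 9900 people are healthy
false_pos = healthy * fpr                 # 495 of them test positive anyway

posterior = true_pos / (true_pos + false_pos)   # share of positive tests that are true positives
print(posterior)   # 1/6
```

Of the 594 positive tests, only 99 belong to people with the disease, which gives the same 1/6 as the Bayes formula.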


## Problem 5 — PCA Link

Explain why PCA components are eigenvectors of the covariance matrix.

### TODO (code): compute covariance eigenvectors and compare with SVD directions
# HINT:
- Center X
- Cov = X^T X/(n-1)
- eigenvectors of Cov align with V from SVD


In [None]:
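# PCA seeks unit vectors w that maximize the projected variance w^T Cov w.
# Setting the gradient of the Lagrangian w^T Cov w - lambda (w^T w - 1) to zero gives Cov w = lambda w,
# so every stationary direction is an eigenvector of Cov, and its variance equals the eigenvalue lambda.
# The first component is the eigenvector with the largest eigenvalue; each later component
# maximizes variance subject to being orthogonal to the earlier ones.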

In [7]:
X = rng.standard_normal((500, 20))                  # generate dataset with 500 samples and 20 features
Xc = X - X.mean(axis=0, keepdims=True)              # center each feature by subtracting its mean
Cov = (Xc.T @ Xc) / (Xc.shape[0] - 1)               # compute sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(Cov)              # compute eigenvalues and eigenvectors of covariance matrix
idx = np.argsort(eigvals)[::-1]                     # indices that order the eigenvalues from largest to smallest
eigvecs = eigvecs[:, idx]                           # reorder eigenvector columns to match that order

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)   # SVD on centered data matrix
V = Vt.T                                            # extract right singular vectors

# Compare the top-k directions from both methods via absolute inner products
# (the sign of an eigenvector or singular vector is arbitrary, hence the absolute value)
k = 5
C = np.abs(eigvecs[:, :k].T @ V[:, :k])             # C[i, j] = |<i-th eigenvector, j-th right singular vector>|
print('abs alignment matrix (should be near diagonal)', C)
check('alignment', np.all(np.max(C, axis=1) > 0.9))

abs alignment matrix (should be near diagonal) [[1.00000000e+00 3.01139914e-14 2.14267735e-15 3.90646844e-15
  8.11237365e-16]
 [3.09196500e-14 1.00000000e+00 4.24782438e-15 8.47505382e-15
  6.40337552e-15]
 [2.04648869e-15 4.20577497e-15 1.00000000e+00 4.78971369e-14
  1.08537281e-14]
 [4.04548183e-15 8.20584485e-15 4.89486685e-14 1.00000000e+00
  6.03181136e-15]
 [7.35522754e-16 6.67521594e-15 1.13381526e-14 6.34908792e-15
  1.00000000e+00]]
OK: alignment
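
The alignment follows from substituting the SVD Xc = U S V^T into the covariance: Cov = V (S^2 / (n - 1)) V^T. So the right singular vectors are eigenvectors of Cov, and the eigenvalues are the squared singular values divided by n - 1. A minimal sketch checking this identity on freshly generated data follows; the `_demo` names are illustrative assumptions.

```python
import numpy as np

rng_demo = np.random.default_rng(5)                  # separate generator, independent of the lab's `rng`
X_demo = rng_demo.standard_normal((500, 20))
Xc_demo = X_demo - X_demo.mean(axis=0, keepdims=True)
n_demo = Xc_demo.shape[0]

cov_demo = (Xc_demo.T @ Xc_demo) / (n_demo - 1)
evals_desc = np.linalg.eigvalsh(cov_demo)[::-1]      # eigvalsh returns ascending order; reverse it

_, S_demo, Vt_demo = np.linalg.svd(Xc_demo, full_matrices=False)
var_from_svd = S_demo**2 / (n_demo - 1)              # singular values come back in descending order

# Eigenvalues of Cov equal the squared singular values over (n - 1).
assert np.allclose(evals_desc, var_from_svd)

# Each right singular vector v_j satisfies Cov v_j = lambda_j v_j (column j scaled by lambda_j).
assert np.allclose(cov_demo @ Vt_demo.T, Vt_demo.T * var_from_svd)
print('eigenvalue / singular value identity holds')
```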


---
## Submission Checklist
- Derivations written
- TODO code complete
- Checks pass
