# Linear Algebra for ML — FAANG-Level Lab

**Goal:** Build shape intuition + compute core linear algebra objects used in ML.

**Outcome:** You can implement least squares, projections, eigen decomposition intuition, and SVD-based PCA.


In [2]:
import numpy as np

def check(name: str, cond: bool):
    if not cond:
        raise AssertionError(f'Failed: {name}')
    print(f'OK: {name}')

rng = np.random.default_rng(0)

## Section 1 — Vectors, Dot Product, Norms

### Task 1.1: Implement dot + L2 norm (no np.linalg.norm)

**HINT:**
- dot(x,y) = sum(x_i * y_i)
- ||x||_2 = sqrt(dot(x,x))

**Explain:** What does dot product measure geometrically?

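Geometrically, dot(x, y) = ||x||_2 ||y||_2 cos(theta), where theta is the angle between the vectors, so it measures how strongly the two vectors point in the same direction. Dividing by ||y||_2 gives the signed length of the projection of x onto y. The dot product is zero when the vectors are orthogonal.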
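As a rough illustration of the geometric reading above, the sketch below recovers the cosine of the angle and the scalar projection from the dot product alone. The variable names (`x_demo`, `y_demo`, and so on) are illustrative and not part of the lab.

```python
import numpy as np

x_demo = np.array([1.0, 2.0, 3.0])
y_demo = np.array([4.0, 5.0, 6.0])

# Dot product and norms computed from their definitions.
dot_xy = float(np.sum(x_demo * y_demo))
norm_x = float(np.sqrt(np.sum(x_demo * x_demo)))
norm_y = float(np.sqrt(np.sum(y_demo * y_demo)))

# cos(theta) = dot(x, y) / (||x|| * ||y||)
cos_theta = dot_xy / (norm_x * norm_y)

# Signed length of the projection of y onto the direction of x.
scalar_proj = dot_xy / norm_x

print('cos(theta)', cos_theta)
print('scalar projection of y onto x', scalar_proj)
```

The scalar projection equals `norm_y * cos_theta`, which is the "shadow" of y along x.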

In [3]:
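def dot(x, y):
    # Elementwise product, then a vectorized sum (faster than the built-in sum on arrays)
    return np.sum(x * y)

def l2_norm(x):
    # ||x||_2 = sqrt(dot(x, x))
    return np.sqrt(dot(x, x))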

x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])
check('dot', abs(dot(x,y) - 32.0) < 1e-9)
check('norm', abs(l2_norm(x) - np.sqrt(14.0)) < 1e-9)

OK: dot
OK: norm


## Section 2 — Matrix Multiplication + Shapes

### Task 2.1: Validate shapes and compute A@B

Given A (n,d) and B (d,k) compute C (n,k).

**FAANG gotcha:** Many bugs are shape bugs. Always assert shapes.
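One way to act on this advice is to wrap the multiplication in a small helper that fails early with a readable message. This is a minimal sketch; `checked_matmul` is a hypothetical helper, not a NumPy function.

```python
import numpy as np

def checked_matmul(A, B):
    # Hypothetical helper: validate shapes before multiplying.
    assert A.ndim == 2 and B.ndim == 2, f'expected 2-D arrays, got {A.ndim}-D and {B.ndim}-D'
    assert A.shape[1] == B.shape[0], f'inner dims differ: {A.shape} vs {B.shape}'
    C = A @ B
    # (n, d) @ (d, k) must give (n, k).
    assert C.shape == (A.shape[0], B.shape[1])
    return C

demo_rng = np.random.default_rng(0)
A_demo = demo_rng.standard_normal((5, 3))
B_demo = demo_rng.standard_normal((3, 2))
print(checked_matmul(A_demo, B_demo).shape)  # (5, 2)

try:
    checked_matmul(B_demo, A_demo)  # (3, 2) @ (5, 3): inner dims 2 != 5
except AssertionError as e:
    print('caught:', e)
```

Without the explicit check NumPy would still raise on the mismatch, but the assertion message names both shapes at the point where the bug was introduced.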

In [4]:
A = rng.standard_normal((5, 3))
B = rng.standard_normal((3, 2))

# TODO: compute C
C = A @ B # Other ways to do it are C = np.dot(A, B) or C = np.matmul(A, B)

check('C_shape', C.shape == (5, 2))
check('matmul_close', np.allclose(C, A @ B))

OK: C_shape
OK: matmul_close


## Section 3 — Projections (Least Squares Intuition)

### Task 3.1: Project vector v onto vector u

proj_u(v) = (u^T v / u^T u) * u

**HINT:**
- Use your dot()

**Explain:** Why does projection show up in linear regression?

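Least squares picks the prediction Xw that is closest to y among all vectors in the column space of X, and that closest point is the orthogonal projection of y onto the column space. The residual y - Xw is therefore the shortest possible error vector, and it is orthogonal to every column of X. Minimizing the total squared error across all data points is the same as finding this projection, which gives the best-fit model for the data.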
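The orthogonality of the residual can be checked numerically. The sketch below fits a least-squares model on synthetic data (all names and the random data are illustrative assumptions) and confirms that the residual is orthogonal to the columns of the design matrix.

```python
import numpy as np

demo_rng = np.random.default_rng(1)
X_demo = demo_rng.standard_normal((50, 3))
y_demo = demo_rng.standard_normal(50)

# Least-squares fit; lstsq returns (solution, residuals, rank, singular values).
w_demo, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

y_hat = X_demo @ w_demo      # projection of y onto the column space of X
residual = y_demo - y_hat    # component of y orthogonal to that space

# Each entry should be zero up to floating-point error.
print('X^T residual', X_demo.T @ residual)

# Pythagoras: ||y||^2 = ||y_hat||^2 + ||residual||^2
print(np.isclose(y_demo @ y_demo, y_hat @ y_hat + residual @ residual))
```

Setting `X^T (y - Xw) = 0` and solving for w is exactly where the normal equation comes from.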

In [5]:
def proj(u, v):
    # TODO
    return (dot(u, v) / dot(u, u)) * u # Projection formula is directly applied here

u = np.array([1., 0., 0.])
v = np.array([2., 3., 4.])
p = proj(u, v)
check('proj', np.allclose(p, np.array([2., 0., 0.])))

OK: proj


## Section 4 — Least Squares (Closed Form)

### Task 4.1: Solve min_w ||Xw - y||^2

Use normal equation: w = (X^T X)^{-1} X^T y

**HINT:**
- Use `np.linalg.solve` (more stable than explicit inverse)

**FAANG gotcha:** Don’t compute matrix inverse unless you must.
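To make the gotcha concrete, the sketch below compares three ways of obtaining the least-squares weights on noise-free synthetic data (names and data are illustrative). All three agree here because the problem is well conditioned. The difference matters when X is close to rank deficient, since forming X^T X squares the condition number.

```python
import numpy as np

demo_rng = np.random.default_rng(2)
X_demo = demo_rng.standard_normal((50, 3))
w_ref = np.array([1.5, -2.0, 0.7])
y_demo = X_demo @ w_ref  # noise-free targets, so the exact answer is w_ref

XtX = X_demo.T @ X_demo
Xty = X_demo.T @ y_demo

w_inv = np.linalg.inv(XtX) @ Xty      # explicit inverse: works, but generally avoided
w_solve = np.linalg.solve(XtX, Xty)   # solves the linear system directly
w_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)  # never forms X^T X

print('inv  ', w_inv)
print('solve', w_solve)
print('lstsq', w_lstsq)

# In the 2-norm, cond(X^T X) equals cond(X) squared.
print('cond(X)    ', np.linalg.cond(X_demo))
print('cond(X^T X)', np.linalg.cond(XtX))
```

For ill-conditioned problems, `np.linalg.lstsq` is usually the safer default because it works on X itself rather than on X^T X.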

In [10]:
n, d = 50, 3
X = rng.standard_normal((n, d))
w_true = np.array([1.5, -2.0, 0.7])
y = X @ w_true + 0.01 * rng.standard_normal(n)

# TODO: compute w_hat using solve
w_hat = np.linalg.solve(X.T @ X, X.T @ y) # Using the normal equation to find w_hat as specified

err = np.linalg.norm(w_hat - w_true)
print('w_true', w_true)
print('w_hat ', w_hat)
print('L2 error', err)
check('close', err < 0.2)

w_true [ 1.5 -2.   0.7]
w_hat  [ 1.50051512 -2.00176197  0.70098954]
L2 error 0.002085441683260867
OK: close


## Section 5 — Eigenvalues & SVD Intuition

### Task 5.1: PCA via SVD

Steps:
1. Center X
2. Compute SVD: X = U S V^T
3. Take top-k components from V

**HINT:**
- `U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)`

**Explain:** Why does SVD show up in embeddings and dimensionality reduction?

SVD (singular value decomposition) factors any matrix as X = U S V^T: a rotation (or reflection), a scaling along orthogonal axes, and another rotation (or reflection). It extends the spectral (eigen) decomposition of symmetric square matrices to arbitrary rectangular matrices.

The singular values rank the directions by how much of the data they capture. Keeping the top-k directions and discarding the rest gives the best rank-k approximation in the least-squares sense, which is how SVD reduces dimensionality.

For embeddings, SVD maps high-dimensional, sparse data (for example, words and their context counts) to dense, low-dimensional vectors.
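The link between the eigenvalues in this section's title and the SVD can be checked directly: the eigenvalues of the sample covariance matrix equal the squared singular values of the centered data divided by n - 1. The sketch below uses synthetic data, and all names in it are illustrative.

```python
import numpy as np

demo_rng = np.random.default_rng(3)
X_demo = demo_rng.standard_normal((200, 10))
Xc_demo = X_demo - X_demo.mean(axis=0, keepdims=True)
n_demo = Xc_demo.shape[0]

# Route 1: SVD of the centered data (singular values come back in descending order).
_, S_demo, Vt_demo = np.linalg.svd(Xc_demo, full_matrices=False)

# Route 2: eigendecomposition of the sample covariance matrix.
cov = Xc_demo.T @ Xc_demo / (n_demo - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
eigvals = eigvals[::-1]                 # flip to descending to match S_demo

# Covariance eigenvalues equal S^2 / (n - 1).
print(np.allclose(eigvals, S_demo ** 2 / (n_demo - 1)))

# Fraction of total variance captured by each principal direction.
explained = S_demo ** 2 / np.sum(S_demo ** 2)
print('explained variance ratio', explained)
```

Both routes give the same principal directions up to sign. The SVD route avoids forming the covariance matrix explicitly.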

In [11]:
X = rng.standard_normal((200, 10))
Xc = X - X.mean(axis=0, keepdims=True)  # center each column (feature)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # thin SVD: U (200,10), S (10,), Vt (10,10)

k = 3
W = Vt.T[:, :k]  # top-k right singular vectors (10,k)
Z = Xc @ W  # project onto the top-k directions (200,k)
X_recon = Z @ W.T  # rank-k reconstruction (200,10)

recon_err = np.linalg.norm(Xc - X_recon) / np.linalg.norm(Xc)
print('relative recon error', recon_err)
check('shapes', W.shape == (10, k) and Z.shape == (200, k))

relative recon error 0.784408889885354
OK: shapes
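A relative error near 0.78 is expected here rather than a sign of a bug. The data is isotropic Gaussian noise, so no direction dominates. If all 10 directions carried equal variance, dropping 7 of them would leave about sqrt(7/10) ≈ 0.84 of the norm, and the top-k directions carry slightly more than their share. The sketch below, on freshly generated data with illustrative names, checks that the reconstruction error can be read off the discarded singular values.

```python
import numpy as np

demo_rng = np.random.default_rng(4)
X_demo = demo_rng.standard_normal((200, 10))
Xc_demo = X_demo - X_demo.mean(axis=0, keepdims=True)
_, S_demo, Vt_demo = np.linalg.svd(Xc_demo, full_matrices=False)

k_demo = 3
W_demo = Vt_demo[:k_demo].T                 # (10, k) top-k right singular vectors
X_recon_demo = Xc_demo @ W_demo @ W_demo.T  # rank-k reconstruction

# Error measured directly from the reconstruction (Frobenius norm).
err_direct = np.linalg.norm(Xc_demo - X_recon_demo) / np.linalg.norm(Xc_demo)

# The same error predicted from the discarded singular values.
err_from_S = np.sqrt(np.sum(S_demo[k_demo:] ** 2) / np.sum(S_demo ** 2))

print('direct', err_direct)
print('from S', err_from_S)
```

On data with real low-rank structure the trailing singular values are small, and the same formula gives an error close to zero.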


---
## Submission Checklist
- All TODOs completed
- Checks pass
- Explain prompts answered
