# Linear Regression & GLMs — Student Lab

Complete all TODOs. Avoid sklearn for core parts.

In [1]:
import numpy as np

def check(name: str, cond: bool):
    if not cond:
        raise AssertionError(f'Failed: {name}')
    print(f'OK: {name}')

rng = np.random.default_rng(0)

## Section 0 — Synthetic Dataset (with collinearity)
We generate data where features can be highly correlated to motivate ridge.

In [5]:
def make_regression(n=400, d=5, noise=0.5, collinear=True):
    X = rng.standard_normal((n, d))
    print(X.shape)
    if collinear and d >= 2:        
        X[:, 1] = X[:, 0] * 0.95 + 0.05 * rng.standard_normal(n)
        
    w_true = rng.standard_normal(d)
    print(w_true.shape)
    y = X @ w_true + noise * rng.standard_normal(n)
    print(y.shape)
    return X, y, w_true

X, y, w_true = make_regression()
n, d = X.shape
check('shapes', y.shape == (n,))
print('corr(x0,x1)=', np.corrcoef(X[:,0], X[:,1])[0,1])

(400, 5)
(5,)
(400,)
OK: shapes
corr(x0,x1)= 0.9985485848157019


## Section 1 — OLS Closed Form

### Task 1.1: Closed-form w_hat using solve

# TODO: compute w_hat using solve on (X^T X) w = X^T y
# HINT: `XtX = X.T@X`, `Xty = X.T@y`, `np.linalg.solve(XtX, Xty)`

**Checkpoint:** Why is explicit inverse discouraged?
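Forming the explicit inverse costs more floating-point work than solving the system directly, and it amplifies rounding error when X^T X is ill-conditioned, as it is here with corr(x0, x1) ≈ 0.9985. `np.linalg.solve` factorises the matrix and never builds the inverse.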
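
As a rough illustration of this checkpoint, the sketch below compares three ways of getting the least-squares weights on a small, nearly collinear design. The data, the seed, and the names `X_demo`, `y_demo`, `A`, and `b` are made up for this example and are not part of the lab. `np.linalg.lstsq` works on `X` directly, so it avoids the squared condition number that comes with forming X^T X.

```python
import numpy as np

# Illustrative data only: two columns are almost identical.
rng_demo = np.random.default_rng(42)
X_demo = rng_demo.standard_normal((200, 3))
X_demo[:, 1] = X_demo[:, 0] + 1e-3 * rng_demo.standard_normal(200)
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng_demo.standard_normal(200)

A = X_demo.T @ X_demo   # normal-equation matrix
b = X_demo.T @ y_demo

w_solve = np.linalg.solve(A, b)                             # factorise, no inverse
w_inv = np.linalg.inv(A) @ b                                # explicit inverse
w_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)   # works on X itself

print('cond(X)     =', np.linalg.cond(X_demo))
print('cond(X^T X) =', np.linalg.cond(A))
print('normal-eq residual, solve:', np.linalg.norm(A @ w_solve - b))
print('normal-eq residual, inv  :', np.linalg.norm(A @ w_inv - b))
```

The exact numbers depend on the seed; the point to look for is that `cond(X^T X)` is roughly the square of `cond(X)`.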

In [6]:
# TODO
XtX = X.T@X
Xty = X.T@y
w_hat = np.linalg.solve(XtX,Xty)

check('w_shape', w_hat.shape == (d,))

OK: w_shape


### Task 1.2: Evaluate fit + residuals
Compute:
- predictions y_pred
- MSE
- residual mean and std

**Interview Angle:** What does a structured residual pattern imply (e.g., nonlinearity)?
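A structured pattern means the residuals still carry signal that the model failed to capture, so the model is misspecified: for example a missing nonlinear term or interaction, an omitted feature, or non-constant noise variance (heteroscedasticity). Pure noise would leave residuals with no visible trend against the features or the predictions.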
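
A minimal sketch of what a structured residual pattern can look like, assuming a made-up one-dimensional dataset with a quadratic term that the linear fit leaves out (the names `x`, `y_curved`, and `A_lin` are illustrative, not part of the lab):

```python
import numpy as np

rng_demo = np.random.default_rng(1)
x = rng_demo.uniform(-2.0, 2.0, size=500)
y_curved = 1.0 + 0.5 * x + 1.5 * x**2 + 0.1 * rng_demo.standard_normal(500)

# Fit a straight line (intercept + slope) by least squares.
A_lin = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A_lin, y_curved, rcond=None)
resid_lin = y_curved - A_lin @ coef

# Least-squares residuals are orthogonal to the fitted columns,
# but they can still correlate strongly with an omitted term.
corr_x = np.corrcoef(resid_lin, x)[0, 1]
corr_x2 = np.corrcoef(resid_lin, x**2)[0, 1]
print('corr(resid, x)   =', corr_x)
print('corr(resid, x^2) =', corr_x2)
```

A near-zero correlation with `x` alongside a large correlation with `x**2` is the kind of pattern that points to a missing nonlinear feature rather than to noise.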


In [7]:
# TODO
y_pred = X @ w_hat
mse = float(np.mean((y - y_pred) ** 2))  # square each residual before averaging
resid = y_pred - y
print(f'mse {mse:.4f} resid_mean {resid.mean():.4f} resid_std {resid.std():.4f}')
check('finite', np.isfinite(mse))

mse 0.2220 resid_mean 0.0184 resid_std 0.4708
OK: finite


## Section 2 — Gradient Descent

### Task 2.1: Implement MSE loss + gradient

Loss = mean((Xw-y)^2), grad = (2/n) X^T(Xw-y)

# TODO: implement `mse_loss_and_grad`

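**FAANG gotcha:** shapes and constants. Keep `y` and `Xw` both of shape `(n,)` (an `(n, 1)` target silently broadcasts the residual to `(n, n)`), and keep the factor `2/n` in the gradient so that it matches the mean in the loss.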
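
One way to catch both kinds of mistake is a finite-difference gradient check. The sketch below is an assumption-laden illustration rather than part of the lab: `mse_loss_and_grad_demo` restates the formula above, `numerical_grad` is a hypothetical helper, and the data are random.

```python
import numpy as np

def mse_loss_and_grad_demo(X, y, w):
    # Loss = mean((Xw - y)^2), grad = (2/n) X^T (Xw - y)
    res = X @ w - y
    return float(np.mean(res**2)), (2.0 / X.shape[0]) * (X.T @ res)

def numerical_grad(f, w, eps=1e-6):
    # Central differences, one coordinate at a time.
    g = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = eps
        g[j] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

rng_demo = np.random.default_rng(3)
X_demo = rng_demo.standard_normal((50, 4))
y_demo = rng_demo.standard_normal(50)
w_demo = rng_demo.standard_normal(4)

_, g_analytic = mse_loss_and_grad_demo(X_demo, y_demo, w_demo)
g_numeric = numerical_grad(
    lambda w: mse_loss_and_grad_demo(X_demo, y_demo, w)[0], w_demo
)
max_abs_diff = float(np.max(np.abs(g_analytic - g_numeric)))
print('max |analytic - numeric| =', max_abs_diff)

# Shape gotcha: a column-vector target broadcasts the residual to (n, n).
bad_res = X_demo @ w_demo - y_demo.reshape(-1, 1)
print('residual shape with (n, 1) target:', bad_res.shape)
```

Dropping the factor 2 or dividing by the wrong `n` shows up immediately as a large `max_abs_diff`.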

In [9]:
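def mse_loss_and_grad(X, y, w):
    # TODO
    n_samples = X.shape[0]
    res = X @ w - y                        # residuals, shape (n,)
    loss = float(np.mean(res * res))       # mean squared error
    grad = (2 / n_samples) * (X.T @ res)   # gradient of the loss w.r.t. w, shape (d,)
    return loss, grad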

w0 = np.zeros(d)
loss0, g0 = mse_loss_and_grad(X, y, w0)
check('grad_shape', g0.shape == (d,))
check('finite_loss', np.isfinite(loss0))

OK: grad_shape
OK: finite_loss


### Task 2.2: Train with GD + compare to closed-form

# TODO: implement a simple GD loop, track loss, and compare final weights to w_hat.

**Checkpoint:** How does feature scaling affect GD?
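
A hedged sketch of one answer: the stable step size is limited by the largest eigenvalue of the Hessian `(2/n) X^T X`, while progress along the flattest direction is governed by the smallest one, so badly scaled features make GD crawl. The example below uses made-up data with two features whose scales differ by a factor of 100; `gd_final_loss`, `X_raw`, and `X_scaled` are illustrative names, and only column rescaling (no centring) is applied so that both problems share the same minimum loss.

```python
import numpy as np

def gd_final_loss(X, y, steps=200):
    # Plain GD on the MSE with step size 1/L, where L is the largest
    # eigenvalue of the Hessian (2/n) X^T X.
    n = X.shape[0]
    hessian = (2.0 / n) * (X.T @ X)
    lr = 1.0 / np.linalg.eigvalsh(hessian).max()
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * (2.0 / n) * (X.T @ (X @ w - y))
    return float(np.mean((X @ w - y) ** 2))

rng_demo = np.random.default_rng(4)
n_demo = 300
X_raw = rng_demo.standard_normal((n_demo, 2)) * np.array([1.0, 100.0])
y_demo = X_raw @ np.array([2.0, 0.03]) + 0.1 * rng_demo.standard_normal(n_demo)

# Rescale each column to unit standard deviation.
X_scaled = X_raw / X_raw.std(axis=0)

loss_raw = gd_final_loss(X_raw, y_demo)
loss_scaled = gd_final_loss(X_scaled, y_demo)
print('cond(X_raw)    =', np.linalg.cond(X_raw))
print('cond(X_scaled) =', np.linalg.cond(X_scaled))
print('loss after 200 steps, raw   :', loss_raw)
print('loss after 200 steps, scaled:', loss_scaled)
```

With the same number of steps, the scaled problem should end much closer to the noise floor because its condition number is close to 1.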

In [13]:
def train_gd(X, y, lr=0.05, steps=500):
    # TODO
    w = np.zeros(X.shape[1])  # start from the zero vector
    losses = []
    print(w.shape)
    print(X.shape)
    for _ in range(steps):
        # The loss is recorded at the current w, before the update.
        loss, grad = mse_loss_and_grad(X, y, w)
        w = w - lr * grad
        losses.append(loss)
    print(losses)
    print(w)
    return w, losses
    
w_gd, losses = train_gd(X, y, lr=0.05, steps=500)
print('final_loss', losses[-1])
print('||w_gd-w_hat||', np.linalg.norm(w_gd - w_hat))
check('loss_decreases', losses[-1] <= losses[0])

(5,)
(400, 5)
[1.3079680318455007, 1.0859856075980903, 0.9097494347925305, 0.769773669812158, 0.6585500254815626, 0.5701335869476142, 0.49981572057306367, 0.44386569990997127, 0.399326561312197, 0.3638537653250385, 0.33558765404476804, 0.3130525971937641, 0.29507721934081527, 0.2807312830242583, 0.2692757348315747, 0.2601231567636425, 0.2528064452116741, 0.24695399748074096, 0.24227004690724788, 0.23851907264557748, 0.2355134352215397, 0.23310356664007703, 0.2311701841810513, 0.22961810789495843, 0.2283713494296068, 0.22736920907623598, 0.22656317268201334, 0.22591444338281902, 0.2253919773704756, 0.22497092002009272, 0.22463136016430063, 0.22435733729492888, 0.22413604993353797, 0.22395722407827415, 0.22381260908905248, 0.22369557507708307, 0.22360079118268794, 0.22352396834543817, 0.22346165352085343, 0.22341106495864843, 0.22336996027157674, 0.22333653070436063, 0.22330931634854623, 0.22328713811233902, 0.22326904310076692, 0.22325426073548066, 0.22324216748048667, 0.223232258468167

## Section 3 — Ridge Regression (L2)

### Task 3.1: Ridge closed-form
w = (X^T X + λI)^{-1} X^T y

# TODO: implement ridge_solve

**Interview Angle:** Why does ridge help under collinearity?
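
One way to see the effect, sketched under the assumption of a made-up collinear design similar to Section 0 (the names `X_demo`, `gram`, and `conds` are illustrative): adding λI shifts every eigenvalue of X^T X up by λ, so the near-zero eigenvalue created by collinearity no longer dominates the solve.

```python
import numpy as np

rng_demo = np.random.default_rng(5)
X_demo = rng_demo.standard_normal((400, 3))
# Make column 1 almost a copy of column 0.
X_demo[:, 1] = 0.95 * X_demo[:, 0] + 0.05 * rng_demo.standard_normal(400)

gram = X_demo.T @ X_demo
eig_ols = np.linalg.eigvalsh(gram)  # ascending eigenvalues of X^T X

conds = {}
for lam in (0.0, 1.0, 10.0):
    eig = np.linalg.eigvalsh(gram + lam * np.eye(gram.shape[0]))
    conds[lam] = float(eig.max() / eig.min())
    print(f'lam={lam:5.1f}  min eig={eig.min():10.4f}  cond={conds[lam]:10.2f}')
```

A smaller condition number means the weights along the collinear direction react less violently to noise in `y`, at the cost of shrinking them toward zero.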

In [16]:
def ridge_solve(X, y, lam):
    # TODO
    # Solve (X^T X + lam * I) w = X^T y; lam = 0 reduces to the OLS normal equations.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_ridge = ridge_solve(X, y, lam=1.0)
check('ridge_shape', w_ridge.shape == (d,))

OK: ridge_shape


### Task 3.2: Bias/variance demo with train/test split

# TODO: split into train/test and compare MSE for multiple lambdas.

**Checkpoint:** why can test error improve even when train error worsens?
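
A possible way to explore this checkpoint: ridge gives up some fit on the training set (it adds bias) in exchange for weights that vary less from one noisy sample to the next, and test error reflects both terms. The simulation below is only a sketch on made-up data: it keeps a small collinear design fixed, redraws the noise many times, and summarises the spread of the estimates. `ridge_demo`, `w_star`, and `summary` are hypothetical names.

```python
import numpy as np

def ridge_demo(X, y, lam):
    # Same closed form as in Task 3.1.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng_demo = np.random.default_rng(6)
n_demo, reps = 50, 500
x0 = rng_demo.standard_normal(n_demo)
X_demo = np.column_stack([x0, 0.95 * x0 + 0.05 * rng_demo.standard_normal(n_demo)])
w_star = np.array([1.0, 1.0])  # assumed true weights for the simulation

estimates = {0.0: [], 1.0: []}
for _ in range(reps):
    # Fixed design, fresh noise on every repetition.
    y_rep = X_demo @ w_star + 0.5 * rng_demo.standard_normal(n_demo)
    for lam in estimates:
        estimates[lam].append(ridge_demo(X_demo, y_rep, lam))

summary = {}
for lam, ws in estimates.items():
    ws = np.asarray(ws)
    bias_sq = float(np.sum((ws.mean(axis=0) - w_star) ** 2))
    variance = float(np.sum(ws.var(axis=0)))
    summary[lam] = (bias_sq, variance)
    print(f'lam={lam}: bias^2={bias_sq:.4f}  variance={variance:.4f}')
```

When the variance reduction outweighs the added bias, test error can drop even though train error rises; whether that happens on a given split depends on the noise level, the sample size, and λ.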

In [23]:
# TODO
idx = rng.permutation(n)
train = idx[: int(0.7*n)]
test = idx[int(0.7*n):]
Xtr, ytr = X[train], y[train]
Xte, yte = X[test], y[test]


lams = [0.0, 0.1, 1.0, 10.0]
results = []
for lam in lams:
    w = ridge_solve(Xtr, ytr, lam=lam)  # lam = 0.0 is plain OLS
    tr_mse = np.mean((Xtr@w - ytr)**2)
    te_mse = np.mean((Xte@w - yte)**2)
    results.append((lam, tr_mse, te_mse))
print('lam, train_mse, test_mse')
for r in results:
    print(r)

lam, train_mse, test_mse
(0.0, 0.22434062830819776, 0.22015868447129303)
(0.1, 0.22436070870186797, 0.22054748124275209)
(1.0, 0.22456604110932607, 0.22130065084795897)
(10.0, 0.22589656843495562, 0.218772678805243)


## Section 4 — GLM Intuition

### Task 4.1: Match tasks to (distribution, link)
Fill in a table for:
- regression
- binary classification
- count prediction

**Explain:** what changes when you go from OLS to a GLM?

| Problem | Target type | Distribution | Link | Loss |
|---|---|---|---|---|
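| House price | continuous | Gaussian | identity | squared error (MSE) |
| Fraud | binary | Bernoulli | logit | log loss (binary cross-entropy) |
| Clicks per user | count | Poisson | log | Poisson negative log-likelihood |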
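
The sketch below uses the standard pairings (Gaussian with identity link, Bernoulli with logit link, Poisson with log link) to illustrate what changes when moving from OLS to a GLM: the linear predictor `X @ w` stays, while the inverse link and the negative log-likelihood are swapped. With these canonical links the gradient keeps the same form, `X^T (mu - y) / n`. `GLM_FAMILIES` and `glm_loss_and_grad` are hypothetical names for this example, the losses drop terms that do not depend on `w`, and the data are random.

```python
import numpy as np

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

# family -> (inverse link, per-sample negative log-likelihood up to constants)
GLM_FAMILIES = {
    'gaussian': (lambda eta: eta, lambda y, eta: 0.5 * (y - eta) ** 2),
    'bernoulli': (sigmoid, lambda y, eta: np.logaddexp(0.0, eta) - y * eta),
    'poisson': (np.exp, lambda y, eta: np.exp(eta) - y * eta),
}

def glm_loss_and_grad(X, y, w, family):
    inv_link, nll = GLM_FAMILIES[family]
    eta = X @ w                                   # linear predictor
    loss = float(np.mean(nll(y, eta)))
    grad = X.T @ (inv_link(eta) - y) / X.shape[0]  # same form for all three
    return loss, grad

rng_demo = np.random.default_rng(7)
X_demo = rng_demo.standard_normal((100, 3))
w_demo = 0.3 * rng_demo.standard_normal(3)
targets = {
    'gaussian': rng_demo.standard_normal(100),
    'bernoulli': rng_demo.integers(0, 2, size=100).astype(float),
    'poisson': rng_demo.poisson(2.0, size=100).astype(float),
}

# Check each analytic gradient against central finite differences.
max_err = {}
for family, y_demo in targets.items():
    _, g = glm_loss_and_grad(X_demo, y_demo, w_demo, family)
    g_num = np.zeros(3)
    for j in range(3):
        e = np.zeros(3)
        e[j] = 1e-6
        up = glm_loss_and_grad(X_demo, y_demo, w_demo + e, family)[0]
        down = glm_loss_and_grad(X_demo, y_demo, w_demo - e, family)[0]
        g_num[j] = (up - down) / 2e-6
    max_err[family] = float(np.max(np.abs(g - g_num)))
print(max_err)
```

The Gaussian entry uses half the squared error, so its gradient differs from the `2/n` form in Section 2 only by a constant factor.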


---
## Submission Checklist
- All TODOs completed
- Train/test results shown for ridge
- Short answers to checkpoint questions
