# Part A — Linear & Polynomial Regression (Analytic)

We will explore **linear regression** and **polynomial regression** using a synthetic dataset (`synthetic_regression.csv` with columns `x, y`). All solutions must use **analytic (closed-form)** formulas — **no gradient descent, no library `.fit()`** methods. Implement everything directly in **NumPy**.

## Tasks
1. **70/30 Train–Test Split (Unregularized)**
   - Split the data into 70% train / 30% test (random, reproducible).
   - Fit the following models:
     - Linear regression (polynomial degree 1),
     - Polynomial regression with degrees {2, 5, 10, 15}.
   - For each model, build the design matrix explicitly:  
     $\Phi(x) = [1, x, x^2, \dots, x^d]$.
   - Solve using the normal equations:  
     $\hat\theta = (\Phi^\top\Phi)^{-1}\Phi^\top y$ (or use the pseudoinverse if $\Phi^\top\Phi$ is singular).
   - Compute **training error** and **test error**.
   - Plot (a) the dataset points with all model fits on one figure, and (b) a **bar chart** of training vs test errors.

2. **10-Fold Cross-Validation (Unregularized)**
   - Implement 10-fold CV yourself (shuffle indices once, split into folds).
   - For each degree {1, 2, 5, 10, 15}, compute the **average test error** across folds.
   - Plot a **bar chart** comparing the average test error across all models.

3. **Repeat (1) and (2) with Ridge Regularization**
   - Use ridge regression with: $\hat\theta_\lambda = (\Phi^\top\Phi + \lambda I)^{-1}\Phi^\top y$.
   - **Take $\lambda = 1$** (fixed).
   - Show the same plots: fitted curves, bar chart of train/test errors, and bar chart of 10-fold average test errors.
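
To make the formulas in the tasks above concrete, here is a minimal sketch of the closed-form fits and a 10-fold CV loop on toy data. The helper names (`poly_design`, `fit_closed_form`, `cv_error`) and the noise-free quadratic are assumptions for illustration only; they are not part of the provided skeleton, and the numbers say nothing about `synthetic_regression.csv`.

```python
import numpy as np

def poly_design(x, degree):
    """Columns 1, x, ..., x^degree for a 1-D array x (hypothetical helper)."""
    return np.vander(x, degree + 1, increasing=True)

def fit_closed_form(Phi, y, lam=0.0):
    """Solve (Phi^T Phi + lam * I) theta = Phi^T y; lam = 0 is plain least squares."""
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

def cv_error(x, y, degree, lam=0.0, K=10, seed=0):
    """Average held-out mean squared error over K folds (indices shuffled once)."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, K)
    errs = []
    for k in range(K):
        val = folds[k]
        trn = np.concatenate([folds[i] for i in range(K) if i != k])
        theta = fit_closed_form(poly_design(x[trn], degree), y[trn], lam)
        resid = y[val] - poly_design(x[val], degree) @ theta
        errs.append(np.mean(resid ** 2))
    return float(np.mean(errs))

# Toy data (assumed for illustration): noise-free quadratic y = 1 + 2x - 3x^2
x = np.linspace(-1.0, 1.0, 40)
y = 1.0 + 2.0 * x - 3.0 * x ** 2

theta_ols = fit_closed_form(poly_design(x, 2), y)             # unregularized
theta_ridge = fit_closed_form(poly_design(x, 2), y, lam=1.0)  # ridge, lambda = 1
cv_deg1 = cv_error(x, y, degree=1)
cv_deg2 = cv_error(x, y, degree=2)
print(theta_ols, theta_ridge, cv_deg1, cv_deg2)
```

On this toy data the degree-2 fit recovers the generating coefficients and the ridge solution has a smaller norm. Note that `np.linalg.solve` raises an error when the matrix is exactly singular, so with `lam = 0` the pseudoinverse remains the fallback described in Task 1.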

### Notes
- Label the metric **"error"** rather than "MSE" in your plots and printed output.
- If any bar chart scale makes some bars invisible, **use a logarithmic y-axis**: `plt.yscale("log")`.
- Keep your code structured and use the provided skeleton below.
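
As one possible layout for the train/test bar chart with a logarithmic y-axis, consider the sketch below. The function name `plot_error_bars` is hypothetical and the error values are placeholders chosen only to show why a log scale helps; they are not results from the assignment dataset.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_error_bars(degrees, train_err, test_err, log_scale=True):
    """Grouped bar chart of train vs test error per polynomial degree."""
    pos = np.arange(len(degrees))
    width = 0.4
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(pos - width / 2, train_err, width, label="train error")
    ax.bar(pos + width / 2, test_err, width, label="test error")
    ax.set_xticks(pos)
    ax.set_xticklabels([str(d) for d in degrees])
    ax.set_xlabel("polynomial degree")
    ax.set_ylabel("error")
    if log_scale:
        # Keeps small bars visible when one bar is orders of magnitude larger
        ax.set_yscale("log")
    ax.legend()
    return fig, ax

# Placeholder numbers only, NOT results from the assignment dataset
degrees = [1, 2, 5, 10, 15]
train_err = [2.0, 0.5, 0.1, 0.05, 0.01]
test_err = [2.1, 0.6, 0.2, 3.0, 400.0]
fig, ax = plot_error_bars(degrees, train_err, test_err)
```

`ax.set_yscale("log")` is the axes-level equivalent of `plt.yscale("log")`; in a script, call `plt.show()` afterwards to display the figure.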


## Starter Skeleton (fill the TODOs)
Update the CSV path to where you saved `synthetic_regression.csv`.


In [None]:
import numpy as np
import csv
import matplotlib.pyplot as plt

# ---------- Utilities ----------
def design_matrix_poly_1d(x_column: np.ndarray, degree: int, include_bias: bool=True) -> np.ndarray:
    """Return Vandermonde-style design matrix [1, x, x^2, ..., x^degree]."""
    x = x_column.reshape(-1)
    start = 0 if include_bias else 1
    cols = [x**p for p in range(start, degree+1)]
    Phi = np.stack(cols, axis=1)
    return Phi

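def normal_equation(Phi: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Closed-form least squares: theta = (Phi^T Phi)^{-1} Phi^T y (or pseudoinverse)."""
    A = Phi.T @ Phi
    b = Phi.T @ y
    # Full-rank Gram matrix: solve A theta = b directly. This is the same closed-form
    # solution as inv(A) @ b, but numerically more stable.
    if np.linalg.matrix_rank(A) == A.shape[0]:
        return np.linalg.solve(A, b)
    # Rank-deficient case: fall back to the minimum-norm pseudoinverse solution.
    return np.linalg.pinv(Phi) @ y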

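def ridge_closed_form(Phi: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge: theta = (Phi^T Phi + λI)^{-1} Phi^T y."""
    d = Phi.shape[1]
    # As in the formula, λ is applied to every coefficient, including the bias term.
    A = Phi.T @ Phi + lam * np.eye(d)
    b = Phi.T @ y
    # Solve A theta = b rather than forming the inverse explicitly.
    return np.linalg.solve(A, b)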

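def predict(Phi: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Return model predictions Phi @ theta."""
    return Phi @ theta

def err(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared residual, reported as "error" in plots and prints."""
    r = np.ravel(y_true) - np.ravel(y_pred)
    return float((r @ r) / r.size)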

def kfold_indices(n: int, K: int, seed: int = 0):
    """Shuffle indices 0..n-1 once and return K (train_idx, val_idx) pairs."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n)
    rng.shuffle(idx)
    folds = np.array_split(idx, K)
    splits = []
    for k in range(K):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[i] for i in range(K) if i != k])
        splits.append((train_idx, val_idx))
    return splits

def train_test_split_indices(n: int, test_ratio: float = 0.3, seed: int = 42):
    """Return reproducible (train_idx, test_idx) from one seeded shuffle of 0..n-1."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n)
    rng.shuffle(idx)
    n_test = int(round(test_ratio * n))
    test_idx = idx[:n_test]
    train_idx = idx[n_test:]
    return train_idx, test_idx

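def load_csv_xy(path: str):
    """Read columns x and y from a CSV file; return X with shape (n, 1) and y with shape (n,)."""
    xs, ys = [], []
    # newline="" is the file mode recommended by the csv module documentation
    with open(path, "r", newline="") as f: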
        rd = csv.DictReader(f)
        for row in rd:
            xs.append(float(row["x"]))
            ys.append(float(row["y"]))
    X = np.array(xs).reshape(-1, 1)
    y = np.array(ys)
    return X, y
