# Lesson 03 - Locally Weighted Regression + Probabilistic View


## Objectives
- Implement locally weighted regression (LWR).
- Connect linear regression to a Gaussian noise model.
- Visualize how bandwidth controls bias/variance.


## From the notes

**Probabilistic view**
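- Assume $y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}$, with the $\epsilon^{(i)}$ independent and $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$.
- Maximizing the likelihood over $\theta$ is then equivalent to minimizing the least-squares cost $\frac{1}{2} \sum_{i=1}^m (\theta^T x^{(i)} - y^{(i)})^2$, and the maximizer does not depend on $\sigma^2$.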
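
The equivalence can be checked numerically without doing the derivation. The sketch below is illustrative only: the synthetic data, `theta_true`, `sigma`, and the helper names are assumptions made for this example, not part of the notes. It evaluates the Gaussian negative log-likelihood at a few perturbed parameter vectors and compares it with the least-squares cost, which should differ only by a constant offset and a factor of $1/\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data drawn from the assumed model (theta_true and sigma are illustrative).
m, sigma = 200, 0.5
X = np.c_[np.ones(m), rng.uniform(-3, 3, size=m)]
theta_true = np.array([1.0, -2.0])
y = X @ theta_true + sigma * rng.normal(size=m)

def neg_log_likelihood(theta):
    """Negative log-likelihood of y under y = X theta + N(0, sigma^2) noise."""
    r = y - X @ theta
    return 0.5 * m * np.log(2 * np.pi * sigma**2) + 0.5 * np.sum(r**2) / sigma**2

def least_squares_cost(theta):
    """Unweighted least-squares cost with the usual factor of 1/2."""
    return 0.5 * np.sum((X @ theta - y) ** 2)

# Least-squares solution of the unweighted problem.
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# The part of the negative log-likelihood that does not depend on theta.
const = 0.5 * m * np.log(2 * np.pi * sigma**2)

# Compare the two objectives at a few perturbed parameter vectors.
trial_thetas = theta_ls + rng.normal(scale=0.5, size=(5, 2))
for theta in trial_thetas:
    nll = neg_log_likelihood(theta)
    same_up_to_affine_map = np.isclose(nll, const + least_squares_cost(theta) / sigma**2)
    not_better_than_ls = nll >= neg_log_likelihood(theta_ls)
    print(same_up_to_affine_map, not_better_than_ls)
```

Because the two objectives are related by a positive scaling and a shift, they share the same minimizer, whatever value of `sigma` is used.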

**LWR**
- Weighted cost: $J(\theta) = \frac{1}{2} \sum_{i=1}^m w^{(i)}(x) (\theta^T x^{(i)} - y^{(i)})^2$.
- Weight: $w^{(i)}(x) = \exp(-\|x^{(i)} - x\|^2 / (2\tau^2))$, where $x$ is the query point and $\tau$ is the bandwidth.
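
For a fixed query point, the weighted cost has a closed-form minimizer, which is what the `lwr_predict` function in the Data section evaluates (using a pseudo-inverse). As a short sketch, the notation here is introduced for illustration: $X$ is the design matrix with rows $x^{(i)T}$, $\vec{y}$ is the vector of targets, and $W = \mathrm{diag}(w^{(1)}(x), \dots, w^{(m)}(x))$. Assuming $X^T W X$ is invertible:

$$
J(\theta) = \frac{1}{2} (X\theta - \vec{y})^T W (X\theta - \vec{y}),
\qquad
\nabla_\theta J(\theta) = X^T W (X\theta - \vec{y}) = 0
\;\;\Longrightarrow\;\;
\theta = (X^T W X)^{-1} X^T W \vec{y}.
$$

The prediction at the query is then $\theta^T x$. Since $W$ depends on $x$, this solve is repeated for every query point.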

_TODO: Validate the equations against the official CS229 main notes PDF._


## Intuition
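LWR fits a separate linear model at each query point, weighting every training example by its distance to the query. A small bandwidth τ lets only the closest points matter, so the fit is flexible but noisy (low bias, high variance). A large τ weights all points almost equally, so the fit is smooth but can underfit (high bias, low variance), and it approaches ordinary linear regression as τ grows.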
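
To make the role of τ concrete, the sketch below prints the sum of the kernel weights at one query point (a rough count of how many training points influence the local fit) and compares a very wide bandwidth with an ordinary least-squares fit. The data, the query location `x_query = 0.5`, and the helper `kernel_weights` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1D data: noisy sine on an evenly spaced grid.
m = 60
x = np.linspace(-3, 3, m)
y = np.sin(x) + 0.3 * rng.normal(size=m)
X = np.c_[np.ones(m), x]

def kernel_weights(x_query, x, tau):
    """Gaussian weights of every training input relative to the query."""
    return np.exp(-((x - x_query) ** 2) / (2 * tau**2))

def lwr_predict(x_query, X, y, tau):
    """Solve the weighted normal equations at one query point."""
    w = kernel_weights(x_query, X[:, 1], tau)
    XtW = X.T * w  # same as X.T @ diag(w)
    theta = np.linalg.pinv(XtW @ X) @ XtW @ y
    return np.array([1.0, x_query]) @ theta

x_query = 0.5

# Larger tau -> more points carry non-negligible weight.
for tau in (0.1, 0.3, 1.0, 1000.0):
    weight_sum = kernel_weights(x_query, x, tau).sum()
    pred = lwr_predict(x_query, X, y, tau)
    print(f"tau={tau:g}: sum of weights = {weight_sum:.1f}, prediction = {pred:.3f}")

# With a very wide kernel all weights are close to 1, so LWR should match OLS.
theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
pred_ols = np.array([1.0, x_query]) @ theta_ols
pred_wide = lwr_predict(x_query, X, y, tau=1000.0)
print(f"OLS prediction = {pred_ols:.3f}, LWR (tau=1000) prediction = {pred_wide:.3f}")
```

The sum of weights grows with τ because each individual weight is increasing in τ; at τ = 1000 every weight is essentially 1 on this input range.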


## Data
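We create a nonlinear 1D dataset (60 evenly spaced points on $[-3, 3]$ with $y = \sin(x)$ plus Gaussian noise of standard deviation 0.3) so LWR can capture curvature that a global line cannot.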


In [None]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

# Nonlinear data
m = 60
x = np.linspace(-3, 3, m)
y = np.sin(x) + 0.3 * np.random.normal(size=m)
X = np.c_[np.ones(m), x]

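def lwr_predict(x_query, X, y, tau):
    """Predict y at a scalar x_query with a locally weighted linear fit."""
    # Gaussian kernel weights: training points near x_query count the most.
    weights = np.exp(-((X[:, 1] - x_query) ** 2) / (2 * tau**2))
    # X.T * weights equals X.T @ diag(weights) without building an m-by-m matrix.
    XtW = X.T * weights
    # Weighted normal equations; pinv tolerates a near-singular X^T W X at small tau.
    theta = np.linalg.pinv(XtW @ X) @ XtW @ y
    return np.array([1.0, x_query]) @ theta

def lwr_curve(X, y, tau):
    """Evaluate the LWR fit at every training input."""
    return np.array([lwr_predict(xq, X, y, tau) for xq in X[:, 1]])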

preds_tau_small = lwr_curve(X, y, tau=0.3)
preds_tau_large = lwr_curve(X, y, tau=1.0)


## Experiments


In [None]:
# Compare training MSE for the two bandwidths. These errors are measured on the
# same points used for fitting, so they do not estimate test error.
mse_small = np.mean((preds_tau_small - y) ** 2)
mse_large = np.mean((preds_tau_large - y) ** 2)
mse_small, mse_large
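
Training error by itself tends to favor the smallest bandwidth, because a very local fit can follow the noise. One way to see the bias/variance trade-off more clearly is to also score each τ on fresh data from the same process. The following is a minimal sketch under assumptions: the held-out sample, the uniform sampling of inputs, the grid of τ values, and the helper `make_data` are all choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(m):
    """Draw m noisy samples of sin(x) on [-3, 3]; returns design matrix and targets."""
    x = rng.uniform(-3, 3, size=m)
    y = np.sin(x) + 0.3 * rng.normal(size=m)
    return np.c_[np.ones(m), x], y

def lwr_predict(x_query, X, y, tau):
    """Locally weighted linear prediction at one scalar query point."""
    w = np.exp(-((X[:, 1] - x_query) ** 2) / (2 * tau**2))
    XtW = X.T * w  # same as X.T @ diag(w)
    theta = np.linalg.pinv(XtW @ X) @ XtW @ y
    return np.array([1.0, x_query]) @ theta

X_train, y_train = make_data(60)
X_test, y_test = make_data(200)  # held-out points, never used for fitting

results = {}
for tau in (0.1, 0.3, 1.0, 5.0):
    # Both curves are fitted on the training set; only the query points differ.
    fit_train = np.array([lwr_predict(xq, X_train, y_train, tau) for xq in X_train[:, 1]])
    fit_test = np.array([lwr_predict(xq, X_train, y_train, tau) for xq in X_test[:, 1]])
    train_mse = np.mean((fit_train - y_train) ** 2)
    test_mse = np.mean((fit_test - y_test) ** 2)
    results[tau] = (train_mse, test_mse)
    print(f"tau={tau:g}: train MSE = {train_mse:.3f}, held-out MSE = {test_mse:.3f}")
```

A τ whose training MSE is much lower than its held-out MSE is likely overfitting, while a τ with high error on both is likely underfitting.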


## Visualizations


In [None]:
plt.figure(figsize=(6,4))
plt.scatter(x, y, alpha=0.6, label="data")
plt.plot(x, preds_tau_small, label="tau=0.3")
plt.plot(x, preds_tau_large, label="tau=1.0")
plt.title("Locally weighted regression")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()

plt.figure(figsize=(6,4))
plt.plot(x, np.sin(x), label="true function")
plt.title("Underlying nonlinear signal")
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.legend()
plt.show()


## Takeaways
- LWR fits a new local linear model for every query point; the bandwidth τ sets how local each fit is and therefore controls the bias/variance trade-off.
- Least squares can be interpreted as maximum likelihood with Gaussian noise.


## Explain it in an interview
- Explain how LWR differs from global linear regression.
- Describe the bias/variance effect of changing τ.


## Exercises
- Try τ values that make the fit clearly underfit and overfit.
- Derive the likelihood for linear regression with Gaussian noise.
- Extend LWR to higher-dimensional inputs.
