# Lesson 01: Linear Regression Basics + Normal Equation

## Objectives
- Revisit the supervised learning setup and CS229 notation.
- Derive the least-squares objective for linear regression.
- Solve linear regression with the normal equation.
- Visualize residuals and the cost surface.

## From the notes: notation + objective
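We use the CS229 notation: training data \(\{(x^{(i)}, y^{(i)})\}_{i=1}^m\), feature dimension \(n\), parameters \(\theta\), and hypothesis \(h_\theta(x) = \theta^T x\) with \(x_0 = 1\). Stacking the inputs \((x^{(i)})^T\) as rows gives the design matrix \(X \in \mathbb{R}^{m \times (n+1)}\), and the targets form the vector \(y \in \mathbb{R}^m\).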

Least-squares objective:
\[
J(\theta) = \frac{1}{2m} \sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2.
\]
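The sum can also be written in matrix form as \(J(\theta) = \frac{1}{2m}\lVert X\theta - y\rVert^2\), where \(X\) stacks the \((x^{(i)})^T\) as rows. The minimal sketch below assumes NumPy and a design matrix that already contains the \(x_0 = 1\) column. It computes the cost both ways to show they agree. The names `cost_loop` and `cost_vectorized` are illustrative and do not come from the lesson.

```python
import numpy as np

def cost_loop(X, y, theta):
    """J(theta) = 1/(2m) * sum_i (theta^T x^(i) - y^(i))^2, one example at a time."""
    m = len(y)
    total = 0.0
    for i in range(m):
        total += (X[i] @ theta - y[i]) ** 2
    return total / (2 * m)

def cost_vectorized(X, y, theta):
    """Same objective in matrix form: ||X theta - y||^2 / (2m)."""
    r = X @ theta - y
    return (r @ r) / (2 * len(y))

# Tiny hypothetical example: bias column x_0 = 1 plus one feature
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 6.0])
theta = np.array([1.0, 2.0])

# Residuals are (0, 0, -1), so both give J = 1 / (2 * 3) = 1/6
print(cost_loop(X, y, theta), cost_vectorized(X, y, theta))
```

The vectorized form is the one to use in practice. It is also the form the normal equation is derived from.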
Normal equation solution:
\[
\theta = (X^T X)^{-1} X^T y.
\]
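As a small worked example, take three hypothetical noise-free points on \(y = 1 + 2x\): \(x \in \{0, 1, 2\}\) and \(y = (1, 3, 5)\). With the bias column, the normal equation reduces to a \(2 \times 2\) system:
\[
X = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}, \quad
X^T X = \begin{pmatrix} 3 & 3 \\ 3 & 5 \end{pmatrix}, \quad
X^T y = \begin{pmatrix} 9 \\ 13 \end{pmatrix},
\]
\[
\theta = (X^T X)^{-1} X^T y = \frac{1}{6}\begin{pmatrix} 5 & -3 \\ -3 & 3 \end{pmatrix}\begin{pmatrix} 9 \\ 13 \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 6 \\ 12 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}.
\]
This recovers the intercept 1 and slope 2 exactly, because the points have no noise.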

## Intuition
Linear regression finds the line (or hyperplane) that minimizes squared error. The normal equation gives the closed-form minimizer when \(X^T X\) is invertible.
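The closed form comes from setting the gradient of \(J\) to zero. In matrix form, with design matrix \(X\) (rows \((x^{(i)})^T\)) and target vector \(y\):
\[
J(\theta) = \frac{1}{2m}(X\theta - y)^T (X\theta - y), \qquad
\nabla_\theta J(\theta) = \frac{1}{m} X^T (X\theta - y).
\]
Setting \(\nabla_\theta J(\theta) = 0\) gives the normal equations
\[
X^T X \theta = X^T y,
\]
so \(\theta = (X^T X)^{-1} X^T y\) whenever \(X^T X\) is invertible. The Hessian \(\frac{1}{m} X^T X\) is positive semidefinite, so \(J\) is convex and this stationary point is a global minimum.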

## Data
We create a synthetic 1D dataset with noise so we can visualize the fitted line and residuals.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)  # fix the seed so the synthetic data is reproducible
```

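```python
# Synthetic data: y = 3.5 x + 2 plus Gaussian noise (std 2.0)
m = 60
X_raw = np.linspace(0, 10, m)
y = 3.5 * X_raw + 2.0 + np.random.normal(0, 2.0, size=m)

# Add bias term: prepend a column of ones (x_0 = 1), so X has shape (m, 2)
X = np.c_[np.ones(m), X_raw]
```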

## Implementation: normal equation

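```python
# Normal equation: theta = (X^T X)^{-1} X^T y
XtX = X.T @ X
# pinv (Moore-Penrose pseudo-inverse) equals the inverse when X^T X is invertible,
# and still returns the minimum-norm least-squares solution when it is singular
theta = np.linalg.pinv(XtX) @ X.T @ y

def predict(X, theta):
    """Linear hypothesis h_theta(x) = theta^T x, applied to every row of X."""
    return X @ theta

preds = predict(X, theta)
```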
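As a sanity check, the normal-equation estimate should match NumPy's least-squares solver `np.linalg.lstsq`. That solver works from \(X\) directly rather than forming \(X^T X\). The sketch regenerates the data the same way as the Data cell so it runs on its own. It solves the normal equations with `np.linalg.solve` instead of the lesson's `pinv`; this swap is a choice made for this sketch.

```python
import numpy as np

# Regenerate synthetic data matching the Data cell (same seed and noise level)
np.random.seed(42)
m = 60
X_raw = np.linspace(0, 10, m)
y = 3.5 * X_raw + 2.0 + np.random.normal(0, 2.0, size=m)
X = np.c_[np.ones(m), X_raw]

# Normal equation, solved as a linear system instead of forming an explicit inverse
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares solver (operates on X directly)
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("normal equation:", theta_normal)
print("lstsq:          ", theta_lstsq)
print("match:", np.allclose(theta_normal, theta_lstsq))
```

Solving the system avoids computing an explicit inverse, and so does `lstsq`. Both are generally more accurate than `inv` when \(X^T X\) is ill-conditioned.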

## Experiments
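We compute the residuals \(y^{(i)} - h_\theta(x^{(i)})\) and their mean squared error (MSE). Under the \(\frac{1}{2m}\) convention above, the MSE equals \(2J(\theta)\).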

```python
residuals = y - preds          # r_i = y_i - h_theta(x_i)
mse = np.mean(residuals**2)    # mean squared error
print(f"MSE: {mse:.3f}")
```
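At the least-squares solution, the residual vector is orthogonal to every column of \(X\): \(X^T(y - X\theta) = 0\). This is just the normal equations rearranged. The first column is all ones, so the residuals also sum to zero. The self-contained sketch below checks both facts on data regenerated with the lesson's seed.

```python
import numpy as np

np.random.seed(42)
m = 60
X_raw = np.linspace(0, 10, m)
y = 3.5 * X_raw + 2.0 + np.random.normal(0, 2.0, size=m)
X = np.c_[np.ones(m), X_raw]

theta = np.linalg.pinv(X.T @ X) @ X.T @ y
residuals = y - X @ theta

# X^T r should be numerically zero: residuals are orthogonal to each column of X
print("X^T r:", X.T @ residuals)
# The bias column of ones makes the first entry of X^T r equal to sum(residuals)
print("sum of residuals:", residuals.sum())
```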

## Visualizations

```python
plt.figure(figsize=(6,4))
plt.scatter(X_raw, y, label="data", alpha=0.7)
plt.plot(X_raw, preds, color="C1", label="normal equation fit")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Linear regression fit")
plt.legend()
plt.show()

plt.figure(figsize=(6,4))
plt.scatter(X_raw, residuals)
plt.axhline(0, color="black", linewidth=1)
plt.xlabel("x")
plt.ylabel("residual")
plt.title("Residuals")
plt.show()
```
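The cost surface listed in the objectives can be sketched by evaluating \(J(\theta_0, \theta_1)\) on a grid around the fitted parameters. This is a minimal sketch, assuming the same synthetic data. The grid ranges and the helper names `cost_grid` and `plot_cost_surface` are illustrative choices. Since \(J\) is a convex quadratic, its contours are ellipses centered on the normal-equation solution.

```python
import numpy as np

def cost_grid(X, y, theta0_vals, theta1_vals):
    """Evaluate J(theta) = ||X theta - y||^2 / (2m) on a (theta0, theta1) grid."""
    m = len(y)
    T0, T1 = np.meshgrid(theta0_vals, theta1_vals)        # shape (len(theta1), len(theta0))
    thetas = np.stack([T0.ravel(), T1.ravel()], axis=1)    # (k, 2), one theta per row
    r = X @ thetas.T - y[:, None]                          # (m, k) residuals for each theta
    J = (r**2).sum(axis=0) / (2 * m)
    return T0, T1, J.reshape(T0.shape)

def plot_cost_surface(T0, T1, J, theta):
    # Imported here so cost_grid itself only needs NumPy
    import matplotlib.pyplot as plt
    plt.figure(figsize=(6, 4))
    cs = plt.contour(T0, T1, J, levels=30)
    plt.clabel(cs, inline=True, fontsize=7)
    plt.plot(theta[0], theta[1], "rx", label="normal equation solution")
    plt.xlabel(r"$\theta_0$ (intercept)")
    plt.ylabel(r"$\theta_1$ (slope)")
    plt.title(r"Cost surface $J(\theta)$")
    plt.legend()
    plt.show()

# Same synthetic data as the Data cell
np.random.seed(42)
m = 60
X_raw = np.linspace(0, 10, m)
y = 3.5 * X_raw + 2.0 + np.random.normal(0, 2.0, size=m)
X = np.c_[np.ones(m), X_raw]
theta = np.linalg.pinv(X.T @ X) @ X.T @ y

# Grid centered on theta: +-5 in the intercept, +-1 in the slope
T0, T1, J = cost_grid(X, y,
                      np.linspace(theta[0] - 5, theta[0] + 5, 101),
                      np.linspace(theta[1] - 1, theta[1] + 1, 101))

# The grid minimum should land on the center point, i.e. the normal-equation theta
i, j = np.unravel_index(np.argmin(J), J.shape)
print("grid argmin:", T0[i, j], T1[i, j], " theta:", theta)
```

Call `plot_cost_surface(T0, T1, J, theta)` to draw the contours with the fitted \(\theta\) marked. The ellipses are elongated because the slope and intercept directions have very different curvature when \(x\) is unscaled. This is the conditioning issue raised in the exercises.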

## Takeaways
- Least squares gives a convex objective with a closed-form solution.
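- The normal equation is fast for small \(n\). Forming \(X^T X\) costs \(O(mn^2)\) and solving the resulting \((n+1) \times (n+1)\) system costs \(O(n^3)\), which becomes expensive for large feature sets.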
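For contrast with the closed form, here is a minimal batch gradient-descent sketch on the same objective. It uses the gradient \(\nabla_\theta J = \frac{1}{m} X^T(X\theta - y)\). The learning rate `alpha=0.05` and `num_iters=5000` are assumptions tuned by hand for this small unscaled 1D dataset, not general defaults. It avoids the \((n+1) \times (n+1)\) solve but needs many passes over the data.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.05, num_iters=5000):
    """Batch gradient descent on J(theta) = ||X theta - y||^2 / (2m)."""
    m, d = X.shape
    theta = np.zeros(d)
    for _ in range(num_iters):
        grad = X.T @ (X @ theta - y) / m   # gradient of J at the current theta
        theta -= alpha * grad
    return theta

# Same synthetic data as the Data cell
np.random.seed(42)
m = 60
X_raw = np.linspace(0, 10, m)
y = 3.5 * X_raw + 2.0 + np.random.normal(0, 2.0, size=m)
X = np.c_[np.ones(m), X_raw]

theta_gd = gradient_descent(X, y)
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)
print("gradient descent:", theta_gd)
print("normal equation: ", theta_ne)
```

The step size must stay below \(2/\lambda_{\max}\), where \(\lambda_{\max}\) is the largest eigenvalue of \(\frac{1}{m} X^T X\); larger steps diverge. Feature scaling shrinks the spread of eigenvalues and lets gradient descent converge in fewer iterations.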

## Explain it in an interview
- Frame linear regression as supervised learning: fit \(\theta\) to minimize squared error.
- Mention the normal equation and when it is preferable to iterative methods.

## Exercises
1. What happens to the normal equation when \(X^T X\) is singular?
2. Add a quadratic feature and compare the fit.
3. Implement feature scaling and check the effect on the conditioning of \(X^T X\).