# Lesson 01 - Intro, Supervised Learning, and Notation


## Objectives
- Set up CS229 notation and the supervised learning workflow.
- Connect data, hypothesis space, and loss minimization.
- Build a tiny linear model from scratch to establish the pattern used later.


## From the notes

**Notation (CS229)**
- $m$ = number of training examples, $n$ = number of features.
- $x^{(i)} \in \mathbb{R}^{n+1}$ is the $i$-th input, with the intercept term $x_0^{(i)} = 1$ prepended; $y^{(i)}$ is its target.
- Hypothesis: $h_\theta(x) = \theta^T x$.
- Cost: $J(\theta) = \frac{1}{2m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2$.
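The definitions above translate almost line for line into NumPy. The following is a minimal sketch, not the course's reference code: the function names `hypothesis` and `cost` and the three-point toy dataset are assumptions made up for illustration.

```python
import numpy as np

def hypothesis(theta, X):
    """Linear hypothesis h_theta(x) = theta^T x, evaluated for every row of X."""
    return X @ theta

def cost(theta, X, y):
    """Squared-error cost J(theta) = 1/(2m) * sum_i (h_theta(x_i) - y_i)^2."""
    m = X.shape[0]
    residuals = hypothesis(theta, X) - y
    return (residuals @ residuals) / (2 * m)

# Toy data (hypothetical): m = 3 examples, n = 1 feature, generated by y = 2x.
x_raw = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
X = np.c_[np.ones_like(x_raw), x_raw]  # prepend x_0 = 1, giving shape (m, n + 1)

cost_exact = cost(np.array([0.0, 2.0]), X, y)  # parameters that fit the data exactly
cost_zero = cost(np.array([0.0, 0.0]), X, y)   # all-zero parameters
print(cost_exact, cost_zero)
```

The exact-fit parameters give a cost of zero, while the all-zero parameters give $(4 + 16 + 36) / (2 \cdot 3) = 56/6$.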

**Learning loop**
1. Choose hypothesis class $\mathcal{H}$.
2. Define objective $J(\theta)$.
3. Optimize to obtain $\theta^*$.

_TODO: Validate these definitions against the official CS229 main notes PDF once available._
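The three steps of the learning loop can be sketched with batch gradient descent as the optimizer. This is one possible choice for step 3, not necessarily the one the notes use first. The function name `gradient_descent`, the step size `alpha`, the iteration count, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.02, num_iters=5000):
    """Minimize J(theta) = 1/(2m) * ||X theta - y||^2 by batch gradient descent."""
    m, d = X.shape
    theta = np.zeros(d)
    for _ in range(num_iters):
        grad = X.T @ (X @ theta - y) / m  # gradient of J at the current theta
        theta -= alpha * grad             # step against the gradient
    return theta

# Step 1: hypothesis class = lines h_theta(x) = theta_0 + theta_1 * x.
rng = np.random.default_rng(0)
x_raw = np.linspace(0, 10, 60)
y = 2.5 * x_raw + 3.0 + rng.normal(scale=2.0, size=x_raw.size)
X = np.c_[np.ones_like(x_raw), x_raw]

# Steps 2 and 3: the squared-error objective is minimized iteratively.
theta_gd = gradient_descent(X, y)

# Reference solution from NumPy's least-squares solver, for comparison only.
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_gd, theta_ls)
```

With this step size the iterates should agree with the least-squares solution to several decimal places. A much larger `alpha` would make the updates diverge on this data, which is why the step size is a tuning choice rather than a fixed constant.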


## Intuition
Supervised learning turns labeled examples $(x^{(i)}, y^{(i)})$ into a function that predicts well on inputs it has not seen. Keeping the notation fixed lets every later algorithm reuse the same symbols and workflow.


## Data
We start with a synthetic 1D dataset so the geometry of a line fit is easy to see.


In [None]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

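# Synthetic data: a line with slope 2.5 and intercept 3.0, plus Gaussian noise
m = 60
x = np.linspace(0, 10, m)
y = 2.5 * x + 3.0 + np.random.normal(scale=2.0, size=m)
X = np.c_[np.ones(m), x]  # design matrix with the x_0 = 1 column prepended

# Closed-form (normal equation) solution: theta = (X^T X)^+ X^T y
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
preds = X @ theta  # fitted values on the training inputs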

theta
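
The closed-form line in the cell above follows from setting the gradient of $J(\theta)$ to zero. A short derivation, assuming $X \in \mathbb{R}^{m \times (n+1)}$ stacks the $x^{(i)}$ as rows and $X^T X$ is invertible:

$$
\begin{aligned}
J(\theta) &= \frac{1}{2m} \lVert X\theta - y \rVert^2 \\
\nabla_\theta J(\theta) &= \frac{1}{m} X^T (X\theta - y) = 0 \\
X^T X \theta &= X^T y \\
\theta &= (X^T X)^{-1} X^T y
\end{aligned}
$$

The code uses `np.linalg.pinv` in place of the inverse. This gives the same result when $X^T X$ is invertible and the minimum-norm least-squares solution otherwise.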


## Experiments


In [None]:
# Compare the model's mean squared error with a constant baseline that always predicts the mean of y
mse_model = np.mean((preds - y) ** 2)
mse_baseline = np.mean((y.mean() - y) ** 2)
mse_model, mse_baseline
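
The ratio of these two errors is what the coefficient of determination $R^2$ summarizes: $R^2 = 1 - \text{MSE}_{\text{model}} / \text{MSE}_{\text{baseline}}$, where the baseline always predicts the mean of $y$. A minimal sketch follows. The helper name `r_squared` and the four-point example are assumptions for illustration.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - MSE(model) / MSE(mean baseline)."""
    mse_model = np.mean((y_pred - y_true) ** 2)
    mse_baseline = np.mean((y_true.mean() - y_true) ** 2)
    return 1.0 - mse_model / mse_baseline

y_true = np.array([1.0, 2.0, 3.0, 4.0])

r2_perfect = r_squared(y_true, y_true)                          # perfect predictions
r2_mean = r_squared(y_true, np.full_like(y_true, y_true.mean()))  # mean predictor
print(r2_perfect, r2_mean)
```

Perfect predictions score 1 and the mean predictor scores 0, so a value between the two says how much of the baseline error the model removes. Applied to the cell above, this would be `r_squared(y, preds)`.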


## Visualizations


In [None]:
plt.figure(figsize=(6,4))
plt.scatter(x, y, alpha=0.7, label="data")
plt.plot(x, preds, color="black", label="linear fit")
plt.title("Synthetic regression data")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()

plt.figure(figsize=(6,4))
plt.hist(y - preds, bins=15, alpha=0.7, color="tab:orange")
plt.title("Residual histogram")
plt.xlabel("error")
plt.ylabel("count")
plt.show()


## Takeaways
- The CS229 notation ($m$, $n$, $x^{(i)}$, $y^{(i)}$, $\theta$) is reused in every later lesson.
- Even a simple linear model demonstrates the full ML workflow: choose $h_\theta$, define $J(\theta)$, optimize.


## Explain it in an interview
- Describe supervised learning as fitting a hypothesis $h_\theta$ that minimizes a loss over labeled examples.
- Explain the role of the bias term $x_0 = 1$ in linear models.


## Exercises
- Show how the normal equation changes if you add L2 regularization.
- Why is the bias term included as $x_0 = 1$?
- Create a dataset where linear regression underfits.
