# Linear Regression from Scratch

In this notebook, you'll implement linear regression with gradient descent using only NumPy.

**What you'll do:**
- Generate synthetic data and visualize the learning target
- Implement the training loop step by step (forward → loss → backward → update)
- Train the model and watch parameters converge in real time
- Experiment with learning rates to build intuition for gradient descent

**For each exercise, PREDICT the output before running the cell.** Wrong predictions are more valuable than correct ones — they reveal gaps in your mental model.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import clear_output
import time

# Reproducibility
np.random.seed(42)

# For nice plots
plt.style.use('dark_background')
plt.rcParams['figure.figsize'] = [10, 4]

---

## Exercise 1: Generate Synthetic Data (Guided)

We'll create data that follows the pattern: `y = 1.5x + 2 + noise`

Our goal: learn `w = 1.5` and `b = 2` from the data alone.

**Before running, predict:** What will the scatter plot look like? Will the points form a perfect line, or will there be spread?

In [None]:
# True parameters (what we're trying to learn)
TRUE_W = 1.5
TRUE_B = 2.0

# Generate data
n_samples = 100

X = np.random.uniform(-5, 5, n_samples)
noise = np.random.normal(0, 1.5, n_samples)
y = TRUE_W * X + TRUE_B + noise

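# Visualize (draw the true line over an evenly spaced grid, because X is unsorted
# and a dashed line through unsorted points overlaps itself)
plt.scatter(X, y, alpha=0.6, label='Data')
x_line = np.linspace(-5, 5, 100)
plt.plot(x_line, TRUE_W * x_line + TRUE_B, 'g--', label=f'True: y = {TRUE_W}x + {TRUE_B}')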
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title('Our Training Data')
plt.show()
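
A quick sanity check that may help when reading the loss values later: because the noise is drawn with standard deviation 1.5, even the true line cannot reach zero MSE. Its loss should sit near the noise variance, 1.5² = 2.25. The sketch below regenerates the same data so that it runs on its own; `mse_true_line` is a name introduced here for illustration.

```python
import numpy as np

# Regenerate the Exercise 1 data so this cell runs standalone.
np.random.seed(42)
TRUE_W, TRUE_B = 1.5, 2.0
X = np.random.uniform(-5, 5, 100)
noise = np.random.normal(0, 1.5, 100)
y = TRUE_W * X + TRUE_B + noise

# The residual of the true line is exactly the noise,
# so its MSE is the mean squared noise.
mse_true_line = np.mean((y - (TRUE_W * X + TRUE_B)) ** 2)
noise_variance = 1.5 ** 2  # theoretical value

print(f"MSE of the true line:     {mse_true_line:.3f}")
print(f"Noise variance (sigma^2): {noise_variance:.3f}")
```

The two numbers will not match exactly with only 100 samples, but they give a rough floor: a training loss far above this value suggests the model has not finished learning.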

---

## Exercise 2: Initialize and Predict (Guided)

We start with `w = 0` and `b = 0`. The model knows nothing yet.

**Before running, predict:** With `w = 0` and `b = 0`, what will the model predict for any input x? (Hint: what is `0 * x + 0`?)

In [None]:
# Initialize parameters
w = 0.0
b = 0.0

# Learning rate - try changing this!
learning_rate = 0.01

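print(f"Initial parameters: w = {w}, b = {b}")
print(f"Learning rate: {learning_rate}")

# Predict with the untrained model: 0 * x + 0 is 0 for every input
y_pred = w * X + b
print(f"First 5 predictions: {y_pred[:5]}")
print(f"Initial loss (MSE): {np.mean((y - y_pred) ** 2):.4f}")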

---

## Exercise 3: Define the Training Step (Guided)

Each step follows this pattern:

1. **Forward pass**: Compute predictions `y_hat = wx + b`
2. **Compute loss**: MSE = mean((y - y_hat)^2)
3. **Backward pass**: Compute gradients dL/dw and dL/db
4. **Update**: w = w - lr * dL/dw, b = b - lr * dL/db

**Before running, predict:** If the loss is MSE = mean((y - y_hat)^2), what is the gradient dL/dw? (Hint: apply the chain rule to differentiate with respect to w.)

In [None]:
def train_step(X, y, w, b, learning_rate):
    """
    Perform one gradient descent step.
    Returns: new_w, new_b, loss
    """
    # 1. Forward pass: compute predictions
    y_pred = w * X + b
    
    # 2. Compute loss (MSE)
    loss = np.mean((y - y_pred) ** 2)
    
    # 3. Backward pass: compute gradients
    # dL/dw = (1/n) * sum(-2 * x * (y - y_pred))
    # dL/db = (1/n) * sum(-2 * (y - y_pred))
    dw = np.mean(-2 * X * (y - y_pred))
    db = np.mean(-2 * (y - y_pred))
    
    # 4. Update parameters
    new_w = w - learning_rate * dw
    new_b = b - learning_rate * db
    
    return new_w, new_b, loss
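
One way to gain confidence in hand-derived gradients is to compare them with a finite-difference estimate. The sketch below is a minimal check under stated assumptions: `mse_loss`, `analytic_grads`, and `numeric_grads` are hypothetical helpers written for this illustration (they repeat the formulas from `train_step` rather than calling it), and the small dataset `X_chk`, `y_chk` is made up for the test.

```python
import numpy as np

def mse_loss(X, y, w, b):
    """Mean squared error of the line y = w*x + b."""
    return np.mean((y - (w * X + b)) ** 2)

def analytic_grads(X, y, w, b):
    """Gradients from the chain rule, same formulas as in train_step."""
    residual = y - (w * X + b)
    dw = np.mean(-2 * X * residual)
    db = np.mean(-2 * residual)
    return dw, db

def numeric_grads(X, y, w, b, eps=1e-6):
    """Central finite differences: (L(p + eps) - L(p - eps)) / (2 * eps)."""
    dw = (mse_loss(X, y, w + eps, b) - mse_loss(X, y, w - eps, b)) / (2 * eps)
    db = (mse_loss(X, y, w, b + eps) - mse_loss(X, y, w, b - eps)) / (2 * eps)
    return dw, db

# Small illustrative dataset (not the notebook's X and y)
rng = np.random.default_rng(0)
X_chk = rng.uniform(-5, 5, 50)
y_chk = 1.5 * X_chk + 2.0 + rng.normal(0, 1.5, 50)

# Check at an arbitrary point in parameter space
w0, b0 = 0.3, -0.7
dw_a, db_a = analytic_grads(X_chk, y_chk, w0, b0)
dw_n, db_n = numeric_grads(X_chk, y_chk, w0, b0)

print(f"dL/dw  analytic: {dw_a:.6f}  numeric: {dw_n:.6f}")
print(f"dL/db  analytic: {db_a:.6f}  numeric: {db_n:.6f}")
```

If the two columns disagree beyond a small rounding error, the usual suspects are a missing factor of 2 or a flipped sign in the analytic formula.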

---

## Exercise 4: Train the Model (Guided)

Run the training loop and watch the model learn.

**Before running, predict:** After 100 epochs with learning rate 0.01, will `w` and `b` be close to the true values (1.5 and 2.0)? Will they match exactly?

In [None]:
# Reset parameters
w = 0.0
b = 0.0

# Training settings
n_epochs = 100
learning_rate = 0.01

# Track history for plotting
loss_history = []
w_history = []
b_history = []

# Training loop
for epoch in range(n_epochs):
    w, b, loss = train_step(X, y, w, b, learning_rate)
    
    loss_history.append(loss)
    w_history.append(w)
    b_history.append(b)
    
    # Print progress every 10 epochs
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1:3d}: loss = {loss:.4f}, w = {w:.4f}, b = {b:.4f}")

print(f"\nFinal: w = {w:.4f} (true: {TRUE_W}), b = {b:.4f} (true: {TRUE_B})")
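
It can help to know what gradient descent is converging toward. For a straight-line fit, the minimum of the MSE has a closed-form solution, which `np.polyfit` computes directly. The sketch below swaps in that closed-form fit purely as a reference point; it is not part of the training loop above. `run_gd` is a hypothetical helper that repeats the same update rule as `train_step`, and the data is regenerated so the cell runs standalone.

```python
import numpy as np

# Regenerate the Exercise 1 data so this cell runs standalone.
np.random.seed(42)
X = np.random.uniform(-5, 5, 100)
y = 1.5 * X + 2.0 + np.random.normal(0, 1.5, 100)

# Closed-form least-squares line: polyfit returns [slope, intercept] for deg=1
w_ols, b_ols = np.polyfit(X, y, deg=1)

def run_gd(X, y, n_epochs, lr=0.01):
    """Plain gradient descent on MSE, same update rule as train_step."""
    w, b = 0.0, 0.0
    for _ in range(n_epochs):
        residual = y - (w * X + b)  # computed once, so both updates use the old w and b
        w -= lr * np.mean(-2 * X * residual)
        b -= lr * np.mean(-2 * residual)
    return w, b

w_100, b_100 = run_gd(X, y, 100)
w_2000, b_2000 = run_gd(X, y, 2000)

print(f"Least squares:   w = {w_ols:.4f}, b = {b_ols:.4f}")
print(f"GD, 100 epochs:  w = {w_100:.4f}, b = {b_100:.4f}")
print(f"GD, 2000 epochs: w = {w_2000:.4f}, b = {b_2000:.4f}")
```

With enough epochs the gradient descent result should agree with the least-squares line rather than with the true parameters, and `b` typically takes longer to get there than `w`.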

---

## Exercise 5: Visualize the Results (Guided)

**Before running, predict:** What will the loss curve look like? Straight line down, or something else?

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Plot 1: Loss curve
axes[0].plot(loss_history, 'r-', linewidth=2)
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss (MSE)')
axes[0].set_title('Loss Over Time')
axes[0].grid(alpha=0.3)

# Plot 2: Parameter convergence
axes[1].plot(w_history, label=f'w (true: {TRUE_W})', linewidth=2)
axes[1].plot(b_history, label=f'b (true: {TRUE_B})', linewidth=2)
axes[1].axhline(y=TRUE_W, color='C0', linestyle='--', alpha=0.5)
axes[1].axhline(y=TRUE_B, color='C1', linestyle='--', alpha=0.5)
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Parameter Value')
axes[1].set_title('Parameters Over Time')
axes[1].legend()
axes[1].grid(alpha=0.3)

# Plot 3: Final fit
axes[2].scatter(X, y, alpha=0.6, label='Data')
x_line = np.linspace(-5, 5, 100)
axes[2].plot(x_line, TRUE_W * x_line + TRUE_B, 'g--', 
             label=f'True: y = {TRUE_W}x + {TRUE_B}', linewidth=2)
axes[2].plot(x_line, w * x_line + b, 'r-', 
             label=f'Learned: y = {w:.2f}x + {b:.2f}', linewidth=2)
axes[2].set_xlabel('X')
axes[2].set_ylabel('y')
axes[2].set_title('Final Fit')
axes[2].legend()
axes[2].grid(alpha=0.3)

plt.tight_layout()
plt.show()

---

## Exercise 6: Animated Training (Guided)

Watch the line fit the data in real time!

**Before running, predict:** Will the line rotate into place, translate into place, or both simultaneously?

In [None]:
# Reset for animation
w = 0.0
b = 0.0
learning_rate = 0.01

x_line = np.linspace(-5, 5, 100)

for epoch in range(50):
    w, b, loss = train_step(X, y, w, b, learning_rate)
    
    # Update plot
    clear_output(wait=True)
    
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter(X, y, alpha=0.6, label='Data')
    ax.plot(x_line, TRUE_W * x_line + TRUE_B, 'g--', 
            label='True line', linewidth=2, alpha=0.5)
    ax.plot(x_line, w * x_line + b, 'r-', 
            label=f'Learned: y = {w:.3f}x + {b:.3f}', linewidth=2)
    ax.set_xlim(-6, 6)
    ax.set_ylim(-12, 15)
    ax.set_xlabel('X')
    ax.set_ylabel('y')
    ax.set_title(f'Epoch {epoch+1} | Loss: {loss:.4f}')
    ax.legend(loc='upper left')
    ax.grid(alpha=0.3)
    plt.show()
    
    time.sleep(0.1)

print("\nTraining complete!")
print(f"Learned: w = {w:.4f} (true: {TRUE_W}), b = {b:.4f} (true: {TRUE_B})")

---

## Exercise 7: Learning Rate Experiments (Supported)

Now it's your turn. Modify the training loop to test different learning rates and observe the effects.

**Task:**
1. Set `learning_rate = 0.5` and run 100 epochs. What happens to the loss?
2. Set `learning_rate = 0.001` and run 100 epochs. Does the model converge?
3. Plot both loss curves on the same chart to compare.

<details>
<summary>Solution</summary>

The key insight is that the learning rate controls the step size of each gradient descent update.

```python
results = {}
for lr in [0.001, 0.01, 0.5]:
    w, b = 0.0, 0.0
    losses = []
    for epoch in range(100):
        w, b, loss = train_step(X, y, w, b, lr)
        losses.append(loss)
    results[lr] = losses

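for lr, losses in results.items():
    plt.plot(losses, label=f'lr={lr}')
# lr=0.5 diverges by many orders of magnitude, so a linear axis
# would flatten the other two curves onto the x-axis
plt.yscale('log')
plt.xlabel('Epoch')
plt.ylabel('Loss (log scale)')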
plt.legend()
plt.title('Learning Rate Comparison')
plt.grid(alpha=0.3)
plt.show()
```

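- **lr=0.5**: Loss explodes. Each step overshoots the minimum by more than the previous error, so the error grows every epoch.
- **lr=0.001**: Loss decreases very slowly (steps are too small to converge in 100 epochs)
- **lr=0.01**: Loss decreases smoothly and is close to its minimum after 100 epochs, although `b` is still creeping toward its final value. It is the best of these three rates for this problem.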
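
Why does 0.5 diverge while 0.01 does not? For a linear model the MSE is an exact quadratic in `(w, b)`, so its curvature is described by a constant 2×2 Hessian. Gradient descent on a quadratic shrinks the error along each eigen-direction by a factor `|1 - lr * eigenvalue|` per step, which means it is stable only when `lr < 2 / largest_eigenvalue`. The sketch below computes that limit for data like ours; the names `H`, `eigvals`, and `lr_max` are introduced here for illustration.

```python
import numpy as np

# Regenerate the Exercise 1 inputs so this cell runs standalone.
np.random.seed(42)
X = np.random.uniform(-5, 5, 100)

# Hessian of mean((y - w*x - b)^2) with respect to (w, b).
# It depends only on X, not on y or on the current parameters.
H = 2 * np.array([[np.mean(X ** 2), np.mean(X)],
                  [np.mean(X),      1.0]])

eigvals = np.linalg.eigvalsh(H)  # eigenvalues in ascending order
lr_max = 2 / eigvals[-1]         # stability limit for plain gradient descent

print(f"Hessian eigenvalues: {eigvals}")
print(f"Stable only for lr < {lr_max:.3f}")

for lr in [0.001, 0.01, 0.5]:
    # A factor above 1 means the error grows along that direction every step
    factors = np.abs(1 - lr * eigvals)
    print(f"lr={lr}: per-step error factors = {factors}")
```

For inputs uniform on [-5, 5], `mean(X ** 2)` is about 8.3, so the limit lands near 0.12. The small eigenvalue (close to 2) mostly governs `b`, which explains why `b` converges more slowly than `w` at `lr=0.01`.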

</details>

---

## Key Takeaways

1. **The training loop is universal**: forward → loss → backward → update. This pattern scales from linear regression to neural networks with millions of parameters.
2. **The gradient points in the direction of steepest increase in loss**, so we subtract it (scaled by the learning rate) to move toward lower loss.
3. **Learning rate controls step size**: too big = overshoot and diverge, too small = painfully slow convergence.
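4. **Parameters don't converge to the exact true values**: gradient descent converges to the least-squares fit of the 100 noisy samples, which differs slightly from `w = 1.5` and `b = 2`. Stopping after a fixed number of epochs adds a further gap, most visibly in `b`.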