# Linear Regression

## Concept Summary

Linear regression models the relationship between a dependent variable $y$ and one or more independent variables $X$ by fitting a linear equation:

$$\hat{y} = w \cdot X + b$$

where:
- $w$ (weights) — how much each feature contributes to the prediction
- $b$ (bias) — the baseline prediction when all features are zero
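
As a minimal sketch of the equation above, the snippet below evaluates $\hat{y}$ for a few inputs using NumPy broadcasting. The values of `w`, `b`, and `X_demo` are made up for illustration and are not part of the exercise data.

```python
import numpy as np

# Hypothetical parameters and inputs, chosen only for illustration
w = 2.0
b = 1.0
X_demo = np.array([0.0, 1.0, 2.5])

# Broadcasting applies w * x + b to every element at once
y_hat = w * X_demo + b
print(y_hat)  # [1. 3. 6.]
```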

### Key Concepts

| Concept | Formula | Purpose |
|---------|---------|--------|
| **Hypothesis** | $f_{w,b}(x) = wx + b$ | Predict output given input |
| **Cost Function (MSE)** | $J(w,b) = \frac{1}{2m} \sum_{i=1}^{m}(f_{w,b}(x^{(i)}) - y^{(i)})^2$ | Measure how wrong the model is |
| **Gradient Descent** | $w = w - \alpha \frac{\partial J}{\partial w}$, $b = b - \alpha \frac{\partial J}{\partial b}$ | Iteratively minimize the cost |

### Learning Objectives

- **Level 1**: Implement linear regression from scratch using NumPy
- **Level 2**: Use scikit-learn to achieve the same result
- **Level 3**: Apply linear regression to a real dataset in a structured Python project

---

# Level 1: Linear Regression from Scratch

In this section, you will implement:
1. The cost function
2. The gradient computation
3. Gradient descent
4. Prediction

Use only **NumPy** for the computations (Matplotlib is imported for plotting only); no scikit-learn allowed here!

In [None]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

## 1.1 Generate Synthetic Data

We start with a simple dataset so you can verify your implementation easily.

The true relationship is: $y = 3x + 7 + \text{noise}$

In [None]:
# Generate training data
m = 100  # number of training examples
X_train = 2 * np.random.rand(m)
y_train = 3 * X_train + 7 + np.random.randn(m) * 0.5

# Visualize
plt.scatter(X_train, y_train, alpha=0.6)
plt.xlabel("X")
plt.ylabel("y")
plt.title("Training Data")
plt.show()

print(f"X_train shape: {X_train.shape}")
print(f"y_train shape: {y_train.shape}")

## 1.2 Implement the Cost Function

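The cost is the Mean Squared Error, scaled by the conventional factor of $\frac{1}{2}$ (it cancels when differentiating and does not change where the minimum is):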

$$J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})^2$$

where $f_{w,b}(x) = wx + b$
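
As a quick hand check with made-up numbers (not the training data), take the three points $(1, 2)$, $(2, 4)$, $(3, 6)$ and the guess $w = 1$, $b = 0$. The predictions are $1, 2, 3$, so the errors are $-1, -2, -3$ and:

$$J(1, 0) = \frac{1}{2 \cdot 3}\left[(-1)^2 + (-2)^2 + (-3)^2\right] = \frac{14}{6} \approx 2.33$$

With $w = 2$, $b = 0$ every prediction matches its target, so the cost is exactly $0$. A correct `compute_cost` should reproduce both values on these three points.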

In [None]:
def compute_cost(X, y, w, b):
    """
    Compute the MSE cost for linear regression.

    Args:
        X (np.ndarray): Input features, shape (m,)
        y (np.ndarray): Target values, shape (m,)
        w (float): Weight parameter
        b (float): Bias parameter

    Returns:
        float: The MSE cost
    """
    m = len(X)

    # TODO: Compute the cost
    # Step 1: Compute predictions (f_wb) for all training examples
    # Step 2: Compute the squared errors
    # Step 3: Return the sum of squared errors divided by 2m

    cost = 0  # Replace this
    return cost

In [None]:
# --- Validation: Cost Function ---
# With w=3, b=7 (close to the true values), cost should be small (~0.12)
# With w=0, b=0 (bad guess), cost should be large (~50)

cost_good = compute_cost(X_train, y_train, w=3, b=7)
cost_bad = compute_cost(X_train, y_train, w=0, b=0)

print(f"Cost with w=3, b=7 (good guess): {cost_good:.4f}  (expected: ~0.12)")
print(f"Cost with w=0, b=0 (bad guess):  {cost_bad:.4f}  (expected: ~50)")

## 1.3 Implement the Gradient

Compute the partial derivatives of $J$ with respect to $w$ and $b$:

$$\frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)}) \cdot x^{(i)}$$

$$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})$$
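
One way to sanity-check analytic gradients like these is to compare them with central finite differences. The sketch below is an optional aid, not part of the exercise: `numerical_gradient` and `toy_cost` are hypothetical names, and the toy cost is a simple quadratic with a known gradient rather than the regression cost.

```python
def numerical_gradient(cost_fn, w, b, eps=1e-6):
    """Approximate (dJ/dw, dJ/db) of cost_fn(w, b) with central differences."""
    dj_dw = (cost_fn(w + eps, b) - cost_fn(w - eps, b)) / (2 * eps)
    dj_db = (cost_fn(w, b + eps) - cost_fn(w, b - eps)) / (2 * eps)
    return dj_dw, dj_db


def toy_cost(w, b):
    """Toy quadratic; its exact gradient is (2 * (w - 1), 4 * (b + 3))."""
    return (w - 1) ** 2 + 2 * (b + 3) ** 2


approx_dw, approx_db = numerical_gradient(toy_cost, w=0.0, b=0.0)
print(f"numerical: ({approx_dw:.4f}, {approx_db:.4f})  exact: (-2.0000, 12.0000)")
```

To check your own implementation, you could pass `lambda w, b: compute_cost(X_train, y_train, w, b)` as `cost_fn` and compare the result with what `compute_gradient` returns at the same point.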

In [None]:
def compute_gradient(X, y, w, b):
    """
    Compute the gradients of the cost function w.r.t. w and b.

    Args:
        X (np.ndarray): Input features, shape (m,)
        y (np.ndarray): Target values, shape (m,)
        w (float): Weight parameter
        b (float): Bias parameter

    Returns:
        tuple: (dj_dw, dj_db) — partial derivatives of J w.r.t. w and b
    """
    m = len(X)

    # TODO: Compute the gradients
    # Step 1: Compute predictions (f_wb)
    # Step 2: Compute the error (prediction - actual)
    # Step 3: Compute dj_dw (mean of error * X)
    # Step 4: Compute dj_db (mean of error)

    dj_dw = 0  # Replace this
    dj_db = 0  # Replace this
    return dj_dw, dj_db

In [None]:
# --- Validation: Gradient ---
# At w=0, b=0, gradients should be negative (need to increase w and b)

dj_dw, dj_db = compute_gradient(X_train, y_train, w=0, b=0)
print(f"dj_dw at (w=0, b=0): {dj_dw:.4f}  (expected: negative, around -10)")
print(f"dj_db at (w=0, b=0): {dj_db:.4f}  (expected: negative, around -10)")

## 1.4 Implement Gradient Descent

Repeatedly update $w$ and $b$ using the gradients:

$$w = w - \alpha \frac{\partial J}{\partial w}$$
$$b = b - \alpha \frac{\partial J}{\partial b}$$

where $\alpha$ is the learning rate.
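
To see the update rule in isolation, the sketch below applies it to a toy cost $J(w, b) = (w - 3)^2 + (b - 7)^2$, whose minimum is known to be at $w = 3$, $b = 7$. The toy cost and the name `toy_gradient` are assumptions made for illustration; the exercise itself uses the regression gradients from section 1.3.

```python
def toy_gradient(w, b):
    """Gradient of the toy cost J(w, b) = (w - 3)**2 + (b - 7)**2."""
    return 2 * (w - 3), 2 * (b - 7)


alpha = 0.1  # learning rate
w, b = 0.0, 0.0

for _ in range(200):
    # Evaluate both partial derivatives at the current (w, b) first ...
    dj_dw, dj_db = toy_gradient(w, b)
    # ... then update both parameters, so the update is simultaneous
    w = w - alpha * dj_dw
    b = b - alpha * dj_db

print(f"w = {w:.4f}, b = {b:.4f}")  # w = 3.0000, b = 7.0000
```

Each step shrinks the distance to the minimum by a constant factor here ($1 - 2\alpha = 0.8$), which is why a few hundred iterations are more than enough for this toy problem.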

In [None]:
def gradient_descent(X, y, w_init, b_init, learning_rate, num_iterations):
    """
    Run gradient descent to learn w and b.

    Args:
        X (np.ndarray): Input features, shape (m,)
        y (np.ndarray): Target values, shape (m,)
        w_init (float): Initial weight
        b_init (float): Initial bias
        learning_rate (float): Step size (alpha)
        num_iterations (int): Number of gradient descent steps

    Returns:
        tuple: (w, b, cost_history)
            - w: learned weight
            - b: learned bias
            - cost_history: list of cost values, recorded every 100 iterations
    """
    w = w_init
    b = b_init
    cost_history = []

    for i in range(num_iterations):
        # TODO: Implement one step of gradient descent
        # Step 1: Compute the gradients using your compute_gradient function
        # Step 2: Update w and b simultaneously
        #         (compute both gradients before changing either parameter)
        # Note: the cost is recorded by the code below; do not append it here

        pass  # Replace this

        # Record the cost and print progress every 100 iterations
        if i % 100 == 0:
            cost = compute_cost(X, y, w, b)
            cost_history.append(cost)
            print(f"Iteration {i:4d}: Cost = {cost:.6f}, w = {w:.4f}, b = {b:.4f}")

    return w, b, cost_history

In [None]:
# --- Run Gradient Descent ---
w_final, b_final, cost_history = gradient_descent(
    X_train, y_train,
    w_init=0,
    b_init=0,
    learning_rate=0.1,
    num_iterations=1000
)

print(f"\nLearned parameters: w = {w_final:.4f}, b = {b_final:.4f}")
print(f"Expected (approx):  w = 3.0000, b = 7.0000")

In [None]:
# --- Visualize Results ---

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Data and fitted line
axes[0].scatter(X_train, y_train, alpha=0.6, label="Training data")
x_line = np.linspace(0, 2, 100)
axes[0].plot(x_line, w_final * x_line + b_final, color="red", linewidth=2, label=f"Fit: y = {w_final:.2f}x + {b_final:.2f}")
axes[0].set_xlabel("X")
axes[0].set_ylabel("y")
axes[0].set_title("Linear Regression Fit")
axes[0].legend()

# Plot 2: Cost over iterations
# cost_history holds one value per 100 iterations, so scale the index to match
axes[1].plot(np.arange(len(cost_history)) * 100, cost_history)
axes[1].set_xlabel("Iteration")
axes[1].set_ylabel("Cost")
axes[1].set_title("Cost Function Convergence")

plt.tight_layout()
plt.show()

---

# Level 2: Linear Regression with scikit-learn

Now re-implement the same task using scikit-learn. Compare the results with your Level 1 implementation.

**Key scikit-learn classes:**
- `LinearRegression` — fits a linear model using Ordinary Least Squares
- The model exposes `.coef_` (weights) and `.intercept_` (bias) after fitting
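
For intuition about what Ordinary Least Squares computes, the sketch below solves a tiny least-squares problem directly with `np.linalg.lstsq`. It is an illustration under assumed toy data (`X_demo` and `y_demo` lie exactly on $y = 2x + 1$), not a description of scikit-learn's internals.

```python
import numpy as np

# Toy data lying exactly on y = 2x + 1 (made up for illustration)
X_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = 2.0 * X_demo + 1.0

# Design matrix with a column of ones so the bias is learned as a coefficient
A = np.column_stack([X_demo, np.ones_like(X_demo)])

# Least-squares solution of A @ [w, b] ≈ y, with no learning rate or iterations
(w_ols, b_ols), *_ = np.linalg.lstsq(A, y_demo, rcond=None)
print(f"w = {w_ols:.4f}, b = {b_ols:.4f}")  # w = 2.0000, b = 1.0000
```

Unlike gradient descent, this approach has no learning rate to tune and returns the minimizer in a single solve.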

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

In [None]:
# scikit-learn expects X to be 2D: shape (m, n_features)
X_train_2d = X_train.reshape(-1, 1)

# TODO: Create a LinearRegression model, fit it, and extract the parameters
# Step 1: Create the model
# Step 2: Fit the model on X_train_2d and y_train
# Step 3: Get the learned weight (model.coef_[0]) and bias (model.intercept_)

sklearn_w = 0  # Replace this
sklearn_b = 0  # Replace this

print(f"scikit-learn: w = {sklearn_w:.4f}, b = {sklearn_b:.4f}")
print(f"Your Level 1: w = {w_final:.4f}, b = {b_final:.4f}")

In [None]:
# TODO: Make predictions using the sklearn model and evaluate
# Step 1: Use model.predict() on X_train_2d and store the result in y_pred
# Step 2: Compute MSE using mean_squared_error(y_train, y_pred)
#         (plain MSE with no 1/2 factor, so it equals 2J at the same parameters)
# Step 3: Compute R² using r2_score(y_train, y_pred)

y_pred = np.zeros_like(y_train)  # Replace this

mse = 0  # Replace this
r2 = 0   # Replace this

print(f"Mean Squared Error: {mse:.4f}")
print(f"R² Score: {r2:.4f}")

In [None]:
# --- Visualize: Compare Level 1 vs Level 2 ---

plt.figure(figsize=(8, 5))
plt.scatter(X_train, y_train, alpha=0.5, label="Training data")

x_line = np.linspace(0, 2, 100)
plt.plot(x_line, w_final * x_line + b_final, color="red", linewidth=2, label=f"Level 1 (scratch): y={w_final:.2f}x+{b_final:.2f}")
plt.plot(x_line, sklearn_w * x_line + sklearn_b, color="green", linewidth=2, linestyle="--", label=f"Level 2 (sklearn): y={sklearn_w:.2f}x+{sklearn_b:.2f}")

plt.xlabel("X")
plt.ylabel("y")
plt.title("Level 1 vs Level 2 Comparison")
plt.legend()
plt.show()

## Reflection Questions

After completing Level 1 and Level 2, answer these in a markdown cell below:

1. How close are your Level 1 parameters to the scikit-learn results? Why might they differ?
2. What happens if you change the learning rate to 0.01? To 1.0?
3. What happens if you increase the number of iterations to 10000?
4. scikit-learn uses **Ordinary Least Squares** (closed-form solution), not gradient descent. What are the trade-offs?

*Your answers here...*

---

# Level 3: Real-World Application

For Level 3, you will work in the `src/linear_regression/` module to build a proper Python project that:
- Loads a real dataset (California Housing from scikit-learn)
- Preprocesses and splits the data
- Trains a linear regression model
- Evaluates performance with proper metrics
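
The steps above might look roughly like the sketch below. It is only an outline under stated assumptions: the function name `train_and_evaluate` is hypothetical and does not reflect the actual TODOs in `model.py` or `pipeline.py`, and a small synthetic dataset stands in for California Housing so the snippet runs offline. In a real pipeline, `sklearn.datasets.fetch_california_housing(return_X_y=True)` would supply `X` and `y` (it downloads the data on first use).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_and_evaluate(X, y, test_size=0.2, seed=42):
    """Split the data, fit scaling + linear regression, and score on the test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    # The scaler is fit on the training split only, which avoids test-set leakage
    model = make_pipeline(StandardScaler(), LinearRegression())
    model.fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    return {"mse": mean_squared_error(y_te, y_pred), "r2": r2_score(y_te, y_pred)}


# Synthetic stand-in data: 3 features with a known linear relationship plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=200)

metrics = train_and_evaluate(X, y)
print(f"MSE = {metrics['mse']:.4f}, R² = {metrics['r2']:.4f}")
```

Reporting metrics on a held-out test split, rather than on the training data as in Level 2, gives a more honest estimate of how the model generalizes.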

Go to `src/linear_regression/` and follow the TODOs in:
- `model.py` — your model wrapper class
- `pipeline.py` — data loading, training, and evaluation

Run tests with: `pytest tests/test_linear_regression.py`