# Deep Learning 2025 — Homework 2

## Linear Regression: From Scratch and with PyTorch


Goal: Understand how the math behind linear regression translates into code, first by hand with NumPy, then using PyTorch.

## Overview
In this assignment, you will:
1. Implement a linear regression model from scratch using NumPy.
2. Implement the same model using PyTorch, and compare both approaches.
3. Apply your model to a real-world dataset (California housing data).

This will help you connect the mathematical concepts from the lectures with their practical implementation.


## Exercise 1 — Linear Regression from Scratch
You will implement all the components manually:

- Linear model: $\hat{y} = Xw + b$
- Mean squared error loss
- Gradient descent

**Hint:** Revisit your matrix calculus notes: the gradients of the MSE with respect to `w` and `b` can be derived using the chain rule.

**Concept Reminder:**
- The model parameters `w` and `b` define a line that tries to best fit the data.
- The *loss* measures how far predictions are from true values.
- The *gradient* tells us how to move `w` and `b` to reduce that loss.

### Step 1. Generate synthetic data


In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Set random seed for reproducibility
np.random.seed(22)

# Generate 100 samples of x and corresponding y
X = 2 * np.random.rand(100, 1)
y = 3 * X + 7 + np.random.randn(100, 1) * 0.5

plt.scatter(X, y, color='blue', label='Data')
plt.xlabel("x")
plt.ylabel("y")
plt.title("Synthetic Data: y = 3x + 7 + noise")
plt.legend()
plt.show()

### Step 2. Implement the linear model and loss
Define two Python functions:
- `predict(X, w, b)` → compute predictions $Xw + b$
- `mse_loss(y_pred, y_true)` → compute mean squared error

 **Concept Reminder:**  
- `predict(X, w, b)` performs a linear transformation of the input data.  
- The loss function measures how well our model’s predictions match the target values.  
- Minimizing the loss helps the model learn the best `w` and `b`.
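
To make the shapes concrete, here is a minimal sketch on a hypothetical three-point toy dataset. The names `X_toy`, `y_toy`, `w_toy`, and `b_toy` and their values are illustrative assumptions, not part of the assignment data. It shows how NumPy broadcasting adds a `(1, 1)` bias to an `(n, 1)` product, and how averaging the squared residuals gives a single number.

```python
import numpy as np

# Hypothetical toy inputs: 3 samples, 1 feature (shape (3, 1))
X_toy = np.array([[0.0], [1.0], [2.0]])
y_toy = np.array([[1.0], [3.0], [5.0]])

# Illustrative parameter values, chosen by hand
w_toy = np.array([[2.0]])   # shape (1, 1)
b_toy = np.array([[0.0]])   # shape (1, 1), broadcast across all rows

# Linear transformation: (3, 1) @ (1, 1) -> (3, 1), then the bias is broadcast
y_hat_toy = X_toy @ w_toy + b_toy

# Every residual equals -1, so the mean of the squared residuals is 1.0
residuals_toy = y_hat_toy - y_toy
mse_toy = np.mean(residuals_toy ** 2)

print(y_hat_toy.shape)  # (3, 1)
print(mse_toy)          # 1.0
```

Checking shapes like this before training helps catch accidental broadcasting, for example when `y` has shape `(n,)` instead of `(n, 1)`.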

In [None]:
# TODO: implement the predict and loss functions

def predict(X, w, b):
    # return...
    pass

def mse_loss(y_pred, y_true):
    # ...
    pass

### Step 3. Implement gradient descent
Compute gradients and update weights iteratively.

 **Hint:** We start with a small random `w` and `b = 0`. Use a learning rate around `0.1` and about `100` epochs.

 **Concept Reminder:**
- The gradient $ \frac{\partial L}{\partial w} $ indicates how much the loss changes when we slightly change `w`.  
- Gradient descent repeatedly moves `w` and `b` in the direction that reduces the loss.  
- The learning rate `η` controls how large each step is.

For the MSE loss $ L = \frac{1}{n} \sum_i (Xw + b - y)_i^2 $, the gradients are:

$ \frac{\partial L}{\partial w} = \frac{2}{n} X^T (Xw + b - y) $

$ \frac{\partial L}{\partial b} = \frac{2}{n} \sum_i (Xw + b - y)_i $

(We saw them in our tutorial session as well.)
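
One way to gain confidence in hand-derived gradients is to compare them with a numerical estimate. The sketch below does this on a small random problem using central finite differences. All names (`X_chk`, `loss_fn`, `eps`, and so on) are hypothetical and chosen so they do not clash with the notebook's variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small problem: 5 samples, 1 feature
X_chk = rng.normal(size=(5, 1))
y_chk = rng.normal(size=(5, 1))
w_chk = rng.normal(size=(1, 1))
b_chk = rng.normal(size=(1, 1))
n_chk = X_chk.shape[0]

def loss_fn(w, b):
    # Mean squared error of the linear model Xw + b
    return np.mean((X_chk @ w + b - y_chk) ** 2)

# Analytic gradients, following the formulas above
residual_chk = X_chk @ w_chk + b_chk - y_chk
dw_analytic = (2.0 / n_chk) * X_chk.T @ residual_chk   # shape (1, 1)
db_analytic = (2.0 / n_chk) * np.sum(residual_chk)     # scalar

# Central finite differences as an independent numerical estimate
eps = 1e-6
dw_numeric = (loss_fn(w_chk + eps, b_chk) - loss_fn(w_chk - eps, b_chk)) / (2 * eps)
db_numeric = (loss_fn(w_chk, b_chk + eps) - loss_fn(w_chk, b_chk - eps)) / (2 * eps)

print(np.isclose(dw_analytic.item(), dw_numeric))  # True
print(np.isclose(db_analytic, db_numeric))         # True
```

If the two estimates disagree, the usual suspects are a missing factor of `2 / n` or a shape mismatch in the residual.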

In [None]:
# TODO: initialize parameters and run training loop
w = np.random.randn(1, 1)
b = np.zeros((1, 1))

# Hyperparameters
learning_rate = 0.1
epochs = 100
losses = []

for epoch in range(epochs):
    # 1. Predict
    y_pred = predict(X, w, b)

    # 2. Compute loss
    loss = mse_loss(y_pred, y)

    # 3. Compute gradients (dw, db)
    # TODO: derive and implement gradient update rules here

    # 4. Update parameters
    # TODO: update w and b

    losses.append(loss)

print(f"Learned parameters: w = {w.item():.3f}, b = {b.item():.3f}")

plt.plot(losses)
plt.xlabel("Epoch")
plt.ylabel("Loss (MSE)")
plt.title("Training Loss over Epochs")
plt.show()

### Step 4. Visualize the result
Plot the fitted line on top of your data.

 **Concept Reminder:**
- After training, your line should closely match the true relation.  
- If the learning rate is too high, the updates can overshoot the minimum and training may fail to converge.
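
The effect of the learning rate is easiest to see on a simpler problem. The sketch below is an illustration, not the regression task itself: it runs gradient descent on the toy loss $f(w) = w^2$ with a hypothetical helper `run_gd`. With a step size of `0.1` each update shrinks `w`. With `1.1` each update flips its sign and makes it larger.

```python
def run_gd(learning_rate, steps=20, w_start=1.0):
    """Gradient descent on the toy loss f(w) = w**2, whose gradient is 2*w."""
    w_cur = w_start
    for _ in range(steps):
        grad = 2.0 * w_cur
        w_cur = w_cur - learning_rate * grad
    return w_cur

w_small = run_gd(learning_rate=0.1)   # each step multiplies w by 0.8
w_large = run_gd(learning_rate=1.1)   # each step multiplies w by -1.2

print(abs(w_small) < 0.05)  # True: w shrinks toward the minimum at 0
print(abs(w_large) > 1.0)   # True: w oscillates in sign and grows
```

The same behaviour shows up in the regression loss curve: a diverging run produces a loss that increases from epoch to epoch instead of decreasing.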


In [None]:
plt.scatter(X, y, label="Data", alpha=0.6)
plt.plot(X, predict(X, w, b), color='red', label="Fitted Line")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.title("Fitted Linear Regression Line")
plt.show()

## Exercise 2 — Linear Regression with PyTorch
Now we’ll repeat the same experiment using PyTorch, which will handle the gradient computation for you.

**Concept Reminder:**
- PyTorch uses automatic differentiation (`autograd`) to compute gradients.
- Instead of manually implementing the gradient formulas, we define the model and let PyTorch handle the math.
- The optimizer (SGD) applies the same update rule as before; PyTorch simply performs the step for you.
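
As a small illustration of what `autograd` does, the sketch below differentiates a squared error for a single hypothetical data point and compares the result with the chain rule applied by hand. The names `x_pt`, `y_pt`, and `w_pt` and their values are assumptions made for this example only.

```python
import torch

# Hypothetical single data point and parameter value
x_pt = torch.tensor(2.0)
y_pt = torch.tensor(7.0)
w_pt = torch.tensor(3.0, requires_grad=True)  # track operations on w_pt

# Squared error for the one-parameter model y_hat = w * x
loss_pt = (w_pt * x_pt - y_pt) ** 2

# Backward pass: autograd stores d(loss)/dw in w_pt.grad
loss_pt.backward()

# Manual chain rule: 2 * (w*x - y) * x = 2 * (6 - 7) * 2 = -4
manual_grad = 2 * (3.0 * 2.0 - 7.0) * 2.0

print(w_pt.grad.item())  # -4.0
print(manual_grad)       # -4.0
```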


### Step 1. Define the model, loss, and optimizer

In [None]:
import torch
from torch import nn, optim

# Convert your NumPy arrays to torch tensors
X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.float32)

# TODO: Define model, loss, and optimizer
# model = nn.Linear(...)
# criterion = nn.MSELoss()
# optimizer = optim.SGD(...)

### Step 2. Training loop
Implement the typical PyTorch training cycle:
1. Forward pass → compute predictions
2. Compute loss → how wrong is the model?
3. Zero gradients → reset previous gradients
4. Backward pass → compute new gradients
5. Optimizer step → update parameters

 **Concept Reminder:**  
This mirrors the NumPy loop, but now gradients are handled automatically by PyTorch.
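
The "zero gradients" step exists because PyTorch adds each new gradient to whatever is already stored in `.grad`. The sketch below demonstrates this on a hypothetical scalar parameter `w_acc`. In a real training loop, `optimizer.zero_grad()` performs the reset for every parameter the optimizer manages.

```python
import torch

w_acc = torch.tensor(1.0, requires_grad=True)

# First backward pass: d(3*w)/dw = 3
(3.0 * w_acc).backward()
grad_first = w_acc.grad.item()

# Second backward pass without zeroing: the new gradient is added to the old one
(3.0 * w_acc).backward()
grad_accumulated = w_acc.grad.item()

# Resetting clears the stored gradient before the next backward pass
w_acc.grad.zero_()
(3.0 * w_acc).backward()
grad_after_zero = w_acc.grad.item()

print(grad_first, grad_accumulated, grad_after_zero)  # 3.0 6.0 3.0
```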


In [None]:
num_epochs = 100
losses = []

for epoch in range(num_epochs):
    # TODO: forward, compute loss, backward, step
    pass

plt.plot(losses)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("PyTorch Training Loss")
plt.show()

### Step 3. Compare both models
Compare learned parameters between your NumPy and PyTorch implementations. The parameters should be very close if both models were trained correctly.
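
For an independent point of comparison, the least-squares solution can also be computed in closed form. Note that this is a different technique from gradient descent and is not required by the assignment. The sketch below uses hypothetical noise-free data (`X_ref`, `y_ref`) on the line $y = 3x + 7$ so that the expected answer is known.

```python
import numpy as np

# Hypothetical noise-free data on the line y = 3x + 7
X_ref = np.linspace(0.0, 2.0, 50).reshape(-1, 1)
y_ref = 3.0 * X_ref + 7.0

# Design matrix with a column of ones so the intercept is fitted as well
A_ref = np.hstack([X_ref, np.ones_like(X_ref)])

# Least-squares solution of A_ref @ [w, b] = y_ref
solution, *_ = np.linalg.lstsq(A_ref, y_ref, rcond=None)
w_ref, b_ref = solution.ravel()

print(f"{w_ref:.3f} {b_ref:.3f}")  # 3.000 7.000
```

Running the same computation on the assignment's `X` and `y` would give a reference that both gradient-descent fits should approach as training converges.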


In [None]:
# TODO: extract and print learned parameters

## Exercise 3 — Real Dataset: California Housing
Apply your PyTorch linear regression model to a real dataset.

We'll use the California housing dataset, which you can load directly with the scikit-learn (`sklearn`) Python package, to predict the median house value from a single feature.

**Concept Reminder:**
- Real datasets rarely follow perfect linear relationships.  
- Still, linear regression provides a good first baseline and a visual introduction to model fitting.


### Step 1. Load and explore the data

In [None]:
from sklearn.datasets import fetch_california_housing


# Load dataset
data = fetch_california_housing(as_frame=True)
print(data.feature_names)

# Choose one feature and the target
X_real = data.data[['AveRooms']].values
y_real = data.target.values.reshape(-1, 1)

plt.scatter(X_real, y_real, s=5, alpha=0.5)
plt.xlabel('Average number of rooms')
plt.ylabel('Median house value ($100,000s)')
plt.title('California Housing: AveRooms vs. Price')
plt.show()

### Step 2. Train a linear model on the real data
 **Hint:** Start from your PyTorch code above. Adjust the learning rate — this dataset has a different scale.

 **Concept Reminder:**
- Features on a different scale might require a smaller learning rate or more epochs.  
- The goal is to minimize the loss — even if the line doesn’t perfectly fit the points.
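
Besides lowering the learning rate, one common option is to standardize the feature before training. This is an optional suggestion, not a stated requirement of the assignment. The sketch below shows the idea on a hypothetical stand-in array `X_raw`, and the same two lines would apply to `X_real`.

```python
import numpy as np

# Hypothetical stand-in for a raw feature column with a shifted, larger scale
X_raw = np.array([[2.0], [4.0], [6.0], [8.0]])

# Standardize: subtract the mean and divide by the standard deviation
feat_mean = X_raw.mean(axis=0, keepdims=True)
feat_std = X_raw.std(axis=0, keepdims=True)
X_scaled = (X_raw - feat_mean) / feat_std

print(np.isclose(X_scaled.mean(), 0.0))  # True
print(np.isclose(X_scaled.std(), 1.0))   # True
```

If you train on a standardized feature, the learned `w` and `b` refer to the scaled input, so apply the same `feat_mean` and `feat_std` when making predictions.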


In [None]:
# TODO: reuse PyTorch model and training loop for real dataset. Plot your training loss as we did before too.

### Step 3. Visualize your results
Plot predictions vs. actual data points. A single linear feature often cannot fully capture complex dependencies — but you can observe the overall trend.


In [None]:
# TODO: visualize predictions vs. ground truth

## Optional (will not be graded)
You can also try adding more input features to see how the model behaves.
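
With more than one feature, only the input dimension of the linear layer changes. The sketch below uses a hypothetical random batch `X_multi` with three features to show the resulting shapes. The names and sizes are assumptions for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical batch: 4 samples with 3 input features each
X_multi = torch.randn(4, 3)

# One output (the predicted target) computed from 3 inputs
model_multi = nn.Linear(in_features=3, out_features=1)

y_multi = model_multi(X_multi)

print(model_multi.weight.shape)  # torch.Size([1, 3])
print(model_multi.bias.shape)    # torch.Size([1])
print(y_multi.shape)             # torch.Size([4, 1])
```

A fit on several features can no longer be drawn as a single line over a scatter plot, so a plot of predicted against actual values is one possible way to inspect it.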

---

## Homework submission
Upload your answer file to Moodle, preferably with all the code cells executed.