# Deep Learning – Regression (PyTorch)

This notebook is part of the **ML-Methods** project.

It introduces **Deep Learning for supervised regression**
using the PyTorch framework.

As with all other notebooks in this project,
the initial sections focus on data preparation
and are intentionally repeated.

This ensures:
- consistency across models
- fair comparison of results
- a unified learning pipeline


_____

## Notebook Roadmap (standard ML-Methods)

1. Project setup and common pipeline  
2. Dataset loading  
3. Train-test split  
4. Feature scaling (why we do it)  

----------------------------------

5. What is this model? (Intuition)  
6. Model training  
7. Model behavior and key parameters  
8. Predictions  
9. Model evaluation  
10. When to use it and when not to  
11. Model persistence  
12. Mathematical formulation (deep dive)  
13. Final summary – Code only


___
## How this notebook should be read

This notebook is designed to be read **top to bottom**.

Before every code cell, you will find a short explanation describing:
- what we are about to do
- why this step is necessary
- how it fits into the overall process

Compared to scikit-learn,
this notebook exposes more of the **training mechanics**.

The goal is to understand:
- how deep learning regression works internally
- how training is controlled explicitly
- how PyTorch differs from high-level abstractions


___
## What is Deep Learning (in this context)?

In this notebook,
Deep Learning refers to **neural networks trained manually**
using PyTorch.

Unlike scikit-learn:
- the training loop is explicit
- forward and backward passes are visible
- optimization is controlled step by step

This provides deeper insight
into how regression models actually learn.


___
## What do we want to achieve?

Our objective is to train a neural network that:
- takes numerical input features
- processes them through multiple layers
- outputs a **single continuous value**

The model learns a mapping:

input features → numerical target



___
## Why use PyTorch for regression?

PyTorch is a **low-level deep learning framework**
that provides fine-grained control over training.

Using PyTorch allows us to:
- see the forward pass explicitly
- control loss computation
- manage gradients explicitly (resetting them and triggering backpropagation)
- understand optimization mechanics

This notebook represents the **next conceptual step**
after scikit-learn:
from abstraction → understanding.


___
## What you should expect from the results

With Deep Learning regression in PyTorch,
you should expect:

- non-linear regression capability
- flexible model architecture
- full control over training dynamics
- behavior similar to scikit-learn's `MLPRegressor`

However:
- code is more verbose
- more responsibility is on the user
- mistakes are easier to make



___
## 1. Project setup and common pipeline

In this section we set up the common pipeline
used across regression models in this project.

Although the model is implemented in PyTorch,
data preparation remains consistent
with all other regression notebooks.


In [1]:
# ====================================
# Common imports used across regression models
# ====================================

import numpy as np
import pandas as pd

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    r2_score
)

from pathlib import Path
import joblib
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim


### PyTorch vs scikit-learn (at a glance)

Compared to scikit-learn:
- models are defined as Python classes
- training loops are written manually
- gradients are handled explicitly

The surrounding pipeline,
however, remains unchanged.

In the next section,
we will load the regression dataset
used throughout this notebook.


___
## 2. Dataset loading

In this section we load the dataset
used for the deep learning regression task.

We use the same regression dataset
adopted in the other regression notebooks
to ensure fair comparison across models.


In [2]:
# ====================================
# Dataset loading
# ====================================

data = fetch_california_housing(as_frame=True)

X = data.data
y = data.target


### Inputs and target

- `X` contains the input features
- `y` contains the continuous target variable

This is a **supervised regression problem**:
- each input corresponds to a real-valued output
- the goal is to predict a numerical quantity

At this stage:
- data is still in pandas format
- no preprocessing has been applied yet

In the next section,
we will split the dataset
into training and test sets.


____
## 3. Train-test split

In this section we split the dataset
into training and test sets.

This allows us to evaluate
how well the neural network
generalizes to unseen data.


In [3]:
# ====================================
# Train-test split
# ====================================

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)


### Why this step is essential

A regression model must be evaluated
on data it has never seen during training.

By separating the dataset:
- the training set is used for learning
- the test set is used only for evaluation

This prevents data leakage
and ensures realistic performance estimates.
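
As a small illustration of the split,
the sketch below uses a toy array (an assumption, not the notebook's dataset)
and checks that no row ends up in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples, 3 features (stand-in for the real dataset).
X_toy = np.arange(30, dtype=float).reshape(10, 3)
y_toy = np.arange(10, dtype=float)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

# 80% of the rows go to training, 20% to testing, with no overlap.
train_rows = {tuple(row) for row in X_tr}
test_rows = {tuple(row) for row in X_te}
print(X_tr.shape, X_te.shape)
print(train_rows.isdisjoint(test_rows))
```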

In the next section,
we will apply feature scaling,
which is essential in practice for deep learning models.


___
## 4. Feature scaling (why we do it)

In this section we apply feature scaling
to the input features.

For deep learning regression models,
feature scaling is **essential in practice**:
the model still runs without it,
but training is usually slower and less stable.


In [4]:
# ====================================
# Feature scaling
# ====================================

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)


### Why we use standardization here

Neural networks are trained
using gradient-based optimization.

Standardization:
- centers features around zero
- ensures comparable variance across features
- improves numerical stability during training

Without proper scaling:
- gradients may explode or vanish
- optimization may fail
- training becomes unstable

At this point:
- data is in NumPy format
- values are ready to be converted into tensors
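
To make the effect of standardization concrete,
here is a minimal NumPy sketch of the same computation.
The two synthetic features and their scales are assumptions chosen for illustration.
The statistics come from the training set only
and are then reused for the test set,
which is what `fit_transform` followed by `transform` does.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two features on very different scales (hypothetical values).
X_tr = rng.normal(loc=[5.0, 3000.0], scale=[1.0, 500.0], size=(200, 2))
X_te = rng.normal(loc=[5.0, 3000.0], scale=[1.0, 500.0], size=(50, 2))

# Statistics are estimated on the training set only ...
mean = X_tr.mean(axis=0)
std = X_tr.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# ... and then reused unchanged for the test set.
X_tr_scaled = (X_tr - mean) / std
X_te_scaled = (X_te - mean) / std

print(X_tr_scaled.mean(axis=0).round(6), X_tr_scaled.std(axis=0).round(6))
print(X_te_scaled.mean(axis=0).round(3), X_te_scaled.std(axis=0).round(3))
```

The training features end up with mean 0 and standard deviation 1.
The test features are only approximately standardized,
which is expected because their statistics were never used.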

In the next section,
we will explain **what the PyTorch model is**
and how neural networks perform regression
at a lower level.


___
## 5. What is this model? (Deep Learning Regression – PyTorch)

Before writing any PyTorch code,
it is important to understand
what the model is conceptually doing.

In regression,
the goal is to predict a **continuous numerical value**
from a set of input features.


### How regression works in a neural network

A neural network for regression:
- receives a vector of input features
- transforms it through multiple layers
- outputs a single real number

The model learns a function:

input vector → numerical output

Unlike linear regression,
this function is **non-linear**
and learned progressively.


### What PyTorch adds conceptually

With PyTorch:
- we define the model explicitly
- we control the forward pass
- we decide how loss is computed
- we trigger each parameter update ourselves (through the optimizer)

This makes PyTorch ideal
for understanding how learning happens,
not just that it happens.


### High-level learning process

Training follows a loop:
1. forward pass (prediction)
2. loss computation (error)
3. backward pass (gradients)
4. parameter update

This loop is repeated
until the model learns a good approximation
of the regression function.
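
The four steps can be sketched on a toy problem.
This is a minimal illustration, not the notebook's model:
the data (`y = 2x + 1`), the single linear layer,
and the use of plain SGD are all assumptions made to keep it short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy problem: y = 2x + 1, learned by a single linear layer.
x = torch.linspace(-1, 1, 32).view(-1, 1)
y = 2 * x + 1

toy_model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(toy_model.parameters(), lr=0.1)

first_loss = None
for step in range(200):
    opt.zero_grad()              # clear gradients from the previous step
    pred = toy_model(x)          # 1. forward pass
    loss = loss_fn(pred, y)      # 2. loss computation
    loss.backward()              # 3. backward pass
    opt.step()                   # 4. parameter update
    if first_loss is None:
        first_loss = loss.item()

final_loss = loss.item()
print(first_loss, final_loss)
```

The call to `zero_grad()` matters
because PyTorch accumulates gradients across `backward()` calls by default.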


### Key takeaway

PyTorch regression models:
- learn non-linear functions
- expose training mechanics explicitly
- behave similarly to scikit-learn models
  when architecture and data are the same
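
For comparison, a roughly equivalent scikit-learn model could look like the sketch below.
The hidden sizes (64 and 32) and the learning rate match the network defined later in this notebook.
The synthetic data is an assumption used only to keep the example self-contained.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 8))          # 8 features, already standardized
y_toy = X_toy[:, 0] - 2.0 * X_toy[:, 1]    # hypothetical target

sk_model = MLPRegressor(
    hidden_layer_sizes=(64, 32),   # same hidden widths as the PyTorch network
    activation="relu",
    solver="adam",
    learning_rate_init=0.001,
    max_iter=100,
    random_state=42,
)
sk_model.fit(X_toy, y_toy)
sk_pred = sk_model.predict(X_toy[:5])
print(sk_pred.shape)
```

The two are not identical out of the box:
by default `MLPRegressor` trains on mini-batches
and applies a small L2 penalty (`alpha=0.0001`),
so results will differ somewhat from a hand-written full-batch loop.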

In the next section,
we will define the neural network architecture
using PyTorch.


___
## 6. Model training (PyTorch Regression)

In this section we define and train
a neural network regressor using PyTorch.

Unlike scikit-learn,
both the model and the training loop
must be written explicitly.


In [5]:
# ====================================
# Model definition
# ====================================

class RegressionNet(nn.Module):
    def __init__(self, input_dim):
        super().__init__()

        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 32)
        self.out = nn.Linear(32, 1)

        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.out(x)
        return x


In [6]:
# ====================================
# Training setup
# ====================================

input_dim = X_train_scaled.shape[1]

model = RegressionNet(input_dim)

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)


In [7]:
# ====================================
# Training loop
# ====================================

X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32).view(-1, 1)

epochs = 100
losses = []

for epoch in range(epochs):
    model.train()

    optimizer.zero_grad()

    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)

    loss.backward()
    optimizer.step()

    losses.append(loss.item())


### What is happening during training

- The model performs a forward pass
- Predictions are compared to true values
- Loss (MSE) measures prediction error
- Gradients are computed automatically
- Parameters are updated using Adam

This explicit loop is the core of PyTorch.
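
Note that the loop above passes the whole training set as one tensor,
so each epoch performs exactly one parameter update (full-batch training).
A common alternative is mini-batch training.
The sketch below shows one way to do it with `DataLoader`.
The synthetic tensors, the batch size of 64, and the 20 epochs are assumptions,
not settings used by this notebook.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-ins for X_train_tensor / y_train_tensor.
X_t = torch.randn(256, 8)
y_t = X_t[:, :1] - 0.5 * X_t[:, 1:2]   # shape (256, 1)

net = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),
)
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(net.parameters(), lr=0.001)

loader = DataLoader(TensorDataset(X_t, y_t), batch_size=64, shuffle=True)

epoch_losses = []
for epoch in range(20):
    net.train()
    running = 0.0
    for xb, yb in loader:            # 256 / 64 = 4 updates per epoch
        opt.zero_grad()
        loss = loss_fn(net(xb), yb)
        loss.backward()
        opt.step()
        running += loss.item() * xb.size(0)
    # Average loss per sample over the epoch.
    epoch_losses.append(running / len(loader.dataset))

print(epoch_losses[0], epoch_losses[-1])
```

With mini-batches the model receives several updates per epoch,
which often reaches a given loss in fewer epochs than full-batch training.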

In the next section,
we analyze model behavior
and key parameters.


___
## 7. Model behavior and key parameters

In this section we analyze
how the PyTorch regression model behaves
and which parameters influence learning.


### Model architecture

The network uses:
- two hidden layers
- ReLU activation
- linear output layer

This allows:
- non-linear feature interactions
- continuous output prediction
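
To get a feel for the size of this architecture,
the sketch below counts its trainable parameters.
It rebuilds the same layer sizes with `nn.Sequential` (a shortcut used only here)
and assumes 8 input features, as in the California housing dataset.

```python
import torch.nn as nn

# Same layer sizes as RegressionNet, with 8 input features.
net = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

# Each Linear layer contributes (inputs * outputs) weights plus one bias per output.
per_tensor = {name: p.numel() for name, p in net.named_parameters()}
total = sum(per_tensor.values())
print(per_tensor)
print(total)  # 576 + 2080 + 33 = 2689
```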


### Loss function

We use Mean Squared Error (MSE):
- penalizes large errors
- standard choice for regression
- differentiable and stable
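
The sketch below computes MSE by hand and with `nn.MSELoss`,
using three made-up predictions.
It also shows why the training cell reshapes the target with `.view(-1, 1)`:
if the target keeps shape `(n,)` while the output has shape `(n, 1)`,
the subtraction broadcasts to an `(n, n)` matrix.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[2.5], [0.0], [2.0]])      # model output, shape (3, 1)
target = torch.tensor([[3.0], [-0.5], [2.0]])   # target reshaped to (3, 1)

manual_mse = ((pred - target) ** 2).mean()
builtin_mse = nn.MSELoss()(pred, target)

# If the target is left with shape (3,), subtraction broadcasts to (3, 3)
# and the averaged value is no longer the per-sample MSE.
flat_target = target.view(-1)
broadcast_shape = tuple((pred - flat_target).shape)

print(manual_mse.item(), builtin_mse.item(), broadcast_shape)
```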


### Optimizer behavior

Adam optimizer:
- adapts learning rates
- speeds up convergence
- works well for most regression tasks


### Training duration

More epochs:
- improve learning initially
- may cause overfitting if excessive

Monitoring loss over epochs
helps diagnose training behavior.
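
One way to monitor training is to record both training and validation loss at every epoch.
The sketch below is an illustration under assumptions:
the synthetic data, the small network, the 240/60 split, and the learning rate
are not the notebook's settings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data split into a training part and a held-out validation part.
X_all = torch.randn(300, 8)
y_all = X_all[:, :1] * 2.0 + 0.1 * torch.randn(300, 1)
X_tr, y_tr = X_all[:240], y_all[:240]
X_val, y_val = X_all[240:], y_all[240:]

net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(net.parameters(), lr=0.01)

train_hist, val_hist = [], []
for epoch in range(100):
    net.train()
    opt.zero_grad()
    loss = loss_fn(net(X_tr), y_tr)
    loss.backward()
    opt.step()
    train_hist.append(loss.item())

    # Validation loss is measured without tracking gradients.
    net.eval()
    with torch.no_grad():
        val_hist.append(loss_fn(net(X_val), y_val).item())

# Epoch with the lowest validation loss.
best_epoch = min(range(len(val_hist)), key=val_hist.__getitem__)
print(best_epoch, val_hist[best_epoch])
```

If the validation loss starts rising while the training loss keeps falling,
that is the usual sign of overfitting.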


### Key takeaway

PyTorch gives full control
over model behavior.

This flexibility allows:
- custom architectures
- precise debugging
- deeper understanding

In the next section,
we will generate predictions
using the trained model.


___
## 8. Predictions

In this section we use the trained PyTorch model
to generate predictions on unseen test data.

Predictions are continuous numerical values.


In [8]:
# ====================================
# Predictions
# ====================================

model.eval()

X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)

with torch.no_grad():
    y_pred_tensor = model(X_test_tensor)

y_pred = y_pred_tensor.numpy().flatten()


### What happens during prediction

- The model is set to evaluation mode
- Gradients are disabled
- The forward pass generates predictions

This ensures:
- no computation graph is built, which saves memory and speeds up inference
- no gradients are accumulated
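
A small check of what these two steps change,
using a throwaway network (an assumption, not the trained model):

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
x = torch.randn(5, 4)

net.eval()                      # switches layers such as Dropout to inference behavior
with torch.no_grad():           # no computation graph is recorded
    pred_no_grad = net(x)

pred_with_grad = net(x)         # outside the context, autograd tracks the output

print(pred_no_grad.requires_grad, pred_with_grad.requires_grad)
print(net.training)
```

The predicted values are the same in both cases.
For a network without Dropout or BatchNorm, such as the one in this notebook,
`eval()` does not change the output, but calling it is still a good habit.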


### What we have now

At this point:
- `y_test` contains true target values
- `y_pred` contains predicted values

These will be compared
using regression metrics
in the next section.


___
## 9. Model evaluation

In this section we evaluate the performance
of the PyTorch regression model
on unseen test data.

For regression problems,
evaluation focuses on **prediction error**
and **quality of fit**.


In [9]:
# ====================================
# Regression evaluation metrics
# ====================================

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

mse, rmse, mae, r2


(0.7457036463188563,
 np.float64(0.8635413402488942),
 0.626965656223535,
 0.4309382347792724)

### How to read these results

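- **MSE**  
  Averages the squared errors,
  so it is expressed in squared target units.

- **RMSE**  
  Square root of the MSE.
  Measures the typical prediction error
  in the same unit as the target variable.

- **MAE**  
  Measures the average absolute error
  and is less sensitive to outliers than RMSE.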

- **R² score**  
  Measures how much of the variance
  in the target is explained by the model.

These metrics together provide
a complete view of regression performance.
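
To see how these metrics are defined,
the sketch below computes them by hand with NumPy.
The four target and prediction values are made up for illustration.

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # hypothetical targets
y_hat = np.array([2.5, 0.0, 2.0, 8.0])     # hypothetical predictions

errors = y_true - y_hat
mse_val = np.mean(errors ** 2)
rmse_val = np.sqrt(mse_val)
mae_val = np.mean(np.abs(errors))

# R^2 compares the model's squared error with that of always predicting the mean.
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_val = 1.0 - ss_res / ss_tot

print(mse_val, rmse_val, mae_val, r2_val)
```

For these values the MSE is 0.375, the MAE is 0.5, and R² is about 0.949.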


### Key takeaway

Evaluation metrics must always be computed
on unseen data.

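A non-zero RMSE is expected on any real dataset.
On its own it does not show whether the model generalizes:
for that, compare the test error with the training error.
Here, an R² of about 0.43 means that more than half
of the target variance is still unexplained.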


___
## 10. When to use it and when not to

Deep Learning regression with PyTorch
is powerful but not always necessary.

Choosing this approach depends on
problem complexity and practical constraints.


### When to use PyTorch for regression

PyTorch regression is a good choice when:

- relationships are highly non-linear
- model architecture must be customized
- training dynamics need full control
- experimentation and research are required

It is especially useful
when building models from scratch
or exploring novel architectures.


### When NOT to use PyTorch for regression

PyTorch may not be ideal when:

- the dataset is small
- the problem is simple
- rapid prototyping is needed
- interpretability is critical

In these cases,
simpler models or scikit-learn
are often more efficient.


### Key takeaway

PyTorch offers maximum flexibility
at the cost of increased complexity.

It should be chosen
when control and understanding
are more important than convenience.


___
## 11. Model persistence

In this section we save the trained PyTorch model
and the preprocessing steps
used during training.


In [None]:
# ====================================
# Model persistence
# ====================================

model_dir = Path("models/supervised_learning/regression/deep_learning_pytorch")
model_dir.mkdir(parents=True, exist_ok=True)

# Save model state
torch.save(model.state_dict(), model_dir / "pytorch_regression_model.pt")

# Save scaler
joblib.dump(scaler, model_dir / "scaler.joblib")


### What we have saved

We saved:
- the trained PyTorch model parameters
- the feature scaler

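Together, these represent
the complete regression pipeline.

Note that `state_dict()` stores only the parameters:
the `RegressionNet` class definition
must be available when the model is loaded again.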


### Why saving the scaler matters

Neural networks are sensitive
to feature scaling.

Using a different scaler
would lead to inconsistent predictions.

Saving the scaler ensures
reproducibility and correctness.
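
Loading follows the reverse path.
The sketch below is a self-contained round trip under assumptions:
it uses a temporary directory instead of the project path,
an untrained network, and 8 input features.
The scaler can be restored in the same spirit with `joblib.load`.

```python
import tempfile
from pathlib import Path

import torch
import torch.nn as nn


class RegressionNet(nn.Module):
    """Same architecture as the notebook's model; needed again at load time."""

    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 32)
        self.out = nn.Linear(32, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.out(x)


original = RegressionNet(input_dim=8)
x = torch.randn(3, 8)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pytorch_regression_model.pt"
    torch.save(original.state_dict(), path)

    # Rebuild the architecture first, then load the saved parameters into it.
    restored = RegressionNet(input_dim=8)
    restored.load_state_dict(torch.load(path, map_location="cpu"))
    restored.eval()

with torch.no_grad():
    same = torch.allclose(original(x), restored(x))
print(same)
```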


___
## 12. Mathematical formulation (deep dive)

This section describes the mathematical principles
behind deep learning regression
implemented in PyTorch.


### Regression objective

The dataset is represented as:

$$
\{(x_i, y_i)\}_{i=1}^n
$$

where:
- $x_i \in \mathbb{R}^d$
- $y_i \in \mathbb{R}$


### Model as a function

The neural network learns a function:

$$
\hat{y} = f(x; \theta)
$$

where:
- $\theta$ represents weights and biases
- $\hat{y}$ is the predicted value


### Layer transformations

Each hidden layer computes:

$$
h = \text{ReLU}(Wx + b)
$$

The output layer is linear:

$$
\hat{y} = W_{\text{out}} h + b_{\text{out}}
$$
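
These two formulas can be traced by hand on a tiny network.
The weights, biases, and input below are hypothetical values
chosen so the arithmetic is easy to follow.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical weights for a 2 -> 2 -> 1 network.
W = np.array([[1.0, -1.0],
              [0.5,  2.0]])
b = np.array([0.0, -1.0])
W_out = np.array([[2.0, -1.0]])
b_out = np.array([0.5])

x = np.array([1.0, 2.0])

h = relu(W @ x + b)          # hidden layer: ReLU(Wx + b)
y_hat = W_out @ h + b_out    # linear output layer
print(h, y_hat)
```

Here `Wx + b` is `[-1.0, 3.5]`, ReLU zeroes the negative entry,
and the output is `2.0 * 0.0 - 1.0 * 3.5 + 0.5 = -3.0`.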


### Loss function

Training minimizes Mean Squared Error:

$$
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$


### Optimization

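Gradients are computed via backpropagation.
In its simplest form (plain gradient descent),
the parameter update is:

$$
\theta \leftarrow \theta - \eta \nabla_\theta MSE
$$

where $\eta$ is the learning rate.
Adam, the optimizer used in this notebook,
builds on this rule by adapting the step size of each parameter
using running averages of the gradient and of its square.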
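
For reference, the Adam update is commonly written as shown below.
The symbols $g_t$, $m_t$, $v_t$, $\beta_1$, $\beta_2$ and $\epsilon$
are introduced only for this sketch.
In `torch.optim.Adam` the defaults are $\beta_1 = 0.9$, $\beta_2 = 0.999$ and $\epsilon = 10^{-8}$.

$$
\begin{aligned}
g_t &= \nabla_\theta MSE(\theta_{t-1}) \\
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$

Compared with the plain gradient step,
the gradient is replaced by a smoothed average $\hat{m}_t$,
and each parameter's step is divided by $\sqrt{\hat{v}_t}$,
so parameters with consistently large gradients take smaller steps.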


### Final takeaway

Deep learning regression can be viewed
as non-linear function approximation
optimized via gradient descent.

PyTorch exposes these mechanisms explicitly,
making learning transparent and flexible.


___
## 13. Final summary – Code only

The following cell contains the complete
PyTorch regression pipeline.

No explanations are provided here.


In [None]:
# ====================================
# Imports
# ====================================

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

from pathlib import Path
import joblib


# ====================================
# Dataset loading
# ====================================

data = fetch_california_housing(as_frame=True)
X = data.data
y = data.target


# ====================================
# Train-test split
# ====================================

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


# ====================================
# Feature scaling
# ====================================

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)


# ====================================
# Model definition
# ====================================

class RegressionNet(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 32)
        self.out = nn.Linear(32, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.out(x)


input_dim = X_train_scaled.shape[1]
model = RegressionNet(input_dim)


# ====================================
# Training setup
# ====================================

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32).view(-1, 1)


# ====================================
# Training loop
# ====================================

epochs = 100

for _ in range(epochs):
    optimizer.zero_grad()
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)
    loss.backward()
    optimizer.step()


# ====================================
# Predictions
# ====================================

model.eval()
X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)

with torch.no_grad():
    y_pred = model(X_test_tensor).numpy().flatten()


# ====================================
# Evaluation
# ====================================

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

mse, rmse, mae, r2


# ====================================
# Model persistence
# ====================================

model_dir = Path("models/supervised_learning/regression/deep_learning_pytorch")
model_dir.mkdir(parents=True, exist_ok=True)

torch.save(model.state_dict(), model_dir / "pytorch_regression_model.pt")
joblib.dump(scaler, model_dir / "scaler.joblib")
