# Deep Learning – Regression (scikit-learn)

This notebook is part of the **ML-Methods** project.

It introduces **Deep Learning for supervised regression**
using the scikit-learn implementation of neural networks.

As with all other notebooks in this project,
the initial sections focus on data preparation
and are intentionally repeated.

This ensures:
- conceptual consistency
- fair comparison across models
- a unified learning pipeline


___

## Notebook Roadmap (standard ML-Methods)

1. Project setup and common pipeline  
2. Dataset loading  
3. Train-test split  
4. Feature scaling (why we do it)  

----------------------------------

5. What is this model? (Intuition)  
6. Model training  
7. Model behavior and key parameters  
8. Predictions  
9. Model evaluation  
10. When to use it and when not to  
11. Model persistence  
12. Mathematical formulation (deep dive)  
13. Final summary – Code only


___
## How this notebook should be read

This notebook is designed to be read **top to bottom**.

Before every code cell, you will find a short explanation describing:
- what we are about to do
- why this step is necessary
- how it fits into the overall process

The goal is not just to run the code,
but to understand how **deep learning regression**
differs from classical regression models
and how it fits into the supervised learning pipeline.


___
## What is Deep Learning (in this context)?

In this notebook,
Deep Learning refers to **neural networks with multiple layers**
used to solve **regression problems**.

Unlike classification:
- the target is a continuous value
- there are no class labels
- the model predicts a real number

The neural network learns a function:

input features → continuous output


___
## What do we want to achieve?

Our objective is to train a model that:
- receives a vector of numerical features
- processes them through multiple layers
- outputs a single continuous value

The model learns how combinations of input features
map to a numerical target.

This is useful when:
- relationships are non-linear
- classical linear regression is insufficient
- feature interactions are complex
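
To make the "features → layers → single value" idea concrete, here is a minimal NumPy sketch of one forward pass. It is an illustration under stated assumptions, not how scikit-learn is implemented internally: the layer sizes, the random weights, and the helper names `relu` and `forward` are all hypothetical. It assumes ReLU hidden layers and a linear output unit, which is the usual setup for regression.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    """Element-wise ReLU non-linearity used in the hidden layers."""
    return np.maximum(z, 0.0)


def forward(x, params):
    """Pass one feature vector through the hidden layers and a linear output unit."""
    a = x
    for W, b in params[:-1]:
        a = relu(a @ W + b)  # hidden layer: affine map followed by ReLU
    W_out, b_out = params[-1]
    # Output layer: affine map only (no activation), so any real value is possible
    return (a @ W_out + b_out).item()


# Hypothetical architecture: 8 input features -> 16 -> 8 -> 1 output
sizes = [8, 16, 8, 1]
params = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

x = rng.normal(size=8)       # one sample with 8 numerical features
y_hat = forward(x, params)   # a single continuous prediction
print(y_hat)
```

The weights here are random, so the prediction is meaningless; training is what adjusts them so that the output approaches the target.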


___
## Why use scikit-learn for Deep Learning regression?

scikit-learn provides a high-level abstraction
for neural networks through `MLPRegressor`.

This allows us to:
- reuse the same ML pipeline as classical models
- focus on concepts rather than low-level training details
- understand *what* deep learning regression does
  before implementing it manually

This notebook acts as a **bridge**
between classical regression
and full deep learning frameworks
such as PyTorch and TensorFlow.
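
The "same pipeline" point can be sketched as follows: a linear model and `MLPRegressor` are trained and evaluated through the identical `fit` / `predict` calls. The synthetic quadratic data, the variable names, and the hyperparameters (`hidden_layer_sizes=(32, 32)`, `max_iter=2000`) are assumptions chosen only for illustration; they are not the settings used later in this notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic, deliberately non-linear data: the target depends on x0 squared
rng = np.random.default_rng(0)
X_demo = rng.uniform(-2.0, 2.0, size=(600, 2))
y_demo = X_demo[:, 0] ** 2 + 0.05 * rng.normal(size=600)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Both estimators share the same scikit-learn interface
models = {
    "linear": LinearRegression(),
    "mlp": MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(Xd_train, yd_train)
    scores[name] = r2_score(yd_test, model.predict(Xd_test))

print(scores)
```

Only the estimator object changes between the two runs; the split, the training call, and the metric stay the same. On data like this, the linear model cannot capture the quadratic term, while the network can.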


___
## What you should expect from the results

With a scikit-learn neural network regressor,
you should expect:

- ability to model non-linear relationships
- often better performance than linear models on complex patterns
- sensitivity to feature scaling
- longer training times than linear models

However:
- interpretability is low
- hyperparameter tuning is important
- the model behaves as a black box


___
## 1. Project setup and common pipeline

In this section we set up the common pipeline
used across regression models in this project.

Although this notebook uses a neural network,
the data preparation steps
remain identical to other regression approaches.


In [1]:
# ====================================
# Common imports used across regression models
# ====================================

import numpy as np
import pandas as pd

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    r2_score
)

from pathlib import Path
import joblib
import matplotlib.pyplot as plt


### What changes compared to classical regression

Compared to linear regression:
- the target remains continuous
- the pipeline remains identical
- the model becomes non-linear

The main difference lies in the model itself,
not in the surrounding workflow.

In the next section,
we will load the dataset
used for the regression task.


___
## 2. Dataset loading

In this section we load the dataset
used for the deep learning regression task.

We use a **regression-specific dataset**
with a continuous target variable,
which allows us to evaluate how neural networks
behave when predicting real-valued outputs.


In [2]:
# ====================================
# Dataset loading
# ====================================

data = fetch_california_housing(as_frame=True)

X = data.data
y = data.target


### Inputs and target

- `X` contains the input features
- `y` contains the target variable

This is a **supervised regression problem**:
- each sample has a continuous target value
- the goal is to predict a real number, not a class

### Why this dataset

The California Housing dataset is well suited
for regression because:
- it contains numerical features
- several feature-target relationships are non-linear
- target values are continuous

This makes it a good benchmark
for comparing classical regression
and deep learning regression models.

At this stage:
- data is still in pandas format
- no preprocessing has been applied yet

In the next section,
we will split the dataset
into training and test sets.


___
## 3. Train-test split

In this section we split the dataset
into training and test sets.

This step allows us to evaluate
how well the neural network regressor
generalizes to unseen data.


In [3]:
# ====================================
# Train-test split
# ====================================

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)


### Why this step is essential

A regression model must be evaluated
on data it has never seen during training.

By splitting the data:
- the training set is used to learn the mapping
  from features to target values
- the test set is used only for evaluation

This prevents overly optimistic results
and reflects real-world performance.

### Choice of split ratio

An 80 / 20 split is a common default:
- enough data to train the model
- enough data to reliably evaluate predictions

For the 20,640 samples in this dataset,
this gives 16,512 training samples and 4,128 test samples.
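
As a quick sanity check of what `test_size=0.2` and `random_state=42` do, the sketch below splits a small stand-in array instead of the real dataset. The array contents and the variable names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 1000 samples, 2 features, target equal to the row index
X_demo = np.arange(2000).reshape(1000, 2)
y_demo = np.arange(1000)

Xa_train, Xa_test, ya_train, ya_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(Xa_train.shape, Xa_test.shape)  # (800, 2) (200, 2)

# No sample appears in both sets
no_overlap = set(ya_train.tolist()).isdisjoint(ya_test.tolist())
print(no_overlap)

# The same random_state reproduces exactly the same split
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
same_split = np.array_equal(Xa_test, Xb_test)
print(same_split)
```

Fixing `random_state` matters when comparing models across notebooks: every model is then evaluated on the same held-out samples.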

At this point:
- training and test data are separated
- no preprocessing has been applied yet

In the next section,
we will apply **feature scaling**.

For deep learning regression,
this step is **effectively mandatory**.


___
## 4. Feature scaling (why we do it)

In this section we apply feature scaling
to the input data.

For deep learning regression models,
feature scaling is **effectively mandatory**:
the model will run without it,
but training is usually slow or unstable.

Neural networks are trained using gradient-based optimization,
which is highly sensitive to the scale of input features.


In [4]:
# ====================================
# Feature scaling
# ====================================

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)


### Why we use standardization here

We use **standardization** for feature scaling
because neural networks rely on gradients
to update their parameters.

Standardization:
- centers features around zero
- ensures comparable variance across features
- improves numerical stability during training
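
The effect of standardization can be checked numerically. The sketch below uses two synthetic features on very different scales (the data and variable names are hypothetical). It shows that the scaler's statistics come from the training set only, and that the result matches the formula `(x - mean) / std`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two synthetic features on very different scales
train = np.column_stack([rng.normal(50_000, 15_000, 800), rng.normal(3, 1, 800)])
test = np.column_stack([rng.normal(50_000, 15_000, 200), rng.normal(3, 1, 200)])

scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(train)  # learn mean/std on training data
test_scaled = scaler_demo.transform(test)        # reuse the training statistics

print(train_scaled.mean(axis=0))  # approximately 0 for each feature
print(train_scaled.std(axis=0))   # approximately 1 for each feature
print(test_scaled.mean(axis=0))   # close to 0, but not exactly

# Equivalent manual computation with per-feature training mean and std
manual = (train - train.mean(axis=0)) / train.std(axis=0)
matches_manual = np.allclose(manual, train_scaled)
print(matches_manual)
```

The test set is close to, but not exactly, zero mean and unit variance. That is expected: it is transformed with the training statistics so that no information from the test set leaks into preprocessing.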

As a result:
- gradients behave more predictably
- optimization usually converges faster
- training remains stable across layers

At this stage:
- the features are numerically ready
- the scaled features are NumPy arrays, no longer pandas DataFrames

In the next section,
we will explain **what this model is**
and how a neural network performs **regression**
using scikit-learn.
