# Deep Learning – Classification (PyTorch)

This notebook is part of the **ML-Methods** project.

It introduces **Deep Learning for supervised classification**
using **PyTorch**, a low-level and flexible deep learning framework.

As with the other classification notebooks,
the first sections focus on data preparation
and are intentionally repeated.

This ensures consistency across models
and allows fair comparison of results.

-----------------------------------------------------

## Notebook Roadmap (standard ML-Methods)

1. Project setup and common pipeline  
2. Dataset loading  
3. Train-test split  
4. Feature scaling (why we do it)  

----------------------------------

5. What is this model? (Intuition)  
6. Model training  
7. Model behavior and key parameters  
8. Predictions  
9. Model evaluation  
10. When to use it and when not to  
11. Model persistence  
12. Mathematical formulation (deep dive)  
13. Final summary – Code only  

-----------------------------------------------------

## How this notebook should be read

This notebook is designed to be read **top to bottom**.

Before every code cell, you will find a short explanation describing:
- what we are about to do
- why this step is necessary
- how it fits into the overall process

Compared to scikit-learn,
this notebook exposes **more internal details**
of how a Deep Learning model is trained.

The goal is not only to run the code,
but to understand **what happens during training**
and how neural networks learn step by step.

-----------------------------------------------------

## What is Deep Learning (in this context)?

Deep Learning refers to a class of models
based on **neural networks with multiple layers**.

These models are designed to:
- learn complex, non-linear relationships
- build internal representations of the data
- remain effective as data complexity increases

In this notebook, we focus on:
**Deep Learning for tabular classification**
using fully connected neural networks.

-----------------------------------------------------

## Why PyTorch?

PyTorch is a **low-level deep learning framework**
that provides explicit control over:

- model architecture
- forward pass
- loss computation
- backpropagation
- parameter updates

Unlike scikit-learn:
- the training loop is not hidden behind a single `fit()` call
- every step must be written and called explicitly

This makes PyTorch ideal for:
- learning how neural networks actually work
- understanding gradient-based optimization
- experimenting with custom architectures
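
As a minimal sketch of what "explicit" means here, the snippet below performs one gradient-descent step on a single parameter. The toy loss and the learning rate `lr` are assumptions chosen for illustration and are not part of this notebook's pipeline. Note that PyTorch's autograd still computes the gradient for us; what we write by hand is when it is computed and how it is applied.

```python
import torch

# One learnable parameter, tracked by autograd
w = torch.tensor(2.0, requires_grad=True)

# Toy loss (assumption for illustration): (3w - 1)^2
loss = (3.0 * w - 1.0) ** 2

# Backpropagation: autograd fills w.grad with d(loss)/dw
loss.backward()
grad = w.grad.item()  # 2 * (3w - 1) * 3 = 30 at w = 2

# Manual parameter update (plain gradient descent)
lr = 0.01  # hypothetical learning rate
with torch.no_grad():
    w -= lr * w.grad

print(grad, w.item())
```

In practice an optimizer from `torch.optim` usually replaces the manual update, but the sequence of steps stays the same.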

-----------------------------------------------------

## Execution model: eager execution

PyTorch uses **eager execution** by default.

This means:
- operations are executed immediately
- tensors behave like regular Python objects
- debugging is straightforward

Eager execution makes PyTorch:
- intuitive to learn
- flexible to experiment with
- closer to the mathematical description of the model
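
The short sketch below illustrates eager execution with made-up values: each line produces a concrete tensor that can be printed or used in ordinary Python control flow, without building a separate graph first.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])

# Each operation runs immediately and returns a concrete tensor
y = x * 2 + 1
print(y)  # the values can be inspected right away

# Ordinary Python control flow works on tensor values
if y.sum() > 10:
    y = y / y.sum()  # normalize so the entries sum to 1

print(y.tolist())
```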

-----------------------------------------------------

## What you should expect from the results

With Deep Learning (PyTorch), you should expect:

- non-linear decision boundaries
- strong performance on complex data
- behavior similar to scikit-learn neural networks
- higher transparency during training

However:
- more code is required
- implementation errors are easier to make
- careful design is necessary

-----------------------------------------------------


## 1. Project setup and common pipeline

In this section we set up the common pipeline
used across classification models in this project.

Although this notebook uses **PyTorch**,
the overall workflow remains identical
to the scikit-learn Deep Learning notebook.

This allows us to:
- reuse the same data preparation steps
- compare models fairly
- isolate the effect of the framework choice


In [1]:
# Common imports used across classification models

import numpy as np
import pandas as pd

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    classification_report,
    ConfusionMatrixDisplay
)

from pathlib import Path
import matplotlib.pyplot as plt

# ====================================
# PyTorch imports
# ====================================

import torch
import torch.nn as nn
import torch.optim as optim


### What changes with PyTorch

Compared to scikit-learn:
- the pipeline structure remains the same
- data preparation and evaluation stay unchanged
- only the model implementation differs

With PyTorch, we explicitly define:
- how the model processes the input
- how the loss is computed
- how parameters are updated

Very little is hidden.

Every step of the learning process
is called explicitly in our own code,
while PyTorch's autograd computes the gradients.

This makes PyTorch ideal
for understanding what neural networks
are actually doing during training.

In the next section,
we will load the dataset
and prepare it for PyTorch training.


____________
## 2. Dataset loading

In this section we load the dataset
used for the Deep Learning classification task.

We intentionally use the **same dataset**
adopted in previous classification notebooks.

This ensures:
- direct comparison with classical ML models
- fair comparison across deep learning frameworks
- focus on implementation differences, not on data


In [2]:
# ====================================
# Dataset loading
# ====================================

data = load_breast_cancer(as_frame=True)

X = data.data
y = data.target


### What we have after this step

- `X` contains the input features
- `y` contains the target labels

This is a **binary classification problem**,
where each sample belongs to one of two classes.

At this stage:
- data is still in pandas format (`X` is a `DataFrame`, `y` is a `Series`)
- this is intentional for consistency
- conversion to PyTorch tensors will happen later
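
A quick way to confirm what was loaded is to inspect the shapes and the class balance. The sketch below reloads the dataset so that it runs on its own; the attribute names come from scikit-learn's returned `Bunch` object.

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

print(X.shape)                  # (n_samples, n_features)
print(y.value_counts())         # number of samples per class label
print(list(data.target_names))  # class names for labels 0 and 1
```

The two classes are not perfectly balanced, which is worth keeping in mind when reading accuracy later on.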

In the next section,
we will split the dataset
into training and test sets.


_______________
## 3. Train-test split

In this section we split the dataset
into training and test sets.

This step allows us to evaluate
how well the neural network generalizes
to unseen data.


In [3]:
# ====================================
# Train-test split
# ====================================

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)


### What we have after this step

After splitting the data:
- the training set is used to learn model parameters
- the test set is kept completely unseen
- evaluation on the test set estimates performance on new data

An 80 / 20 split is a common default
for medium-sized datasets.
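
The sketch below repeats the split in a self-contained form and checks the resulting sizes and class proportions. It uses the same `test_size` and `random_state` as the cell above; everything else is only for inspection.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)

# Fraction of class 1 in each split (may differ slightly without stratification)
print(y_train.mean(), y_test.mean())
```

If the class proportions need to match exactly across splits, `train_test_split` also accepts `stratify=y`. This notebook keeps the unstratified split so that results stay comparable with the other notebooks.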

In the next section,
we will apply **feature scaling**.

For Deep Learning models,
this step is **essential in practice**.


_____________
## 4. Feature scaling (why we do it)

In this section we apply feature scaling
to the input data.

For Deep Learning models,
feature scaling is **essential in practice**.

Neural networks rely on gradient-based optimization,
which is highly sensitive to feature scale.


In [4]:
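# ====================================
# Feature scaling
# ====================================

scaler = StandardScaler()

# Fit on the training set only, then reuse the same statistics for the test set
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)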


### Why we use standardization here

We use **standardization** for feature scaling
because neural networks are trained
using gradient-based optimization.

Standardization:
- centers features around zero
- ensures comparable feature variance
- improves numerical stability during training

This helps:
- gradients behave more predictably
- optimization converge faster
- training remain stable across layers

For Deep Learning models,
this preprocessing choice is part of the model design,
not just a data transformation.
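
To make the transformation concrete, the sketch below compares `StandardScaler` with the same computation done by hand, `z = (x - mean) / std` per feature. The toy array is an assumption chosen so that the two columns live on very different scales.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (assumption for illustration): two features on very different scales
X_toy = np.array([[1.0, 1000.0],
                  [2.0, 2000.0],
                  [3.0, 3000.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_toy)

# Same computation by hand: z = (x - mean) / std, column by column
X_manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(np.allclose(X_scaled, X_manual))
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates the weighted sums inside the network.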


### Why scaling is essential here

Without proper scaling:
- gradients may vanish or explode
- optimization becomes unstable
- training may converge very slowly or fail

By scaling the features:
- all inputs are brought to a comparable range
- gradient descent becomes more stable
- learning is faster and more reliable

At this point:
- data is numerically ready
- but still in NumPy format

In the next section,
we will explain **what this model is**
and how a neural network performs classification in PyTorch.


____________
## 5. What is this model? (Deep Learning with PyTorch)

Before writing any PyTorch code,
it is important to understand
**what the model does in practice**
and **what we are manually controlling**.

PyTorch does not hide the learning process.
Everything that happens during training
is explicitly written by us.


### What do we want to achieve?

We want to build a model that:
- receives a vector of input features
- processes them through multiple transformations
- outputs a class prediction

Each input sample can be seen as:
- a list of numerical measurements
- describing a single object
- represented as a vector

The goal of the model is to learn
how to transform this vector
into the correct class label.


### What does the model do, step by step?

A neural network for classification performs
the following operations:

1. Take the input feature vector  
2. Multiply it by learnable weights  
3. Add a bias term  
4. Apply a non-linear function  
5. Repeat this process across multiple layers  
6. Produce an output used to decide the class  

Each step is simple.
The power comes from repeating them
many times in sequence.
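
The steps above can be sketched with plain tensors, before any `nn.Module` is involved. The hidden size, the random weights, and the random input are assumptions for illustration; only the input size of 30 matches the dataset used in this notebook.

```python
import torch

torch.manual_seed(0)

n_features, n_hidden = 30, 16  # 16 is a hypothetical hidden size

# 1. One input feature vector (random stand-in for a scaled sample)
x = torch.randn(1, n_features)

# Learnable parameters, here just randomly initialized
W1 = torch.randn(n_features, n_hidden) * 0.1
b1 = torch.zeros(n_hidden)
W2 = torch.randn(n_hidden, 1) * 0.1
b2 = torch.zeros(1)

# 2-4. Multiply by weights, add bias, apply a non-linearity
h = torch.relu(x @ W1 + b1)

# 5. The next layer repeats the same pattern on the previous output
logit = h @ W2 + b2

# 6. Turn the output into a class decision
prob = torch.sigmoid(logit)
pred = (prob > 0.5).int()

print(prob.item(), pred.item())
```

With untrained weights the prediction is meaningless; training is what makes these same operations useful.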


### What is a neuron, technically?

A neuron is a very simple computational unit.

It answers one basic question:
> *Is a specific pattern present in the input?*

Technically, a neuron:
- combines input features linearly
- applies a non-linear transformation
- outputs a single value

During training,
the neuron learns which patterns matter
by adjusting its weights.
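
As a worked example, the computation of a single neuron fits in a few lines of plain Python. The inputs, weights, and bias are made-up values, and the sigmoid is just one possible choice of non-linearity.

```python
import math

x = [2.0, 1.0]   # two input features (made-up values)
w = [0.5, -1.0]  # hypothetical weights
b = 0.1          # hypothetical bias

# Linear combination: 0.5*2 + (-1)*1 + 0.1 = 0.1
z = sum(wi * xi for wi, xi in zip(w, x)) + b

# Non-linear transformation (sigmoid) squashes z into (0, 1)
a = 1.0 / (1.0 + math.exp(-z))

print(round(z, 3), round(a, 3))
```

Changing the weights changes which input patterns push the output toward 1, which is exactly what training adjusts.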


### Why multiple layers?

A single layer can only learn
simple patterns, such as a linear decision boundary.

By stacking layers:
- early layers learn basic feature combinations
- deeper layers learn more abstract patterns
- the final layer focuses on class separation

Each layer builds on the output
of the previous one.

This is how the model gradually constructs
a useful internal representation of the data.
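
A stacked network of this kind might look like the sketch below. The layer widths (16 and 8) are hypothetical and not necessarily the architecture defined later in this notebook; the input size of 30 matches the number of features in the dataset.

```python
import torch
import torch.nn as nn

# Layer sizes 16 and 8 are hypothetical choices for illustration
model = nn.Sequential(
    nn.Linear(30, 16), nn.ReLU(),  # early layer: basic feature combinations
    nn.Linear(16, 8), nn.ReLU(),   # deeper layer: more abstract patterns
    nn.Linear(8, 1),               # final layer: one logit for class separation
)

# Count all learnable weights and biases
n_params = sum(p.numel() for p in model.parameters())
print(n_params)

# A batch of 4 random samples flows through every layer in order
batch = torch.randn(4, 30)
logits = model(batch)
print(logits.shape)
```

The `ReLU` between the linear layers matters: without a non-linearity, stacked linear layers would collapse into a single linear transformation.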


### What makes PyTorch different here?

With PyTorch:
- we explicitly define the network structure
- we manually control the training loop
- we decide how loss is computed
- we decide how parameters are updated

Gradients are computed automatically by autograd,
but every step is triggered explicitly by our code.

This allows us to:
- see how predictions are produced
- understand how errors drive learning
- connect code directly to theory


### How learning happens conceptually

Learning follows a simple cycle:

1. The model makes a prediction  
2. The prediction is compared to the true label  
3. An error value is computed  
4. The model parameters are adjusted  
5. The process repeats  

Over many iterations, this improves
the alignment between predictions
and true labels,
even if an individual step occasionally makes it worse.

This gradual improvement
is what we call training.
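
The cycle above maps almost line by line onto a PyTorch training loop. The sketch below uses synthetic stand-in data and a small hypothetical network; the loss function, optimizer, learning rate, and number of steps are assumptions for illustration, not the settings used later in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in data (assumption): 64 samples, 4 features, binary labels
X = torch.randn(64, 4)
y = (X[:, 0] + X[:, 1] > 0).float().unsqueeze(1)

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(100):
    logits = model(X)             # 1. the model makes a prediction
    loss = criterion(logits, y)   # 2-3. compare with true labels, compute the error
    optimizer.zero_grad()         # clear gradients left over from the previous step
    loss.backward()               # backpropagation: compute gradients
    optimizer.step()              # 4. adjust the model parameters
    losses.append(loss.item())    # 5. record the loss and repeat

print(losses[0], losses[-1])
```

Comparing the first and last recorded loss gives a quick sanity check that the parameters are moving in a useful direction.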


### Key takeaway

A PyTorch neural network classifier:
- processes data step by step
- learns by repeatedly correcting its mistakes
- builds internal representations through layers
- requires explicit definition of every step

PyTorch forces us to understand
**how learning actually happens**,
not just what the final result is.

In the next section,
we will implement this process manually
by defining the model
and writing the training loop.
