<div style="text-align: center;">
    <a target="_blank" href="https://colab.research.google.com/github/bmalcover/cursSocib/blob/main/2_AA/2_1_Tabular.ipynb">
        <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
    </a>
</div>

# Deep Learning

In this chapter we start working with `PyTorch`, which lets us build more complex models, mainly deep neural networks. `PyTorch` is an open-source machine learning library written in Python, C++, and CUDA, and it is based on the Torch library, which was written in Lua. It was initially developed by Facebook's artificial intelligence research lab (FAIR, now part of Meta AI).

It is far more flexible than `scikit-learn`, and that flexibility also makes it more complex to use.

### Key Differences (``PyTorch`` vs ``Scikit-learn``):

| Aspect                    | PyTorch                              | Scikit-learn                                |
|--------------------------|---------------------------------------|---------------------------------------------|
| Flexibility              | Very high (custom models/layers)      | Limited to standard configurations          |
| Model Definition         | Manual using classes and layers       | High-level API           |
| Training Loop            | You write your own loop               | Handled internally                          |
| Control over Forward Pass| Full control                          | Not customizable                            |
| Good for                 | Research, experimentation, deep nets | Quick prototyping, small tasks              |

---

## A classification example

We will get to know this library by working with a dataset you have already used: the **Titanic** dataset. You have already cleaned this dataset, so we will work with a clean version of it.


In [None]:
import torch
import pandas as pd
from sklearn.model_selection import train_test_split

In [None]:
df = pd.read_csv("../titanic/train.csv")

## Data cleaning

In [None]:
df.info()

First we convert the categorical columns to *one-hot encoding*.

In [None]:
dummy_fields=['Pclass', 'Sex', 'Embarked']
for each in dummy_fields:
    dummies= pd.get_dummies(df[each], prefix= each, drop_first=False)
    df = pd.concat([df, dummies], axis=1)
df.head()

Then we remove the columns that carry no useful information for training (**Name**, **PassengerId**, **Ticket**, and **Cabin**), as well as the original columns that have already been one-hot encoded.

In [None]:
df = df.drop(["Name", "PassengerId", "Ticket", "Cabin", "Pclass", "Sex", "Embarked"], axis=1)

In [None]:
to_normalize=['Age','Fare']
for each in to_normalize:
    mean, std= df[each].mean(), df[each].std()
    df.loc[:, each]=(df[each]-mean)/std

df.head()

We check whether there are any NaNs and, if so, remove the affected rows.

In [None]:
if df.isnull().values.any():
    df = df.dropna()
    print("NaNs removed")

In [None]:
df.info()

In [None]:
for series_name in df.columns:
    if df[series_name].dtype == bool:
        df[series_name] = df[series_name].astype('int')

In [None]:
X = df
y = df.pop("Survived")

# Important: a PyTorch model works with tensors, so we convert the data to float32 tensors

X_tensor = torch.tensor(X.values, dtype=torch.float32)
y_tensor = torch.tensor(y.values, dtype=torch.float32)

X_train, X_test, y_train, y_test = train_test_split(X_tensor, y_tensor, test_size=0.33, random_state=42)

## Training a model with Pytorch

Neural networks for classification problems often use many layers. However, for tabular data we can use a simple model with a single type of layer and an activation function.

### Dense (Fully Connected) Layers

A **dense layer**—also called a **fully connected (FC) layer**—is a fundamental building block in neural networks. It performs a linear transformation of the input data.

Mathematically, a dense layer computes:

$$
\mathbf{y} = \mathbf{W} \cdot \mathbf{x} + \mathbf{b}
$$

Where:
- $\mathbf{x}$ is the input vector (e.g., the feature values of one passenger),
- $\mathbf{W}$ is the weight matrix (learned parameters),
- $\mathbf{b}$ is the bias vector (also learned),
- $\mathbf{y}$ is the output vector.

Each neuron in a dense layer is connected to **every** neuron in the previous layer, hence the term *fully connected*.
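
As a small check of the formula above, the sketch below compares `nn.Linear` with the same computation written by hand. The layer sizes (4 inputs, 3 outputs) and the random batch are arbitrary values chosen for illustration. Note that PyTorch stores $\mathbf{W}$ with shape `(out_features, in_features)` and processes one sample per row, so for a batch the formula is written as `x @ W.T + b`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible random weights and inputs

# A dense layer mapping 4 input features to 3 outputs (illustrative sizes).
layer = nn.Linear(in_features=4, out_features=3)

# A batch of 2 samples, one row per sample.
x = torch.randn(2, 4)

# The same linear transformation written by hand.
manual = x @ layer.weight.T + layer.bias
builtin = layer(x)

print(builtin.shape)  # torch.Size([2, 3])
print(torch.allclose(manual, builtin, atol=1e-6))
```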

---

### ReLU Activation Function

After applying a dense layer, we often apply an **activation function** to introduce non-linearity. One of the most commonly used is the **ReLU (Rectified Linear Unit)**, defined as:

$$
\text{ReLU}(z) = \max(0, z)
$$

That is:
- If the input $z > 0$, the output is $z$,
- If the input $z \leq 0$, the output is $0$.
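
A quick way to see this behaviour is to apply ReLU to a few hand-picked values (the numbers below are arbitrary):

```python
import torch

# Negative values, zero, and positive values.
z = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

out = torch.relu(z)

# Negative inputs are clipped to 0, positive inputs pass through unchanged.
print(out.tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```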

---

### Sigmoid or not sigmoid

In neural networks, the final layer often has a specific purpose depending on the task. For **binary classification**, we typically use a **Sigmoid activation function** in the **last layer**, defined as:

$$
\text{Sigmoid}(z) = \frac{1}{1 + e^{-z}}
$$

That is:
- It maps the raw output (logits) into a value between **0 and 1**,
- The result can be interpreted as a **probability** of the positive class.

We use **Sigmoid in the last layer** because:
- It provides a **probabilistic output**, which is ideal when deciding between two classes,
- It compresses arbitrary input values into a normalized scale.

However:
- It should be avoided in multi-class classification tasks — use **Softmax** instead,
- If using `nn.BCEWithLogitsLoss`, you **should not apply Sigmoid manually**, as the function includes it internally for numerical stability.
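
The last point can be checked numerically. The sketch below uses made-up logits and labels and compares the two options: applying Sigmoid manually and then `nn.BCELoss`, or passing the raw logits to `nn.BCEWithLogitsLoss`. Both should give the same loss value up to floating-point error.

```python
import torch
import torch.nn as nn

# Made-up raw outputs of a final linear layer and their binary labels.
logits = torch.tensor([[-1.2], [0.3], [2.5]])
targets = torch.tensor([[0.0], [1.0], [1.0]])

# Option 1: apply Sigmoid ourselves, then BCELoss on the probabilities.
probs = torch.sigmoid(logits)
loss_manual = nn.BCELoss()(probs, targets)

# Option 2: pass raw logits; the loss applies the sigmoid internally.
loss_logits = nn.BCEWithLogitsLoss()(logits, targets)

print(probs.flatten().tolist())  # every value lies between 0 and 1
print(torch.allclose(loss_manual, loss_logits, atol=1e-6))
```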


### Pytorch implementation

In PyTorch, we define neural networks by subclassing `nn.Module`, which is the base class for all neural network modules. Each layer of the model is declared in the `__init__` method, and the forward computation is defined in the `forward` method.

Below is an example of a simple Multilayer Perceptron (MLP) with two hidden layers:


- `nn.Linear(in_features, out_features)` defines a fully connected (dense) layer.
- `nn.ReLU()` (or its functional form `F.relu(...)`) applies the ReLU (Rectified Linear Unit) activation function.

This structure is flexible and allows you to create any custom architecture by chaining layers and operations in `forward()`.

- **Exercise**: We need to define the input and output size of each layer. How do we do it?


In [None]:
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(12, 30)
        self.fc2 = nn.Linear(30, 30)
        self.fc3 = nn.Linear(30, 1)

        self.relu1 = nn.ReLU()
        self.relu2 = nn.ReLU()

    def forward(self, x):
        x = self.fc1(x)    # First hidden layer
        x = self.relu1(x)  # ReLU activation on the hidden layer output
        x = self.fc2(x)    # Second hidden layer
        x = self.relu2(x)
        x = self.fc3(x)    # Output layer (raw logits)
        x = torch.sigmoid(x)  # Map the logits to probabilities in (0, 1)

        return x

Once we have defined the model, we can instantiate it. The resulting object can be called directly; doing so executes the `forward` method.

In [None]:
net = MLP()
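
To illustrate what "calling the model" means, here is a minimal sketch with a hypothetical stand-in module, `TinyNet` (it is not the `MLP` defined above, only a shorter model with the same 12 input features). Calling the object on a batch returns the same result as calling `forward` explicitly; calling the object is the usual style, since it also runs any hooks registered on the module.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical one-layer model, used only for this illustration."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(12, 1)

    def forward(self, x):
        return torch.sigmoid(self.fc(x))


tiny = TinyNet()
batch = torch.randn(5, 12)  # 5 fake samples with 12 features each

out_call = tiny(batch)             # calling the object runs forward()
out_forward = tiny.forward(batch)  # explicit call, shown only for comparison

print(out_call.shape)  # torch.Size([5, 1])
print(torch.equal(out_call, out_forward))
```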

Now we have a first model and we have the data, but how do we train the model?

## Theoretical Summary: How to Train an MLP Model with PyTorch

Training a neural network like an MLP involves a few essential steps: forward pass, loss computation, backpropagation, and parameter updates. PyTorch provides a flexible and modular way to handle this process.

---


### 1. Choose a Loss Function

The **loss function** measures the difference between the model's predictions and the true labels.

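For multi-class classification tasks (for binary classification with a sigmoid output, as in our model, use `nn.BCELoss` instead):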
```python
criterion = nn.CrossEntropyLoss()
```
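
As a brief illustration of what `nn.CrossEntropyLoss` expects, the sketch below uses made-up scores for 3 samples and 4 classes. The inputs are raw logits (no softmax applied) and the targets are integer class indices, not one-hot vectors. The second computation shows the same loss obtained in two steps, log-softmax followed by negative log likelihood.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 3 samples, 4 classes: raw scores (logits), made up for this example.
logits = torch.tensor([[2.0, 0.5, -1.0, 0.1],
                       [0.2, 0.2, 3.0, -0.5],
                       [-1.0, 1.0, 0.0, 0.5]])

# Targets are class indices (integer dtype), one per sample.
targets = torch.tensor([0, 2, 1])

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)

# Equivalent two-step computation: log-softmax, then negative log likelihood.
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss.item())
print(torch.allclose(loss, manual, atol=1e-6))
```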

---

### 2. Select an Optimizer

An **optimizer** updates the model’s parameters using the gradients computed during backpropagation.

Example using stochastic gradient descent (SGD):
```python
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```

You could also use more advanced optimizers like **Adam**:
```python
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```
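
To make the idea of a parameter update concrete, the sketch below performs one SGD step on a toy parameter vector and compares it with the update rule written by hand, $w \leftarrow w - \text{lr} \cdot \nabla w$. The parameter values, the loss, and the learning rate are arbitrary choices for illustration.

```python
import torch

# A toy parameter vector that we optimize directly (no model involved).
w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Toy loss: sum of squares, whose gradient is 2 * w.
loss = (w ** 2).sum()
loss.backward()

# Plain SGD update computed by hand: w_new = w - lr * grad.
expected = w.detach() - 0.1 * w.grad

optimizer.step()  # the optimizer applies the same rule in place

print(w.detach().tolist())  # approximately [0.8, -1.6]
print(torch.allclose(w.detach(), expected))
```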

---

### 3. Training Loop

This is the heart of the training process. For each epoch (complete pass through the dataset), you perform:

#### a. Forward Pass
Compute the model's predictions:
```python
outputs = model(inputs)
```

#### b. Loss Computation
Calculate how wrong the predictions are:
```python
loss = criterion(outputs, targets)
```

Below is a table summarizing some of the most commonly used loss functions in PyTorch, along with brief explanations of what they do.

| Loss Function | Description |
|---------------|-------------|
| `nn.MSELoss` | Measures the mean squared error (squared L2 norm) between each element in the input and target. |
| `nn.L1Loss` | Measures the mean absolute error (L1 distance) between each element in the input and target. |
| **`nn.CrossEntropyLoss`** | Combines `LogSoftmax` and `NLLLoss` in one single class. Useful for multi-class classification. |
| `nn.NLLLoss` | The negative log likelihood loss. Used in conjunction with `log_softmax` output. |
| `nn.BCELoss` | Binary Cross Entropy loss. Used for binary classification tasks. |
| **`nn.BCEWithLogitsLoss`** | Combines a `Sigmoid` layer and the `BCELoss` in one class. More numerically stable. |
| `nn.HingeEmbeddingLoss` | Measures whether inputs are similar or dissimilar using a margin-based criterion. |
| `nn.MarginRankingLoss` | Encourages a distance margin between ranked inputs. Useful for pairwise ranking tasks. |
| `nn.HuberLoss` | Combines advantages of `L1Loss` and `MSELoss`, less sensitive to outliers. |
| `nn.SmoothL1Loss` | A smooth version of L1 loss, often used in regression tasks such as object detection. |
| `nn.KLDivLoss` | Kullback-Leibler divergence loss, useful for comparing probability distributions. |
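
As a small worked example for the first two rows of the table, the sketch below evaluates `nn.MSELoss` and `nn.L1Loss` on hand-picked values so the result can be verified by hand:

```python
import torch
import torch.nn as nn

pred = torch.tensor([1.0, 2.0, 4.0])
target = torch.tensor([1.0, 1.0, 1.0])  # element-wise errors are 0, 1 and 3

mse = nn.MSELoss()(pred, target)  # (0 + 1 + 9) / 3
mae = nn.L1Loss()(pred, target)   # (0 + 1 + 3) / 3

print(round(mse.item(), 4), round(mae.item(), 4))  # 3.3333 1.3333
```

The squared error penalizes the largest deviation (3) much more heavily than the absolute error does, which is why `nn.MSELoss` is more sensitive to outliers.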

#### c. Backward Pass
Compute gradients of the loss with respect to model parameters:
```python
loss.backward()
```

#### d. Parameter Update
Apply the gradients to update the model's parameters:
```python
optimizer.step()
```

#### e. Zero Gradients
Clear previous gradients before the next iteration:
```python
optimizer.zero_grad()
```
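
The reason this step matters is that PyTorch adds new gradients to whatever is already stored in `.grad`. The sketch below shows this on a single toy parameter; setting `w.grad = None` by hand stands in for what `optimizer.zero_grad()` would do for every parameter of a model.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

# First backward pass: d(w^2)/dw = 2 * w = 6.
(w ** 2).backward()
first = w.grad.item()

# Second backward pass without clearing: the new gradient is added to the old one.
(w ** 2).backward()
accumulated = w.grad.item()

# Clear the stored gradient, then compute it again from scratch.
w.grad = None
(w ** 2).backward()
fresh = w.grad.item()

print(first, accumulated, fresh)  # 6.0 12.0 6.0
```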

Full loop example:
```python
for epoch in range(num_epochs):
    outputs = model(inputs)
    loss = criterion(outputs, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
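
For a version of this loop that can be run on its own, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the data follow $y = 2x + 1$ plus a little noise, the model is a single `nn.Linear(1, 1)`, and the learning rate and number of epochs are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data (made up): y = 2x + 1 plus a little noise.
inputs = torch.linspace(-1, 1, 50).reshape(-1, 1)
targets = 2 * inputs + 1 + 0.05 * torch.randn(50, 1)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

history = []
for epoch in range(200):
    outputs = model(inputs)             # forward pass
    loss = criterion(outputs, targets)  # loss computation

    optimizer.zero_grad()               # clear old gradients
    loss.backward()                     # backward pass
    optimizer.step()                    # parameter update

    history.append(loss.item())

print(f"first loss: {history[0]:.4f}, last loss: {history[-1]:.4f}")
print(model.weight.item(), model.bias.item())  # should end up close to 2 and 1
```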

---

### 4. Evaluation

After training, you should evaluate the model on a validation or test set using `torch.no_grad()` to disable gradient tracking:

```python
model.eval()
with torch.no_grad():
    predictions = model(test_inputs)
```
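
The two calls do different things, which the sketch below tries to make visible. The model is a hypothetical `nn.Sequential` with a `Dropout` layer, chosen only because dropout behaves differently in training and evaluation mode. `model.eval()` switches such layers to evaluation behaviour, while `torch.no_grad()` stops PyTorch from recording the computation graph.

```python
import torch
import torch.nn as nn

# Hypothetical model with a Dropout layer, used only for this illustration.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(8, 1),
)
x = torch.randn(3, 4)

model.eval()           # Dropout becomes a no-op in evaluation mode
with torch.no_grad():  # no computation graph is recorded inside this block
    pred_a = model(x)
    pred_b = model(x)

tracked = model(x)     # outside no_grad, gradients are tracked again

print(pred_a.requires_grad)         # False
print(torch.equal(pred_a, pred_b))  # True: no random dropout in eval mode
print(tracked.requires_grad)        # True
```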

---

### Summary

| Step               | Purpose                                           |
|--------------------|---------------------------------------------------|
| Loss Function      | Measure prediction error                          |
| Optimizer          | Update weights using gradients                    |
| Forward Pass       | Predict outputs                                   |
| Loss + Backward    | Compute gradients                                 |
| Optimizer Step     | Adjust weights                                    |
| Zero Grad          | Prevent gradient accumulation                     |
| Evaluation         | Test performance without gradient tracking        |

This process is repeated over multiple **epochs**, gradually improving the model’s performance on the task.


Now we run our first training, using the Titanic data and the simple model we have defined.

In [None]:
from sklearn.metrics import accuracy_score

net = MLP()

LR = 1e-3
EPOCHS = 1000

optimizer = torch.optim.Adam(net.parameters(), lr=LR)
criterion = nn.BCELoss()

## The Loop

### V1

In [None]:
net.train()

for epoch in range(EPOCHS):
    output = net(X_train)
    loss = criterion(output, y_train.reshape(-1, 1))

    optimizer.zero_grad()  # Clear old gradients; otherwise they accumulate across epochs
    loss.backward()
    optimizer.step()

    acc = accuracy_score(y_train, (output.detach() > 0.5).int())

    print(f"Accuracy: {acc}")

### V2

In [None]:
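accuracies = []
for epoch in range(EPOCHS):
    net.train()  # Switch back to training mode at the start of every epoch
    output = net(X_train)
    loss = criterion(output, y_train.reshape(-1, 1))

    optimizer.zero_grad()  # Clear old gradients; otherwise they accumulate across epochs
    loss.backward()
    optimizer.step()

    acc = accuracy_score(y_train, (output.detach() > 0.5).int())

    net.eval()
    with torch.no_grad():  # No gradients are needed for validation
        output_val = net(X_test)

    acc_val = accuracy_score(y_test, (output_val > 0.5).int())
    accuracies.append(acc_val)

    print(f"Accuracy: {acc} - {acc_val}")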

In [None]:
from matplotlib import pyplot as plt

plt.title("Accuracy")
plt.plot(accuracies);

### V3

In [None]:
import copy

best_acc = 0
best_weights = None

losses = []
accuracy = []
for epoch in range(EPOCHS):
    net.train()
    output = net(X_train)
    loss = criterion(output, y_train.reshape(-1, 1))

    losses.append(loss.item())  # Store the loss as a plain Python float

    optimizer.zero_grad()  # Clear old gradients; otherwise they accumulate across epochs
    loss.backward()
    optimizer.step()

    net.eval()
    with torch.no_grad():  # No gradients are needed for validation
        output_val = net(X_test)

    acc = accuracy_score(y_train, (output.detach() > 0.5).int())
    acc_val = accuracy_score(y_test, (output_val > 0.5).int())

    # Keep the weights of the epoch with the best validation accuracy
    if acc_val > best_acc:
        best_acc = acc_val
        best_weights = copy.deepcopy(net.state_dict())

    accuracy.append(acc_val)
net.load_state_dict(best_weights);

In [None]:
from matplotlib import pyplot as plt

plt.subplot(1, 2, 1)
plt.title("Loss")
plt.plot(losses)
plt.subplot(1, 2, 2)
plt.title("Accuracy")
plt.plot(accuracy);

# Regression

Until now, we have worked on a **binary classification** task — predicting a category (class) based on input data. Another widely used type of supervised learning is **regression**, where the goal is to predict a **continuous numeric value** instead of a class label.

**So, what changes do we need to make to adapt our classification model for regression?**

First of all, we need data for a regression problem: [Wine Quality](https://archive.ics.uci.edu/dataset/186/wine+quality)

Information from the dataset's website:

>  Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests (see [Cortez et al., 2009], http://www3.dsi.uminho.pt/pcortez/wine/).


In [None]:
from ucimlrepo import fetch_ucirepo
from sklearn.model_selection import train_test_split

# fetch dataset
wine_quality = fetch_ucirepo(id=186)

# data (as pandas dataframes)
X = wine_quality.data.features
y = wine_quality.data.targets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

First we check how the target values are distributed, that is, whether the data is balanced.

In [None]:
plt.hist(y_train)
plt.show()

## Now it is your turn

What do we have to change?