# **Lab 4.1: Neural Networks (Regression)**

<hr>

## **1. Introduction**  
In previous labs, we learned how to solve classification and regression problems using traditional machine learning models. In this lab, we will see how to solve these types of problems using **Neural Networks**, which are among the most commonly used methods today.  

These networks form the foundation of *Deep Learning* and are highly versatile because they are composed of multiple neurons organized into layers. These neurons are also called *perceptrons*, which is why this kind of network is also known as a **Multilayer Perceptron**.  

Just like with other models, `scikit-learn` provides a couple of classes that facilitate the use of these networks:  

* **MLPRegressor:** A multilayer perceptron designed to solve regression problems.  
* **MLPClassifier:** A multilayer perceptron designed to solve classification problems.  

In this lab, we will not use these predefined classes, since we are interested in understanding the architecture and inner workings of the networks. Instead, we will use a new library, [`tensorflow`](https://www.tensorflow.org/), which, along with [`pytorch`](https://pytorch.org/), is one of the most widely used in Python for deep learning tasks.  

### **Objectives**  
In this lab, you will learn to:  
* Create and train Neural Networks.  
* Optimize their hyperparameters.  
* Add nonlinear activation functions.  

We will start by installing the library in our environment.

In [None]:
! pip install tensorflow

One of the advantages of neural networks is that they can be trained on either a CPU or a GPU.  

Later on, we will see how to train one of these networks on the GPUs of the lab computers to accelerate the training process. On your personal computer, you will likely only be able to run it on a CPU.  

Next, we load our data once again:

In [None]:
import pandas as pd

seed = 2533
data = pd.read_pickle('https://raw.githubusercontent.com/AIC-Uniovi/Sistemas-Inteligentes/refs/heads/main/datasets/f1_23_monaco.pkl')

<hr>

## **2. Regression Problems**  

We will attempt to solve the same problem as in the previous lab:  

<div class="alert alert-block alert-success">
    <b>Create a model that, given the time in the first sector <code>Sector1Time</code>, can predict the total lap time <code>LapTime</code>.</b>
</div> 

The first step will be to create the necessary datasets to train a model.  

### **2.1. Data Preprocessing**


<hr>

<div class="alert alert-block alert-info">  
    <b>Exercise:</b> Separate the X and Y from the dataframe <code>data_sector2lap</code>, split it into training and test sets (80/20) by setting a random seed, and finally <b>standardize</b> the X values.  
    <hr>  
    When X has only one column, you must use double brackets (<code>data[['column_name']]</code> instead of <code>data['column_name']</code>) for <code>StandardScaler()</code> to work correctly.  
</div>  
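As a reference for the standardization step, the sketch below reproduces in plain NumPy what `StandardScaler` does: the mean and standard deviation are estimated on the training split only and then reused on the test split. The toy sector times and the variable names are made up for illustration and are not taken from the dataset.

```python
import numpy as np

# Hypothetical sector times in seconds (illustrative values, not from the dataset).
X_train_toy = np.array([[19.5], [20.1], [21.3], [20.7]])  # shape (n_samples, 1)
X_test_toy = np.array([[20.4], [22.0]])

# Statistics are estimated on the training split only.
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# The same transformation is applied to both splits.
X_train_std = (X_train_toy - mean) / std
X_test_std = (X_test_toy - mean) / std

print(X_train_std.ravel())
print(X_test_std.ravel())
```

Note that both arrays are two-dimensional (one row per example, one column per feature), which is the shape the double brackets produce in `pandas`.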

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data_sector2lap = data[['LapTime', 'Sector1Time']].copy()
data_sector2lap['LapTime'] = data_sector2lap['LapTime'].dt.total_seconds()
data_sector2lap['Sector1Time'] = data_sector2lap['Sector1Time'].dt.total_seconds()

# Your code here

### **2.2. Machine Learning**  

With the data ready, we will train and evaluate the well-known machine learning models once again to compare them with our new system.

<div class="alert alert-block alert-info">  
    <b>Exercise:</b> Train and evaluate the remaining models (<i>K-Nearest Neighbors</i>, <i>Decision Trees</i>, and <i>SVR</i>) using the following function.  
</div>  

In [None]:
# This library helps us easily create tables in the console.  
! pip install tabulate  

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score  
from tabulate import tabulate  
from sklearn.linear_model import LinearRegression  
from sklearn.dummy import DummyRegressor  

def evaluate_model(Y_test, preds_test, model_name):  
    metrics = [  
        ('MAE', mean_absolute_error(Y_test, preds_test)),  
        ('MSE', mean_squared_error(Y_test, preds_test)),  
        ('R²', r2_score(Y_test, preds_test))  
    ]  
    
    print(f'Results for {model_name}:')  
    print(tabulate(metrics, headers = ['Metric', 'TEST'], tablefmt = 'rounded_outline'))  
    print()  

# Baseline mean  
baseline_mean = DummyRegressor(strategy = 'mean')  
baseline_mean.fit(X_train, Y_train)  
preds_test = baseline_mean.predict(X_test)  
evaluate_model(Y_test, preds_test, 'Baseline')  

# Linear Regression  
model_linear = LinearRegression()  
model_linear.fit(X_train, Y_train)  
preds_test = model_linear.predict(X_test)  
evaluate_model(Y_test, preds_test, 'Linear')  

# Your code here  

The results should look something like this:

<center>

| Model                  | MAE Test  | MSE Test  | R² Test  |
|------------------------|-----------|-----------|----------|
| *Baseline*             | 5.882     | 52.489    | -0.008   |
| *Linear*               | 1.056     |  2.298    |  0.956   |
| *KNN*                  | 0.820     |  1.801    |  0.965   |
| *Decision Trees*       | 1.110     |  3.575    |  0.931   |
| *SVR*                  | 0.735     |  1.738    |  0.967   |

</center>
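The three metrics in the table can also be computed by hand, which makes them easier to interpret. The following is a minimal NumPy sketch with made-up values. It also shows why a mean baseline can obtain a slightly negative R² on the test set: the mean it predicts comes from the training data, not from the test data.

```python
import numpy as np

# Hypothetical lap times in seconds (illustrative values only).
y_train_toy = np.array([75.0, 78.0, 80.0, 83.0])
y_test_toy = np.array([76.0, 79.0, 85.0])

# A mean baseline always predicts the training mean.
preds = np.full_like(y_test_toy, y_train_toy.mean())

mae = np.mean(np.abs(y_test_toy - preds))
mse = np.mean((y_test_toy - preds) ** 2)

# R² compares the model's squared error with that of predicting the test mean.
ss_res = np.sum((y_test_toy - preds) ** 2)
ss_tot = np.sum((y_test_toy - y_test_toy.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f'MAE={mae:.3f}  MSE={mse:.3f}  R2={r2:.3f}')
```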

### **2.3. Neural Network**  

Once we have the datasets and results from the traditional models, we can now create our first neural network from scratch.  

The steps to create and train a neural network are as follows:  

1) Create the model architecture.  
2) Define the optimizer, loss function, and compile the model.  
3) Train and evaluate the model.  

#### **2.3.1. Create Architecture**  

The first thing we need to define is the network's architecture, meaning the number and size of the fully connected layers that make it up.  

<div class="alert alert-block alert-warning">  
    <strong>Note:</strong> As you know, the size of the input and output layers is determined by the problem at hand. In this case, we have 1 input and 1 output.  
</div>  

The simplest network in this case would be as follows:

In [None]:
import os, random  
import numpy as np  
import tensorflow as tf  

from tensorflow.keras.models import Sequential  
from tensorflow.keras.layers import Dense, Input  

# Set the seeds for the libraries to ensure results are reproducible.  
os.environ['PYTHONHASHSEED'] = str(seed)  
random.seed(seed)  
np.random.seed(seed)  
tf.random.set_seed(seed)  

# Define the layers of the model  
model = Sequential()  
model.add(Input(shape = (1,)))  
model.add(Dense(1, name = 'output_layer'))

The previous code simply creates a `Sequential()` model, meaning a model where layers are added one after the other.  

Next, we introduce the following layers into it:  

* `Input()`: Input layer. It is not a layer in itself; it simply allows the model to know the size of the inputs.  
* `Dense()`: Fully connected layer ($y=Wx+b$). The number of units (neurons) is a required parameter; `name` is optional.  
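To make the formula $y=Wx+b$ concrete, here is a minimal NumPy sketch of what a fully connected layer computes. The weight and bias values are arbitrary placeholders (a real layer learns them during training), and `dense_forward` is an illustrative helper, not part of Keras.

```python
import numpy as np

def dense_forward(X, W, b):
    """Fully connected layer without activation: one row of X per example."""
    return X @ W + b

# One input feature and one output unit, as in the network above.
W = np.array([[2.0]])   # shape (n_inputs, n_units)
b = np.array([0.5])     # shape (n_units,)

X_toy = np.array([[0.0], [1.0], [-1.0]])  # three standardized examples
y_toy = dense_forward(X_toy, W, b)
print(y_toy.ravel())
```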

To view the final architecture of our model, we can execute the `summary()` method, which also provides information about the parameters.

In [None]:
model.summary()

It is important to note that, as we mentioned earlier, since `Input()` is not actually a layer, it does not even appear in the summary.  

Another relevant piece of information is the number of parameters or weights in the model, i.e., how many $W$ and $b$ it needs to learn during training to correctly transform the input into the output.  

In this case, there are only 2: one $w$ and one $b$.  
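The parameter count follows directly from the layer sizes: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The helper below is a hypothetical sketch (it is not a Keras function) that reproduces the count for our network and, for comparison, for a network with one hidden layer of 10 units.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases of a stack of dense layers.

    layer_sizes lists the input size followed by the size of each layer,
    e.g. [1, 1] is one input connected to one output unit.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases
    return total

print(count_dense_params([1, 1]))      # 2
print(count_dense_params([1, 10, 1]))  # 31
```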

#### **2.3.2. Compile the Model**

Compiling a model in Keras is an essential step before training it. It serves to configure the learning process, specifying how the model will be optimized and how its performance will be evaluated. Essentially, it defines the following key elements:

*   **Optimizer:** Defines the algorithm that will be used to adjust the model's weights during training, aiming to minimize the loss function. Common examples are `Adam`, `SGD`, `RMSprop`, among others. Each optimizer has its own hyperparameters (such as learning rate) that can be adjusted.

*   **Loss Function:** Measures the difference between the model's predictions and the actual values (labels). The goal of training is to **minimize this loss**. The choice of loss function depends on the type of problem (classification, regression, etc.). Examples: `categorical_crossentropy` (for multiclass classification), `binary_crossentropy` (for binary or multilabel classification), `mean_squared_error` or `mean_absolute_error` (for regression).
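To give an idea of what an optimizer does with a gradient, the sketch below implements one common formulation of the Adam update rule in NumPy and uses it to minimize a toy function. The values of `beta1`, `beta2` and `eps` are assumptions chosen for this example, and Keras's internal implementation may differ in details, so treat this as an illustration and not as the library's code.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for the parameters w given their gradient."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize the toy loss f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = np.array([0.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 501):
    grad = 2 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)

print(w)
```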

Next, we compile the model by specifying the `Adam` optimizer with `learning_rate = 0.001` and a regression loss function (`mean_absolute_error`).

In [None]:
from tensorflow.keras.optimizers import Adam

learning_rate = 0.001
optim = Adam(learning_rate = learning_rate)
model.compile(loss = 'mean_absolute_error', optimizer = optim )

#### **2.3.3. Train and Evaluate**

The next step is the training and evaluation of the model. In `keras`, training is done through the [`.fit()`](https://keras.io/api/models/model_training_apis/) method, just like in `scikit-learn`.  

This method also takes in the training data ($X$ and $Y$) and iteratively adjusts the model's weights (the $W$ and $b$) to minimize the loss function.

Specifically, this method performs the following steps:

1) **Batch Creation:** Divides the entire training dataset into batches (blocks of examples) and obtains the model's prediction for each batch. Neural networks are designed to work with very large datasets, and it is often not feasible to train with the entire dataset at once, as we did with the previous machine learning methods.
2) **Calculates the Loss:** Compares the model's predictions $\hat{Y}$ with the actual labels $Y$ and calculates the loss defined during compilation, $MAE$ in this case.
3) **Calculates Gradients:** Uses the backpropagation algorithm to compute the gradients of the loss function with respect to the model's weights. The gradients indicate the direction and magnitude of the change needed in the weights to reduce the loss.

<center>
    <div style="border-radius:5px; padding:10px; background:white; max-width:900px">
        <img src="https://i.imgur.com/1tscXrJ.png">   
    </div>
</center>

4) **Updates Weights:** Adjusts the model's weights using the optimizer based on the calculated gradients. The optimizer determines how the weights should be updated efficiently to minimize the loss.
5) **Repeats the Process:** Repeats the above steps for each of the training batches. Once all batches are processed, the entire process can be repeated for a set number of epochs.

<div class="alert alert-block alert-warning">
    <strong>Note:</strong> An epoch represents a full pass through the entire training dataset.
</div>
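The five steps above can be sketched in a few lines of NumPy for our 1-input, 1-output network with an MAE loss. This is only an illustration under simplifying assumptions: the data are synthetic, the gradient is derived by hand instead of being obtained by backpropagation through a graph, and plain gradient descent replaces `Adam`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative): y = 3x + 80 plus a little noise.
X_toy = rng.normal(size=(256, 1))
y_toy = 3.0 * X_toy[:, 0] + 80.0 + rng.normal(scale=0.1, size=256)

w, b = 0.0, 0.0          # the two parameters of the network
lr, batch_size, epochs = 0.2, 64, 200
loss_per_epoch = []

for epoch in range(epochs):
    order = rng.permutation(len(X_toy))               # shuffle every epoch
    for start in range(0, len(X_toy), batch_size):    # 1) batches
        idx = order[start:start + batch_size]
        x_b, y_b = X_toy[idx, 0], y_toy[idx]
        preds = w * x_b + b
        error = preds - y_b                           # 2) loss is mean(|error|)
        grad_w = np.mean(np.sign(error) * x_b)        # 3) gradients of the MAE
        grad_b = np.mean(np.sign(error))
        w -= lr * grad_w                              # 4) update the weights
        b -= lr * grad_b
    # 5) after all batches, one epoch is complete
    loss_per_epoch.append(np.mean(np.abs(w * X_toy[:, 0] + b - y_toy)))

print(f'first epoch MAE: {loss_per_epoch[0]:.3f}, last epoch MAE: {loss_per_epoch[-1]:.3f}')
```

With 256 examples and a batch size of 64, each epoch performs 4 weight updates.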

Next, we train the model using the training data:

In [None]:
# Train the model. The verbose parameter of the fit() method controls how much
# information is printed to the console during training.
history = model.fit(X_train, Y_train, validation_split = 0.2, batch_size = 64, epochs = 200, verbose = 2)

<div class="alert alert-block alert-warning">
    <strong>Note:</strong> Every time you run the previous block, <strong>training will continue</strong> from where it left off. To train the model from scratch, you would need to recreate and recompile it.
</div>

As you can see, there is an argument in the `fit` method that we haven't discussed: `validation_split`.

This argument reserves a fraction of the training data to use as a validation set. In this case, since `validation_split = 0.2`, 20% of the training examples are set aside and are not used to update the weights.
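As a rough sketch of what this argument does: according to the Keras documentation, the validation examples are taken from the end of the arrays passed to `fit()`, before any shuffling. The NumPy snippet below mimics that behavior on a toy array; the helper name `split_validation` is hypothetical.

```python
import numpy as np

def split_validation(X, Y, validation_split):
    """Hold out the last fraction of the examples as a validation set."""
    split_at = int(len(X) * (1 - validation_split))
    return (X[:split_at], Y[:split_at]), (X[split_at:], Y[split_at:])

X_toy = np.arange(10).reshape(-1, 1)   # 10 toy examples
Y_toy = np.arange(10) * 2.0

(X_tr, Y_tr), (X_val, Y_val) = split_validation(X_toy, Y_toy, validation_split=0.2)
print(len(X_tr), len(X_val))
```

Because the split is positional, the training data should not be sorted in any meaningful way; `train_test_split` shuffles by default, so this is already the case here.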

So now, our dataset is split into three distinct blocks: Train, Validation, and Test. This approach is also known as **metavalidation**.

##### **Metavalidation**

Until now, to validate our models, we have used **simple validation** (hold-out), meaning we split the data into two sets: Train and Test.

This approach is perfectly valid when the models we want to evaluate do not have **hyperparameters**, as is the case with linear regression.

<div class="alert alert-block alert-warning">
    <strong>Note:</strong> The validation set is used to adjust the model's hyperparameters. <hr> 
    Remember that a hyperparameter is any value we can set when creating a model, for example, the K in KNN.
</div>

Neural networks have multiple hyperparameters to adjust such as `learning rate`, `batch size`, `epochs`, `number of layers`, `number of neurons`, etc. Therefore, a validation set is necessary.

Now, the training process changes slightly: Based on the training data, the model calculates the loss and updates the weights accordingly, then *calculates the loss on the validation set* **but does not update the weights with this data**.

This allows us to:

* **Monitor overfitting:** If performance on the training set keeps improving but performance on the validation set stagnates or worsens, this indicates that the model is starting to overfit the training data and losing its ability to generalize to new data.
* **Adjust hyperparameters:** We adjust the values of hyperparameters to select the configuration that produces the best performance on the validation data.
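
As an illustration of the first point, the sketch below looks for the epoch with the lowest validation loss in a dictionary shaped like `history.history`. The loss values are invented for the example and do not come from a real training run.

```python
import numpy as np

# Hypothetical losses per epoch, in the same format as history.history.
toy_history = {
    'loss':     [5.0, 3.0, 2.0, 1.5, 1.2, 1.0, 0.9],
    'val_loss': [5.2, 3.3, 2.4, 2.1, 2.2, 2.5, 2.9],
}

best_epoch = int(np.argmin(toy_history['val_loss'])) + 1  # epochs start at 1
print(f'Lowest validation loss at epoch {best_epoch}')

# From that epoch on, training loss keeps falling while validation loss rises,
# which is the typical signature of overfitting.
gap = np.array(toy_history['val_loss']) - np.array(toy_history['loss'])
print(gap.round(2))
```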

<hr>

##### **Why don't we use Test directly to adjust the hyperparameters?**

Imagine you want to bake the perfect cake to take to a baking contest.

* **Training set (train):** These are all the tests you do at home with different recipes, temperatures, and times. Here, you try many combinations to learn how each change affects the outcome.

* **Validation set (val):** Every time you bake a cake at home, you let your friends or family try it. They tell you if it's too dry, too sweet, if the texture is good... With that feedback, you adjust the recipe: more sugar? less time in the oven? This is the process of adjusting hyperparameters.

* **Test set (test):** This is the contest jury. You've never let them taste any of your cakes. They will evaluate whether, beyond your adjustments, your recipe truly works.

**Where's the trap?**

If, before the contest, you have the jury taste several cakes and tell you what to change, when you go to the contest, you'll already know what they like. 
You won't be testing if your recipe is good in general, but rather you'll have made a cake tailored to them.

<hr>

In summary: 
* **Train:** The dataset used to train the model, adjusting its weights to minimize the loss function. 
* **Validation:** The dataset used to monitor overfitting and optimize the model's hyperparameters, **but not the weights**. 
* **Test:** The dataset used to evaluate the final performance of the model trained with the optimal hyperparameters, on data that was not used previously (neither for training nor for hyperparameter adjustment).

The `fit()` method returns an object that stores the entire history of the model's training across the epochs. In this case, we store the result in `history`.

Now let's create a function to visualize the evolution of the training and validation losses:

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

def plot_loss_history(history):
    # Extract the history data
    loss = history.history['loss']
    val_loss = history.history.get('val_loss', None)  # It may not exist if validation was not used
    epochs = range(1, len(loss) + 1)

    # Create a long-format DataFrame for seaborn (validation rows only if available)
    data = pd.DataFrame({'Epoch': list(epochs), 'Loss': loss, 'Type': 'Train'})
    if val_loss:
        val_data = pd.DataFrame({'Epoch': list(epochs), 'Loss': val_loss, 'Type': 'Validation'})
        data = pd.concat([data, val_data], ignore_index = True)

    # Create the plot
    plt.figure(figsize = (10, 5))
    sns.lineplot(data = data, x = 'Epoch', y = 'Loss', hue = 'Type')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.title('Loss Evolution during Training')
    plt.legend(title = 'Set')
    plt.grid(True)
    plt.show()

# Call the function (after training the model)
plot_loss_history(history)

<div class="alert alert-block alert-info">
    <b>Exercise:</b> For simplicity, let's combine the creation and compilation of the network into one function. Complete the following code and retrain the network from scratch.
</div>

In [None]:
def neural_network_one(learning_rate):
    # Create and compile the model
    
    # Your code here

    return model

# Create the network from scratch
model_1 = neural_network_one(learning_rate = 0.001)

# Train
# Your code here

# Visualize training
# Your code here

##### **Hyperparameter Tuning**

Once everything is combined into a single code block, we can move on to the hyperparameter tuning part.

There are many things we can adjust (`batch size`, `epochs`, `learning rate`, ...) but we will focus on the `learning rate`.
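
The general shape of such an experiment is a loop over candidate values that keeps the one with the best validation score. The sketch below shows this idea on synthetic data with a hand-written linear model trained by full-batch gradient descent on the MSE. These are all simplifying assumptions made for illustration, and the numbers it prints say nothing about the results you should expect from the Keras network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative): y = 3x + 80 plus noise, split into train and validation.
X_all = rng.normal(size=200)
y_all = 3.0 * X_all + 80.0 + rng.normal(scale=0.5, size=200)
X_tr, y_tr = X_all[:160], y_all[:160]
X_val, y_val = X_all[160:], y_all[160:]

def train_linear(lr, epochs=50):
    """Fit y = w*x + b with full-batch gradient descent on the MSE."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        error = w * X_tr + b - y_tr
        w -= lr * 2 * np.mean(error * X_tr)
        b -= lr * 2 * np.mean(error)
    return w, b

results = {}
for lr in [0.001, 0.05, 0.1]:
    w, b = train_linear(lr)
    results[lr] = np.mean(np.abs(w * X_val + b - y_val))  # validation MAE
    print(f'lr={lr}: MAE val = {results[lr]:.3f}')

best_lr = min(results, key=results.get)
print('Best learning rate on validation:', best_lr)
```

The test set is not touched anywhere in this loop; it is reserved for the final evaluation of the chosen configuration.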

<div class="alert alert-block alert-info">
    <b>Exercise:</b> Record the results of the last epoch of the previous model in the following table and perform the necessary experiments to complete the rest of the rows.
</div>

<center>

| Model                    | MAE Train  | MAE Val |
|---------------------------|------------|---------|
| *Neural Network (lr=0.001)* |            |         |
| *Neural Network (lr=0.05)*  |            |         |
| *Neural Network (lr=0.1)*   |            |         |

</center>

In [None]:
# Your code here

<div class="alert alert-block alert-info">
    <b>Exercise:</b> Train the final model with the best <code>learning_rate</code> and evaluate on test using the <code>.predict()</code> method and the <code>evaluate_model()</code> function created previously.
    <hr>
    Add the results to the table.
</div>

<center>

| Model                   | MAE Test  | MSE Test  | R² Test  |
|-------------------------|-----------|-----------|----------|
| *Baseline*              | 5.882     | 52.489    | -0.008   |
| *Linear*                | 1.056     |  2.298    |  0.956   |
| *KNN*                   | 0.820     |  1.801    |  0.965   |
| *Decision Trees*        | 1.110     |  3.575    |  0.931   |
| *SVR*                   | 0.735     |  1.738    |  0.967   |
| *Linear Neural Network* |           |           |          |

</center>

In [None]:
# Your code here

##### **Adding Non-Linearity**

We will make some modifications and create a different, slightly more complex model to see if we can improve performance.

As you may recall, not all problems have a linear solution. A basic neural network like ours is limited to performing linear combinations of numbers (sums and multiplications), meaning it transforms the input into the output using the formula $y = Wx + b$.

To introduce non-linearity, we need to incorporate **non-linear activation functions**. These functions are applied to the output of each layer, allowing the network to learn more complex relationships between inputs and outputs. Some common activation functions include `ReLU`, `sigmoid`, and `tanh`. By adding these functions, the network can approximate non-linear functions and solve more challenging problems.

<center>
    <div style="border-radius:5px; padding:10px; background:white; max-width:900px">
        <img src="https://i.imgur.com/e7kd5fs.png">   
    </div>
</center>
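
For reference, the three activation functions mentioned above can be written in a few lines of NumPy. The helpers `relu` and `sigmoid` are defined here only for illustration (Keras provides its own implementations). Note the output ranges: `ReLU` returns values in $[0, \infty)$, `sigmoid` in $(0, 1)$, and `tanh` in $(-1, 1)$.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])

print('relu   :', relu(z))
print('sigmoid:', sigmoid(z).round(3))
print('tanh   :', np.tanh(z).round(3))
```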

Our current model has only one layer, the output layer. If we applied one of these functions to this layer, we would gain very little expressive power (the output would just be a squashed, monotonic version of $Wx + b$), and we would also significantly restrict the range of values the model can predict.

For example, if we only add a `sigmoid` function to the final layer, the output of the model will **always** be limited to the range between 0 and 1. This is not suitable for our regression problem, where lap times in seconds are far outside that range.

To achieve non-linearity while still allowing the model to predict values in the range $(-\infty , \infty)$, we will add a hidden (or intermediate) layer and apply the activation function in that layer. This way, the final layer can generate outputs without the restrictions imposed by the activation function, and the network as a whole is still non-linear.
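
The NumPy sketch below illustrates why the activation function is needed in the hidden layer. With arbitrary (untrained) placeholder weights for a 1 → 10 → 1 network, two stacked linear layers turn out to be exactly equivalent to a single linear layer, whereas inserting `tanh` between them breaks that equivalence. The weights and toy inputs are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary weights for a 1 -> 10 -> 1 network (placeholders, not trained values).
W1, b1 = rng.normal(size=(1, 10)), rng.normal(size=10)
W2, b2 = rng.normal(size=(10, 1)), rng.normal(size=1)

X_toy = np.linspace(-2, 2, 5).reshape(-1, 1)

# Without activation, the two layers are equivalent to a single linear layer.
two_linear = (X_toy @ W1 + b1) @ W2 + b2
W_eq, b_eq = W1 @ W2, b1 @ W2 + b2
one_linear = X_toy @ W_eq + b_eq
print(np.allclose(two_linear, one_linear))

# With tanh in the hidden layer, the output is no longer a straight line,
# and the linear output layer keeps the predictions unbounded.
with_tanh = np.tanh(X_toy @ W1 + b1) @ W2 + b2
print(with_tanh.ravel().round(3))
```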

<div class="alert alert-block alert-info">
    <b>Exercise:</b> Create, within the provided function, a new neural network, but this time with two layers, the hidden one of <b>size 10</b> and with an activation function of <code>tanh</code> to add non-linearity.
    <br>
    Check the documentation for the <a href="https://keras.io/api/layers/core_layers/dense/"><code>Dense()</code></a> layer.
    <hr>
    Find the best <code>learning_rate</code> and evaluate on test with the best version. Fill in both tables.
</div>

<center>

| Model                    | MAE Train  | MAE Val |
|---------------------------|------------|---------|
| *Neural Network (lr=0.001)* |            |         |
| *Neural Network (lr=0.05)*  |            |         |
| *Neural Network (lr=0.1)*   |            |         |

</center>
<br> 
<center>

| Model                   | MAE Test  | MSE Test  | R² Test  |
|--------------------------|-----------|-----------|----------|
| *Baseline*               | 5.882     | 52.489    | -0.008   |
| *Linear*                 | 1.056     |  2.298    |  0.956   |
| *KNN*                    | 0.820     |  1.801    |  0.965   |
| *Decision Trees*         | 1.110     |  3.575    |  0.931   |
| *SVR*                    | 0.735     |  1.738    |  0.967   |
| *Linear Neural Network*  |           |           |          |
| *Non-Linear Neural Network* |        |           |          |

</center>

In [None]:
def neural_network_two(learning_rate):
    # Create and compile the model
    
    # Your code here

    return model

# Create the network from scratch
model_2 = neural_network_two(learning_rate = 0.001)

# Your code here

<div class="alert alert-block alert-info">
    <b>Exercise:</b> Which model would you choose?
</div>

**Answer:**  

### **2.4. Multiple Inputs**

<div class="alert alert-block alert-info">
    <b>Exercise:</b> Create a pair of networks capable of predicting <i>LapTime</i> from <i>SpeedI1</i>, <i>SpeedI2</i>, <i>SpeedFL</i>, <i>SpeedST</i>, and <i>TyreLife</i>. Adjust the hyperparameters for each model.
</div>

<center>

| Model                    | MAE Test  | MSE Test  | R² Test  |
|--------------------------|-----------|-----------|----------|
| *Baseline*               | 5.858     | 54.094    | -0.008   |
| *Linear*                 | 0.932     | 1.678     | 0.969    |
| *KNN*                    | 0.785     | 1.743     | 0.968    |
| *Decision Trees*         | 0.899     | 1.841     | 0.966    |
| *SVR*                    | 0.938     | 2.724     | 0.949    |
| *Neural Network 1*       |           |           |          |
| *Neural Network 2*       |           |           |          |

</center>

In [None]:
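# Imports for the models used below (in case they were not imported earlier)
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

data_lap_time = data[['LapTime', 'SpeedI1', 'SpeedI2', 'SpeedFL', 'SpeedST', 'TyreLife']].copy()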
data_lap_time = data_lap_time.dropna()
data_lap_time['LapTime'] = data_lap_time['LapTime'].dt.total_seconds()

X = data_lap_time[['SpeedI1', 'SpeedI2', 'SpeedFL', 'SpeedST', 'TyreLife']]
Y = data_lap_time['LapTime']

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state = seed, test_size = .2)

standardizer = StandardScaler()
X_train = standardizer.fit_transform(X_train)
X_test = standardizer.transform(X_test)

# Baseline mean
baseline_mean = DummyRegressor(strategy = 'mean')
baseline_mean.fit(X_train, Y_train)
preds_test = baseline_mean.predict(X_test)
evaluate_model(Y_test, preds_test, 'Baseline')

# Linear Regression
model_linear = LinearRegression()
model_linear.fit(X_train, Y_train)
preds_test = model_linear.predict(X_test)
evaluate_model(Y_test, preds_test, 'Linear')

# KNN
model_knn = KNeighborsRegressor()
model_knn.fit(X_train, Y_train)
preds_test = model_knn.predict(X_test)
evaluate_model(Y_test, preds_test, 'KNN')

# Decision Trees
model_tree = DecisionTreeRegressor()
model_tree.fit(X_train, Y_train)
preds_test = model_tree.predict(X_test)
evaluate_model(Y_test, preds_test, 'Decision Trees')

# SVR
model_svr = SVR()
model_svr.fit(X_train, Y_train)
preds_test = model_svr.predict(X_test)
evaluate_model(Y_test, preds_test, 'SVR')

In [None]:
# Your code here