## Name: Raffaello Baluyot
## Course: DT8058

<center><h1 style="font-size:40px;">Regression</h1></center>

Welcome to the third lab in the Deep Learning course! In this lab we look at four parts of MLP regression:
* Introduction: setting up and training an MLP
* Model selection for regression
* Impact of overfitting on validation performance
* Avoiding overfitting in a regression problem

The lab includes several datasets, both synthetic and real, for the regression task.
The first part of the lab uses two different synthetic regression problems. The **regr1()** synthetic dataset is a two-dimensional dataset with linear and non-linear relationships between the input features and the output value. It is a useful benchmark for regression models, as it combines a strong linear term in the first feature with a rapidly oscillating sinusoidal term in the second. The **generate_piecewise_linear_data()** function generates a synthetic dataset with a single input feature and a piecewise linear relationship between the input and the output value, with a different amount of noise for each piece.
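
For reference, the two generators in the `MLPData` class can be written out as follows, as read off from its code. Here $\sigma$ is the standard deviation of the noise-free `regr1` target, $v$ is the `regr1` noise level, $K$ is `n_segments`, and $i = 0, \dots, K-1$ indexes the segments:

$$
\begin{aligned}
\text{regr1:}\quad & x_1 \sim \mathcal{N}(0, 1), \quad x_2 \sim \mathcal{U}(0, 1), \quad \eta \sim \mathcal{N}(0, 1), \\
& y = 10\,x_1 + \sin(20\pi x_2) + v\,\sigma\,\eta, \\[4pt]
\text{piecewise:}\quad & x \sim \mathcal{U}(0, 10), \quad a_i \sim \mathcal{N}(0, 2^2), \\
& y = a_i\,x + \epsilon_i \quad \text{for } x \in \left[\tfrac{10\,i}{K}, \tfrac{10\,(i+1)}{K}\right), \quad \epsilon_i \sim \mathcal{N}\!\left(0, (0.5 + 0.2\,i)^2\right).
\end{aligned}
$$

With `stocastic_noise=True`, the noise scale $0.5 + 0.2\,i$ is replaced by $0.5 + i\,z_i$ with $z_i \sim \mathcal{N}(0, 1)$ drawn once per segment. Because every segment is a line through the origin with its own slope, $y$ generally jumps at the segment boundaries.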

All **Tasks** include **TODOs**; these are expected to be completed before the lab deadline. The **Tasks** also include questions, which should be answered in the report. Some sections do not contain any **TODOs** but are still worth understanding.

Good luck!

---

## Intro to regression


The regression task is to learn a function f that maps from a set of input features X to a continuous output value y. The input features X can be either real-valued or categorical. The output value y is a real-valued number.

$$
y = f(X) + \epsilon
$$

Here $\epsilon$ is a noise term that no model can predict from $X$. The regression model is trained on a set of training data points, where both the input features and the output values are known. The model learns the relationship between the input features and the output values, and then uses this relationship to predict the output value for new input features.
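
In practice, $f$ is approximated by a model $\hat f$ whose parameters are chosen to minimize a loss over the $N$ training pairs $(X_i, y_i)$. Most experiments in this lab use the mean squared error, which is what `torch.nn.MSELoss` computes with its default `reduction="mean"`:

$$
\mathcal{L}_{\text{MSE}} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat f(X_i)\right)^2
$$

If $\epsilon$ is zero-mean Gaussian noise with constant variance, minimizing this loss is equivalent to maximizing the likelihood of the training targets; the heteroscedastic piecewise dataset deliberately breaks the constant-variance part of that assumption.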



## Necessary Imports 

In [None]:
# select gpu by index in case of multiple gpus
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

In [None]:
import torch
import numpy as np

import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
from torch.autograd import Variable
from torch.utils.data import TensorDataset, DataLoader
import torch.nn as nn
from torch.optim import Adam
import torch.nn.functional as F
import copy
from sklearn.model_selection import train_test_split
from sklearn import datasets
import pandas as pd
import seaborn as sns

torch.manual_seed(0)
sns.set()

## Let's use GPU if possible
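
The `device` selected in the cell below only has an effect once the model and every batch are moved onto it with `.to(device)`; as written, the notebook's training code keeps everything on the CPU. A minimal sketch of the pattern, where the `nn.Linear` model and the random batch are placeholders rather than the lab's model:

```python
import torch
import torch.nn as nn

# Pick the first visible GPU if there is one, otherwise fall back to the CPU
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

# Placeholder model; any nn.Module is moved to the device the same way
model = nn.Linear(2, 1).to(device)

# Every batch has to live on the same device as the model's parameters
xb = torch.randn(16, 2).to(device)
preds = model(xb)

# Bring results back to the CPU before converting to NumPy (e.g. for plotting)
preds_np = preds.detach().cpu().numpy()
print(preds.device, preds_np.shape)
```

In a training loop the same `.to(device)` call would be applied to `xi` and `yi` inside the batch loop.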

In [None]:
# set device
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
device

## Dataset generator

In [None]:
class MLPData:
    """
    This class will manage all the dataset related functions for this lab.
    Please take the time to go through the code and try to understand how each point is generated
    """

    @staticmethod
    def regr1(N, v=0):
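        """Generate the regr1 dataset.

        :param N: number of samples
        :param v: noise level, as a multiple of the standard deviation of the
            noise-free target (Default value = 0)
        :return: data of shape (N, 2) and targets of shape (N,)
        """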
        data = np.empty(shape=(N, 2), dtype=np.float64)

        uni = lambda n: np.random.uniform(0, 1, n)
        norm = lambda n: np.random.normal(0, 1, n)
        noise = lambda n: np.random.normal(0, 1, (n,))
        data[:, 0] = norm(N)
        data[:, 1] = uni(N)

        tar = 10 * data[:, 0] + np.sin(20 * np.pi * data[:, 1])
        std_signal = np.std(tar)
        no = noise(N)
        tar = tar + v * std_signal * no
        return data, tar

    @staticmethod
    def generate_piecewise_linear_data(n_samples, n_segments, stocastic_noise=False):
        """
        Generates a piecewise linear dataset with n_segments segments.
        :param n_samples: Number of samples to generate
        :param n_segments: Number of segments to use
        :param stocastic_noise: If True, the noise scale of each segment is random;
            otherwise it grows linearly with the segment index (heteroscedastic noise)
        :return: x, y
        """
        x = torch.rand(n_samples) * 10  # Generate random input values between 0 and 10
        y = torch.zeros(n_samples)

        segment_length = 10 / n_segments
        for i in range(n_segments):
            mask = (x >= i * segment_length) & (x < (i + 1) * segment_length)
            n_in_segment = int(mask.sum())  # number of samples falling in this segment
            slope = torch.randn(1) * 2  # Random slope for each segment
            if stocastic_noise:
                noise = torch.randn(n_in_segment) * (0.5 + i * torch.randn(1))
            else:
                noise = torch.randn(n_in_segment) * (0.5 + i * 0.2)  # Heteroscedastic noise
            y[mask] = slope * x[mask] + noise

        return x, y

Do not forget to instantiate an object of the above class so that you can generate datasets on the fly! (The methods are static, so they can also be called directly on the class, e.g. ```MLPData.regr1```.)

In [None]:
synthetic_datasets = MLPData()

Let's see what each dataset looks like!
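
The `v` argument of `regr1` adds Gaussian noise whose standard deviation is `v` times that of the noise-free target, so the noise variance is $v^2$ times the signal variance. Even a model that recovers $f$ exactly can then reach a correlation of only about $1/\sqrt{1+v^2}$ with the noisy targets, which is worth keeping in mind when reading the `CorrCoeff` values reported later. A small NumPy sketch that re-creates the `regr1` formula (a standalone copy for illustration, not a call to the class) and checks this:

```python
import numpy as np

def regr1_signal_and_target(n, v, seed=0):
    """Standalone copy of the regr1 formula: returns the noise-free signal and the noisy target."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(0, 1, n)
    x2 = rng.uniform(0, 1, n)
    signal = 10 * x1 + np.sin(20 * np.pi * x2)
    # Noise std is v times the std of the noise-free signal, as in MLPData.regr1
    target = signal + v * np.std(signal) * rng.normal(0, 1, n)
    return signal, target

results = {}
for v in [0, 2, 5, 10]:
    signal, target = regr1_signal_and_target(200_000, v)
    empirical = np.corrcoef(signal, target)[0, 1]
    theoretical = 1 / np.sqrt(1 + v**2)
    results[v] = (empirical, theoretical)
    print(f"v={v:>2}: corr(signal, target) = {empirical:.3f}  (1/sqrt(1+v^2) = {theoretical:.3f})")
```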

In [None]:
def data_distribution(imgs, shape=(2, 2)):
    """Scatter-plot a list or dict of 2-D (data, target) pairs, coloured by target value."""
    f, axs = plt.subplots(*shape, figsize=(10, 10))
    axs = axs.flatten()

    if isinstance(imgs, list):
        for idx, ((d, t), ax) in enumerate(zip(imgs, axs)):
            ax.scatter(d[:, 0], d[:, 1], c=t)
            ax.set_title(f"Plot number: {idx}")
    elif isinstance(imgs, dict):
        for (key, (d, t)), ax in zip(imgs.items(), axs):
            ax.scatter(d[:, 0], d[:, 1], c=t)
            ax.set_title(key)
    plt.show()

In [None]:
## Plotting synthetic datasets
# make sure you understand how the data is generated

In [None]:
# Generate the synthetic dataset
n_samples = 10000
n_segments = 2
x, y = synthetic_datasets.generate_piecewise_linear_data(
    n_samples, n_segments, stocastic_noise=False
)

# Define colors for each segment; the list is repeated so that there are
# enough colors when there are more than 5 segments
segment_colors = ["b", "g", "r", "c", "m"] * (n_segments // 5 + 1)

# Plot the synthetic data with different colors for each segment
plt.figure(figsize=(8, 6))
for i in range(n_segments):
    mask = (x >= i * (10 / n_segments)) & (x < (i + 1) * (10 / n_segments))
    plt.scatter(x[mask], y[mask], label=f"Segment {i+1}", c=segment_colors[i])

plt.xlabel("X")
plt.ylabel("Y")
plt.title("Piecewise Linear Regression with Heteroscedasticity")
plt.legend()
plt.show()

In [None]:
data_distribution(
    {
        "reg 0": MLPData.regr1(1000, v=0),
        "reg 2": MLPData.regr1(1000, v=2),
        "reg 5": MLPData.regr1(1000, v=5),
        "reg 10": MLPData.regr1(1000, v=10),
    },
    shape=(2, 2),
)

## Task 1

**Model definition**

In this lab exercise, you will design a Multi-Layer Perceptron (MLP) for regression. The goal is to create a simple neural network architecture to predict a continuous target variable based on input features. By completing this exercise, you will gain hands-on experience in configuring the architecture of an MLP for regression tasks.

**Task Description:**

- Create an MLP architecture for regression.
- Define the input layer, hidden layers, and output layer.
- Configure the input layer to accept input data with dimensions specified as in_dimension.
- Design the hidden layers with num_hidden_layers layers and hidden_nodes neurons in each layer.
- Choose an appropriate activation function, specified as act, for the hidden layers. You can use common activation functions like ReLU (torch.nn.ReLU) for this purpose.
- Configure the output layer to have a linear activation function since this is a regression task.
- Define the output dimension to match your regression problem's requirements (specified as out_dimension).
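
One possible way to express this architecture is with `torch.nn.Sequential`; `build_mlp` below is a hypothetical helper for illustration, not part of the lab's code. The sketch also shows a PyTorch detail that matters for any custom `Module`: sub-layers stored in a plain Python list are not registered, so `model.parameters()` (and therefore the optimizer) never sees them, while `torch.nn.ModuleList` registers them.

```python
import torch
import torch.nn as nn

def build_mlp(in_dimension=2, hidden_nodes=4, num_hidden_layers=2, out_dimension=1, act=nn.ReLU):
    """Hypothetical helper: input layer + activation, hidden layers + activation, linear output."""
    layers = [nn.Linear(in_dimension, hidden_nodes), act()]
    for _ in range(num_hidden_layers):
        layers += [nn.Linear(hidden_nodes, hidden_nodes), act()]
    # No activation on the output layer: the regression output stays unbounded
    layers.append(nn.Linear(hidden_nodes, out_dimension))
    return nn.Sequential(*layers)

class ListHolder(nn.Module):
    """Keeps its layers in a plain Python list, so they are NOT registered."""
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(4, 4) for _ in range(2)]

class ModuleListHolder(nn.Module):
    """Keeps its layers in nn.ModuleList, so their parameters are registered."""
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(2)])

def n_params(module):
    return sum(p.numel() for p in module.parameters())

mlp = build_mlp()
out = mlp(torch.randn(5, 2))
print(out.shape)                     # torch.Size([5, 1])
print(n_params(mlp))                 # 57 = (2*4+4) + 2*(4*4+4) + (4+1)
print(n_params(ListHolder()))        # 0
print(n_params(ModuleListHolder()))  # 40
```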

## Response

See the implementation below.

In [None]:
class NeuralNet(torch.nn.Module):
    def __init__(
        self,
        in_dimension=2,
        hidden_nodes=1,
        num_hidden_layers=1,
        act=torch.nn.ReLU(),
        out_dimension=1
    ):
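        """
        in_dimension: number of input features
        hidden_nodes: number of neurons in each hidden layer
        num_hidden_layers: number of hidden layers
        act: activation function
        out_dimension: number of output neurons (number of regression targets)
        """
        super(NeuralNet, self).__init__()
        self.input = torch.nn.Linear(in_dimension, hidden_nodes)
        # nn.ModuleList registers the hidden layers, so their parameters are
        # returned by model.parameters() and updated by the optimizer
        self.hidden = torch.nn.ModuleList(
            [torch.nn.Linear(hidden_nodes, hidden_nodes) for _ in range(num_hidden_layers)]
        )
        self.output = torch.nn.Linear(hidden_nodes, out_dimension)
        self.act = act

    def forward(self, x):
        # Activation after the input layer too; without it the input layer would
        # collapse into the first hidden layer (two stacked linear maps are linear)
        x = self.act(self.input(x))
        for h in self.hidden:
            x = self.act(h(x))
        # Linear output layer for regression
        return self.output(x)

    def predict(self, x):
        # For regression the prediction is the raw network output (no argmax)
        return self.forward(x)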

## Process the data for training

We need to make tensors from the NumPy data generated by the ```MLPData``` class and use them to create a PyTorch dataset. For this exercise we will use ```TensorDataset```. To iterate over the dataset, we need a data loader; we will use the default ```DataLoader```. You can find the corresponding documentation [here](https://pytorch.org/docs/stable/data.html#torch.utils.data.TensorDataset) and [here](https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.DataLoader).
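
A quick sketch of what the `TensorDataset`/`DataLoader` pair yields, using random placeholder tensors with the same sizes as the cell below (10 000 samples, batch size 1024). Each iteration returns one `(inputs, targets)` tuple, and the last batch is smaller unless `drop_last=True` is passed:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

x = torch.randn(10_000, 2)  # placeholder inputs
y = torch.randn(10_000)     # placeholder 1-D targets
dataset = TensorDataset(x, y)

# shuffle=True re-orders the samples every epoch, the usual choice for training
loader = DataLoader(dataset, batch_size=1024, shuffle=True)

batch_sizes = [len(xb) for xb, yb in loader]
print(len(loader), batch_sizes[-1])  # 10 784  (the last batch holds 10000 - 9*1024 samples)

xb, yb = next(iter(loader))
print(xb.shape, yb.shape)            # torch.Size([1024, 2]) torch.Size([1024])
```

Note that the targets come out 1-D while the model outputs have shape `(batch, 1)`, which is why the training code reshapes `yi` before computing the loss.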

In [None]:
# let's start with 10000 points
x, y = synthetic_datasets.regr1(10000)
# convert each array to a float32 tensor
x = torch.tensor(x, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32)
# create the TensorDataset
syn2_Pytorch = TensorDataset(x, y)
# create the dataloader; reshuffle the samples every epoch for training
loader = DataLoader(syn2_Pytorch, batch_size=1024, shuffle=True)

## Task 2

Complete the function below. The task at this point is to create a function that trains your ```model``` for one epoch (```epoch```) using ```optimizer```, ```loss``` and ```dataloader```. You can read about optimizers [here](https://pytorch.org/docs/stable/optim.html).

## Response

See the implementation below.

**Note:** the implementation assumes that the loss uses `reduction="mean"` (the PyTorch default), since the function documentation says the return value is the average loss for the epoch.
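
Why the batch loss is multiplied by the batch size: with `reduction="mean"` each batch loss is already an average, and the last batch is usually smaller, so a plain average of batch losses would not equal the per-sample average over the epoch. A sketch with fixed random predictions (no model involved) comparing the two:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
preds = torch.randn(2500, 1)
targets = torch.randn(2500, 1)
loader = DataLoader(TensorDataset(preds, targets), batch_size=1024)  # batches of 1024, 1024, 452

mse = torch.nn.MSELoss()  # reduction="mean": each batch loss is a per-sample average

# Naive: average of batch means (over-weights the small last batch)
naive = sum(mse(p, t).item() for p, t in loader) / len(loader)

# Weighted: scale each batch mean by its size, then divide by the total count
total, count = 0.0, 0
for p, t in loader:
    total += mse(p, t).item() * len(p)
    count += len(p)
weighted = total / count

full = mse(preds, targets).item()  # per-sample average over all 2500 rows
print(f"naive={naive:.6f} weighted={weighted:.6f} full={full:.6f}")
```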

In [None]:
def train_epoch(
        epoch: int, 
        optimizer:torch.optim.Optimizer, 
        loss: torch.nn.Module, 
        model: torch.nn.Module, 
        train_loader: DataLoader
):
    """
    Trains the model for one epoch using the given training data.

    Args:
        epoch (int): The current epoch number.
        optimizer (torch.optim.Optimizer): The optimizer used for updating the model parameters.
        loss (torch.nn.Module): The loss function used for calculating the loss.
        model (torch.nn.Module): The neural network model to be trained.
        train_loader (torch.utils.data.DataLoader): The data loader providing training data.

    Returns:
        float: The average loss for the epoch.

    This function iterates through the provided `train_loader`, computes the forward pass,
    calculates the loss, performs backpropagation, and updates the model parameters using
    the given optimizer. It then returns the average loss for the entire epoch.

    Example:
        >>> optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        >>> loss_fn = torch.nn.MSELoss()
        >>> train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
        >>> for epoch in range(num_epochs):
        ...     epoch_loss = train_epoch(epoch, optimizer, loss_fn, model, train_loader)
        ...     print(f"Epoch {epoch+1}, Loss: {epoch_loss:.4f}")
    """

    total_loss = 0
    total_items = 0
    model.train(True)

    for batch_idx, (xi, yi) in enumerate(train_loader):
        optimizer.zero_grad()

        outputs = model(xi)

        loss_t = loss(outputs, yi.reshape(-1, 1))
        loss_t.backward()

        optimizer.step()
        
        n_items = len(xi)
        total_loss += loss_t.item() * n_items
        total_items += n_items

    return total_loss / total_items

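Now that we have a way to train our model, we need to create an instance of the model and train it. We still need a way to evaluate the model. For these simple two-input datasets, we can visualize the learned regression surface together with the data.

We will create the helper function ```plot_decision_boundary``` for this, plus ```stats_reg``` (and its dataset wrapper ```stats_reg_ds```) to report the MSE and correlation coefficient.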
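
`np.corrcoef` returns the full correlation matrix rather than a single number: for two 1-D inputs it is 2×2 with ones on the diagonal, and the Pearson coefficient between them is the off-diagonal entry `[0, 1]`. The sketch below, with made-up arrays, also shows why it is useful to report the MSE alongside it: correlation ignores scale and offset.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.9, 5.3])

matrix = np.corrcoef(y_true, y_pred)
print(matrix.shape)  # (2, 2): ones on the diagonal
r = matrix[0, 1]     # Pearson correlation between targets and predictions

# These predictions are perfectly correlated with the targets...
y_shifted = 2 * y_true + 1
r_shifted = np.corrcoef(y_true, y_shifted)[0, 1]
# ...but far from them, which only the MSE reveals
mse_shifted = np.mean((y_shifted - y_true) ** 2)
print(round(r_shifted, 6), mse_shifted)  # 1.0 18.0
```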

In [None]:
def plot_decision_boundary(dataset, y, model, steps=50):
    xmin, xmax = dataset[:, 0].min(), dataset[:, 0].max()
    ymin, ymax = dataset[:, 1].min(), dataset[:, 1].max()
    x_span = np.linspace(xmin, xmax, steps)
    y_span = np.linspace(ymin, ymax, steps)
    xx_pred, yy_pred = np.meshgrid(x_span, y_span)
    model_viz = np.array([xx_pred.flatten(), yy_pred.flatten()]).T

    # Make predictions across region of interest (no gradients needed)
    model.eval()
    with torch.no_grad():
        labels_predicted = model(torch.tensor(model_viz, dtype=torch.float32))

    labels_predicted = labels_predicted.numpy()

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(dataset[:, 0], dataset[:, 1], y)
    ax.scatter(
        xx_pred.flatten(),
        yy_pred.flatten(),
        labels_predicted,
        facecolor=(0, 0, 0, 0),
        s=20,
        edgecolor="#70b3f0",
    )
    ax.view_init(elev=28, azim=120)
    plt.show()
    model.train()
    return fig, ax

In [None]:
def stats_reg(x, y, model):
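    """
    Prints and returns the MSE and the Pearson correlation coefficient between
    the model's predictions for x and the targets y
    """

    A = ["MSE", "CorrCoeff"]
    model.eval()
    with torch.no_grad():
        preds = model(x)
    # np.corrcoef returns a 2x2 matrix; the coefficient is the off-diagonal entry
    pcorr = np.corrcoef(y.flatten(), preds.numpy().flatten())[0, 1]
    mse = torch.nn.MSELoss()(preds, y)

    B = [mse.item(), pcorr]

    print(f"\n {'#'*20} STATISTICS{'#'*20}\n")
    for r in zip(A, B):
        print(*r, sep="   ")
    print(f"\n {'#'*50}")
    return mse.item(), pcorr

def stats_reg_ds(dataset: TensorDataset, model):
    x = dataset.tensors[0]
    y = dataset.tensors[1]
    # MSELoss expects targets with the same (N, 1) shape as the predictions
    if y.ndim == 1:
        y = y.reshape(-1, 1)
    elif y.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D targets, got shape {tuple(y.shape)}")
    return stats_reg(x, y, model)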

## Task 3 

### Instantiation
Now the only thing missing to visualize your results is a trained network. **TODO:** Instantiate your model, loss and optimizer below. The choice of loss is critical for training.

## Response

See the implementation below. I chose 2 hidden layers with 4 nodes each. There is no particular reason beyond keeping the model small, which I think is enough given that the problem is fairly simple. I also use a relatively high learning rate (0.01) with some momentum (0.9) for SGD.

In [None]:
my_model = NeuralNet(num_hidden_layers=2, hidden_nodes=4)
critereon = torch.nn.MSELoss()
optimizer = torch.optim.SGD(my_model.parameters(), lr=0.01, momentum=0.9)

### Train the model

Now that our model, loss and optimizer are set up, we are ready to start training.

In [None]:
num_epoch = 100
train_losses = list()
for epoch in range(1, num_epoch + 1):
    epoch_loss = train_epoch(epoch, optimizer, critereon, my_model, loader)
    train_losses.append(epoch_loss)

    if epoch % 10 == 0:
        print(f"Epoch {epoch}/{num_epoch}: Loss = {epoch_loss}")

In [None]:
plot_decision_boundary(x, y, my_model, steps=50)

In [None]:
stats_reg(x, y.reshape(-1, 1), my_model)

## Visualize the train losses 

In [None]:
ax = plt.figure().gca()
plt.plot(np.arange(len(train_losses)), train_losses)
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
plt.xlabel("Epoch")
plt.ylabel("Loss Value")
plt.show()

## Model Selection 

A proper training procedure uses ```3 splits```: training, validation and test. Generally, in each epoch the model is trained on the training data and then evaluated on the validation data; the model weights are not updated during validation. The best-performing model on the validation data is selected and saved for the final evaluation on the test data.
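
Keeping the best model does not require copying the whole module: a common alternative is to copy only its `state_dict()` whenever the validation loss improves, and load it back at the end. A sketch with made-up validation losses and a stand-in weight update in place of real training:

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)
w0 = model.weight.detach().clone()  # initial weights, kept only to check the result

best_val_loss = float("inf")
best_epoch = None
best_state = None

# Made-up validation losses for four epochs; in practice these come from validation
for epoch, val_loss in enumerate([0.9, 0.5, 0.7, 0.6]):
    with torch.no_grad():
        model.weight.add_(0.1)  # stand-in for an optimizer step that changes the weights
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_epoch = epoch
        # state_dict() references the live tensors, so deep-copy it
        best_state = copy.deepcopy(model.state_dict())

# After training, restore the weights from the best validation epoch
model.load_state_dict(best_state)
print(best_epoch, best_val_loss)  # 1 0.5
```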

## Task 4

**TODO:** Split the data into 3 parts: one for training, one for validation and one held-out set for testing. A good starting point is 70%, 15% and 15% of the dataset, respectively.

**HINT** you can either do this manually with indexing or use readily available tools, e.g. in sklearn.
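
The two-stage split can be sketched as follows; `three_way_split` is a hypothetical helper, not the lab's code. Passing integer counts instead of fractions is a design choice: the second call would otherwise need a fraction such as `0.15 / 0.30`, and floating-point rounding of such fractions can move a sample from one split to another.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def three_way_split(x, y, train_frac=0.70, valid_frac=0.15, seed=0):
    """Hypothetical helper: split (x, y) into train/validation/test sets."""
    n = len(x)
    # Convert fractions to integer counts up front so float rounding cannot shift a sample
    n_train = int(round(train_frac * n))
    n_valid = int(round(valid_frac * n))
    x_tr, x_rest, y_tr, y_rest = train_test_split(x, y, train_size=n_train, random_state=seed)
    # The remainder is split again: n_valid samples for validation, the rest for testing
    x_va, x_te, y_va, y_te = train_test_split(x_rest, y_rest, train_size=n_valid, random_state=seed)
    return (x_tr, y_tr), (x_va, y_va), (x_te, y_te)

x = np.arange(10_000).reshape(-1, 1)  # row values double as sample ids
y = np.arange(10_000, dtype=float)
train, valid, test = three_way_split(x, y)
print(len(train[0]), len(valid[0]), len(test[0]))  # 7000 1500 1500
```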

## Response

See the implementation below. I used sklearn's `train_test_split`, which shuffles the data before splitting.

In [None]:
n_samples = 10000
x, y = synthetic_datasets.regr1(n_samples)

def split_dataset(x, y, train_size=0.7, valid_size=0.15, random_state=None, batch_size=1024):
    """Split (x, y) into train/validation/test DataLoaders; the test set gets the remainder."""
    others_size = 1.0 - train_size

    train_x, others_x, train_y, others_y = train_test_split(x, y, train_size=train_size, random_state=random_state)
    # Split the remainder so that valid_size is a fraction of the *whole* dataset
    valid_x, test_x, valid_y, test_y = train_test_split(
        others_x, others_y, train_size=valid_size / others_size, random_state=random_state
    )

    to_tensor = lambda a: torch.as_tensor(a, dtype=torch.float32)
    train_dataset = TensorDataset(to_tensor(train_x), to_tensor(train_y))
    valid_dataset = TensorDataset(to_tensor(valid_x), to_tensor(valid_y))
    test_dataset = TensorDataset(to_tensor(test_x), to_tensor(test_y))

    return (
        DataLoader(train_dataset, batch_size=batch_size, shuffle=True),  # reshuffle every epoch
        DataLoader(valid_dataset, batch_size=batch_size),
        DataLoader(test_dataset, batch_size=batch_size),
    )

train_loader, valid_loader, test_loader = split_dataset(x, y, random_state=0)

## Task 5
**TODO:** Complete the following functions. Run a proper training on each of the synthetic datasets you have created. Discuss the performance of the model in the report. What could be the reason behind the performance? Feel free to adapt the number of hidden nodes (and possibly the number of hidden layers and epochs).

## Response

See the implementation below. The same note about the loss reduction applies as for the training function, since the two functions are very similar.

In terms of results, the model performs similarly on the training, validation and test splits. This makes sense, since all three splits are sampled from the same distribution, even though the individual samples are drawn at random.

The model's performance is also very good, most likely because I generated the data with the default `v=0`, i.e. without any added target noise.

In [None]:
def validate_epoch(epoch: int, loss: torch.nn.Module, model: torch.nn.Module, val_loader: DataLoader):
    """
    Validates the model on the validation data for one epoch.

    Args:
        epoch (int): The current epoch number.
        loss (torch.nn.Module): The loss function used for calculating the validation loss.
        model (torch.nn.Module): The neural network model to be evaluated.
        val_loader (torch.utils.data.DataLoader): The data loader providing validation data.

    Returns:
        float: The average validation loss for the epoch.

    This function switches the provided model to evaluation mode, iterates through the
    validation data provided by `val_loader`, computes the forward pass, and calculates
    the validation loss. It then returns the average validation loss for the entire epoch.

    Example:
        >>> loss_fn = torch.nn.MSELoss()
        >>> val_loader = DataLoader(val_dataset, batch_size=32)
        >>> for epoch in range(num_epochs):
        ...     epoch_loss = validate_epoch(epoch, loss_fn, model, val_loader)
        ...     print(f"Epoch {epoch+1}, Validation Loss: {epoch_loss:.4f}")
    """

    total_loss = 0
    total_items = 0
    model.eval()

    with torch.no_grad():
        for batch_idx, (xi, yi) in enumerate(val_loader):
            output = model(xi)

            n_items = len(xi)
            total_loss += loss(output, yi.reshape(-1, 1)).item() * n_items
            total_items += n_items

    return total_loss / total_items

In [None]:
def a_proper_training(num_epoch, model, optimizer, loss, train_loader, val_loader):
    """
    Performs a complete training and validation process for the given model.

    Args:
        num_epoch (int): The number of training epochs.
        model (torch.nn.Module): The neural network model to be trained.
        optimizer (torch.optim.Optimizer): The optimizer used for updating the model parameters.
        loss (torch.nn.Module): The loss function used for calculating the loss.
        train_loader (torch.utils.data.DataLoader): The data loader providing training data.
        val_loader (torch.utils.data.DataLoader): The data loader providing validation data.

    Returns:
        tuple: A tuple containing the best trained model, a list of training losses for each epoch,
        and a list of validation losses for each epoch.

    This function trains the provided model for the specified number of epochs, monitoring both
    training and validation losses. It also saves the best model based on the lowest validation
    loss achieved during training.

    Example:
        >>> num_epochs = 10
        >>> optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        >>> loss_fn = torch.nn.MSELoss()
        >>> train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
        >>> val_loader = DataLoader(val_dataset, batch_size=32)
        >>> best_model, train_losses, val_losses = a_proper_training(num_epochs, model, optimizer, loss_fn, train_loader, val_loader)
        >>> # After training, you can use the best_model for inference.
    """
    best_val_loss = np.inf
    best_model = None
    train_losses = list()
    val_losses = list()
    for epoch in range(num_epoch):
        train_loss = train_epoch(epoch, optimizer, loss, model, train_loader)
        val_loss = validate_epoch(epoch, loss, model, val_loader)
        train_losses.append(train_loss)
        val_losses.append(val_loss)

        if val_loss < best_val_loss:
            best_model = copy.deepcopy(model)
            best_val_loss = val_loss

    return best_model, train_losses, val_losses

In [None]:
my_model = NeuralNet(num_hidden_layers=2, hidden_nodes=4)
critereon = torch.nn.MSELoss()
optimizer = torch.optim.SGD(my_model.parameters(), lr=0.01, momentum=0.9)

In [None]:
best_model, train_losses, val_losses = a_proper_training(
    100, my_model, optimizer, critereon, train_loader, valid_loader
)

In [None]:
plt.plot(train_losses, label="train")
plt.plot(val_losses, label="val")
plt.title("MSE Loss")
plt.legend()

In [None]:
stats_reg_ds(train_loader.dataset, best_model)
stats_reg_ds(valid_loader.dataset, best_model)
stats_reg_ds(test_loader.dataset, best_model)

## Task 6 

Add dropout to the model and rerun the previous experiment. Does it have any effect, and why?

## Response

Dropout has the following effects:

*   Training takes longer to converge. This makes sense, since connections are randomly dropped at every step. Also, unlike real-world data, the generated dataset has no feature redundancy: with real data, the remaining units can still draw on other, correlated features to explain the labels, which is not the case for a two-input dataset generated from an equation.

*   The training loss is higher than the validation loss. This is due to how dropout works: it is applied during training but not during validation and testing, so the training loss is computed with a randomly thinned network while the validation loss uses the full network.
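
The second point can be checked directly on a `torch.nn.Dropout` layer. In training mode it zeroes each activation with probability `p` and rescales the survivors by `1 / (1 - p)`, so the expected activation is unchanged; in evaluation mode (`model.eval()`, which `validate_epoch` calls) it passes the input through untouched. A sketch on a tensor of ones:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.2)
x = torch.ones(100_000)

dropout.train()
y_train = dropout(x)
zero_frac = (y_train == 0).float().mean().item()  # close to p = 0.2
survivors = y_train[y_train != 0]                  # each equals 1 / (1 - 0.2) = 1.25
print(zero_frac, survivors.unique())

dropout.eval()
y_eval = dropout(x)
print(torch.equal(y_eval, x))  # True: dropout is the identity in evaluation mode
```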

In [None]:
class NeuralNet(torch.nn.Module):
    def __init__(
        self,
        in_dimension=2,
        hidden_nodes=1,
        num_hidden_layers=1,
        act=torch.nn.ReLU(),
        out_dimension=1,
        dropout_pct=0,
    ):
        """
        in_dimension: number of input features
        hidden_nodes: number of neurons in each hidden layer
        num_hidden_layers: number of hidden layers
        act: activation function
        out_dimension: number of output neurons (number of regression targets)
        dropout_pct: dropout probability applied after each hidden activation
        """
        super(NeuralNet, self).__init__()
        self.input = torch.nn.Linear(in_dimension, hidden_nodes)
        # nn.ModuleList registers the hidden layers so the optimizer updates them
        self.hidden = torch.nn.ModuleList(
            [torch.nn.Linear(hidden_nodes, hidden_nodes) for _ in range(num_hidden_layers)]
        )
        self.output = torch.nn.Linear(hidden_nodes, out_dimension)
        self.act = act
        self.dropout = torch.nn.Dropout(p=dropout_pct)

    def forward(self, x):
        x = self.act(self.input(x))
        for h in self.hidden:
            x = self.act(h(x))
            # Dropout is only active in train mode (and is the identity for p=0)
            x = self.dropout(x)
        return self.output(x)

    def predict(self, x):
        # For regression the prediction is the raw network output (no argmax)
        return self.forward(x)

In [None]:
n_samples = 10000
x, y = synthetic_datasets.regr1(n_samples)

train_loader, valid_loader, test_loader = split_dataset(x, y, random_state=0)

In [None]:
my_model = NeuralNet(num_hidden_layers=2, hidden_nodes=4, dropout_pct=0.2)
critereon = torch.nn.MSELoss()
optimizer = torch.optim.SGD(my_model.parameters(), lr=0.01, momentum=0.9)

In [None]:
best_model, train_losses, val_losses = a_proper_training(
    200, my_model, optimizer, critereon, train_loader, valid_loader
)

In [None]:
plt.plot(train_losses, label="train")
plt.plot(val_losses, label="val")
plt.title("MSE Loss")
plt.legend()

In [None]:
stats_reg_ds(test_loader.dataset, best_model)

## Task 7


**TODO!** Rerun the experiment with the second synthetic dataset, with splitting and proper training. Feel free to play with the parameters of the model.

## Response

I had to increase the model complexity, because the piecewise function, which jumps at the segment boundaries, is harder for the network to fit than a continuous function.

In [None]:
# Generate the synthetic dataset
n_samples = 10000
n_segments = 5
x, y = synthetic_datasets.generate_piecewise_linear_data(
    n_samples, n_segments, stocastic_noise=True
)
x = x.reshape(-1, 1)

train_loader, valid_loader, test_loader = split_dataset(x, y, random_state=0)

In [None]:
my_model = NeuralNet(in_dimension=1, num_hidden_layers=4, hidden_nodes=32)
critereon = torch.nn.MSELoss()
optimizer = torch.optim.SGD(my_model.parameters(), lr=0.01, momentum=0.9)

In [None]:
best_model, train_losses, val_losses = a_proper_training(
    200, my_model, optimizer, critereon, train_loader, valid_loader
)

In [None]:
plt.plot(train_losses, label="train")
plt.plot(val_losses, label="val")
plt.title("MSE Loss")
plt.legend()

In [None]:
stats_reg_ds(test_loader.dataset, best_model)

<center><h1 style="font-size:40px;">Real dataset</h1></center>

## Task 9

In the following example, we will import the diabetes dataset. This dataset contains data from 442 diabetic patients, with features such as BMI, age, blood pressure and glucose levels that are useful for predicting disease progression.

Here are the key details of the sklearn diabetes dataset:

- Target variable: a quantitative measure of disease progression one year after baseline. It is a continuous variable, not a binary classification of diabetes.
- Features (predictors): the dataset contains ten baseline variables that are used to predict disease progression:
    - Age
    - Sex
    - BMI (Body Mass Index)
    - Average Blood Pressure
    - S1: Total serum cholesterol
    - S2: Low-density lipoproteins (LDL cholesterol)
    - S3: High-density lipoproteins (HDL cholesterol)
    - S4: Total cholesterol / HDL cholesterol ratio
    - S5: Log of serum triglycerides level
    - S6: Blood sugar level

In [None]:
diabetes = datasets.load_diabetes(scaled=True, as_frame=True, return_X_y=True)
x, y = diabetes[0], diabetes[1]
x = x.astype(np.float32)
y = y.astype(np.float32)

In [None]:
cols = list(x.columns) + ["Target"]
diabetes_df = pd.DataFrame(np.hstack([x, np.atleast_2d(y).T]), columns=cols)
plt.figure(figsize=(12, 10))
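# passing the DataFrame (not .values) keeps the feature names as tick labels
sns.heatmap(diabetes_df.corr(), vmin=-1, vmax=1, cmap="coolwarm", annot=True, fmt=".2f")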
plt.show()

**TODO:** Run the experiments on the diabetes dataset. Do you get similar performance? Why? Do you suffer from overfitting or underfitting?
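
One optional experiment when answering this: standardize the target using statistics from the training split only, train on the standardized values, and map predictions back before computing the MSE. This keeps the loss on a scale closer to that of the synthetic experiments; whether it actually helps here would need to be verified. A sketch with placeholder arrays, using sklearn's `StandardScaler`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
y_train = rng.uniform(25, 346, size=(300, 1))  # placeholder targets on the diabetes scale
y_test = rng.uniform(25, 346, size=(70, 1))    # placeholder test targets

# Fit the scaler on training targets only, then apply the same transform to other splits
scaler = StandardScaler().fit(y_train)
y_train_s = scaler.transform(y_train)
y_test_s = scaler.transform(y_test)

# A model would be trained on y_train_s; predictions are mapped back before reporting MSE
preds_s = y_test_s  # stand-in for model predictions in standardized units
preds = scaler.inverse_transform(preds_s)
print(np.allclose(preds, y_test))  # True
```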

## Response

The performance I got is reasonable. The training loss is low and the validation loss is somewhat higher. It is surprising that the test loss is lower, but given the small number of samples (442 in total, with only about 15% of them in the test split), I assume this is just sampling variance between the splits.

In [None]:
train_loader, valid_loader, test_loader = split_dataset(x.values, y.values, random_state=0)

In [None]:
my_model = NeuralNet(in_dimension=x.shape[1], num_hidden_layers=8, hidden_nodes=256)
critereon = torch.nn.MSELoss()
optimizer = torch.optim.SGD(my_model.parameters(), lr=0.01, momentum=0.9)

In [None]:
best_model, train_losses, val_losses = a_proper_training(
    10000, my_model, optimizer, critereon, train_loader, valid_loader
)

In [None]:
plt.plot(train_losses, label="train")
plt.plot(val_losses, label="val")
plt.title("MSE Loss")
plt.legend()

In [None]:
stats_reg_ds(train_loader.dataset, my_model)
stats_reg_ds(train_loader.dataset, best_model)
stats_reg_ds(valid_loader.dataset, best_model)
stats_reg_ds(test_loader.dataset, best_model)

## Task 10

**TODO** Try applying L1 and L2 regularization. Does it help with the performance? Why?

**HINT** in PyTorch, L2 can be applied by adding weight decay to the optimizer, and for L1 you can choose HuberLoss instead of MSE (HuberLoss behaves like an L1 loss on large residuals). Or feel free to apply them manually.
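
Both penalties can also be added by hand. For SGD, `weight_decay=λ` adds `λ·w` to every parameter's gradient, which is the same as adding $\frac{\lambda}{2}\lVert w\rVert^2$ to the loss; the sketch checks this numerically for one step and defines a manual L1 term, which, unlike `HuberLoss`, penalizes the weights rather than the residuals. `l1_penalty`, `l2_penalty`, `max_weight_decay_gap` and `l1_lambda` are hypothetical names.

```python
import copy
import torch
import torch.nn as nn

def l1_penalty(model):
    """Sum of |w| over all parameters (biases included, like SGD's weight_decay)."""
    return sum(p.abs().sum() for p in model.parameters())

def l2_penalty(model):
    """Sum of w^2 over all parameters."""
    return sum((p ** 2).sum() for p in model.parameters())

def max_weight_decay_gap(lam=0.1, lr=0.05, seed=0):
    """Compare one SGD step with weight_decay=lam against an explicit (lam / 2) * L2 penalty."""
    torch.manual_seed(seed)
    x, y = torch.randn(8, 3), torch.randn(8, 1)
    m1 = nn.Linear(3, 1)
    m2 = copy.deepcopy(m1)  # identical starting weights
    mse = nn.MSELoss()

    opt1 = torch.optim.SGD(m1.parameters(), lr=lr, weight_decay=lam)
    opt1.zero_grad()
    mse(m1(x), y).backward()
    opt1.step()

    opt2 = torch.optim.SGD(m2.parameters(), lr=lr)
    opt2.zero_grad()
    (mse(m2(x), y) + 0.5 * lam * l2_penalty(m2)).backward()
    opt2.step()

    return max((p1 - p2).abs().max().item() for p1, p2 in zip(m1.parameters(), m2.parameters()))

print(max_weight_decay_gap())  # close to 0: the two updates coincide

# Inside a training loop, an L1 penalty is simply added to the data loss, e.g.
#   loss_t = loss(outputs, yi.reshape(-1, 1)) + l1_lambda * l1_penalty(model)
# where l1_lambda is a hypothetical strength to tune on the validation set.
```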

## Response

The only noticeable difference is that it takes longer for the validation loss to diverge from the training loss in the graph. Otherwise the performance is more or less the same. I tried different regularization strengths, and they mostly just worsened the result.

Also, I'm not sure `HuberLoss` is the right way to implement L1 regularization. It is a piecewise loss on the residuals (quadratic for small errors, linear for large ones) rather than an additional penalty term on the weights. But I followed the hint for now.
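
This reading matches PyTorch's definition: `HuberLoss(delta=δ)` is $\frac{1}{2}r^2$ for residuals $|r| < \delta$ and $\delta\left(|r| - \frac{\delta}{2}\right)$ beyond, so it is a robust loss on the residuals $r$, not a penalty on the weights. A sketch with single residuals shows that with the `delta=200` used in the experiment that follows, only residuals larger than 200 fall in the linear (L1-like) region; smaller ones are penalized at exactly half the squared error:

```python
import torch

huber = torch.nn.HuberLoss(delta=200.0)
mse = torch.nn.MSELoss()
target = torch.tensor([0.0])

values = {}
for r in [10.0, 150.0, 300.0]:
    pred = torch.tensor([r])  # a single residual of size r
    values[r] = (huber(pred, target).item(), mse(pred, target).item())
    print(f"residual {r:>5}: huber = {values[r][0]:>8}, mse = {values[r][1]:>8}")
# residual 10 and 150 are in the quadratic region (0.5 * r^2);
# residual 300 is in the linear region: 200 * (300 - 100) = 40000
```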

In [None]:
my_model = NeuralNet(in_dimension=x.shape[1], num_hidden_layers=8, hidden_nodes=256)
critereon = torch.nn.MSELoss()
optimizer = torch.optim.SGD(my_model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.005)
best_model, train_losses, val_losses = a_proper_training(
    10000, my_model, optimizer, critereon, train_loader, valid_loader
)

In [None]:
plt.plot(train_losses, label="train")
plt.plot(val_losses, label="val")
plt.title("MSE Loss")
plt.legend()

In [None]:
stats_reg_ds(train_loader.dataset, my_model)
stats_reg_ds(train_loader.dataset, best_model)
stats_reg_ds(valid_loader.dataset, best_model)
stats_reg_ds(test_loader.dataset, best_model)

In [None]:
my_model = NeuralNet(in_dimension=x.shape[1], num_hidden_layers=8, hidden_nodes=256)
critereon = torch.nn.HuberLoss(delta=200)
optimizer = torch.optim.SGD(my_model.parameters(), lr=0.01, momentum=0.9)
best_model, train_losses, val_losses = a_proper_training(
    10000, my_model, optimizer, critereon, train_loader, valid_loader
)

In [None]:
plt.plot(train_losses, label="train")
plt.plot(val_losses, label="val")
plt.title("Huber Loss")
plt.legend()

In [None]:
stats_reg_ds(train_loader.dataset, my_model)
stats_reg_ds(train_loader.dataset, best_model)
stats_reg_ds(valid_loader.dataset, best_model)
stats_reg_ds(test_loader.dataset, best_model)