# Diploma in Data Science, Machine Learning and Their Applications - 2023 Edition - FAMAF (UNC)

## Introduction to Deep Learning

### Graded practical assignment 2/2

- **Students:**
    - [Chevallier-Boutell, Ignacio José](https://www.linkedin.com/in/nachocheva/) (full course).
    - Gastelu, Gabriela (full course).
    - Spano, Marcelo (full course).

- **Instructors:**
    - Johanna Analiz Frau (Mercado Libre).
    - Nindiría Armenta Guerrero (fyo).

---

## Libraries

In [None]:
import time

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader

from torchvision import datasets
import torchvision.transforms as transforms

# Used by validation() and by the progress bars in train()/validation()
from sklearn import metrics
from tqdm.notebook import tqdm

# Device configuration: use the GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Fix the seed for reproducibility
torch.manual_seed(1994)

## Helper functions

In [None]:
def data_preparation(BATCH_SIZE, transform):
    '''
    Downloads CIFAR-10 and builds the training and test loaders.

    Args:
        BATCH_SIZE: mini-batch size.
        transform: transformations applied to every image.
    '''

    # Download and load the training data (reshuffled every epoch)
    trainset = datasets.CIFAR10(root='./data', train=True,
                                download=True, transform=transform)
    trainloader = DataLoader(trainset, batch_size=BATCH_SIZE,
                             shuffle=True, num_workers=2)

    # Download and load the test data (fixed order)
    testset = datasets.CIFAR10(root='./data', train=False,
                               download=True, transform=transform)
    testloader = DataLoader(testset, batch_size=BATCH_SIZE,
                            shuffle=False, num_workers=2)

    return trainset, trainloader, testset, testloader
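
As a quick check of what `data_preparation` produces, the sketch below counts the mini-batches per epoch for CIFAR-10's split sizes (50000 training and 10000 test images) with `BATCH_SIZE = 64`. To avoid downloading the images it uses hypothetical stand-in `TensorDataset`s of dummy values, so only the sizes are meaningful. Because `train` divides the accumulated loss by `len(trainloader)`, the last, smaller batch weighs as much as a full one in the epoch loss.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE = 64

# Dummy stand-ins with the CIFAR-10 split sizes (one scalar per "image")
train_stub = TensorDataset(torch.zeros(50000, 1))
test_stub = TensorDataset(torch.zeros(10000, 1))

train_batches = DataLoader(train_stub, batch_size=BATCH_SIZE, shuffle=True)
test_batches = DataLoader(test_stub, batch_size=BATCH_SIZE, shuffle=False)

# Number of mini-batches per epoch: ceil(N / BATCH_SIZE)
n_train_batches = len(train_batches)
n_test_batches = len(test_batches)

# Size of the last (incomplete) training batch
last_batch_size = [batch[0].shape[0] for batch in train_batches][-1]

print(n_train_batches, n_test_batches, last_batch_size)  # -> 782 157 16
```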

In [None]:
def train(model, trainloader, loss_function, optimizer, epoch, device, use_tqdm=True):
    '''
    Trains the model for one epoch.

    Args:
        model: neural network to train.
        trainloader: training data loader.
        loss_function: loss function to minimize.
        optimizer: gradient-descent variant used to update the parameters.
        epoch: current epoch number (only used in the progress-bar label).
        device: device where the computation runs.
        use_tqdm: whether to show a progress bar.

    Returns:
        Mean mini-batch training loss of the epoch, rounded to 4 decimals.
    '''

    # Send the model to the device where the computation runs
    model.to(device)

    # Switch the model to training mode (enables dropout)
    model.train()

    # Accumulated loss over the epoch
    training_loss = 0.0
    pbar = tqdm(trainloader) if use_tqdm else trainloader
    for step, (inputs, labels) in enumerate(pbar, 1):
        # Move the tensors to the GPU (if necessary)
        inputs, labels = inputs.to(device), labels.to(device)

        # Reset the gradients to zero
        optimizer.zero_grad()
        # Forward pass: each image is flattened into a vector for the MLP
        predicted_outputs = model(inputs.view(inputs.shape[0], -1))
        # Compute the loss
        loss = loss_function(predicted_outputs, labels.long())
        # Backpropagation: compute the gradients
        loss.backward()

        # Accumulate the mean loss of the mini-batch
        training_loss += loss.item()
        # Update the parameters
        optimizer.step()

        # Update the progress bar every 50 mini-batches
        if use_tqdm and step % 50 == 0:
            # Show epoch, step and running mean loss
            pbar.set_description(f"[{epoch}, {step}] loss: {training_loss / step:.4g}")

    epoch_training_loss = round(training_loss / len(trainloader), 4)

    return epoch_training_loss
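
The per-batch steps in `train` (reset the gradients, forward pass, loss, backward pass, parameter update) can be checked in isolation. A common sanity check, sketched here with a tiny hypothetical linear model and random data rather than CIFAR-10, is to repeat those steps on one fixed batch and confirm that the loss goes down.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy batch: 8 "images" of 3x4x4 pixels, 10 classes
inputs = torch.randn(8, 3, 4, 4)
labels = torch.randint(0, 10, (8,))

model = nn.Linear(3 * 4 * 4, 10)
loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

losses = []
model.train()
for step in range(50):
    optimizer.zero_grad()                                # reset gradients
    outputs = model(inputs.view(inputs.shape[0], -1))    # flatten and forward
    loss = loss_function(outputs, labels.long())         # mean loss of the batch
    loss.backward()                                      # compute gradients
    optimizer.step()                                     # update parameters
    losses.append(loss.item())

print(f'first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}')
```

If the loss does not drop on a single batch, the bug is usually in the training step rather than in the hyperparameters.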

In [None]:
def validation(model, valloader, loss_function, device, use_tqdm=True):
    '''
    Evaluates the model. Accuracy is used as the main metric and the macro
    F1-score as the secondary metric.

    Args:
        model: neural network to evaluate.
        valloader: data loader to evaluate on.
        loss_function: loss function to evaluate.
        device: device where the computation runs.
        use_tqdm: whether to show a progress bar.

    Returns:
        Mean mini-batch loss and the tuple (accuracy, macro F1).
    '''

    model.to(device)
    model.eval()  # Activate evaluation mode (disables dropout)
    y_true = []
    y_pred = []
    validation_loss = 0.0
    # Disable gradient tracking to speed up the forward pass
    with torch.no_grad():
        pbar = tqdm(valloader) if use_tqdm else valloader
        for (inputs, labels) in pbar:
            inputs, labels = inputs.to(device), labels.to(device)
            # Run the forward pass on the flattened images
            predicted_outputs = model(inputs.view(inputs.shape[0], -1))
            # Compute the loss
            loss = loss_function(predicted_outputs, labels.long())
            # Accumulate the mean loss of the mini-batch
            validation_loss += loss.item()

            # The label with the highest value is our prediction
            _, predicted = torch.max(predicted_outputs, 1)
            y_true.extend(labels.cpu().numpy())
            y_pred.extend(predicted.cpu().numpy())

    epoch_validation_loss = round(validation_loss / len(valloader), 4)

    # Compute the metrics
    accuracy = metrics.accuracy_score(y_true, y_pred)
    f1 = round(metrics.f1_score(y_true, y_pred, average='macro'), 4)

    return epoch_validation_loss, (accuracy, f1)
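
To make the two metrics returned by `validation` concrete, here is a small hand-checkable example in plain NumPy with hypothetical logits. The prediction is the arg-max over the class scores, as with `torch.max(predicted_outputs, 1)`; macro F1 is the unweighted mean of the per-class F1 scores, which is what `metrics.f1_score(..., average='macro')` computes.

```python
import numpy as np

# Hypothetical logits for 4 samples and 3 classes
logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.0, 0.3, 0.9],
                   [1.1, 0.2, 0.4]])
y_true = np.array([0, 1, 2, 2])

# The class with the highest score is the prediction
y_pred = logits.argmax(axis=1)          # -> [0, 1, 2, 0]

accuracy = np.mean(y_pred == y_true)

# Per-class F1 = 2TP / (2TP + FP + FN), then an unweighted (macro) mean
f1_per_class = []
for c in range(logits.shape[1]):
    tp = np.sum((y_pred == c) & (y_true == c))
    fp = np.sum((y_pred == c) & (y_true != c))
    fn = np.sum((y_pred != c) & (y_true == c))
    f1_per_class.append(2 * tp / (2 * tp + fp + fn))
macro_f1 = float(np.mean(f1_per_class))

print(accuracy, round(macro_f1, 4))     # -> 0.75 0.7778
```

Here classes 0 and 2 get F1 = 2/3 and class 1 gets 1, so the macro average is 7/9 even though accuracy is 0.75.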

In [None]:
import copy

def run_experiment(model, n_epochs, trainloader, valloader, loss_function, optimizer, device, use_tqdm=True):
    '''
    Runs one experiment: trains and validates the model for n_epochs,
    recording the losses and metrics of every epoch for the given set of
    hyperparameters, which are stored in a dictionary.

    Args:
        model: neural network to train.
        n_epochs: number of epochs to train.
        trainloader: training data loader.
        valloader: validation data loader.
        loss_function: loss function to minimize.
        optimizer: gradient-descent variant used to update the parameters.
        device: device where the computation runs.
        use_tqdm: whether to show progress bars.

    Returns:
        experiment: dictionary with the hyperparameters of the run.
        register_performance: per-epoch losses and validation metrics.
        best_model: copy of the model at the epoch with the best validation accuracy.
    '''

    register_performance = {
        'epoch': [],
        'epoch_training_loss': [], 'epoch_validation_loss': [],
        'validation_accuracy': [], 'validation_f1': []
        }
    best_accuracy = 0.0
    best_model = None

    print("Begin training...")
    start = time.time()
    # Loop through the dataset multiple times
    for epoch in range(1, n_epochs + 1):
        # Train the model
        epoch_training_loss = train(model, trainloader, loss_function, optimizer, epoch, device, use_tqdm)
        # Validate the model
        epoch_validation_loss, val_metrics = validation(model, valloader, loss_function, device, use_tqdm)

        register_performance['epoch'].append(epoch)
        register_performance['epoch_training_loss'].append(epoch_training_loss)
        register_performance['epoch_validation_loss'].append(epoch_validation_loss)
        register_performance['validation_accuracy'].append(val_metrics[0])
        register_performance['validation_f1'].append(val_metrics[1])

        # Keep a snapshot of the model if its accuracy is the best so far
        # (a plain assignment would only alias the model that keeps training)
        if val_metrics[0] > best_accuracy:
            best_model = copy.deepcopy(model)
            best_accuracy = val_metrics[0]

        if (epoch % 10 == 0) and (epoch != n_epochs):
            print(f'\tEpoch {epoch} done! :)')
        elif epoch == n_epochs:
            wall_time = time.time() - start
            print(f'\tFinished! :D >>> WallTime = {wall_time/60:.2f} min')

    # Save the hyperparameters of the run
    experiment = {
        'arquitecture': str(model),
        'optimizer': optimizer,
        'loss': str(loss_function),
        'epochs': n_epochs,
    }

    # Print the statistics of the last epoch
    print(f'Completed training in {epoch} epochs: ',
          'Training Loss is: ', epoch_training_loss,
          '- Validation Loss is: ', epoch_validation_loss,
          '- Accuracy is: ', val_metrics[0],
          '- F1 is: ', val_metrics[1]
          )
    return experiment, register_performance, best_model
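
Keeping "the best model" during training needs an actual copy: `best_model = model` only binds a second name to the same object, so later epochs keep modifying it. The sketch below contrasts an alias with `copy.deepcopy`, using a hypothetical one-layer model and a manual in-place weight change as a stand-in for further training.

```python
import copy

import torch
import torch.nn as nn

model = nn.Linear(2, 1)
with torch.no_grad():
    model.weight.fill_(1.0)

alias = model                      # same object, no copy
snapshot = copy.deepcopy(model)    # independent copy of the parameters

# Stand-in for further training: the weights of `model` change in place
with torch.no_grad():
    model.weight.add_(5.0)

print(alias.weight.tolist(), snapshot.weight.tolist())  # -> [[6.0, 6.0]] [[1.0, 1.0]]
```

A lighter alternative stores only the parameters with `copy.deepcopy(model.state_dict())` and restores them later with `model.load_state_dict(...)`.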

In [None]:
def get_data_loss_metrics(experiments_set, path):
    '''
    Gathers the per-epoch results of a set of experiments into one DataFrame,
    saves it as a CSV file at path and returns the losses (long format) and
    the metrics. Each run is labeled model-activation-optimizer-lr-wd, where
    the activation name is parsed from the printed model.
    '''
    frames = []
    for exp in experiments_set:
        hyperparams, performance = exp[0], exp[1]
        arquitecture = hyperparams['arquitecture']
        model_name = arquitecture.split('(')[0]
        # SmallMLP prints its activation as '(activ1): Name(...)'; otherwise
        # the activation is taken from the submodule printed as '(1)'
        if 'activ1): ' not in arquitecture:
            activation_function_name = arquitecture.split('(1): ')[1].split('()\n')[0].split('(negative_slope')[0]
        else:
            activation_function_name = arquitecture.split('activ1): ')[1].split('()\n  (drop1)')[0].split('(negative_slope')[0]
        optim_name = type(hyperparams['optimizer']).__name__
        lr = hyperparams['optimizer'].param_groups[0]['lr']
        weight_decay = hyperparams['optimizer'].param_groups[0]['weight_decay']
        df = pd.DataFrame(performance)
        df['model-activation-optimizer-lr-wd'] = f'{model_name}-{activation_function_name}-{optim_name}-{lr}-{weight_decay}'
        frames.append(df)
    df_base = pd.concat(frames)

    df_base.to_csv(path, index=False)

    df_metrics = df_base.drop(columns=['epoch_training_loss', 'epoch_validation_loss'])
    df_loss = df_base.drop(columns=['validation_accuracy', 'validation_f1']).melt(
        id_vars=['epoch', 'model-activation-optimizer-lr-wd'],
        value_vars=['epoch_training_loss', 'epoch_validation_loss'],
        var_name='task', value_name='loss')
    return df_loss, df_metrics
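
The activation name is recovered from `str(model)`, i.e. PyTorch's printed module tree, where `SmallMLP` shows a line such as `(activ1): Tanh()`. As a more compact alternative (a suggestion, not what the notebook uses), a regular expression can extract the class name from that line. The sketch uses `TinyMLP`, a hypothetical stand-in that only mimics the attribute names of `SmallMLP`, and a hypothetical helper `activation_name`.

```python
import re

import torch.nn as nn


class TinyMLP(nn.Module):
    '''Hypothetical stand-in that mimics the attribute names of SmallMLP.'''

    def __init__(self, activation_function):
        super().__init__()
        self.hidden1 = nn.Linear(4, 3)
        self.activ1 = activation_function
        self.drop1 = nn.Dropout(0.0)
        self.output = nn.Linear(3, 2)


def activation_name(arquitecture: str) -> str:
    '''Class name of the submodule printed as "(activ1): Name(...)".'''
    match = re.search(r'\(activ1\): (\w+)\(', arquitecture)
    return match.group(1) if match else 'unknown'


for act in [nn.Sigmoid(), nn.Tanh(), nn.ReLU(), nn.LeakyReLU()]:
    print(activation_name(str(TinyMLP(act))))
# -> Sigmoid, Tanh, ReLU, LeakyReLU (one per line)
```

The regex stops at the opening parenthesis, so arguments such as `negative_slope=0.01` need no special handling.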

In [None]:
def plot_results(n_epochs, path, exs_set=None, yLoss=(None, None), yMet=(None, None)):
    '''
    Plots the losses and validation metrics of a set of experiments. If
    exs_set is None, the results are read from the CSV at path (a local file
    or a URL); otherwise they are computed from exs_set and saved to path.
    yLoss and yMet optionally fix the y-axis limits.
    '''

    # The x-axis is categorical (positions 0, 1, 2, ...), so the tick
    # positions 9, 19, ... label epochs 10, 20, ...
    L = [k*10+9 for k in range(int(n_epochs/10))]

    if exs_set is None:
        df_base = pd.read_csv(path)
        df_metrics = df_base.drop(columns=['epoch_training_loss', 'epoch_validation_loss'])
        df_loss = df_base.drop(columns=['validation_accuracy', 'validation_f1']).melt(
            id_vars=['epoch', 'model-activation-optimizer-lr-wd'],
            value_vars=['epoch_training_loss', 'epoch_validation_loss'],
            var_name='task', value_name='loss')
    else:
        df_loss, df_metrics = get_data_loss_metrics(exs_set, path)

    print('Losses:')
    sns.catplot(data=df_loss, x='epoch', y='loss', hue='task', col='model-activation-optimizer-lr-wd',
                col_wrap=3, kind='point', height=4, aspect=1.5)
    plt.xticks(L)
    if yLoss[0] is not None:
        plt.ylim(yLoss[0], yLoss[1])
    plt.show()

    print('\nMetrics:')
    _, axs = plt.subplots(1, 2, figsize=(15, 4))
    sns.pointplot(data=df_metrics, x='epoch', y='validation_accuracy', hue='model-activation-optimizer-lr-wd', ax=axs[0])
    axs[0].set_xticks(L)
    sns.pointplot(data=df_metrics, x='epoch', y='validation_f1', hue='model-activation-optimizer-lr-wd', ax=axs[1])
    axs[1].set_xticks(L)
    if yMet[0] is not None:
        axs[0].set_ylim(yMet[0], yMet[1])
        axs[1].set_ylim(yMet[0], yMet[1])
    plt.show()
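
`sns.pointplot` and `sns.catplot(kind='point')` treat `epoch` as a categorical axis, so the ticks sit at positions 0, 1, 2, ... rather than at the epoch values. That is why the tick list `L` starts at 9: position 9 corresponds to epoch 10. A quick check in plain Python:

```python
n_epochs = 100
epochs = list(range(1, n_epochs + 1))           # values on the x-axis
L = [k*10+9 for k in range(int(n_epochs/10))]   # tick positions, as in plot_results

# A categorical axis places the i-th category at position i
labeled_epochs = [epochs[pos] for pos in L]
print(labeled_epochs)  # -> [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```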

In [None]:
import copy

def run_best(model, n_epochs, trainloader, valloader, testloader, loss_function, optimizer, device, use_tqdm=True):
    '''
    Runs the best hyperparameter configuration found, training the model and
    evaluating it on the training, validation and test sets after every epoch.

    Args:
        model: neural network to train.
        n_epochs: number of epochs to train.
        trainloader: training data loader.
        valloader: validation data loader.
        testloader: test data loader.
        loss_function: loss function to minimize.
        optimizer: gradient-descent variant used to update the parameters.
        device: device where the computation runs.
        use_tqdm: whether to show progress bars.

    Returns:
        experiment: dictionary with the hyperparameters of the run.
        register_performance: per-epoch losses and metrics on the three sets.
        best_model: copy of the model at the epoch with the best validation accuracy.
    '''

    register_performance = {
        'epoch': [],
        'epoch_training_loss': [],
        'epoch_validation_loss': [],
        'epoch_testing_loss': [],
        'training_accuracy': [], 'training_f1': [],
        'validation_accuracy': [], 'validation_f1': [],
        'testing_accuracy': [], 'testing_f1': []
        }
    best_accuracy = 0.0
    best_model = None

    print("Begin training...")
    start = time.time()
    # Loop through the dataset multiple times
    for epoch in range(1, n_epochs + 1):
        register_performance['epoch'].append(epoch)
        # Train the model, then evaluate it on the training set
        epoch_training_loss = train(model, trainloader, loss_function, optimizer, epoch, device, use_tqdm)
        register_performance['epoch_training_loss'].append(epoch_training_loss)
        _, metrics_train = validation(model, trainloader, loss_function, device, use_tqdm)
        register_performance['training_accuracy'].append(metrics_train[0])
        register_performance['training_f1'].append(metrics_train[1])

        # Validate the model
        epoch_validation_loss, metrics_val = validation(model, valloader, loss_function, device, use_tqdm)
        register_performance['epoch_validation_loss'].append(epoch_validation_loss)
        register_performance['validation_accuracy'].append(metrics_val[0])
        register_performance['validation_f1'].append(metrics_val[1])

        # Test the model
        epoch_testing_loss, metrics_test = validation(model, testloader, loss_function, device, use_tqdm)
        register_performance['epoch_testing_loss'].append(epoch_testing_loss)
        register_performance['testing_accuracy'].append(metrics_test[0])
        register_performance['testing_f1'].append(metrics_test[1])

        # Keep a snapshot of the model if its validation accuracy is the best so far
        if metrics_val[0] > best_accuracy:
            best_model = copy.deepcopy(model)
            best_accuracy = metrics_val[0]

        if (epoch % 10 == 0) and (epoch != n_epochs):
            print(f'\tEpoch {epoch} done! :)')
        elif epoch == n_epochs:
            wall_time = time.time() - start
            print(f'\tFinished! :D >>> WallTime = {wall_time/60:.2f} min')

    # Save the hyperparameters of the run
    experiment = {
        'arquitecture': str(model),
        'optimizer': optimizer,
        'loss': str(loss_function),
        'epochs': n_epochs,
    }

    # Print the statistics of the last epoch
    print(f'Completed training in {epoch} epochs: ',
          'Training Loss is: ', epoch_training_loss,
          '- Training Accuracy is: ', metrics_train[0],
          '- Training F1 is: ', metrics_train[1],
          '- Validation Loss is: ', epoch_validation_loss,
          '- Validation Accuracy is: ', metrics_val[0],
          '- Validation F1 is: ', metrics_val[1],
          '- Testing Loss is: ', epoch_testing_loss,
          '- Testing Accuracy is: ', metrics_test[0],
          '- Testing F1 is: ', metrics_test[1]
          )
    return experiment, register_performance, best_model

In [None]:
def get_best_loss_metrics(best_exp, path):
    '''
    Same as get_data_loss_metrics, but for the single run returned by
    run_best, which also records training and test metrics.
    '''
    arquitecture = best_exp[0]['arquitecture']
    model_name = arquitecture.split('(')[0]
    # Parse the activation name from the printed model (as in get_data_loss_metrics)
    if 'activ1): ' not in arquitecture:
        activation_function_name = arquitecture.split('(1): ')[1].split('()\n')[0].split('(negative_slope')[0]
    else:
        activation_function_name = arquitecture.split('activ1): ')[1].split('()\n  (drop1)')[0].split('(negative_slope')[0]
    optim_name = type(best_exp[0]['optimizer']).__name__
    lr = best_exp[0]['optimizer'].param_groups[0]['lr']
    weight_decay = best_exp[0]['optimizer'].param_groups[0]['weight_decay']
    df_base = pd.DataFrame(best_exp[1])
    df_base['model-activation-optimizer-lr-wd'] = f'{model_name}-{activation_function_name}-{optim_name}-{lr}-{weight_decay}'

    df_base.to_csv(path, index=False)

    df_metrics = df_base.drop(columns=['epoch_training_loss', 'epoch_validation_loss', 'epoch_testing_loss'])
    df_loss = df_base.drop(columns=['training_accuracy', 'training_f1', 'validation_accuracy',
                                    'validation_f1', 'testing_accuracy', 'testing_f1']).melt(
        id_vars=['epoch', 'model-activation-optimizer-lr-wd'],
        value_vars=['epoch_training_loss', 'epoch_validation_loss', 'epoch_testing_loss'],
        var_name='task', value_name='loss')
    return df_loss, df_metrics
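
Both helpers reshape the per-epoch table from wide format (one column per loss) to long format with `DataFrame.melt`, so that seaborn can color the curves by `task`. A toy example with made-up numbers, not results of this notebook:

```python
import pandas as pd

# Made-up values for two epochs of a single run
df_base = pd.DataFrame({
    'epoch': [1, 2],
    'epoch_training_loss': [2.1, 1.8],
    'epoch_validation_loss': [2.2, 2.0],
    'model-activation-optimizer-lr-wd': ['SmallMLP-Tanh-SGD-0.1-0'] * 2,
})

# Wide -> long: one row per (epoch, task)
df_loss = df_base.melt(id_vars=['epoch', 'model-activation-optimizer-lr-wd'],
                       value_vars=['epoch_training_loss', 'epoch_validation_loss'],
                       var_name='task', value_name='loss')
print(df_loss)
```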

In [None]:
def plot_best(n_epochs, path, experiments_set=None):
    '''
    Plots the losses and the accuracy on the training, validation and test
    sets for the run returned by run_best. If experiments_set is None, the
    results are read from the CSV at path; otherwise they are computed from
    experiments_set and saved to path.
    '''

    # Categorical x-axis: positions 9, 19, ... label epochs 10, 20, ...
    L = [k*10+9 for k in range(int(n_epochs/10))]

    if experiments_set is None:
        df_base = pd.read_csv(path)
        df_metrics = df_base.drop(columns=['epoch_training_loss', 'epoch_validation_loss', 'epoch_testing_loss'])
        df_loss = df_base.drop(columns=['training_accuracy', 'training_f1', 'validation_accuracy',
                                        'validation_f1', 'testing_accuracy', 'testing_f1']).melt(
            id_vars=['epoch', 'model-activation-optimizer-lr-wd'],
            value_vars=['epoch_training_loss', 'epoch_validation_loss', 'epoch_testing_loss'],
            var_name='task', value_name='loss')
    else:
        df_loss, df_metrics = get_best_loss_metrics(experiments_set, path)

    print('Losses:')
    sns.catplot(data=df_loss, x='epoch', y='loss', hue='task', col='model-activation-optimizer-lr-wd',
                col_wrap=1, kind='point', height=4, aspect=1.5)
    plt.xticks(L)
    plt.show()

    print('\nMetrics:')
    sns.pointplot(data=df_metrics, x='epoch', y='training_accuracy', color='C0', label='Train')
    sns.pointplot(data=df_metrics, x='epoch', y='validation_accuracy', color='C1', label='Val')
    sns.pointplot(data=df_metrics, x='epoch', y='testing_accuracy', color='C2', label='Test')
    plt.ylabel('Accuracy')
    plt.xticks(L)
    plt.legend(loc='lower right')
    plt.show()

---
# Dataset description, loading and preprocessing

The dataset used is **[CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html)**, a standard image-recognition benchmark. It consists of 60000 32x32 RGB images divided into 10 mutually exclusive classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck), with 6000 images per class. Of these 60000 images, 50000 are for training and the remaining 10000 are for testing.

Our goal is to train a CNN that classifies the object in each image into one of the 10 given categories.

***Note:*** this dataset is already included in PyTorch's libraries (`torchvision.datasets.CIFAR10`).

In [None]:
# Baseline batch size
BATCH_SIZE = 64

# Convert to tensor (ToTensor scales the pixels to [0, 1]) and normalize each
# channel with mean 0.5 and std 0.5, the midpoint of that range, which maps
# the values to [-1, 1].
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Get the training and test sets, together with their loaders
trainset, trainloader, testset, testloader = data_preparation(BATCH_SIZE, transform)
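
The arithmetic behind this `transform` can be reproduced by hand. `ToTensor` divides 8-bit pixel values by 255 (giving [0, 1]), and `Normalize(mean, std)` computes `(x - mean) / std` per channel, so with 0.5 and 0.5 the range becomes [-1, 1]. A NumPy re-implementation on a few hypothetical pixel values of one channel:

```python
import numpy as np

# Hypothetical 8-bit pixel values of one channel
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

x = pixels / 255.0                 # what ToTensor does: scale to [0, 1]
mean, std = 0.5, 0.5
x_norm = (x - mean) / std          # what Normalize does, per channel

print(np.round(x_norm, 4))
```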

In [None]:
# Variables shared by all runs
EPOCHS = 100

# Number of input features of the MLP: 3 x 32 x 32 = 3072
n_inputs = int(np.prod(trainset[0][0].shape))

CIFAR_CLASSES = trainset.classes
n_outputs = len(CIFAR_CLASSES)

loss_function = nn.CrossEntropyLoss()
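
Since the model is fully connected, `train` and `validation` flatten each 3x32x32 image with `inputs.view(inputs.shape[0], -1)`, which is why `n_inputs` is the product of the image shape. A shape check on a random tensor standing in for one batch of `trainloader` (hypothetical data):

```python
import numpy as np
import torch

BATCH_SIZE = 64
batch = torch.randn(BATCH_SIZE, 3, 32, 32)   # random stand-in for a CIFAR-10 batch

n_inputs = int(np.prod(batch.shape[1:]))     # 3 * 32 * 32
flat = batch.view(batch.shape[0], -1)        # one row per image

print(n_inputs, tuple(flat.shape))  # -> 3072 (64, 3072)
```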

---
# Model 1

## Model definition

In [None]:
class SmallMLP(nn.Module):
    '''
    Fully-connected model with 5 hidden layers of n_hidden neurons each.
    Optionally, the same dropout rate can be applied to every layer (both the
    input layer and the hidden ones).
    '''

    def __init__(self, n_inputs, n_outputs, n_hidden, activation_function, dropout=0.0):
        super().__init__()
        self.drop0 = nn.Dropout(dropout)

        self.hidden1 = nn.Linear(n_inputs, n_hidden)
        self.activ1 = activation_function
        self.drop1 = nn.Dropout(dropout)

        self.hidden2 = nn.Linear(n_hidden, n_hidden)
        self.activ2 = activation_function
        self.drop2 = nn.Dropout(dropout)

        self.hidden3 = nn.Linear(n_hidden, n_hidden)
        self.activ3 = activation_function
        self.drop3 = nn.Dropout(dropout)

        self.hidden4 = nn.Linear(n_hidden, n_hidden)
        self.activ4 = activation_function
        self.drop4 = nn.Dropout(dropout)

        self.hidden5 = nn.Linear(n_hidden, n_hidden)
        self.activ5 = activation_function
        self.drop5 = nn.Dropout(dropout)

        self.output = nn.Linear(n_hidden, n_outputs)

    def forward(self, x: torch.Tensor):
        x = self.drop0(x)

        x = self.activ1(self.hidden1(x))
        x = self.drop1(x)

        x = self.activ2(self.hidden2(x))
        x = self.drop2(x)

        x = self.activ3(self.hidden3(x))
        x = self.drop3(x)

        x = self.activ4(self.hidden4(x))
        x = self.drop4(x)

        x = self.activ5(self.hidden5(x))
        x = self.drop5(x)

        x = self.output(x)  # Output Layer
        return x
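
With the baseline width `n_hidden = 3`, almost all the parameters of `SmallMLP` sit in the first layer, which compresses 3072 input values into 3. The sketch counts them with `sum(p.numel() for p in model.parameters())` on a hypothetical `nn.Sequential` stand-in with the same `Linear` shapes; activations and dropout have no parameters, so they do not change the count.

```python
import torch.nn as nn

n_inputs, n_hidden, n_outputs = 3 * 32 * 32, 3, 10

# Same Linear shapes as SmallMLP: input -> 5 hidden layers -> output
layers = [nn.Linear(n_inputs, n_hidden), nn.Sigmoid()]
for _ in range(4):
    layers += [nn.Linear(n_hidden, n_hidden), nn.Sigmoid()]
layers.append(nn.Linear(n_hidden, n_outputs))
stand_in = nn.Sequential(*layers)

total = sum(p.numel() for p in stand_in.parameters())
first_layer = sum(p.numel() for p in stand_in[0].parameters())

print(total, first_layer)  # -> 9307 9219
```

That is 3072 x 3 weights plus 3 biases in the first layer, 4 x (3 x 3 + 3) in the other hidden layers and 3 x 10 + 10 in the output layer.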

## Baseline

In [None]:
EPOCHS = 100

baseline_param = {
    'nH': 3,
    'AF': nn.Sigmoid(),
    'GD': optim.SGD,
    'LR': 0.1,
    'Mom': 0.9
}
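
For reference, PyTorch's `optim.SGD` with `momentum` (and the defaults `dampening=0`, `nesterov=False`) keeps a velocity `b = momentum * b + grad`, initialized to the first gradient, and updates `p = p - lr * b`. The sketch checks this by hand against `optim.SGD` on a toy loss `p**2` (an assumption chosen for illustration), with the baseline values `lr=0.1` and `momentum=0.9`.

```python
import torch
import torch.optim as optim

lr, momentum = 0.1, 0.9

# PyTorch optimizer on the toy loss p**2
p = torch.tensor([1.0], requires_grad=True)
optimizer = optim.SGD([p], lr=lr, momentum=momentum)
torch_values = []
for _ in range(3):
    optimizer.zero_grad()
    (p ** 2).sum().backward()
    optimizer.step()
    torch_values.append(p.item())

# Same update written by hand: the gradient of p**2 is 2p
q, b = 1.0, None
manual_values = []
for _ in range(3):
    grad = 2 * q
    b = grad if b is None else momentum * b + grad
    q = q - lr * b
    manual_values.append(q)

print(torch_values, manual_values)  # both approximately [0.8, 0.46, 0.062]
```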

In [None]:
# baseline_exp = []
# Instantiate the model
# model = SmallMLP(n_inputs, n_outputs, baseline_param['nH'], baseline_param['AF'])
# Define the optimizer
# optimizer = baseline_param['GD'](model.parameters(), lr=baseline_param['LR'], momentum=baseline_param['Mom'])
# Run the baseline
# experiment = run_experiment(model, EPOCHS, Load_train, Load_val, loss_function, optimizer, device, use_tqdm=False)
# baseline_exp.append(experiment)

In [None]:
#plot_results(EPOCHS, f'SmallMLP/Baseline_Epocas-{EPOCHS}.csv', baseline_exp)
SmallMLP_Baseline_Epocas_100 = "https://raw.githubusercontent.com/Cheva94/Diplo_Opt/main/3_DL/Lab1/SmallMLP/Baseline_Epocas-100.csv"
plot_results(100, SmallMLP_Baseline_Epocas_100)

Our baseline with this small network (5 hidden layers with 3 neurons each) is rather poor. On the one hand, the training loss is practically constant; on the other, the validation loss is very erratic. This is reflected in the metrics, which oscillate throughout all the epochs without ever settling.

## Activation function study

In [None]:
activation_functions = {
    'Sigmoid': nn.Sigmoid(),
    'Tanh': nn.Tanh(),
    'ReLU': nn.ReLU(),
    'LeakyReLU': nn.LeakyReLU()
}

In [None]:
# ActFunc_Exps1 = []

# for key in activation_functions.keys():
#     print(f'\n\n Running with {key}')
#     model = SmallMLP(n_inputs, n_outputs, baseline_param['nH'], activation_functions[key])
#     optimizer = baseline_param['GD'](model.parameters(), lr=baseline_param['LR'], momentum=baseline_param['Mom'])
#     experiment = run_experiment(model, EPOCHS, Load_train, Load_val, loss_function, optimizer, device, use_tqdm=False)
#     ActFunc_Exps1.append(experiment)

In [None]:
# plot_results(EPOCHS, f'SmallMLP/ActFunc_Epocas-{EPOCHS}.csv', ActFunc_Exps1)
SmallMLP_ActFunc_Epocas_100 = "https://raw.githubusercontent.com/Cheva94/Diplo_Opt/main/3_DL/Lab1/SmallMLP/ActFunc_Epocas-100.csv"
plot_results(100, SmallMLP_ActFunc_Epocas_100)

The first thing we analyze is what happens when the activation function changes. The sigmoid is the activation function used in the baseline, and LeakyReLU behaves similarly to it. The other two improve considerably. ReLU shows quite a few jumps in the validation loss, matched by jumps in the metrics, whereas Tanh turns out to be the most stable, and it stabilizes in fewer epochs (approx. 30) than ReLU (approx. 70). Training and validation losses also agree better for Tanh than for ReLU. We therefore keep Tanh.
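
One commonly cited factor behind such differences (offered as context, not something these runs isolate) is the size of the activation's derivative: the sigmoid's derivative is at most 0.25, while tanh's reaches 1 at the origin, so gradients backpropagated through five hidden layers can shrink much faster with the sigmoid. The derivatives at a few points can be checked with autograd:

```python
import torch
import torch.nn as nn

activation_functions = {
    'Sigmoid': nn.Sigmoid(),
    'Tanh': nn.Tanh(),
    'ReLU': nn.ReLU(),
    'LeakyReLU': nn.LeakyReLU(),
}

# Derivative of each activation at x = -2, 0 and 2
x = torch.tensor([-2.0, 0.0, 2.0], requires_grad=True)
derivatives = {}
for name, act in activation_functions.items():
    (grad,) = torch.autograd.grad(act(x).sum(), x)
    derivatives[name] = [round(g, 4) for g in grad.tolist()]
    print(name, derivatives[name])

# Upper bound on the product of five sigmoid derivatives
print(0.25 ** 5)  # -> 0.0009765625
```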