<h1 align="center">Deep Learning - Master in Deep Learning of UPM</h1> 

In this hands-on PyTorch Lightning session we will cover:
- [LightningDataModule](https://lightning.ai/docs/pytorch/stable/data/datamodule.html) - A Lightning utility that encapsulates everything related to data handling before, during, and after training.
- [LightningModule](https://lightning.ai/docs/pytorch/stable/common/lightning_module.html) - The basic building block of Lightning, where we define both the architecture and the training logic of our models.
- [Trainer](https://lightning.ai/docs/pytorch/stable/common/trainer.html) - A highly customizable module used to train models and run inference.
- [Callbacks](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html) - Functions that run when events of our choosing occur.
- [Logging](https://lightning.ai/docs/pytorch/stable/extensions/logging.html) - Automatic metric logging.

**IMPORTANT**

Before starting we need to install PyTorch Lightning. By default, this should be enough:

In [None]:
!pip install pytorch-lightning

Also, if you are running this code in Google Colab, it is best to mount your Drive so that you have access to the data:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Data Preprocessing and Management with Lightning: LightningDataModule

PyTorch Lightning proposes the [LightningDataModule](https://lightning.ai/docs/pytorch/stable/data/datamodule.html) for preprocessing the data and managing the different splits.
Is it really necessary to use this tool?
- Short answer: no.
- Long answer: no, but it helps keep the code centralized and, most importantly, **reproducible**.

Some parameters and preprocessing functions are applied to every split of the dataset. The DataModule lets us manage them in one place, with a lot of flexibility.

As an example, we will load the Iris dataset:

In [None]:
import pandas as pd

IRIS_PATH = 'data/iris.csv'

iris_df = pd.read_csv(IRIS_PATH)
iris_df.head()

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0


Next, we wrap this data in a PyTorch Dataset:

In [None]:
import torch
import pandas as pd

class IrisDataset(torch.utils.data.Dataset):
    def __init__(self, df):
        self.data = df
        self.labels = self.data['target'].values
        self.features = self.data.drop('target', axis=1).values # every column except the target

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        features = self.features[idx]
        target = self.labels[idx]
        return features, target

iris_df = pd.read_csv(IRIS_PATH) # load the dataset
iris_dataset = IrisDataset(iris_df)
iris_dataset[0:4] # show the first 4 items

(array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2]]),
 array([0, 0, 0, 0]))

First, we write a function that splits a DataFrame into train, validation, and test sets:

In [13]:
from sklearn.model_selection import train_test_split

def split_train_val_test(df, val_size=0.2, test_size=0.2):
    eval_size = val_size + test_size # eval is an intermediate split that is later divided into val and test
    test_prop = test_size / eval_size # proportion of test relative to eval

    train, eval_ = train_test_split(df, test_size=eval_size)
    val, test = train_test_split(eval_, test_size=test_prop)
    return train, val, test
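
To make the arithmetic concrete: with the defaults, `eval_size` is 0.4 and `test_prop` is 0.5, so a 150-row dataset ends up split 90/30/30. The sketch below reproduces the same two-stage idea on a plain Python sequence. The name `two_stage_split`, the `seed` argument, and the use of `round` for the split sizes are illustrative assumptions, not scikit-learn's exact rounding rules.

```python
import random

def two_stage_split(items, val_size=0.2, test_size=0.2, seed=0):
    """Split a sequence into train/val/test using the same two-stage idea."""
    eval_size = val_size + test_size          # fraction held out from train
    test_prop = test_size / eval_size         # share of the held-out part that becomes test

    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)     # seeded so the split is reproducible

    n_eval = round(len(shuffled) * eval_size)
    n_test = round(n_eval * test_prop)

    eval_part, train = shuffled[:n_eval], shuffled[n_eval:]
    test, val = eval_part[:n_test], eval_part[n_test:]
    return train, val, test

train, val, test = two_stage_split(range(150))
print(len(train), len(val), len(test))  # 90 30 30
```

In the scikit-learn version, passing a fixed `random_state` to both `train_test_split` calls would give the same kind of reproducibility.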

In plain PyTorch, we would preprocess the data and create the DataLoaders as follows:

In [None]:
from torch.utils.data import DataLoader
import numpy as np

def collate_fn(batch):
    features, targets = zip(*batch)
    features = torch.tensor(np.stack(features), dtype=torch.float32)
    targets = torch.tensor(targets, dtype=torch.long)
    return features, targets

iris_df = pd.read_csv(IRIS_PATH)
train_df, val_df, test_df = split_train_val_test(iris_df)

iris_train = IrisDataset(train_df)
iris_val = IrisDataset(val_df)
iris_test = IrisDataset(test_df)

iris_train_loader = DataLoader(iris_train, batch_size=16, shuffle=True, collate_fn=collate_fn)
iris_val_loader = DataLoader(iris_val, batch_size=16, collate_fn=collate_fn)
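iris_test_loader = DataLoader(iris_test, batch_size=16, collate_fn=collate_fn)

# Inspect one batch to check the tensor shapes
features, targets = next(iter(iris_train_loader))
print(features.shape, targets.shape)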

torch.Size([16, 4]) torch.Size([16])


The Lightning equivalent does exactly the same thing; however, because it is encapsulated, it can be reused in other projects.

It may look like more code, but being modular makes changes much easier. In addition, all the dataloaders are accessible from the same object.

In [None]:
import pytorch_lightning
from torch.utils.data import DataLoader

class IrisDataModule(pytorch_lightning.LightningDataModule):
    def __init__(self, df, batch_size=16):
        super().__init__()
        self.train_df, self.val_df, self.test_df = split_train_val_test(df)
        self.batch_size = batch_size
    
    def prepare_data(self):
        """
        Implementing this hook is optional.
        It is useful for downloading the dataset or for very early preprocessing.
        Lightning runs it only once, at the start, before setup().
        """
        pass # download_dataset()

    def setup(self, stage=None): # called by the Trainer before fit, validate, test or predict
        if stage in (None, 'fit', 'validate'):
            self.train_dataset = IrisDataset(self.train_df)
            self.val_dataset = IrisDataset(self.val_df)

        if stage in (None, 'test'):
            self.test_dataset = IrisDataset(self.test_df)

    def collate_fn(self, batch):
        features, targets = zip(*batch)
        features = torch.tensor(np.stack(features), dtype=torch.float32)
        targets = torch.tensor(targets, dtype=torch.long)
        return features, targets

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=self.batch_size, shuffle=True, collate_fn=self.collate_fn)

    def val_dataloader(self):
        return DataLoader(self.val_dataset, batch_size=self.batch_size, collate_fn=self.collate_fn)

    def test_dataloader(self):
        return DataLoader(self.test_dataset, batch_size=self.batch_size, collate_fn=self.collate_fn)

iris_df = pd.read_csv(IRIS_PATH)
iris_data_module = IrisDataModule(iris_df)

Why does the next cell raise an error if we have already instantiated the data module?

We have not started training yet, so Lightning has not called the data module's setup() function.
For Iris this does not matter, but in real-world settings, where a single split can be hundreds of GB of images, loading it into memory before training starts may be unnecessary.

In [21]:
train_dataloader = iris_data_module.train_dataloader()

AttributeError: 'IrisDataModule' object has no attribute 'train_dataset'
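
The error above comes from the lazy pattern: the datasets only exist once `setup()` has been called. Below is a minimal sketch of that pattern in plain Python, with no Lightning dependency; `LazyDataModule` is a hypothetical stand-in for `IrisDataModule`, not the real class.

```python
class LazyDataModule:
    """Toy stand-in: datasets are built in setup(), not in __init__()."""

    def __init__(self, rows):
        self.rows = rows                            # cheap: only keeps a reference

    def setup(self, stage=None):
        if stage in (None, "fit"):
            self.train_dataset = list(self.rows)    # the expensive work would happen here

    def train_dataloader(self):
        return iter(self.train_dataset)


dm = LazyDataModule([1, 2, 3])

try:
    dm.train_dataloader()                           # setup() has not run yet
    failed = False
except AttributeError:
    failed = True

dm.setup("fit")                                     # the step a trainer would perform before fitting
first_item = next(dm.train_dataloader())
print(failed, first_item)  # True 1
```

With the real data module, calling `iris_data_module.setup('fit')` by hand before `train_dataloader()` should have the same effect.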

# Architecture and Training Logic in a Single Module (LightningModule and Trainer)

Do you hate having to keep track of **what is and is not on the GPU**?

Do you dislike the unwieldy **training loop** we have to write in order to train with PyTorch?

Are you tired of worrying about **how and how often** to log metrics?

If any of this sounds familiar, you may start to see the potential of PyTorch Lightning and its basic building block, the LightningModule.

Next, we compare how to train a simple MLP on the Iris dataset with plain PyTorch versus PyTorch Lightning.

In [38]:
import torch.nn as nn
from sklearn.metrics import f1_score

class MLP(nn.Module):
    def __init__(self, input_shape=4, n_classes=3):
        super().__init__()
        self.input_shape = input_shape
        self.n_classes = n_classes

        self.l1 = nn.Linear(self.input_shape, 64)
        self.l2 = nn.Linear(64, self.n_classes)
        self.act = nn.ReLU()

    def forward(self, x):
        # Forward pass
        x = self.l1(x)
        x = self.act(x)
        x = self.l2(x)
        return x
    
def compute_metrics(preds, targets, losses):
    preds = torch.cat(preds).cpu().numpy()
    targets = torch.cat(targets).cpu().numpy()

    acc = (preds == targets).sum() / len(targets)
    f1 = f1_score(targets, preds, average='macro')
    loss = sum(losses) / len(losses)
    return acc, f1, loss
    
def evaluate(loader, model, criterion, device='cpu'):
    with torch.no_grad():
        preds, targets, losses = [], [], []
        for inputs, target in loader:
            inputs = inputs.to(device)
            target = target.to(device) # keep the targets on the same device as the outputs
            output = model(inputs)

            loss = criterion(output, target)
            pred = torch.argmax(output, dim=1)

            preds.append(pred)
            targets.append(target)
            losses.append(loss.item())
    
    return compute_metrics(preds, targets, losses)


def train(model, train_loader, val_loader, epochs=10, learning_rate=1e-3):
    # device = 'cuda' if torch.cuda.is_available() else 'cpu'
    device = 'cpu'
    
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    model.to(device)
    for epoch in range(epochs):
        model.train()
        preds, targets, losses = [], [], []
        for inputs, target in train_loader:
            inputs = inputs.to(device) # move the inputs to the same device as the model
            target = target.to(device) # the targets must be on that device too

            optimizer.zero_grad() # reset the gradients

            output = model(inputs) # forward pass
            pred = torch.argmax(output, dim=1) # predicted class

            loss = criterion(output, target) # compute the loss
            loss.backward() # backpropagation

            optimizer.step() # update the weights

            preds.append(pred)
            targets.append(target)
            losses.append(loss.item())

        train_acc, train_f1, train_loss = compute_metrics(preds, targets, losses)

        model.eval()
        val_acc, val_f1, val_loss = evaluate(val_loader, model, criterion, device=device)

        print(f'Epoch [{epoch+1}/{epochs}] - Train Loss: {train_loss:.3f} - Train Acc: {train_acc:.3f} - Train F1: {train_f1:.3f} - Val Loss: {val_loss:.3f} - Val Acc: {val_acc:.3f} - Val F1: {val_f1:.3f}')

    return model

In [39]:
model = MLP()
model = train(model, iris_train_loader, iris_val_loader, epochs=30)

Epoch [1/30] - Train Loss: 1.428 - Train Acc: 0.367 - Train F1: 0.179 - Val Loss: 1.370 - Val Acc: 0.200 - Val F1: 0.111
Epoch [2/30] - Train Loss: 1.240 - Train Acc: 0.367 - Train F1: 0.179 - Val Loss: 1.185 - Val Acc: 0.200 - Val F1: 0.111
Epoch [3/30] - Train Loss: 1.134 - Train Acc: 0.367 - Train F1: 0.179 - Val Loss: 1.079 - Val Acc: 0.200 - Val F1: 0.111
Epoch [4/30] - Train Loss: 1.050 - Train Acc: 0.367 - Train F1: 0.179 - Val Loss: 1.046 - Val Acc: 0.200 - Val F1: 0.111
Epoch [5/30] - Train Loss: 0.977 - Train Acc: 0.378 - Train F1: 0.201 - Val Loss: 1.016 - Val Acc: 0.233 - Val F1: 0.181
Epoch [6/30] - Train Loss: 0.933 - Train Acc: 0.644 - Train F1: 0.522 - Val Loss: 0.998 - Val Acc: 0.467 - Val F1: 0.457
Epoch [7/30] - Train Loss: 0.894 - Train Acc: 0.711 - Train F1: 0.572 - Val Loss: 0.977 - Val Acc: 0.500 - Val F1: 0.481
Epoch [8/30] - Train Loss: 0.857 - Train Acc: 0.711 - Train F1: 0.572 - Val Loss: 0.947 - Val Acc: 0.500 - Val F1: 0.481
Epoch [9/30] - Train Loss: 0.811

As we can see, in plain PyTorch the training logic is completely separate from the model architecture and from how the forward pass is performed.

Next, let's implement the same example in PyTorch Lightning.

In [43]:
from torchmetrics import Accuracy, F1Score

class Classifier(pytorch_lightning.LightningModule):
    def __init__(self, input_shape=4, n_classes=3):
        super().__init__()

        # Initialize the network layers
        self.l1 = nn.Linear(input_shape, 64)
        self.l2 = nn.Linear(64, n_classes)
        self.act = nn.ReLU()

        # Loss function
        self.criterion = nn.CrossEntropyLoss() 

        # Initialize the metrics
        self.accuracy = Accuracy(task='multiclass', num_classes=n_classes)
        self.f1 = F1Score(task='multiclass', num_classes=n_classes)

    # forward function, as in a PyTorch nn.Module
    def forward(self, x):
        x = self.l1(x)
        x = self.act(x)
        x = self.l2(x)
        return x
    
    # How a batch from the train_dataloader is processed
    def training_step(self, batch, batch_idx):
        inputs, targets = batch
        output = self(inputs)
        preds = torch.argmax(output, dim=1)

        loss = self.criterion(output, targets)
        self.log_dict(
            {
                'train_loss': loss, 
                'train_acc': self.accuracy(preds, targets), 
                'train_f1': self.f1(preds, targets)
            }, 
            prog_bar=True, on_epoch=True)
        
        return loss
    
    # How a batch from the val_dataloader is processed
    def validation_step(self, batch, batch_idx):
        inputs, targets = batch
        output = self(inputs)
        preds = torch.argmax(output, dim=1)

        loss = self.criterion(output, targets)
        self.log_dict(
            {
                'val_loss': loss, 
                'val_acc': self.accuracy(preds, targets), 
                'val_f1': self.f1(preds, targets) 
            }, 
            prog_bar=True, on_epoch=True)
        
        return loss
    
    # How a batch from the test_dataloader is processed
    def test_step(self, batch, batch_idx):
        inputs, targets = batch
        output = self(inputs)
        preds = torch.argmax(output, dim=1)

        loss = self.criterion(output, targets)
        self.log_dict(
            {
                'test_loss': loss, 
                'test_acc': self.accuracy(preds, targets), 
                'test_f1': self.f1(preds, targets)
            }, 
            prog_bar=True, on_epoch=True)
        
        return loss
    
    # Optimizer configuration
    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3) # self.parameters() are the model's parameters

In [45]:
model = Classifier()
trainer = pytorch_lightning.Trainer(max_epochs=5, accelerator='cpu')

trainer.fit(model, iris_data_module)
trainer.test(model, iris_data_module)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

  | Name      | Type               | Params | Mode 
---------------------------------------------------------
0 | l1        | Linear             | 320    | train
1 | l2        | Linear             | 195    | train
2 | act       | ReLU               | 0      | train
3 | criterion | CrossEntropyLoss   | 0      | train
4 | accuracy  | MulticlassAccuracy | 0      | train
5 | f1        | MulticlassF1Score  | 0      | train
---------------------------------------------------------
515       Trainable params
0         Non-trainable params
515       Total params
0.002     Total estimated model params size (MB)


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:298: The number of training batches (6) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.


Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

`Trainer.fit` stopped: `max_epochs=5` reached.
/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.


Testing: |          | 0/? [00:00<?, ?it/s]

[{'test_loss': 0.9697222113609314,
  'test_acc': 0.4000000059604645,
  'test_f1': 0.4000000059604645}]
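
Note that `test_acc` and `test_f1` are identical in the output above. That is what micro-averaged F1 gives for single-label multiclass problems, because every wrong prediction counts as exactly one false positive and one false negative. Assuming `F1Score` is micro-averaging here (the scikit-learn call in the plain PyTorch version used `average='macro'`), the following pure-Python sketch shows the difference on a made-up set of labels; the function name `f1_scores` is illustrative.

```python
def f1_scores(targets, preds, n_classes):
    """Return (accuracy, micro F1, macro F1) for single-label multiclass predictions."""
    tp = [0] * n_classes
    fp = [0] * n_classes
    fn = [0] * n_classes
    for t, p in zip(targets, preds):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # class p was predicted wrongly
            fn[t] += 1   # the true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    accuracy = sum(tp) / len(targets)
    micro = f1(sum(tp), sum(fp), sum(fn))                       # pool the counts, then F1
    macro = sum(f1(*c) for c in zip(tp, fp, fn)) / n_classes    # F1 per class, then mean
    return accuracy, micro, macro

# Made-up labels, only for illustration
targets = [0, 0, 1, 1, 2, 2]
preds   = [0, 1, 1, 1, 2, 0]
accuracy, micro, macro = f1_scores(targets, preds, n_classes=3)
print(accuracy == micro, round(macro, 3))  # True 0.656
```

If macro averaging is wanted in the Lightning version too, the metric would need to be configured accordingly, so that both implementations report comparable F1 values.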

# Functions that Respond to Events - Callbacks

PyTorch Lightning keeps its training "loop" encapsulated. We only need to tell it how our architecture performs the forward pass, how a batch is processed, and how the loss is computed.

This does not mean we lose access to the other events that occur during training...

What if I want a certain function to run at the end of every batch?
What if I want the data to be stored in a particular way whenever an evaluation finishes?

For this, Lightning offers **hooks**: LightningModule methods that the user can override to run arbitrary code when a given event occurs.

**Callbacks** group several of these **hooks** to implement a piece of functionality. In Lightning we can write our own callbacks, but many are already implemented. With them we can, for example, save the weights of the model that performs best on validation (which we assume corresponds to the epoch with the best convergence).
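
Conceptually, the training loop calls a fixed set of named hooks at fixed moments, and each callback overrides only the ones it cares about. The toy loop below is a sketch of that idea in plain Python. `ToyCallback`, `EventRecorder`, and `toy_fit` are made-up names, and this is not how Lightning is implemented internally.

```python
class ToyCallback:
    """Base class: every hook is a no-op unless a subclass overrides it."""
    def on_train_start(self): pass
    def on_train_batch_end(self, batch_idx): pass
    def on_train_end(self): pass


class EventRecorder(ToyCallback):
    """Overrides two hooks and remembers when they fired."""
    def __init__(self):
        self.events = []

    def on_train_start(self):
        self.events.append("start")

    def on_train_batch_end(self, batch_idx):
        self.events.append(f"batch {batch_idx}")


def toy_fit(n_batches, callbacks):
    """A stand-in training loop that only dispatches hooks."""
    for cb in callbacks:
        cb.on_train_start()
    for batch_idx in range(n_batches):
        # ... forward pass, loss, backward pass and optimizer step would go here ...
        for cb in callbacks:
            cb.on_train_batch_end(batch_idx)
    for cb in callbacks:
        cb.on_train_end()


recorder = EventRecorder()
toy_fit(n_batches=2, callbacks=[recorder])
print(recorder.events)  # ['start', 'batch 0', 'batch 1']
```

Because `EventRecorder` does not override `on_train_end`, nothing is recorded for that event: the base-class no-op runs instead.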

In [55]:
import pytorch_lightning

class MyPrintingCallback(pytorch_lightning.Callback):
    def on_train_start(self, trainer, pl_module):
        print("Yo me ejecuto CUANDO el entrenamiento EMPIEZA")

    def on_train_end(self, trainer, pl_module):
        print("Yo me ejecuto CUANDO el entrenamiento TERMINA")
    
    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        if batch_idx % 2 == 0:
            print(f"Hola!, soy el batch {batch_idx}, soy par y acabo de terminar")
        elif batch_idx % 2 != 0:
            print(f"Hola!, soy el batch {batch_idx}, soy impar y acabo de terminar")

In [None]:
import pandas as pd

data = pd.read_csv(IRIS_PATH)
data_module = IrisDataModule(data)

model = Classifier()

trainer = pytorch_lightning.Trainer(max_epochs=5, accelerator='cpu', callbacks=[MyPrintingCallback()])

trainer.fit(model, data_module)
trainer.test(model, data_module)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

  | Name      | Type               | Params | Mode 
---------------------------------------------------------
0 | l1        | Linear             | 320    | train
1 | l2        | Linear             | 195    | train
2 | act       | ReLU               | 0      | train
3 | criterion | CrossEntropyLoss   | 0      | train
4 | accuracy  | MulticlassAccuracy | 0      | train
5 | f1        | MulticlassF1Score  | 0      | train
---------------------------------------------------------
515       Trainable params
0         Non-trainable params
515       Total params
0.002     Total estimated model params size (MB)


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Yo me ejecuto CUANDO el entrenamiento EMPIEZA


/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:298: The number of training batches (6) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.


Training: |          | 0/? [00:00<?, ?it/s]

Hola!, soy el batch 0, soy par y acabo de terminar
Hola!, soy el batch 1, soy impar y acabo de terminar
Hola!, soy el batch 2, soy par y acabo de terminar
Hola!, soy el batch 3, soy impar y acabo de terminar
Hola!, soy el batch 4, soy par y acabo de terminar
Hola!, soy el batch 5, soy impar y acabo de terminar


Validation: |          | 0/? [00:00<?, ?it/s]

Hola!, soy el batch 0, soy par y acabo de terminar
Hola!, soy el batch 1, soy impar y acabo de terminar
Hola!, soy el batch 2, soy par y acabo de terminar
Hola!, soy el batch 3, soy impar y acabo de terminar
Hola!, soy el batch 4, soy par y acabo de terminar
Hola!, soy el batch 5, soy impar y acabo de terminar


Validation: |          | 0/? [00:00<?, ?it/s]

Hola!, soy el batch 0, soy par y acabo de terminar
Hola!, soy el batch 1, soy impar y acabo de terminar
Hola!, soy el batch 2, soy par y acabo de terminar
Hola!, soy el batch 3, soy impar y acabo de terminar
Hola!, soy el batch 4, soy par y acabo de terminar
Hola!, soy el batch 5, soy impar y acabo de terminar


Validation: |          | 0/? [00:00<?, ?it/s]

Hola!, soy el batch 0, soy par y acabo de terminar
Hola!, soy el batch 1, soy impar y acabo de terminar
Hola!, soy el batch 2, soy par y acabo de terminar
Hola!, soy el batch 3, soy impar y acabo de terminar
Hola!, soy el batch 4, soy par y acabo de terminar
Hola!, soy el batch 5, soy impar y acabo de terminar


Validation: |          | 0/? [00:00<?, ?it/s]

Hola!, soy el batch 0, soy par y acabo de terminar
Hola!, soy el batch 1, soy impar y acabo de terminar
Hola!, soy el batch 2, soy par y acabo de terminar
Hola!, soy el batch 3, soy impar y acabo de terminar
Hola!, soy el batch 4, soy par y acabo de terminar
Hola!, soy el batch 5, soy impar y acabo de terminar


Validation: |          | 0/? [00:00<?, ?it/s]

`Trainer.fit` stopped: `max_epochs=5` reached.


Yo me ejecuto CUANDO el entrenamiento TERMINA


/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.


Testing: |          | 0/? [00:00<?, ?it/s]

[{'test_loss': 0.8757173418998718,
  'test_acc': 0.6000000238418579,
  'test_f1': 0.6000000238418579}]

The ones we will use most are:
- [EarlyStopping](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.EarlyStopping.html#lightning.pytorch.callbacks.EarlyStopping)
- [ModelCheckpoint](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.ModelCheckpoint.html#lightning.pytorch.callbacks.ModelCheckpoint)
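
As a rough sketch of what an early-stopping rule with `patience` and `min_delta` does when the monitored value should be minimized: training stops once the validation loss has failed to improve by at least `min_delta` for `patience` consecutive checks. The function below is a simplified illustration in plain Python; `early_stop_epoch` and the loss values are made up and do not come from this notebook's runs.

```python
def early_stop_epoch(val_losses, patience=3, min_delta=0.001):
    """Return the index of the epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:   # improvement large enough to count
            best = loss
            wait = 0
        else:
            wait += 1                 # one more check without improvement
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses, only for illustration
losses = [1.0, 0.9, 0.8995, 0.8999, 0.8992, 0.5]
stopped_at = early_stop_epoch(losses)
print(stopped_at)  # 4
```

Here the loss keeps creeping down after epoch 1, but never by more than `min_delta`, so the counter reaches `patience` at epoch 4 and the large drop at epoch 5 is never seen.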

In [None]:
import datetime
# DataModule
data = pd.read_csv(IRIS_PATH)
data_module = IrisDataModule(data)

# LightningModule
model = Classifier()

# Callbacks
early_stopping_callback = pytorch_lightning.callbacks.EarlyStopping(
    monitor='val_loss', # monitor the loss on the validation set
    mode='min', # we want to minimize the loss
    patience=3, # number of epochs without improvement before stopping
    min_delta=0.001, # minimum change that counts as an improvement
    verbose=False, # whether to print early-stopping status messages
)
model_checkpoint_callback = pytorch_lightning.callbacks.ModelCheckpoint(
    monitor='val_loss', # monitor the loss on the validation set
    mode='min', # we want to minimize the loss
    save_top_k=1, # keep only the best model
    dirpath='lightning_logs/models/', # directory where the models are saved
    filename='best_model_' + datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S') # file name, timestamped without spaces or colons
)

callbacks = [early_stopping_callback, model_checkpoint_callback]

# Trainer
trainer = pytorch_lightning.Trainer(max_epochs=50, accelerator='cpu', callbacks=callbacks)

trainer.fit(model, data_module)
trainer.test(model, data_module)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:652: Checkpoint directory /home/adrian/workspace/deep-learning-dlmasterupm/assignments/pytorch_basics/session_5/lightning_logs/models exists and is not empty.

  | Name      | Type               | Params | Mode 
---------------------------------------------------------
0 | l1        | Linear             | 320    | train
1 | l2        | Linear             | 195    | train
2 | act       | ReLU               | 0      | train
3 | criterion | CrossEntropyLoss   | 0      | train
4 | accuracy  | MulticlassAccuracy | 0      | train
5 | f1        | MulticlassF1Score  | 0      | train
---------------------------------------------------------
515       Trainable params
0         Non-trainable params
515       Total params
0.002     Total estimated model params size (MB)


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:298: The number of training batches (6) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.


Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

`Trainer.fit` stopped: `max_epochs=50` reached.
/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.


Testing: |          | 0/? [00:00<?, ?it/s]

[{'test_loss': 0.3061424195766449,
  'test_acc': 0.9333333373069763,
  'test_f1': 0.9333333373069763}]

# Automatic Metric Logging

We will use Lightning's built-in CSV logger, which saves everything we log to a CSV file ([CSVLogger](https://lightning.ai/docs/pytorch/stable/extensions/generated/lightning.pytorch.loggers.CSVLogger.html#lightning.pytorch.loggers.CSVLogger)).
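
Once a run has finished, the logged values can be read back from the CSV. The sketch below assumes the file has an `epoch` column plus one column per logged metric, with empty cells where a metric was not logged on that row. The column names and numbers in `SAMPLE` are made up for illustration and are not taken from a real run.

```python
import csv
import io

# Made-up sample in the assumed layout (not output from a real run)
SAMPLE = """epoch,step,train_loss_epoch,val_loss
0,5,,1.10
0,5,1.20,
1,11,,0.95
1,11,1.00,
"""

def metrics_by_epoch(csv_file):
    """Merge the sparse rows of a metrics CSV into one dict per epoch."""
    merged = {}
    for row in csv.DictReader(csv_file):
        epoch = int(row["epoch"])
        for key, value in row.items():
            if key in ("epoch", "step") or not value:
                continue              # skip bookkeeping columns and empty cells
            merged.setdefault(epoch, {})[key] = float(value)
    return merged

history = metrics_by_epoch(io.StringIO(SAMPLE))
for epoch, metrics in sorted(history.items()):
    print(epoch, metrics)
```

For a real run, `io.StringIO(SAMPLE)` would be replaced by an open handle to the CSV file written by the logger, which should be found under the logger's `save_dir`.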

In [None]:
import datetime

MODEL_ID = datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')

# DataModule
data = pd.read_csv(IRIS_PATH)
data_module = IrisDataModule(data)

# LightningModule
model = Classifier()

# Callbacks
early_stopping_callback = pytorch_lightning.callbacks.EarlyStopping(
    monitor='val_loss', # monitor the loss on the validation set
    mode='min', # we want to minimize the loss
    patience=3, # number of epochs without improvement before stopping
    min_delta=0.001, # minimum change that counts as an improvement
    verbose=False, # whether to print early-stopping status messages
)
model_checkpoint_callback = pytorch_lightning.callbacks.ModelCheckpoint(
    monitor='val_loss', # monitor the loss on the validation set
    mode='min', # we want to minimize the loss
    save_top_k=1, # keep only the best model
    dirpath=f'lightning_logs/iris/{MODEL_ID}/', # directory where the models are saved
    filename='best_model' # file name
)

callbacks = [early_stopping_callback, model_checkpoint_callback]

# Loggers
csv_logger = pytorch_lightning.loggers.CSVLogger(
    save_dir=f'lightning_logs/iris/{MODEL_ID}/',
    name='metrics',
    version=None
)

loggers = [csv_logger] # several loggers can be used at once (see the documentation)

# Trainer
trainer = pytorch_lightning.Trainer(max_epochs=50, accelerator='cpu', callbacks=callbacks, logger=loggers)

trainer.fit(model, data_module)
trainer.test(model, data_module)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Missing logger folder: lightning_logs/iris/2024-11-28_16-32-56/metrics


/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:652: Checkpoint directory /home/adrian/workspace/deep-learning-dlmasterupm/assignments/pytorch_basics/session_5/lightning_logs/iris/2024-11-28_16-32-56 exists and is not empty.

  | Name      | Type               | Params | Mode 
---------------------------------------------------------
0 | l1        | Linear             | 320    | train
1 | l2        | Linear             | 195    | train
2 | act       | ReLU               | 0      | train
3 | criterion | CrossEntropyLoss   | 0      | train
4 | accuracy  | MulticlassAccuracy | 0      | train
5 | f1        | MulticlassF1Score  | 0      | train
---------------------------------------------------------
515       Trainable params
0         Non-trainable params
515       Total params
0.002     Total estimated model params size (MB)


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:298: The number of training batches (6) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.


Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

`Trainer.fit` stopped: `max_epochs=50` reached.
/home/adrian/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.


Testing: |          | 0/? [00:00<?, ?it/s]

[{'test_loss': 0.3539743423461914,
  'test_acc': 0.9333333373069763,
  'test_f1': 0.9333333373069763}]
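
To make the roles of `patience` and `min_delta` in the `EarlyStopping` callback above more concrete, the following is a minimal pure-Python sketch of the stopping rule for `mode='min'`. It is a simplified illustration, not Lightning's actual implementation: the function name `epochs_until_stop` and the list of validation losses are hypothetical and exist only for this example.

```python
def epochs_until_stop(val_losses, patience=3, min_delta=0.001):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified sketch of an early stopping rule for mode='min':
    a loss only counts as an improvement if it beats the best
    value seen so far by more than `min_delta`.
    """
    best = float("inf")  # best (lowest) validation loss seen so far
    wait = 0             # consecutive checks without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # the stopping condition was never met


# Hypothetical validation losses, one per epoch (not taken from the run above).
losses = [0.90, 0.70, 0.60, 0.5995, 0.5999, 0.6100, 0.5000]
stop_epoch = epochs_until_stop(losses)
print(stop_epoch)  # 6
```

In this made-up sequence, epochs 4 and 5 lower the loss by less than `min_delta`, so they do not reset the counter, and epoch 6 is the third check in a row without a sufficient improvement. The later drop to 0.50 is never reached, which is why a small `patience` can stop a run prematurely when the validation loss is noisy.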