<h1 align="center">Deep Learning - Master in Deep Learning of UPM</h1>

**IMPORTANTE**

Antes de empezar debemos instalar PyTorch Lightning, por defecto, esto valdría:

In [1]:
!pip install pytorch-lightning



Además, si te encuentras ejecutando este código en Google Collab, lo mejor será que montes tu drive para tener acceso a los datos:

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


<h1 align="center">FOOD 101 - Dataset</h1>

Este un conjunto de datos desafiante de 101 categorías de alimentos, con 101,000 imágenes. Para cada clase, se proporcionan 250 imágenes de prueba revisadas manualmente, así como 750 imágenes de entrenamiento. A propósito, las imágenes de entrenamiento no se limpiaron y, por lo tanto, aún contienen cierta cantidad de ruido. Esto se presenta principalmente en forma de colores intensos y, a veces, **etiquetas incorrectas**. Todas las imágenes se redimensionaron para tener una longitud de lado máxima de **512** píxeles.

https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/

Primero: Cargar librerias, añade cualquiera que parezca necesaria

In [3]:
import datetime

import torch
import torch.nn as nn

from torchvision.datasets import MNIST
from torchvision import transforms
from torch.utils.data import DataLoader, random_split
from torchvision.transforms import ToTensor
from torchvision import datasets, transforms

import pytorch_lightning as pl
import torchmetrics
from pytorch_lightning import seed_everything

import numpy as np

import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

import matplotlib.pyplot as plt

from einops.layers.torch import Rearrange

seed_everything(42)

INFO:lightning_fabric.utilities.seed:Seed set to 42


42

Como existe `datasets.Food101` podemos cargarlo al data module

In [4]:
# transforms.Normalize([0.485, 0.456, 0.406],
#                      [0.229, 0.224, 0.225])

class Food101DataModule(pl.LightningDataModule):
    def __init__(self, batch_size=64):
        super().__init__()
        self.batch_size = batch_size
        # Transformaciones
        self.train_transform = transforms.Compose([
            transforms.Resize((512, 512)),
            transforms.RandAugment(num_ops=3, magnitude=1),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406],
                                 [0.229, 0.224, 0.225])
        ])
        self.val_test_transform = transforms.Compose([
            transforms.Resize((512, 512)),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406],
                                 [0.229, 0.224, 0.225])
        ])

    def prepare_data(self):
        datasets.Food101(root="data", split='train', download=True)
        datasets.Food101(root="data", split='test', download=True)

    def setup(self, stage=None):
        if stage in (None, "fit"):
            food_full = datasets.Food101(root="data", split='train',
                                         transform=self.val_test_transform)
            self.train_dataset, self.val_dataset = random_split(
                food_full,
                [70000, 5750],
                generator=torch.Generator().manual_seed(42)
            )
            self.train_dataset.dataset.transform = self.train_transform

        if stage == "test" or stage is None:
            self.test_dataset = datasets.Food101(root="data", split='test',
                                                 transform=self.val_test_transform)

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=self.batch_size, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.val_dataset, batch_size=self.batch_size)

    def test_dataloader(self):
        return DataLoader(self.test_dataset, batch_size=self.batch_size)

## Rehaciendo el transformer

Ahora, monta tu propio transformer, usa la clase `RotaryPositionEmbedding` y `nn.MultiheadAttention` para construir la nueva clase `TransformerEncoderPlus`

https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html

In [5]:
class RotaryPositionEmbedding(nn.Module):
    def __init__(self, max_seq_len, dim, base=10000):
        super(RotaryPositionEmbedding, self).__init__()
        self.dim = dim
        self.base = base
        self.max_seq_len = max_seq_len
        self.embeddings = self.get_positional_embeddings()

    def get_positional_embeddings(self):
        inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2).float() / self.dim))
        pos = torch.arange(self.max_seq_len, dtype=torch.float).unsqueeze(1)
        sinusoid_inp = torch.einsum("ik,j->ij", pos, inv_freq)
        embeddings = torch.cat((sinusoid_inp.sin(), sinusoid_inp.cos()), dim=-1)
        return embeddings

    def forward(self, x):
        # x: [batch_size, seq_len, dim]
        _, seq_len, _ = x.shape
        x_rotated = torch.einsum("bnd,nd->bnd", x, self.embeddings[:seq_len].to(x.device))
        return x_rotated

Primero, rehacer el bloque transformer usando `nn.MultiheadAttention`

In [6]:
class TransformerBlockPlus(nn.Module):
    """
    Transformer Block
    hidden_dim [int]: size of the representation
    num_heads [int]: number of attention heads
    dropout_prob [float]: dropout probability
    """
    def __init__(self, hidden_dim, num_heads, dropout_prob=0.0, **kwargs):
        super(TransformerBlockPlus, self).__init__()
        self.mhsdpa = nn.MultiheadAttention(hidden_dim, num_heads, **kwargs)
        self.rope = RotaryPositionEmbedding(512, hidden_dim) # 512 es un tamaño arbitrario!
        self.norm1 = nn.LayerNorm(hidden_dim)
        self.norm2 = nn.LayerNorm(hidden_dim)
        self.dropout1 = nn.Dropout(dropout_prob)
        self.dropout2 = nn.Dropout(dropout_prob)
        self.feed_forward = nn.Sequential(
            nn.Linear(hidden_dim, 4 * hidden_dim),
            nn.GELU(),
            nn.Linear(4 * hidden_dim, hidden_dim)
        )

    def forward(self, x):
        # x > [batch_size, seq_len, hidden_dim]
        x = self.norm1(x)
        qx, kx = self.rope(x), self.rope(x) # Nuevo!
        attention_output, _ = self.mhsdpa(qx, kx, x)
        x = x + self.dropout1(attention_output)
        x = self.norm2(x)
        feed_forward_output = self.feed_forward(x)
        x = x + self.dropout2(feed_forward_output)
        return x

Finalmente, ponemos todo en comun de nuevo... esta vez generalizando a n capas. Esta vez necesitaremos una red más grande.

In [7]:
class TransformerEncoderPlus(nn.Module):
    """
    LSTM Regressor model
    h[int]: altura de imagen troceada
    w[int]: anchura de imagen troceada
    c[int]: número de canales de la imagen
    hidden_size[int]: tamaño de las capas ocultas de la RNN
    heads[int]: número de cabezas de atención
    blocks[int]: número de bloques transformer
    p_drop[float]: probabilidad de dropout
    output_size[int]: tamaño de la salida de la red (n_classes)
    """
    def __init__(self, h=7, w=7, # Siempre qe la imagen sea divisible por h y w!
                 c=1,
                 hidden_size=64,
                 heads=4,
                 blocks=1,
                 p_drop=0.0,
                 output_size=1,
                 ):
        super(TransformerEncoderPlus, self).__init__()
        self.linproj = nn.Linear(h*w*c, hidden_size)
        self.blocks = nn.Sequential(*[TransformerBlockPlus(hidden_size, heads, p_drop) for _ in range(blocks)])
        self.fc = nn.Linear(hidden_size, output_size)
        self.crop = Rearrange('b c (h i) (w j) -> b (i j) (c h w)', h=h, w=w)

    def forward(self, x):
        # x[batch_size; color_channel; realh; realw]
        # Queremos transformarlo en
        # x[batch_size; seq_len; h*w]
        x = self.crop(x)
        x = self.linproj(x)
        x = self.blocks(x).mean(1) # Mean pooling de todos los embeddings
        return self.fc(x) #out[batch_size; output_size]

## Entrenamiento
Vamos a hacer el entrenamiento, definiendo el modulo de lighting y el propio bucle de entrenamiento.

In [8]:
class Food101Classifier(pl.LightningModule):
    def __init__(self, model, classes=10, learning_rate=1e-3):
        super().__init__()
        self.save_hyperparameters(ignore=['model']) # guardamos la configuración de hiperparámetros
        self.learning_rate = learning_rate
        self.model = model
        self.criterion = nn.CrossEntropyLoss()
        self.acc = torchmetrics.Accuracy('multiclass', num_classes=classes)
        self.f1score = torchmetrics.F1Score(task="multiclass", num_classes=classes)

    def forward(self, x):
        return self.model(x)

    def compute_batch(self, batch, split='train'):
        inputs, targets = batch
        preds = self(inputs)
        targets = targets.view(-1)

        loss = self.criterion(preds, targets)
        self.log_dict(
            {
                f'{split}_loss': loss,
                f'{split}_acc': self.acc(preds, targets),
                f'{split}_f1': self.f1score(preds, targets)
            },
            on_epoch=True, prog_bar=True)

        return loss

    def training_step(self, batch, batch_idx):
        return self.compute_batch(batch, 'train')

    def validation_step(self, batch, batch_idx):
        return self.compute_batch(batch, 'val')

    def test_step(self, batch, batch_idx):
        return self.compute_batch(batch, 'test')

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.learning_rate) # self.parameters() son los parámetros del modelo

### Bucle de entrenamiento

In [None]:
# Parámetros
SAVE_DIR = f'lightning_logs/mnistformer/{datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")}'
w = 32
h = 32
batch_size = 64
hidden_size = 1024
heads = 16
blocks = 6
learning_rate = 1e-3
p_drop = 0.1
labels = 101

# DataModule
data_module = Food101DataModule(batch_size=batch_size)

# Model
transformer = TransformerEncoderPlus(h=h,
                                      w=w, # Siempre qe la imagen sea divisible por h y w!
                                      c=3,
                                      heads=heads,
                                      blocks=blocks,
                                      hidden_size=hidden_size,
                                      p_drop=p_drop,
                                      output_size=labels,
                                      )

# LightningModule
module = Food101Classifier(transformer, learning_rate=learning_rate, classes=labels)

# Callbacks
early_stopping_callback = pl.callbacks.EarlyStopping(
    monitor='val_loss', # monitorizamos la pérdida en el conjunto de validación
    mode='min',
    patience=5, # número de epochs sin mejora antes de parar
    verbose=False, # si queremos que muestre mensajes del estado del early stopping
)
model_checkpoint_callback = pl.callbacks.ModelCheckpoint(
    monitor='val_loss', # monitorizamos la pérdida en el conjunto de validación
    mode='min', # queremos minimizar la pérdida
    save_top_k=1, # guardamos solo el mejor modelo
    dirpath=SAVE_DIR, # directorio donde se guardan los modelos
    filename=f'best_model' # nombre del archivo
)

callbacks = [early_stopping_callback, model_checkpoint_callback]

# Loggers
csv_logger = pl.loggers.CSVLogger(
    save_dir=SAVE_DIR,
    name='metrics',
    version=None
)

loggers = [csv_logger] # se pueden poner varios loggers (mirar documentación)

# Trainer
trainer = pl.Trainer(max_epochs=50, accelerator='gpu',
                     callbacks=callbacks, logger=loggers,
                     precision='16-mixed')

trainer.fit(module, data_module)
results = trainer.test(module, data_module)

INFO:pytorch_lightning.utilities.rank_zero:Using 16bit Automatic Mixed Precision (AMP)
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/model_checkpoint.py:654: Checkpoint directory /content/lightning_logs/mnistformer/2024-12-11_16-40-40 exists and is not empty.
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.callbacks.model_summary:
  | Name      | Type                   | Params | Mode 
-------------------------------------------------------------
0 | model     | TransformerEncoderPlus | 78.8 M | train
1 | criterion | CrossEntropyLoss       | 0      | train
2 | acc       | MulticlassAccuracy     | 0      | train
3 | f1score   | MulticlassF1Score      | 0      | train


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]