# Image classification with PyTorch

We start from the same model used for Fashion MNIST, but swap the dataset for MNIST. We also fix the random seed so that the results are reproducible.

<!-- TODO: https://www.reddit.com/r/learnmachinelearning/comments/11m0tyy/what_exactly_is_happening_when_i_dont_normalize/ -->

In [1]:
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

_ = torch.manual_seed(42)  # Fix the seed for reproducibility
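
As a quick check of what fixing the seed buys us, the sketch below re-seeds the global generator and draws the same random numbers twice. The variable names are illustrative only and are not used elsewhere in the notebook.

```python
import torch

# Re-seeding the global generator replays the same random sequence.
torch.manual_seed(42)
first = torch.rand(3)

torch.manual_seed(42)
second = torch.rand(3)

# On CPU the two draws are identical.
same = torch.equal(first, second)
print(same)
```

Weight initialization, Dropout masks and `DataLoader` shuffling all draw from this generator by default, so the seed covers them too. Bit-for-bit reproducibility on a GPU may need additional settings that are not covered here.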

In [3]:
dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())
dataloader = DataLoader(dataset, batch_size=64, shuffle=False)

We load the data as tensors, but this time we compute their mean and standard deviation in order to normalize them.

In [4]:
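def calculate_mean_std(dataloader):
    # Accumulate per-image statistics, then average them over the dataset.
    # Note: std is the mean of the per-image stds, which only approximates
    # the std computed over all pixels at once.
    mean = 0.
    std = 0.
    total_samples = 0.
    for data, _ in dataloader:
        batch_samples = data.size(0)
        # Flatten each image to (batch, channels, pixels)
        data = data.view(batch_samples, data.size(1), -1)
        mean += data.mean(2).sum(0)
        std += data.std(2).sum(0)
        total_samples += batch_samples

    mean /= total_samples
    std /= total_samples
    return mean, std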

mean, std = calculate_mean_std(dataloader)
print(f"Mean: {mean.item()}, Std: {std.item()}")

Mean: 0.13066041469573975, Std: 0.30150410532951355
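
The function above averages the standard deviation of each image, which is why it prints 0.3015 rather than the commonly quoted MNIST value of 0.3081. A minimal sketch of the alternative, accumulating the sum and the sum of squares over all pixels, is shown below. `global_mean_std` is a hypothetical helper, and the random tensor is only a stand-in for the real MNIST batches.

```python
import torch

def global_mean_std(batches):
    """Mean and std over all pixels, accumulated batch by batch."""
    pixel_sum = 0.0
    pixel_sq_sum = 0.0
    n_pixels = 0
    for data in batches:
        data = data.double()  # extra precision for the running sums
        pixel_sum += data.sum().item()
        pixel_sq_sum += (data ** 2).sum().item()
        n_pixels += data.numel()
    mean = pixel_sum / n_pixels
    var = pixel_sq_sum / n_pixels - mean ** 2  # E[x^2] - E[x]^2
    return mean, var ** 0.5

# Synthetic stand-in for MNIST batches of shape (N, 1, 28, 28)
generator = torch.Generator().manual_seed(0)
images = torch.rand(256, 1, 28, 28, generator=generator)
batches = images.split(64)

mean, std = global_mean_std(batches)

# For comparison: the "mean of per-image stds" used in the cell above
per_image_std = images.view(256, -1).std(1).mean().item()
print(f"global std: {std:.4f}, mean of per-image stds: {per_image_std:.4f}")
```

With a real `DataLoader`, the loop would iterate over `for data, _ in dataloader` instead of a tuple of tensors.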


We reload the data, this time applying the normalization transform.

(However, since batch normalization is applied later in the network, I ended up leaving this transform commented out. In fact, the results are better without it.)
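
For reference, the arithmetic behind the commented-out `transforms.Normalize` is a per-channel `(x - mean) / std`. The sketch below applies it by hand to a random batch, using the rounded statistics printed earlier. The tensor `x` is a made-up stand-in for real images.

```python
import torch

mean, std = 0.1307, 0.3015  # rounded values printed by the cell above

# Fake single-channel batch in [0, 1), the layout ToTensor() produces
x = torch.rand(8, 1, 28, 28)

# Standardization: the same arithmetic Normalize applies to each channel
x_norm = (x - mean) / std

# A black pixel (0.0) maps to -mean/std, a white pixel (1.0) to (1 - mean)/std
low = (0.0 - mean) / std
high = (1.0 - mean) / std
print(f"normalized range: [{low:.3f}, {high:.3f}]")
```

Because most MNIST pixels are black, the normalized images sit mostly at the lower bound, with the digit strokes pushed to large positive values.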

In [5]:
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    # transforms.Normalize((mean,), (std,)),
])

training_data = datasets.MNIST(root="data", train=True, download=True, transform=transform)
test_data = datasets.MNIST(root="data", train=False, download=True, transform=transform)

In [6]:
from torch.utils.data import DataLoader

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=len(test_data))

Here I use a single *batch* for the whole test set: there is no need to split it (all the data fits comfortably in memory), and doing so reduces the execution time.
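
A small sketch of what `batch_size=len(dataset)` does, using a hypothetical `TensorDataset` of 100 random samples in place of the MNIST test set:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the test set: 100 random "images" with labels
features = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(features, labels)

# One batch that spans the entire dataset
loader = DataLoader(dataset, batch_size=len(dataset))
num_batches = len(loader)
X, y = next(iter(loader))
print(num_batches, tuple(X.shape), tuple(y.shape))
```

With a single batch, the loss returned by `nn.CrossEntropyLoss` (which averages over the batch by default) is already the mean over the whole test set, so dividing by the number of batches in the evaluation loop leaves it unchanged.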

## Neural network definition

In [7]:
import torch.nn as nn

class FNN(nn.Module): 
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.BatchNorm1d(512),   
            nn.Dropout(0.2),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.BatchNorm1d(512),
            nn.Dropout(0.2),      
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
    
model = FNN()

*BatchNorm1d* layers (batch normalization over 1D feature vectors) and *Dropout* layers (to reduce overfitting) have been added.

BatchNorm1d normalizes each feature across the *batch* so that its mean is 0 and its standard deviation is 1, before applying a learnable scale and shift. During training, Dropout randomly zeroes a fraction of the activations coming from the previous layer (20% here).
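
The behaviour of both layers can be checked in isolation. This is a minimal sketch on synthetic data, assuming freshly initialized layers (scale 1, shift 0). The input statistics and sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(64, 512) * 3.0 + 5.0  # batch with mean ~5 and std ~3

# BatchNorm1d in training mode standardizes each feature over the batch
bn = nn.BatchNorm1d(512)
bn.train()
out = bn(x)
feature_means = out.mean(dim=0)
feature_stds = (out - feature_means).pow(2).mean(dim=0).sqrt()

# Dropout in training mode zeroes ~20% of the inputs and rescales the rest
drop = nn.Dropout(0.2)
drop.train()
dropped = drop(torch.ones(10_000))
zero_fraction = (dropped == 0).float().mean().item()
kept_value = dropped.max().item()  # survivors are scaled by 1 / (1 - 0.2)

# In evaluation mode Dropout is the identity
drop.eval()
unchanged = drop(torch.ones(10_000))
print(f"zeroed: {zero_fraction:.3f}, kept value: {kept_value:.2f}")
```

This is also why the notebook calls `model.train()` before training and `model.eval()` before testing: in evaluation mode Dropout is disabled and BatchNorm1d uses its running statistics instead of those of the current batch.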

In [8]:
device = (
    "cuda" if torch.cuda.is_available() 
    else "mps" if torch.backends.mps.is_available()
    else "cpu" 
)
print(f"Using {device} device")
model = model.to(device)

Using cpu device


## Optimization and training

In [9]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

This time the Adam optimizer is used, which is the most significant change.

In [10]:
def train(dataloader, model: nn.Module, loss_fn, optimizer):
    model.train()
    for X, y in dataloader:
        # Move the batch to the same device as the model
        X, y = X.to(device), y.to(device)
        pred = model(X)
        loss = loss_fn(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

In [11]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f}\n")

In [12]:
epochs = 20
for t in range(epochs):
    train(train_dataloader, model, loss_fn, optimizer)
    print(f"Epoch {t+1}: ")
    test(test_dataloader, model, loss_fn)

Epoch 1: 
Accuracy: 97.0%, Avg loss: 0.097208

Epoch 2: 
Accuracy: 97.4%, Avg loss: 0.078898

Epoch 3: 
Accuracy: 97.8%, Avg loss: 0.071459

Epoch 4: 
Accuracy: 98.0%, Avg loss: 0.067445

Epoch 5: 
Accuracy: 97.7%, Avg loss: 0.079866

Epoch 6: 
Accuracy: 98.3%, Avg loss: 0.056848

Epoch 7: 
Accuracy: 98.2%, Avg loss: 0.060812

Epoch 8: 
Accuracy: 98.2%, Avg loss: 0.061460

Epoch 9: 
Accuracy: 98.2%, Avg loss: 0.063102

Epoch 10: 
Accuracy: 98.2%, Avg loss: 0.067923

Epoch 11: 
Accuracy: 98.2%, Avg loss: 0.077423

Epoch 12: 
Accuracy: 98.4%, Avg loss: 0.062102

Epoch 13: 
Accuracy: 98.5%, Avg loss: 0.082789

Epoch 14: 
Accuracy: 98.3%, Avg loss: 0.068258

Epoch 15: 
Accuracy: 98.0%, Avg loss: 0.104543

Epoch 16: 
Accuracy: 98.2%, Avg loss: 0.079497

Epoch 17: 
Accuracy: 98.5%, Avg loss: 0.067877

Epoch 18: 
Accuracy: 98.4%, Avg loss: 0.075240

Epoch 19: 
Accuracy: 98.5%, Avg loss: 0.065426

Epoch 20: 
Accuracy: 98.4%, Avg loss: 0.070814



## Sources
- https://stackoverflow.com/questions/63746182/correct-way-of-normalizing-and-scaling-the-mnist-dataset