<a href="https://colab.research.google.com/github/HannaRF/DeepLearning/blob/main/ap2_otimizadores.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---

# Practical Session 2 - Optimizers

**Teacher:** Dario Oliveira

**T.A.:** João Alcindo



In this practical session, we will explore how different optimizers can impact the training and performance of a neural network model. Choosing the right optimizer is crucial for achieving good performance and ensuring that the model converges efficiently.

#### Objectives of the Session

1. **Understand the Function of Optimizers in Practice**: Optimizers are algorithms that adjust the model's parameters to minimize the loss function. We will analyze how different optimizers work and how they can influence the training process.

2. **Compare Various Optimizers**: We will test and compare several optimizers: SGD (plain, with momentum, and with Nesterov momentum), Adagrad, RMSprop, and Adam. We will observe how each affects convergence speed and model accuracy.
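
To make the first objective concrete, the sketch below writes out the update rules for plain SGD, SGD with momentum, and Nesterov momentum on a one-parameter toy problem, f(w) = 0.5·w², whose gradient is simply w. The formulas follow the formulation documented for `torch.optim.SGD` (with `dampening=0`). The toy loss, the function names, and the step count are assumptions chosen for illustration and are not part of this session's notebook.

```python
def grad(w):
    """Gradient of the toy loss f(w) = 0.5 * w**2 (an illustrative assumption)."""
    return w


def run_sgd(w, lr, steps, momentum=0.0, nesterov=False):
    """Minimize the toy loss with SGD, optionally with (Nesterov) momentum."""
    velocity = 0.0
    for _ in range(steps):
        g = grad(w)
        if momentum > 0.0:
            # Accumulate an exponentially decaying sum of past gradients.
            velocity = momentum * velocity + g
            # Nesterov adds a look-ahead term on top of the current gradient.
            update = g + momentum * velocity if nesterov else velocity
        else:
            update = g
        w = w - lr * update
    return w


w_plain = run_sgd(1.0, lr=0.01, steps=100)
w_momentum = run_sgd(1.0, lr=0.01, steps=100, momentum=0.9)
w_nesterov = run_sgd(1.0, lr=0.01, steps=100, momentum=0.9, nesterov=True)

print(f"plain SGD: {w_plain:.4f}")
print(f"momentum:  {w_momentum:.4f}")
print(f"Nesterov:  {w_nesterov:.4f}")
```

With the same learning rate, both momentum variants end much closer to the minimum at w = 0 than plain SGD, because the accumulated velocity enlarges the effective step along a consistent gradient direction.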

---




## Importing Libraries

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split
from torchsummary import summary


## Set the Device

In [2]:
# Select the device: use the GPU if available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")


Using device: cuda


## Loading and Preprocessing

In [3]:
# Transforms for the MNIST dataset: convert to a tensor, then normalize to [-1, 1]
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

# Load the MNIST dataset
full_train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Split the training set into training and validation subsets (80% / 20%)
train_size = int(0.8 * len(full_train_dataset))
val_size = len(full_train_dataset) - train_size
train_dataset, val_dataset = random_split(full_train_dataset, [train_size, val_size])

# DataLoaders for training, validation, and test
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:02<00:00, 4553179.60it/s]


Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 132613.96it/s]


Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:01<00:00, 1258654.09it/s]


Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 3008611.62it/s]

Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw
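
The preprocessing cell above involves two pieces of simple arithmetic that are worth checking by hand. `ToTensor()` scales pixel values to [0, 1], and `Normalize((0.5,), (0.5,))` then computes (x - mean) / std, which maps that range to [-1, 1]. The 80/20 split of the 60,000 MNIST training images yields 48,000 training and 12,000 validation samples. The sketch below reproduces both in plain Python; the helper names are hypothetical and do not appear in the notebook.

```python
import math


def normalize(x, mean=0.5, std=0.5):
    """Same arithmetic as transforms.Normalize, applied to a single value."""
    return (x - mean) / std


def split_sizes(n_samples, train_fraction=0.8):
    """Mirror the train/validation size computation used in the notebook."""
    train_size = int(train_fraction * n_samples)
    return train_size, n_samples - train_size


def num_batches(n_samples, batch_size=64):
    """Number of batches a DataLoader yields when drop_last is False."""
    return math.ceil(n_samples / batch_size)


# Endpoints and midpoint of the [0, 1] range produced by ToTensor().
print([normalize(x) for x in (0.0, 0.5, 1.0)])  # [-1.0, 0.0, 1.0]

train_size, val_size = split_sizes(60_000)
print(train_size, val_size)                            # 48000 12000
print(num_batches(train_size), num_batches(val_size))  # 750 188
```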






## Building the Neural Network

In [4]:

# Model definition in PyTorch
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28*28, 256)  # MNIST images are 28x28
        self.fc2 = nn.Linear(256, 512)
        self.fc3 = nn.Linear(512, 256)
        self.fc4 = nn.Linear(256, 10)
        self.sigmoid = nn.Sigmoid()
        self.relu = nn.ReLU()

    def forward(self, x):
        x = x.view(-1, 28*28)  # Flatten the input from (batch_size, 1, 28, 28) to (batch_size, 784)
        x = self.sigmoid(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        # Note: the nn.CrossEntropyLoss criterion used later expects raw logits;
        # this final sigmoid squashes the class scores into (0, 1).
        x = self.sigmoid(self.fc4(x))
        return x


## Training and Evaluation Functions

In [5]:
# Training function: runs one epoch and returns the mean batch loss
def train_epoch(model, loader, optimizer, criterion):
    model.train()
    total_loss = 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)  # Move the data to the device
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)

# Evaluation function (used for validation and test): returns the mean batch loss and the accuracy
def evaluate(model, loader, criterion):
    model.eval()
    total_loss = 0
    correct = 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            total_loss += loss.item()
            pred = outputs.argmax(dim=1, keepdim=True)  # Predicted class for each sample
            correct += pred.eq(labels.view_as(pred)).sum().item()
    return total_loss / len(loader), correct / len(loader.dataset)
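
One detail of `evaluate` is worth noting: `total_loss / len(loader)` averages the per-batch mean losses, whereas the accuracy is divided by the number of samples. When the last batch is smaller than the others, the two kinds of average differ slightly. The sketch below illustrates the difference with made-up batch sizes and losses (the numbers and function names are hypothetical, chosen only to make the effect visible). It assumes, as is the default for `nn.CrossEntropyLoss`, that each batch loss is already a mean over the batch.

```python
def mean_of_batch_means(batch_losses):
    """What `total_loss / len(loader)` computes in the notebook."""
    return sum(batch_losses) / len(batch_losses)


def per_sample_mean(batch_losses, batch_sizes):
    """Weight each batch's mean loss by its size to get a per-sample average."""
    total = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Hypothetical example: two full batches and one half-sized final batch.
batch_sizes = [64, 64, 32]
batch_losses = [1.0, 1.0, 4.0]

print(mean_of_batch_means(batch_losses))           # 2.0
print(per_sample_mean(batch_losses, batch_sizes))  # 1.6
```

For the loaders in this session the effect is small, since only the final batch of an epoch can be short, so the notebook's simpler average is a reasonable approximation.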


## Defining Optimizers and Loss Criterion

In [6]:

# Initialize the model and move it to the selected device
model = SimpleNN().to(device)

# Available optimizers: keep exactly one line uncommented

# optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam

optimizer = optim.SGD(model.parameters(), lr=0.01)  # SGD
# optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # SGD with momentum
# optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)  # SGD with Nesterov momentum
# optimizer = optim.RMSprop(model.parameters(), lr=0.001)  # RMSprop
# optimizer = optim.Adagrad(model.parameters(), lr=0.01)  # Adagrad


# Loss criterion
criterion = nn.CrossEntropyLoss()
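
The adaptive optimizers in the list above (Adagrad, RMSprop, Adam) differ from SGD in that they divide each step by a running statistic of past squared gradients. The sketch below writes out single-parameter versions of the three update rules, using the default hyperparameters documented for `torch.optim` (`alpha=0.99` for RMSprop, `betas=(0.9, 0.999)` for Adam). The function names and the `state` dictionaries are hypothetical and only loosely mirror how the library keeps per-parameter state; options such as weight decay and momentum are left out.

```python
import math


def adagrad_step(w, g, state, lr=0.01, eps=1e-10):
    """Adagrad: divide by the root of the sum of all squared gradients so far."""
    state["sum_sq"] = state.get("sum_sq", 0.0) + g * g
    return w - lr * g / (math.sqrt(state["sum_sq"]) + eps)


def rmsprop_step(w, g, state, lr=0.001, alpha=0.99, eps=1e-8):
    """RMSprop: like Adagrad, but with an exponential moving average."""
    state["avg_sq"] = alpha * state.get("avg_sq", 0.0) + (1 - alpha) * g * g
    return w - lr * g / (math.sqrt(state["avg_sq"]) + eps)


def adam_step(w, g, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: moving averages of the gradient and its square, bias-corrected."""
    t = state.get("t", 0) + 1
    state["t"] = t
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * g
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** t)  # bias correction for the mean
    v_hat = state["v"] / (1 - beta2 ** t)  # bias correction for the variance
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


# First step from w = 1.0 for two very different gradient magnitudes.
for g in (0.5, 50.0):
    print(g,
          round(adagrad_step(1.0, g, {}), 6),
          round(rmsprop_step(1.0, g, {}), 6),
          round(adam_step(1.0, g, {}), 6))
```

On the first step each rule moves the parameter by an amount that does not depend on the gradient's magnitude (about `lr` for Adagrad and Adam, and about `10 * lr` for RMSprop with `alpha=0.99`). This helps explain why a learning rate that works for SGD may behave very differently with an adaptive method.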


## Summary

In [7]:
summary(model, (1,28,28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                  [-1, 256]         200,960
           Sigmoid-2                  [-1, 256]               0
            Linear-3                  [-1, 512]         131,584
              ReLU-4                  [-1, 512]               0
            Linear-5                  [-1, 256]         131,328
              ReLU-6                  [-1, 256]               0
            Linear-7                   [-1, 10]           2,570
           Sigmoid-8                   [-1, 10]               0
Total params: 466,442
Trainable params: 466,442
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.02
Params size (MB): 1.78
Estimated Total Size (MB): 1.80
----------------------------------------------------------------
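
The parameter counts in the summary can be reproduced by hand: a fully connected layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. A short check in plain Python (the helper name is hypothetical):

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# (inputs, outputs) of fc1..fc4 in SimpleNN.
layer_shapes = [(28 * 28, 256), (256, 512), (512, 256), (256, 10)]
counts = [linear_params(n_in, n_out) for n_in, n_out in layer_shapes]
total = sum(counts)

print(counts)  # [200960, 131584, 131328, 2570]
print(total)   # 466442
# Each float32 parameter takes 4 bytes.
print(f"{total * 4 / 1024 ** 2:.2f} MB")  # 1.78 MB
```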


## Training the Model

In [8]:

# Example usage: training with a validation set
for epoch in range(1, 11):  # Train for 10 epochs
    train_loss = train_epoch(model, train_loader, optimizer, criterion)
    val_loss, val_accuracy = evaluate(model, val_loader, criterion)

    print(f'Epoch {epoch}: Train Loss = {train_loss:.4f}, Val Loss = {val_loss:.4f}, Val Accuracy = {val_accuracy * 100:.2f}%')



Epoch 1: Train Loss = 2.3008, Val Loss = 2.2993, Val Accuracy = 11.29%
Epoch 2: Train Loss = 2.2975, Val Loss = 2.2960, Val Accuracy = 11.29%
Epoch 3: Train Loss = 2.2938, Val Loss = 2.2916, Val Accuracy = 11.29%
Epoch 4: Train Loss = 2.2879, Val Loss = 2.2836, Val Accuracy = 11.41%
Epoch 5: Train Loss = 2.2761, Val Loss = 2.2659, Val Accuracy = 18.31%
Epoch 6: Train Loss = 2.2468, Val Loss = 2.2196, Val Accuracy = 36.20%
Epoch 7: Train Loss = 2.1754, Val Loss = 2.1183, Val Accuracy = 47.81%
Epoch 8: Train Loss = 2.0478, Val Loss = 1.9739, Val Accuracy = 61.26%
Epoch 9: Train Loss = 1.9180, Val Loss = 1.8701, Val Accuracy = 64.98%
Epoch 10: Train Loss = 1.8403, Val Loss = 1.8137, Val Accuracy = 63.77%


In [9]:
# Final evaluation on the test set
test_loss, test_accuracy = evaluate(model, test_loader, criterion)
print(f'Test Loss = {test_loss:.4f}, Test Accuracy = {test_accuracy * 100:.2f}%')

Test Loss = 1.8114, Test Accuracy = 64.37%
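
Two reference values help when reading the losses above. A 10-class classifier that assigns equal scores to every class has a cross-entropy of ln(10) ≈ 2.30, which matches the loss in the first epochs. In addition, because `SimpleNN` passes its final layer through a sigmoid, the scores fed to `nn.CrossEntropyLoss` lie strictly between 0 and 1, so the per-sample loss cannot fall below ln(1 + 9/e) ≈ 1.46, however long the model trains. The sketch below computes both values with a plain-Python cross-entropy; the function name is hypothetical, and the code is an illustration rather than the library's implementation.

```python
import math


def cross_entropy(scores, target):
    """Cross-entropy of one sample: -log(softmax(scores)[target])."""
    # Subtract the maximum score for numerical stability.
    shift = max(scores)
    log_sum_exp = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return log_sum_exp - scores[target]


num_classes = 10

# Equal scores for every class: the model has learned nothing yet.
uniform_loss = cross_entropy([0.5] * num_classes, target=0)

# Best case with scores confined to [0, 1]: 1 for the true class, 0 elsewhere.
best_scores = [1.0] + [0.0] * (num_classes - 1)
bounded_floor = cross_entropy(best_scores, target=0)

print(f"uniform scores: {uniform_loss:.3f}")   # 2.303
print(f"bounded floor:  {bounded_floor:.3f}")  # 1.461
```

The final validation and test losses of about 1.81 sit between these two values. Removing the final sigmoid so that the network outputs raw logits is the usual arrangement with `nn.CrossEntropyLoss`, although doing so would change the results reported in this session.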


## Exercise

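Repeat the training with each optimizer and learning rate in the table below, re-creating the model before every run so that each configuration starts from fresh weights, and compare the resulting validation and test accuracy: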

| Optimizer         | Learning Rate |
|-------------------|---------------|
| SGD               | 0.01          |
| SGD + Momentum    | 0.01          |
| Nesterov Momentum | 0.01          |
| Adagrad           | 0.01          |
| RMSProp           | 0.01          |
| RMSProp           | 0.1           |
| Adam              | 0.01          |
| Adam              | 0.0001        |
| Adam              | 0.1           |
| Adam              | 0.3           |
| SGD               | 0.3           |
| Adagrad           | 0.3           |
| RMSProp           | 0.3           |
