<a href="https://colab.research.google.com/github/Tensor-Reloaded/AI-Learning-Hub/blob/main/resources/beginner_pytorch/04_optimizers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 04. Optimizers

In [1]:
import torch

from tqdm import tqdm
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import v2

In [2]:
class Trainer:
    def __init__(
            self,
            model: nn.Module,
            optimizer: torch.optim.Optimizer,
            criterion: nn.Module,
            batch_size: int = 64,
            val_batch_size: int = 500,
            use_cpu: bool = False,
    ):
        self.batch_size = batch_size
        self.val_batch_size = val_batch_size  # We can use a bigger batch size for validation

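        # torch.accelerator.current_accelerator() detects the accelerator (e.g. CUDA or MPS), but it is not
        # guaranteed to return a usable device (it can return None), so fall back to the CPU explicitly
        use_accelerator = not use_cpu and torch.accelerator.is_available()
        self.device = torch.accelerator.current_accelerator() if use_accelerator else torch.device("cpu")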
        print(f"Using device: {self.device}")

        transforms = v2.Compose([
            v2.ToImage(),
            v2.ToDtype(torch.float32, scale=True),
            v2.Normalize([0.5], [0.5]),
            torch.flatten,
        ])

        train_set = datasets.MNIST(root='./data', train=True, transform=transforms, download=True)
        val_set = datasets.MNIST(root='./data', train=False, transform=transforms, download=True)
        self.train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
        self.val_loader = DataLoader(val_set, batch_size=val_batch_size, shuffle=False)
        # We don't need to shuffle the validation set

        self.model = model.to(self.device)  # The model must be on the same device
        self.criterion = criterion.to(self.device)  # Required for some loss functions
        self.optimizer = optimizer


    def train(self):
        self.model.train()

        total = 0
        correct = 0
        total_loss = 0

        for data, target in tqdm(self.train_loader, desc="Training", leave=False, disable=True):  # Disable on notebook
            # We must move the data to the same device as the model
            data = data.to(self.device)
            target = target.to(self.device)
            # We can also pass non_blocking=True so the copy can overlap with other work:
            # data = data.to(self.device, non_blocking=True)
            # This only helps for CPU-to-GPU copies from pinned memory (e.g. DataLoader(..., pin_memory=True)).
            # For small batches like these, the improvement is usually negligible.

            predicted = self.model(data)
            loss = self.criterion(predicted, target)
            loss.backward()

            self.optimizer.step()
            self.optimizer.zero_grad()

            correct += (predicted.argmax(dim=1) == target).sum().item()
            total += data.size(0)
            total_loss += loss.item() * data.size(0)

        return total_loss / total, correct / total

    # @torch.no_grad()  # This is what you usually see in tutorials
    @torch.inference_mode()  # This is the recommended way to do this
    def val(self):
        self.model.eval()

        total = 0
        correct = 0
        total_loss = 0

        for data, target in tqdm(self.val_loader, desc="Validation", leave=False, disable=True):  # Disable on notebook
            data = data.to(self.device)
            target = target.to(self.device)

            predicted = self.model(data)
            loss = self.criterion(predicted, target)

            correct += (predicted.argmax(dim=1) == target).sum().item()
            total += data.size(0)
            total_loss += loss.item() * data.size(0)

        return total_loss / total, correct / total

    def run(self, epochs: int):
        print(f"Running {epochs} epochs")
        with tqdm(range(epochs), desc="Training") as pbar:
            for _ in pbar:
                tr_loss, tr_acc = self.train()
                va_loss, va_acc = self.val()
                pbar.set_postfix(train_loss=tr_loss, train_acc=tr_acc, val_loss=va_loss, val_acc=va_acc)
        print("Last validation accuracy: ", va_acc)
        print()

In [3]:
def main(epochs: int, optimizer: str):
    print(f"Running {epochs} epochs with {optimizer} optimizer")

    model = nn.Sequential(
        nn.Linear(28 * 28, 16),
        nn.ReLU(inplace=True),
        nn.Linear(16, 10),
    )
    if optimizer == "sgd":
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    elif optimizer == "sgd_momentum":
        optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    elif optimizer == "sgd_momentum_nesterov":
        optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, nesterov=True)
    elif optimizer == "sgd_momentum_nesterov_weight_decay":
        optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, nesterov=True, weight_decay=0.001)
    elif optimizer == "adam":
        optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    elif optimizer == "adamw":
        optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)
    elif optimizer == "rmsprop":
        optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001)
    else:
        raise NotImplementedError(f"Optimizer {optimizer} not implemented")

    trainer = Trainer(model, optimizer, nn.CrossEntropyLoss())
    trainer.run(epochs)
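The `main` function above compares four SGD variants that differ only in how each update is computed. The pure-Python sketch below spells out the per-parameter update for a single scalar, written to follow the algorithm described in the `torch.optim.SGD` documentation with its defaults `dampening=0` and `maximize=False`. The helper names `sgd_step` and `minimize_quadratic` and the toy loss `f(x) = x**2` are illustrative assumptions, not part of PyTorch.

```python
def sgd_step(param, grad, buf, lr, momentum=0.0, nesterov=False, weight_decay=0.0):
    """One SGD update for a single scalar parameter (hypothetical helper).

    Mirrors the torch.optim.SGD update with dampening=0.
    Returns the new parameter and the new momentum buffer.
    """
    # Weight decay is an L2 term added to the gradient.
    g = grad + weight_decay * param
    if momentum != 0.0:
        # Momentum buffer: exponentially weighted sum of past gradients.
        # Starting from buf=0 gives buf=g on the first step, as PyTorch does.
        buf = momentum * buf + g
        # Nesterov looks ahead along the buffer; plain momentum uses the buffer directly.
        g = g + momentum * buf if nesterov else buf
    return param - lr * g, buf


def minimize_quadratic(steps=300, **kwargs):
    """Minimize the toy loss f(x) = x**2 (gradient 2x) starting from x = 5."""
    x, buf = 5.0, 0.0
    for _ in range(steps):
        x, buf = sgd_step(x, 2.0 * x, buf, **kwargs)
    return x


if __name__ == "__main__":
    configs = {
        "sgd": dict(lr=0.1),
        "sgd_momentum": dict(lr=0.1, momentum=0.9),
        "sgd_momentum_nesterov": dict(lr=0.1, momentum=0.9, nesterov=True),
        "sgd_momentum_nesterov_weight_decay": dict(lr=0.1, momentum=0.9, nesterov=True, weight_decay=0.01),
    }
    for name, cfg in configs.items():
        print(f"{name:>36}: x after 300 steps = {minimize_quadratic(**cfg):.2e}")
```

This also suggests why the notebook pairs `lr=0.01` for plain SGD with `lr=0.001` for the momentum variants: on a constant slope the buffer settles at `g / (1 - momentum)`, i.e. 10 times the gradient for `momentum=0.9`, so the effective step sizes end up comparable.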

Recommended resources:
* https://emiliendupont.github.io/2018/01/24/optimization-visualization/
* The official documentation for each optimizer
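Adam, AdamW and RMSprop instead divide each step by a running estimate of the squared gradient. The scalar sketch below follows the update rules given in the PyTorch documentation for `torch.optim.Adam`, `torch.optim.AdamW` and `torch.optim.RMSprop` (with `amsgrad=False`, and `centered=False`, `momentum=0` for RMSprop); the `Adam` and `RMSprop` classes here are hypothetical helpers, not the PyTorch classes. It also isolates the only difference between the `adam` and `adamw` runs in `main`: `AdamW` defaults to `weight_decay=0.01` and shrinks the weights directly, while `Adam` (default `weight_decay=0`) adds any weight decay to the gradient before the adaptive scaling.

```python
import math


class Adam:
    """Scalar Adam/AdamW update (hypothetical helper following the PyTorch docs' algorithm)."""

    def __init__(self, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0, decoupled=False):
        self.lr, self.eps, self.weight_decay, self.decoupled = lr, eps, weight_decay, decoupled
        self.beta1, self.beta2 = betas
        self.m = 0.0  # first moment: running mean of gradients
        self.v = 0.0  # second moment: running mean of squared gradients
        self.t = 0    # step counter, used for bias correction

    def step(self, param, grad):
        self.t += 1
        if self.decoupled:
            # AdamW: shrink the weight directly, independently of the gradient scaling.
            param = param * (1 - self.lr * self.weight_decay)
        else:
            # Adam: weight decay is an L2 term added to the gradient.
            grad = grad + self.weight_decay * param
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias-corrected moment estimates
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


class RMSprop:
    """Scalar RMSprop update (hypothetical helper; centered=False, momentum=0)."""

    def __init__(self, lr=1e-2, alpha=0.99, eps=1e-8):
        self.lr, self.alpha, self.eps = lr, alpha, eps
        self.v = 0.0  # running mean of squared gradients, with no bias correction

    def step(self, param, grad):
        self.v = self.alpha * self.v + (1 - self.alpha) * grad * grad
        return param - self.lr * grad / (math.sqrt(self.v) + self.eps)


if __name__ == "__main__":
    # With a zero loss gradient, only weight decay moves the parameter.
    adam = Adam(lr=0.1, weight_decay=0.01)                   # L2 term inside the adaptive step
    adamw = Adam(lr=0.1, weight_decay=0.01, decoupled=True)  # decoupled weight decay
    print(f"Adam  + L2: {adam.step(1.0, 0.0):.6f}")   # prints 0.900000: the tiny L2 gradient is normalized to a full lr-sized step
    print(f"AdamW     : {adamw.step(1.0, 0.0):.6f}")  # prints 0.999000: shrunk by lr * weight_decay
```

Because this RMSprop variant has no bias correction, its first step has magnitude `lr / sqrt(1 - alpha)`, i.e. 10 times `lr` with the default `alpha=0.99`, which may be worth keeping in mind when comparing its result below with the others.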

In [4]:
if __name__ == '__main__':
    main(10, "sgd")
    main(10, "sgd_momentum")
    main(10, "sgd_momentum_nesterov")
    main(10, "sgd_momentum_nesterov_weight_decay")
    main(10, "adam")
    main(10, "adamw")
    main(10, "rmsprop")

# Engineering: Why do you think the training is so slow? Can you make it faster?
# Science: Why do you think the results are not better? What can we do to improve them?

Running 10 epochs with sgd optimizer
Using device: cuda
Running 10 epochs


Training: 100%|██████████| 10/10 [03:25<00:00, 20.59s/it, train_acc=0.924, train_loss=0.266, val_acc=0.926, val_loss=0.261]


Last validation accuracy:  0.9257

Running 10 epochs with sgd_momentum optimizer
Using device: cuda
Running 10 epochs


Training: 100%|██████████| 10/10 [03:27<00:00, 20.70s/it, train_acc=0.926, train_loss=0.254, val_acc=0.929, val_loss=0.243]


Last validation accuracy:  0.9292

Running 10 epochs with sgd_momentum_nesterov optimizer
Using device: cuda
Running 10 epochs


Training: 100%|██████████| 10/10 [03:27<00:00, 20.76s/it, train_acc=0.926, train_loss=0.263, val_acc=0.927, val_loss=0.257]


Last validation accuracy:  0.9266

Running 10 epochs with sgd_momentum_nesterov_weight_decay optimizer
Using device: cuda
Running 10 epochs


Training: 100%|██████████| 10/10 [03:26<00:00, 20.65s/it, train_acc=0.923, train_loss=0.271, val_acc=0.922, val_loss=0.269]


Last validation accuracy:  0.922

Running 10 epochs with adam optimizer
Using device: cuda
Running 10 epochs


Training: 100%|██████████| 10/10 [03:27<00:00, 20.75s/it, train_acc=0.933, train_loss=0.228, val_acc=0.933, val_loss=0.228]


Last validation accuracy:  0.933

Running 10 epochs with adamw optimizer
Using device: cuda
Running 10 epochs


Training: 100%|██████████| 10/10 [03:26<00:00, 20.70s/it, train_acc=0.93, train_loss=0.24, val_acc=0.93, val_loss=0.24]


Last validation accuracy:  0.9299

Running 10 epochs with rmsprop optimizer
Using device: cuda
Running 10 epochs


Training: 100%|██████████| 10/10 [03:26<00:00, 20.60s/it, train_acc=0.883, train_loss=0.4, val_acc=0.872, val_loss=0.42]

Last validation accuracy:  0.8716

## Exercises

You may start with exercise 3 if exercises 1 and 2 prove too difficult. Implementing your own pipeline might bring you closer to the solution.

1. Modify this pipeline to improve the training speed. Test various hypotheses. Starting with a simple pipeline lets you make more focused progress in understanding the bottlenecks. Measuring where the time is spent is the most important part of the process. Python multi-threading/multi-processing is not the right answer.
2. Focus on increasing the accuracy of the model. You should be able to easily get 95% accuracy on the validation set.
3. Implement your own pipeline from scratch. Implementing it yourself provides a better understanding. Do not be shackled by this model.
4. Try to get 98% accuracy on the validation set with all optimizers, using this pipeline. It is not very complicated.
5. Try to get 98% with your pipeline.
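For exercise 1, one way to start measuring is to time each stage of the loop separately. The sketch below is a minimal, framework-agnostic timer; the `StageTimer` class and its `sync` argument are hypothetical names. Since GPU kernels are launched asynchronously, on CUDA you would likely pass `sync=torch.cuda.synchronize` so each clock read waits for the queued GPU work to finish.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage (hypothetical helper)."""

    def __init__(self, sync=None):
        # Optional callable run before each clock read, e.g. torch.cuda.synchronize.
        self.sync = sync
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        if self.sync is not None:
            self.sync()
        start = time.perf_counter()
        try:
            yield
        finally:
            if self.sync is not None:
                self.sync()
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        grand_total = sum(self.totals.values()) or 1.0  # avoid dividing by zero
        for name, total in sorted(self.totals.items(), key=lambda kv: -kv[1]):
            share = 100 * total / grand_total
            print(f"{name:>12}: {total:8.3f}s total, {share:5.1f}% ({self.counts[name]} calls)")


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(3):
        with timer.stage("load"):
            time.sleep(0.02)  # stand-in for fetching a batch
        with timer.stage("compute"):
            time.sleep(0.01)  # stand-in for forward/backward/step
    timer.report()
```

In `Trainer.train`, you could wrap the host-to-device copies, the forward/backward pass, and `optimizer.step()` in separate stages and compare their shares before deciding what to optimize.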

> !!! **Do not make training decisions based on the validation data! Otherwise you will risk overfitting!**
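One way to follow this rule: the notebook's `val_set` is the official MNIST test split (`train=False`), so you can hold out part of the 60,000 training images for tuning decisions and look at `val_set` only at the very end. A minimal sketch, assuming a fixed seed and a 10% held-out fraction (both arbitrary choices); `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n, holdout_fraction=0.1, seed=0):
    """Shuffle range(n) reproducibly and split it into (train, holdout) index lists."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be in (0, 1)")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG: leaves the global random state untouched
    n_holdout = int(round(n * holdout_fraction))
    return indices[n_holdout:], indices[:n_holdout]


if __name__ == "__main__":
    train_idx, holdout_idx = split_indices(60_000)  # size of the MNIST training set
    print(len(train_idx), len(holdout_idx))  # 54000 6000
```

The two index lists can then be wrapped with `torch.utils.data.Subset(train_set, indices)` and passed to separate `DataLoader`s.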

---


| All     | [beginner_pytorch/](https://github.com/Tensor-Reloaded/AI-Learning-Hub/blob/main/resources/beginner_pytorch) |
|---------|-- |
| Prev    | [Simple Training](https://github.com/Tensor-Reloaded/AI-Learning-Hub/blob/main/resources/beginner_pytorch/03_simple_training.ipynb) |
| Current | [Optimizers](https://github.com/Tensor-Reloaded/AI-Learning-Hub/blob/main/resources/beginner_pytorch/04_optimizers.ipynb) |
| Next    | [LR Schedulers](https://github.com/Tensor-Reloaded/AI-Learning-Hub/blob/main/resources/beginner_pytorch/05_lr_schedulers.ipynb) |