# 4.- Define the training process of a DL model as an optimization problem

The parameters of our DL model are fitted through an optimization process called *training*, which consists of four steps:

1.   Select the model's architecture for the task
2.   Define the error function that will evaluate the performance of our DL model
3.   Choose an optimization algorithm to fit the model's parameters
4.   Run the training loop until convergence


## The DL model

The model $f_\theta(x)$ is the implementation of the chosen architecture, with trainable parameters $\theta$. It can be a LeNet-5, InceptionV3, U-Net, etc.

## The Loss/Error function

This function depends on the task our model is being trained to perform.

This loss/error function is a metric of the distance between the model's output $\hat{y}$ and the expected output $y$.

$Err(\hat{y}, y) = L(\hat{y}, y)$

The optimization problem is then defined in general as finding the parameters $\theta^*$ that minimize this error

$\theta^* = \arg\min_\theta~Err(\hat{y}=f_\theta(x), y)$
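In practice the minimization is carried out over a training set rather than a single pair. As a sketch, assuming $N$ training pairs $(x_i, y_i)$ (the index $i$ and the count $N$ are notation introduced here for illustration), the same problem is commonly written as the average loss over the set:

$\theta^* = \arg\min_\theta~\frac{1}{N}\sum_{i=1}^{N} L(f_\theta(x_i), y_i)$

A mini-batch training loop approximates this average using a subset of the samples at each step.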

In [1]:
import torch
import torch.nn as nn

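# Mean squared error, mean((y - y_hat) ** 2), for regression-like problems
criterion = nn.MSELoss()

# Cross-entropy, for multi-class classification. It expects raw scores (logits) as y_hat
# and class indices as y, and averages -log(softmax(y_hat)[y]) over the batch.
# This second assignment overrides the first; it is the loss used in the rest of this section.
criterion = nn.CrossEntropyLoss()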
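To make the cross-entropy loss less of a black box, the following minimal sketch compares `nn.CrossEntropyLoss` against a hand-written version. The batch of random scores and the target labels are made up for illustration; only the PyTorch calls are real.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 3 classes (raw scores, not probabilities)
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])  # class indices, not one-hot vectors

# Built-in loss (default reduction is the mean over the batch)
builtin = nn.CrossEntropyLoss()(logits, targets)

# Manual version: log-softmax over the classes, pick the log-probability
# of the true class for each sample, negate, and average
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(builtin.item(), manual.item())
```

Both values should agree up to floating-point error, which shows that the model's output does not need a softmax layer before this loss.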

## The Optimization algorithm

The optimization algorithm is the method that updates the model parameters $\theta$ during the training loop.
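As a rough illustration of what such an update looks like, here is a minimal sketch of plain gradient descent written by hand. The toy objective $(\theta - 3)^2$ and the learning rate `lr` are assumptions chosen for the example; Adam, used below, applies a more elaborate update rule but follows the same pattern of computing gradients and then adjusting the parameters.

```python
import torch

# Toy problem (hypothetical): find the theta that minimizes (theta - 3)^2
theta = torch.tensor(0.0, requires_grad=True)
lr = 0.1  # learning rate, an assumed value

for step in range(100):
    loss = (theta - 3.0) ** 2

    # Compute d(loss)/d(theta) and store it in theta.grad
    loss.backward()

    # Gradient descent update: theta <- theta - lr * gradient
    with torch.no_grad():
        theta -= lr * theta.grad

    # Reset the gradient so it does not accumulate across iterations
    theta.grad.zero_()

print(theta.item())
```

After the loop, `theta` should be very close to 3, the minimizer of the toy objective.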

In [2]:
#@title A simple example model 
model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3),
    nn.Flatten(),
    nn.Linear(in_features=8 * 26 * 26, out_features=10)
    )
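The value `in_features=8 * 26 * 26` follows from the shapes involved: a 3x3 convolution without padding shrinks a 28x28 image to 26x26, and there are 8 output channels. The sketch below checks this with a dummy input, assuming single-channel 28x28 images as in MNIST.

```python
import torch
import torch.nn as nn

# Same convolution as in the example model
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)

# Dummy batch with one 28x28 single-channel image (the MNIST image size)
x = torch.zeros(1, 1, 28, 28)

feat = conv(x)            # each spatial dimension shrinks by kernel_size - 1 = 2
flat = nn.Flatten()(feat)  # all dimensions except the batch are merged

print(feat.shape)  # torch.Size([1, 8, 26, 26])
print(flat.shape)  # torch.Size([1, 5408])
```

If the input size or the kernel size changes, the `in_features` of the linear layer has to be updated accordingly.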


In [8]:
import torch.optim as optim

optimizer = optim.Adam(model.parameters())

## The training data

In [9]:
#@title a preprocessing pipeline (more about this on Day 2)
from torchvision.datasets import MNIST
from torchvision.transforms import Compose, ToTensor, Normalize

# This will retrieve the images and apply the pre-processing pipeline
prep_pipeline = Compose([
    ToTensor(),
    Normalize(mean=0.5, std=0.5)
])

trn_data = MNIST("sample_data", train=True, download=True, transform=prep_pipeline)

from torch.utils.data import DataLoader

trn_queue = DataLoader(trn_data, batch_size=128, shuffle=True, pin_memory=True)


## The training loop

In [10]:
# Put the model in training mode (this affects layers such as dropout and batch normalization)
model.train()

for e in range(10):
  for i, (x, y) in enumerate(trn_queue):
    # Empty the accumulated gradients from any previous iteration
    optimizer.zero_grad()

    # Get the model's output
    y_hat = model(x)

    # Compute the error/loss function
    loss = criterion(y_hat, y)

    # Perform the backward pass to compute the gradients of the loss function with respect to the model parameters
    loss.backward()

    # Update the model parameters
    optimizer.step()

    # Log the progress of the model
    if i % 100 == 0:
      acc = torch.sum(y == y_hat.detach().argmax(dim=1)) / x.shape[0]

      print(f"Epoch {e}, step {i}: loss={loss.item()}, acc={acc}")

Epoch 0, step 0: loss=2.45098614692688, acc=0.078125
Epoch 0, step 100: loss=0.4440862536430359, acc=0.84375
Epoch 0, step 200: loss=0.38024723529815674, acc=0.8828125
Epoch 0, step 300: loss=0.3931034207344055, acc=0.875
Epoch 0, step 400: loss=0.3477005660533905, acc=0.8984375
Epoch 1, step 0: loss=0.41891351342201233, acc=0.859375
Epoch 1, step 100: loss=0.3142244219779968, acc=0.9296875
Epoch 1, step 200: loss=0.23581382632255554, acc=0.921875
Epoch 1, step 300: loss=0.3355253040790558, acc=0.9140625
Epoch 1, step 400: loss=0.15423917770385742, acc=0.953125
Epoch 2, step 0: loss=0.2653961479663849, acc=0.953125
Epoch 2, step 100: loss=0.32926279306411743, acc=0.921875
Epoch 2, step 200: loss=0.3378745913505554, acc=0.8671875
Epoch 2, step 300: loss=0.24061369895935059, acc=0.9296875
Epoch 2, step 400: loss=0.33357828855514526, acc=0.921875
Epoch 3, step 0: loss=0.28853321075439453, acc=0.921875
Epoch 3, step 100: loss=0.2998126149177551, acc=0.9140625
Epoch 3, step 200: loss=0.2615