# 4.- Define the training process of a DL model as an optimization problem

The parameters of our DL model are fitted through an optimization process called *training*, whose goal is to make the model perform well on its task. The process consists of four steps:

1.   Select the model's architecture for the task
2.   Define the error function that will evaluate the performance of our DL model
3.   Choose an optimization algorithm to fit the model's parameters
4.   Run the training loop until convergence
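
The four steps can be sketched end to end on a toy problem. The snippet below is only an illustration under stated assumptions: the synthetic data ($y = 2x + 1$), the linear model, the MSE loss, the learning rate, and the number of steps are all chosen for the example and are not part of the course material.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic data (assumed for illustration): y = 2x + 1
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * x + 1

# 1. Select the model's architecture (a single linear layer here)
toy_model = nn.Linear(in_features=1, out_features=1)

# 2. Define the error function (mean squared error for this regression toy)
toy_criterion = nn.MSELoss()

# 3. Choose an optimization algorithm (plain SGD, assumed learning rate)
toy_optimizer = optim.SGD(toy_model.parameters(), lr=0.1)

# 4. Run the training loop (a fixed number of full-batch steps)
initial_loss = toy_criterion(toy_model(x), y).item()
for step in range(200):
    toy_optimizer.zero_grad()              # clear old gradients
    loss = toy_criterion(toy_model(x), y)  # forward pass + loss
    loss.backward()                        # gradients w.r.t. the parameters
    toy_optimizer.step()                   # parameter update

final_loss = toy_criterion(toy_model(x), y).item()
print(f"loss: {initial_loss:.4f} -> {final_loss:.6f}")
print(f"weight={toy_model.weight.item():.3f}, bias={toy_model.bias.item():.3f}")
```

The learned weight and bias should end up close to the values used to generate the data, which is what "convergence" means in this simple setting.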


## The DL model

This is the implementation of our model's architecture $f_\theta(x)$, where $\theta$ denotes the trainable parameters. It can be a LeNet5, InceptionV3, U-Net, etc.
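
To make the notation concrete, the following sketch writes a small $f_\theta$ as an `nn.Module` and counts the entries of $\theta$. The class name `TinyNet`, the layer sizes, and the dummy input are hypothetical choices made for illustration, assuming single-channel 28x28 inputs and 10 output classes.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical small CNN for 1x28x28 inputs and 10 output classes."""

    def __init__(self):
        super().__init__()
        # A 3x3 convolution without padding maps 28x28 inputs to 26x26 maps
        self.conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)
        self.fc = nn.Linear(in_features=8 * 26 * 26, out_features=10)

    def forward(self, x):
        h = torch.relu(self.conv(x))
        return self.fc(h.flatten(start_dim=1))

f_theta = TinyNet()

# A batch of 4 random "images" stands in for real data
dummy_x = torch.randn(4, 1, 28, 28)
dummy_out = f_theta(dummy_x)

# theta is the collection of all trainable tensors in the model
n_params = sum(p.numel() for p in f_theta.parameters())
print(dummy_out.shape, n_params)
```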

## The Loss/Error function

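The choice of this function depends on the task our model is being trained to perform. For example, cross-entropy is commonly used for classification and mean squared error for regression.

The loss/error function measures the distance between the model's output $\hat{y}$ and the expected output $y$.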

$Err(\hat{y}, y) = L(\hat{y}, y)$

In general, the optimization problem is defined as

$\theta^* = \arg\min_{\theta}~Err(\hat{y}=f_\theta(x), y)$
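
As a worked example of one possible $Err$, the sketch below evaluates the cross-entropy loss on hand-written logits and checks it against a manual computation. The logits and targets are made-up values, and cross-entropy is assumed here because it is a common choice for classification; other tasks call for other losses.

```python
import torch
import torch.nn as nn

# Made-up model outputs (logits) for 2 samples and 3 classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
# Expected class index for each sample
targets = torch.tensor([0, 2])

# Built-in loss: applies log-softmax, then averages the negative log-likelihood
ce = nn.CrossEntropyLoss()
loss_builtin = ce(logits, targets)

# Same quantity computed by hand
log_probs = torch.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(loss_builtin.item(), loss_manual.item())
```

Note that `nn.CrossEntropyLoss` expects raw logits, not probabilities, so the model does not need a final softmax layer when this loss is used.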

In [None]:
import torch
import torch.nn as nn


## The Optimization algorithm

The optimization algorithm is the method that updates the model parameters $\theta$ during the training loop.
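
One common choice is stochastic gradient descent, whose update rule is $\theta \leftarrow \theta - \eta \nabla_\theta Err$. The sketch below, with an assumed learning rate $\eta = 0.1$ and a toy quadratic loss, checks that a single `optim.SGD` step matches this rule applied by hand. The tensor values and the loss are illustrative only.

```python
import torch
import torch.optim as optim

lr = 0.1
theta = torch.tensor([1.0, -2.0], requires_grad=True)
theta_manual = theta.detach().clone()  # copy used for the hand-written update

sgd = optim.SGD([theta], lr=lr)

# Toy loss: sum of squares, so the gradient is 2 * theta
loss = (theta ** 2).sum()
sgd.zero_grad()
loss.backward()
grad = theta.grad.clone()
sgd.step()  # in-place update of theta

# Hand-written update: theta <- theta - lr * grad
theta_manual = theta_manual - lr * grad

print(theta.detach(), theta_manual)
```

Other optimizers in `torch.optim`, such as `Adam`, follow the same `zero_grad()` / `backward()` / `step()` pattern but use a different update rule.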

In [None]:
#@title A simple example model 
model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3),
    nn.Flatten(),
    nn.Linear(in_features=8 * 26 * 26, out_features=10)
    )


In [None]:
import torch.optim as optim


## Run the training loop

In [None]:
#@title a preprocessing pipeline (more about this on Day 2)
from torchvision.datasets import MNIST
from torchvision.transforms import Compose, ToTensor, Normalize

# This will retrieve the images and apply the pre-processing pipeline
prep_pipeline = Compose([
    ToTensor(),
    Normalize(mean=0.5, std=0.5)
])

trn_data = MNIST("sample_data", train=True, download=True, transform=prep_pipeline)

from torch.utils.data import DataLoader

trn_queue = DataLoader(trn_data, batch_size=128, shuffle=True, pin_memory=True)
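
To see what such a pipeline and queue produce without downloading any data, the sketch below reproduces the two key effects with plain tensors. It assumes that `ToTensor` has already scaled pixel values to $[0, 1]$, and it uses a hypothetical random dataset of 300 samples in place of MNIST.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Normalize(mean=0.5, std=0.5) computes (x - mean) / std for every pixel,
# which maps values in [0, 1] to [-1, 1]
pixels = torch.tensor([0.0, 0.5, 1.0])
normalized = (pixels - 0.5) / 0.5

# Random stand-in for an image dataset: 300 samples of shape 1x28x28
fake_images = torch.rand(300, 1, 28, 28)
fake_labels = torch.randint(0, 10, (300,))
fake_queue = DataLoader(TensorDataset(fake_images, fake_labels),
                        batch_size=128, shuffle=True)

# The loader yields full batches and then one smaller final batch
batch_sizes = [xb.shape[0] for xb, yb in fake_queue]
print(normalized, batch_sizes)
```

The last batch is smaller because 300 is not a multiple of 128. Passing `drop_last=True` to `DataLoader` would discard it instead.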

## The training loop

In [None]:
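# Move the model to the GPU memory when a GPU is available, and set it to training mode
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.train()

# Loss function and optimizer: one possible choice for this 10-class task
# (cross-entropy with plain SGD; the learning rate is an assumed value)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

for e in range(10):
  for i, (x, y) in enumerate(trn_queue):
    # Move the batch to the same device as the model
    x, y = x.to(device), y.to(device)

    # Empty the accumulated gradients from any previous iteration
    optimizer.zero_grad()

    # Get the model's output
    y_hat = model(x)

    # Compute the error/loss function between y_hat and y
    loss = criterion(y_hat, y)

    # Perform the backward pass to generate the gradients of the loss function with respect to the model parameters
    loss.backward()

    # Update the model parameters
    optimizer.step()

    # Log the progress of the model
    if i % 100 == 0:
      acc = torch.sum(y == y_hat.detach().argmax(dim=1)) / x.shape[0]

      print(f"Epoch {e}, step {i}: loss={loss.item():.4f}, acc={acc.item():.4f}")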