<a href="https://colab.research.google.com/github/Renan-Domingues/LearnTheBasics-Pytorch/blob/main/Tutorials_06_OptimizationLoop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Optimizing Model Parameters
Training a model is an iterative process: in each iteration the model makes a guess about the output, calculates the error in its guess (the loss), collects the derivatives of the error with respect to its parameters (via autograd), and optimizes these parameters using gradient descent.

For a video walkthrough of this process (backpropagation), see https://www.youtube.com/watch?v=tIeHLnjs5U8
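To make the four steps concrete, here is a minimal sketch that trains a single parameter by hand. The toy data, the variable names `w` and `lr`, and the choice of a mean squared error are assumptions made for illustration; they are not part of the FashionMNIST setup used in the rest of this notebook.

```python
import torch

# Toy data (made up for illustration): the target relation is y = 2 * x.
x = torch.tensor([1.0, 2.0, 3.0])
y = 2.0 * x

w = torch.tensor(0.0, requires_grad=True)  # the single trainable parameter
lr = 0.05                                  # step size (hypothetical value)

for step in range(100):
    pred = w * x                       # 1. make a guess about the output
    loss = ((pred - y) ** 2).mean()    # 2. measure the error of the guess
    loss.backward()                    # 3. autograd computes d(loss)/dw
    with torch.no_grad():
        w -= lr * w.grad               # 4. gradient descent update
    w.grad.zero_()                     # clear the gradient for the next step

print(w.item())  # should end up close to 2.0
```

The optimizer objects introduced later in this notebook wrap steps 4 and the gradient reset, so the update rule does not have to be written by hand.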

In [None]:
# Prerequisite code from the Datasets & DataLoaders and Build a Model sections

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64) # DataLoader splits the data into batches
test_dataloader = DataLoader(test_data, batch_size=64)

class NeuralNetwork(nn.Module):
  def __init__(self):
    super().__init__()
    self.flatten = nn.Flatten()
    self.linear_relu_stack = nn.Sequential(
        nn.Linear(28*28, 512),
        nn.ReLU(),
        nn.Linear(512, 512),
        nn.ReLU(),
        nn.Linear(512, 10),
    )
  def forward(self, x):
    x = self.flatten(x)
    logits = self.linear_relu_stack(x)
    return logits

model = NeuralNetwork()

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz
Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz
Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw

### Hyperparameters
Hyperparameters are adjustable parameters that let you control the model optimization process.
Different hyperparameter values can affect model training and convergence rates.

We define the following hyperparameters for training:
- Number of epochs = the number of times to iterate over the dataset
- Batch size = the number of data samples propagated through the network before the parameters are updated
- Learning rate = how much to update the model's parameters at each batch/epoch. Smaller values yield slow learning, while large values may result in unpredictable behavior during training.
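The effect of the learning rate can be seen without any neural network. The sketch below runs plain gradient descent on the toy function f(w) = w², whose minimum is at w = 0. The function `descend` and the three learning-rate values are hypothetical, picked only to show the slow, reasonable, and diverging regimes.

```python
def descend(lr, steps=20, w=1.0):
    """Run `steps` gradient descent updates on f(w) = w**2, starting from w."""
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2
        w = w - lr * grad     # gradient descent update
    return w

small = descend(lr=0.001)     # barely moves away from the start value 1.0
moderate = descend(lr=0.1)    # gets close to the minimum at 0
large = descend(lr=1.1)       # overshoots on every step and grows in magnitude

print(small, moderate, large)
```

With the largest value each update jumps past the minimum by more than the previous distance, which is one simple form of the unpredictable behavior mentioned above.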

In [None]:
learning_rate = 1e-3
batch_size = 64
epochs = 5
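As a quick sanity check on what these values imply, the following sketch works out how many parameter updates they produce. It assumes the 60,000-sample FashionMNIST training split and a DataLoader that keeps the final, smaller batch (the default behavior); the variable names are illustrative.

```python
import math

dataset_size = 60_000   # FashionMNIST training split
batch_size = 64
epochs = 5

# One parameter update happens per batch.
batches_per_epoch = math.ceil(dataset_size / batch_size)
# The last batch holds whatever is left over after the full batches.
last_batch_size = dataset_size - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * epochs

print(batches_per_epoch, last_batch_size, total_updates)  # 938 32 4690
```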

### Optimization Loop
Once we set our hyperparameters, we can train and optimize our model with the optimization loop.
Each iteration of the optimization loop is called an epoch.

Each epoch consists of two main parts:
- The training loop = iterate over the training dataset and try to converge to optimal parameters.
- The validation/test loop = iterate over the test dataset to check whether model performance is improving.

### Loss Function
When presented with training data, our untrained network is unlikely to give the correct answer.
The loss function measures the degree of dissimilarity between the obtained result and the target value, and it is the loss function that we want to minimize during training.
To calculate the loss, we make a prediction using the inputs of a given data sample and compare it against the true data label value.

Common loss functions:
- nn.MSELoss = https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss
- nn.NLLLoss = https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss
- nn.CrossEntropyLoss (combines nn.LogSoftmax and nn.NLLLoss) = https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss

In [None]:
'''
We pass our model's output logits to nn.CrossEntropyLoss, which will normalize the logits and compute prediction error.
'''

loss_fn = nn.CrossEntropyLoss()
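The statement that nn.CrossEntropyLoss combines nn.LogSoftmax and nn.NLLLoss can be checked on a small example. The logits and targets below are made-up values for two samples and three classes, not outputs of the model above.

```python
import torch
from torch import nn

# Made-up raw scores (logits) for 2 samples over 3 classes, with their true classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

# One step: cross entropy applied directly to the logits.
ce = nn.CrossEntropyLoss()(logits, targets)

# Two steps: normalize with log-softmax, then take the negative log-likelihood.
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

print(ce.item(), nll.item())  # the two values should agree
```

This is why the model returns raw logits and does not apply a softmax layer of its own before the loss.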

### Optimizer
The optimizer adjusts the model parameters to reduce model error in each training step.
``Optimization algorithms`` define how this process is performed (in this example we use Stochastic Gradient Descent).
All the optimization logic is encapsulated in the optimizer object; here we use the SGD optimizer.

We initialize the optimizer by registering the model's parameters that need to be trained, and passing in the learning rate hyperparameter.

In [None]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

Inside the training loop, optimization happens in three steps:
- Call optimizer.zero_grad() to reset the gradients of the model parameters. (Gradients add up by default; to prevent double-counting, we explicitly zero them at each iteration.)
- Backpropagate the prediction loss with a call to loss.backward(). PyTorch deposits the gradients of the loss w.r.t. each parameter.
- Once we have our gradients, we call optimizer.step() to adjust the parameters by the gradients collected in the backward pass.
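The three steps above, and the reason for resetting gradients, can be observed on a single parameter. This is a small sketch with a hypothetical scalar `w` and the loss w²; it is not the notebook's model.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)   # hypothetical single parameter
optimizer = torch.optim.SGD([w], lr=0.1)

(w ** 2).backward()
first = w.grad.item()         # d(w^2)/dw at w = 2 is 4

(w ** 2).backward()           # no reset in between: the new gradient is added
accumulated = w.grad.item()   # 4 + 4 = 8, twice the true gradient

optimizer.zero_grad()         # reset before computing a fresh gradient
(w ** 2).backward()
fresh = w.grad.item()         # back to 4

optimizer.step()              # w <- w - lr * grad = 2 - 0.1 * 4
print(first, accumulated, fresh, w.item())
```

Stepping with the accumulated value would have moved `w` twice as far as intended, which is the double-counting the reset guards against.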

### Full Implementation
We define train_loop, which loops over our optimization code, and test_loop, which evaluates the model's performance against our test data.

In [None]:
def train_loop(dataloader, model, loss_fn, optimizer):
  size = len(dataloader.dataset)

  model.train()
  for batch, (X, y) in enumerate(dataloader):
    # Compute prediction and loss
    pred = model(X)
    loss = loss_fn(pred, y)

    # Backpropagation
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    if batch % 100 == 0:
      loss, current = loss.item(), (batch + 1) * len(X)
      print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")

def test_loop(dataloader, model, loss_fn):
  # set the model to evaluation mode
  model.eval()
  size = len(dataloader.dataset)
  num_batches = len(dataloader)
  test_loss, correct = 0, 0

  # Evaluating the model with torch.no_grad() ensures that no gradients are computed in test mode
  with torch.no_grad():
    for X, y in dataloader:
      pred = model(X)
      test_loss += loss_fn(pred, y).item()
      correct += (pred.argmax(1) == y).type(torch.float).sum().item()

  test_loss /= num_batches
  correct /= size

  print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
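The accuracy line in test_loop packs several operations into one expression. The sketch below unpacks it on a made-up batch of three samples and three classes; the tensor values are illustrative only.

```python
import torch

# Made-up logits for 3 samples over 3 classes, and their true labels.
pred = torch.tensor([[0.1, 2.0, 0.3],
                     [1.5, 0.2, 0.1],
                     [0.0, 0.1, 3.0]])
y = torch.tensor([1, 0, 1])

predicted_classes = pred.argmax(1)     # index of the largest logit in each row
matches = predicted_classes == y       # boolean tensor, True where the guess is right
correct = matches.type(torch.float).sum().item()   # number of correct guesses
accuracy = correct / len(y)

print(predicted_classes.tolist(), correct, accuracy)
```

Here the first two samples are classified correctly and the third is not, so the accuracy is 2 out of 3. In test_loop the same count is summed over all batches and divided by the dataset size.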

We initialize the loss function and optimizer, and pass them to train_loop and test_loop.

In [None]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
  print(f"Epoch {t+1}\n---------------------")
  train_loop(train_dataloader, model, loss_fn, optimizer)
  test_loop(test_dataloader, model, loss_fn)
  print("Done!")

Epoch 1
---------------------
loss: 0.762102 [   64/60000]
loss: 0.839100 [ 6464/60000]
loss: 0.609797 [12864/60000]
loss: 0.813437 [19264/60000]
loss: 0.709475 [25664/60000]
loss: 0.716383 [32064/60000]
loss: 0.782769 [38464/60000]
loss: 0.770245 [44864/60000]
loss: 0.764624 [51264/60000]
loss: 0.748785 [57664/60000]
Test Error: 
 Accuracy: 73.1%, Avg loss: 0.737093 

Done!
Epoch 2
---------------------
loss: 0.730493 [   64/60000]
loss: 0.812447 [ 6464/60000]
loss: 0.584114 [12864/60000]
loss: 0.793457 [19264/60000]
loss: 0.691530 [25664/60000]
loss: 0.696586 [32064/60000]
loss: 0.759712 [38464/60000]
loss: 0.756886 [44864/60000]
loss: 0.746206 [51264/60000]
loss: 0.729513 [57664/60000]
Test Error: 
 Accuracy: 74.1%, Avg loss: 0.717888 

Done!
Epoch 3
---------------------
loss: 0.702240 [   64/60000]
loss: 0.787967 [ 6464/60000]
loss: 0.561694 [12864/60000]
loss: 0.775916 [19264/60000]
loss: 0.675603 [25664/60000]
loss: 0.679660 [32064/60000]
loss: 0.738291 [38464/60000]
loss: 0.744