# Optimizing Model Parameters

We are now going to combine everything we have done so far. We have loaded the data and defined the neural network. We also saw how to compute partial derivatives with respect to the parameters.

We are going to train the model. 

> Training a model is an iterative process: in each iteration the model makes a guess about the output, calculates the error in its guess (the loss), collects the derivatives of the error with respect to its parameters, and optimizes those parameters using gradient descent.
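
The four steps in the quote above can be sketched without any library. This is a minimal illustration, not the notebook's actual model: the single parameter `w`, the data point `(x, y)`, and the learning rate are all assumptions chosen so the arithmetic is easy to follow.

```python
# Toy problem (assumed for illustration): fit one parameter w so that
# the prediction w * x matches the target y for a single data point.
x, y = 2.0, 6.0          # the ideal parameter is w = 3.0
w = 0.0                  # initial guess for the parameter
learning_rate = 0.1

losses = []
for step in range(20):
    pred = w * x                   # 1. make a guess about the output
    loss = (pred - y) ** 2         # 2. measure the error (squared error)
    grad = 2 * (pred - y) * x      # 3. derivative of the loss w.r.t. w
    w -= learning_rate * grad      # 4. gradient descent update
    losses.append(loss)

print(f"w after training: {w:.4f}, first loss: {losses[0]:.2f}, last loss: {losses[-1]:.2e}")
```

The loss shrinks at every step and `w` approaches 3. PyTorch automates step 3 with autograd and step 4 with an optimizer, but the loop has the same shape.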

In [1]:
# Importing Libraries
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import Compose, Resize, ToTensor

In [2]:
transformations = Compose([
    Resize([28,28]),
    ToTensor()
])

In [3]:
device = "cpu"

In [4]:
training_data = datasets.OxfordIIITPet(
    root="data",
    split="trainval",
    download=True,
    transform= transformations
    )

testing_data = datasets.OxfordIIITPet(
    root = "data",
    split = "test",
    download = True,
    transform = transformations
)

In [5]:
train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(testing_data, batch_size=64)

In [6]:
class PetNeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_stack = nn.Sequential(
            nn.Linear(28*28*3, 512),
            nn.ReLU(),
            nn.Linear(512,512),
            nn.ReLU(),
            nn.Linear(512, 37),
        )
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_stack(x)
        return logits

In [7]:
model = PetNeuralNetwork()

In [8]:
model.to(device)

PetNeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_stack): Sequential(
    (0): Linear(in_features=2352, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=37, bias=True)
  )
)

## Hyperparameter Tuning
> Hyperparameters are adjustable parameters that can be tuned to improve model performance, training speed, and convergence rate.

Hyperparameters for training:
1. Number of Epochs - the number of times to iterate over the whole dataset.
2. Batch Size - the number of samples passed through the network before the parameters are updated.
3. Learning Rate - how much to update the model parameters at each batch (that is, at each optimizer step).
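
To see how these three values interact, the sketch below works out how many parameter updates a run performs. It assumes the training split holds 3680 samples, which is the size reported in the training log further down, and that the `DataLoader` keeps the final, smaller batch (its default behavior).

```python
import math

dataset_size = 3680   # training split size, taken from the training log
batch_size = 64
epochs = 5

# The last batch of an epoch may be smaller than batch_size.
batches_per_epoch = math.ceil(dataset_size / batch_size)
last_batch_size = dataset_size - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * epochs   # one optimizer step per batch

print(batches_per_epoch, last_batch_size, total_updates)
```

With 58 batches per epoch, a condition such as `batch % 100 == 0` is true only for the first batch, which is why the log shows a single loss line per epoch.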

In [9]:
learning_rate = 1e-3
batch_size = 64
epochs = 5

## Optimization Loop
Each iteration of the optimization loop is called an `epoch`.

Each epoch consists of two parts:
1. The Train Loop - iterate over the training dataset and try to converge to optimal parameters.
2. The Validation/Test Loop - iterate over the test dataset to check whether model performance is improving.

### Loss Function
The loss function measures the degree of dissimilarity between the obtained result and the target value. It is the quantity we want to minimize during training.

Common Loss Functions:
1. `nn.MSELoss` - Mean Squared Error Loss for Regression Tasks
2. `nn.NLLLoss` - Negative Log Likelihood Loss for Classification Tasks
3. `nn.CrossEntropyLoss` - Combination of `nn.LogSoftmax` and `nn.NLLLoss`; it expects raw logits as input
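
The relationship between the last two losses can be checked directly: `nn.CrossEntropyLoss` applied to raw logits gives the same value as `nn.NLLLoss` applied to log-softmax outputs. The random logits and the target indices below are made up for illustration; only the 37-class shape matches the model above.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 37)              # 4 samples, 37 classes (raw model outputs)
targets = torch.tensor([0, 5, 12, 36])   # illustrative class indices

# Cross-entropy computed in one call on the raw logits.
ce = nn.CrossEntropyLoss()(logits, targets)

# The same quantity in two explicit steps: log-softmax, then NLL.
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

print(torch.allclose(ce, nll))
```

This is why the network's `forward` returns logits without a final softmax layer: the loss function applies the normalization itself.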

In [10]:
loss_fn = nn.CrossEntropyLoss()
loss_fn.to(device)

CrossEntropyLoss()

### Optimizer
Optimization is the process of adjusting model parameters to reduce model error in each training step. The optimizer object encapsulates the update rule; here we use stochastic gradient descent (SGD).

In [11]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

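Inside the training loop, optimization happens in three steps for each iteration:
1. Call `optimizer.zero_grad()` to reset the gradients of the model parameters. Gradients add up by default, so they are zeroed at each iteration to avoid double-counting.
2. Backpropagate the prediction loss with `loss.backward()`. PyTorch deposits the gradient of the loss with respect to each parameter.
3. Once the gradients are available, call `optimizer.step()` to adjust the parameters using the gradients collected in the backward pass.

The steps form a cycle, so resetting the gradients at the end of one iteration is equivalent to resetting them at the start of the next.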
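
The need for the reset step is easy to see on a single parameter. The sketch below is an illustration only: the tensor `w`, the loss `w ** 2`, and the learning rate of 0.1 are assumptions, not part of the notebook's model.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)   # a single illustrative parameter
optimizer = torch.optim.SGD([w], lr=0.1)

# Two backward passes without a reset: the gradients add up.
(w ** 2).backward()
grad_once = w.grad.item()     # d(w^2)/dw at w = 2
(w ** 2).backward()
grad_twice = w.grad.item()    # sum of both backward passes

# Reset the gradients, then run one clean update.
optimizer.zero_grad()
(w ** 2).backward()
optimizer.step()              # plain SGD: w <- w - lr * grad

print(grad_once, grad_twice, w.item())
```

Without `zero_grad()` the second gradient is twice the correct value, so the update would be twice as large as intended. After the reset, the step moves `w` from 2.0 to roughly 1.6.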

## Everything Together

In [12]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for Batch Normalization and Dropout layers
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Move the batch to the same device as the model
        X, y = X.to(device), y.to(device)

        # Calculate prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation: compute gradients, update parameters, reset gradients
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # Report progress every 100 batches
        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for Batch Normalization and Dropout layers
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0
    # torch.no_grad() ensures that no gradients are computed during evaluation
    with torch.no_grad():
        for X, y in dataloader:
            # Move the batch to the same device as the model
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            # Count predictions whose highest-scoring class matches the label
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches  # average loss per batch
    correct /= size           # fraction of correctly classified samples
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

In [None]:
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!!!!")

Epoch 1
-------------------------------
loss: 3.571516  [   64/ 3680]
Test Error: 
 Accuracy: 2.8%, Avg loss: 3.610824 

Epoch 2
-------------------------------
loss: 3.571309  [   64/ 3680]
Test Error: 
 Accuracy: 2.8%, Avg loss: 3.610376 

Epoch 3
-------------------------------
loss: 3.570958  [   64/ 3680]
