
<center><font size=5 face="Helvetica" color=#EE4B2B><b>
PyTorch Tutorial: Optimizing Model Parameters
</b></font></center>

<center><font face="Helvetica" size=3><b>Ang Chen</b></font></center>
<center><font face="Helvetica" size=3>July, 2024</font></center>

***

Now that we have a model and data, it's time to train, validate, and test our model by optimizing its parameters on our data.
Training a model is an iterative process:
in each iteration, the model makes a guess about the output, calculates the error in its guess (the loss), collects the derivatives of the error with respect to its parameters, and **optimizes** these parameters using gradient descent.
For a more detailed walkthrough of this process, check out the video on [backpropagation from 3Blue1Brown](https://www.youtube.com/watch?v=tIeHLnjs5U8).
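
As a rough illustration of that guess / loss / gradient / update cycle, here is a minimal sketch on a toy problem. The data, the single weight `w`, the learning rate `lr`, and the number of steps are all assumptions made for this example; they are not part of the tutorial's model.

```python
import torch

# Toy data (assumed for illustration): targets follow y = 2 * x.
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])

# A single trainable parameter, starting from a deliberately wrong value.
w = torch.tensor(0.0, requires_grad=True)
lr = 0.05  # illustrative learning rate

losses = []
for step in range(50):
    pred = w * x                      # the model's guess
    loss = ((pred - y) ** 2).mean()   # error of the guess (mean squared error)
    loss.backward()                   # stores d(loss)/d(w) in w.grad
    with torch.no_grad():
        w -= lr * w.grad              # gradient descent update
    w.grad.zero_()                    # reset the gradient for the next iteration
    losses.append(loss.item())

print(f"learned w = {w.item():.4f}")  # should be close to 2.0
```

The manual update under `torch.no_grad()` is only meant to make the gradient descent step visible; in practice an optimizer object usually performs this step.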

# Prerequisite Code

We load some code from the previous sections.

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()

# Hyperparameters

Hyperparameters are adjustable parameters that let you control the model optimization process.
Different hyperparameter values can impact model training and convergence rates.

We define the following hyperparameters for training:
 * **Number of Epochs** - the number of times to iterate over the dataset
 * **Batch Size** - the number of data samples propagated through the network before the parameters are updated
 * **Learning Rate** - how much to update the model's parameters at each batch/epoch. Smaller values yield slow learning, while large values may result in unpredictable behavior during training.

In [1]:
learning_rate = 1e-3
batch_size = 64
epochs = 5
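
To make these numbers concrete, the following sketch works out how many parameter updates they imply. It assumes the 60,000-image FashionMNIST training split and a `DataLoader` that keeps the final partial batch (the default behavior); `num_train_samples`, `batches_per_epoch`, and `total_updates` are names introduced here for illustration.

```python
import math

learning_rate = 1e-3
batch_size = 64
epochs = 5

# Assumption: size of the FashionMNIST training split.
num_train_samples = 60_000

# One parameter update per batch; the last, smaller batch is kept.
batches_per_epoch = math.ceil(num_train_samples / batch_size)
total_updates = batches_per_epoch * epochs

print(batches_per_epoch)  # 938
print(total_updates)      # 4690
```

Halving the batch size would roughly double the number of updates per epoch, which is one reason batch size and learning rate are often tuned together.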

# Optimization Loop

Once we set our hyperparameters, we can then train and optimize our model with an optimization loop.
Each iteration of the optimization loop is called an **epoch**.

Each epoch consists of two main parts:
 * **The Train Loop** - iterate over the training dataset and try to converge to optimal parameters.
 * **The Validation/Test Loop** - iterate over the test dataset to check if model performance is improving.
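
The two-part structure of an epoch can be sketched as follows. This is only an outline under stated assumptions: the random stand-in data, the tiny `toy_model`, and the choice of `nn.CrossEntropyLoss` with `torch.optim.SGD` are placeholders picked for illustration, not this tutorial's actual training code.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data (assumption): 256 samples, 4 features, 3 classes.
X = torch.randn(256, 4)
y = torch.randint(0, 3, (256,))
train_loader = DataLoader(TensorDataset(X[:192], y[:192]), batch_size=64)
test_loader = DataLoader(TensorDataset(X[192:], y[192:]), batch_size=64)

toy_model = nn.Linear(4, 3)          # placeholder model
loss_fn = nn.CrossEntropyLoss()      # placeholder loss function
optimizer = torch.optim.SGD(toy_model.parameters(), lr=1e-3)

epochs = 5
history = []  # average test loss per epoch
for epoch in range(epochs):
    # Train loop: update the parameters once per training batch.
    toy_model.train()
    for xb, yb in train_loader:
        loss = loss_fn(toy_model(xb), yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Validation/test loop: measure performance without updating parameters.
    toy_model.eval()
    test_loss = 0.0
    with torch.no_grad():
        for xb, yb in test_loader:
            test_loss += loss_fn(toy_model(xb), yb).item()
    history.append(test_loss / len(test_loader))
    print(f"epoch {epoch + 1}: avg test loss = {history[-1]:.4f}")
```

Tracking the test loss once per epoch is what lets us check whether performance is improving from one epoch to the next.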

Let's briefly familiarize ourselves with some of the concepts used in the training loop.

# Loss Function

When presented with some training data, our 