## Original Reference : [View Source Code](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/03_pytorch_computer_vision.ipynb)

Check out [using Google Colab with GitHub](https://colab.research.google.com/github/googlecolab/colabtools/blob/master/notebooks/colab-github-demo.ipynb) for additional information about their integration.

The [pytorch cheat sheet](https://pytorch.org/tutorials/beginner/ptcheat.html) will be useful in this learning process if you need a quick reference.

---
[PyTorch](https://pytorch.org/) is an open-source machine learning framework that allows you to write your own neural networks and optimize them efficiently.
- PyTorch is well established, has a large developer community, is very flexible, and is especially popular in research.
- PyTorch is not the only framework of this kind. Alternatives to PyTorch include [TensorFlow](https://www.tensorflow.org/), [JAX](https://github.com/google/jax#quickstart-colab-in-the-cloud) and [Caffe](http://caffe.berkeleyvision.org/).
- Once you have a deep knowledge of one machine learning framework, it is very easy to learn other frameworks, as many frameworks share the same concepts and ideas.

In [None]:
# Import PyTorch
import torch
from torch import nn

import torchvision
from torchvision import datasets
from torchvision.transforms import ToTensor

import numpy as np
import matplotlib.pyplot as plt

from timeit import default_timer as timer
from tqdm.auto import tqdm

# Check versions
# Note: your PyTorch version shouldn't be lower than 1.10.0 and torchvision version shouldn't be lower than 0.11
print(f"PyTorch version: {torch.__version__}\ntorchvision version: {torchvision.__version__}")

**Random Seed**:
PyTorch has many stochastic features (e.g., random weight initialization and data shuffling). It is good practice to make your code reproducible by using exactly the same random numbers on every run.
Let's set the random seed as shown below.

In [None]:
torch.manual_seed(37) # Setting the seed
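To see what seeding buys us, here is a minimal sketch (the variable names `first` and `second` are only illustrative) that resets the seed twice and checks that the same random tensor comes out both times. It only covers the CPU random number generator; fully reproducible GPU runs may need additional settings.

```python
import torch

torch.manual_seed(37)          # seed the global random number generator
first = torch.rand(2, 3)       # draw some random numbers

torch.manual_seed(37)          # reset to the same seed
second = torch.rand(2, 3)      # the same numbers are drawn again

print(torch.equal(first, second))  # True
```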

### 0.1. Tensors

Tensors are the PyTorch equivalent of Numpy arrays and also support GPU acceleration.
The name "tensor" generalizes concepts you already know: for example, vectors are one-dimensional tensors and matrices are two-dimensional tensors.
When working with neural networks, we use tensors of various shapes and number of dimensions.
Numpy arrays are very similar to tensors, so we can convert most tensors to numpy arrays and vice versa.

#### (1) Initialization
First, let's look at the different ways to create tensors.
There are many possible options, and the simplest way is to call `torch.Tensor`, passing the desired shape as input arguments.
`torch.Tensor` allocates memory for the desired tensor without initializing it, so the tensor contains whatever values were already in that memory.

In [None]:
x = torch.Tensor(1, 2, 3)
print(x)

You can get the shape of a tensor in the same way as numpy, `x.shape`, or using the `.size` method.

In [None]:
shape = x.shape  # Equivalent to x.size()
print("Shape:", shape)


Alternatively, there are various options to directly assign values to the tensor during initialization:
* `torch.Tensor` (input list): Creates a tensor from the list you provide.
* `torch.zeros`: Creates a tensor filled with zeros.
* `torch.ones`: Creates a tensor filled with ones.
* `torch.rand`: Creates a tensor with random values sampled uniformly between 0 and 1.
* `torch.randn`: Creates a tensor with random values sampled from a normal distribution with mean 0 and variance 1.
* `torch.arange`: Creates a tensor containing the values $N, N+1, N+2, \ldots, M-1$ when called as `torch.arange(N, M)` (the end value $M$ is excluded).

In [None]:
# Create a tensor from a (nested) list
x = torch.Tensor([[3., 5., 2.], [7., 1., 9.]])
print(x, "\nShape:", x.shape)
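The other constructors from the list above can be tried in the same way. The following sketch uses an arbitrary shape of `[2, 3]` and arbitrary `arange` bounds, chosen only for illustration:

```python
import torch

zeros = torch.zeros(2, 3)        # all zeros, shape [2, 3]
ones = torch.ones(2, 3)          # all ones, shape [2, 3]
uniform = torch.rand(2, 3)       # uniform samples in [0, 1)
normal = torch.randn(2, 3)       # samples from a standard normal distribution
steps = torch.arange(2, 6)       # integers 2, 3, 4, 5 (the end value 6 is excluded)

for name, t in [("zeros", zeros), ("ones", ones), ("rand", uniform),
                ("randn", normal), ("arange", steps)]:
    print(f"{name}: shape={tuple(t.shape)}, dtype={t.dtype}")
```

Note that `torch.arange` with integer arguments returns an integer tensor, while the other constructors return floating-point tensors by default.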

In [None]:
# Try It Out!
'''
1. Create a tensor filled with the scalar value 1 with the shape [1, 2, 7]
2. Check the shape of the tensor
'''

"""Type Your Answer Here"""

#### Tensor to Numpy, and Numpy to Tensor

In general, tensors can be converted to numpy arrays, and numpy arrays can be converted back to tensors:
- To convert a numpy array to a tensor, we can use the `torch.from_numpy` function.
- To transform a PyTorch tensor back to a numpy array, we can call the `.numpy()` method on the tensor.

In [None]:
np_arr = np.array([[1, 2], [3, 4]])
tensor = torch.from_numpy(np_arr)
print("Numpy -> PyTorch tensor:\n", tensor)
np_arr = tensor.numpy()
print("PyTorch -> Numpy array:\n", np_arr)
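One detail worth knowing: for CPU tensors, `torch.from_numpy` shares memory with the source array instead of copying it, whereas `torch.tensor` makes a copy. A small sketch of the difference (the names `shared` and `copied` are illustrative):

```python
import numpy as np
import torch

np_arr = np.array([[1, 2], [3, 4]])
shared = torch.from_numpy(np_arr)   # shares memory with np_arr
copied = torch.tensor(np_arr)       # makes an independent copy

np_arr[0, 0] = 100                  # modify the NumPy array in place

print(shared[0, 0].item())  # 100, the change is visible through the shared tensor
print(copied[0, 0].item())  # 1, the copy is unaffected
```

A tensor that lives on the GPU has to be moved back first, e.g. with `tensor.cpu().numpy()`.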

Another common operation is to change the shape of a tensor.
A tensor can be re-organized into a different shape with the same number of elements, e.g., a tensor of size (6) can be viewed as (2, 3) or (3, 2).
In PyTorch, this operation is called a `view`.

In [None]:
# Create a 1-D tensor of size 6
x = torch.arange(6)
print(x, "\nShape:", x.shape)
x = x.view(3, 2)
print(x, "\nShape:", x.shape)
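As a small addition, the sketch below shows two properties of `view` that are easy to miss: a dimension given as `-1` is inferred from the remaining ones, and the result shares its data with the original tensor rather than copying it. The value `42` is arbitrary.

```python
import torch

x = torch.arange(6)        # tensor([0, 1, 2, 3, 4, 5])
v = x.view(2, -1)          # -1 asks PyTorch to infer the dimension: shape [2, 3]
print(v.shape)             # torch.Size([2, 3])

v[0, 0] = 42               # writing through the view ...
print(x[0].item())         # 42, ... also changes the original tensor
```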

### 0.2. Dynamic Computation Graph and Backpropagation

PyTorch serves as a preferred framework for deep learning projects primarily because it offers automatic differentiation, allowing us to effortlessly obtain **gradients** or **derivatives** of the functions we define.

Imagine we have some data as input $\mathbf{x}$.
As we manipulate our input, PyTorch automatically builds a **computational graph** and keeps track of it for us.
So all we need to do is calculate the **output**.
We can then ask PyTorch to automatically compute the **gradients** for us.

> **Note:  Why do we want gradients?** Consider that we have defined a function, a neural net, that is supposed to compute a certain output $y$ for an input vector $\mathbf{x}$. We then define an **error measure** that tells us how wrong our network is, and how bad it is in predicting output $y$ from input $\mathbf{x}$. Based on this error measure, we can use the gradients to **update** the weights $\mathbf{W}$ that were responsible for the output, so that the next time we present input $\mathbf{x}$ to our network, the output will be closer to what we want (sources: [[1]](https://www.kdnuggets.com/2020/05/5-concepts-gradient-descent-cost-function.html), [[2]](https://colab.research.google.com/github/phlippe/uvadlc_notebooks/blob/master/docs/tutorial_notebooks/tutorial2/Introduction_to_PyTorch.ipynb#scrollTo=Dr4ENWdTHNA-)).

The first thing we need to do is to specify which tensors require gradients.
By default, a newly created tensor does not track gradients.

In [None]:
x = torch.ones((3,))
print(x.requires_grad)

We can change this for an existing tensor using the function `requires_grad_()`.
Alternatively, we can pass the argument `requires_grad=True` when creating a tensor.

In [None]:
x.requires_grad_(True)
print(x.requires_grad)
z = torch.tensor([[1., -1.], [1., 1.]], requires_grad=True)
print(z)

To get familiar with the concept of computational graphs, let's create a graph for the following function:

$$y = \frac{1}{n(x)}\sum_i \left[(x_i + 2)^2 + 3\right],$$

where $n(x)$ denotes the number of elements in $x$, i.e., we take the mean of the terms inside the sum.
You could imagine that $x$ are our parameters, and we want to optimize (either maximize or minimize) the output $y$.
For this, we want to obtain the gradients $\partial y / \partial \mathbf{x}$.

For example, assume our input is $\mathbf{x}=[0,1,2]$:

In [None]:
x = torch.arange(3, dtype=torch.float32, requires_grad=True) # Only float tensors can have gradients
print("x = ", x)

Then let's build the computational graph step by step.
We can combine multiple operations on a single line, but we'll separate them here so that we can better understand how they add to the computation graph.

In [None]:
a = x + 2
print("a =", a)
b = a ** 2
print("b = ", b)
c = b + 3
print("c = ", c)
y = c.mean()
print("y = ", y)

Here, we calculate $a$ based on the input $x$ and the constant $2$.
Next, $b$ is $a$ squared, and so on.

Each node in the computational graph automatically defines a function that computes the gradient with respect to its inputs, `grad_fn`.
Then we can perform backpropagation on the computational graph by calling `backward()` on the final output.
This function effectively computes the gradient for each tensor that has the property `requires_grad=True`.

In [None]:
y.backward()

In short, `x.grad` now contains the gradient $\partial y/ \partial \mathbf{x}$, which represents how changes in $\mathbf{x}$ affect the output $y$ given the current input $\mathbf{x}=[0,1,2]$:

In [None]:
print(x.grad)
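We can sanity-check this result by hand. Differentiating the function above gives $\partial y / \partial x_i = \frac{2(x_i + 2)}{n(x)}$, so for $\mathbf{x}=[0,1,2]$ we expect $[4/3, 2, 8/3]$. The following sketch rebuilds the graph in one line and compares autograd against this formula (the name `expected` is illustrative):

```python
import torch

x = torch.arange(3, dtype=torch.float32, requires_grad=True)
y = ((x + 2) ** 2 + 3).mean()
y.backward()

# Analytic gradient: dy/dx_i = 2 * (x_i + 2) / n
expected = 2 * (x.detach() + 2) / x.numel()

print(x.grad)      # gradient computed by autograd
print(expected)    # gradient computed by hand
print(torch.allclose(x.grad, expected))  # True
```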

In [None]:
### Exercise
# Create a simple linear function: out = weight @ inp + bias, where weight and bias are trainable

# weight = ...
# bias = ...

# inp = ...
# out = ...

# Check the gradient of the output w.r.t. the weights and bias at the inp value
# print(weight.grad, bias.grad)

# PyTorch Computer Vision

Computer vision is the art of teaching a computer to see.

![example computer vision problems](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/03-computer-vision-problems.png)
*Example computer vision problems for binary classification, multiclass classification, object detection and segmentation.*

## What we're going to cover

![a PyTorch workflow with a computer vision focus](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/03-pytorch-computer-vision-workflow.png)

## 1. Getting a dataset

`torchvision.datasets` contains a lot of example datasets you can use to practice writing computer vision code on. **FashionMNIST** is one of those datasets (https://github.com/zalandoresearch/fashion-mnist).

The FashionMNIST dataset is a collection of **grayscale images** representing various fashion items such as t-shirts, trousers, and shoes.
And since it has 10 different image classes (different types of clothing), it's a multi-class classification problem.

We'll be building a computer vision neural network to identify the different styles of clothing in these images.

In [None]:
# Download and prepare the `training` dataset
train_data = datasets.FashionMNIST(
    root="data",  # Directory where the dataset will be saved.
    train=True,  # Specify that we want the training split of the dataset.
    download=True,  # Download the dataset if it doesn't exist on disk.
    transform=ToTensor(),  # Images come as PIL(Python Imaging Library) format. Convert the PIL Image to a PyTorch tensor.
    target_transform=None  # We can transform the targets (labels) if we'd like to.
)

# Download and prepare the `testing` dataset
# Evaluating on data that was not used for training shows how well the model performs on `unseen data`.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,  # Specify that we want the test split of the dataset.
    download=True,
    transform=ToTensor()
)

### Input and output shapes of a computer vision model

Here, the **input is an image** represented as a tensor (a multi-dimensional numerical array), while **the output is a single numerical value** that corresponds to a label (e.g., a category such as "T-shirt" or "Sneaker").

Since we are using the *FashionMNIST* dataset, the images are **grayscale** and have a resolution of **28×28 pixels**.
The labels are **integer values (0-9)** representing different clothing categories.

Let's see the image shape.

In [None]:
# Retrieve the first image-label pair from the training dataset
image, label = train_data[0]

# Print the raw values of image and label
print("Image tensor:", image)
print("Label:", label)

In [None]:
### Exercise: Print the shape of the image tensor and the label
#print("Image Shape:", ...)  # Expected output: (1, 28, 28) → (Channels, Height, Width)
#print("Label:", ...)  # Expected output: an integer (0-9)

The shape of the image tensor is:

```
[color_channels=1, height=28, width=28]
```

Having `color_channels=1` means the image is grayscale.

![example input and output shapes of the fashionMNIST problem](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/03-computer-vision-input-and-output-shapes.png)
*Various problems will have various input and output shapes. But the premise remains: encode data into numbers, build a model to find patterns in those numbers, convert those patterns into something meaningful.*

If `color_channels=3`, the image comes in pixel values for red, green and blue (RGB).

The order of our current tensor is often referred to as `CHW` (Color Channels, Height, Width).

> **Note:**
Often, images are processed in **batches**, as loading and processing all data at once is memory-intensive.
The batch dimension (`N`) represents **the number of images in a batch**.
You'll also see `NCHW` and `NHWC` formats, e.g., if you have a `batch_size=32`, your tensor shape may be `[32, 1, 28, 28]`. We'll cover batch sizes later.
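As a rough illustration of these layouts, the sketch below uses a random tensor as a stand-in for one FashionMNIST image, adds a batch dimension, and reorders the dimensions from `NCHW` to `NHWC`. The batch of 32 identical copies is purely hypothetical.

```python
import torch

image = torch.rand(1, 28, 28)              # stand-in for one CHW grayscale image
single_batch = image.unsqueeze(dim=0)      # add a batch dimension: [1, 1, 28, 28] (NCHW)

batch = torch.stack([image] * 32)          # 32 copies stacked: [32, 1, 28, 28] (NCHW)
batch_nhwc = batch.permute(0, 2, 3, 1)     # reorder to [32, 28, 28, 1] (NHWC)

print(single_batch.shape, batch.shape, batch_nhwc.shape)
```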

In [None]:
# Check the number of training samples in the dataset
num_train_samples = len(train_data.data)
print("Number of training samples:", num_train_samples)

In [None]:
### Exercise: Check the number of test samples
#num_test_samples = ...
#print("Number of test samples:", num_test_samples)  # Expected output: 10000

In [None]:
# Retrieve class names (categories of fashion items)
class_names = train_data.classes
print("FashionMNIST Class Names:", class_names)

### Visualizing our data

In [None]:
torch.manual_seed(42)  # Set a manual seed for reproducibility

fig = plt.figure(figsize=(9, 9))
rows, cols = 4, 4  # 4x4 grid to show 16 random images
for i in range(1, rows * cols + 1):
    random_idx = torch.randint(0, len(train_data), size=[1]).item()  # Select a random index from the training dataset
    img, label = train_data[random_idx]  # Retrieve the corresponding image and label
    fig.add_subplot(rows, cols, i)  # Add a subplot for the current image
    plt.imshow(img.squeeze(), cmap="gray")  # Display the image (remove the extra dimension using .squeeze())
    plt.title(label)  # Print the number of the label

    ### Exercise: Print the class names instead of the number as the title
    #plt.title(...)  # Expected output: class name (e.g., "T-shirt/top", "Sneaker")

    plt.axis(False)  # Remove axis ticks for better visualization

### Preparing a DataLoader

The next step is to prepare the dataset with a [`torch.utils.data.DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) or `DataLoader` for short, which
- **Splits the dataset** into smaller chunks (mini-batches).
- **Shuffles data** during training to improve generalization.
- **Iterates through data** efficiently during training and evaluation.

These smaller chunks are called **batches** or **mini-batches** and can be set by the `batch_size` parameter.

With **mini-batches** (small portions of the data), gradient descent is performed more often per epoch (once per mini-batch rather than once per epoch).
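To make the batching behavior concrete, here is a small sketch on a hypothetical toy dataset of 100 random "images" (`toy_data` and `toy_loader` are illustrative names, not part of the notebook). With `batch_size=32`, the loader yields three full batches and one final, smaller batch.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 100 fake grayscale images with integer labels
features = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, size=(100,))
toy_data = TensorDataset(features, labels)

toy_loader = DataLoader(toy_data, batch_size=32, shuffle=False)

print(len(toy_loader))                 # 4 batches
print(math.ceil(100 / 32))             # 4, the same number computed by hand
batch_sizes = [len(y) for _, y in toy_loader]
print(batch_sizes)                     # [32, 32, 32, 4]
```

The last batch is kept because `DataLoader` uses `drop_last=False` by default.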

In [None]:
from torch.utils.data import DataLoader

# Set the batch size (number of samples per mini-batch)
BATCH_SIZE = 32

# Create a DataLoader for the training dataset
train_dataloader = DataLoader(
    train_data,  # Dataset to be turned into iterable
    batch_size=BATCH_SIZE,  # Number of samples per batch
    shuffle=True  # Shuffle data every epoch to improve learning
)

# Print DataLoader information
print(f"Length of train dataloader: {len(train_dataloader)} batches of {BATCH_SIZE}")
print("Number of training samples:", len(train_dataloader)*BATCH_SIZE)

In [None]:
### Exercise: Create test DataLoader with the same batch size but shuffle=False
#test_dataloader = DataLoader(
#    ...,  # Dataset to be turned into iterable
#    ...,  # Number of samples per batch
#    ...   # No shuffling for test data (keeps results consistent)
#)
#
## Print DataLoader information
#print(f"Length of test dataloader: {len(test_dataloader)} batches of {BATCH_SIZE}")  # Expected output: 313 batches of 32
#print("Number of test samples:", len(test_dataloader.dataset))  # Expected output: 10000 (since 10000 / 32 = 312.5, the 313th batch holds only the remaining 16 samples)

In [None]:
# Check out what's inside the training dataloader
train_features_batch, train_labels_batch = next(iter(train_dataloader))  # Retrieve the first mini-batch

# Print batch shapes to understand their structure
print("Train Features Batch Shape:", train_features_batch.shape)
print("Train Labels Batch Shape:", train_labels_batch.shape)

### Setup device-agnostic (CPU or GPU) code for GPU Acceleration
Deep learning models can be computationally intensive, and using a GPU can significantly speed up training.
Let's set up some [device-agnostic code](https://pytorch.org/docs/stable/notes/cuda.html#best-practices) for our models and data to run on GPU if it's available.

If you're running this notebook on Google Colab and you don't have a GPU turned on yet, turn one on via `Runtime -> Change runtime type -> Hardware accelerator -> GPU`. If you do this, your runtime will likely reset and you'll have to run all of the cells above by going `Runtime -> Run before`.

In [None]:
# Setup device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"

# Print the selected device
print(f"Using device: {device}")  # Expected output: "cuda" (if GPU is available) or "cpu"
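Once `device` is defined, tensors (and models) are moved with `.to(device)`. A minimal sketch, assuming nothing beyond the `device` string defined above:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.ones(2, 3)           # tensors are created on the CPU by default
x_on_device = x.to(device)     # returns a tensor on the target device

print(x.device, "->", x_on_device.device)
```

Operations between tensors require both operands to be on the same device, which is why the data and the model are both moved in the training code later on.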

## 2. Model 1: Build a baseline model

Time to build a **baseline model** by subclassing `nn.Module`.
This baseline model provides a simple starting point for comparison with more complex architectures.

Our baseline will consist of an [`nn.Flatten()`](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html) layer followed by two [`nn.Linear()`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) layers, with a non-linear activation function (`nn.ReLU()`) between the two linear layers.

`nn.Flatten()` flattens the image (from 2D pixels to a 1D vector), i.e., converts multi-dimensional image tensors into one long vector. Let's check how it works.

In [None]:
# Create a flatten layer
flatten_model = nn.Flatten()  # it works like a nn model that can perform a forward pass

# Get a single sample image tensor from the batch
x = train_features_batch[0]

# Print shapes before flattening
print(f"Shape before flattening: {x.shape} -> [color_channels, height, width]")

# Flatten the sample
output = flatten_model(x)  # Perform forward pass

In [None]:
### Exercise: Print shape after the forward pass
#print(f"Shape after flattening: {...}")  # Expected output: (1, 784) → A single vector with 28×28=784 elements

#### Building the Neural Network

Now that we understand flattening, we can convert our pixel data into one long **feature vector** for the `nn.Linear()` layers.

Let's create the neural network model and instantiate it.

In [None]:
# Define a simple neural network using PyTorch's nn.Module
class FashionMNISTModelV1(nn.Module):
    def __init__(self,
                 input_shape: int,
                 hidden_units: int,
                 output_shape: int):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(),  # Flatten each image into a single vector (from (N, 1, 28, 28) to (N, 784))
            nn.Linear(in_features=input_shape, out_features=hidden_units),  # First linear layer
            nn.ReLU(),  # Activation function (introduces non-linearity)
            nn.Linear(in_features=hidden_units, out_features=output_shape),  # Output layer
        )

    def forward(self, x: torch.Tensor):
        return self.layer_stack(x)

In [None]:
# Instantiate the Model & Send to GPU
torch.manual_seed(42)  # Set manual seed for reproducibility

# Create an instance of the model
model_1 = FashionMNISTModelV1(
    input_shape=784,  # Number of input features (28x28 pixels flattened)
    hidden_units=8,  # Number of hidden layer neurons
    output_shape=len(class_names)  # Number of output classes (10 classes)
).to(device)  # Send model to GPU if available

# Check which device the model is on
print("Model is running on:", next(model_1.parameters()).device)  # Expected: "cuda" or "cpu"
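A quick way to check that the layer sizes fit together is to push a dummy batch through the network. The sketch below uses a stand-in `nn.Sequential` with the same layer layout and sizes as the baseline (784 inputs, 8 hidden units, 10 outputs) rather than the class itself, so that it runs on its own; `sketch_model` and `dummy_batch` are illustrative names.

```python
import torch
from torch import nn

# Stand-in with the same layer layout as the baseline model above
sketch_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(in_features=784, out_features=8),
    nn.ReLU(),
    nn.Linear(in_features=8, out_features=10),
)

dummy_batch = torch.rand(32, 1, 28, 28)   # fake batch in NCHW format
with torch.inference_mode():
    logits = sketch_model(dummy_batch)

print(logits.shape)  # torch.Size([32, 10])
num_params = sum(p.numel() for p in sketch_model.parameters())
print(num_params)    # 6370
```

Each image produces 10 raw scores (logits), one per class. The parameter count follows from (784 × 8 + 8) + (8 × 10 + 10) = 6370.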

### 2.1 Picking the Loss Function & Optimizer
Every deep learning model requires:
- A **loss function** to measure how well it performs.
- An **optimizer** to update weights and minimize loss.

For this classification problem, we use:
- **nn.CrossEntropyLoss()**: The standard loss function for multi-class classification.
- **Stochastic Gradient Descent (SGD)**: A simple optimizer with a learning rate of 0.1.

More information:
- CrossEntropyLoss: https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
- SGD: https://pytorch.org/docs/stable/generated/torch.optim.SGD.html

In [None]:
# Setup loss function and optimizer
loss_fn = nn.CrossEntropyLoss()  # Also called "cost function"
optimizer = torch.optim.SGD(params=model_1.parameters(), lr=0.1)  # Learning rate = 0.1

# Print all model parameters
for param in model_1.parameters():
    print(param, ", Shape:", param.shape)  # Each layer in a PyTorch model has parameters (weights and biases)
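It can help to see what `nn.CrossEntropyLoss()` expects and computes. It takes raw logits and integer class indices, and with default settings it matches the mean negative log-softmax score of the true class. The following sketch checks this on made-up numbers (the logits and targets are arbitrary):

```python
import torch
from torch import nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.1, 1.8]])   # raw model outputs: 2 samples, 3 classes
targets = torch.tensor([0, 1])             # integer class indices, not one-hot vectors

loss = nn.CrossEntropyLoss()(logits, targets)

# Manual version: log-softmax, pick the log-probability of the true class, average
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(loss.item(), manual.item())  # the two values agree
```

This is also why the model's output layer has no softmax: the loss function applies it internally.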

#### Creating a function to time our experiments
When training deep learning models, it’s important to measure **execution time**, especially when comparing different architectures.

We'll create a helper function using the [`timeit.default_timer()`](https://docs.python.org/3/library/timeit.html#timeit.default_timer) function to **track training duration**.

In [None]:
# Prints and returns the difference between start and end time.
def print_train_time(start: float,  # Start time of computation (use timeit.default_timer())
                     end: float,   # End time of computation
                     device: torch.device = None):  # Device where computation is running (CPU/GPU)

    total_time = end - start  # Compute elapsed time
    print(f"Train time on {device}: {total_time:.3f} seconds")  # Print with 3 decimal places
    return total_time  # Return the total time in seconds (higher value = longer training time).

### 2.2 Functions for training and test loops

Now create a training loop and a testing loop to train and evaluate our model. Our data batches are contained within our `DataLoader`, `train_dataloader` and `test_dataloader` for the training and test data splits respectively.

We’ll define three key functions:
- `train_step()` → Handles **one epoch of training**.
- `test_step()` → Handles **one epoch of testing**.
- `train()` → Calls `train_step()` and `test_step()` inside an **epoch loop**.

Let's step through it:
1. Loop through epochs.
2. Loop through training batches, perform training steps, calculate the train loss *per batch*.
3. Loop through testing batches, perform testing steps, calculate the test loss *per batch*.
4. Print out what's happening.
5. Time it

To keep the code device-agnostic, make sure to call `.to(device)` on the feature (`X`) and target (`y`) tensors.

> **Note:** Since these are functions, you can customize them in any way you like.

In [None]:
from typing import Tuple

# Performs one epoch of training on a given dataset.
def train_step(model: torch.nn.Module,  # The PyTorch model to train
               dataloader: torch.utils.data.DataLoader,  # DataLoader for training data
               loss_fn: torch.nn.Module,  # Loss function to optimize
               optimizer: torch.optim.Optimizer,  # Optimizer to update weights
               device: torch.device  # The computing device (CPU/GPU)
               ) -> Tuple[float, float]:

  # Set model to training mode
  model.train()  # Puts layers such as dropout and batch norm (if any) into training behavior

  # Initialize variables to track training loss and accuracy
  train_loss, train_acc = 0, 0

  # Loop through batches of training data
  for batch, (X, y) in enumerate(dataloader):
      X, y = X.to(device), y.to(device)  # Send data to GPU/CPU

      # 1. Forward pass: model makes predictions on input data
      y_pred = model(X)

      # 2. Compute loss: measure how far predictions (y_pred) are from actual labels (y)
      loss = loss_fn(y_pred, y)
      train_loss += loss.item()  # Accumulate loss over all batches

      # 3. Zero out previous gradients to prevent accumulation
      optimizer.zero_grad()

      # 4. Backpropagation: compute gradients for model parameters
      loss.backward()

      # 5. Optimizer step: update model parameters using gradients
      optimizer.step()

      # Convert model outputs (logits) to class predictions
      y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)

      # Calculate and accumulate accuracy metric across all batches
      train_acc += (y_pred_class == y).sum().item()/len(y_pred)

  # Compute average loss and accuracy for the epoch
  train_loss = train_loss / len(dataloader)
  train_acc = train_acc / len(dataloader)
  return train_loss, train_acc   # Return the final training loss and accuracy

In [None]:
# Evaluates a trained model on test data.
def test_step(model: torch.nn.Module,  # The trained PyTorch model to be evaluated
              dataloader: torch.utils.data.DataLoader,  # DataLoader for test data
              loss_fn: torch.nn.Module,  # The loss function for evaluation
              device: torch.device  # The computing device (CPU/GPU)
              ) -> Tuple[float, float]:

  # Set the model in evaluation mode
  model.eval()  # Puts layers such as dropout and batch norm (if any) into evaluation behavior; gradients are disabled separately below

  # Initialize variables to track test loss and accuracy
  test_loss, test_acc = 0, 0

  # Turn on inference context manager
  with torch.inference_mode():  # Disable gradient computation for efficiency (faster inference)
      # Loop through DataLoader batches
      for batch, (X, y) in enumerate(dataloader):
          X, y = X.to(device), y.to(device)  # Move test data to the selected device

          # 1. Forward pass: model makes predictions
          test_pred_logits = model(X)

          # 2. Compute loss for the batch
          loss = loss_fn(test_pred_logits, y)
          test_loss += loss.item()  # Accumulate loss over all batches

          # Convert model outputs to class predictions
          test_pred_labels = test_pred_logits.argmax(dim=1)

          # Calculate and accumulate accuracy metric across all batches
          test_acc += ((test_pred_labels == y).sum().item()/len(test_pred_labels))

  # Compute average loss and accuracy across all test batches
  test_loss = test_loss / len(dataloader)
  test_acc = test_acc / len(dataloader)
  return test_loss, test_acc  # Return test loss and accuracy
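One subtlety of both loops: they average the per-batch accuracies, which gives a smaller final batch the same weight as a full one. For FashionMNIST the effect is tiny, but the sketch below, using deliberately exaggerated and entirely made-up predictions, shows how the two ways of averaging can differ.

```python
import torch

# Hypothetical predictions and labels split into batches of unequal size
batches = [
    (torch.tensor([1, 0, 2, 2]), torch.tensor([1, 0, 2, 1])),  # 3 of 4 correct
    (torch.tensor([0, 1]),       torch.tensor([1, 0])),        # 0 of 2 correct
]

# Variant 1: mean of per-batch accuracies (what the loops above compute)
mean_of_batches = sum((p == y).sum().item() / len(y) for p, y in batches) / len(batches)

# Variant 2: correct predictions divided by the total number of samples
correct = sum((p == y).sum().item() for p, y in batches)
total = sum(len(y) for _, y in batches)
per_sample = correct / total

print(mean_of_batches)  # 0.375
print(per_sample)       # 0.5
```

If exact per-sample accuracy matters, accumulating the correct count and dividing by the dataset size is one possible alternative.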

In [None]:
""" Side Note """
# 1. Understanding `torch.argmax(torch.softmax(y_pred, dim=1), dim=1)` in Training
#   `y_pred`: contains the raw logits (unprocessed output) from the model.
#   `torch.softmax(y_pred, dim=1)`: converts logits into probabilities (values between 0 and 1).
#   `torch.argmax(..., dim=1)`: selects the class index with the highest probability.

# Simulated raw output logits from a model (3 samples, 4 classes)
y_pred = torch.tensor([
    [2.0, 1.0, 0.1, 3.2],  # Model thinks class 3 is most likely
    [0.5, 2.1, 1.8, 0.3],  # Model thinks class 1 is most likely
    [1.2, 2.3, 3.1, 0.7]   # Model thinks class 2 is most likely
])

# Apply softmax to convert logits to probabilities
probabilities = torch.softmax(y_pred, dim=1)
print("Softmax probabilities:\n", probabilities)

# Use argmax to find the predicted class index
y_pred_class = torch.argmax(probabilities, dim=1)
print("Predicted class labels:", y_pred_class)

# 2. Understanding `test_pred_logits.argmax(dim=1)` in Testing
#   `test_pred_logits` contains raw model outputs (logits).
#   Instead of applying softmax, we directly use argmax(dim=1) to get the class with the highest value.
#   Why no softmax?
#     - Softmax does not change the ranking of logits.
#     - The highest logit before softmax is still the highest after softmax.

# Simulated raw output logits (3 samples, 4 classes)
test_pred_logits = torch.tensor([
    [2.0, 1.0, 0.1, 3.2],  # Class 3 has the highest logit
    [0.5, 2.1, 1.8, 0.3],  # Class 1 has the highest logit
    [1.2, 2.3, 3.1, 0.7]   # Class 2 has the highest logit
])

# Use argmax to get the predicted class index
test_pred_labels = test_pred_logits.argmax(dim=1)
print("Predicted class labels:", test_pred_labels)

## 3. Fit the model to the data and make a prediction

Now we'll combine `train_step()` and `test_step()` into `train()`, a full training function that runs both once per epoch for a given number of epochs.

> **Note:** You can customize how often you do a testing step in comparison to the training steps.

In [None]:
from typing import Dict, List

# Full Training Function (Multiple Epochs)
def train(model: torch.nn.Module,  # The PyTorch model to be trained and tested
          train_dataloader: torch.utils.data.DataLoader,  # DataLoader for training data
          test_dataloader: torch.utils.data.DataLoader,  # DataLoader for test data
          optimizer: torch.optim.Optimizer,  # Optimizer for updating model parameters
          loss_fn: torch.nn.Module,  # Loss function for computing loss
          epochs: int,  # Number of epochs for training.
          device: torch.device  # The computing device (CPU/GPU)
          ) -> Dict[str, List[float]]:

  # Create empty dictionary to store results
  results = {
      "model_name": model._get_name(),
      "train_loss": [],
      "train_acc": [],
      "test_loss": [],
      "test_acc": []
  }

  # Loop through training and testing steps for a number of epochs
  for epoch in tqdm(range(epochs)):
      # Perform one training epoch
      train_loss, train_acc = train_step(model=model,
                                         dataloader=train_dataloader,
                                         loss_fn=loss_fn,
                                         optimizer=optimizer,
                                         device=device)
      # Perform one testing epoch
      test_loss, test_acc = test_step(model=model,
                                      dataloader=test_dataloader,
                                      loss_fn=loss_fn,
                                      device=device)

      # Print out what's happening
      print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
      )

      # Store results: Each metric has a list of values for each epoch
      results["train_loss"].append(train_loss)
      results["train_acc"].append(train_acc)
      results["test_loss"].append(test_loss)
      results["test_acc"].append(test_acc)

  return results  # Return the filled results at the end of the epochs

## 4. Evaluate the model
- Running Training & Measuring Time

In [None]:
# Set random seed for reproducibility
torch.manual_seed(42)

# Set number of epochs
NUM_EPOCHS = 10

# Start timer before training
start_time = timer()

# Train the model and store results
model_1_results = train(model=model_1,
                        train_dataloader=train_dataloader,
                        test_dataloader=test_dataloader,
                        optimizer=optimizer,
                        loss_fn=loss_fn,
                        epochs=NUM_EPOCHS,
                        device=device)

# Stop timer after training
end_time = timer()

# Print total training time
total_train_time_model_1 = print_train_time(start=start_time,
                                            end=end_time,
                                            device=device)

> **Note:** The training time on CUDA vs. CPU will depend largely on the quality of the CPU/GPU you're using, as well as on the size of the dataset and the model.

## Model 2: Building a Convolutional Neural Network (CNN)


A [Convolutional Neural Network](https://en.wikipedia.org/wiki/Convolutional_neural_network) is a deep learning model designed specifically for **image processing**.
CNNs use a combination of **convolutional layers, activation functions, and pooling layers** to extract meaningful patterns from images.

The CNN model we're going to be using is known as **TinyVGG** from the [CNN Explainer](https://poloclub.github.io/cnn-explainer/) website.
It follows the typical structure of a convolutional neural network:

`Input layer -> [Convolutional layer -> activation layer -> pooling layer] -> Output layer`

where the contents of `[Convolutional layer -> activation layer -> pooling layer]` can be upscaled and repeated multiple times.

In [None]:
# Define a convolutional neural network based on TinyVGG
class TinyVGG(nn.Module):
    """
    Model architecture copying TinyVGG from:
    https://poloclub.github.io/cnn-explainer/
    """
    def __init__(self,
                 input_shape: int,  # Number of input channels (1 for grayscale images, 3 for RGB images)
                 hidden_units: int,  # Number of hidden units
                 output_shape: int):  # Number of output classes (e.g., 10 for FashionMNIST)
        super().__init__()

        # Block 1: First set of convolutional layers
        self.block_1 = nn.Sequential(
            # First convolutional layer
            nn.Conv2d(in_channels=input_shape,
                      out_channels=hidden_units,
                      kernel_size=3, # Size of each filter (3x3)
                      stride=1,  # Step size when sliding filter (default: 1)
                      padding=1),  # options = "valid" (no padding) or "same" (output has same shape as input) or int for specific number
            nn.ReLU(),  # Activation function (introduces non-linearity)
            # Second convolutional layer (uses output of first Conv2d)
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            # Max Pooling layer (downsamples the feature maps)
            nn.MaxPool2d(kernel_size=2,  # Reduces spatial size by 2x
                         stride=2)  # Moves 2 pixels at a time (default stride value is same as kernel_size)
        )

        # Block 2: Second set of convolutional layers
        self.block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )

        # Fully Connected Classifier
        self.classifier = nn.Sequential(
            nn.Flatten(),  # Flattens the 2D feature map into 1D vector
            # Where did this in_features shape come from?
            # For 28x28 inputs, the output of block_2 (after the second max pool) has shape (hidden_units, 7, 7).
            nn.Linear(in_features=hidden_units*7*7,
                      out_features=output_shape)  # Output = number of classes
        )

    # Forward pass (defines how the input moves through the network)
    def forward(self, x: torch.Tensor):
        x = self.block_1(x)  # Pass through first block of conv layers
        # print(f"After block_1 shape: {x.shape}")
        x = self.block_2(x)  # Pass through second block of conv layers
        # print(f"After block_2 shape: {x.shape}")
        x = self.classifier(x)  # Pass through the fully connected layer
        # print(f"Final output shape: {x.shape}")
        return x

# Set random seed for reproducibility
torch.manual_seed(42)

# Create an instance of TinyVGG
model_2 = TinyVGG(input_shape=1,  # 1 for grayscale images (FashionMNIST)
                  hidden_units=10,
                  output_shape=len(class_names)
                  ).to(device)
# Print the model architecture
print(model_2)

When **FashionMNIST images** (`28×28` pixels, 1 channel) are passed through the network:

| **Layer**  | **Operation**  | **Output Shape** |
|------------|--------------|------------------|
| **Input Image**  | **28×28** (Grayscale) | **(1, 28, 28)** |
| **Conv2d (Block 1 - Layer 1)**  | `3×3` kernel, `padding=1` | **(10, 28, 28)** |
| **Conv2d (Block 1 - Layer 2)**  | `3×3` kernel, `padding=1` | **(10, 28, 28)** |
| **MaxPool2d (2×2)**  | Reduces by **2×** | **(10, 14, 14)** |
| **Conv2d (Block 2 - Layer 3)**  | `3×3` kernel, `padding=1` | **(10, 14, 14)** |
| **Conv2d (Block 2 - Layer 4)**  | `3×3` kernel, `padding=1` | **(10, 14, 14)** |
| **MaxPool2d (2×2)**  | Reduces by **2×** | **(10, 7, 7)** |
| **Flatten Layer**  | Converts `(10, 7, 7)` to 1D | **(10×7×7 = 490)** |
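
The shapes in the table follow from the standard output-size rule for convolution and pooling layers, `out = floor((in + 2 * padding - kernel_size) / stride) + 1`. The following is a minimal sketch in plain Python (no PyTorch required) that traces a 28×28 input through the two blocks. The helper name `out_size` is made up for illustration, and a dilation of 1 is assumed.

```python
def out_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a conv/pool layer (dilation of 1 assumed)."""
    return (size + 2 * padding - kernel_size) // stride + 1

hidden_units = 10
size = 28  # FashionMNIST height/width

for block in (1, 2):
    size = out_size(size, kernel_size=3, stride=1, padding=1)  # first Conv2d
    size = out_size(size, kernel_size=3, stride=1, padding=1)  # second Conv2d
    size = out_size(size, kernel_size=2, stride=2)             # MaxPool2d
    print(f"After block_{block}: ({hidden_units}, {size}, {size})")

flattened = hidden_units * size * size
print(f"Flattened features: {flattened}")  # 10 * 7 * 7 = 490
```

The same helper gives 62 for a 64×64 input with `kernel_size=3` and no padding, which is a quick way to sanity-check `in_features` before changing the image size or the layer settings.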


### 1. Stepping through `nn.Conv2d()`

A **convolutional layer** (`nn.Conv2d`) applies **filters (kernels)** to an image to detect patterns such as **edges, textures, and objects**.
- [`nn.Conv2d()`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html), also known as a 2-dimensional convolutional layer.
- [`nn.MaxPool2d()`](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html), also known as a max pooling layer.

What Does `nn.Conv2d()` Do?
- **Slides a small kernel/filter** (e.g., 3×3) across the image.
- Computes a **dot product** between the kernel and the image pixels.
- Produces a **feature map**, highlighting areas of interest.
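
To make the sliding-window idea concrete, here is a minimal plain-Python sketch of a single-channel convolution with stride 1 and no padding. It is an illustration only, not how `nn.Conv2d()` is implemented: the real layer also sums over input channels, adds a bias, and learns its kernel values during training. The function name `conv2d_single_channel`, the toy image, and the hand-picked edge kernel are assumptions for this example.

```python
def conv2d_single_channel(image, kernel):
    """Slide a square `kernel` over `image` (stride 1, no padding) and return the feature map."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the image patch under it
            total = sum(kernel[a][b] * image[i + a][j + b]
                        for a in range(k) for b in range(k))
            row.append(total)
        feature_map.append(row)
    return feature_map

# Toy 4x5 image: dark (0) on the left, bright (1) on the right
image = [[0, 0, 1, 1, 1] for _ in range(4)]
# Hand-picked 3x3 kernel that responds to vertical edges
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d_single_channel(image, kernel)
print(feature_map)  # [[3, 3, 0], [3, 3, 0]]
```

The feature map is large where the patch covers the dark-to-bright edge and zero where the patch is uniform, which is the sense in which a kernel "highlights areas of interest".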



In [None]:
torch.manual_seed(42)

# Create a batch of random images (mimicking real images)
images = torch.randn(size=(32, 3, 64, 64)) # [batch_size, color_channels, height, width]

# Extract a single image for testing
test_image = images[0]  # Shape: [3, 64, 64] → 3 color channels, 64×64 pixels
print(f"Image batch shape: {images.shape} -> [batch_size, color_channels, height, width]")
print(f"Single image shape: {test_image.shape} -> [color_channels, height, width]")

Parameters of `nn.Conv2d()`:
* `in_channels` (int) - Number of channels in the input image.
* `out_channels` (int) - Number of channels produced by the convolution.
* `kernel_size` (int or tuple) - Size of the convolving kernel/filter.
* `stride` (int or tuple, optional) - How big of a step the convolving kernel takes at a time. Default: 1.
* `padding` (int, tuple, str) - Padding added to all four sides of input. Default: 0.

![example of going through the different parameters of a Conv2d layer](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/03-conv2d-layer.gif)

*Example of what happens when you change the hyperparameters of a `nn.Conv2d()` layer.*

In [None]:
torch.manual_seed(42)

# Create a convolutional layer with same dimensions as TinyVGG
# (try changing any of the parameters and see what happens)
conv_layer = nn.Conv2d(in_channels=3,  # Input channels (3 for RGB images)
                       out_channels=10,
                       kernel_size=3,
                       stride=1,
                       padding=0)

# Pass the single image through the convolutional layer
conv_layer(test_image).shape # Note: If running PyTorch <1.11.0, this will error because of shape issues (nn.Conv2d() expects a 4D tensor as input)

# Print output shape
print(f"Shape after passing through conv_layer: {conv_layer(test_image).shape}")

In [None]:
### Exercise: Pass test image with extra dimension at 0 index through conv_layer ('unsqueeze' method)
conv_output = conv_layer(test_image.unsqueeze(dim=0))
print(f"Shape after passing through conv_layer: {conv_output.shape}")

In [None]:
# Check out the conv_layer internal parameters
print([(key, value.shape) for (key, value) in conv_layer.state_dict().items()])
# [out_channels, in_channels, kernel_size, kernel_size], [out_channels]
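
The shapes printed above determine how many learnable values the layer holds: the weight tensor has `out_channels * in_channels * kernel_size * kernel_size` entries and the bias has `out_channels`. A quick arithmetic check in plain Python, using the same settings as `conv_layer` (the helper `conv2d_param_count` is a hypothetical name, and a square kernel with `groups=1` is assumed):

```python
def conv2d_param_count(in_channels: int, out_channels: int,
                       kernel_size: int, bias: bool = True) -> int:
    """Number of learnable values in a Conv2d layer (square kernel, groups=1 assumed)."""
    weight = out_channels * in_channels * kernel_size * kernel_size
    return weight + (out_channels if bias else 0)

n_params = conv2d_param_count(in_channels=3, out_channels=10, kernel_size=3)
print(n_params)  # 10*3*3*3 weights + 10 biases = 280
```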

### 2. Stepping through `nn.MaxPool2d()`

A **max pooling layer** (`nn.MaxPool2d()`) reduces the **spatial dimensions** while keeping important features.

What Does `nn.MaxPool2d()` Do?
- **Downsamples** feature maps, making them smaller and faster to process.
- **Extracts important features** while discarding unimportant details.

In [None]:
# Print out original image shape without and with unsqueezed dimension
print(f"Test image original shape: {test_image.shape}")
print(f"Test image with unsqueezed dimension: {test_image.unsqueeze(dim=0).shape}")

# Create a sample nn.MaxPool2d() layer
max_pool_layer = nn.MaxPool2d(kernel_size=2)  # Takes max value from 2x2 regions

# Pass data through just the conv_layer
test_image_through_conv = conv_layer(test_image.unsqueeze(dim=0))
print(f"Shape after going through conv_layer(): {test_image_through_conv.shape}")

### Exercise: Pass this new data `test_image_through_conv` through the max pool layer
#test_image_through_conv_and_max_pool = ...
#print(f"Shape after going through conv_layer() and max_pool_layer(): {test_image_through_conv_and_max_pool.shape}")

**Every layer in a neural network is trying to compress data from higher dimensional space to lower dimensional space**.

In other words, it takes a lot of numbers (raw data) and learns patterns from those numbers. These patterns are predictable and smaller than the original values.

![each layer of a neural network compresses the original input data into a smaller representation that is (hopefully) capable of making predictions on future input data](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/03-conv-net-as-compression.png)

The `nn.Conv2d()` performs a convolutional operation on the data (see this in action on the [CNN Explainer webpage](https://poloclub.github.io/cnn-explainer/)).

A `nn.MaxPool2d()` layer takes the maximum value from each portion (window) of a tensor and disregards the rest.
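
As a rough illustration of that idea, the plain-Python sketch below applies 2×2 max pooling with stride 2 to a single 4×4 feature map. `max_pool_2x2` is a made-up helper for this example and assumes an even height and width; `nn.MaxPool2d()` itself also handles batches, channels, and other kernel sizes.

```python
def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    pooled = []
    for i in range(0, len(feature_map), 2):
        row = []
        for j in range(0, len(feature_map[0]), 2):
            window = [feature_map[i][j], feature_map[i][j + 1],
                      feature_map[i + 1][j], feature_map[i + 1][j + 1]]
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]]

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

Sixteen values go in and four come out: height and width are halved, and only the strongest response in each window survives.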

> **Exercise:** What do you think the [`nn.AvgPool2d()`](https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html) layer does? Try making a random tensor like we did above and passing it through. Check the input and output shapes as well as the input and output values.

### 3. Training and testing `model_2` using our training and test functions

In [None]:
# Set random seeds
torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Set number of epochs
NUM_EPOCHS = 10

### Exercise: Complete the training setup

# # Recreate an instance of TinyVGG
# model_2 = TinyVGG(
#     ...
#     ).to(device)

# # Setup loss function and optimizer
# loss_fn = ...
# optimizer = ...

# # Start the timer
# start_time = timer()

# # Train model_2
# model_2_results = train(
#     ...
# )

# # End the timer and print out how long it took
# end_time = timer()
# total_train_time_model_2 = print_train_time(start=start_time,
#                                             end=end_time,
#                                             device=device)

## Evaluating Model Predictions

Once our CNN model (`model_2`) is trained, we need to **evaluate its performance** by making predictions on **unseen test data** and analyzing the results.

#### Key Steps in Model Evaluation
- **Select random test samples** from the dataset.
- **Pass samples through the model** to get raw logits.
- **Convert logits into probabilities** using `softmax()`.
- **Select the most likely class** using `argmax()`.
- **Compare predictions with true labels**.
- **Visualize** results to check model performance.
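
The softmax and argmax steps in the list above can be sketched in plain Python for a single sample. The logits here are made-up numbers for three classes, not outputs of `model_2`; in the notebook itself these steps are done with `torch.softmax()` and `.argmax()`.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 0.5, -1.0]  # made-up raw model outputs for 3 classes
probs = softmax(logits)
pred_class = probs.index(max(probs))  # argmax: index of the highest probability

print([round(p, 3) for p in probs])
print(pred_class)
```

Because softmax preserves the ordering of the logits, taking the argmax of the logits directly gives the same class; the probabilities are mainly useful when you also want a confidence value.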


In [None]:
# Makes predictions on a list of data samples using a trained PyTorch model.
def make_predictions(model: torch.nn.Module,  # Trained PyTorch model
                     data: list,  # List of input data samples (e.g., images)
                     device: torch.device = device):  # Device to run the model on (CPU/GPU)
    # Store prediction probabilities
    pred_probs = []

    # Set model to evaluation mode
    model.eval()

    # No gradient calculation needed (saves memory & speeds up inference)
    with torch.inference_mode():
        for sample in data:
            # Prepare sample
            sample = torch.unsqueeze(sample, dim=0).to(device) # Add batch dimension (from [C, H, W] -> [1, C, H, W])

            # 1. Forward pass: Get raw output (logits) from the model
            pred_logit = model(sample)

            # Convert logits to probabilities using softmax
            pred_prob = torch.softmax(pred_logit.squeeze(), dim=0)
            # note: perform softmax on the "logits" dimension, not "batch" dimension (in this case we have a batch size of 1, so can perform on dim=0)

            # Move predictions to CPU and store them
            pred_probs.append(pred_prob.cpu())

    # Convert list of prediction probabilities into a tensor
    return torch.stack(pred_probs)

In [None]:
# Select Random Test Samples
import random
random.seed(42)  # Ensure reproducibility

test_samples = []
test_labels = []

# Select 9 random test images
for sample, label in random.sample(list(test_data), k=9):
    test_samples.append(sample)  # Store the image
    test_labels.append(label)    # Store the label

# View details of the first test sample
print(f"Test sample image shape: {test_samples[0].shape}")  # Expected: torch.Size([1, 28, 28])
print(f"Test sample label: {test_labels[0]} ({class_names[test_labels[0]]})")

In [None]:
# Make predictions on test samples using model 2
pred_probs = make_predictions(model=model_2,
                              data=test_samples)

# Convert prediction probabilities into class labels (argmax selects the highest probability)
pred_classes = pred_probs.argmax(dim=1)

# Compare predicted classes with actual test labels
test_labels, pred_classes

In [None]:
# Visualizing Predictions
plt.figure(figsize=(9, 9))
nrows = 3
ncols = 3
for i, sample in enumerate(test_samples):
  # Create a subplot
  plt.subplot(nrows, ncols, i+1)

  # Plot the target image
  plt.imshow(sample.squeeze(), cmap="gray")

  # Find the prediction label (in text form, e.g. "Sandal")
  pred_label = class_names[pred_classes[i]]

  # Get the truth label (in text form, e.g. "T-shirt")
  truth_label = class_names[test_labels[i]]

  # Create the title text of the plot
  title_text = f"Pred: {pred_label} | Truth: {truth_label}"

  # Check for equality and change title colour accordingly
  if pred_label == truth_label:
      plt.title(title_text, fontsize=10, c="g") # green text if correct
  else:
      plt.title(title_text, fontsize=10, c="r") # red text if wrong
  plt.axis(False);

## Performance Comparison

We've trained different models.

1. `model_1` - our baseline model with `nn.Linear()` layers (A fully connected neural network (MLP)).
2. `model_2` - our first TinyVGG-inspired CNN model.

Now, let's **visualize** loss and accuracy curves to compare performance.

### Plot the loss curves

**Loss curves** show the model's results over time.

And they're a great way to see how your model performs on different datasets (e.g. training and test).
- **Training loss** (`train_loss`): Measures how well the model is performing on the **training data**.
- **Test loss** (`test_loss`): Measures how well the model is performing on **unseen test data**.

In [None]:
from typing import Dict, List  # needed for the type hints below

def plot_loss_curves(results: Dict[str, List[float]]):
    """Plots training curves of a results dictionary.

    Args:
        results (dict): dictionary containing list of values, e.g.
            {"train_loss": [...],
             "train_acc": [...],
             "test_loss": [...],
             "test_acc": [...]}
    """

    # Get the loss values of the results dictionary (training and test)
    loss = results['train_loss']
    test_loss = results['test_loss']

    # Get the accuracy values of the results dictionary (training and test)
    accuracy = results['train_acc']
    test_accuracy = results['test_acc']

    # Figure out how many epochs there were
    epochs = range(len(results['train_loss']))

    # Setup a plot
    plt.figure(figsize=(10, 5))

    # Plot loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, loss, label='train_loss')
    plt.plot(epochs, test_loss, label='test_loss')
    plt.title('Loss')
    plt.xlabel('Epochs')
    plt.legend()

    # Plot accuracy
    plt.subplot(1, 2, 2)
    plt.plot(epochs, accuracy, label='train_accuracy')
    plt.plot(epochs, test_accuracy, label='test_accuracy')
    plt.title('Accuracy')
    plt.xlabel('Epochs')
    plt.legend();

In [None]:
plot_loss_curves(model_1_results)

In [None]:
### Exercise: Plot the loss curves of the TinyVGG model (model_2) and compare its performance with the baseline
#plot_loss_curves(...)

### What should an ideal loss curve look like?

1. **Overfitting** → The model learns the training data too well but **fails to generalize** to new data.
- Training loss **decreases** significantly, but test loss **stays high**.
- Training accuracy is **much higher** than test accuracy.

2. **Underfitting** → The model is **not learning well** from the training data.
- Both **training and test loss remain high**.
- Accuracy is **low** on both datasets.

3. **Ideal Model** → Training and test loss curves should be **close together** and **gradually decrease**.

<img src="https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/04-loss-curves-overfitting-underfitting-ideal.jpg" alt="different training and test loss curves illustrating overfitting, underfitting and the ideal loss curves" width="800"/>

*Left:* If your training and test loss curves aren't as low as you'd like, this is considered **underfitting**. *Middle:* When your test/validation loss is higher than your training loss, this is considered **overfitting**. *Right:* The ideal scenario is when your training and test loss curves line up over time. This means your model is generalizing well. There are more combinations and different things loss curves can do; for more on these, see Google's [Interpreting Loss Curves guide](https://developers.google.com/machine-learning/testing-debugging/metrics/interpretic).
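
One rough way to read these curves numerically is to look at the gap between test and training loss at each epoch. The sketch below assumes a results dictionary with the same keys as the one returned by `train()`; the helper name `loss_gap` and the numbers are invented purely for illustration and are not results from this notebook.

```python
def loss_gap(results):
    """Per-epoch difference between test loss and training loss."""
    return [test - train
            for train, test in zip(results["train_loss"], results["test_loss"])]

# Invented values: training loss keeps falling while test loss stalls and rises
toy_results = {"train_loss": [0.9, 0.6, 0.4, 0.25],
               "test_loss":  [0.95, 0.7, 0.65, 0.7]}

gaps = loss_gap(toy_results)
print([round(g, 2) for g in gaps])
```

A gap that keeps widening, as in this toy data, matches the overfitting pattern. There is no universal threshold, so treat the number as a prompt to look at the curves rather than as a verdict.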

### Performance-speed tradeoff

Generally, you get better performance (lower loss, higher accuracy) out of a larger, more complex model (like we did with `model_2`).

However, this performance increase often comes at a sacrifice of training speed and inference speed, e.g.,

| **Factor**            | **Effect on Performance** | **Effect on Speed** |
|-----------------------|-------------------------|---------------------|
| **Model Complexity**  | ✅ Higher accuracy   | ❌ Slower training |
| **Number of Layers**  | ✅ Learns better features | ❌ More computations |
| **Batch Size**        | ✅ Smoother optimization | ❌ Needs more memory |

> **Note:** The training times you get will be very dependent on the hardware you use.



---




## Making a confusion matrix for further prediction evaluation

There are many [different evaluation metrics](https://www.learnpytorch.io/02_pytorch_classification/#9-more-classification-evaluation-metrics) we can use for classification problems.

A confusion matrix shows you where your classification model got confused between predictions and true labels.

Let's start by making predictions with our trained model.

In [None]:
# 1. Make predictions with trained model
y_preds = []
model_2.eval()
with torch.inference_mode():
  for X, y in tqdm(test_dataloader, desc="Making predictions"):
    # Send data and targets to target device
    X, y = X.to(device), y.to(device)
    # Do the forward pass
    y_logit = model_2(X)
    # Turn predictions from logits -> prediction probabilities -> predictions labels
    y_pred = torch.softmax(y_logit, dim=1).argmax(dim=1) # note: apply softmax across the class (logits) dimension, not the batch dimension; each batch has shape [batch_size, num_classes], so that is dim=1
    # Put predictions on CPU for evaluation
    y_preds.append(y_pred.cpu())
# Concatenate list of predictions into a tensor
y_pred_tensor = torch.cat(y_preds)

In [None]:
from sklearn.metrics import confusion_matrix
import pandas as pd
import plotly.express as px

# 2. Setup confusion matrix instance and compare predictions to targets
confmat = confusion_matrix(
    y_true=test_data.targets,
    y_pred=y_pred_tensor)

# 3. Plot the confusion matrix
df_cm = pd.DataFrame(
    confmat,
    index = [i for i in class_names],
    columns = [i for i in class_names])
fig = px.imshow(df_cm, text_auto=True)
fig.show()

# Can also use seaborn
# import seaborn as sns
# plt.figure(figsize = (10,8))
# sns.heatmap(df_cm, cmap='coolwarm', annot=True, fmt='d')

We can see our model does fairly well since most of the data is on the diagonal.

The model gets most "confused" on classes that are similar, for example predicting "Pullover" for images that are actually labelled "Shirt". It's understandable the model sometimes predicts "Shirt" for images labelled "T-shirt/top".

This kind of information is often more helpful than a single accuracy metric because it tells us *where* a model is getting things wrong. It also hints at *why* the model may be getting certain things wrong.

We can use this kind of information to further inspect our models and data to see how it could be improved.
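
The per-class view can also be reduced to numbers. With scikit-learn's convention (rows are true labels, columns are predicted labels), dividing each diagonal entry by its row sum gives the fraction of each class that was predicted correctly. A minimal plain-Python sketch on an invented 3-class matrix (the counts and the class subset are illustrative, not output from `model_2`):

```python
toy_class_names = ["T-shirt/top", "Pullover", "Shirt"]
# Rows = true class, columns = predicted class (invented counts)
toy_confmat = [[80,  5, 15],
               [ 4, 90,  6],
               [20, 20, 60]]

# Diagonal entry divided by the row total = share of that class predicted correctly
per_class_accuracy = {
    name: row[i] / sum(row)
    for i, (name, row) in enumerate(zip(toy_class_names, toy_confmat))
}

for name, acc in per_class_accuracy.items():
    print(f"{name}: {acc:.2f}")
```

In this toy matrix the "Shirt" row has the lowest score, and its off-diagonal counts show which classes it is being mistaken for.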

> **Exercise:** Use the trained `model_2` to make predictions on the test FashionMNIST dataset. Then plot some predictions where the model was wrong alongside what the label of the image should've been. After visualizing these predictions, do you think it's more of a modelling error or a data error? As in, could the model do better or are the labels of the data too close to each other (e.g. a "Shirt" label is too close to "T-shirt/top")?

## Save and load best performing model

We can save and load a PyTorch model using a combination of:
* `torch.save` - a function to save a whole PyTorch model or a model's `state_dict()`.
* `torch.load` - a function to load in a saved PyTorch object.
* `torch.nn.Module.load_state_dict()` - a function to load a saved `state_dict()` into an existing model instance.

You can see more of these three in the [PyTorch saving and loading models documentation](https://pytorch.org/tutorials/beginner/saving_loading_models.html).

In [None]:
from pathlib import Path

# Create models directory (if it doesn't already exist), see: https://docs.python.org/3/library/pathlib.html#pathlib.Path.mkdir
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, # create parent directories if needed
                 exist_ok=True # if models directory already exists, don't error
)

# Create model save path
MODEL_NAME = "03_pytorch_computer_vision_model_2.pth"
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

# Save the model state dict
print(f"Saving model to: {MODEL_SAVE_PATH}")
torch.save(obj=model_2.state_dict(), # saving only the state_dict() stores just the learned parameters
           f=MODEL_SAVE_PATH)

Now we've got a saved model `state_dict()` we can load it back in using a combination of `load_state_dict()` and `torch.load()`.

Since we're using `load_state_dict()`, we'll need to create a new instance of `TinyVGG()` with the same input parameters as our saved model `state_dict()`.

In [None]:
# Create a new instance of TinyVGG (the same class as our saved state_dict())
# Note: loading model will error if the shapes here aren't the same as the saved version
loaded_model_2 = TinyVGG(input_shape=1,
                         hidden_units=10, # try changing this to 128 and seeing what happens
                         output_shape=10)

# Load in the saved state_dict()
loaded_model_2.load_state_dict(torch.load(f=MODEL_SAVE_PATH))

# Send model to the target device (GPU if available)
loaded_model_2 = loaded_model_2.to(device)

And now we've got a loaded model we can evaluate it with `test_step()` to make sure its parameters work similarly to `model_2` prior to saving.

In [None]:
# Evaluate loaded model
torch.manual_seed(42)

loaded_model_2_results = test_step(model=loaded_model_2,
                                   dataloader=test_dataloader,
                                   loss_fn=loss_fn,
                                   device=device)

In [None]:
# Check to see if this result is close to the final value of the original version
np.isclose(model_2_results["test_loss"][-1], loaded_model_2_results[0],
              atol=1e-08, # absolute tolerance
              rtol=0.0001) # relative tolerance
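
For reference, `np.isclose(a, b)` checks `|a - b| <= atol + rtol * |b|`, so with the tolerances above the relative term dominates for loss values of ordinary size. The plain-Python sketch below mirrors that rule for scalars; `is_close` is a stand-in name and the loss values are invented.

```python
def is_close(a: float, b: float, rtol: float = 1e-4, atol: float = 1e-8) -> bool:
    """Scalar version of the comparison rule documented for np.isclose."""
    return abs(a - b) <= atol + rtol * abs(b)

saved_loss = 0.32718     # invented value standing in for the original test loss
reloaded_loss = 0.32719  # invented value standing in for the reloaded model's test loss

print(is_close(saved_loss, reloaded_loss))  # difference of 1e-5 is within tolerance
print(is_close(saved_loss, 0.33))           # difference of ~3e-3 is not
```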

# Exercises

* [Excellent place to learn PyTorch](https://www.learnpytorch.io/)
* [Official deep learning blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)
* [3B1B videos](https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi)
* [Transfer Learning for Computer Vision Tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html)
* Watch [MIT's Introduction to Deep Computer Vision](https://www.youtube.com/watch?v=iaSUYvmCekI&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=3) lecture. This will give you a great intuition behind convolutional neural networks.