<a href="https://colab.research.google.com/github/christophergaughan/PyTorch/blob/main/ComputerVision_PyTorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Computer Vision Using PyTorch

**Basis**

Pixels are read as RGB colors and turned into numbers (tensors), i.e. a `numerical encoding` --> model (algorithm) --> output probability that the image is X or Y or Z

**Details**
 Image tensors carry the following information:
 1. Width of the image
 2. Height of the image
 3. Color channels (3 for RGB, 1 for grayscale)

 Depending on the library or algorithm you're working with, a batch of images is stored as a tensor with one of these two layouts:

 [batch_size, height, width, color_channels] OR [batch_size, color_channels, height, width]
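
As a minimal sketch of the two layouts, the snippet below builds a random batch in channels-first order (the layout PyTorch layers such as `torch.nn.Conv2d` expect) and permutes it to channels-last. The batch size of 32 and the 28x28 grayscale size are assumptions, chosen to match the Fashion-MNIST data used later.

```python
import torch

# Assumed example batch: 32 grayscale images of 28x28 pixels, channels first (NCHW)
batch_nchw = torch.rand(32, 1, 28, 28)

# Reorder the dimensions to channels last (NHWC), e.g. for plotting libraries
batch_nhwc = batch_nchw.permute(0, 2, 3, 1)

print(batch_nchw.shape)  # torch.Size([32, 1, 28, 28])
print(batch_nhwc.shape)  # torch.Size([32, 28, 28, 1])
```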

 The models we build will mainly be CNNs.

 We will be working with `torch.nn.Conv2d`

## Computer vision libraries in PyTorch

* `torchvision` - base domain library for PyTorch computer vision:
  https://pytorch.org/vision/stable/index.html
* `torchvision.datasets` - datasets and data-loading functions:
  https://pytorch.org/vision/stable/datasets.html#built-in-datasets
* `torchvision.models` - pre-trained computer vision models (i.e. with pretrained weights) that you can leverage for your own problems.
* `torchvision.transforms` - functions for manipulating your vision data (images) so it is suitable for use with an ML model.
* `torch.utils.data.Dataset` - base dataset class for PyTorch.
* `torch.utils.data.DataLoader` - creates a Python iterable over a dataset.

Torchvision supports common computer vision transformations in the torchvision.transforms and torchvision.transforms.v2 modules. Transforms can be used to transform or augment data for training or inference of different tasks (image classification, detection, segmentation, video classification).

* PIL is the Python Imaging Library by Fredrik Lundh and contributors.

### torchvision.datasets

All datasets are subclasses of `torch.utils.data.Dataset`, i.e. they implement the `__getitem__` and `__len__` methods. Hence, they can all be passed to a `torch.utils.data.DataLoader`, which can load multiple samples in parallel using `torch.multiprocessing` workers. For example:
```
imagenet_data = torchvision.datasets.ImageNet('path/to/imagenet_root/')
data_loader = torch.utils.data.DataLoader(imagenet_data,
                                          batch_size=4,
                                          shuffle=True,
                                          num_workers=args.nThreads)
```
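
To make the `__getitem__`/`__len__` contract concrete, here is a minimal sketch of a custom dataset. `ToyImageDataset` is a hypothetical class filled with random data, used only for illustration; it is not part of `torchvision`.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyImageDataset(Dataset):
    """Hypothetical dataset of random 'images', used only for illustration."""

    def __init__(self, num_samples: int = 10):
        self.images = torch.rand(num_samples, 1, 28, 28)    # fake grayscale images
        self.labels = torch.randint(0, 10, (num_samples,))  # fake class indices

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]

toy_data = ToyImageDataset(num_samples=10)
toy_loader = DataLoader(toy_data, batch_size=4, shuffle=False)

# 10 samples in batches of 4 gives batches of 4, 4 and 2
for images, labels in toy_loader:
    print(images.shape, labels.shape)
```

Because the class implements both methods, the `DataLoader` can index into it and batch the results without knowing anything else about the data.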

In [None]:
import torch
import torchvision
from torchvision import datasets
from torchvision import transforms
from torchvision.transforms import ToTensor
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import numpy as np
import matplotlib.pyplot as plt

print(torch.__version__)
print(torchvision.__version__)

## Getting a dataset

We will be using the `FashionMNIST` dataset: grayscale images of clothing. It is a simple dataset, well suited to a first implementation.

Be aware that ImageNet is the gold standard for computer vision evaluations.

`torchvision.datasets.FashionMNIST(root: str, train: bool = True, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None`

### Fashion-MNIST Dataset.

Parameters:
* **root (string)** – Root directory of dataset where FashionMNIST/processed/training.pt and FashionMNIST/processed/test.pt exist.
* **train (bool, optional)** – If True, creates dataset from training.pt, otherwise from test.pt.
* **download (bool, optional)** – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
* **transform (callable, optional)** – A function/transform that takes in a PIL image and returns a transformed version. E.g., `transforms.RandomCrop`
* **target_transform (callable, optional)** – A function/transform that takes in the target and transforms it.

In [None]:
# Setup Training data
train_data = datasets.FashionMNIST(
    root="data", # where to download data to
    train=True, # do we want the training dataset?
    download=True, # do we want to download?
    transform=torchvision.transforms.ToTensor(), # how to transform the data
    target_transform=None # how do we want to transform the labels/target
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=torchvision.transforms.ToTensor(),
    target_transform=None
)



In [None]:
len(train_data), len(test_data)

In [None]:
# See the first training data- this will output the data as tensors (C x H x W) NOTE: grey scale images only have 1 color channel
image, label = train_data[0]
image, label

In [None]:
class_names = train_data.classes
class_names

In [None]:
class_to_idx = train_data.class_to_idx
class_to_idx

In [None]:
train_data.targets

In [None]:
# Check shape of our image
print(f"Image Shape: {image.shape} --> [color_channels, height, width], Image Label: {class_names[label]}")

## Visualizing our data

In [None]:
image, label = train_data[0]
print(f"Image Shape: {image.shape}")
plt.imshow(image.squeeze(), cmap="gray") # squeeze() removes the channel dimension so it will plot: (1, 28, 28) -> (28, 28)
plt.title(class_names[label])
plt.axis("off")

In [None]:
# Plot more images
torch.manual_seed(42)
fig = plt.figure(figsize=(9, 9))
row, cols = 4, 4
for i in range(1, row * cols + 1):
    random_idx = torch.randint(0, len(train_data), size=[1]).item()
    img, label = train_data[random_idx]
    fig.add_subplot(row, cols, i)
    plt.imshow(img.squeeze(), cmap="gray")
    plt.title(class_names[label])
    plt.axis(False)

## Check Input/Output shapes of Data

In [None]:
print(f"Image Shape: {image.shape}")
print(f"Image Label: {class_names[label]}")

Visualizing data

In [None]:
image, label = train_data[0]
print(f"Image Shape: {image.shape}")
plt.imshow(image.squeeze(), cmap="plasma") # squeeze() drops the leading 1 in [1, 28, 28]; matplotlib expects (H, W) or channels-last (H, W, C), so the raw tensor would not plot
plt.title(class_names[label])
plt.axis("off")

In [None]:
from matplotlib import colormaps
list(colormaps)

In [None]:
# Plot more images
torch.manual_seed(42)
fig = plt.figure(figsize=(9, 9))
row, cols = 4, 4
for i in range(1, row * cols + 1):
    random_idx = torch.randint(0, len(train_data), size=[1]).item()
    img, label = train_data[random_idx]
    fig.add_subplot(row, cols, i)
    plt.imshow(img.squeeze(), cmap="gray")
    plt.title(class_names[label])
    plt.axis(False);

Could these items of clothing (images) be modelled with straight lines only? Or will we have to introduce some non-linearity? Just a thought.

In [None]:
train_data, test_data

## Prepare DataLoader

Right now, our data is in the form of PyTorch Datasets.

A `DataLoader` turns our dataset into a Python iterable.

More specifically, we want to turn our data into batches (or mini-batches).

Q) Why do we do this?

A) The data takes up memory, and we have 60,000 training images and 10,000 testing images. To reduce this memory load, we break the data up into batches. More specifically:

1. It is more computationally efficient: your computing hardware may not be able to look at (store in memory) 60,000 images at once, so we break them up into batches of 32 (`batch_size=32`), a very common batch size.
2. It gives our neural network more chances to update its parameters per epoch (one gradient step per batch). See this video by Andrew Ng for more info: https://www.youtube.com/watch?v=4qJaSmvhxi8
3. One parameter of the `DataLoader` is `shuffle`. We shuffle the training data in case it is stored in some predetermined order; randomizing what the training loop sees keeps that order from being grafted onto our model, which would produce a poor model. We don't want our model to 'memorize' the data.
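
The arithmetic behind the batching can be sketched as follows. `num_batches` is a hypothetical helper written for this note; it mirrors the fact that a `DataLoader` keeps the final partial batch by default (`drop_last=False`).

```python
import math

def num_batches(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader-style split yields per epoch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

print(num_batches(60_000, 32))  # 1875 training batches
print(num_batches(10_000, 32))  # 313 test batches (the last one holds only 16 images)
```

With a batch size of 32, the model therefore gets 1875 parameter updates per epoch instead of one.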


In [None]:
# Batchify our dataset
from torch.utils.data import DataLoader
BATCH_SIZE = 32
# Turn our datasets into iterables (batches)
train_dataloader = DataLoader(dataset=train_data,
                              batch_size=BATCH_SIZE,
                              shuffle=True)

test_dataloader = DataLoader(dataset=test_data,
                             batch_size=BATCH_SIZE,
                             shuffle=False) # we don't shuffle the test dataset

train_dataloader, test_dataloader

In [None]:
# Let's check out what we've created
print(f"Length of train dataloader: {len(train_dataloader)} batches of {BATCH_SIZE}")
print(f"Length of test dataloader: {len(test_dataloader)} batches of {BATCH_SIZE}")

## Check out what is inside the training dataloader

In [None]:
train_features_batch, train_labels_batch = next(iter(train_dataloader))
train_features_batch.shape, train_labels_batch.shape

Note above that the color-channel dimension comes right after the batch dimension: `[batch_size, color_channels, height, width]`.

In [None]:
torch.manual_seed(42)
random_idx = torch.randint(0, len(train_features_batch), size=[1]).item()
image, label = train_features_batch[random_idx], train_labels_batch[random_idx]
plt.imshow(image.squeeze(), cmap="gray")
plt.title(class_names[label])
plt.axis(False)
print(f"Image Shape: {image.shape}")
print(f"Label: {label}, label_size: {label.shape}")

## Model 0: Build a baseline model

When starting a series of machine learning modelling experiments, it's best practice to start with a *baseline model*.

A baseline model is a simple model you will try to improve upon with subsequent models/experiments.

In other words: start simply and add/experiment with complexity when necessary 🧪

In [None]:
# Create a flatten layer
flatten_model = torch.nn.Flatten()

# Get a single sample
x = train_features_batch[0]

# Flatten the sample
output = flatten_model(x)

# Print out what happened
print(f"Shape before flattening: {x.shape} -> [color_channels, height, width]")
print(f"Shape after flattening: {output.shape} -> [color_channels, height*width]")

We can see that the color-channel dimension (1) is preserved and the height and width are collapsed into their product, 28 x 28 = 784.

In [None]:
import torch
from torch import nn

torch.manual_seed(42)

class FashionMNISTModelV0(nn.Module):  # Inherit from nn.Module
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(),  # [batch, 1, 28, 28] -> [batch, 784]
            nn.Linear(in_features=input_shape, out_features=hidden_units),
            nn.Linear(in_features=hidden_units, out_features=output_shape)
        )

    def forward(self, x):
        return self.layer_stack(x)


In [None]:
model_0 = FashionMNISTModelV0(
    input_shape=784,
    hidden_units=10,
    output_shape=len(class_names)
).to("cpu")  # Move model to CPU
print(model_0)

In [None]:
dummy_x = torch.rand([1, 1, 28, 28])
model_0(dummy_x)

In [None]:
model_0.state_dict()

## Setup loss, optimizer and evaluation metrics

* Loss function- since we're working with multi-class data, our loss function will be `nn.CrossEntropyLoss()`
* Optimizer - our optimizer `torch.optim.SGD()`
* Evaluation Metric- since this is a classification problem, we'll use Accuracy
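
To show what the loss and the metric measure, here is a dependency-free sketch of cross-entropy and accuracy for a few hand-made values. The logits and labels are hypothetical, and the functions are simplified stand-ins rather than the PyTorch implementations; `nn.CrossEntropyLoss` likewise takes raw logits and integer class labels, and averages over the batch by default.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target])

def accuracy(y_true, y_pred):
    """Percentage of predicted labels that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100 * correct / len(y_true)

logits = [2.0, 0.5, 0.1]         # hypothetical model output for one image, 3 classes
print(cross_entropy(logits, 0))  # smaller loss: class 0 has the largest logit
print(cross_entropy(logits, 2))  # larger loss: class 2 has the smallest logit
print(accuracy([0, 1, 2, 2], [0, 1, 1, 2]))  # 75.0
```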


In [None]:
import requests
from pathlib import Path

# Download helper functions (including accuracy_fn) from the Learn PyTorch repo
if Path("helper_functions.py").is_file():
  print("helper_functions.py already exists, skipping download....")
else:
  print("Downloading helper_functions.py")
  request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
  with open("helper_functions.py", "wb") as f:
    f.write(request.content)

In [None]:
# Import accuracy metric
from helper_functions import accuracy_fn

In [None]:
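# accuracy_fn compares labels, so convert the prediction probabilities to a label with argmax first
accuracy_fn(y_true=torch.tensor([2]), y_pred=torch.tensor([[0.2, 0.5, 0.3]]).argmax(dim=1))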

In [None]:
# Setup loss and optimizer functions
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.1)

## Creating a function to time our experiments

We need to be cognizant of the fact that machine/deep learning is very experimental. These experiments can be costly in terms of memory and GPU usage, and when scaled up to very large jobs, the added complexity also comes at the cost of <u>*time*</u>.

Thus there are two main things we'll keep track of (we'll find there is a trade-off between them):
1. The model's performance (loss and accuracy values, etc.)
2. How fast the model runs.

We are already tracking our model's loss and accuracy; let's explore the time dimension below. Since we'll be using `timeit`, here's where to find the documentation: https://docs.python.org/3/library/timeit.html

The default timer, which is always `time.perf_counter()`, returns float seconds. An alternative, `time.perf_counter_ns`, returns integer nanoseconds.

```python
class timeit.Timer(stmt='pass', setup='pass', timer=<timer function>, globals=None)
```
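
As a quick sketch of that interface, the snippet below times a small stand-in statement with `timeit.Timer`. The statement and the repeat count are arbitrary choices for illustration.

```python
import timeit

# Stand-in statement to time; in practice this could be any small piece of code
t = timeit.Timer(stmt="sum(range(1_000))")

# Run the statement 1,000 times and return the total elapsed time in seconds
total_seconds = t.timeit(number=1_000)
print(f"Total: {total_seconds:.4f} s, per run: {total_seconds / 1_000:.2e} s")
```

For timing a whole training loop once, reading the timer before and after the loop is simpler than wrapping the loop in a `Timer`.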

In [None]:
from timeit import default_timer as timer

def print_train_time(start: float, end: float, device: torch.device = None):

    '''
    prints difference between start and end time
    '''
    total_time = end - start
    print(f"Train time on {device}: {total_time:.3f} seconds")
    return total_time

In [None]:
start_time = timer()
end_time = timer()
print_train_time(start=start_time, end=end_time, device=None)
print_train_time(start=start_time, end=end_time, device="cpu")

## Creating a training loop and training a model on batches of the data
Remember: the optimizer will update the model's parameters once per batch rather than once per epoch.

key steps:
1. Loop through the epochs
2. Loop through training batches, perform training steps, calculate the loss *per batch*
3. Loop through testing batches, perform testing steps, calculate the loss *per batch*
4. print out what's happening
5. time it all

### NOTE: In the training loop below we iterate over batches and accumulate `train_loss`. Here are some details on how the `enumerate()` function is used:

1. `train_dataloader`: This is an iterable object, such as a PyTorch `DataLoader`, which provides batches of data (`X`) and corresponding labels (`y`) for training our machine learning model.

2. `enumerate(train_dataloader)`: The `enumerate` function iterates over `train_dataloader` and, in addition to yielding each batch of data `(X, y)`, it also provides an index (`batch`) for the current iteration. The `batch` variable represents the batch number, starting from 0 by default.

### Purpose of enumerate in this loop:
1. Tracking batch indices: The `batch` variable allows you to keep track of which batch is being processed. This can be useful for:

* Logging or debugging (e.g., printing the batch number during training).
* Performing specific actions at certain batch intervals (e.g., saving a model every 100 batches).
* Analyzing batch-specific metrics.
2. Improved readability: By using `enumerate`, you don't have to manually maintain a counter variable and increment it in each iteration. It keeps the code concise and clean.

**Here’s how it might be used in practice in the generic sense:**
```
for batch, (X, y) in enumerate(train_dataloader):
    print(f"Processing batch {batch}")
    # Perform training step
    output = model(X)
    loss = loss_function(output, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

```
Here:
* `batch` keeps track of the current batch number.
* `(X, y)` contains the features (independent vars) and labels (target) for that batch.
#### Using `enumerate` is a common practice in Python loops whenever you need both the index and the elements of an iterable.

In [None]:
# Import tqdm for progress bar
from tqdm.auto import tqdm

# Set the seed and start the timer
torch.manual_seed(42)
train_time_start_on_cpu = timer()

# Set the number of epochs (we'll keep this small for faster training times)
epochs = 3

# Create training and testing loop
for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n-------")
    ### Training
    train_loss = 0
    # Add a loop to loop through training batches
    for batch, (X, y) in enumerate(train_dataloader):
        model_0.train()
        # 1. Forward pass
        y_pred = model_0(X)

        # 2. Calculate loss (per batch)
        loss = loss_fn(y_pred, y)
        train_loss += loss.item() # accumulate the loss per epoch as a Python float

        # 3. Optimizer zero grad
        optimizer.zero_grad()

        # 4. Loss backward
        loss.backward()

        # 5. Optimizer step
        optimizer.step()

        # Print out how many samples have been seen
        if batch % 400 == 0:
            print(f"Looked at {batch * len(X)}/{len(train_dataloader.dataset)} samples")

    # Divide total train loss by length of train dataloader (average loss per batch per epoch)
    train_loss /= len(train_dataloader)

    ### Testing
    # Setup variables for accumulatively adding up loss and accuracy
    test_loss, test_acc = 0, 0
    model_0.eval()
    with torch.inference_mode():
        for X, y in test_dataloader:
            # 1. Forward pass
            test_pred = model_0(X)

            # 2. Calculate loss (accumulatively)
            test_loss += loss_fn(test_pred, y) # accumulatively add up the loss per epoch

            # 3. Calculate accuracy (preds need to be same as y_true)
            test_acc += accuracy_fn(y_true=y, y_pred=test_pred.argmax(dim=1))

        # Calculations on test metrics need to happen inside torch.inference_mode()
        # Divide total test loss by length of test dataloader (per batch)
        test_loss /= len(test_dataloader)

        # Divide total accuracy by length of test dataloader (per batch)
        test_acc /= len(test_dataloader)

    ## Print out what's happening
    print(f"\nTrain loss: {train_loss:.5f} | Test loss: {test_loss:.5f}, Test acc: {test_acc:.2f}%\n")

# Calculate training time
train_time_end_on_cpu = timer()
total_train_time_model_0 = print_train_time(start=train_time_start_on_cpu, end=train_time_end_on_cpu,
                                device=str(next(model_0.parameters()).device))




## Evaluate model_0 and make predictions: functionalizing this step for use on any model

Also note that `argmax` finds the index of the highest logit value. The raw outputs of our model are logits. To convert them into prediction probabilities we could apply the softmax function, but softmax preserves the ordering of the logits, so taking the `argmax` of the logits directly gives the same predicted label.
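
A minimal sketch of that point, using hypothetical logits for two samples and three classes: the predicted label is the same whether `argmax` is taken on the raw logits or on the softmax probabilities.

```python
import torch

# Hypothetical logits for a batch of 2 samples and 3 classes
logits = torch.tensor([[1.2, -0.3, 3.1],
                       [0.4, 2.2, -1.0]])

probs = torch.softmax(logits, dim=1)       # each row sums to 1 (up to floating-point error)
labels_from_logits = logits.argmax(dim=1)  # index of the largest logit per row
labels_from_probs = probs.argmax(dim=1)    # index of the largest probability per row

print(labels_from_logits)  # tensor([2, 1])
print(labels_from_probs)   # tensor([2, 1])
```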

In [None]:
torch.manual_seed(42)
def eval_model(model: torch.nn.Module, data_loader: torch.utils.data.DataLoader, loss_fn: torch.nn.Module, accuracy_fn):
    '''
    Returns a dictionary containing the results of model predicting on data_loader
    '''
    loss, acc = 0, 0
    model.eval()
    with torch.inference_mode():
        for X, y in tqdm(data_loader):
            # Make predictions
            y_pred = model(X) # uses whichever model was passed in, so this works for any model

            # Accumulate the loss and acc values per batch
            loss += loss_fn(y_pred, y)
            acc += accuracy_fn(y_true=y, y_pred=y_pred.argmax(dim=1))

        # Scale loss and acc to find the average loss/acc per batch
        loss /= len(data_loader)
        acc /= len(data_loader)
    return {"model_name": model.__class__.__name__, # only works when model was created with a class
            "model_loss": loss.item(),
            "model_acc": acc}

# Calculate model 0 results on test dataset
model_0_results = eval_model(model=model_0,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn)
model_0_results

Set up device-agnostic code so the model can run on a GPU when one is available.

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
device

In [None]:
!nvidia-smi

## Our model did OK with no non-linearity; now we will employ some non-linear functions

In past notebooks we've learned about the power of non-linearity in modelling data. Let's put that to the test.
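
One way to see why non-linearity matters: two linear layers stacked with no activation between them collapse into a single linear layer, so the extra layer adds no expressive power. A minimal sketch, with arbitrary layer sizes chosen for illustration:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two linear layers with no activation in between (sizes are arbitrary)
layer_1 = nn.Linear(in_features=4, out_features=3)
layer_2 = nn.Linear(in_features=3, out_features=2)

# The same mapping expressed as a single linear layer:
# W = W2 @ W1 and b = W2 @ b1 + b2
combined = nn.Linear(in_features=4, out_features=2)
with torch.no_grad():
    combined.weight.copy_(layer_2.weight @ layer_1.weight)
    combined.bias.copy_(layer_2.weight @ layer_1.bias + layer_2.bias)

x = torch.rand(5, 4)
stacked_out = layer_2(layer_1(x))
combined_out = combined(x)

print(torch.allclose(stacked_out, combined_out, atol=1e-6))  # True
```

Placing a `nn.ReLU()` between the two layers breaks this equivalence, which is what lets the network model non-linear patterns.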

In [None]:
import torch
from torch import nn

class FashionMNISTModelV1(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        """
        Initializes the FashionMNISTModelV1 model.

        Args:
            input_shape (int): The number of input features (e.g., 28*28 for flattened images).
            hidden_units (int): The number of hidden units in the first linear layer.
            output_shape (int): The number of output features (e.g., 10 for FashionMNIST classes).
        """
        super(FashionMNISTModelV1, self).__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(),  # Flatten the inputs into a single vector
            nn.Linear(in_features=input_shape, out_features=hidden_units),
            nn.ReLU(),
            nn.Linear(in_features=hidden_units, out_features=output_shape),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor):
        """
        Defines the forward pass of the model.
        """
        return self.layer_stack(x)





In [None]:
# Define parameters
input_shape = 28 * 28  # For flattened 28x28 images
hidden_units = 10  # Number of hidden units
output_shape = 10  # Number of classes in FashionMNIST (e.g., 10 classes)

# Define device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Initialize the model
torch.manual_seed(42)
model_1 = FashionMNISTModelV1(input_shape=input_shape, hidden_units=hidden_units, output_shape=output_shape).to(device)

# Verify model device
print(next(model_1.parameters()).device)



In [None]:
# Setup loss and optimizer functions
loss_fn = nn.CrossEntropyLoss() # measures how far the model's predictions are from the true labels
optimizer = torch.optim.SGD(params=model_1.parameters(), lr=0.1) # updates the model's parameters to reduce the loss

## Functionalize training/evaluation loop

Let's create a function for:
* training loop - `train_step()`
* testing loop - `test_step()`

In [None]:
def train_step(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               accuracy_fn,
               device: torch.device):

    """
    Performs a training step where the model learns on data_loader.

    Args:
        model (torch.nn.Module): The model to train.
        data_loader (torch.utils.data.DataLoader): DataLoader for training data.
        loss_fn (torch.nn.Module): Loss function to optimize.
        optimizer (torch.optim.Optimizer): Optimizer for model parameters.
        accuracy_fn (callable): Function to calculate accuracy.
        device (torch.device): Device to run training on (e.g., 'cuda' or 'cpu').
    """
    # Put the model into training mode
    model.train()

    # Initialize tracking metrics
    train_loss = 0
    train_acc = 0

    # Loop through the training batches
    for batch, (X, y) in enumerate(data_loader):

        # Move data to target device
        X, y = X.to(device), y.to(device)

        # Forward pass - outputs raw logits from the model
        y_pred = model(X)

        # Calculate the loss per batch
        loss = loss_fn(y_pred, y)
        train_loss += loss.item()  # Add scalar value to train_loss

        # Calculate the accuracy per batch
        train_acc += accuracy_fn(y_true=y, y_pred=y_pred.argmax(dim=1))  # Converts logits to labels

        # Zero gradients for the optimizer
        optimizer.zero_grad()

        # Backpropagation
        loss.backward()

        # Optimizer step - update model parameters
        optimizer.step()

    # Calculate average loss and accuracy across all batches
    train_loss /= len(data_loader)
    train_acc /= len(data_loader)

    # Print metrics
    print(f"Train loss: {train_loss:.5f} | Train accuracy: {train_acc:.2f}%")


In [None]:
def test_step(model: torch.nn.Module,
              data_loader: torch.utils.data.DataLoader,
              loss_fn: torch.nn.Module,
              accuracy_fn,
              device: torch.device):
    """
    Performs a testing loop on the given model over the data_loader.

    Args:
        model (torch.nn.Module): The model to test.
        data_loader (torch.utils.data.DataLoader): DataLoader for test data.
        loss_fn (torch.nn.Module): Loss function to calculate the loss.
        accuracy_fn (function): Function to calculate accuracy.
        device (torch.device): Device to run the testing on.
    """
    test_loss, test_acc = 0, 0
    model.eval()

    # Turn on inference mode context manager
    with torch.inference_mode():
        for X, y in data_loader:
            # Send the data to target device
            X, y = X.to(device), y.to(device)

            # Forward pass (raw logits)
            test_pred = model(X)

            # Calculate the loss (accumulated)
            test_loss += float(loss_fn(test_pred, y))

            # Calculate the accuracy (accumulated)
            test_acc += accuracy_fn(y_true=y, y_pred=test_pred.argmax(dim=1))

        # Adjust metrics
        test_loss /= len(data_loader)
        test_acc /= len(data_loader)

        # Print results (adjust based on accuracy_fn behavior)
        print(f"Test loss: {test_loss:.5f} | Test accuracy: {test_acc:.2f}%")


### Let's put our functions to use

In [None]:
import torch
from torch.utils.data import DataLoader, TensorDataset
from tqdm.auto import tqdm
from timeit import default_timer as timer

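# Assumes train_dataloader and test_dataloader are already defined above

# NOTE: this redefines model_1 as a plain nn.Sequential with 128 hidden units,
# replacing the FashionMNISTModelV1 instance created earlier
model_1 = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10)
)

# Loss function and optimizer (Adam here, rather than the SGD used for model_0)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model_1.parameters(), lr=0.001)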

# Device configuration
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move the model to the correct device
model_1.to(device)

# Training utilities
def train_step(model, data_loader, loss_fn, optimizer, accuracy_fn, device):
    model.train()
    train_loss, train_acc = 0, 0
    for X, y in data_loader:
        X, y = X.to(device), y.to(device)  # Move to device
        y_pred = model(X)
        loss = loss_fn(y_pred, y)
        train_loss += loss.item()
        train_acc += accuracy_fn(y, y_pred.argmax(dim=1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    train_loss /= len(data_loader)
    train_acc /= len(data_loader)
    print(f"Train loss: {train_loss:.5f} | Train accuracy: {train_acc:.2f}%")

def test_step(model, data_loader, loss_fn, accuracy_fn, device):
    model.eval()
    test_loss, test_acc = 0, 0
    with torch.no_grad():
        for X, y in data_loader:
            X, y = X.to(device), y.to(device)  # Move to device
            y_pred = model(X)
            test_loss += loss_fn(y_pred, y).item()
            test_acc += accuracy_fn(y, y_pred.argmax(dim=1))
    test_loss /= len(data_loader)
    test_acc /= len(data_loader)
    print(f"Test loss: {test_loss:.5f} | Test accuracy: {test_acc:.2f}%")

def accuracy_fn(y_true, y_pred):
    return (y_true == y_pred).sum().item() / len(y_true) * 100

# Set seed
torch.manual_seed(42)

# Measure time
train_start_time_on_gpu = timer()

# Set epochs
epochs = 3

# Training loop
for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n-------")
    train_step(model=model_1,
               data_loader=train_dataloader,
               loss_fn=loss_fn,
               optimizer=optimizer,
               accuracy_fn=accuracy_fn,
               device=device)
    test_step(model=model_1,
              data_loader=test_dataloader,
              loss_fn=loss_fn,
              accuracy_fn=accuracy_fn,
              device=device)

train_end_time_on_gpu = timer()
total_train_time_model_1 = train_end_time_on_gpu - train_start_time_on_gpu
print(f"Total training time: {total_train_time_model_1:.2f} seconds")



In [None]:

model_0_results

In [None]:
total_train_time_model_0, total_train_time_model_1

### It's interesting to see that the model run on the GPU took about the same time as the one run on the CPU! This is likely because the model and the dataset are both small, so there is little computation for the GPU to speed up.

> **Note**: Sometimes, depending on your data/hardware you might find that your model trains faster on a CPU than a GPU

> Why is this?
> 1. It could be that the overhead of copying the data/model to and from the GPU outweighs the compute benefits offered by the GPU.
> 2. The hardware you're using has a CPU that is more capable than its GPU.

See this article about GPUs:
https://horace.io/brrr_intro.html

#### Now, to evaluate our model, we need to remember to put the data on the same device as the model (the GPU, if available)!

In [None]:
def eval_model(model: torch.nn.Module, data_loader: torch.utils.data.DataLoader, loss_fn: torch.nn.Module, accuracy_fn, device=device):
    '''
    Returns a dictionary containing the results of model predicting on data_loader
    '''
    loss, acc = 0, 0
    model.eval()
    with torch.inference_mode():
        for X, y in tqdm(data_loader):
            # Make our data device agnostic
            X, y = X.to(device), y.to(device)
            # Make predictions
            y_pred = model(X) # uses whichever model was passed in, so this works for any model

            # Accumulate the loss and acc values per batch
            loss += loss_fn(y_pred, y)
            acc += accuracy_fn(y_true=y, y_pred=y_pred.argmax(dim=1))

        # Scale loss and acc to find the average loss/acc per batch
        loss /= len(data_loader)
        acc /= len(data_loader)
    return {"model_name": model.__class__.__name__, # only works when model was created with a class
            "model_loss": loss.item(),
            "model_acc": acc}


In [None]:
# Get model_1 results dictionary
model_1_results = eval_model(model=model_1,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn,
                             device=device)
model_1_results

compared with

In [None]:
model_0_results

## Model 2: Building a Convolutional Neural Network (CNN)

CNNs are also known as ConvNets.

CNNs are known for their ability to find patterns in visual data.

#### Below is a table outlining critical pieces of a CNN

| **Hyperparameter/Layer Type**         | **What does it do?**                                                                 | **Typical Values**                                                                                      |
|----------------------------------------|-------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|
| **Input image(s)**                     | Target images you'd like to discover patterns in                                     | Whatever you can take a photo (or video) of                                                            |
| **Input layer**                        | Takes in target images and preprocesses them for further layers                     | `input_shape = [batch_size, image_height, image_width, color_channels]` (channels last) or              |
|                                        |                                                                                     | `input_shape = [batch_size, color_channels, image_height, image_width]` (channels first)               |
| **Convolution layer**                  | Extracts/learns the most important features from target images                      | Multiple, can create with `torch.nn.ConvXd()` (X can be multiple values)                               |
| **Hidden activation/non-linear activation** | Adds non-linearity to learned features (non-straight lines)                         | Usually ReLU (`torch.nn.ReLU()`), though can be many more                                              |
| **Pooling layer**                      | Reduces the dimensionality of learned image features                                | Max (`torch.nn.MaxPool2d()`) or Average (`torch.nn.AvgPool2d()`)                                       |
| **Output layer/linear layer**          | Takes learned features and outputs them in shape of target labels                   | `torch.nn.Linear(out_features=[number_of_classes])` (e.g., 3 for pizza, steak, or sushi)               |
| **Output activation**                  | Converts output logits to prediction probabilities                                  | `torch.sigmoid()` (binary classification) or `torch.softmax()` (multi-class classification)            |


To find out what's happening inside a CNN, see this website:

https://poloclub.github.io/cnn-explainer/

In [None]:
# Create a CNN
class FashionMNISTModelV2(nn.Module):
    '''
    Model architecture that replicates the TinyVGG
    model from the CNN Explainer website.
    '''
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1), # values we can set ourselves- hyperparameters
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2,
                         stride=2)
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=hidden_units,
                      kernel_size=3,
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=hidden_units,
                      kernel_size=3,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=hidden_units*7*7, # the two 2x2 max pools shrink 28x28 -> 14x14 -> 7x7, so the flattened size is hidden_units*7*7
                      out_features=output_shape)
        )

    def forward(self, x):
        x = self.conv_block_1(x)
        # print(f"Output shape of conv_block_1: {x.shape}")
        x = self.conv_block_2(x)
        # print(f"Output shape of conv_block_2: {x.shape}")
        x = self.classifier(x)
        # print(f"Output shape of classifier: {x.shape}")
        return x

In [None]:
torch.manual_seed(42)
model_2 = FashionMNISTModelV2(input_shape=1, # since we're working with greyscale images the color channel is 1
                              hidden_units=10,
                              output_shape=len(class_names)).to(device)

## Stepping through `nn.Conv2d()`
```
class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
```

docs:

https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html


In [None]:
# model_2.state_dict()

In [None]:
torch.manual_seed(42)

# Create a batch of images
images = torch.randn(size=(32, 3, 64, 64))
test_image = images[0]

print(f"Image batch shape: {images.shape}")
print(f"Single image shape: {test_image.shape}")
print(f"Test image:\n{test_image}")

### `nn.Conv2d` Calculations Explained

The `nn.Conv2d` layer in PyTorch performs a 2D convolution operation, which is often used in image data processing. Below, we break down the key calculations involved:

#### **Input Parameters**
1. **Input Shape**: $(N, C_{\text{in}}, H_{\text{in}}, W_{\text{in}})$
   - $N$: Batch size
   - $C_{\text{in}}$: Number of input channels
   - $H_{\text{in}}$: Height of the input
   - $W_{\text{in}}$: Width of the input

2. **Kernel/Filter Size**: $(C_{\text{out}}, C_{\text{in}}, K_H, K_W)$
   - $C_{\text{out}}$: Number of output channels (filters)
   - $K_H$: Kernel height
   - $K_W$: Kernel width

3. **Stride ($s$)**: Determines how much the filter shifts after each operation.
4. **Padding ($p$)**: Adds zeros around the input to maintain or modify the output dimensions.
5. **Dilation ($d$)**: Spacing between elements of the kernel.

#### **Output Dimensions**
The formula for the output height $(H_{\text{out}})$ and output width $(W_{\text{out}})$ is as follows:

$$
\text{Output Dimension} = \left\lfloor \frac{\text{Input Dimension} + 2 \times \text{Padding} - \text{Dilation} \times (\text{Kernel Size} - 1) - 1}{\text{Stride}} + 1 \right\rfloor
$$

This can be rewritten explicitly for height and width:

$$
H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} + 2p - d(K_H - 1) - 1}{s} + 1 \right\rfloor
$$

$$
W_{\text{out}} = \left\lfloor \frac{W_{\text{in}} + 2p - d(K_W - 1) - 1}{s} + 1 \right\rfloor
$$

Where:
- $p$ is the padding,
- $d$ is the dilation,
- $K_H$ and $K_W$ are the kernel dimensions,
- $s$ is the stride.

#### **Example**
Suppose we have the following parameters:
- Input shape: $(1, 3, 32, 32)$ (batch size of 1, 3 channels, 32x32 image)
- Kernel size: $(16, 3, 5, 5)$ (16 filters, each 3x5x5)
- Stride: $s = 1$
- Padding: $p = 0$
- Dilation: $d = 1$

The output height and width are computed as:

$$
H_{\text{out}} = \left\lfloor \frac{32 + 2(0) - 1(5 - 1) - 1}{1} + 1 \right\rfloor = 28
$$

$$
W_{\text{out}} = \left\lfloor \frac{32 + 2(0) - 1(5 - 1) - 1}{1} + 1 \right\rfloor = 28
$$

Thus, the output shape is $(1, 16, 28, 28)$.
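
The worked example above can be checked with a few lines of plain Python. This is a minimal sketch: `conv_output_size` is a hypothetical helper name (not part of PyTorch), and it assumes the same stride, padding, and dilation along both spatial axes.

```python
def conv_output_size(size_in: int, kernel: int, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """Output length along one spatial axis, following the formula above."""
    # Integer floor division plays the role of the floor brackets
    return (size_in + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# The worked example: 32x32 input, 5x5 kernel, stride 1, no padding, 16 filters
h_out = conv_output_size(32, kernel=5)
w_out = conv_output_size(32, kernel=5)
print((1, 16, h_out, w_out))  # (1, 16, 28, 28)
```

The same helper shows why `kernel_size=3, padding=1, stride=1` leaves height and width unchanged: `conv_output_size(64, kernel=3, padding=1)` returns 64.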

#### **Summary**
The convolutional layer applies filters to the input, extracting spatial features using the parameters defined. Understanding these calculations is crucial for designing neural network architectures.


### What is the Derivation of the `nn.Conv2d` Equation?

The `nn.Conv2d` layer performs a convolution operation by sliding a kernel/filter over the input image. Let's derive the formula for calculating the output dimensions.

#### **Key Concepts**

1. **Input Dimensions**: $(H_{\text{in}}, W_{\text{in}})$, where:
   - $H_{\text{in}}$: Height of the input image
   - $W_{\text{in}}$: Width of the input image

2. **Kernel Dimensions**: $(K_H, K_W)$, where:
   - $K_H$: Height of the kernel
   - $K_W$: Width of the kernel

3. **Stride ($s$)**: The number of pixels by which the kernel shifts after each operation.

4. **Padding ($p$)**: Zeros added around the input to modify the effective size of the input.

5. **Dilation ($d$)**: Spacing between kernel elements, which effectively enlarges the kernel size.

#### **Steps to Derive the Formula**

1. **Effective Input Size with Padding**  
   Adding padding of size $p$ to both sides of the input increases the height and width of the input by $2p$ (padding is applied to both the top-bottom and left-right sides).  
   The effective input dimensions become:

   $$
   H_{\text{effective}} = H_{\text{in}} + 2p
   $$

   $$
   W_{\text{effective}} = W_{\text{in}} + 2p
   $$

2. **Effective Kernel Size with Dilation**  
   When dilation is applied, the kernel size effectively increases by $(d - 1)$ spaces between elements. The effective kernel dimensions are:

   $$
   K_H^{\text{effective}} = d \cdot (K_H - 1) + 1
   $$

   $$
   K_W^{\text{effective}} = d \cdot (K_W - 1) + 1
   $$

3. **Output Size Without Stride**  
   If the kernel slides over the input without stride ($s = 1$), the output dimensions are the number of positions the kernel fits into. This is calculated as:

   $$
   H_{\text{out}} = H_{\text{effective}} - K_H^{\text{effective}} + 1
   $$

   $$
   W_{\text{out}} = W_{\text{effective}} - K_W^{\text{effective}} + 1
   $$

4. **Incorporating Stride**  
   Stride determines how far the kernel moves with each step. With stride $s$, the kernel can start at offsets $0, s, 2s, \dots$ as long as it still fits inside the padded input, so the largest valid offset is $H_{\text{effective}} - K_H^{\text{effective}}$. Counting those offsets, and taking the floor to ensure integer results, gives:

   $$
   H_{\text{out}} = \left\lfloor \frac{H_{\text{effective}} - K_H^{\text{effective}}}{s} \right\rfloor + 1
   $$

   $$
   W_{\text{out}} = \left\lfloor \frac{W_{\text{effective}} - K_W^{\text{effective}}}{s} \right\rfloor + 1
   $$

5. **Substituting Effective Dimensions**  
   Substitute the values of $H_{\text{effective}}$ and $K_H^{\text{effective}}$:

   $$
   H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} + 2p - d \cdot (K_H - 1) - 1}{s} + 1 \right\rfloor
   $$

   $$
   W_{\text{out}} = \left\lfloor \frac{W_{\text{in}} + 2p - d \cdot (K_W - 1) - 1}{s} + 1 \right\rfloor
   $$

#### **Final Formula**

The final formula for the output dimensions $(H_{\text{out}}, W_{\text{out}})$ is:

$$
H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} + 2p - d \cdot (K_H - 1) - 1}{s} + 1 \right\rfloor
$$

$$
W_{\text{out}} = \left\lfloor \frac{W_{\text{in}} + 2p - d \cdot (K_W - 1) - 1}{s} + 1 \right\rfloor
$$

#### **Sometimes these equations make more sense when we know where they come from. However, this is just my own curiosity**

We see the formula takes into account the input dimensions, kernel size, stride, padding, and dilation to calculate the output dimensions of a 2D convolution operation. Understanding this derivation can (maybe) provide some clarity on how convolutional layers work in PyTorch.
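
One way to convince yourself of the derivation is to count kernel positions by brute force and compare the count with the closed-form formula. The sketch below does that in plain Python. `count_kernel_positions` and `formula_output_size` are hypothetical names, the test settings are arbitrary, and the check assumes the dilated kernel fits inside the padded input.

```python
def count_kernel_positions(size_in, kernel, stride=1, padding=0, dilation=1):
    """Count valid kernel start offsets along one axis by enumeration."""
    padded = size_in + 2 * padding                   # effective input size
    effective_kernel = dilation * (kernel - 1) + 1   # effective kernel size
    last_start = padded - effective_kernel           # largest offset where the kernel still fits
    return len(range(0, last_start + 1, stride))     # offsets 0, s, 2s, ... <= last_start


def formula_output_size(size_in, kernel, stride=1, padding=0, dilation=1):
    """Closed-form output size from the final formula."""
    return (size_in + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Arbitrary settings: (size_in, kernel, stride, padding, dilation)
cases = [(32, 5, 1, 0, 1), (28, 3, 1, 1, 1), (5, 3, 2, 0, 1), (7, 3, 2, 1, 2)]
for case in cases:
    assert count_kernel_positions(*case) == formula_output_size(*case)
    print(case, "->", formula_output_size(*case))
```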


# Understanding "Number of Input Channels" in CNNs

In the context of Convolutional Neural Networks (CNNs), **input channels** refer to the number of distinct data "layers" or "features" present in the input tensor. This concept is crucial for understanding how CNNs process data, particularly when working with images.

---

## What is a Channel?

A **channel** is a dimension of the input data that represents a specific type of information. The concept is most commonly associated with images, where each channel corresponds to a particular aspect of the image's color or intensity.

### Example: RGB Images
For a typical color image, the channels represent:
- **Red (R)**
- **Green (G)**
- **Blue (B)**

Thus, an RGB image has **3 channels**, and its dimensions are typically structured as:
- `[Height, Width, Channels]` (e.g., `[256, 256, 3]` for a 256x256 color image).

### Grayscale Images
A grayscale image has only **1 channel** because it represents intensity values without any color information:
- `[Height, Width, 1]` (e.g., `[256, 256, 1]` for a 256x256 grayscale image).

---

## Input Channels in CNNs

When feeding data into a CNN, the **number of input channels** is the size of the channel dimension of the input tensor. Where that dimension sits depends on the framework's default layout:
- **PyTorch**: `[Batch Size, Channels, Height, Width]` (channels first)
- **TensorFlow**: `[Batch Size, Height, Width, Channels]` (channels last)

### Example
- For an RGB image: 3 input channels.
- For a grayscale image: 1 input channel.

If you're working with data that has more features (e.g., hyperspectral imaging or multiple time-series features), the number of input channels will match the number of features.
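
If your data arrives in channels-last order, it has to be rearranged before it reaches a PyTorch convolution. A minimal sketch, assuming a random batch of 32 RGB images of size 64x64 (the tensor names are illustrative):

```python
import torch

# Hypothetical batch stored channels-last: [batch, height, width, channels]
channels_last = torch.rand(32, 64, 64, 3)

# Reorder the axes to PyTorch's default layout: [batch, channels, height, width]
channels_first = channels_last.permute(0, 3, 1, 2)

print(channels_last.shape)   # torch.Size([32, 64, 64, 3])
print(channels_first.shape)  # torch.Size([32, 3, 64, 64])
```

`permute` only reorders the axes; the pixel values themselves are unchanged.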

---

## Why Are Channels Important?

1. **Filters Span All Input Channels**:
   - Each filter in a CNN has a depth that matches the number of input channels.
   - For example, a `[3x3]` filter applied to an RGB image has dimensions `[3x3x3]` and produces a single output channel.

2. **Feature Extraction**:
   - Different channels can represent different features. For images, this could be colors (RGB), and for non-image data, it could represent any set of measurable quantities.

3. **Scalability**:
   - The network can handle inputs with multiple channels and extract meaningful features from all of them simultaneously.
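
The point about filter depth can be seen directly in a layer's weight tensor. The sketch below builds a layer with the same settings used elsewhere in this notebook (3 input channels, 10 filters, 3x3 kernel) and inspects its parameters:

```python
import torch
from torch import nn

conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3, padding=1)

# Weight layout is [out_channels, in_channels, kernel_height, kernel_width]:
# 10 filters, each spanning all 3 input channels with a 3x3 window
print(conv.weight.shape)  # torch.Size([10, 3, 3, 3])
print(conv.bias.shape)    # torch.Size([10])

# Total learnable parameters: 10*3*3*3 weights + 10 biases
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)  # 280
```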

---

## Channels Beyond Images

Channels are not limited to images. In other types of data:
- **Audio Data**: Channels could represent different frequency bands or stereo channels (left and right).
- **Time-Series Data**: Channels could represent multiple variables measured over time (e.g., temperature, pressure, and humidity).
- **Medical Imaging**: Channels could represent different imaging modalities (e.g., MRI, CT scans).

---

## Practical Notes

1. **Mismatch in Channels**:
   - The number of channels in the input must match the `in_channels` of the CNN's first convolutional layer. If they don't match, you'll get an error.
   - Use layers like `torch.nn.Conv2d` (PyTorch) or `tf.keras.layers.Conv2D` (TensorFlow) to define filters that account for the number of input channels.

2. **Expanding Channels**:
   - For single-channel data like grayscale images, you may need to replicate or transform channels to match the input requirements of a pre-trained model (e.g., converting 1 channel to 3 channels for a model expecting RGB).
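
Both notes can be illustrated with a small experiment. This is a sketch with made-up shapes: a layer expecting 3 channels rejects a grayscale batch, and repeating the single channel three times is one simple (assumed, not the only) way to make the shapes compatible.

```python
import torch
from torch import nn

conv_rgb = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
gray_batch = torch.rand(4, 1, 28, 28)  # hypothetical grayscale batch

# 1. Mismatch: a 1-channel input fails against in_channels=3
try:
    conv_rgb(gray_batch)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print(f"Channel mismatch: {err}")

# 2. Expanding channels: copy the grayscale channel 3 times along dim 1
rgb_like = gray_batch.repeat(1, 3, 1, 1)
out = conv_rgb(rgb_like)
print(out.shape)  # torch.Size([4, 8, 28, 28])
```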

---

## Summary

- A **channel** represents a distinct feature layer in the input data.
- The **number of input channels** corresponds to the depth of the input tensor.
- Channels are essential for feature extraction in CNNs and vary depending on the type of data being processed.


In [None]:
# Create a single conv2d
torch.manual_seed(42)
conv_layer = nn.Conv2d(in_channels=3,
                       out_channels=10,
                       kernel_size=(3,3), # kernel also known as a filter (3x3)
                       stride=1,
                       padding=1)

# pass the data through the convolutional layer
conv_output = conv_layer(test_image.unsqueeze(0))
conv_output.shape

In [None]:
!nvidia-smi

## Stepping through the `nn.MaxPool2d` part
```
class torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
```
docs:

https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#maxpool2d

In [None]:
test_image.shape

# Understanding the Max Pooling Layer in CNNs

When we pass data through a **Max Pooling Layer**, the goal is to reduce the spatial dimensions (Height and Width) of the input data while retaining the most important features. Below, we explain what happens when we apply a max pooling operation to an image after a convolutional layer.

---

## What is Max Pooling?

**Max Pooling** is a down-sampling operation that:
- Divides the input into smaller regions (defined by the pooling kernel size).
- Outputs the maximum value from each region.

This operation helps:
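1. **Reduce Computational Complexity**: Max pooling has no learnable parameters itself, but by shrinking the spatial dimensions it reduces the number of activations later layers must process, and the number of parameters in any fully connected layer that follows.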
2. **Extract Dominant Features**: By keeping only the maximum values, it emphasizes the strongest activations in a region.
3. **Improve Translational Invariance**: Small shifts in the input image won't drastically change the pooled output.
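
To make the "maximum of each region" idea concrete, here is a minimal sketch on a 4x4 tensor whose values are simply 0 to 15, so the result is easy to verify by eye. The reshape-based cross-check is an illustration that only works when the kernel size divides the input size evenly.

```python
import torch
from torch import nn

# A single 1-channel 4x4 "image" with values 0..15 (row by row)
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2)  # 2x2 windows; stride defaults to the kernel size
pooled = pool(x)
print(pooled)
# tensor([[[[ 5.,  7.],
#           [13., 15.]]]])

# Cross-check: split the image into 2x2 blocks and take the max of each block
manual = x.reshape(1, 1, 2, 2, 2, 2).amax(dim=(3, 5))
print(torch.equal(pooled, manual))  # True
```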

---

## Example: Max Pooling After a Convolutional Layer

### Code Used in the Class

```python
# Create a single conv2d
torch.manual_seed(42)
conv_layer = nn.Conv2d(in_channels=3,
                       out_channels=10,
                       kernel_size=(3,3), # kernel also known as a filter (3x3)
                       stride=1,
                       padding=1)

# Pass the data through the convolutional layer
conv_output = conv_layer(test_image.unsqueeze(0))
conv_output.shape
```


In [None]:
# Print out the original image shape without unsqueeze
print(f"Test image original shape {test_image.shape}")
print(f"Test image unsqueezed shape {test_image.unsqueeze(0).shape}")

# Create a sample nn.MaxPool2d layer
max_pool_layer = nn.MaxPool2d(kernel_size=2) # 2x2 square

# Pass data through the conv layer only
test_image_through_conv = conv_layer(test_image.unsqueeze(dim=0))
print(f"Shape after going through conv_layer(): {test_image_through_conv.shape}")

# Pass data through the max pool layer- taking max of values
test_image_through_conv_and_max_pool = max_pool_layer(test_image_through_conv)
print(f"Shape after going through max_pool_layer(): {test_image_through_conv_and_max_pool.shape}")


Let's take a look at this MaxPool function in a bit more detail

In [None]:
torch.manual_seed(42)
# Create a random tensor with a similar number of dimensions (4d)
random_tensor = torch.rand(size=(1, 1, 2, 2))
print(f"\nOur random tensor is:\n {random_tensor}")
print(f"Shape of our random tensor: {random_tensor.shape}")

# Create a max pool layer
max_pool_layer = nn.MaxPool2d(kernel_size=2)

# Pass the random tensor through the max pool layer
max_pool_tensor = max_pool_layer(random_tensor)
print(f"\nTensor after max pooling:\n {max_pool_tensor}")
print(f"Shape of tensor after max pooling: {max_pool_tensor.shape}")


In [None]:
plt.imshow(image.squeeze(), cmap = "magma")

In [None]:
# Pass image through model
image.shape

In [None]:
rand_image_tensor = torch.randn(1, 28, 28)
rand_image_tensor.shape

### You've got to remember to put this on the device. We have already accounted for the potential shape mismatch with the `7*7` in the classifier's `in_features` above

In [None]:
model_2(rand_image_tensor.unsqueeze(0).to(device))

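### We can see that everything is fine up until we get to our output layer, where a shape mismatch occurs unless `in_features` matches the flattened output.

The output of `conv_block_2` is flattened before it reaches the linear layer, which is why `in_features` is set to `hidden_units*7*7`.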
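
The `7*7` can be derived rather than found by trial and error. The sketch below traces the height/width of a 28x28 input through both blocks using the output-size formula from earlier (the pooling layer follows the same formula). `output_size` is a hypothetical helper, and the layer settings are the ones used in `FashionMNISTModelV2`.

```python
def output_size(size_in, kernel, stride=1, padding=0):
    """Output length along one axis (dilation fixed at 1)."""
    return (size_in + 2 * padding - (kernel - 1) - 1) // stride + 1


hidden_units = 10
size = 28  # FashionMNIST images are 28x28

for block in (1, 2):
    size = output_size(size, kernel=3, padding=1)  # first Conv2d keeps the size
    size = output_size(size, kernel=3, padding=1)  # second Conv2d keeps the size
    size = output_size(size, kernel=2, stride=2)   # MaxPool2d(kernel_size=2) halves it
    print(f"After conv_block_{block}: {size}x{size}")

in_features = hidden_units * size * size
print(in_features)  # 490
```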

## Set up loss function and optimizer for model_2

In [None]:
from helper_functions import accuracy_fn

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model_2.parameters(),
                            lr=0.1)

In [None]:
torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Measure time
from timeit import default_timer as timer
train_time_start_model_2 = timer()

# Set epochs
epochs = 3

# Create optimization loop using train_step() and test_step()
for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n-------")
    train_step(model=model_2,
               data_loader=train_dataloader,
               loss_fn=loss_fn,
               optimizer=optimizer,
               accuracy_fn=accuracy_fn,
               device=device)
    test_step(model=model_2,
              data_loader=test_dataloader,
              loss_fn=loss_fn,
              accuracy_fn=accuracy_fn,
              device=device)

train_time_end_model_2 = timer()
total_train_time_model_2 = print_train_time(start=train_time_start_model_2,
                                            end=train_time_end_model_2,
                                            device=device)

In [None]:
model_2_results = eval_model(model=model_2,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn,
                             device=device)
model_2_results

Compare results across experiments

In [None]:
import pandas as pd
compare_results = pd.DataFrame([model_0_results,
                                model_1_results,
                                model_2_results])
compare_results

In [None]:
# Add training time to results comparison
compare_results["Training Time"] = [f"{total_train_time_model_0:.3f}",
                                     f"{total_train_time_model_1:.3f}",
                                     f"{total_train_time_model_2:.3f}"]
compare_results

In [None]:
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
plt.subplots(figsize=(8, 8))
df_2dhist = pd.DataFrame({
    x_label: grp['Training Time'].value_counts()
    for x_label, grp in compare_results.groupby('model_name')
})
sns.heatmap(df_2dhist, cmap='magma')
plt.xlabel('model_name')
_ = plt.ylabel('Training Time')

In [None]:
# prompt: Using dataframe compare_results: training time vs accuracy

import altair as alt

# Create a scatter plot with training time on the x-axis and accuracy on the y-axis
alt.Chart(compare_results).mark_circle().encode(
    x='Training Time',
    y='model_acc',
    tooltip=['model_name', 'model_acc', 'Training Time']  # Add tooltips for more information
).properties(
    title='Training Time vs. Accuracy'  # Add a title to the chart
)


## Visualize Model Results

In [None]:
compare_results.set_index("model_name")["model_acc"].plot(kind="barh")
plt.xlabel("accuracy (%)")
plt.ylabel("model");

### Ultimately we want to visualize predictions
**so let's use our best performing model to make predictions on random samples from the test_dataset**

In [None]:
def make_predictions(model: torch.nn.Module, data: list, device: torch.device = device):
    pred_probs = []
    model.eval()
    with torch.inference_mode():
        for sample in data:
            # Prepare sample
            sample = torch.unsqueeze(sample, dim=0).to(device) # Add an extra dimension and send sample to device

            # Forward pass (model outputs raw logit)
            pred_logit = model(sample)

            # Get prediction probability (logit -> prediction probability)
            pred_prob = torch.softmax(pred_logit.squeeze(), dim=0) # note: perform softmax on the "logits" dimension, not "batch" dimension (in this case we have a batch size of 1, so can perform on dim=0)

            # Get pred_prob off GPU for further calculations
            pred_probs.append(pred_prob.cpu())

    # Stack the pred_probs to turn list into a tensor
    return torch.stack(pred_probs)


In [None]:
import random
random.seed(42)
test_samples = []
test_labels = []
for sample, label in random.sample(list(test_data), k=9):
    test_samples.append(sample)
    test_labels.append(label)

# View the first test sample shape and label
print(f"Test sample image shape: {test_samples[0].shape}\nTest sample label: {test_labels[0]} ({class_names[test_labels[0]]})")

In [None]:
# Make predictions on test samples with model 2
pred_probs = make_predictions(model=model_2,
                              data=test_samples)

# View first two prediction probabilities list
pred_probs[:2]

In [None]:
# Make Predictions
pred_probs = make_predictions(model=model_2,
                              data=test_samples,
                              device=device)
# View the first two prediction probabilities list
pred_probs[:2]

### To get the prediction labels we will use `argmax()` (see above)
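
As a small illustration of the logits -> probabilities -> labels chain, the sketch below uses made-up logits for two samples and three classes. Because softmax preserves the ordering of the values, taking `argmax` of the logits gives the same labels as taking it of the probabilities.

```python
import torch

# Hypothetical raw model outputs: 2 samples, 3 classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])

probs = torch.softmax(logits, dim=1)  # each row now sums to 1
pred_classes = probs.argmax(dim=1)    # index of the largest probability per row

print(pred_classes)                                      # tensor([0, 2])
print(torch.equal(pred_classes, logits.argmax(dim=1)))   # True
```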

In [None]:
# Convert prediction probabilities to labels
# Turn the prediction probabilities into prediction labels by taking the argmax()
pred_classes = pred_probs.argmax(dim=1)
pred_classes

**are they in the same format as our test labels?**
✅

In [None]:
# Are our predictions in the same form as our test labels?
test_labels, pred_classes

### Since we are making predictions on images, let's plot those images along with the predictions

In [None]:
# Plot predictions
plt.figure(figsize=(9, 9))
nrows = 3
ncols = 3
for i, sample in enumerate(test_samples):
  # Create a subplot
  plt.subplot(nrows, ncols, i+1)

  # Plot the target image
  plt.imshow(sample.squeeze(), cmap="gray")

  # Find the prediction label (in text form, e.g. "Sandal")
  pred_label = class_names[pred_classes[i]]

  # Get the truth label (in text form, e.g. "T-shirt")
  truth_label = class_names[test_labels[i]]

  # Create the title text of the plot
  title_text = f"Pred: {pred_label} | Truth: {truth_label}"

  # Check for equality and change title colour accordingly
  if pred_label == truth_label:
      plt.title(title_text, fontsize=10, c="g") # green text if correct
  else:
      plt.title(title_text, fontsize=10, c="r") # red text if wrong
  plt.axis(False);

### All of our pred_labels and truth_labels just happen to match up here, but there are some in our dataset that don't match up

## Make a confusion matrix to evaluate our results further

A confusion matrix is a great way to evaluate classification models visually

```
class ignite.metrics.confusion_matrix.ConfusionMatrix(num_classes, average=None, output_transform=<function ConfusionMatrix.<lambda>>, device=device(type='cpu'), skip_unrolling=True)
```

```
metric = ConfusionMatrix(num_classes=3)
metric.attach(default_evaluator, 'cm')
y_true = torch.tensor([0, 1, 0, 1, 2])
y_pred = torch.tensor([
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])
state = default_evaluator.run([[y_pred, y_true]])
print(state.metrics['cm'])
```

1. Make predictions with our trained model on our test dataset
2. Make a confusion matrix with `torchmetrics.ConfusionMatrix`
3. Plot the confusion matrix using `mlxtend.plotting.plot_confusion_matrix()`
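
Before handing the work to a library, it can help to see what a confusion matrix actually counts. The sketch below is a hand-rolled version, not the torchmetrics implementation; `confusion_matrix` is a hypothetical helper, and the labels are the same five samples as in the example above after taking `argmax` of each prediction row. Rows are true classes and columns are predicted classes.

```python
import torch

def confusion_matrix(preds: torch.Tensor, target: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Count (true class, predicted class) pairs; rows = true, columns = predicted."""
    # Encode each (true, pred) pair as a single index, then count occurrences
    pair_index = target * num_classes + preds
    counts = torch.bincount(pair_index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


y_true = torch.tensor([0, 1, 0, 1, 2])
y_pred = torch.tensor([1, 1, 0, 1, 1])  # argmax of the prediction rows above

cm = confusion_matrix(y_pred, y_true, num_classes=3)
print(cm)
# tensor([[1, 1, 0],
#         [0, 2, 0],
#         [0, 1, 0]])
```

The diagonal holds the correct predictions; everything off the diagonal is a mistake, and its position shows which class was confused with which.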


In [None]:
# Import tqdm for progress bar
from tqdm.auto import tqdm

# 1. Make predictions with trained model
y_preds = []
model_2.eval()
with torch.inference_mode():
  for X, y in tqdm(test_dataloader, desc="Making predictions"):
    # Send data and targets to target device
    X, y = X.to(device), y.to(device)
    # Do the forward pass
    y_logit = model_2(X)
    # Turn predictions from logits -> prediction probabilities -> predictions labels
    y_pred = torch.softmax(y_logit, dim=1).argmax(dim=1) # note: perform softmax on the "logits" dimension, not "batch" dimension (in this case we have a batch size of 32, so can perform on dim=1)
    # Put predictions on CPU for evaluation
    y_preds.append(y_pred.cpu())
# Concatenate list of predictions into a tensor
y_pred_tensor = torch.cat(y_preds)

In [None]:
len(y_pred_tensor)

### Install torchmetrics

In [None]:
!pip install torchmetrics

#### Torchmetrics does not seem to install with present torch load

In [None]:
try:
    import torchmetrics, mlxtend
    print(f"mlxtend version: {mlxtend.__version__}")
    # Require mlxtend 0.19.0 or higher (check the minor version number)
    assert int(mlxtend.__version__.split(".")[1]) >= 19, "mlxtend version should be 0.19.0 or higher"
except (ImportError, AssertionError):
    !pip install torchmetrics -U mlxtend
    import torchmetrics, mlxtend
    print(f"mlxtend version: {mlxtend.__version__}")

In [None]:
import mlxtend
print(f"mlxtend version: {mlxtend.__version__}")

### <u>Remember</u>: *when you are working with confusion matrices and are working with matplotlib, you will have to turn the torch tensors back to NumPy format*

In [None]:
from torchmetrics import ConfusionMatrix
from mlxtend.plotting import plot_confusion_matrix
import matplotlib.pyplot as plt

# Ensure y_pred_tensor contains predicted class indices
if y_pred_tensor.ndim > 1:  # If y_pred_tensor contains logits or probabilities
    y_pred_tensor = torch.argmax(y_pred_tensor, dim=1)

# Check shape compatibility
assert y_pred_tensor.shape == test_data.targets.shape, (
    f"Shape mismatch: preds={y_pred_tensor.shape}, targets={test_data.targets.shape}"
)

# Initialize confusion matrix metric
confmat = ConfusionMatrix(task="multiclass", num_classes=len(class_names))

# Compute confusion matrix
confmat_tensor = confmat(preds=y_pred_tensor, target=test_data.targets)

# Display confusion matrix tensor
print(confmat_tensor)

# Plot confusion matrix
confmat_array = confmat_tensor.numpy()
fig, ax = plot_confusion_matrix(conf_mat=confmat_array,
                                class_names=class_names,
                                figsize=(10, 7))
plt.show()


### Making the confusion matrix look better with seaborn, normalization, and a cmap that separates values more clearly. For presentations it can be important that different values are easy to tell apart by color; altering the `cmap` parameter can help you accomplish this.

In [None]:
import seaborn as sns
import numpy as np
from torchmetrics import ConfusionMatrix
import matplotlib.pyplot as plt

# Ensure y_pred_tensor contains predicted class indices
if y_pred_tensor.ndim > 1:  # If y_pred_tensor contains logits or probabilities
    y_pred_tensor = torch.argmax(y_pred_tensor, dim=1)

# Check shape compatibility
assert y_pred_tensor.shape == test_data.targets.shape, (
    f"Shape mismatch: preds={y_pred_tensor.shape}, targets={test_data.targets.shape}"
)

# Initialize confusion matrix metric
confmat = ConfusionMatrix(task="multiclass", num_classes=len(class_names))

# Compute confusion matrix
confmat_tensor = confmat(preds=y_pred_tensor, target=test_data.targets)

# Convert to NumPy array
confmat_array = confmat_tensor.numpy()

# Normalize the confusion matrix (optional)
confmat_normalized = confmat_array / confmat_array.sum(axis=1, keepdims=True)

# Create a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(confmat_normalized,
            annot=True,             # Annotate with numbers
            fmt=".2f",              # Format numbers to two decimal places
            cmap="rocket_r",           # Reversed "rocket" color map (light = low, dark = high)
            xticklabels=class_names,  # Class names for x-axis
            yticklabels=class_names,  # Class names for y-axis
            cbar_kws={'label': 'Proportion'})  # Add color bar label

# Add axis labels and a title
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.title("Normalized Confusion Matrix")
plt.tight_layout()  # Adjust layout to prevent clipping
plt.show()


### Just the output: this is best for EDA applications, since it takes up less memory and clearly states the results for the EDA purpose.

In [None]:
confmat_tensor

What does the confusion matrix tell us? It looks like our model often predicts Shirt when the true label is T-shirt/top, and it also predicts Shirt when the true label is Coat.
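
Reading confusions off a plot works for ten classes, but the same question can be answered programmatically. The sketch below uses made-up counts for three of the classes (they are not results from this notebook) to find the most frequent mistake and the per-class recall.

```python
import torch

# Hypothetical counts for illustration only; rows = true class, columns = predicted class
names = ["T-shirt/top", "Shirt", "Coat"]
confmat = torch.tensor([[80, 18,  2],
                        [15, 70, 15],
                        [ 3, 12, 85]])

# Zero the diagonal so only the mistakes remain, then locate the largest one
mistakes = confmat.clone()
mistakes.fill_diagonal_(0)
true_idx, pred_idx = divmod(mistakes.argmax().item(), confmat.shape[1])
print(f"Most common mistake: true {names[true_idx]} predicted as {names[pred_idx]}")

# Per-class recall: correct predictions divided by the number of true samples per class
recall = confmat.diag() / confmat.sum(dim=1)
print(recall)  # tensor([0.8000, 0.7000, 0.8500])
```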