<a href="https://colab.research.google.com/github/mrdbourke/pytorch-deep-learning/blob/main/extras/exercises/03_pytorch_computer_vision_exercises.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 03. PyTorch Computer Vision Exercises

The following is a collection of exercises based on computer vision fundamentals in PyTorch.

They're a bunch of fun.

You're going to get to write plenty of code!

## Resources

1. These exercises are based on [notebook 03 of the Learn PyTorch for Deep Learning course](https://www.learnpytorch.io/03_pytorch_computer_vision/). 
2. See a live [walkthrough of the solutions (errors and all) on YouTube](https://youtu.be/_PibmqpEyhA). 
  * **Note:** Going through these exercises took me just over 3 hours of solid coding, so you should expect around the same.
3. See [other solutions on the course GitHub](https://github.com/mrdbourke/pytorch-deep-learning/tree/main/extras/solutions).

In [None]:
# Check for GPU (passes)
# !nvidia-smi

In [None]:
# Import torch
import torch
from torch import nn

# Exercises require PyTorch > 1.10.0
print(torch.__version__)

# Setup device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

## 1. What are 3 areas in industry where computer vision is currently being used?

In [None]:
# self driving cars
# Healthcare
# Robots

## 2. Search "what is overfitting in machine learning" and write down a sentence about what you find. 

In [None]:
# Overfitting is when a model fits the training data very closely (low training error) but generalizes poorly, performing much worse on unseen test data

## 3. Search "ways to prevent overfitting in machine learning", write down 3 of the things you find and a sentence about each. 
> **Note:** there are lots of these, so don't worry too much about all of them, just pick 3 and start with those.

In [None]:
# From copilot:
# Data Augmentation: Increase the diversity of your training set by applying random transformations (e.g., rotation, scaling, cropping, flipping) to the input images. This helps the model generalize better.

# Regularization: Apply regularization techniques such as L1 and L2 regularization to penalize large weights in the model, which can reduce overfitting by preventing the model from becoming too complex.

# Dropout: Randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much.

# Early Stopping: Monitor the model's performance on a validation set and stop training when the performance starts to degrade. This ensures that the model doesn't overfit by training for too many epochs.

# Reduce Model Complexity: If your model is too complex (has too many parameters) relative to the amount of training data, consider using a simpler model or reducing the complexity of your current model.

# Use More Data: If possible, collect more data or use data augmentation techniques to increase the size of the training set. More data can help the model learn better generalizations.

# Batch Normalization: Although primarily used to help with training stability and convergence, batch normalization can also have a regularizing effect, potentially reducing overfitting.

# Cross-validation: Use cross-validation techniques to better estimate the model's performance on unseen data. This can help in tuning the model and regularization parameters more effectively.

# Use Pretrained Models: For deep learning tasks, consider using a pretrained model and fine-tuning it on your specific task. Pretrained models have already learned a lot of useful features from large datasets and can generalize better.

# Ensemble Methods: Combine the predictions of several models to improve the generalization ability. Ensembles of models often perform better than individual models.
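
As one concrete illustration of the early stopping idea listed above, here is a minimal sketch in plain Python. The `EarlyStopper` class, its `patience` and `min_delta` parameters, and the validation loss values are all hypothetical, chosen only for illustration; they are not part of PyTorch or of the course code.

```python
class EarlyStopper:
    """Signals a stop once validation loss has not improved for `patience` checks in a row."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience      # how many non-improving checks to tolerate
        self.min_delta = min_delta    # minimum decrease that counts as an improvement
        self.best_loss = float("inf")
        self.bad_checks = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best_loss - self.min_delta:
            # New best validation loss: remember it and reset the counter
            self.best_loss = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


# Made-up validation losses: they improve for three epochs, then get worse
val_losses = [0.90, 0.70, 0.60, 0.65, 0.66, 0.70, 0.80]
stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.should_stop(val_loss):
        stopped_at = epoch
        break
print(f"Stopped at epoch {stopped_at}, best validation loss: {stopper.best_loss}")
```

In a real training loop the validation loss would come from an evaluation pass after each epoch, and you would typically also keep a copy of the model weights from the best epoch.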



## 4. Spend 20 minutes reading and clicking through the [CNN Explainer website](https://poloclub.github.io/cnn-explainer/).

* Upload your own example image using the "upload" button on the website and see what happens in each layer of a CNN as your image passes through it.

## 5. Load the [`torchvision.datasets.MNIST()`](https://pytorch.org/vision/stable/generated/torchvision.datasets.MNIST.html#torchvision.datasets.MNIST) train and test datasets.

In [None]:
# Import PyTorch
import torch
from torch import nn

# Import torchvision 
import torchvision
from torchvision import datasets
from torchvision.transforms import ToTensor

# Import matplotlib for visualization
import matplotlib.pyplot as plt
import tqdm

# Check versions
# Note: your PyTorch version shouldn't be lower than 1.10.0 and torchvision version shouldn't be lower than 0.11
print(f"PyTorch version: {torch.__version__}\ntorchvision version: {torchvision.__version__}")

In [None]:
import requests
from pathlib import Path 

# Download helper functions from Learn PyTorch repo (if not already downloaded)
if Path("helper_functions.py").is_file():
  print("helper_functions.py already exists, skipping download")
else:
  print("Downloading helper_functions.py")
  # Note: you need the "raw" GitHub URL for this to work
  request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
  with open("helper_functions.py", "wb") as f:
    f.write(request.content)

In [None]:
from timeit import default_timer as timer 
def print_train_time(start: float, end: float, device: torch.device = None):
    """Prints difference between start and end time.

    Args:
        start (float): Start time of computation (preferred in timeit format). 
        end (float): End time of computation.
        device ([type], optional): Device that compute is running on. Defaults to None.

    Returns:
        float: time between start and end in seconds (higher is longer).
    """
    total_time = end - start
    print(f"Train time on {device}: {total_time:.3f} seconds")
    return total_time

In [None]:
# Setup training data
fashion_train_data = datasets.FashionMNIST(
    root="data", # where to download data to?
    train=True, # get training data
    download=True, # download data if it doesn't exist on disk
    transform=ToTensor(), # images come as PIL format, we want to turn into Torch tensors
    target_transform=None # you can transform labels as well
)

# Setup testing data
fashion_test_data = datasets.FashionMNIST(
    root="data",
    train=False, # get test data
    download=True,
    transform=ToTensor()
)

In [None]:
# See first training sample
image, label = fashion_train_data[0]
image, label

In [None]:
# How many samples are there? (no data attribute)
len(fashion_train_data), len(fashion_test_data)

In [None]:
import torchvision.transforms as transforms

# Create the Grayscale transform
grayscale_transform = transforms.Grayscale(num_output_channels=1)

import matplotlib.pyplot as plt
image, label = fashion_train_data[0]
image = grayscale_transform(image)
print(f"Image shape: {image.shape}")
plt.imshow(image.squeeze()) # image shape is [1, 28, 28] (colour channels, height, width); squeeze() drops the channel dimension for plotting
plt.title(label)

In [None]:
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder

height = 224
width = 224

# Setup training data
aircraft_train_data = datasets.FGVCAircraft(
    root="data", # where to download data to?
    split="train", # get training data
    download=True, # download data if it doesn't exist on disk
    transform=transforms.Compose([
        # transforms.Grayscale(num_output_channels=1),  # Convert to grayscale
        transforms.Resize((height, width)),  # Resize the PIL image to 224x224 pixels
        transforms.ToTensor()  # Convert the resized PIL image to a tensor
    ]), # images come as PIL format, so resize first and convert to Torch tensors once
    target_transform=None # you can transform labels as well
)

# Setup testing data
aircraft_test_data = datasets.FGVCAircraft(
    root="data",
    split="test", # get test data
    download=True,
    transform=transforms.Compose([
        # transforms.Grayscale(num_output_channels=1),  # Convert to grayscale
        transforms.Resize((height, width)),  # Resize the PIL image to 224x224 pixels
        transforms.ToTensor()  # Convert the resized PIL image to a tensor
    ])
)

In [None]:
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

# Define your transform
transform = transforms.Compose([
    transforms.Resize((height, width)),  # Resize to 224x224 pixels
    transforms.ToTensor()
])

# Apply transform to your dataset
dataset = ImageFolder(root="data/fgvc-aircraft-2013b/data", transform=transform) # forward slashes work on Windows, Linux and macOS

# Create DataLoader
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Now, dataloader will yield batches of images resized to 224x224 pixels

In [None]:
# Example: Iterate over the dataloader and print the shape of each batch
for images, labels in dataloader:
    print(images.shape)  # Should print [batch_size, channels, 224, 224]
    break

In [None]:
# See first training sample
image, label = aircraft_train_data[0]
image, label

In [None]:
# See classes
class_names = aircraft_train_data.classes
class_names

In [None]:
# What's the shape of the image?
image.shape

In [None]:
# How many samples are there? (no data attribute)
len(aircraft_train_data), len(aircraft_test_data)

## 6. Visualize at least 5 different samples of the MNIST training dataset.

In [None]:
# Just going to do 2
import torchvision.transforms as transforms

# Create the Grayscale transform
# grayscale_transform = transforms.Grayscale(num_output_channels=1)

import matplotlib.pyplot as plt
test_image, label = aircraft_train_data[0]
# image = grayscale_transform(image)
# print(f"Image shape: {image.shape} -> [batch_size, color_channels, height, width]")
print(f"Single image shape: {test_image.shape} -> [color_channels, height, width]") 
print(f"Single image pixel values:\n{test_image}")
test_image_permuted = test_image.permute(1, 2, 0)
plt.imshow(test_image_permuted) # original image shape is [3, 224, 224] (colour channels, height, width); matplotlib wants channels last, hence the permute
plt.title(label)
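
The exercise asks for at least 5 samples, so here is a minimal sketch of plotting several images in one row. It assumes stand-in random tensors (`fake_images`, `fake_labels`, both hypothetical) so that it runs on its own; in the notebook you would index `aircraft_train_data` instead.

```python
import torch
import matplotlib.pyplot as plt

torch.manual_seed(42)

# Stand-in data (assumption): 8 random RGB "images" shaped [color_channels, height, width]
fake_images = torch.rand(8, 3, 64, 64)
fake_labels = torch.randint(0, 100, (8,))

# Pick 5 distinct random sample indices
sample_indices = torch.randperm(len(fake_images))[:5]

fig, axes = plt.subplots(nrows=1, ncols=5, figsize=(15, 3))
for ax, idx in zip(axes, sample_indices.tolist()):
    image, label = fake_images[idx], fake_labels[idx].item()
    # matplotlib expects [height, width, color_channels], so move channels last
    ax.imshow(image.permute(1, 2, 0).numpy())
    ax.set_title(f"label: {label}")
    ax.axis("off")
```

Using `torch.randperm` rather than repeated random draws guarantees the five samples are different from each other.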

In [None]:
torch.manual_seed(42)

# Create a convolutional layer with same dimensions as TinyVGG 
# (try changing any of the parameters and see what happens)
conv_layer = nn.Conv2d(in_channels=3,
                       out_channels=10,
                       kernel_size=3,
                       stride=1,
                       padding=0) # also try using "valid" or "same" here 

# Pass the data through the convolutional layer
conv_layer(test_image) # Note: If running PyTorch <1.11.0, this will error because of shape issues (nn.Conv2d() expects a 4d tensor as input) 

## 7. Turn the MNIST train and test datasets into dataloaders using `torch.utils.data.DataLoader`, set the `batch_size=32`.

In [None]:
from torch.utils.data import DataLoader

# Setup the batch size hyperparameter
BATCH_SIZE = 32

# Turn datasets into iterables (batches)
aircraft_train_dataloader = DataLoader(aircraft_train_data, # dataset to turn into iterable
    batch_size=BATCH_SIZE, # how many samples per batch? 
    shuffle=True # shuffle data every epoch?
)

aircraft_test_dataloader = DataLoader(aircraft_test_data,
    batch_size=BATCH_SIZE,
    shuffle=False # don't necessarily have to shuffle the testing data
)

# Let's check out what we've created
print(f"Dataloaders: {aircraft_train_dataloader, aircraft_test_dataloader}") 
print(f"Length of train dataloader: {len(aircraft_train_dataloader)} batches of {BATCH_SIZE}")
print(f"Length of test dataloader: {len(aircraft_test_dataloader)} batches of {BATCH_SIZE}")

## 8. Recreate `model_2` used in notebook 03 (the same model from the [CNN Explainer website](https://poloclub.github.io/cnn-explainer/), also known as TinyVGG) capable of fitting on the MNIST dataset.

In [None]:
# Create a convolutional neural network 
class AircraftMNISTModelV2(nn.Module):
    """
    Model architecture copying TinyVGG from: 
    https://poloclub.github.io/cnn-explainer/
    """
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.kernel_size = 3
        self.stride = 1
        self.padding = 1
        print(f"input shape (how many channels): {input_shape}\nhidden units (how many output channels): {hidden_units}\noutput shape: {output_shape}")
        self.block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape, 
                      out_channels=hidden_units, 
                      kernel_size=self.kernel_size, # how big is the square that's going over the image?
                      stride=self.stride, # default
                      padding=self.padding),# options = "valid" (no padding) or "same" (output has same shape as input) or int for specific number 
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units, 
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2,
                         stride=2) # default stride value is same as kernel_size
        )
        self.block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
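            # Where did this in_features shape come from? 
            # Each conv layer here keeps height/width (kernel_size=3, padding=1) and each
            # MaxPool2d(2) halves them: 224 -> 112 -> 56, so the flattened size is
            # hidden_units * 56 * 56 (200704 when hidden_units=64 and inputs are 224x224).
            nn.Linear(in_features=hidden_units * 56 * 56, 
                      out_features=output_shape)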
        )
    
    def forward(self, x: torch.Tensor):
        x = self.block_1(x)
        # print(x.shape)
        x = self.block_2(x)
        # print(x.shape)
        x = self.classifier(x)
        # print(x.shape)
        return x

torch.manual_seed(42)
model_2 = AircraftMNISTModelV2(input_shape=3, 
    hidden_units=64, # used to be 10
    output_shape=len(aircraft_train_data.classes)).to(device)
model_2
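
The flattened size fed into the classifier's `nn.Linear` layer (200704 for `hidden_units=64` and 224x224 inputs) can be checked by hand. The sketch below uses the standard convolution output-size arithmetic in plain Python; `conv2d_out_size` is a hypothetical helper written for this illustration and assumes a dilation of 1.

```python
def conv2d_out_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width of a conv or pooling layer (dilation of 1 assumed)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 224          # input height and width after Resize((224, 224))
hidden_units = 64   # output channels of every conv layer in the model

for block in range(2):
    size = conv2d_out_size(size, kernel_size=3, stride=1, padding=1)  # first conv keeps the size
    size = conv2d_out_size(size, kernel_size=3, stride=1, padding=1)  # second conv keeps the size
    size = conv2d_out_size(size, kernel_size=2, stride=2)             # max pool halves the size

in_features = hidden_units * size * size
print(size, in_features)  # 56 200704
```

Another common approach is to pass a dummy tensor through the convolutional blocks and read the shape off the output, which avoids doing the arithmetic at all.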

In [None]:
# Import accuracy metric
from helper_functions import accuracy_fn # Note: could also use torchmetrics.Accuracy(task = 'multiclass', num_classes=len(class_names)).to(device)

# Setup loss function and optimizer
loss_fn = nn.CrossEntropyLoss() # this is also called "criterion"/"cost function" in some places
optimizer = torch.optim.SGD(params=model_2.parameters(), lr=0.1)

In [None]:
def train_step(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               accuracy_fn,
               device: torch.device = device):
    train_loss, train_acc = 0, 0
    model.to(device)
    model.train() # put model in train mode (test_step switches it to eval mode)
    for batch, (X, y) in enumerate(data_loader):
        # Send data to GPU
        X, y = X.to(device), y.to(device)

        # 1. Forward pass
        y_pred = model(X)

        # 2. Calculate loss
        loss = loss_fn(y_pred, y)
        train_loss += loss.item() # accumulate a Python float so each batch's autograd graph can be freed
        train_acc += accuracy_fn(y_true=y,
                                 y_pred=y_pred.argmax(dim=1)) # Go from logits -> pred labels

        # 3. Optimizer zero grad
        optimizer.zero_grad()

        # 4. Loss backward
        loss.backward()

        # 5. Optimizer step
        optimizer.step()

    # Calculate loss and accuracy per epoch and print out what's happening
    train_loss /= len(data_loader)
    train_acc /= len(data_loader)
    print(f"Train loss: {train_loss:.5f} | Train accuracy: {train_acc:.2f}%")

def test_step(data_loader: torch.utils.data.DataLoader,
              model: torch.nn.Module,
              loss_fn: torch.nn.Module,
              accuracy_fn,
              device: torch.device = device):
    test_loss, test_acc = 0, 0
    model.to(device)
    model.eval() # put model in eval mode
    # Turn on inference context manager
    with torch.inference_mode(): 
        for X, y in data_loader:
            # Send data to GPU
            X, y = X.to(device), y.to(device)
            
            # 1. Forward pass
            test_pred = model(X)
            
            # 2. Calculate loss and accuracy
            test_loss += loss_fn(test_pred, y)
            test_acc += accuracy_fn(y_true=y,
                y_pred=test_pred.argmax(dim=1) # Go from logits -> pred labels
            )
        
        # Adjust metrics and print out
        test_loss /= len(data_loader)
        test_acc /= len(data_loader)
        print(f"Test loss: {test_loss:.5f} | Test accuracy: {test_acc:.2f}%\n")

## 9. Train the model you built in exercise 8. for 5 epochs on CPU and GPU and see how long it takes on each.

In [None]:
# Import tqdm for progress bar
from tqdm.auto import tqdm

torch.manual_seed(42)

# Measure time
from timeit import default_timer as timer
train_time_start_model_2 = timer()

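# Define the save path here so this cell also works on a fresh top-to-bottom run
# (same path as the save cell below; Path was imported from pathlib earlier)
MODEL_SAVE_PATH = Path("models") / "03_pytorch_computer_vision_model_2.pth"

if not MODEL_SAVE_PATH.exists():
    # Train and test model 
    epochs = 10 # note: the exercise asks for 5 epochs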
    for epoch in tqdm(range(epochs)):
        print(f"Epoch: {epoch}\n---------")
        train_step(data_loader=aircraft_train_dataloader, 
            model=model_2, 
            loss_fn=loss_fn,
            optimizer=optimizer,
            accuracy_fn=accuracy_fn,
            device=device
        )
        test_step(data_loader=aircraft_test_dataloader,
            model=model_2,
            loss_fn=loss_fn,
            accuracy_fn=accuracy_fn,
            device=device
        )

    train_time_end_model_2 = timer()
    total_train_time_model_2 = print_train_time(start=train_time_start_model_2,
                                            end=train_time_end_model_2,
                                            device=device)
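
The exercise asks for a CPU versus GPU comparison, while the cell above times only whichever `device` is active. A minimal sketch of timing both is shown below. It assumes a small stand-in linear model and random data (not the aircraft model), and `time_training` is a hypothetical helper; the GPU run is skipped when CUDA is unavailable.

```python
from timeit import default_timer as timer

import torch
from torch import nn


def time_training(device: str, steps: int = 5) -> float:
    """Times a few optimizer steps of a small stand-in model on `device`."""
    torch.manual_seed(42)
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    X = torch.rand(32, 3, 32, 32, device=device)
    y = torch.randint(0, 10, (32,), device=device)

    start = timer()
    for _ in range(steps):
        loss = loss_fn(model(X), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    return timer() - start


devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
train_times = {d: time_training(d) for d in devices}
for d, seconds in train_times.items():
    print(f"Train time on {d}: {seconds:.3f} seconds")
```

For a model this small the GPU may not be faster, since moving data and launching kernels has overhead; the gap usually favours the GPU as the model and batch sizes grow.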

In [None]:
from pathlib import Path

# 1. Create models directory 
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

# 2. Create model save path 
MODEL_NAME = "03_pytorch_computer_vision_model_2.pth"
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

# 3. Save the model state dict 
if not MODEL_SAVE_PATH.exists():
    print(f"Saving model to: {MODEL_SAVE_PATH}")
    torch.save(obj=model_2.state_dict(), f=MODEL_SAVE_PATH) # saving only the state_dict() stores just the model's learned parameters
    print("Model saved successfully.")
else:
    print("Model already exists.")

## 10. Make predictions using your trained model and visualize at least 5 of them comparing the prediction to the target label.

In [None]:
# Instantiate a fresh instance of AircraftMNISTModelV2
loaded_model_1 = AircraftMNISTModelV2(input_shape=3, 
    hidden_units=64, # used to be 10
    output_shape=len(aircraft_train_data.classes)).to(device)

# Load model state dict 
loaded_model_1.load_state_dict(torch.load(MODEL_SAVE_PATH, map_location=device)) # map_location lets weights saved on GPU load on a CPU-only machine

# Put model to target device (if your data is on GPU, model will have to be on GPU to make predictions)
loaded_model_1.to(device)

print(f"Loaded model:\n{loaded_model_1}")
print(f"Model on device:\n{next(loaded_model_1.parameters()).device}")

In [None]:
# Import torch
import torch

# Import tqdm for progress bar
from tqdm.auto import tqdm

# 1. Make predictions with trained model
y_preds = []
model_2.eval()
with torch.inference_mode():
  for X, y in tqdm(aircraft_test_dataloader, desc="Making predictions"):
    # Send data and targets to target device
    X, y = X.to(device), y.to(device)
    # Do the forward pass
    y_logit = model_2(X)
    # Turn predictions from logits -> prediction probabilities -> predictions labels
    y_pred = torch.softmax(y_logit, dim=1).argmax(dim=1) # note: perform softmax on the "logits" dimension, not "batch" dimension (in this case we have a batch size of 32, so can perform on dim=1)
    # Put predictions on CPU for evaluation
    y_preds.append(y_pred.cpu())
# Concatenate list of predictions into a tensor
y_pred_tensor = torch.cat(y_preds)

In [None]:
y_pred_tensor

## 11. Plot a confusion matrix comparing your model's predictions to the truth labels.

In [None]:
from torchmetrics import ConfusionMatrix
from mlxtend.plotting import plot_confusion_matrix

# 2. Setup confusion matrix instance and compare predictions to targets
confmat = ConfusionMatrix(num_classes=len(class_names), task='multiclass')
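# FGVCAircraft doesn't expose a public `targets` attribute (unlike MNIST), so collect the
# labels from the unshuffled test dataloader; their order matches y_pred_tensor
test_targets = torch.cat([y for _, y in aircraft_test_dataloader])
confmat_tensor = confmat(preds=y_pred_tensor,
                         target=test_targets)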

# 3. Plot the confusion matrix
fig, ax = plot_confusion_matrix(
    conf_mat=confmat_tensor.numpy(), # matplotlib likes working with NumPy 
    class_names=class_names, # turn the row and column labels into class names
    figsize=(10, 7)
)
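
To make clear what the confusion matrix contains, here is a minimal sketch that builds one with plain PyTorch on a toy example. The `confusion_matrix` function and the toy labels are hypothetical, written for illustration; it lays out rows as true labels and columns as predicted labels.

```python
import torch


def confusion_matrix(preds: torch.Tensor, targets: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Returns a [num_classes, num_classes] count matrix: rows = true label, columns = predicted label."""
    # Encode each (target, pred) pair as one integer, then count how often each occurs
    flat_index = targets * num_classes + preds
    counts = torch.bincount(flat_index, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


# Toy example with 3 classes and 6 samples
targets = torch.tensor([0, 0, 1, 1, 2, 2])
preds = torch.tensor([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(preds, targets, num_classes=3)
print(cm)
# tensor([[1, 1, 0],
#         [0, 2, 0],
#         [1, 0, 1]])
```

The diagonal holds the correct predictions, so a strong model shows most of its counts there, and large off-diagonal entries point at pairs of classes the model confuses.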

## 12. Create a random tensor of shape `[1, 3, 64, 64]` and pass it through a `nn.Conv2d()` layer with various hyperparameter settings (these can be any settings you choose), what do you notice if the `kernel_size` parameter goes up and down?
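
One possible way to approach this exercise is sketched below. It assumes `stride=1`, `padding=0` and `out_channels=10` (arbitrary choices for illustration) and only varies `kernel_size`.

```python
import torch
from torch import nn

torch.manual_seed(42)
random_tensor = torch.rand(1, 3, 64, 64)  # [batch_size, color_channels, height, width]

output_shapes = {}
for kernel_size in (1, 3, 5, 7):
    conv_layer = nn.Conv2d(in_channels=3,
                           out_channels=10,
                           kernel_size=kernel_size,
                           stride=1,
                           padding=0)
    output_shapes[kernel_size] = tuple(conv_layer(random_tensor).shape)
    print(f"kernel_size={kernel_size} -> output shape: {output_shapes[kernel_size]}")
```

With no padding and a stride of 1, the output height and width are `64 - kernel_size + 1`, so a larger kernel produces a smaller output (64, 62, 60 and 58 for the kernel sizes above), while the number of output channels stays at `out_channels`.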

## 13. Use a model similar to the trained `model_2` from notebook 03 to make predictions on the test [`torchvision.datasets.FashionMNIST`](https://pytorch.org/vision/main/generated/torchvision.datasets.FashionMNIST.html) dataset. 
* Then plot some predictions where the model was wrong alongside what the label of the image should've been. 
* After visualizing these predictions do you think it's more of a modelling error or a data error? 
* As in, could the model do better or are the labels of the data too close to each other (e.g. a "Shirt" label is too close to "T-shirt/top")?
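
A starting point for this exercise is finding which predictions were wrong. The sketch below assumes stand-in `preds` and `targets` tensors (hypothetical values, not real model output); in practice these would come from a prediction loop over the FashionMNIST test dataloader.

```python
import torch

# FashionMNIST class names, in label order
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Stand-in labels and predictions (assumption): replace with real model outputs
targets = torch.tensor([0, 6, 6, 9, 2, 0])
preds = torch.tensor([0, 0, 6, 7, 2, 6])

# Indices of the samples where the prediction differs from the label
wrong_indices = torch.where(preds != targets)[0]

for idx in wrong_indices.tolist():
    pred_name = class_names[preds[idx].item()]
    true_name = class_names[targets[idx].item()]
    print(f"Sample {idx}: predicted '{pred_name}', label '{true_name}'")
```

The indices in `wrong_indices` can then be used to pull the matching images out of the test dataset and plot them with the predicted and true class names in the title.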