# Computer vision with PyTorch

- Task: Image classification - Task of assigning a label or class to an entire image.

- Data: FashionMNIST - A dataset of Zalando's (online retailer) garment images with their labels.

- Model: A very simple model of three layers for illustrative purposes.

- Framework: PyTorch - A machine learning framework based on the Torch library, used for building deep neural network applications such as computer vision models.

![](figures/ComputerVisionTask.png)

## Step 1. Import useful libraries

In [None]:
# Import PyTorch: Framework for developing code for artificial neural network models
import torch
from torch import nn

# Import torchvision that consists of popular datasets, model architectures, and common image transformations for computer vision.
import torchvision
from torchvision import datasets
from torchvision.transforms import ToTensor

# Import matplotlib for visualisations (static, animated, and interactive)
import matplotlib.pyplot as plt

# Import tqdm for progress bar (loops show a progress meter)
from tqdm.auto import tqdm

# Generate random numbers
import random

## Step 2. Set-up the train and the test data

In this example we are using the FashionMNIST dataset.

It has a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

---
**Note:**

FashionMNIST is a standard benchmark dataset in computer vision and it ships with the `torchvision.datasets` module, hence we load it from there. In real-world applications where we have our own data, we have to load it from local storage or an online source.

In [None]:
# Set-up training data - Look at the data folder that will be created and its contents
train_data = datasets.FashionMNIST(
    root="data",  # location where the downloaded data should be stored
    train=True,  # get the training data
    download=True,  # download the data in the right folder if they don't already exist
    transform=ToTensor(),  # transform images from PIL format to Torch tensors
    target_transform=None,  # optional transformation for the labels (None = leave them unchanged)
)

# Setup testing data
test_data = datasets.FashionMNIST(
    root="data", train=False, download=True, transform=ToTensor()  # get the test data
)

In [None]:
# Check the size of the train and test data
len(train_data.data), len(train_data.targets), len(test_data.data), len(test_data.targets)

In [None]:
# Look at the type of the train and test data
print(type(train_data))
print(type(test_data))

Both datasets have the type `torchvision.datasets.mnist.FashionMNIST`, which is a subclass of `torch.utils.data.Dataset`.
A `Dataset` stores the samples and their corresponding labels.
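
To make the idea of a dataset more concrete, here is a minimal sketch of a custom map-style dataset, the kind of class you might write for your own data. It subclasses `torch.utils.data.Dataset` and implements `__len__` and `__getitem__`. The class name `TinyImageDataset` and the random data are hypothetical and only serve as an illustration.

```python
import torch
from torch.utils.data import Dataset


class TinyImageDataset(Dataset):
    """Hypothetical map-style dataset that keeps images and labels in memory."""

    def __init__(self, images: torch.Tensor, labels: torch.Tensor):
        self.images = images
        self.labels = labels

    def __len__(self) -> int:
        # Number of samples in the dataset
        return len(self.labels)

    def __getitem__(self, idx: int):
        # Return one (image, label) pair, similar to what train_data[0] returns
        return self.images[idx], int(self.labels[idx])


# Synthetic stand-in data: 5 grayscale "images" of 28x28 pixels
tiny_data = TinyImageDataset(torch.rand(5, 1, 28, 28), torch.tensor([0, 3, 9, 1, 3]))
tiny_image, tiny_label = tiny_data[1]
print(len(tiny_data), tiny_image.shape, tiny_label)
```

Indexing with `tiny_data[1]` calls `__getitem__`, which is the same mechanism behind `train_data[0]` in the cells that follow.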

In [None]:
# What are the available 10 classes of garments?
class_names = train_data.classes
class_names

In [None]:
# Let's look at the type of a sample image, its content and its shape
image, label = train_data[0]
image, label

print("The type of the image is a tensor:", type(image))
print(image)
print(label)

print(image.shape)


In [None]:
# Let's visualise one image
image, label = train_data[3]
plt.imshow(image.squeeze(), cmap="gray")
plt.title(class_names[label])

### Create the dataloaders

A `Dataset` (`torch.utils.data.Dataset`) stores the samples and their corresponding labels. However, while training a model we need to handle the data efficiently, e.g. iterate over it and split it into batches.

Hence, we wrap the `Dataset` in a `DataLoader`. A `DataLoader` wraps an iterable around the `Dataset` to enable easy access to the samples, and it takes care of batching and shuffling for us.
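
As a small illustration of batching, the sketch below wraps a synthetic dataset (random tensors standing in for FashionMNIST, not the real data) in a `DataLoader` and lists the size of each batch. The `demo_` names are made up for this example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for FashionMNIST: 100 random 28x28 "images" with labels 0-9
demo_dataset = TensorDataset(torch.rand(100, 1, 28, 28), torch.randint(0, 10, (100,)))
demo_loader = DataLoader(demo_dataset, batch_size=32, shuffle=False)

# 100 samples in batches of 32 -> 3 full batches plus one final batch of 4
demo_batch_sizes = [features.shape[0] for features, labels in demo_loader]
print(len(demo_loader), demo_batch_sizes)  # 4 [32, 32, 32, 4]
```

The last batch is smaller whenever the dataset size is not a multiple of the batch size. Passing `drop_last=True` to the `DataLoader` would discard it instead.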

In [None]:
from torch.utils.data import DataLoader

# Setup the batch size hyperparameter
BATCH_SIZE = 32

# Turn datasets into iterables (batches)
train_dataloader = DataLoader(
    train_data,  # dataset to turn into iterable
    batch_size=BATCH_SIZE,  # how many samples per batch?
    shuffle=True,  # shuffle data every epoch?
)

test_dataloader = DataLoader(
    test_data,
    batch_size=BATCH_SIZE,
    shuffle=False,  # don't necessarily have to shuffle the testing data
)

In [None]:
# Let's check out what we've created
print(f"Dataloaders: {train_dataloader, test_dataloader}")
print(f"Length of train dataloader: {len(train_dataloader)} batches of {BATCH_SIZE}")
print(f"Length of test dataloader: {len(test_dataloader)} batches of {BATCH_SIZE}")

In [None]:
# Check out what's inside the training dataloader
train_features_batch, train_labels_batch = next(iter(train_dataloader))
train_features_batch.shape, train_labels_batch.shape

## Step 3. Modelling

### a. Identify the available device

Training a deep learning model involves a very large number of arithmetic operations, mostly matrix multiplications. GPUs (Graphics Processing Units) run many of these operations in parallel and speed up the training process. Hence, we check whether a GPU is available and use it; otherwise we proceed with our CPU.

**Note:**

With widely used benchmark datasets and simple models, such as the ones in this example, training on a CPU completes quite quickly. In real-world problems, training even on a GPU might require hours or even days, so training on a CPU would be impractically slow.

In [None]:
# Setup device agnostic code
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
device
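
Selecting a device is only the first half: the model and its input tensors must then be moved to that device with `.to(...)`. The sketch below shows this on a tiny stand-in model; the `demo_` names and tensor sizes are arbitrary choices for illustration.

```python
import torch
from torch import nn

demo_device = "cuda" if torch.cuda.is_available() else "cpu"

# A model and its inputs must live on the same device before the forward pass
demo_model = nn.Linear(in_features=4, out_features=2).to(demo_device)
demo_input = torch.rand(3, 4).to(demo_device)
demo_output = demo_model(demo_input)

print(demo_output.device.type, demo_output.shape)
```

If the model and the input sit on different devices, the forward pass raises a runtime error, which is why both are moved explicitly.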

### b. Define the model

First, for illustrative purposes, the cell below defines a deliberately trivial model (it would never be used in real-life problems) that consists of a single layer, to show how passing an input through a layer affects its shape. Next, we will create a 'more complex' model.

In [None]:
# Create a flatten layer
flatten_model = (
    nn.Flatten()
)  # all nn modules function as a model (can do a forward pass)

# Get a single sample
x = train_features_batch[0]

# Flatten the sample
output = flatten_model(x)  # perform forward pass

# Print out what happened
print(f"Shape before flattening: {x.shape} -> [color_channels, height, width]")
print(f"Shape after flattening: {output.shape} -> [color_channels, height*width]")
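
The same layer also works on a whole batch. As a brief sketch with random tensors (not real images), `nn.Flatten()` with its default arguments keeps the first dimension and flattens all the remaining ones:

```python
import torch
from torch import nn

demo_flatten = nn.Flatten()  # defaults: start_dim=1, end_dim=-1

demo_single = torch.rand(1, 28, 28)          # [color_channels, height, width]
demo_image_batch = torch.rand(32, 1, 28, 28)  # [batch, color_channels, height, width]

print(demo_flatten(demo_single).shape)       # torch.Size([1, 784])
print(demo_flatten(demo_image_batch).shape)  # torch.Size([32, 784])
```

For a batch, each image becomes one row of 784 values, which is the input size the first linear layer of the next model expects.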


Now, let's define a more 'complex' model

In [None]:
from torch import nn


class FashionMNISTBaselineModel(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(),  # neural networks like their inputs in vector form
            nn.Linear(
                in_features=input_shape, out_features=hidden_units
            ),  # in_features = number of features in a data sample (784 pixels)
            nn.Linear(in_features=hidden_units, out_features=output_shape),
        )

    def forward(self, x):
        return self.layer_stack(x)

Having defined the model class, we instantiate an object of that class by passing the appropriate arguments.

In [None]:
# Need to setup model with input parameters
model_0 = FashionMNISTBaselineModel(
    input_shape=784,  # one for every pixel (28x28)
    hidden_units=10,  # how many units in the hidden layer
    output_shape=len(class_names),  # one for every class
)
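
A quick way to sanity-check a model like this is to pass a dummy batch through it and inspect the output shape. The sketch below uses a stand-in `nn.Sequential` with the same layer layout (flatten, 784 to 10, 10 to 10) and random input, so it can run on its own; the names are illustrative only.

```python
import torch
from torch import nn

# Stand-in with the same layer layout as the baseline model defined above
shape_check_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(in_features=784, out_features=10),
    nn.Linear(in_features=10, out_features=10),
)

dummy_batch = torch.rand(32, 1, 28, 28)  # 32 random "images"
dummy_logits = shape_check_model(dummy_batch)
print(dummy_logits.shape)  # torch.Size([32, 10]): one row per image, one logit per class
```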

### c. Setup the loss and the optimiser

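Cross-entropy loss measures how well the class probabilities predicted by a classification model match the true labels. It is a non-negative number with no upper bound: it is close to 0 when the model assigns a high probability to the correct class, and it grows as the predictions get worse. The closer to 0, the better the model.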
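
To see what this loss computes, the sketch below compares `nn.CrossEntropyLoss` with a manual calculation on two hand-picked samples. The logits are made-up values chosen so that one prediction is confidently right and the other confidently wrong.

```python
import torch
from torch import nn

ce_loss_fn = nn.CrossEntropyLoss()

# Raw logits for 2 samples and 3 classes; the targets are class indices
ce_logits = torch.tensor([[4.0, 0.0, 0.0],   # confident and correct (target 0)
                          [4.0, 0.0, 0.0]])  # confident but wrong (target 2)
ce_targets = torch.tensor([0, 2])

# Manual computation: -log(softmax probability of the true class), per sample
ce_log_probs = torch.log_softmax(ce_logits, dim=1)
ce_per_sample = -ce_log_probs[torch.arange(2), ce_targets]

print(ce_per_sample)  # small for the first sample, larger than 1 for the second
print(ce_per_sample.mean(), ce_loss_fn(ce_logits, ce_targets))  # the two values agree
```

The loss function averages the per-sample values by default, and a confidently wrong prediction contributes a loss well above 1.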

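The optimizer is an algorithm that adjusts the parameters of the neural network (i.e. its weights and biases) in order to reduce the overall loss and improve the accuracy. The learning rate (`lr`) is not one of these learned parameters: it is a hyperparameter that we choose, and it controls the size of each update step.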


In [None]:
# Setup loss function and optimizer
loss_fn = (
    nn.CrossEntropyLoss()
)  # this is also called "criterion"/"cost function"
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.1)
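
For intuition about what the optimizer does, the sketch below applies one plain SGD step to a toy two-element weight tensor and compares it with the textbook update `weight - lr * gradient`. The toy loss and all `sgd_` names are assumptions made for this illustration, not part of the model above.

```python
import torch

sgd_lr = 0.1
sgd_weight = torch.tensor([1.0, -2.0], requires_grad=True)
sgd_optimizer = torch.optim.SGD(params=[sgd_weight], lr=sgd_lr)

# Toy loss: sum of squares, whose gradient is 2 * weight
sgd_loss = (sgd_weight ** 2).sum()
sgd_optimizer.zero_grad()
sgd_loss.backward()

# What plain SGD should produce: weight - lr * gradient
sgd_expected = (sgd_weight - sgd_lr * sgd_weight.grad).detach().clone()
sgd_optimizer.step()

print(sgd_weight.detach(), sgd_expected)  # both are [0.8, -1.6]
```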


### d. Define the accuracy

The accuracy is used to measure how well the model performs. It is the percentage of predictions that match the true labels.

In [None]:
# Calculate the accuracy (a classification metric)
def accuracy_fn(y_true, y_pred):
    """Calculates accuracy between truth labels and predictions.

    Args:
        y_true (torch.Tensor): Truth labels for predictions.
        y_pred (torch.Tensor): Predictions to be compared to the truth labels.

    Returns:
        float: Accuracy value between y_true and y_pred as a percentage, e.g. 78.45
    """
    correct = torch.eq(y_true, y_pred).sum().item()
    acc = (correct / len(y_pred)) * 100
    return acc

**Note** The *accuracy* measures the percentage of correct predictions made by a model, while the *Cross Entropy* measures the difference between the predicted output and the ground truth.
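
The difference between the two metrics shows up when predictions are correct but not equally confident. In the sketch below, two made-up sets of logits both classify every sample correctly, yet their cross-entropy values differ.

```python
import torch
from torch import nn

cmp_loss_fn = nn.CrossEntropyLoss()
cmp_targets = torch.tensor([0, 1])

# Both sets of logits pick the correct class for every sample
cmp_logit_sets = {
    "confident": torch.tensor([[5.0, 0.0], [0.0, 5.0]]),
    "hesitant": torch.tensor([[0.1, 0.0], [0.0, 0.1]]),
}

cmp_results = {}
for cmp_name, cmp_logits in cmp_logit_sets.items():
    cmp_acc = (cmp_logits.argmax(dim=1) == cmp_targets).float().mean().item() * 100
    cmp_loss = cmp_loss_fn(cmp_logits, cmp_targets).item()
    cmp_results[cmp_name] = (cmp_acc, cmp_loss)
    print(f"{cmp_name}: accuracy={cmp_acc:.0f}% | cross entropy={cmp_loss:.4f}")
```

Both rows report 100% accuracy, but the hesitant predictions have a much higher loss. This is why the loss is a more fine-grained training signal than the accuracy.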

### e. Create the training and the test loop

This is where we train and test our model

In [None]:
# Set the number of epochs
epochs = 3 # We use a small number to have faster training time.

# Create training and testing loop
for epoch in tqdm(range(epochs)): #tqdm visualises the progress of the loop
    print(f"Epoch: {epoch}\n")

    ### Training loop
    train_loss = 0
    model_0.to(device)  # make sure the model lives on the selected device
    model_0.train()  # put the model in training mode

    for X, y in train_dataloader:
        # Send the batch to the same device as the model
        X, y = X.to(device), y.to(device)

        # 1. Forward pass - calculation of the values of the output layers from the input data
        y_pred = model_0(X)

        # 2. Calculate loss (per batch)
        loss = loss_fn(y_pred, y)
        train_loss += loss.item()  # accumulate the loss per epoch as a plain float

        # 3. For the Optimizer set gradients of all model parameters to zero
        optimizer.zero_grad()

        # 4. Loss backward - backpropagation (computes dloss/dx for every parameter x)
        loss.backward()

        # 5. Optimizer step (performs a single optimization step - parameter update)
        optimizer.step()

    # Calculate the average loss per batch per epoch
    train_loss /= len(train_dataloader)

    ### Testing loop
    # Initialise the variables that will accumulate loss and accuracy
    test_loss, test_acc = 0, 0

    # Layers and parts of the model behave differently during training and inference
    model_0.eval()
    # Context manager that disables gradient tracking, which is not needed during evaluation
    with torch.inference_mode():
        for X, y in test_dataloader:
            # Send the batch to the same device as the model
            X, y = X.to(device), y.to(device)

            # 1. Forward pass - calculation of the values of the output layers from the input data
            test_pred = model_0(X)

            # 2. Calculate the loss
            test_loss += loss_fn(test_pred, y).item()  # accumulate the loss per epoch

            # 3. Calculate the accuracy
            test_acc += accuracy_fn(y_true=y, y_pred=test_pred.argmax(dim=1)) # accumulate per epoch

        # Average the accumulated metrics (test loss and test accuracy) over the batches
        test_loss /= len(test_dataloader) # per batch, as the len(test_dataloader) = no of batches
        test_acc /= len(test_dataloader) # per batch

    # Print out the train and the test loss, and the accuracy.
    # As the epochs progress the losses should be reducing and the accuracy increasing
    print(f"\nTrain loss: {train_loss:.4f} | Test loss: {test_loss:.4f}, Test acc: {test_acc:.2f}%\n")
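
The five steps of the training loop can also be tried in isolation. The following is a minimal sketch on synthetic, linearly separable data with a single linear layer, so it runs in a fraction of a second; the data, the `toy_` names and the number of steps are assumptions made for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic data: the label depends only on the sign of the first feature
toy_X = torch.randn(64, 4)
toy_y = (toy_X[:, 0] > 0).long()

toy_model = nn.Linear(in_features=4, out_features=2)
toy_loss_fn = nn.CrossEntropyLoss()
toy_optimizer = torch.optim.SGD(params=toy_model.parameters(), lr=0.1)

toy_losses = []
toy_model.train()
for step in range(50):
    toy_logits = toy_model(toy_X)              # 1. forward pass
    toy_loss = toy_loss_fn(toy_logits, toy_y)  # 2. calculate the loss
    toy_optimizer.zero_grad()                  # 3. reset the gradients
    toy_loss.backward()                        # 4. backpropagation
    toy_optimizer.step()                       # 5. parameter update
    toy_losses.append(toy_loss.item())         # store a plain float, not the tensor

print(f"first loss: {toy_losses[0]:.4f} | last loss: {toy_losses[-1]:.4f}")
```

Storing `toy_loss.item()` rather than the tensor itself keeps only the number and avoids holding on to the computation graph of every step.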

## Step 4. Make predictions

In [None]:
def make_predictions(model: torch.nn.Module, img: torch.Tensor, device: str = device):
    pred_probs = []
    model.to(device)  # the model and the image must be on the same device
    model.eval()
    with torch.inference_mode():
        # Prepare the sample image
        img = torch.unsqueeze(img, dim=0).to(device)  # Add a batch dimension and send sample to device

        # Forward pass to get the predictions
        # The model outputs raw logits which are unnormalised and not easy to interpret
        pred_logit = model(img)
        # print(f"\nLogits: {pred_logit}")  # Uncomment to see what the logits look like

        # Get the prediction probabilities using Softmax
        pred_prob = torch.softmax(pred_logit.squeeze(), dim=0)

        # Move the predictions back to the CPU so that later calculations (e.g. plotting) work
        pred_probs.append(pred_prob.cpu())

    # Stack the pred_probs to turn them into a tensor
    return torch.stack(pred_probs)
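
As a small worked example of the softmax step, the sketch below converts three made-up logits into probabilities. The values are arbitrary and only illustrate the properties that matter here.

```python
import torch

sm_logits = torch.tensor([2.0, -1.0, 0.5])
sm_probs = torch.softmax(sm_logits, dim=0)

print(sm_probs, sm_probs.sum())  # values between 0 and 1 that sum to 1
print(sm_logits.argmax().item(), sm_probs.argmax().item())  # same index: 0 0
```

Softmax preserves the ordering of the logits, so the predicted class is the same whether `argmax` is applied to the logits or to the probabilities. The probabilities are simply easier to interpret.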

In [None]:
# Make predictions on test samples with the baseline model

# Get a random index for an image and select the image from the test_data
# (random.randrange excludes the upper bound, so the index is always valid)
random_garment_idx = random.randrange(len(test_data))
sample_image, ground_truth_label = test_data[random_garment_idx]

# Calculate the predicted probabilities by calling the `make_predictions` function that we previously defined
pred_probs = make_predictions(model=model_0, img=sample_image)

# Get the predicted class which is the one that had the highest probability
pred_class = pred_probs.argmax(dim=1)
print(f" The predicted class is: {pred_class.item()}, {class_names[pred_class.item()]}")

In [None]:
# Check what is the actual ground truth label.
# If it is the same as the predicted, the model had a correct prediction.
print(f" The actual class is: {ground_truth_label}, {class_names[ground_truth_label]}")

## Further improvements, additional steps and points to consider

- Save the model for further usage using `torch.save()` and saving the `state_dict()` of the model.

- Use functions for code re-usability

- Use more sophisticated NN architectures

- Use experiment tracking e.g. MLflow

- Compare model performance for various models to select the best performing model

- Use pre-trained models + transfer learning + fine-tuning to make them more specific to the task of interest

- In deployment: track model performance over time, re-train
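
As an illustration of the first point in the list above, here is a minimal sketch of saving and re-loading a model's `state_dict()`. It uses a tiny stand-in layer and a temporary directory so that it leaves no files behind; the file name `demo_model.pth` is a hypothetical choice.

```python
import os
import tempfile

import torch
from torch import nn

saved_model = nn.Linear(in_features=4, out_features=2)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "demo_model.pth")  # hypothetical file name
    torch.save(saved_model.state_dict(), model_path)

    # Re-create the same architecture, then load the saved parameters into it
    loaded_model = nn.Linear(in_features=4, out_features=2)
    loaded_model.load_state_dict(torch.load(model_path))

check_input = torch.rand(3, 4)
print(torch.equal(saved_model(check_input), loaded_model(check_input)))  # True
```

Saving only the `state_dict()` stores the learned parameters, so the model class must be available (and instantiated with the same arguments) when loading.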