# 07. PyTorch Experiment Tracking

https://www.learnpytorch.io/07_pytorch_experiment_tracking/

## Table of Contents

- [All Links in Document](#links)
- [Prepare Data and Model](#prep)
- [Training a Model](#trainmodel)
- [Using TensorBoard](#usetb)

## All Links in Document <a name="links" />

- https://www.tensorflow.org/tensorboard/
- https://code.visualstudio.com/docs/datascience/pytorch-support#_tensorboard-integration

## PyTorch Experiment Tracking

In [1]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import torch
from torch import nn
import torchvision
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from torchmetrics import Accuracy
from torchinfo import summary
import os
from PIL import Image
from pathlib import Path

In [2]:
print(torch.__version__)
print(torchvision.__version__)

1.13.1+cu117
0.14.1+cu117


So far, many models have been created, experimented with, and improved over multiple iterations, and their results were tracked by manually creating dictionaries. This approach is simple and effective, but basic: it quickly becomes unsustainable as models grow, or when many models are run simultaneously.

Experiment tracking is integral to machine learning and deep learning because both are inherently experimental. As settings change from one experiment to the next, it's necessary to record what was changed and what resulted.

<img src="images/07_experiment_tracking_examples.png" />

The image above illustrates several examples of experiment tracking. As stated, tracking can be done manually with dictionaries and files, but more professional tools are available too. This notebook focuses on TensorBoard due to its widespread use and integration with PyTorch. It's free to use and only requires installing the `tensorboard` package. More information about TensorBoard can be found on the TensorFlow website: https://www.tensorflow.org/tensorboard/

The principles to be followed remain the same overall, but some new elements will be included:

- Acquire data
- Create datasets and dataloaders
- Load and customize a pretrained model
- Train model and track results
- View results in TensorBoard
- Create a helper function to track experiments
- Set up a series of modelling experiments
- View modelling experiments' results in TensorBoard
- Load the best performing model and make predictions with it

*NOTE: Code is handwritten again for even more practice!*

## Prepare Data and Model <a name="prep" />

In [3]:
# Creating data paths
IMAGE_PATH = Path("data/PizzaSteakSushi")
TRAIN_DIR = IMAGE_PATH / "train"
TEST_DIR = IMAGE_PATH / "test"

Because a pretrained model is used, the datasets must apply the same transforms the model was trained with. It's therefore necessary to load the pretrained model and its weights first.

In [4]:
# Loading pretrained model and weights
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
model = torchvision.models.efficientnet_b0(weights=weights)

In [5]:
# Loading the pretrained model's transforms
auto_transforms = weights.transforms()
auto_transforms

ImageClassification(
    crop_size=[224]
    resize_size=[256]
    mean=[0.485, 0.456, 0.406]
    std=[0.229, 0.224, 0.225]
    interpolation=InterpolationMode.BICUBIC
)

In [6]:
# Create datasets
train_data = datasets.ImageFolder(root=TRAIN_DIR, transform=auto_transforms)
test_data = datasets.ImageFolder(root=TEST_DIR, transform=auto_transforms)

In [7]:
# Create dataloaders and class names
train_dataloader = DataLoader(dataset=train_data, batch_size=32, num_workers=os.cpu_count(), shuffle=True)
test_dataloader = DataLoader(dataset=test_data, batch_size=32, num_workers=os.cpu_count(), shuffle=False)
class_names = train_data.classes

In [8]:
# Freezing the model's base layers
for param in model.features.parameters():
    param.requires_grad = False

In [9]:
# Checking model's default classifier
model.classifier

Sequential(
  (0): Dropout(p=0.2, inplace=True)
  (1): Linear(in_features=1280, out_features=1000, bias=True)
)

In [10]:
# Updating model's default classifier to suit the custom problem
model.classifier = nn.Sequential(
    nn.Dropout(p=0.2, inplace=True),
    nn.Linear(in_features=1280, out_features=len(class_names), bias=True)
)

In [11]:
# Checking model's updated classifier
model.classifier

Sequential(
  (0): Dropout(p=0.2, inplace=True)
  (1): Linear(in_features=1280, out_features=3, bias=True)
)

In [12]:
# Checking summary of the model
summary(model=model,
        input_size=[32, 3, 224, 224],
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=19, 
        device="cpu")

Layer (type:depth-idx)                                  Input Shape         Output Shape        Param #             Trainable
EfficientNet                                            [32, 3, 224, 224]   [32, 3]             --                  Partial
├─Sequential: 1-1                                       [32, 3, 224, 224]   [32, 1280, 7, 7]    --                  False
│    └─Conv2dNormActivation: 2-1                        [32, 3, 224, 224]   [32, 32, 112, 112]  --                  False
│    │    └─Conv2d: 3-1                                 [32, 3, 224, 224]   [32, 32, 112, 112]  (864)               False
│    │    └─BatchNorm2d: 3-2                            [32, 32, 112, 112]  [32, 32, 112, 112]  (64)                False
│    │    └─SiLU: 3-3                                   [32, 32, 112, 112]  [32, 32, 112, 112]  --                  --
│    └─Sequential: 2-2                                  [32, 32, 112, 112]  [32, 16, 112, 112]  --                  False
│    │    └─MBConv: 3

The summary shows that most of the parameters are non-trainable (they were frozen earlier) and that the output layer now produces three outputs, matching the three classes of the PizzaSteakSushi dataset.
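
The trainable/frozen split reported by the summary can also be checked by counting parameters directly. The following is a minimal sketch: `count_parameters` is a hypothetical helper (not part of PyTorch or `torchinfo`), and `toy_model` is a tiny stand-in used for illustration rather than the EfficientNet-B0 loaded above.

```python
from torch import nn


def count_parameters(model: nn.Module) -> tuple[int, int]:
    """Return (trainable, frozen) parameter counts for a model."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    return trainable, frozen


# Tiny stand-in model (an assumption for illustration, not EfficientNet-B0)
toy_model = nn.Sequential(
    nn.Linear(in_features=8, out_features=4),  # plays the role of the frozen base layers
    nn.Linear(in_features=4, out_features=3),  # plays the role of the new classifier
)

# Freeze the first layer the same way the base layers were frozen above
for param in toy_model[0].parameters():
    param.requires_grad = False

trainable, frozen = count_parameters(toy_model)
print(f"Trainable: {trainable:,} | Frozen: {frozen:,}")
```

Applied to the notebook's `model`, the trainable count should equal the size of the new classifier, i.e. 1280 × 3 weights plus 3 biases = 3,843 parameters.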

## Training a Model <a name="trainmodel" />

In [13]:
# Creating loss function, optimizer, and metric
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)
metric_fn = Accuracy(task="multiclass", num_classes=len(class_names))

Previously, this is where multiple lists and dictionaries would be created to keep track of results during the training and testing loops. This is where `SummaryWriter` from `torch.utils.tensorboard` comes in. By default, `SummaryWriter` saves information about a model to the directory set by its `log_dir` parameter. The default location for these logs is `runs/CURRENT_DATETIME_HOSTNAME`, where `HOSTNAME` is the name of the computer. The location can be customized by passing a different `log_dir`.

The outputs of the `SummaryWriter()` are saved in TensorBoard format.
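
When many experiments are run, a descriptive `log_dir` makes runs easier to tell apart than the default timestamp-and-hostname name. The following is a minimal sketch of one possible naming scheme: `build_log_dir` and the experiment names used here are hypothetical and only illustrate the idea.

```python
from datetime import datetime
from pathlib import Path


def build_log_dir(experiment_name: str, model_name: str, extra: str = "", root: str = "runs") -> Path:
    """Build a path of the form root/YYYY-MM-DD/experiment_name/model_name[/extra]."""
    date_stamp = datetime.now().strftime("%Y-%m-%d")
    parts = [root, date_stamp, experiment_name, model_name]
    if extra:
        parts.append(extra)
    return Path(*parts)


# Hypothetical experiment and model names, chosen only for illustration
log_dir = build_log_dir("data_10_percent", "effnetb0", extra="5_epochs")
print(log_dir)

# The path can then be handed to the writer, e.g.:
# writer = SummaryWriter(log_dir=str(log_dir))
```

Grouping by date, experiment, and model keeps related runs together in the `runs/` directory, so they appear side by side in TensorBoard.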

In [14]:
# Create a TensorBoard writer with all default settings
writer = SummaryWriter()

To make use of this writer, the training and testing loops need several changes. Loss and accuracy values can be saved with the writer's `add_scalars(main_tag, tag_scalar_dict, global_step)` method, where:

- `main_tag` (string) is the name for the scalars being tracked, e.g., "Accuracy"
- `tag_scalar_dict` (dict) is a dictionary of the values being tracked, e.g., `{"train_loss": 0.1234}`
- `global_step` (int) is the step at which the values are recorded, here the epoch number

The method is called `add_scalars()` because loss and accuracy values are generally scalars (single values). Once training is finished, calling `writer.close()` flushes any pending events to disk and closes the writer.
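
Before wiring the writer into the full loop, the calls can be tried in isolation. The following is a minimal sketch that assumes the `tensorboard` package is installed; the loss values are made up for illustration and the logs go to a temporary directory rather than `runs/`.

```python
import tempfile
from pathlib import Path

from torch.utils.tensorboard import SummaryWriter

# Write to a throwaway directory so the real runs/ folder stays clean
log_dir = Path(tempfile.mkdtemp()) / "demo_run"
writer = SummaryWriter(log_dir=str(log_dir))

# Made-up (train_loss, test_loss) pairs, one per epoch, purely for illustration
fake_history = [(1.10, 0.95), (0.90, 0.85), (0.75, 0.70)]

for epoch, (train_loss, test_loss) in enumerate(fake_history):
    # Both values are grouped under the "Loss" tag and share one chart in TensorBoard
    writer.add_scalars(main_tag="Loss",
                       tag_scalar_dict={"train_loss": train_loss, "test_loss": test_loss},
                       global_step=epoch)

# Flush pending events and close the writer
writer.close()

# Event files are written under log_dir (add_scalars may create subdirectories per tag)
event_files = sorted(p.name for p in log_dir.rglob("events.out.tfevents.*"))
print(f"Wrote {len(event_files)} event file(s) under {log_dir}")
```

Pointing TensorBoard at that temporary directory would show a single "Loss" chart with one curve per dictionary key.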

In [15]:
# Updated training and testing loop to accommodate SummaryWriter
torch.manual_seed(42)
NUM_EPOCHS = 5

for epoch in range(NUM_EPOCHS):
    train_loss, train_acc = 0, 0
    test_loss, test_acc = 0, 0
    
    model.train()
    for batch, (X, y) in enumerate(train_dataloader):
        y_logits = model(X)
        y_pred = torch.softmax(y_logits, dim=1).argmax(dim=1)
        loss = loss_fn(y_logits, y)
        train_loss += loss.item()  # accumulate a plain float so the autograd graph is not kept alive
        train_acc += metric_fn(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
    train_loss /= len(train_dataloader)
    train_acc /= len(train_dataloader)
    
    model.eval()
    with torch.inference_mode():
        for batch, (X, y) in enumerate(test_dataloader):
            test_logits = model(X)
            test_pred = torch.softmax(test_logits, dim=1).argmax(dim=1)
            test_loss += loss_fn(test_logits, y)
            test_acc += metric_fn(test_pred, y)
            
        test_loss /= len(test_dataloader)
        test_acc /= len(test_dataloader)
    
    print(f"Epoch: {epoch}")
    print(f"Train Loss: {train_loss:.5f}, Train Acc: {train_acc:.2f} | Test Loss: {test_loss:.5f}, Test Acc: {test_acc:.2f}")

    writer.add_scalars(main_tag="Loss", global_step=epoch, tag_scalar_dict={"train_loss": train_loss, 
                                                                            "test_loss": test_loss})
    
    writer.add_scalars(main_tag="Accuracy", global_step=epoch, tag_scalar_dict={"train_acc": train_acc, 
                                                                                "test_acc": test_acc})
    
    writer.add_graph(model=model, input_to_model=torch.randn(32, 3, 224, 224))

writer.close()

Epoch: 0
Train Loss: 1.10574, Train Acc: 0.37 | Test Loss: 0.92652, Test Acc: 0.61
Epoch: 1
Train Loss: 0.91985, Train Acc: 0.66 | Test Loss: 0.84583, Test Acc: 0.64
Epoch: 2
Train Loss: 0.79064, Train Acc: 0.75 | Test Loss: 0.67777, Test Acc: 0.91
Epoch: 3
Train Loss: 0.68648, Train Acc: 0.79 | Test Loss: 0.64276, Test Acc: 0.86
Epoch: 4
Train Loss: 0.65532, Train Acc: 0.80 | Test Loss: 0.62078, Test Acc: 0.85


## Using TensorBoard <a name="usetb" />

The `SummaryWriter()` class stores the model's results in a directory called `runs/` in TensorBoard format by default. TensorBoard is a visualization program created by the TensorFlow team to view and inspect information about models and data. It's possible to view this TensorBoard in multiple environments:

- Visual Studio Code
- Jupyter and Colab Notebooks

For the Visual Studio Code method, press `SHIFT + CMD + P` (or `CTRL + SHIFT + P` on Windows and Linux) to open the command palette and search for the command: `Python: Launch TensorBoard`. More information about using Visual Studio Code for this purpose can be found here: https://code.visualstudio.com/docs/datascience/pytorch-support#_tensorboard-integration

For the Jupyter and Colab Notebooks method, the `tensorboard` package must be installed. The `%load_ext tensorboard` magic then loads the notebook extension, and `%tensorboard --logdir runs` starts an interactive TensorBoard session that reads the log files in the `runs/` directory:

In [16]:
%load_ext tensorboard
%tensorboard --logdir runs