# 07 PyTorch Experiment Tracking

Machine Learning is very experimental.

To figure out which ML experiments are worth pursuing, you need **experiment tracking**.

It helps you figure out what doesn't work and what does work.

In this notebook, we're going to see an example of programmatically tracking experiments.

In [1]:
import torch
import torchvision

print(torch.__version__)
print(torchvision.__version__)

2.3.1+cu121
0.18.1+cu121


In [2]:
# Continue with regular imports
import matplotlib.pyplot as plt
import torch
import torchvision

from torch import nn
from torchvision import transforms

# Try to get torchinfo, install if it doesn't work
try:
    from torchinfo import summary
    print("torchinfo imported.")
except ImportError:
    print("[INFO] Couldn't find torchinfo... installing it")
    !pip install torchinfo
    from torchinfo import summary

# Try to import the going_modular directory, report it if it can't be found
try:
    from going_modular import data_setup, engine, get_data
    print("going_modular modules imported successfully.")
except ImportError:
    # The going_modular scripts are missing; later cells that rely on them will fail
    print("Going modular not found.")

[INFO] Couldn't find torchinfo... installing it
Collecting torchinfo
  Downloading torchinfo-1.8.0-py3-none-any.whl.metadata (21 kB)
Downloading torchinfo-1.8.0-py3-none-any.whl (23 kB)
Installing collected packages: torchinfo
Successfully installed torchinfo-1.8.0
Going modular not found.


In [6]:
# Setup device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cpu'

In [4]:
# Set seeds
def set_seeds(seed: int=42):
    """Sets random seeds for torch operations.

    Args:
        seed (int, optional): Random seed to set. Defaults to 42.
    """
    # Set the seed for general torch operations
    torch.manual_seed(seed)
    # Set the seed for CUDA torch operations (ones that happen on the GPU)
    torch.cuda.manual_seed(seed)

set_seeds()
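
As a quick check of what seeding buys you, here is a minimal sketch that is not part of the original notebook. It re-seeds before each draw and compares the random tensors. The helper name `seed_everything_demo` is hypothetical and simply mirrors `set_seeds()` above so the block runs on its own.

```python
import torch

def seed_everything_demo(seed: int = 42) -> None:
    """Hypothetical stand-in for set_seeds() so this block is self-contained."""
    torch.manual_seed(seed)
    # Silently ignored when CUDA is not available
    torch.cuda.manual_seed(seed)

# Same seed before each draw -> identical random tensors
seed_everything_demo(42)
first_draw = torch.rand(3)
seed_everything_demo(42)
second_draw = torch.rand(3)

# A different seed -> a different random tensor
seed_everything_demo(43)
third_draw = torch.rand(3)

same_after_reseed = torch.equal(first_draw, second_draw)
differs_with_new_seed = not torch.equal(first_draw, third_draw)
print(f"Same seed, same values: {same_after_reseed}")
print(f"Different seed, different values: {differs_with_new_seed}")
```

Seeding only makes the random number stream repeatable. Some GPU operations can still be nondeterministic, so treat this as a baseline for comparing experiments rather than a guarantee of identical results.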

## 1. Get data

We want to get pizza, steak and sushi images so we can run experiments building FoodVision Mini and see which model performs best.

In [5]:
train_dir, test_dir = get_data.download_data(
    raw_url_to_dataset="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip"
)

NameError: name 'get_data' is not defined

## 2. Create Datasets and DataLoaders

### 2.1 Create DataLoaders with manual transforms

The goal of transforms is to format your custom data reproducibly and in a way that suits pretrained models.

In [None]:
# Setup the directories
train_dir, test_dir

In [None]:
# Set up ImageNet normalization levels
# See here for documentation: https://pytorch.org/vision/0.12/models.html
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

manual_transform = transforms.Compose([
    transforms.Resize(size = (224,224)),
    transforms.ToTensor(),
    normalize
])

print(f"Manually created transforms: {manual_transform}")

# Create DataLoaders
from going_modular import data_setup

train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir,
                                                                               test_dir=test_dir,
                                                                               train_transform=manual_transform,
                                                                               test_transform=manual_transform,
                                                                               batch_size=32,
                                                                               num_workers=0)

train_dataloader, test_dataloader, class_names
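
To make the normalization step in `manual_transform` concrete, the sketch below applies the same per-channel arithmetic, `(value - mean) / std`, to a single RGB pixel in plain Python. The helper `normalize_pixel` is hypothetical and only illustrates the formula; it is not how torchvision implements `transforms.Normalize`.

```python
# ImageNet channel statistics, the same values passed to transforms.Normalize above
imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Apply (value - mean) / std to each channel of one RGB pixel scaled to [0, 1]."""
    return [(value - mean) / std
            for value, mean, std in zip(rgb, imagenet_mean, imagenet_std)]

# A pixel sitting exactly at the channel means maps to zero in every channel
centered_pixel = normalize_pixel(imagenet_mean)

# A pure white pixel lands a little over two standard deviations above the mean
white_pixel = normalize_pixel([1.0, 1.0, 1.0])

print(centered_pixel)
print([round(value, 3) for value in white_pixel])
```

Because `ToTensor()` scales pixel values to the range [0, 1] before `normalize` runs, the order of the transforms in `manual_transform` matters.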

### 2.2 Creating DataLoaders using automatically created transforms

The same principle applies for automatic transforms: we want our custom data in the same format as the data the pretrained model was trained on.

In [None]:
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT # "DEFAULT" means best available weights
weights

In [None]:
# Get the transforms used to create our pretrained weights
auto_transforms = weights.transforms()
auto_transforms

In [None]:
# Create DataLoaders using automatic transforms
from going_modular import data_setup

train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
    train_dir=train_dir,
    test_dir=test_dir,
    train_transform=auto_transforms,
    test_transform=auto_transforms,
    batch_size=32,
    num_workers=0
)

train_dataloader, test_dataloader, class_names

## 3. Get a pretrained model, freeze the base layers and change the classifier head

In [None]:
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT

model = torchvision.models.efficientnet_b0(weights=weights).to(device)

In [None]:
# Print the model information with torchinfo
from torchinfo import summary

summary(model=model,
        input_size=(1,3,224,224),
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=20,
        row_settings=["var_names"])

In [None]:
# Freeze the base layers (the feature extractor) so their weights are not updated during training
for param in model.features.parameters():
    param.requires_grad = False
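
To see what freezing does, the sketch below uses a hypothetical `TinyModel` with the same `features` / `classifier` split as EfficientNet-B0 and counts frozen versus trainable parameters. The layer sizes are made up for illustration; only the freezing loop matches the cell above.

```python
import torch
from torch import nn

class TinyModel(nn.Module):
    """Hypothetical stand-in for a pretrained model with a features/classifier split."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(8, 4), nn.ReLU())
        self.classifier = nn.Linear(4, 3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

tiny_model = TinyModel()

# Same pattern as above: turn off gradient tracking for the base layers only
for param in tiny_model.features.parameters():
    param.requires_grad = False

# Count parameters by whether the optimizer would be able to update them
frozen_params = sum(p.numel() for p in tiny_model.parameters() if not p.requires_grad)
trainable_params = sum(p.numel() for p in tiny_model.parameters() if p.requires_grad)

print(f"Frozen parameters: {frozen_params}")
print(f"Trainable parameters: {trainable_params}")
```

Here the `features` layer holds 8 * 4 weights plus 4 biases (36 frozen parameters) and the `classifier` holds 4 * 3 weights plus 3 biases (15 trainable parameters). The `Trainable` column of the `torchinfo` summary reports the same split for the real model.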

In [None]:
# Updating the classifier head of our model to suit our problem
from torch import nn

torch.manual_seed(42)
torch.cuda.manual_seed(42)
model.classifier = nn.Sequential(
    nn.Dropout(p=0.2, inplace = True),
    nn.Linear(in_features=1280,
              out_features=len(class_names))
).to(device)

model.classifier

Sequential(
  (0): Dropout(p=0.2, inplace=True)
  (1): Linear(in_features=1280, out_features=3, bias=True)
)

In [None]:
summary(model=model,
        input_size=(1,3,224,224),
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=20,
        row_settings=["var_names"])

Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [1, 3, 224, 224]     [1, 3]               --                   Partial
├─Sequential (features)                                      [1, 3, 224, 224]     [1, 1280, 7, 7]      --                   False
│    └─Conv2dNormActivation (0)                              [1, 3, 224, 224]     [1, 32, 112, 112]    --                   False
│    │    └─Conv2d (0)                                       [1, 3, 224, 224]     [1, 32, 112, 112]    (864)                False
│    │    └─BatchNorm2d (1)                                  [1, 32, 112, 112]    [1, 32, 112, 112]    (64)                 False
│    │    └─SiLU (2)                                         [1, 32, 112, 112]    [1, 32, 112, 112]    --                   --
│    └─Sequential (1)                                        [1, 32, 112, 112]    [1, 1

## 4. Train a single model and track results

In [None]:
# Define the loss_fn and optimizer
loss_fn = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(params=model.parameters(),
                             lr = 0.001)

To track experiments, we're going to be using TensorBoard: https://www.tensorflow.org/tensorboard/

And to interact with TensorBoard, we can use PyTorch's SummaryWriter: https://pytorch.org/docs/stable/tensorboard.html

* Also, see here: https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter

In [None]:
# Setup a SummaryWriter
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter(log_dir='runs')
writer

<torch.utils.tensorboard.writer.SummaryWriter at 0x188eec18790>
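
Writing every run to the same `runs` directory can make experiments hard to tell apart in TensorBoard. One possible approach, sketched below, is to build a separate `log_dir` per experiment. The helper `build_log_dir` and the names `"data_10_percent"` and `"effnetb0"` are hypothetical, chosen only for illustration.

```python
import os
from datetime import datetime

def build_log_dir(experiment_name: str, model_name: str, root: str = "runs") -> str:
    """Return a path of the form root/YYYY-MM-DD/experiment_name/model_name."""
    date_stamp = datetime.now().strftime("%Y-%m-%d")
    return os.path.join(root, date_stamp, experiment_name, model_name)

# Hypothetical experiment and model names
log_dir = build_log_dir(experiment_name="data_10_percent", model_name="effnetb0")
print(log_dir)

# The path could then be handed to the writer, e.g. SummaryWriter(log_dir=log_dir)
```

Pointing `%tensorboard --logdir runs` at the shared root would then show each experiment as its own run.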

In [None]:
from going_modular.engine import train_step, test_step

from typing import Dict, List, Tuple
import torch
from tqdm.auto import tqdm

def train(model: torch.nn.Module,
          train_dataloader: torch.utils.data.DataLoader,
          test_dataloader: torch.utils.data.DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device) -> Dict[str, List[float]]:
    """Trains and tests a PyTorch model.

    Passes a target PyTorch model through train_step() and test_step()
    functions for a number of epochs, training and testing the model
    in the same epoch loop.

    Calculates, prints and stores evaluation metrics throughout.

    Args:
      model: A PyTorch model to be trained and tested.
      train_dataloader: A DataLoader instance for the model to be trained on.
      test_dataloader: A DataLoader instance for the model to be tested on.
      optimizer: A PyTorch optimizer to help minimize the loss function.
      loss_fn: A PyTorch loss function to calculate loss on both datasets.
      epochs: An integer indicating how many epochs to train for.
      device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
      A dictionary of training and testing loss as well as training and
      testing accuracy metrics. Each metric has a value in a list for
      each epoch.
      In the form: {train_loss: [...],
                    train_acc: [...],
                    test_loss: [...],
                    test_acc: [...]}
      For example if training for epochs=2:
                  {train_loss: [2.0616, 1.0537],
                    train_acc: [0.3945, 0.3945],
                    test_loss: [1.2641, 1.5706],
                    test_acc: [0.3400, 0.2973]}
    """
    # Create empty results dictionary
    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []}

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                           dataloader=train_dataloader,
                                           loss_fn=loss_fn,
                                           optimizer=optimizer,
                                           device=device)
        test_loss, test_acc = test_step(model=model,
                                        dataloader=test_dataloader,
                                        loss_fn=loss_fn,
                                        device=device)

        # Print out what's happening
        print(
            f"Epoch: {epoch+1} | "
            f"train_loss: {train_loss:.4f} | "
            f"train_acc: {train_acc:.4f} | "
            f"test_loss: {test_loss:.4f} | "
            f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

        ### New: Experiment tracking ###
        # Log train and test loss for this epoch
        writer.add_scalars(main_tag="Loss",
                           tag_scalar_dict={"train_loss": train_loss,
                                            "test_loss": test_loss},
                           global_step=epoch)

        # Log train and test accuracy for this epoch
        writer.add_scalars(main_tag="Accuracy",
                           tag_scalar_dict={"train_acc": train_acc,
                                            "test_acc": test_acc},
                           global_step=epoch)

        # Log the model architecture, using a random batch as an example input
        writer.add_graph(model=model,
                         input_to_model=torch.randn(32, 3, 224, 224).to(device))

    # Close the writer once, after all epochs have been logged
    writer.close()
    ### End New ###

    # Return the filled results at the end of the epochs
    return results
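
Once `train()` returns, the results dictionary can be inspected directly, alongside TensorBoard. The sketch below reuses the example values from the docstring above (for `epochs=2`) and picks the best epoch for a given metric. The helper `best_epoch` is hypothetical and not part of `going_modular`.

```python
# Example values copied from the train() docstring (epochs=2)
example_results = {"train_loss": [2.0616, 1.0537],
                   "train_acc": [0.3945, 0.3945],
                   "test_loss": [1.2641, 1.5706],
                   "test_acc": [0.3400, 0.2973]}

def best_epoch(results, metric="test_acc", higher_is_better=True):
    """Return (1-indexed epoch, value) for the best entry of the chosen metric."""
    values = results[metric]
    pick = max if higher_is_better else min
    best_index = pick(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]

# Accuracy: higher is better
acc_epoch, acc_value = best_epoch(example_results, metric="test_acc")

# Loss: lower is better
loss_epoch, loss_value = best_epoch(example_results, metric="train_loss",
                                    higher_is_better=False)

print(f"Best test_acc: {acc_value} at epoch {acc_epoch}")
print(f"Best train_loss: {loss_value} at epoch {loss_epoch}")
```

Epochs are reported 1-indexed to match the `Epoch: {epoch+1}` printout in `train()`, whereas the `global_step` logged to TensorBoard starts at 0.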

In [None]:
# Train model
# Note: this uses the train() function defined above, not engine.train() from engine.py

set_seeds()
results = train(model=model,
                train_dataloader=train_dataloader,
                test_dataloader=test_dataloader,
                optimizer=optimizer,
                loss_fn=loss_fn,
                epochs=25,
                device=device)

  0%|          | 0/25 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.2516 | train_acc: 0.9727 | test_loss: 0.3690 | test_acc: 0.8561


  4%|▍         | 1/25 [00:07<03:00,  7.52s/it]

Epoch: 2 | train_loss: 0.3598 | train_acc: 0.8320 | test_loss: 0.4157 | test_acc: 0.8456


  8%|▊         | 2/25 [00:15<03:02,  7.93s/it]

Epoch: 3 | train_loss: 0.3332 | train_acc: 0.8516 | test_loss: 0.3752 | test_acc: 0.8665


 12%|█▏        | 3/25 [00:24<03:00,  8.22s/it]

Epoch: 4 | train_loss: 0.2891 | train_acc: 0.9727 | test_loss: 0.3662 | test_acc: 0.8665


 16%|█▌        | 4/25 [00:32<02:52,  8.24s/it]

Epoch: 5 | train_loss: 0.3146 | train_acc: 0.8359 | test_loss: 0.4046 | test_acc: 0.8769


 20%|██        | 5/25 [00:41<02:50,  8.52s/it]

Epoch: 6 | train_loss: 0.2657 | train_acc: 0.9609 | test_loss: 0.3725 | test_acc: 0.8665


 24%|██▍       | 6/25 [00:51<02:51,  9.03s/it]

Epoch: 7 | train_loss: 0.2553 | train_acc: 0.9688 | test_loss: 0.3585 | test_acc: 0.8769


 28%|██▊       | 7/25 [01:00<02:39,  8.87s/it]

## 5. View our model's results with TensorBoard

In [None]:
# Let's view our experiments from within the notebook
%load_ext tensorboard
#%reload_ext tensorboard
%tensorboard --logdir runs