# 07 PyTorch Experiment Tracking

Machine learning is very experimental.

In order to figure out which experiment are worth pursuing, that's where **experiment tracking** comes in, it help you to figure out what doesn't
work so you can figure out what does work


In this notebook, we're going to see an example of programmatically tracking experiments

Resources:
* Book version of notebook: https://www.learnpytorch.io/07_pytorch_experiment_tracking/
* Extra-curriculum: https://madewithml.com/courses/mlops/experiment-tracking/

In [1]:
import torch
import torchvision


In [2]:
# Setup device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cpu'

In [3]:
def set_seed(seed: int= 42):
  """Sets random sets for torch operations.

   Args:
      seed (int, optional): Random seed to set. Default to 42.
  """

  # Set the seed for general torch operations
  torch.manual_seed(42)

  # Set the seed for CUDA torch operations (ones that happen on the GPU)
  torch.cuda.manual_seed(42)

In [4]:
set_seed()

## 1. Get data

Want to get pizza, steak, sushi images.

So we can run experiments building FoodVision Mini and see which models performs best.

In [5]:
import zipfile
import os
import requests

from pathlib import Path

def download_data(source: str,
                  destination: str,
                  remove_source: bool = True
                  ) -> Path:
    """Download the zipped dataset from source and unzips to destination."""

    # Setup path to data folder
    data_path = Path("data/")
    image_path = data_path / destination

    # If the image folder doesn't exist create it
    if image_path.is_dir():
      print(f'[INFO] {image_path} directory already exists, skipping download')
    else:
      print(f"[INFO] Did not find {image_path} directory, creating one...")
      image_path.mkdir(parents=True, exist_ok=True)

      # Download the target data
      target_file = Path(source).name

      with open(data_path / target_file, "wb") as f:
        request = requests.get(source)
        print(f"[INFO] Downloading {target_file} from {source}...")
        f.write(request.content)

      # Unzip target file
      with zipfile.ZipFile(data_path / target_file, 'r') as zip_ref:
        print(f"[INFO] Unzipping {target_file} data...")
        zip_ref.extractall(image_path)

      # Remove .zip file if needed
      if remove_source:
        os.remove(data_path/ target_file)

    return image_path

In [6]:
image_path = download_data(source = "https://github.com/anuragsingh17ai/Deep-Learning/raw/main/data/pizza_steak_sushi.zip",
                           destination = "pizza_steak_sushi")

[INFO] Did not find data/pizza_steak_sushi directory, creating one...
[INFO] Downloading pizza_steak_sushi.zip from https://github.com/anuragsingh17ai/Deep-Learning/raw/main/data/pizza_steak_sushi.zip...
[INFO] Unzipping pizza_steak_sushi.zip data...


## 2. Create Datasets and DataLoaders

### 2.1 create DataLoaders with manual transforms

The goal with transform is ensure your custom data is formatted in a reproducible way as well as way that suit pretrained models.

In [34]:
try:
  from torchinfo import summary
  print("torchinfo found !.. skipping install")
except:
  print("Installing torchinfo...")
  !pip install torchinfo

try:
  from going_modular import data_setup, engine
  print('going_modular is already present! Skipping download...')
except:
  print("[INFO] Couldn't find going_modular script... downloading them from Github")
  !git clone https://github.com/anuragsingh17ai/Deep-Learning.git
  !mv Deep-Learning/going_modular .
  !rm -rf Deep-Learning
  from going_modular import data_setup, engine

Installing torchinfo...
Collecting torchinfo
  Downloading torchinfo-1.8.0-py3-none-any.whl (23 kB)
Installing collected packages: torchinfo
Successfully installed torchinfo-1.8.0
going_modular is already present! Skipping download...


In [10]:
# Setup directories
train_dir = image_path / "train"
test_dir = image_path / "test"

test_dir, train_dir

(PosixPath('data/pizza_steak_sushi/test'),
 PosixPath('data/pizza_steak_sushi/train'))

In [13]:
# Setup ImageNet normalization levels
from torchvision.transforms import v2
from torchvision import datasets

manual_transforms = v2.Compose([
                                v2.Resize(224,224),
                                v2.ToImage(),
                                v2.ToDtype(torch.float),
                                v2.Normalize(mean=[0.485, 0.456,0.406],
                                             std = [0.229, 0.224, 0.225]
                                             )
                                ])


print(f"Manually created transfoms: {manual_transforms}")

#Create DataLoaders
from going_modular import data_setup
train_dataloader, test_dataloader, class_names  = data_setup.create_dataloaders(train_dir = train_dir,
                                                                                test_dir=test_dir,
                                                                                transform = manual_transforms,
                                                                                batch_size = 32)


Manually created transfoms: Compose(
      Resize(size=[224], interpolation=224, antialias=True)
      ToImage()
      ToDtype(scale=False)
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
)


### 2.2 Create DataLoaders using  automatically created transforms

The same principle applies for automatic transforms: we want our custom data in the same format as a pretrained model trained on

In [22]:
# Setpu dirs
train_dir = image_path / "train"
test_dir = image_path / "test"


# Setup pretrained weights (plenty of these weights available in torchvision.models)
from torchvision.models import EfficientNet_B0_Weights

weights = EfficientNet_B0_Weights.DEFAULT # "DEFAULT" = best available

# Get transfoms from weight (these are the transfoms used to train a particular or obtain a particular set of weights)
automatic_transforms = weights.transforms()
print(f"Automatically created transforms: {automatic_transforms}")

# Create DataLoaders
train_dataloader , test_dataloader, class_names = data_setup.create_dataloaders(train_dir = train_dir,
                                                                                 test_dir= test_dir,
                                                                                 transform= automatic_transforms,
                                                                                 batch_size = 32
                                                                                 )
train_dataloader, test_dataloader,class_names

Automatically created transforms: ImageClassification(
    crop_size=[224]
    resize_size=[256]
    mean=[0.485, 0.456, 0.406]
    std=[0.229, 0.224, 0.225]
    interpolation=InterpolationMode.BICUBIC
)


(<torch.utils.data.dataloader.DataLoader at 0x78122a619270>,
 <torch.utils.data.dataloader.DataLoader at 0x78122a6c69e0>,
 ['pizza', 'steak', 'sushi'])

## 3. Getting a pretrained model, freeze the base layers and change the classifier head

In [29]:
# Note: This is how a pretrained model would be created prior to torchvision v0.13
# model = torchvision.models.efficientnet_b0(pretrained=True).to(device) # OLD

# Download the pretrained weights for EfficientNet_B0
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT # "DEFAULT" = best available weights

# Setup the model with the pretrained weights and send it to the target device
model = torchvision.models.efficientnet_b0(weights=weights).to(device)
# model

In [30]:
# Freeze all base layers by setting their requires_grad attribute to False
for param in model.features.parameters():
  param.requires_grad = False

In [31]:
from torch import nn

set_seed()
model.classifier = nn.Sequential(
    nn.Dropout(p=0.2, inplace=True),
    nn.Linear(in_features = 1280, out_features=len(class_names))
).to(device)

In [38]:
from torchinfo import summary
summary(model,
        input_size = (32,3,224,224),
        verbose = 0,
        col_names= ["input_size", "output_size", "num_params","trainable"],
        col_width = 20,
        row_settings = ["var_names"])

Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [32, 3, 224, 224]    [32, 3]              --                   Partial
├─Sequential (features)                                      [32, 3, 224, 224]    [32, 1280, 7, 7]     --                   False
│    └─Conv2dNormActivation (0)                              [32, 3, 224, 224]    [32, 32, 112, 112]   --                   False
│    │    └─Conv2d (0)                                       [32, 3, 224, 224]    [32, 32, 112, 112]   (864)                False
│    │    └─BatchNorm2d (1)                                  [32, 32, 112, 112]   [32, 32, 112, 112]   (64)                 False
│    │    └─SiLU (2)                                         [32, 32, 112, 112]   [32, 32, 112, 112]   --                   --
│    └─Sequential (1)                                        [32, 32, 112, 112]   [32, 

## 4. Train a single model and track results

In [39]:
# Define loss function optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model.parameters(),
                             lr= 0.001)

To track experiments, we're goint to use TensorBoard: https://www.tensorflow.org/tensorboard

And to interact with TensorBoard, we can use PyTorch's SummaryWriter


In [40]:
# Setup a SummaryWriter
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()
writer

<torch.utils.tensorboard.writer.SummaryWriter at 0x7811c58e6290>

In [47]:
from typing import Dict, List, Tuple

import torch

from tqdm.auto import tqdm

from going_modular.engine import train_step, test_step


def train(model: torch.nn.Module,
          train_dataloader: torch.utils.data.DataLoader,
          test_dataloader: torch.utils.data.DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device) -> Dict[str, List[float]]:
    # Create empty results dictionary
    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []
    }

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                          dataloader=train_dataloader,
                                          loss_fn=loss_fn,
                                          optimizer=optimizer,
                                          device=device)
        test_loss, test_acc = test_step(model=model,
          dataloader=test_dataloader,
          loss_fn=loss_fn,
          device=device)

        # Print out what's happening
        print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

        ### New: Experiments  tracking ###
        # See SummaryWriter documentation
        writer.add_scalars(main_tag = "Loss",
                           tag_scalar_dict={"train_loss": train_loss,
                                            "test_loss": test_loss,
                                            },
                           global_step = epoch)

        writer.add_scalars(main_tag="Accuracy",
                           tag_scalar_dict={"train_acc": train_acc,
                                            "test_acc": test_acc},
                           global_step=epoch)
        writer.add_graph(model=model,
                         input_to_model = torch.randn(32,3,224,224).to(device))

    # Close the writer
    writer.close()
    ## End new ###


    # Return the filled results at the end of the epochs
    return results

In [48]:
# Train model
# Note: not using engine.train(), since we updated the train() function above
set_seed()

results = train(model = model,
                 train_dataloader=train_dataloader,
                 test_dataloader = test_dataloader,
                 optimizer=optimizer,
                 loss_fn = loss_fn,
                 epochs = 5,
                 device = device)


  0%|          | 0/5 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.6862 | train_acc: 0.8945 | test_loss: 0.7297 | test_acc: 0.8759
Epoch: 2 | train_loss: 0.7366 | train_acc: 0.7539 | test_loss: 0.6617 | test_acc: 0.8759
Epoch: 3 | train_loss: 0.6150 | train_acc: 0.7930 | test_loss: 0.5568 | test_acc: 0.8759
Epoch: 4 | train_loss: 0.5734 | train_acc: 0.7930 | test_loss: 0.5375 | test_acc: 0.9062
Epoch: 5 | train_loss: 0.5633 | train_acc: 0.8008 | test_loss: 0.5401 | test_acc: 0.8561


In [None]:
## 9..................