# PyTorch Experiment Tracking

Building on the models written so far in this repository, this file focuses on experiment tracking for FoodVision Mini.

### What is Experiment Tracking?

Experiment tracking is the process of systematically recording the configurations, results, and metadata of your machine learning experiments, so that different runs can be compared over time.

This matters because there are many hyperparameters, models, datasets, and results to manage. Without a proper tracking system, it is very hard to identify what led to success or failure.

Experiment tracking helps you figure out what works and what does not. 

### Why track experiments?

As the number of experiments you run increases, tracking the results of each through printouts and a few dictionaries becomes infeasible, since it can easily get out of hand.

### Different ways to track machine learning experiments

Here are a few ways to track experiments:

* TensorBoard
* Weights & Biases Experiment Tracking
* MLflow

## 0. Setup

To save some time coding, we will reuse the Python scripts in the `going_modular` directory of this repository (`data_setup.py` and `engine.py`).

In [1]:
try:
    import torch
    import torchvision
    assert int(torch.__version__.split(".")[0]) >= 2, "torch version should be 2.+"
    assert int(torchvision.__version__.split(".")[1]) >= 15, "torchvision version should be 0.15+"
    print(f"torch version: {torch.__version__}")
    print(f"torchvision version: {torchvision.__version__}")
except:
    print(f"[INFO] torch/torchvision versions not correct. Installing correct versions.")
    !pip3 install -U torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
    import torch
    import torchvision
    print(f"torch version: {torch.__version__}")
    print(f"torchvision version: {torchvision.__version__}")


torch version: 2.2.2
torchvision version: 0.17.2


In [2]:
import matplotlib.pyplot as plt
import torch
import torchvision

from torch import nn
from torchvision import transforms

try:
    from torchinfo import summary
except:
    print("[INFO] Couldn't find torchinfo... installing it")
    !pip install -q torchinfo
    from torchinfo import summary

try:
    from going_modular import data_setup, engine
except:
    print("[INFO] Could not find going_modular scripts. Downloading them from GitHub.")
    !git clone https://github.com/Aaron-Serpilin/Zero-To-Mastery-Pytorch
    !mv Zero-To-Mastery-Pytorch/Fundamentals/going_modular .
    !rm -rf Zero-To-Mastery-Pytorch
    from going_modular import data_setup, engine


In [3]:
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cpu'

In [4]:
def set_seeds(seed: int = 42):
    """
    Sets random seeds for torch operations.

    Args:
        seed (int, optional): Random seed to set. Defaults to 42.
    """

    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
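
As a quick check of what seeding buys us, here is a minimal sketch (the variable names are only for illustration) that redefines `set_seeds()` so the snippet is self-contained, then shows that re-seeding reproduces the same random tensor, while drawing again without re-seeding does not.

```python
import torch

def set_seeds(seed: int = 42) -> None:
    """Seeds the CPU and CUDA random number generators used by torch."""
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)  # silently ignored when CUDA is unavailable

set_seeds()
first = torch.rand(3)

set_seeds()
second = torch.rand(3)

# Same seed, same sequence of random numbers
same_after_reseed = torch.equal(first, second)

# Without re-seeding, the generator keeps advancing and yields new values
third = torch.rand(3)
differs_without_reseed = not torch.equal(second, third)

print(same_after_reseed, differs_without_reseed)
```

This is why `set_seeds()` is called right before creating the classifier head and before training: the random initialization should then be the same on each run.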

## 1. Get data

In [5]:
import os
import zipfile

from pathlib import Path

import requests

def download_data (source: str,
                   destination: str,
                   remove_source: bool = True) -> Path:
    """ 
    Downloads a zipped dataset from source and unzips to destination

    Args:
        source (str): A link to a zipped file containing data
        destination (str): A target directory to unzip data to
        remove_source (bool): Whether to remove the source after downloading and extracting. 

    Returns:
        pathlib.Path to downloaded data.
    """

    data_path = Path("data/")
    image_path = data_path / destination

    if image_path.is_dir():
        print(f"[INFO] {image_path} directory exists, skipping download")
    else:
        print(f"[INFO] Did not find {image_path} directory, creating one...")
        image_path.mkdir(parents=True, exist_ok=True)

        target_file = Path(source).name
        with open(data_path / target_file, "wb") as f:
            request = requests.get(source)
            print(f"[INFO] Downloading {target_file} from {source}...")
            f.write(request.content)

        # Unzip pizza, steak, sushi data
        with zipfile.ZipFile(data_path / target_file, "r") as zip_ref:
            print(f"[INFO] Unzipping {target_file} data...") 
            zip_ref.extractall(image_path)

        # Remove .zip file
        if remove_source:
            os.remove(data_path / target_file)
    
    return image_path

image_path = download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
                           destination="pizza_steak_sushi")
image_path

[INFO] data/pizza_steak_sushi directory exists, skipping download


PosixPath('data/pizza_steak_sushi')

## 2. Create Datasets and DataLoaders

For this first part, we will be using transfer learning. Hence, to transform the images into tensors, we can use:

1. Manually created transforms using `torchvision.transforms`
2. Automatically created transforms using `torchvision.models.MODEL_NAME_Weights.DEFAULT.transforms()`. Here, `MODEL_NAME` is a specific `torchvision.models` architecture, `MODEL_NAME_Weights` is the set of pre-trained weights available for it, and `DEFAULT` means the best available weights.

The pre-trained `torchvision.models` weights we use were trained on ImageNet, so our images have to be normalized with the ImageNet channel means and standard deviations. With manual transforms, we can do this with:

`normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])`
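
To make the normalization concrete, the sketch below applies the same arithmetic by hand: each channel has its mean subtracted and is then divided by its standard deviation. The 3x2x2 "image" filled with 0.5 is a made-up input purely for illustration.

```python
import torch

# ImageNet channel statistics quoted above (RGB order)
mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])

# Hypothetical 3x2x2 "image" with every pixel at 0.5, already scaled to [0, 1]
image = torch.full((3, 2, 2), 0.5)

# Reshape the statistics to [3, 1, 1] so they broadcast over height and width
normalized = (image - mean[:, None, None]) / std[:, None, None]

# One value per channel: (0.5 - mean) / std
per_channel = normalized[:, 0, 0]
print(per_channel)
```

Because the statistics differ per channel, the same input value of 0.5 ends up as a different number in each of the three channels.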

### 2.1 Create DataLoaders using manually created transforms

In [6]:
train_dir = image_path / "train"
test_dir = image_path / "test"

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

manual_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    normalize
])

print(f"Manually created transforms: {manual_transforms}")

train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
    train_dir=train_dir,
    test_dir=test_dir,
    transform=manual_transforms,
    batch_size=32
)

train_dataloader, test_dataloader, class_names

Manually created transforms: Compose(
    Resize(size=(224, 224), interpolation=bilinear, max_size=None, antialias=True)
    ToTensor()
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)


(<torch.utils.data.dataloader.DataLoader at 0x13d288fd0>,
 <torch.utils.data.dataloader.DataLoader at 0x13d2e8310>,
 ['pizza', 'steak', 'sushi'])

### 2.2 Create DataLoaders using automatically created transforms

We can do this by first instantiating a set of pre-trained weights we would like to use and calling the `transforms()` method on it.

In [7]:
train_dir = image_path / "train"
test_dir = image_path / "test"

weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT

automatic_transforms = weights.transforms()

print(f"Automatically created transforms: {automatic_transforms}")

train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
    train_dir=train_dir,
    test_dir=test_dir,
    transform=automatic_transforms,
    batch_size=32
)

train_dataloader, test_dataloader, class_names

Automatically created transforms: ImageClassification(
    crop_size=[224]
    resize_size=[256]
    mean=[0.485, 0.456, 0.406]
    std=[0.229, 0.224, 0.225]
    interpolation=InterpolationMode.BICUBIC
)


(<torch.utils.data.dataloader.DataLoader at 0x13d277650>,
 <torch.utils.data.dataloader.DataLoader at 0x13d275bd0>,
 ['pizza', 'steak', 'sushi'])

## 3. Getting a pre-trained model, freezing the base layers and changing the classifier head

In [8]:
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
model = torchvision.models.efficientnet_b0(weights=weights).to(device)
model

EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat

Now that we have the pre-trained model, we can turn it into a feature extractor model.

We will freeze the base layers so their pre-trained weights are not updated during training, then replace the classifier head to suit the number of classes we are working with.

In [9]:
for param in model.features.parameters():
    param.requires_grad = False

set_seeds()

model.classifier = torch.nn.Sequential(
    nn.Dropout(p=0.2, inplace=True),
    nn.Linear(in_features=1280,
              out_features=len(class_names),
              bias=True)
).to(device)
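
To see what freezing does without downloading any weights, here is a minimal sketch on a tiny stand-in network. `TinyNet` is hypothetical and only mirrors the `features` / `classifier` attribute names of the EfficientNet above. It counts how many parameters remain trainable after the base is frozen.

```python
import torch
from torch import nn

# Hypothetical stand-in with the same attribute names as the EfficientNet above
class TinyNet(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1),
                                      nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(self.features(x))
        return self.classifier(torch.flatten(x, 1))

tiny_model = TinyNet()

# Freeze the base layers, exactly as done for model.features above
for param in tiny_model.features.parameters():
    param.requires_grad = False

# Only the classifier head should still receive gradient updates
trainable = sum(p.numel() for p in tiny_model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in tiny_model.parameters() if not p.requires_grad)
print(f"trainable: {trainable} | frozen: {frozen}")
```

The same two sums can be run on the real `model` to confirm that only the new classifier head is trainable.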

In [10]:
from torchinfo import summary

summary(model,
        input_size=(32, 3, 224, 224),
        verbose=0,
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=20,
        row_settings=["var_names"]
)

Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [32, 3, 224, 224]    [32, 3]              --                   Partial
├─Sequential (features)                                      [32, 3, 224, 224]    [32, 1280, 7, 7]     --                   False
│    └─Conv2dNormActivation (0)                              [32, 3, 224, 224]    [32, 32, 112, 112]   --                   False
│    │    └─Conv2d (0)                                       [32, 3, 224, 224]    [32, 32, 112, 112]   (864)                False
│    │    └─BatchNorm2d (1)                                  [32, 32, 112, 112]   [32, 32, 112, 112]   (64)                 False
│    │    └─SiLU (2)                                         [32, 32, 112, 112]   [32, 32, 112, 112]   --                   --
│    └─Sequential (1)                                        [32, 32, 112, 112]   [32, 

## 4. Train model and track results



In [11]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

Previously, we tracked our modelling experiments using multiple Python dictionaries. Beyond a few experiments, however, this quickly gets out of hand.

Hence, we can use PyTorch's `torch.utils.tensorboard.SummaryWriter()` class to save various parts of our model's training progress to file. 

By default, the `SummaryWriter()` class saves various information about our model to the directory set by the `log_dir` parameter. 

The default location for `log_dir` is `runs/CURRENT_DATETIME_HOSTNAME`, where `HOSTNAME` is the name of your computer. You can change this by passing your own `log_dir`.

The outputs of the `SummaryWriter()` are saved in TensorBoard format. 
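
One possible way to customize the location is to build a descriptive path per experiment. The helper below is a hypothetical sketch, not part of PyTorch: `build_log_dir`, the experiment name, and the model name are all made up for illustration.

```python
import os
from datetime import datetime

def build_log_dir(experiment_name: str, model_name: str, root: str = "runs") -> str:
    """Builds a path such as runs/YYYY-MM-DD/experiment_name/model_name."""
    timestamp = datetime.now().strftime("%Y-%m-%d")
    return os.path.join(root, timestamp, experiment_name, model_name)

# Hypothetical names, chosen only to show the resulting layout
log_dir = build_log_dir(experiment_name="data_10_percent", model_name="effnetb0")
print(log_dir)
```

The resulting string could then be passed as `SummaryWriter(log_dir=log_dir)`, so that each experiment writes to its own folder.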

In [12]:
try:
    from torch.utils.tensorboard import SummaryWriter
except:
    print("[INFO] Couldn't find tensorboard... installing it.")
    !pip install -q tensorboard
    from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()

[INFO] Couldn't find tensorboard... installing it.


Now that we have the writer, we can write a training loop, or adjust the existing `train()` function created in the `engine.py` file. Specifically, we will add the ability for the `train()` function to log our model's training and test loss and accuracy values. 

We can do this with `writer.add_scalars(main_tag, tag_scalar_dict, global_step)` where:
* `main_tag` (string) - the name for the scalars being tracked
* `tag_scalar_dict` (dict) - a dictionary of the values being tracked
* `global_step` (int) - the step the values belong to (in our case, the epoch)

Once we finish tracking values, we can call `writer.close()` to tell the writer to stop looking for values to track. 
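
Before wiring this into `train()`, here is a minimal standalone sketch of the logging calls. It assumes `tensorboard` is installed, writes to a temporary directory rather than `runs/`, and uses made-up loss values. `demo_writer` and `fake_losses` are illustrative names.

```python
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

# Write to a temporary directory so the sketch does not touch the real runs/ folder
demo_log_dir = tempfile.mkdtemp()
demo_writer = SummaryWriter(log_dir=demo_log_dir)

# Made-up (train_loss, test_loss) pairs, one per epoch, purely for illustration
fake_losses = [(1.10, 1.05), (0.90, 0.95), (0.75, 0.88)]

for epoch, (train_loss, test_loss) in enumerate(fake_losses):
    demo_writer.add_scalars(main_tag="Loss",
                            tag_scalar_dict={"train_loss": train_loss,
                                             "test_loss": test_loss},
                            global_step=epoch)

# Flush pending events and release the files
demo_writer.close()

# The log directory now contains TensorBoard event data
logged_entries = sorted(os.listdir(demo_log_dir))
print(logged_entries)
```

Grouping `train_loss` and `test_loss` under the same `main_tag` is what lets TensorBoard draw both curves on one chart.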

In [13]:
from typing import Dict, List
from tqdm.auto import tqdm

from going_modular.engine import train_step, test_step

def train (model: torch.nn.Module,
           train_dataloader: torch.utils.data.DataLoader,
           test_dataloader: torch.utils.data.DataLoader,
           optimizer: torch.optim.Optimizer,
           loss_fn: torch.nn.Module,
           epochs: int,
           device: torch.device) -> Dict[str, List]:
    """ 
    Trains and tests a PyTorch model.

    Passes a target PyTorch model through train_step() and test_step() functions for a number of epochs,
    training and testing the model in the same epoch loop. 

    Calculates, prints, and stores metrics throughout.

    Args:
        model: A PyTorch model to be trained and tested.
        train_dataloader: A DataLoader instance for the model to be trained on.
        test_dataloader: A DataLoader instance for the model to be tested on.
        optimizer: A PyTorch optimizer to help minimize the loss function.
        loss_fn: A PyTorch loss function to calculate loss on both datasets. 
        epochs: An integer indicating how many epochs to train for.
        device: A target device to compute on. 

    Returns:
        A dictionary of training and testing loss as well as training and
        testing accuracy metrics. Each metric has a value in a list for 
        each epoch.
        In the form: {train_loss: [...],
                    train_acc: [...],
                    test_loss: [...],
                    test_acc: [...]} 
        For example if training for epochs=2: 
                {train_loss: [2.0616, 1.0537],
                    train_acc: [0.3945, 0.3945],
                    test_loss: [1.2641, 1.5706],
                    test_acc: [0.3400, 0.2973]} 
    """

    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []
    }

    for epoch in tqdm(range(epochs)):

        train_loss, train_acc = train_step(model=model,
                                           dataloader=train_dataloader,
                                           loss_fn=loss_fn,
                                           optimizer=optimizer,
                                           device=device)
        
        test_loss, test_acc = test_step(model=model,
                                        dataloader=test_dataloader,
                                        loss_fn=loss_fn,
                                        device=device)
        
        print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.3f} | "
          f"train_acc: {train_acc:.3f} | "
          f"test_loss: {test_loss:.3f} | "
          f"test_acc: {test_acc:.3f}"
        )

        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

        # Experiment Tracking
        # Note: `writer` is the global SummaryWriter instance created in the previous cell

        # Add loss results to SummaryWriter
        writer.add_scalars(main_tag="Loss",
                           tag_scalar_dict={"train_loss": train_loss,
                                            "test_loss": test_loss},
                           global_step=epoch)

        # Add accuracy results to SummaryWriter
        writer.add_scalars(main_tag="Accuracy",
                           tag_scalar_dict={"train_acc": train_acc,
                                            "test_acc": test_acc},
                           global_step=epoch)

        # Track the model architecture by tracing an example input batch
        writer.add_graph(model=model,
                         input_to_model=torch.randn(32, 3, 224, 224).to(device))

    # Close the writer once all epochs have been logged
    writer.close()

    return results

In [None]:
set_seeds()
results = train(model=model,
                train_dataloader=train_dataloader,
                test_dataloader=test_dataloader,
                optimizer=optimizer,
                loss_fn=loss_fn,
                epochs=5, 
                device=device)
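
Besides the TensorBoard logs, `train()` still returns the `results` dictionary, which can be inspected directly. The sketch below picks the epoch with the highest test accuracy. The numbers are the illustrative values from the `train()` docstring example, not results from an actual run.

```python
# Illustrative values copied from the train() docstring example (epochs=2),
# not results from an actual run
example_results = {"train_loss": [2.0616, 1.0537],
                   "train_acc": [0.3945, 0.3945],
                   "test_loss": [1.2641, 1.5706],
                   "test_acc": [0.3400, 0.2973]}

# Index of the epoch with the highest test accuracy (0-based, like global_step)
best_epoch = max(range(len(example_results["test_acc"])),
                 key=lambda i: example_results["test_acc"][i])

# Report with 1-based epoch numbers, matching the printout inside train()
summary_line = (f"best epoch: {best_epoch + 1} | "
                f"test_acc: {example_results['test_acc'][best_epoch]:.3f} | "
                f"test_loss: {example_results['test_loss'][best_epoch]:.3f}")
print(summary_line)
```

This kind of manual inspection works for a single run, but comparing many runs this way is exactly what becomes hard to manage without a tracking tool.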