
# PyTorch Experiment Tracking

Machine learning is highly experimental.

**Experiment tracking** helps us figure out which experiments are worth pursuing: by recording what doesn't work, we can identify what does.

In this notebook, we'll walk through an example of tracking experiments programmatically.

In [3]:
import torch
from torch import nn
import torchvision
from torchvision import transforms
import matplotlib.pyplot as plt

print(torch.__version__)
print(torchvision.__version__)

2.1.1+cu121
0.16.1+cu121


In [4]:
# Try to get torchinfo, install it if it doesn't work
try:
    from torchinfo import summary
except ImportError:
    print("[INFO] Couldn't find torchinfo... installing it.")
    !pip install -q torchinfo
    from torchinfo import summary

# Try to import the going_modular directory, download it from GitHub if it doesn't work
try:
    from going_modular.going_modular import data_setup, engine
except ImportError:
    # Get the going_modular scripts
    print("[INFO] Couldn't find going_modular scripts... downloading them from GitHub.")
    !git clone https://github.com/mrdbourke/pytorch-deep-learning
    !mv pytorch-deep-learning/going_modular .
    !rm -rf pytorch-deep-learning
    from going_modular.going_modular import data_setup, engine

[INFO] Couldn't find torchinfo... installing it.
[INFO] Couldn't find going_modular scripts... downloading them from GitHub.
Cloning into 'pytorch-deep-learning'...
remote: Enumerating objects: 4036, done.
remote: Counting objects: 100% (1224/1224), done.
remote: Compressing objects: 100% (223/223), done.
remote: Total 4036 (delta 1068), reused 1080 (delta 998), pack-reused 2812
Receiving objects: 100% (4036/4036), 651.02 MiB | 16.29 MiB/s, done.
Resolving deltas: 100% (2361/2361), done.
Updating files: 100% (248/248), done.


In [7]:
# set up device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

# helper for setting the manual seeds (CPU and CUDA) for reproducibility
def set_seed(seed: int = 42):
  torch.manual_seed(seed)
  torch.cuda.manual_seed(seed)

cuda
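
Re-seeding before each experiment is what makes runs comparable. As a minimal sketch of the effect, using only `torch.manual_seed` and `torch.rand` (the tensor shape is arbitrary and chosen for illustration): seeding twice with the same value reproduces the same random numbers, while drawing again without re-seeding does not.

```python
import torch

# Seed, then draw a random tensor
torch.manual_seed(42)
first = torch.rand(3)

# Re-seed with the same value: the generator restarts, so the draw repeats
torch.manual_seed(42)
second = torch.rand(3)

# Draw again without re-seeding: the generator has advanced, so values differ
third = torch.rand(3)

print(torch.equal(first, second))  # True
print(torch.equal(second, third))  # False
```

This is why a helper like `set_seed()` is typically called right before creating a model or starting a training run, rather than once at the top of the notebook.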


# 1.0 Getting the data

In [10]:
import os
import zipfile
from pathlib import Path
import requests

# example source: https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip

def download_data(source: str,
                  destination: str,
                  remove_source: bool = True) -> Path:
  """Downloads a zipped dataset from source and unzips it to destination."""
  # setup path to data folder
  data_path = Path("data/")
  image_path = data_path / destination

  # if the image folder doesn't exist, create it
  if image_path.is_dir():
    print(f"[INFO] {image_path} directory already exists, skipping download.")
  else:
    print(f"[INFO] Did not find {image_path} directory, creating one...")
    image_path.mkdir(parents=True, exist_ok=True)

    # download the target file
    target_file = Path(source).name   # file name taken from the end of the source URL
    print(f"[INFO] Downloading {target_file} from {source}....")
    request = requests.get(source)
    request.raise_for_status()   # stop early if the download failed
    with open(data_path / target_file, "wb") as f:
      f.write(request.content)

    # unzipping the target file
    with zipfile.ZipFile(data_path / target_file, "r") as zip_ref:
      print(f"[INFO] Unzipping {target_file} data...")
      zip_ref.extractall(image_path)

    # remove .zip file if needed
    if remove_source:
      os.remove(data_path / target_file)

  return image_path
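
`download_data` derives the archive's file name with `Path(source).name`, which works for the plain URL used in this notebook. As a hedged sketch (not part of the original function), the snippet below shows that behaviour and one possible alternative based on `urllib.parse.urlparse` for URLs that carry a query string. The `?raw=true` URL is a hypothetical example, and `PurePosixPath` is used here only so the result does not depend on the operating system.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

url = "https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip"

# Taking .name of the URL string returns its last path component
plain_name = PurePosixPath(url).name
print(plain_name)  # pizza_steak_sushi.zip

# Hypothetical URL with a query string: .name keeps the query part
url_with_query = url + "?raw=true"
naive_name = PurePosixPath(url_with_query).name
print(naive_name)  # pizza_steak_sushi.zip?raw=true

# Parsing the URL first isolates the path, so the query is dropped
robust_name = PurePosixPath(urlparse(url_with_query).path).name
print(robust_name)  # pizza_steak_sushi.zip
```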

In [12]:
image_path = download_data(source = "https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
                           destination = "pizza_steak_sushi")
image_path

[INFO] Did not find data/pizza_steak_sushi directory, creating one...
[INFO] Downloading pizza_steak_sushi.zip from https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip....
[INFO] Unzipping pizza_steak_sushi.zip data...


PosixPath('data/pizza_steak_sushi')

# 2.0 Create Datasets and DataLoaders

## 2.1 Create DataLoaders with manual transforms

The goal of transforms is to ensure custom data is formatted reproducibly and in a way that suits pretrained models.

In [13]:
# setting up the directories
train_dir = image_path / "train"
test_dir = image_path / "test"

train_dir, test_dir

(PosixPath('data/pizza_steak_sushi/train'),
 PosixPath('data/pizza_steak_sushi/test'))

In [17]:
# setting up ImageNet normalization values
# see here: https://pytorch.org/vision/0.12/models.html
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# creating transform pipeline manually (transforms was imported at the top of the notebook)
manual_transforms = transforms.Compose([
    transforms.Resize(size=(224, 224)),
    transforms.ToTensor(),
    normalize
])

from going_modular.going_modular import data_setup

train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(train_dir = train_dir,
                                                                             test_dir = test_dir,
                                                                             transform = manual_transforms,
                                                                             batch_size = 32)
train_dataloader, test_dataloader, class_names

(<torch.utils.data.dataloader.DataLoader at 0x7be96f4e52a0>,
 <torch.utils.data.dataloader.DataLoader at 0x7be96f4e41f0>,
 ['pizza', 'steak', 'sushi'])