# 07 PyTorch Experiment Tracking 

Machine learning is highly experimental.

**Experiment tracking** helps you figure out which experiments are worth pursuing: it shows you what doesn't work so you can find out what **does** work.

## 0. Basic setup

In [1]:
import torch
import torchvision 
print(f"Torch version: {torch.__version__} | Torchvision: {torchvision.__version__}")

Torch version: 1.13.1+cu117 | Torchvision: 0.14.1+cu117


In [2]:
# Standard libs for this section
import matplotlib.pyplot as plt 
import torch
import torchvision  
import torchinfo 

from torch import nn
from torchvision import transforms  

try:
    from going_modular import data_setup, engine
except ImportError:
    # Fall back to cloning the course repo to get the going_modular scripts
    print("[INFO] Couldn't find going_modular scripts... downloading them from GitHub...")
    !git clone https://github.com/mrdbourke/pytorch-deep-learning
    !mv pytorch-deep-learning/going_modular/going_modular .
    !rm -rf pytorch-deep-learning
    from going_modular import data_setup, engine 


In [3]:
# Setup device agnostic code 
dev = "cuda" if torch.cuda.is_available() else "cpu"



In [4]:
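# Set seeds
def set_seeds(seed: int = 42) -> None:
    '''
    Sets the random seeds for torch operations.
    
    Args: 
        seed (int, optional): Random seed to set. Default is 42.
    '''
    # Seed torch's random number generators with the given seed
    torch.manual_seed(seed)
    # Explicitly seed the current CUDA device as well (safe to call without CUDA)
    torch.cuda.manual_seed(seed)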
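
As a quick check of what seeding buys you, here is a minimal sketch that redefines a `set_seeds` helper so the block is self-contained and confirms that two draws made after the same seed are identical. The seed value `0` and the tensor size are arbitrary choices for illustration.

```python
import torch

def set_seeds(seed: int = 42) -> None:
    """Seed torch's random number generators (CPU and, if present, CUDA)."""
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)  # silently ignored when CUDA is unavailable

# Draw random numbers twice, resetting the seed before each draw
set_seeds(0)
first = torch.rand(3)
set_seeds(0)
second = torch.rand(3)

# With the same seed, both draws should match exactly
print(torch.equal(first, second))
```

Without the second `set_seeds(0)` call, the two tensors would almost certainly differ, which is why the seed is reset right before any step you want to reproduce.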

## 1. Getting the data  

In [5]:
import os 
import zipfile 

from pathlib import Path

import requests as rq
 
def download_data(src:str,
                  dest:str,
                  rm_src: bool = True) -> Path:
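    '''
    Downloads a zipped dataset, unzips it and optionally removes the zip file.
    The data will be saved in a ".data" directory.

    Args:
        src: Source (raw URL) of the zipped data.
        dest: Name of the directory inside ".data" to unzip the data into.
        rm_src: Whether to remove the zip file after unzipping. Default is True.

    Returns:
        Path to the directory containing the unzipped data.
    '''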
    
    data_path = Path("./.data/")
    image_path = data_path/dest
    zip_path = Path(str(image_path) + ".zip")
    
    # If the data folder does not exist, create it 
    if not data_path.exists():
        print("[INFO] .data directory does not exist, creating it...")
        data_path.mkdir(parents=True)
    else:
        print("[INFO] .data directory exists, skipping this step.")
    # If data is not downloaded, download
    if not image_path.exists():
        print("[INFO] data is not downloaded, downloading it...")
        
        # Getting the zipfile
        req = rq.get(src)
        req.raise_for_status()  # Fail early on a bad HTTP response
        with zip_path.open("wb") as f:
            f.write(req.content)
            
        # Unzipping 
        with zipfile.ZipFile(zip_path) as zip_ref:
            print("[INFO] unzipping...")
            zip_ref.extractall(image_path)
        
        # Removing if needed
        if rm_src:
            os.remove(zip_path)
    else:
        print(f"[INFO] data already downloaded, skipping this step.")

    return image_path
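
The "skip if already present" behaviour of the function above can be tried offline. The sketch below is an illustration only: `extract_once` is a hypothetical helper (not part of the notebook or `going_modular`), and the zip file is a toy archive built in a temporary directory instead of the real dataset.

```python
import tempfile
import zipfile
from pathlib import Path

def extract_once(zip_path: Path, target_dir: Path) -> bool:
    """Extract zip_path into target_dir unless target_dir already exists.

    Returns True if extraction happened, False if it was skipped.
    """
    if target_dir.exists():
        return False
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target_dir)
    return True

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)

    # Build a toy archive that mimics the train/<class>/ folder layout
    zip_path = tmp_path / "toy.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("train/pizza/placeholder.txt", "not a real image")

    target = tmp_path / "toy"
    first = extract_once(zip_path, target)   # extracts the archive
    second = extract_once(zip_path, target)  # skipped, folder already exists
    extracted_file_exists = (target / "train" / "pizza" / "placeholder.txt").exists()

print(first, second, extracted_file_exists)
```

Checking for the target folder first makes the cell safe to re-run, at the cost of not noticing a partially extracted or outdated folder.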

In [6]:
image_path = download_data(src="https://github.com/mrdbourke/pytorch-deep-learning/raw/refs/heads/main/data/pizza_steak_sushi.zip",
              dest="pizza_steak_sushi")
print(f"Data downloaded at: {str(image_path)}")

[INFO] .data directory exists, skipping this step.
[INFO] data already downloaded, skipping this step.
Data downloaded at: .data/pizza_steak_sushi


## 2. Creating datasets and dataloaders

### 2.1 Create dataloaders with manual transforms 

Transforms ensure your custom data is formatted reproducibly and in the way pretrained models expect (here, 224x224 images normalized with the ImageNet mean and standard deviation).
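
To make the normalization step concrete, here is a small pure-Python sketch of the per-channel arithmetic, `(value - mean) / std`, applied to a single RGB pixel. The `normalize_pixel` helper is hypothetical and exists only for illustration; the mean and standard deviation are the ImageNet values used in this section.

```python
# ImageNet channel statistics (R, G, B)
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]

# A pixel equal to the channel means maps to zero in every channel
mean_pixel = normalize_pixel(IMAGENET_MEAN)
print(mean_pixel)

# A pure white pixel, after scaling from 0-255 down to 0-1
white_pixel = normalize_pixel([1.0, 1.0, 1.0])
print(white_pixel)
```

The pixel values are assumed to already be in the `[0, 1]` range, which is why normalization comes after the tensor conversion step in a transform pipeline.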

In [7]:
# Setup directories
train_dir = image_path/"train"
test_dir = image_path/"test"

str(train_dir), str(test_dir)

('.data/pizza_steak_sushi/train', '.data/pizza_steak_sushi/test')

In [9]:
# Setup ImageNet normalization levels 
normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225]
)

# Create transform pipeline manually 
manual_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    normalize
])
print(f"Manually created transforms: {manual_transform}")

# Create dataloaders
from going_modular import data_setup
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=str(train_dir),
                                                                               test_dir=str(test_dir),
                                                                               batch_size=32,
                                                                               train_transform=manual_transform,
                                                                               test_transform=manual_transform)
train_dataloader, test_dataloader, class_names 

Manually created transforms: Compose(
    Resize(size=(224, 224), interpolation=bilinear, max_size=None, antialias=None)
    ToTensor()
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)


(<torch.utils.data.dataloader.DataLoader at 0x7d390f4b35e0>,
 <torch.utils.data.dataloader.DataLoader at 0x7d38621d2910>,
 ['pizza', 'steak', 'sushi'])
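
A common sanity check after building dataloaders is to pull one batch and look at its shape. The sketch below uses synthetic tensors instead of the pizza/steak/sushi images so it runs on its own; the dataset size (8) and batch size (4) are arbitrary assumptions, not the values used above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data: 8 RGB "images" of 224x224 and 8 labels from 3 classes
images = torch.rand(8, 3, 224, 224)
labels = torch.randint(0, 3, (8,))

loader = DataLoader(TensorDataset(images, labels), batch_size=4, shuffle=False)

# Grab the first batch and inspect its shape
batch_images, batch_labels = next(iter(loader))
print(batch_images.shape, batch_labels.shape)
print(f"Number of batches: {len(loader)}")
```

The same `next(iter(...))` pattern applies to `train_dataloader`, where the leading dimension would be the batch size passed to `create_dataloaders`.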