# 05. PyTorch Going Modular

Notebook Reference: https://www.learnpytorch.io/05_pytorch_going_modular/

In machine learning, going modular means turning notebook code (a series of cells) into Python scripts.
For example, the cells of `04_PyTorch_Custom_Datasets.ipynb` can be turned into the following files:

* `get_data.py` - a file that downloads data if needed
* `data_setup.py` - a file to prepare the data by creating `Dataset`s and `DataLoader`s
* `engine.py` - a file containing various training functions
* `model_builder.py` or `model.py` - a file to create a PyTorch model
* `train.py` - a file to leverage all other files and train a target PyTorch model
* `utils.py` - a file dedicated to helpful utility functions



Modularity is valuable because scripts are more reproducible and easier to run than notebook cells. That said, some companies, such as Netflix, have made the case for running notebooks as production code.

Below are some pros and cons of both Notebooks and Python scripts.

Notebooks:
* Easier to experiment and get started; easier to share through a Google Colab notebook; very visual
* Versioning can be hard; it is hard to use only specific parts; text and graphics can get in the way

Python scripts:
* Code can be packaged together, which avoids rewriting the same code; git can be used for versioning; used by many open source projects; larger projects can be run on cloud vendors
* Experimenting is less visual than in notebooks, since you usually run the whole script rather than a single cell

A common structure for running PyTorch models written in Python scripts is the following:

`python train.py --model MODEL_NAME --batch_size BATCH_SIZE --lr LEARNING_RATE --num_epochs NUM_EPOCHS`

Within this command, the double-dash values are known as argument flags, and the capitalized values following them are the hyperparameters we can set.
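
As a minimal sketch of how such flags reach a script, Python's standard `argparse` module can map each flag to a typed value. The flag names and defaults below mirror the command above but are illustrative assumptions, and the `parse_hyperparameters` helper is hypothetical.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse hyperparameter flags; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Illustrative hyperparameter parser.")
    # Flag names and defaults are assumptions chosen to mirror the command above.
    parser.add_argument("--model", default="tinyvgg", type=str)
    parser.add_argument("--batch_size", default=32, type=int)
    parser.add_argument("--lr", default=0.001, type=float)
    parser.add_argument("--num_epochs", default=5, type=int)
    return parser.parse_args(argv)


# Passing a list stands in for typing the flags on the command line.
args = parse_hyperparameters(["--batch_size", "64", "--lr", "0.01"])
print(args.model, args.batch_size, args.lr, args.num_epochs)
```

Flags that are left out fall back to their defaults, so only the values that change between experiments need to be typed.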

## 0. Cell mode vs Script mode

A cell mode notebook is a notebook run normally, where each cell in the notebook is either code or markdown.

A script mode notebook is very similar to a cell mode notebook; however, many of its code cells may be turned into Python scripts.
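
The `%%writefile` cell magic used in the cells below saves a cell's contents to a file instead of running them. A rough plain-Python equivalent is sketched here, under the assumption that the target directory has to exist before the file is written. The `demo_package` folder and `hello.py` file are hypothetical names used only for illustration.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical cell contents that we want to turn into a script.
cell_source = 'def greet(name):\n    return f"Hello, {name}!"\n'

# Work in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "demo_package"  # hypothetical folder name
    package_dir.mkdir(parents=True, exist_ok=True)  # create the folder first

    script_path = package_dir / "hello.py"
    script_path.write_text(cell_source)  # the step %%writefile performs

    # Load the freshly written file as a module, much like importing a script.
    spec = importlib.util.spec_from_file_location("hello", script_path)
    hello = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(hello)

    greeting = hello.greet("PyTorch")

print(greeting)
```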

## 1. Get data

In [11]:
%%writefile going_modular/get_data.py 
import os
import zipfile
import requests

from pathlib import Path

# Path to data folder
data_path = Path("data")
image_path = data_path / "pizza_steak_sushi"

if image_path.is_dir():
    print(f"{image_path} directory exists.")
else:
    print(f"Did not find {image_path} directory, creating one...")
    image_path.mkdir(parents=True, exist_ok=True)
    
# Dataset
with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
    # Announce the download before the (blocking) request starts
    print("Downloading pizza, steak, sushi data...")
    request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
    f.write(request.content)

# Unzipping
with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
    print("Unzipping pizza, steak, sushi data...") 
    zip_ref.extractall(image_path)

# Remove zip file
os.remove(data_path / "pizza_steak_sushi.zip")

Overwriting going_modular/get_data.py
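
The script above downloads and unzips the archive on every run. One way to skip the work when the data is already present is sketched below. The `extract_if_missing` helper is hypothetical, and the demo builds a tiny local zip file rather than touching the network.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_if_missing(zip_path: Path, target_dir: Path) -> bool:
    """Extract zip_path into target_dir unless target_dir already has content.

    Returns True if extraction happened, False if it was skipped.
    """
    if target_dir.is_dir() and any(target_dir.iterdir()):
        return False  # data already present, nothing to do
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(target_dir)
    return True


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)

    # Build a tiny stand-in archive (hypothetical contents, no download involved).
    zip_path = tmp_path / "sample.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("train/pizza/placeholder.txt", "not a real image")

    image_path = tmp_path / "pizza_steak_sushi"
    first_run = extract_if_missing(zip_path, image_path)   # extracts the archive
    second_run = extract_if_missing(zip_path, image_path)  # skipped, folder has content

print(first_run, second_run)
```

The same check could wrap the `requests.get()` call so that the download itself is also skipped.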


## 2. Create Datasets and DataLoaders (`data_setup.py`)

To create our training and testing `Dataset`s and `DataLoader`s in one place, we can write a single function called `create_dataloaders()`.

We can write it to file using the line `%%writefile going_modular/data_setup.py`.

In [12]:
%%writefile going_modular/data_setup.py
"""
Creates PyTorch DataLoaders for image classification data.
"""
import os

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

NUM_WORKERS = os.cpu_count()

class_names = ''

def create_dataloaders(
    train_dir: str, # training directory path
    test_dir: str, # testing directory path
    transform: transforms.Compose, # transforms to perform on training and testing
    batch_size: int, # number of samples per batch in each DataLoader
    num_workers: int=NUM_WORKERS # number of workers per DataLoader
):

  """

  Creates the training and testing DataLoaders.

  It takes in a training directory and testing directory path and turns them 
  into PyTorch Datasets and then into PyTorch DataLoaders.

  It returns a tuple of (train_dataloader, test_dataloader, class_names) where
  class_names is a list of the target classes. 
  
  """

  # Use ImageFolder to create dataset(s)
  train_data = datasets.ImageFolder(train_dir, transform=transform)
  test_data = datasets.ImageFolder(test_dir, transform=transform)

  class_names = train_data.classes

  train_dataloader = DataLoader(
      train_data, 
      batch_size=batch_size,
      shuffle=True,
      num_workers=num_workers,
      pin_memory=True
  )

  test_dataloader = DataLoader(
      test_data,
      batch_size=batch_size,
      shuffle=False,
      num_workers=num_workers,
      pin_memory=True
  )

  return train_dataloader, test_dataloader, class_names


Overwriting going_modular/data_setup.py


In [13]:
import os
from torchvision import transforms
from going_modular import data_setup

BATCH_SIZE = 32

data_transform = transforms.Compose([
    transforms.Resize(size=(64, 64)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor()
])

train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
    train_dir="data/pizza_steak_sushi/train",
    test_dir="data/pizza_steak_sushi/test",
    transform=data_transform,
    batch_size=BATCH_SIZE,
    num_workers=os.cpu_count()
)

## 3. Making a model (`model_builder.py`)

Notebooks 3 and 4 both built on the TinyVGG model. For reusability, it makes sense to place the model in its own file so it can be imported again and again.

In [14]:
%%writefile going_modular/model_builder.py
"""
Contains PyTorch model code to instantiate a TinyVGG model. 
"""

import torch
from torch import nn

class TinyVGG(nn.Module):
    """

    Create the TinyVGG architecture.

    Replicates the TinyVGG architecture from the following website:
    https://poloclub.github.io/cnn-explainer/

    """

    def __init__(
        self,
        input_shape: int, 
        hidden_units: int,
        output_shape: int
    ) -> None:
        super().__init__()
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(
                in_channels=input_shape,
                out_channels=hidden_units,
                kernel_size=3,
                stride=1,
                padding=0
            ),
            nn.ReLU(),
            nn.Conv2d(
                in_channels=hidden_units,
                out_channels=hidden_units,
                kernel_size=3,
                stride=1,
                padding=0
            ),
            nn.ReLU(),
            nn.MaxPool2d(
                kernel_size=2,
                stride=2
            )   
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(
                in_channels=hidden_units,
                out_channels=hidden_units, 
                kernel_size=3,
                padding=0
            ),
            nn.ReLU(),
            nn.Conv2d(
                in_channels=hidden_units,
                out_channels=hidden_units,
                kernel_size=3,
                padding=0
            ),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=hidden_units * 13 * 13,
            out_features=output_shape)
        )
    def forward(self, x: torch.Tensor):
        x = self.conv_block_1(x)
        x = self.conv_block_2(x)
        x = self.classifier(x)
        return x
        # return self.classifier(self.conv_block_2(self.conv_block_1(x))) # <- leverage the benefits of operator fusion



Overwriting going_modular/model_builder.py
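
The `13 * 13` in the classifier's `in_features` comes from the spatial size of the feature map after both convolutional blocks, given the 64x64 images used in this notebook. The arithmetic can be checked with the standard output-size formula for convolution and pooling layers (assuming a dilation of 1). The helper name `conv_out` is hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial dimension for a conv or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 64                                   # input images are resized to 64x64
size = conv_out(size, kernel=3)             # conv_block_1, first Conv2d  -> 62
size = conv_out(size, kernel=3)             # conv_block_1, second Conv2d -> 60
size = conv_out(size, kernel=2, stride=2)   # conv_block_1, MaxPool2d     -> 30
size = conv_out(size, kernel=3)             # conv_block_2, first Conv2d  -> 28
size = conv_out(size, kernel=3)             # conv_block_2, second Conv2d -> 26
size = conv_out(size, kernel=2, stride=2)   # conv_block_2, MaxPool2d     -> 13

hidden_units = 10
in_features = hidden_units * size * size
print(size, in_features)
```

If the `Resize` transform is changed to a different image size, `in_features` has to change to match.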


Instead of coding the TinyVGG model from scratch each time, it can now be imported as:

In [15]:
import torch
from going_modular import model_builder

device = "cuda" if torch.cuda.is_available() else "cpu"

torch.manual_seed(42)
model = model_builder.TinyVGG(input_shape=3,
                              hidden_units=10,
                              output_shape=len(class_names)).to(device)

## 4. Creating `train_step()` and `test_step()` functions and `train()` to combine them

1. `train_step()` takes in a model, a `DataLoader`, a loss function and an optimizer and trains the model on the `DataLoader`
2. `test_step()` takes in a model, a `DataLoader` and a loss function and evaluates the model on the `DataLoader`
3. `train()` performs the other two steps together for a given number of epochs and returns a results dictionary

Since these functions will be the engine of the model training, they will all be placed into a Python script called `engine.py` with the line `%%writefile going_modular/engine.py`.

In [16]:
%%writefile going_modular/engine.py

""" 
Contains functions for training and testing a PyTorch model.
"""

import torch
from tqdm.auto import tqdm
from typing import Dict, List, Tuple

def train_step(
    model: torch.nn.Module, # model to be trained
    dataloader: torch.utils.data.DataLoader, # DataLoader instance for the model to be trained on
    loss_fn: torch.nn.Module, # loss function to minimize
    optimizer: torch.optim.Optimizer, # optimizer to help minimize the loss function
    device: torch.device
    ) -> Tuple[float, float]:

    """ 

    Trains a PyTorch model for a single epoch

    Turns a target PyTorch model to training mode and then runs through all of the training steps:
        Forward Pass
        Loss Calculation
        Optimizer Step

    It returns a tuple of training loss and training accuracy metrics

    """

    model.train()

    train_loss, train_acc = 0, 0

    for batch, (X, y) in enumerate(dataloader):

        X, y = X.to(device), y.to(device)

        # Forward pass
        y_pred = model(X)

        # Calculate and accumulate loss
        loss = loss_fn(y_pred, y)
        train_loss += loss.item()

        # Optimizer zero grad
        optimizer.zero_grad()

        # Loss backward
        loss.backward()

        # Optimizer step
        optimizer.step()

        # Accuracy metric across all batches
        y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
        train_acc += (y_pred_class == y).sum().item()/len(y_pred)

    train_loss /= len(dataloader)
    train_acc /= len(dataloader)
    return train_loss, train_acc

def test_step(
    model: torch.nn.Module, # model to be tested
    dataloader: torch.utils.data.DataLoader, # DataLoader instance for the model to be tested on
    loss_fn: torch.nn.Module, # loss function to calculate loss on the test data
    device: torch.device 
    ) -> Tuple[float, float]:

    """ 

    Test a PyTorch model for a single epoch

    Turns a target PyTorch model to "eval" mode and then performs a forward pass on a testing dataset

    It returns a tuple of testing loss and testing accuracy metrics

    """

    model.eval()

    test_loss, test_acc = 0, 0

    with torch.inference_mode():

        for batch, (X, y) in enumerate(dataloader):
            X, y = X.to(device), y.to(device)

            # Forward pass
            test_pred_logits = model(X)

            # Calculate and accumulate loss
            loss = loss_fn(test_pred_logits, y)
            test_loss += loss.item()  # accumulate across batches (was overwritten with "=")

            test_pred_labels = test_pred_logits.argmax(dim=1)
            test_acc += ((test_pred_labels == y).sum().item()/len(test_pred_labels))

    test_loss /= len(dataloader)
    test_acc /= len(dataloader)
    return test_loss, test_acc

def train(
    model: torch.nn.Module, # model to be trained and tested
    train_dataloader: torch.utils.data.DataLoader, # DataLoader instance for the model to be trained on
    test_dataloader: torch.utils.data.DataLoader, # DataLoader instance for the model to be tested on
    optimizer: torch.optim.Optimizer, # optimizer to help minimize the loss function
    loss_fn: torch.nn.Module, # loss function to calculate loss on both datasets
    epochs: int,
    device: torch.device
    ) -> Dict[str, List]:

    """ 
    
    Trains and tests a PyTorch model

    Passes a target PyTorch model through the train_step() and test_step() functions for a number of epochs,
    training and testing the model in the same epoch loop

    It calculates, prints and stores evaluation metrics throughout

    It returns a dictionary of training and testing loss as well as training and testing accuracy metrics.
    Each metric has a value in a list for each epoch

    """

    results = {
        "train_loss": [],
        "train_acc": [], 
        "test_loss": [],
        "test_acc": []
    }

    for epoch in tqdm(range(epochs)):
        
        train_loss, train_acc = train_step(
            model=model, 
            dataloader=train_dataloader,
            loss_fn=loss_fn,
            optimizer=optimizer,
            device=device
        )

        test_loss, test_acc = test_step(
            model=model,
            dataloader=test_dataloader,
            loss_fn=loss_fn,
            device=device
        )

        print(f"Epoch: {epoch+1} | train_loss: {train_loss:.4f} | train_acc: {train_acc:.4f} | test_loss: {test_loss:.4f} | test_acc: {test_acc:.4f}")

        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

    return results

Overwriting going_modular/engine.py


In [17]:
from going_modular import engine

# engine.train(...)
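
`train()` returns one list per metric, with one entry per epoch. A small sketch of reading that dictionary follows. The numbers are made up purely for illustration and are not results from this notebook, and `best_epoch` is a hypothetical helper.

```python
from typing import Dict, List


def best_epoch(results: Dict[str, List[float]], metric: str = "test_loss") -> int:
    """Return the 1-based epoch with the lowest value of the given metric."""
    values = results[metric]
    return min(range(len(values)), key=values.__getitem__) + 1


# Made-up values in the same shape that engine.train() returns.
example_results = {
    "train_loss": [1.10, 1.05, 0.98],
    "train_acc": [0.30, 0.38, 0.45],
    "test_loss": [1.09, 1.02, 1.06],
    "test_acc": [0.31, 0.40, 0.37],
}

epoch = best_epoch(example_results)
print(f"Lowest test loss at epoch {epoch}: {example_results['test_loss'][epoch - 1]:.2f}")
```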

## 5. Creating a function to save the model (`utils.py`)

We can place the `save_model()` function in a file called `utils.py` using the line `%%writefile going_modular/utils.py`.

In [18]:
%%writefile going_modular/utils.py
""" 
Contains various utility functions for PyTorch model training and saving
"""

import torch
from pathlib import Path

def save_model(
    model: torch.nn.Module, # model to save
    target_dir: str, # directory for saving the model to
    model_name: str # filename for the saved model. Should include either ".pth" or ".pt" as the file extension
    ):

    """ 
    Saves a PyTorch model to a target directory
    """

    target_dir_path = Path(target_dir)
    target_dir_path.mkdir(parents=True, exist_ok=True)

    # Check the filename (model_name), not an attribute of the model object
    assert model_name.endswith(".pth") or model_name.endswith(".pt"), "model_name should end with '.pt' or '.pth'"
    model_save_path = target_dir_path / model_name

    print(f"[INFO] Saving model to: {model_save_path}")
    torch.save(obj=model.state_dict(), f=model_save_path)

Overwriting going_modular/utils.py


If we wanted to use our `save_model()` function, we would import it and use it as follows:

In [19]:
from going_modular import utils

# utils.save_model(
#    model=...,
#    target_dir=...,
#    model_name=... 
# )
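
`save_model()` checks the file extension with `assert`, which Python skips when run with the `-O` flag. An alternative sketch that validates the name with `pathlib` and raises an explicit error is shown below. `build_save_path` is a hypothetical helper, not part of `utils.py`.

```python
from pathlib import Path


def build_save_path(target_dir: str, model_name: str) -> Path:
    """Return target_dir/model_name after checking the file extension."""
    if Path(model_name).suffix not in {".pt", ".pth"}:
        raise ValueError("model_name should end with '.pt' or '.pth'")
    return Path(target_dir) / model_name


save_path = build_save_path("models", "tinyvgg.pth")
print(save_path)

try:
    build_save_path("models", "tinyvgg.bin")
except ValueError as error:
    print(f"Rejected: {error}")
```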

## 6. Train, evaluate and save the model (`train.py`)

There are a lot of PyTorch repositories that combine all of their functionality together in a `train.py` file. Essentially, this file trains the model using whatever data is available.

In our `train.py` file, we will combine all other Python scripts. 

This way a PyTorch model can be trained using a single line:

`python train.py`

To create `train.py`, we will do the following steps:

1. Import the various dependencies
2. Set up various hyperparameters
3. Set up training and test directories
4. Set up device-agnostic code
5. Create the necessary data transforms
6. Create the DataLoaders using `data_setup.py`
7. Create the model using `model_builder.py`
8. Set up the loss function and optimizer
9. Train the model using `engine.py`
10. Save the model using `utils.py`

We can create the file from a notebook cell using `%%writefile going_modular/train.py`

Using Python's `argparse` module, the hyperparameters can be set from the command line in the following way:

`python train.py --num_epochs 5 --batch_size 32 --hidden_units 32 --learning_rate 0.001`


In [20]:
%%writefile going_modular/train.py
""" 
Trains a PyTorch image classification model using device-agnostic code.
"""

import os
import argparse
import torch
import torchvision
import data_setup, engine, model_builder, utils
from torchvision import transforms

# Parser
parser = argparse.ArgumentParser(description="Enter hyperparameters.")

parser.add_argument("--num_epochs", default=5, type=int, help="The number of epochs to train for")
parser.add_argument("--batch_size", default=32, type=int, help="The number of samples per batch")
parser.add_argument("--hidden_units", default=32, type=int, help="The number of hidden units in hidden layers")
parser.add_argument("--learning_rate", default=0.001, type=float, help="The learning rate to use for the model")
parser.add_argument("--train_dir", default="../data/pizza_steak_sushi/train", type=str, help="The directory file path to training data in standard image classification format")
parser.add_argument("--test_dir", default="../data/pizza_steak_sushi/test", type=str, help="The directory file path to testing data in standard image classification format")

args = parser.parse_args()

# Hyper parameters
NUM_EPOCHS = args.num_epochs
BATCH_SIZE = args.batch_size
HIDDEN_UNITS = args.hidden_units
LEARNING_RATE = args.learning_rate

# Directories
train_dir = args.train_dir
test_dir = args.test_dir
# print(f"[INFO]\nEpochs:{NUM_EPOCHS}\nBatch size:{BATCH_SIZE}\nHidden units:{HIDDEN_UNITS}\nLearning rate:{LEARNING_RATE}\nTrain directory:{train_dir}\nTest directory:{test_dir}\n")

device = "cuda" if torch.cuda.is_available() else "cpu"

# Transforms
data_transform = transforms.Compose([
  transforms.Resize((64, 64)),
  transforms.ToTensor()
])

# When num_workers > 0, the multiprocessing module can run into issues when
# spawning worker processes. This guard gives the script a clear entry point,
# which avoids errors caused by the module being re-imported.
if __name__ == '__main__':
  # DataLoaders with help from data_setup.py
  train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
      train_dir=train_dir,
      test_dir=test_dir,
      transform=data_transform,
      batch_size=BATCH_SIZE
  )

  # Model with help from model_builder.py
  model = model_builder.TinyVGG(
      input_shape=3,
      hidden_units=HIDDEN_UNITS,
      output_shape=len(class_names)
  ).to(device)

  # Loss and optimizer
  loss_fn = torch.nn.CrossEntropyLoss()
  optimizer = torch.optim.Adam(model.parameters(),
                              lr=LEARNING_RATE)

  # Training with help from engine.py
  engine.train(model=model,
              train_dataloader=train_dataloader,
              test_dataloader=test_dataloader,
              loss_fn=loss_fn,
              optimizer=optimizer,
              epochs=NUM_EPOCHS,
              device=device)

  # Save the model with help from utils.py
  utils.save_model(model=model,
                  target_dir="models",
                  model_name="05_going_modular_script_mode_tinyvgg_model.pth")

Overwriting going_modular/train.py
