<a href="https://colab.research.google.com/drive/1MZAZ5bNSVGJbnTHq5ZvM4jqwN8tsI7MV#scrollTo=FIZjNcO0pWY-" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 05. Going Modular: Part 1 (cell mode)



This notebook is part 1/2 of section [05. Going Modular](https://www.learnpytorch.io/05_pytorch_going_modular/).

For reference, the two parts are:
1. [**05. Going Modular: Part 1 (cell mode)**](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/going_modular/05_pytorch_going_modular_cell_mode.ipynb) - this notebook is run as a traditional Jupyter Notebook/Google Colab notebook and is a condensed version of [notebook 04](https://www.learnpytorch.io/04_pytorch_custom_datasets/).
2. [**05. Going Modular: Part 2 (script mode)**](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/going_modular/05_pytorch_going_modular_script_mode.ipynb) - this notebook is the same as number 1 but with added functionality to turn each of the major sections into Python scripts, such as `data_setup.py` and `train.py`.

Why two parts?

Because sometimes the best way to learn something is to see how it *differs* from something else.

If you run each notebook side-by-side you'll see how they differ and that's where the key learnings are.

## What is cell mode?



A cell mode notebook is a regular notebook run exactly how we've been running them through the course.

Some cells contain text and others contain code.

## What's the difference between this notebook (Part 1) and the script mode notebook (Part 2)?



This notebook, 05. PyTorch Going Modular: Part 1 (cell mode), runs a cleaned up version of the most useful code from section [04. PyTorch Custom Datasets](https://www.learnpytorch.io/04_pytorch_custom_datasets/).

Running this notebook end-to-end will result in recreating the image classification model we built in notebook 04 (TinyVGG) trained on images of pizza, steak and sushi.

The main difference between this notebook (Part 1) and Part 2 is that each section in Part 2 (script mode) has an extra subsection (e.g. 2.1, 3.1, 4.1) for turning cell code into script code.

## Where can you get help?



You can find the book version of this section [05. PyTorch Going Modular on learnpytorch.io](https://www.learnpytorch.io/05_pytorch_going_modular/).

The rest of the materials for this course [are available on GitHub](https://github.com/mrdbourke/pytorch-deep-learning).

If you run into trouble, you can ask a question on the course [GitHub Discussions page](https://github.com/mrdbourke/pytorch-deep-learning/discussions).

And of course, there's the [PyTorch documentation](https://pytorch.org/docs/stable/index.html) and [PyTorch developer forums](https://discuss.pytorch.org/), a very helpful place for all things PyTorch.

## 0. Running a notebook in cell mode



As discussed, we're going to be running this notebook normally.

One cell at a time.

The code is from notebook 04; however, it has been condensed down to its core functionality.

## 1. Get data



We're going to start by downloading the same data we used in [notebook 04](https://www.learnpytorch.io/04_pytorch_custom_datasets/#1-get-data), the `pizza_steak_sushi` dataset with images of pizza, steak and sushi.

In [1]:
import os
import zipfile

from pathlib import Path

import requests

# Setup path to data folder
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

# If the image folder doesn't exist, download it and prepare it...
if image_path.is_dir():
    print(f"{image_path} directory exists.")
else:
    print(f"Did not find {image_path} directory, creating one...")
    image_path.mkdir(parents=True, exist_ok=True)

    # Download pizza, steak, sushi data
    with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
        request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
        print("Downloading pizza, steak, sushi data...")
        f.write(request.content)

    # Unzip pizza, steak, sushi data
    with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
        print("Unzipping pizza, steak, sushi data...")
        zip_ref.extractall(image_path)

    # Remove zip file
    os.remove(data_path / "pizza_steak_sushi.zip")

Did not find data/pizza_steak_sushi directory, creating one...
Downloading pizza, steak, sushi data...
Unzipping pizza, steak, sushi data...


#### Explanation of the Code



This cell downloads and prepares a dataset of images (pizza, steak, and sushi) for use in a machine learning project. Below is a detailed breakdown of the code, including explanations of the key components.

##### 1. **Importing Libraries**
   - `os`: Provides functions for interacting with the operating system, such as file and directory manipulation.
   - `zipfile`: Allows for reading and writing ZIP files.
   - `Path` from `pathlib`: A class that provides an object-oriented interface for working with filesystem paths.
   - `requests`: A library for making HTTP requests, used here to download the dataset.

##### 2. **Setting Up the Data Path**
   - `data_path = Path("data/")`: Creates a `Path` object pointing to a directory named `data`. The `Path` class is used to handle filesystem paths in a more intuitive way compared to traditional string-based paths.
   - `image_path = data_path / "pizza_steak_sushi"`: Combines the `data_path` with a subdirectory named `pizza_steak_sushi` to create a new `Path` object. This is where the dataset will be stored.

##### 3. **Checking and Creating the Directory**
   - `if image_path.is_dir():`: Checks if the `image_path` directory already exists.
   - `image_path.mkdir(parents=True, exist_ok=True)`: If the directory does not exist, it creates it. The `parents=True` argument ensures that any necessary parent directories are also created, and `exist_ok=True` prevents an error if the directory already exists.

##### 4. **Downloading the Dataset**
   - `with open(data_path / "pizza_steak_sushi.zip", "wb") as f:`: Opens a file in write-binary mode (`"wb"`). The `with` statement ensures that the file is properly closed after the block of code is executed, even if an error occurs. The file is named `pizza_steak_sushi.zip` and is stored in the `data_path` directory.
   - `request = requests.get(...)`: Sends an HTTP GET request to download the dataset from the specified URL.
   - `f.write(request.content)`: Writes the content of the HTTP response (the downloaded data) to the file `f`.

##### 5. **Unzipping the Dataset**
   - `with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:`: Opens the ZIP file in read mode (`"r"`). The `with` statement ensures that the ZIP file is properly closed after the block of code is executed.
   - `zip_ref.extractall(image_path)`: Extracts all the contents of the ZIP file to the `image_path` directory.

##### 6. **Removing the ZIP File**
   - `os.remove(data_path / "pizza_steak_sushi.zip")`: Deletes the ZIP file after its contents have been extracted, to save space.

##### 7. **Key Concepts and Variables**
   - **`Path`**: An object-oriented way to handle filesystem paths. It allows for easy manipulation of paths, such as joining them or checking if they exist.
   - **`with open(...) as f:`**: A context manager that ensures the file is properly managed (opened and closed) within the block. The `as f` part assigns the file object to the variable `f`.
   - **`"wb"`**: A file mode that stands for "write binary". It opens the file for writing in binary mode, which is necessary for writing non-text data like images or ZIP files.
   - **`f.write(...)`**: A method of the file object `f` that writes data to the file. In this case, it writes the binary content of the downloaded dataset.
   - **`zip_ref`**: A variable that holds the `ZipFile` object, which is used to interact with the ZIP file (e.g., extracting its contents).

##### 8. **Other Common File Modes**
   - **`"r"`**: Read mode (default). Opens the file for reading.
   - **`"w"`**: Write mode. Opens the file for writing, overwriting the file if it exists.
   - **`"a"`**: Append mode. Opens the file for writing, but appends to the end of the file if it exists.
   - **`"rb"`**: Read binary mode. Opens the file for reading in binary mode.
   - **`"ab"`**: Append binary mode. Opens the file for appending in binary mode.

This script is a common pattern in data preparation workflows, where datasets are downloaded, extracted, and organized for further processing.
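
The write, extract and remove steps described above can be tried without a network connection. The following is a minimal sketch rather than the notebook's own code: it builds a tiny ZIP archive in memory as a stand-in for `request.content`, and names such as `demo_images` and `fake_content` are made up for illustration.

```python
import io
import os
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical paths inside a throwaway directory
    data_path = Path(tmp) / "data"
    image_path = data_path / "demo_images"
    image_path.mkdir(parents=True, exist_ok=True)
    zip_path = data_path / "demo_images.zip"

    # Stand-in for `request.content`: the raw bytes of a small ZIP archive
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("train/pizza/placeholder.txt", "not a real image")
    fake_content = buffer.getvalue()

    # 1. Write the bytes to disk in write-binary mode
    with open(zip_path, "wb") as f:
        f.write(fake_content)

    # 2. Extract the archive into the target folder
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(image_path)

    # 3. Remove the archive once its contents are extracted
    os.remove(zip_path)

    extracted_ok = (image_path / "train" / "pizza" / "placeholder.txt").is_file()
    zip_removed = not zip_path.exists()

print(f"Extracted: {extracted_ok} | ZIP removed: {zip_removed}")
```

When downloading for real, it is also worth checking that the request succeeded (for example with `request.raise_for_status()`) before writing the content to disk.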

In [2]:
# Setup train and testing paths
train_dir = image_path / "train"
test_dir = image_path / "test"

train_dir, test_dir

(PosixPath('data/pizza_steak_sushi/train'),
 PosixPath('data/pizza_steak_sushi/test'))

## 2. Create Datasets and DataLoaders



Now we'll turn the image dataset into PyTorch `Dataset`s and `DataLoader`s.

In [3]:
from torchvision import datasets, transforms

# Create simple transform
data_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])

# Use ImageFolder to create dataset(s)
train_data = datasets.ImageFolder(root=train_dir, # target folder of images
                                  transform=data_transform, # transforms to perform on data (images)
                                  target_transform=None) # transforms to perform on labels (if necessary)

test_data = datasets.ImageFolder(root=test_dir,
                                 transform=data_transform)

print(f"Train data:\n{train_data}\nTest data:\n{test_data}")

Train data:
Dataset ImageFolder
    Number of datapoints: 225
    Root location: data/pizza_steak_sushi/train
    StandardTransform
Transform: Compose(
               Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=True)
               ToTensor()
           )
Test data:
Dataset ImageFolder
    Number of datapoints: 75
    Root location: data/pizza_steak_sushi/test
    StandardTransform
Transform: Compose(
               Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=True)
               ToTensor()
           )


#### Explanation of `transforms.Compose` and `transforms.Resize`



In PyTorch, the `torchvision.transforms` module provides a variety of image transformations that can be applied to datasets. These transformations are often used to preprocess images before feeding them into a neural network. The `transforms.Compose` class is a key part of this process.

##### 1. **`transforms.Compose`**:
   - **Purpose**: `transforms.Compose` is used to chain multiple image transformations together. It takes a list of transformations and applies them sequentially to the input image.
   - **Usage**: You pass a list of transformations to `transforms.Compose`, and it returns a single callable object that applies all the transformations in the specified order.

   ```python
   data_transform = transforms.Compose([
       transforms.Resize((64, 64)),
       transforms.ToTensor(),
   ])
   ```

   In this example, `data_transform` is a callable object that first resizes the image to 64x64 pixels and then converts it to a PyTorch tensor.

##### 2. **`transforms.Resize`**:
   - **Purpose**: `transforms.Resize` is used to resize an image to a specified size. This is often necessary because neural networks typically expect input images of a fixed size.
   - **Usage**: You specify the desired size as a tuple (height, width). The image will be resized to this size.

   ```python
   transforms.Resize((64, 64))
   ```

   This transformation resizes the image to 64x64 pixels.

##### 3. **Why `Resize` is Inside `Compose`**:
   - **Sequential Application**: `Compose` applies the transforms in list order, so putting `transforms.Resize` first means every image is resized before `transforms.ToTensor()` converts it to a tensor.
   - **Modularity and Reusability**: Using `Compose` allows you to define a sequence of transformations once and reuse it across different datasets or data loaders. This makes your code more modular and easier to maintain.

##### 4. **Example Workflow**:
   - **Step 1**: Define the transformations using `transforms.Compose`.
     ```python
     data_transform = transforms.Compose([
         transforms.Resize((64, 64)),
         transforms.ToTensor(),
     ])
     ```
   - **Step 2**: Apply these transformations to your dataset using `datasets.ImageFolder`.
     ```python
     train_data = datasets.ImageFolder(root=train_dir, transform=data_transform)
     test_data = datasets.ImageFolder(root=test_dir, transform=data_transform)
     ```
   - **Step 3**: When you access an image from `train_data` or `test_data`, the transformations defined in `data_transform` will be applied automatically.

##### 5. **Other Common Transformations**:
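   - **`transforms.ToTensor()`**: Converts a PIL image or NumPy array of shape `(H, W, C)` to a PyTorch tensor of shape `(C, H, W)`. For 8-bit (`uint8`) inputs it also scales the pixel values from [0, 255] to the range [0.0, 1.0].
   - **`transforms.Normalize(mean, std)`**: Normalizes a tensor image channel by channel using the given mean and standard deviation.
   - **`transforms.RandomHorizontalFlip()`**: Randomly flips the image horizontally (with probability 0.5 by default).
   - **`transforms.RandomRotation(degrees)`**: Rotates the image by a random angle. If `degrees` is a single number, the angle is sampled from the range `(-degrees, +degrees)`.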




`datasets.ImageFolder` is a utility provided by PyTorch's `torchvision.datasets` module that simplifies the process of loading image datasets stored in a folder structure. It is particularly useful when your dataset is organized such that each class of images is stored in its own subdirectory. Here’s a slightly more detailed summary of what it does:

1. **Folder Structure**:
   - The dataset should be organized with each subdirectory representing a class, and the images for that class stored within that subdirectory. For example:
     ```
     train_dir/
     ├── class1/
     │   ├── img1.jpg
     │   ├── img2.jpg
     │   └── ...
     ├── class2/
     │   ├── img1.jpg
     │   ├── img2.jpg
     │   └── ...
     └── ...
     ```

2. **Automatic Labeling**:
   - `ImageFolder` automatically assigns labels to the images based on the subdirectory they are in. Class folder names are sorted alphabetically and numbered from `0`, so images in `class1/` will be labeled as `0`, images in `class2/` will be labeled as `1`, and so on. This makes it easy to handle datasets with multiple classes.

3. **Transformations**:
   - You can specify a `transform` parameter to apply a series of transformations (e.g., resizing, converting to tensor) to each image as it is loaded. This ensures that the images are preprocessed consistently before being fed into a model.
   - Optionally, you can specify a `target_transform` to apply transformations to the labels, though this is less commonly used.

4. **Usage**:
   - To create a dataset, you simply provide the root directory of your dataset and the desired transformations. For example:
     ```python
     train_data = datasets.ImageFolder(root=train_dir, transform=data_transform)
     ```

5. **Accessing Data**:
   - Once the dataset is created, you can access individual images and their labels using indexing. For example:
     ```python
     image, label = train_data[0]  # Get the first image and its label
     ```

6. **Dataset Information**:
   - The `print` statement in the original code displays a summary of each dataset: the number of datapoints (images), the root location and the transforms that will be applied. This helps you verify that the dataset has been loaded correctly.



In [5]:
# Get class names as a list
class_names = train_data.classes
class_names

['pizza', 'steak', 'sushi']

In [6]:
# Can also get class names as a dict
class_dict = train_data.class_to_idx
class_dict

{'pizza': 0, 'steak': 1, 'sushi': 2}

In [7]:
# Check the lengths
len(train_data), len(test_data)

(225, 75)

In [8]:
# Turn train and test Datasets into DataLoaders
from torch.utils.data import DataLoader
train_dataloader = DataLoader(dataset=train_data,
                              batch_size=1, # how many samples per batch?
                              num_workers=1, # how many subprocesses to use for data loading? (higher = more)
                              shuffle=True) # shuffle the data?

test_dataloader = DataLoader(dataset=test_data,
                             batch_size=1,
                             num_workers=1,
                             shuffle=False) # don't usually need to shuffle testing data

train_dataloader, test_dataloader

(<torch.utils.data.dataloader.DataLoader at 0x7a381618a010>,
 <torch.utils.data.dataloader.DataLoader at 0x7a381617ab90>)


`DataLoader` in PyTorch is used to efficiently load and iterate over datasets, especially when training machine learning models. Here’s a concise summary of why datasets are turned into `DataLoader`s:

1. **Batching**:
   - `DataLoader` allows you to specify a `batch_size`, which groups multiple samples (e.g., images) into batches. This is essential for training models, as most models process data in batches rather than one sample at a time.

2. **Shuffling**:
   - For training data, shuffling (`shuffle=True`) is important so the model does not pick up on the order of the samples (for example, all pizza images first), which can hurt generalization.

3. **Parallel Loading**:
   - The `num_workers` parameter allows you to load data in parallel using multiple subprocesses, speeding up data loading and reducing bottlenecks during training.

4. **Iteration**:
   - `DataLoader` provides an easy way to iterate over the dataset in batches, making it straightforward to feed data into a model during training or evaluation.

5. **Testing Data**:
   - For testing data, shuffling is typically not needed (`shuffle=False`), as the order of evaluation does not change the computed metrics.
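
The notebook uses `batch_size=1`, so the effect of batching is easier to see on a toy dataset. The sketch below is an illustration under assumed values (10 samples, `batch_size=4`), using `TensorDataset` in place of the image dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 10 samples with one feature each, labels cycling through 0, 1, 2
features = torch.arange(10, dtype=torch.float32).unsqueeze(dim=1)  # shape [10, 1]
labels = torch.arange(10) % 3
demo_dataset = TensorDataset(features, labels)

# Without shuffling, samples are batched in order: 4 + 4 + 2
demo_loader = DataLoader(demo_dataset, batch_size=4, shuffle=False, num_workers=0)
batch_sizes = [len(y) for X, y in demo_loader]
print(f"Number of batches: {len(demo_loader)} | Batch sizes: {batch_sizes}")

# With shuffling, the order changes but every sample still appears once per epoch
torch.manual_seed(42)
shuffled_loader = DataLoader(demo_dataset, batch_size=4, shuffle=True, num_workers=0)
seen = sorted(int(x) for X, _ in shuffled_loader for x in X.squeeze(dim=1))
print(f"Samples seen in one shuffled epoch: {seen}")
```

Note that the last batch is smaller when the dataset size is not a multiple of `batch_size`.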



In [9]:
# Check out single image size/shape
img, label = next(iter(train_dataloader))

# Batch size will now be 1, try changing the batch_size parameter above and see what happens
print(f"Image shape: {img.shape} -> [batch_size, color_channels, height, width]")
print(f"Label shape: {label.shape}")

Image shape: torch.Size([1, 3, 64, 64]) -> [batch_size, color_channels, height, width]
Label shape: torch.Size([1])


## 3. Making a model (TinyVGG)



We're going to use the same model we used in notebook 04: TinyVGG from the CNN Explainer website.

The only change here from notebook 04 is that a docstring has been added using [Google's Style Guide for Python](https://google.github.io/styleguide/pyguide.html#384-classes).

In [10]:
import torch

from torch import nn

class TinyVGG(nn.Module):
  """Creates the TinyVGG architecture.

  Replicates the TinyVGG architecture from the CNN explainer website in PyTorch.
  See the original architecture here: https://poloclub.github.io/cnn-explainer/

  Args:
    input_shape: An integer indicating number of input channels.
    hidden_units: An integer indicating number of hidden units between layers.
    output_shape: An integer indicating number of output units.
  """
  def __init__(self, input_shape: int, hidden_units: int, output_shape: int) -> None:
      super().__init__()
      self.conv_block_1 = nn.Sequential(
          nn.Conv2d(in_channels=input_shape,
                    out_channels=hidden_units,
                    kernel_size=3, # how big is the square that's going over the image?
                    stride=1, # default
                    padding=0), # options = "valid" (no padding) or "same" (output has same shape as input) or int for specific number
          nn.ReLU(),
          nn.Conv2d(in_channels=hidden_units,
                    out_channels=hidden_units,
                    kernel_size=3,
                    stride=1,
                    padding=0),
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2,
                        stride=2) # default stride value is same as kernel_size
      )
      self.conv_block_2 = nn.Sequential(
          nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=0),
          nn.ReLU(),
          nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=0),
          nn.ReLU(),
          nn.MaxPool2d(2)
      )
      self.classifier = nn.Sequential(
          nn.Flatten(),
          # Where did this in_features shape come from?
          # It's because each layer of our network compresses and changes the shape of our input data.
          nn.Linear(in_features=hidden_units*13*13,
                    out_features=output_shape)
      )

  def forward(self, x: torch.Tensor):
      x = self.conv_block_1(x)
      x = self.conv_block_2(x)
      x = self.classifier(x)
      return x
      # return self.classifier(self.conv_block_2(self.conv_block_1(x))) # <- leverage the benefits of operator fusion
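
The `hidden_units*13*13` value in the classifier can be worked out by hand. The sketch below applies the standard output-size formula for `nn.Conv2d` and `nn.MaxPool2d` (with dilation of 1) to a 64x64 input; the helper name `conv_or_pool_output_size` is made up for illustration.

```python
def conv_or_pool_output_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width of a Conv2d or MaxPool2d layer (dilation=1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 64  # input images are 64x64 after transforms.Resize((64, 64))
sizes = [size]

for _ in range(2):  # conv_block_1 and conv_block_2 have the same layout
    size = conv_or_pool_output_size(size, kernel_size=3)            # Conv2d, padding=0
    sizes.append(size)
    size = conv_or_pool_output_size(size, kernel_size=3)            # Conv2d, padding=0
    sizes.append(size)
    size = conv_or_pool_output_size(size, kernel_size=2, stride=2)  # MaxPool2d
    sizes.append(size)

hidden_units = 10
in_features = hidden_units * size * size

print(f"Height/width after each layer: {sizes}")
print(f"in_features for the Linear layer: {in_features}")
```

Each 3x3 convolution without padding trims 2 pixels and each max pool halves the size, giving 64 -> 62 -> 60 -> 30 -> 28 -> 26 -> 13. With `hidden_units=10` that is 10 * 13 * 13 = 1690 input features.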

In [11]:
import torch

# Setup device-agnostic code
if torch.cuda.is_available():
    device = "cuda" # NVIDIA GPU
elif torch.backends.mps.is_available():
    device = "mps" # Apple GPU
else:
    device = "cpu" # Defaults to CPU if NVIDIA GPU/Apple GPU aren't available

print(f"Using device: {device}")

# Instantiate an instance of the model
torch.manual_seed(42)
model_0 = TinyVGG(input_shape=3, # number of color channels (3 for RGB)
                  hidden_units=10,
                  output_shape=len(train_data.classes)).to(device)
model_0

Using device: cuda


TinyVGG(
  (conv_block_1): Sequential(
    (0): Conv2d(3, 10, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv_block_2): Sequential(
    (0): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=1690, out_features=3, bias=True)
  )
)

To test our model let's do a single forward pass (pass a sample batch from the training set through our model).

In [12]:
# 1. Get a batch of images and labels from the DataLoader
img_batch, label_batch = next(iter(train_dataloader))

# 2. Get a single image from the batch and unsqueeze the image so its shape fits the model
img_single, label_single = img_batch[0].unsqueeze(dim=0), label_batch[0]
print(f"Single image shape: {img_single.shape}\n")

# 3. Perform a forward pass on a single image
model_0.eval()
with torch.inference_mode():
    pred = model_0(img_single.to(device))

# 4. Print out what's happening and convert model logits -> pred probs -> pred label
print(f"Output logits:\n{pred}\n")
print(f"Output prediction probabilities:\n{torch.softmax(pred, dim=1)}\n")
print(f"Output prediction label:\n{torch.argmax(torch.softmax(pred, dim=1), dim=1)}\n")
print(f"Actual label:\n{label_single}")

Single image shape: torch.Size([1, 3, 64, 64])

Output logits:
tensor([[ 0.0208, -0.0020,  0.0095]], device='cuda:0')

Output prediction probabilities:
tensor([[0.3371, 0.3295, 0.3333]], device='cuda:0')

Output prediction label:
tensor([0], device='cuda:0')

Actual label:
0


#### Explanation of `next(iter(train_dataloader))` and Batch Structure



When working with `DataLoader` in PyTorch, each batch typically consists of two parts: the **input data** (e.g., images) and the **labels** (e.g., class indices). This is why `next(iter(train_dataloader))` returns two variables: one for the batch of images and one for the corresponding labels.

##### 1. **Why Two Variables?**
   - **`img_batch`**: This contains the batch of images (or input data) from the dataset.
   - **`label_batch`**: This contains the corresponding labels for the images in the batch.
   - This separation is necessary because, during training or evaluation, the model needs both the input data and the ground truth labels to compute predictions and losses.

##### 2. **Why Does the Batch Consist of Two Parts?**
   - In supervised learning, each data sample has an input (e.g., an image) and a corresponding label (e.g., the class of the image).
   - The `DataLoader` groups these pairs into batches, so each batch contains:
     - A tensor of input data (e.g., a batch of images).
     - A tensor of labels (e.g., a batch of class indices).

##### 3. **What Does `next(iter(train_dataloader))` Do?**
   - **`iter(train_dataloader)`**: Converts the `DataLoader` into an iterator, allowing you to fetch batches one at a time.
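   - **`next(...)`**: Retrieves the next batch from the iterator. Each batch comes back as a pair (images, labels), which Python unpacks into the two variables on the left-hand side. Because `iter(train_dataloader)` creates a fresh iterator each time, `next(iter(train_dataloader))` always returns the first batch of a new pass over the data (a random one when `shuffle=True`).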

##### 4. **Example Breakdown**:
   ```python
   img_batch, label_batch = next(iter(train_dataloader))
   ```
   - `img_batch`: A tensor containing a batch of images (e.g., shape `[batch_size, channels, height, width]`).
   - `label_batch`: A tensor containing the corresponding labels (e.g., shape `[batch_size]`).
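
The sketch below illustrates this on random stand-in data (the tensors and the name `demo_loader` are assumptions, not part of the notebook). It also shows the difference between creating a fresh iterator each time and reusing one iterator to step through the batches.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 6 fake "images" of shape [3, 8, 8] with matching labels
demo_images = torch.rand(6, 3, 8, 8)
demo_labels = torch.tensor([0, 1, 2, 0, 1, 2])
demo_loader = DataLoader(TensorDataset(demo_images, demo_labels), batch_size=2, shuffle=False)

# One batch is a pair: (images, labels)
batch = next(iter(demo_loader))
img_batch, label_batch = batch
print(f"Items in batch: {len(batch)}")
print(f"Image batch shape: {img_batch.shape} | Label batch shape: {label_batch.shape}")

# A fresh iterator starts again from the first batch (shuffle=False here)
img_again, _ = next(iter(demo_loader))
same_first_batch = torch.equal(img_batch, img_again)

# Reusing a single iterator moves on to the following batch
loader_iter = iter(demo_loader)
first_labels = next(loader_iter)[1]
second_labels = next(loader_iter)[1]
print(f"First labels: {first_labels.tolist()} | Second labels: {second_labels.tolist()}")
```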



## 4. Creating `train_step()` and `test_step()` functions and `train()` to combine them  



Rather than writing them again, we can reuse the `train_step()` and `test_step()` functions from [notebook 04](https://www.learnpytorch.io/04_pytorch_custom_datasets/#75-create-train-test-loop-functions).

The same goes for the `train()` function we created.

The only difference here is that these functions have had docstrings added to them following [Google's Python Functions and Methods Style Guide](https://google.github.io/styleguide/pyguide.html#383-functions-and-methods).

Let's start by making `train_step()`.

### `train_step()`

In [13]:
from typing import Tuple

def train_step(model: torch.nn.Module,
               dataloader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               device: torch.device) -> Tuple[float, float]:
  """Trains a PyTorch model for a single epoch.

  Turns a target PyTorch model to training mode and then
  runs through all of the required training steps (forward
  pass, loss calculation, optimizer step).

  Args:
    model: A PyTorch model to be trained.
    dataloader: A DataLoader instance for the model to be trained on.
    loss_fn: A PyTorch loss function to minimize.
    optimizer: A PyTorch optimizer to help minimize the loss function.
    device: A target device to compute on (e.g. "cuda" or "cpu").

  Returns:
    A tuple of training loss and training accuracy metrics.
    In the form (train_loss, train_accuracy). For example:

    (0.1112, 0.8743)
  """
  # Put model in train mode
  model.train()

  # Setup train loss and train accuracy values
  train_loss, train_acc = 0, 0

  # Loop through data loader data batches
  for batch, (X, y) in enumerate(dataloader):
      # Send data to target device
      X, y = X.to(device), y.to(device)

      # 1. Forward pass
      y_pred = model(X)

      # 2. Calculate and accumulate loss
      loss = loss_fn(y_pred, y)
      train_loss += loss.item()

      # 3. Optimizer zero grad
      optimizer.zero_grad()

      # 4. Loss backward
      loss.backward()

      # 5. Optimizer step
      optimizer.step()

      # Calculate and accumulate accuracy metric across all batches
      y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
      train_acc += (y_pred_class == y).sum().item()/len(y_pred)

  # Adjust metrics to get average loss and accuracy per batch
  train_loss = train_loss / len(dataloader)
  train_acc = train_acc / len(dataloader)
  return train_loss, train_acc

#### Explanation of `Tuple` and `train_loss += loss.item()`



Let's break down the key parts of the code to understand why a `Tuple` is used and why `train_loss += loss.item()` is used.

##### Why is `Tuple` Used Here?

1. **Return Type Annotation**:
   - The function `train_step` is annotated to return a `Tuple[float, float]`. This means the function will return a tuple containing two `float` values: the training loss and the training accuracy.
   - Using a tuple is a convenient way to return multiple values from a function in Python. In this case, the function needs to return both the average training loss and the average training accuracy for the epoch.

2. **Example**:
   - The function returns a tuple like `(0.1112, 0.8743)`, where `0.1112` is the average training loss and `0.8743` is the average training accuracy.

##### Why is `train_loss += loss.item()` Used Here?

1. **Accumulating Loss**:
   - `train_loss += loss.item()` is used to accumulate the loss over all batches in the epoch.
   - `loss.item()` extracts the scalar value from the loss tensor as a plain Python float. Accumulating the float rather than the tensor itself also means the running total does not hold on to the autograd graph of every batch.

2. **Average Loss Calculation**:
   - After processing all batches, the total loss is divided by the number of batches (`len(dataloader)`) to get the average loss per batch for the epoch.
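   - Note that this is the mean of the per-batch losses. Assuming the loss function averages within each batch (the default for most PyTorch loss functions), it matches the per-sample mean over the whole dataset only when every batch has the same size, as it does here with `batch_size=1`.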

3. **When to Use `loss.item()`**:
   - Use `loss.item()` when you need to extract the scalar value from a single-element tensor. This is common when accumulating metrics like loss or accuracy during training or evaluation.
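
Here is a small sketch of the accumulation pattern on made-up numbers. The per-batch losses and batch sizes are hypothetical, and the toy `loss` tensor only exists to show that `.item()` returns a plain float from a tensor that is attached to the autograd graph.

```python
import torch

weight = torch.tensor(2.0, requires_grad=True)

# Hypothetical mean loss of each batch and the number of samples in each batch
batch_means = [1.0, 1.0, 4.0]
batch_sizes = [4, 4, 2]

running_loss = 0.0
for mean in batch_means:
    # Toy scalar "loss" that tracks gradients, like the output of a loss function
    loss = (weight * mean) / 2.0
    loss_requires_grad = loss.requires_grad
    running_loss += loss.item()  # plain Python float, no autograd history kept

# Average per batch (what train_step() returns)
avg_per_batch = running_loss / len(batch_means)

# Average per sample, weighting each batch by its size
avg_per_sample = sum(m * n for m, n in zip(batch_means, batch_sizes)) / sum(batch_sizes)

print(f"Type of running total: {type(running_loss).__name__}")
print(f"Average per batch: {avg_per_batch:.2f} | Average per sample: {avg_per_sample:.2f}")
```

The two averages differ here because the last batch is smaller than the others; with equal batch sizes they would agree.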



### `test_step()`

In [14]:
def test_step(model: torch.nn.Module,
              dataloader: torch.utils.data.DataLoader,
              loss_fn: torch.nn.Module,
              device: torch.device) -> Tuple[float, float]:
  """Tests a PyTorch model for a single epoch.

  Turns a target PyTorch model to "eval" mode and then performs
  a forward pass on a testing dataset.

  Args:
    model: A PyTorch model to be tested.
    dataloader: A DataLoader instance for the model to be tested on.
    loss_fn: A PyTorch loss function to calculate loss on the test data.
    device: A target device to compute on (e.g. "cuda" or "cpu").

  Returns:
    A tuple of testing loss and testing accuracy metrics.
    In the form (test_loss, test_accuracy). For example:

    (0.0223, 0.8985)
  """
  # Put model in eval mode
  model.eval()

  # Setup test loss and test accuracy values
  test_loss, test_acc = 0, 0

  # Turn on inference context manager
  with torch.inference_mode():
      # Loop through DataLoader batches
      for batch, (X, y) in enumerate(dataloader):
          # Send data to target device
          X, y = X.to(device), y.to(device)

          # 1. Forward pass
          test_pred_logits = model(X)

          # 2. Calculate and accumulate loss
          loss = loss_fn(test_pred_logits, y)
          test_loss += loss.item()

          # Calculate and accumulate accuracy
          test_pred_labels = test_pred_logits.argmax(dim=1)
          test_acc += ((test_pred_labels == y).sum().item()/len(test_pred_labels))

  # Adjust metrics to get average loss and accuracy per batch
  test_loss = test_loss / len(dataloader)
  test_acc = test_acc / len(dataloader)
  return test_loss, test_acc

### `train()` - Combining `train_step()` and `test_step()`

And we'll combine `train_step()` and `test_step()` into `train()`.

In [15]:
from typing import Dict, List

from tqdm.auto import tqdm

def train(model: torch.nn.Module,
          train_dataloader: torch.utils.data.DataLoader,
          test_dataloader: torch.utils.data.DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device) -> Dict[str, List[float]]:
  """Trains and tests a PyTorch model.

  Passes a target PyTorch model through train_step() and test_step()
  functions for a number of epochs, training and testing the model
  in the same epoch loop.

  Calculates, prints and stores evaluation metrics throughout.

  Args:
    model: A PyTorch model to be trained and tested.
    train_dataloader: A DataLoader instance for the model to be trained on.
    test_dataloader: A DataLoader instance for the model to be tested on.
    optimizer: A PyTorch optimizer to help minimize the loss function.
    loss_fn: A PyTorch loss function to calculate loss on both datasets.
    epochs: An integer indicating how many epochs to train for.
    device: A target device to compute on (e.g. "cuda" or "cpu").

  Returns:
    A dictionary of training and testing loss as well as training and
    testing accuracy metrics. Each metric has a value in a list for
    each epoch.
    In the form: {train_loss: [...],
                  train_acc: [...],
                  test_loss: [...],
                  test_acc: [...]}
    For example if training for epochs=2:
                 {train_loss: [2.0616, 1.0537],
                  train_acc: [0.3945, 0.3945],
                  test_loss: [1.2641, 1.5706],
                  test_acc: [0.3400, 0.2973]}
  """
  # Create empty results dictionary
  results = {"train_loss": [],
             "train_acc": [],
             "test_loss": [],
             "test_acc": []}

  # Loop through training and testing steps for a number of epochs
  for epoch in tqdm(range(epochs)):
      train_loss, train_acc = train_step(model=model,
                                         dataloader=train_dataloader,
                                         loss_fn=loss_fn,
                                         optimizer=optimizer,
                                         device=device)
      test_loss, test_acc = test_step(model=model,
                                      dataloader=test_dataloader,
                                      loss_fn=loss_fn,
                                      device=device)

      # Print out what's happening
      print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
      )

      # Update results dictionary
      results["train_loss"].append(train_loss)
      results["train_acc"].append(train_acc)
      results["test_loss"].append(test_loss)
      results["test_acc"].append(test_acc)

  # Return the filled results at the end of the epochs
  return results

#### Explanation of `train()`



This code defines a function `train` that trains and tests a PyTorch model over multiple epochs, storing and returning the results in a dictionary. Let's break down the key parts of the code to understand why an empty results dictionary is created and what `from typing import Dict, List` is for.

##### 1. **Why Create an Empty Results Dictionary?**

- **Purpose**: The empty results dictionary is created to store the training and testing metrics (loss and accuracy) for each epoch.
- **Structure**: The dictionary has four keys: `train_loss`, `train_acc`, `test_loss`, and `test_acc`. Each key maps to a list that will store the corresponding metric values for each epoch.
- **Usage**: As the model is trained and tested over multiple epochs, the results for each epoch are appended to the respective lists in the dictionary. This allows for easy tracking and analysis of the model's performance over time.

##### 2. **What is `from typing import Dict, List` For?**

- **Type Annotations**: The `typing` module provides type hints that document the expected types of function arguments and return values. Python does not enforce them at runtime, but they improve readability and let static checkers such as mypy catch type-related errors early.
- **`Dict`**: Used to annotate a dictionary. Here the return type of `train()` is annotated as `Dict[str, List[float]]`, meaning the returned `results` dictionary has string keys and lists of floats as values.
- **`List`**: Used to specify that a variable is a list. Here, the values in the `results` dictionary are lists of floats, so `List[float]` is used to indicate this.
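
To see what these annotations do and do not do, here is a minimal sketch. The functions `make_results()` and `mislabeled()` are hypothetical helpers written for this illustration and are not part of the notebook's code. The sketch shows that the annotation is stored as metadata on the function and that Python itself does not check it when the code runs.

```python
from typing import Dict, List

def make_results() -> Dict[str, List[float]]:
    """Returns an empty metrics dictionary shaped like the one train() fills in."""
    return {"train_loss": [], "train_acc": [], "test_loss": [], "test_acc": []}

def mislabeled() -> Dict[str, List[float]]:
    """Returns a value that does not match its annotation (hypothetical example)."""
    # Python does not validate the return annotation at runtime, so this runs.
    # A static checker such as mypy would be expected to flag it.
    return {"train_loss": "not a list"}

results = make_results()
results["train_loss"].append(1.0928)

# The annotation is available for tools and readers via __annotations__.
print(make_results.__annotations__["return"])
print(mislabeled())  # runs without raising an error
```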

##### 3. **Code Breakdown**

- **Initialize Results Dictionary**:
  ```python
  results = {"train_loss": [],
             "train_acc": [],
             "test_loss": [],
             "test_acc": []}
  ```
  - This creates an empty dictionary with keys for training loss, training accuracy, testing loss, and testing accuracy. Each key maps to an empty list that will store the metric values for each epoch.

- **Loop Through Epochs**:
  ```python
  for epoch in tqdm(range(epochs)):
  ```
  - The loop iterates over the specified number of epochs, using `tqdm` to display a progress bar.

- **Training and Testing**:
  ```python
  train_loss, train_acc = train_step(model=model,
                                     dataloader=train_dataloader,
                                     loss_fn=loss_fn,
                                     optimizer=optimizer,
                                     device=device)
  test_loss, test_acc = test_step(model=model,
                                  dataloader=test_dataloader,
                                  loss_fn=loss_fn,
                                  device=device)
  ```
  - Calls `train_step` and `test_step` to train and test the model for one epoch, returning the loss and accuracy metrics.

- **Update Results Dictionary**:
  ```python
  results["train_loss"].append(train_loss)
  results["train_acc"].append(train_acc)
  results["test_loss"].append(test_loss)
  results["test_acc"].append(test_acc)
  ```
  - Appends the metrics for the current epoch to the respective lists in the `results` dictionary.

- **Return Results**:
  ```python
  return results
  ```
  - After all epochs are completed, the function returns the `results` dictionary containing the metrics for each epoch.

##### 4. **Summary**

- **Empty Results Dictionary**: Created to store and track training and testing metrics (loss and accuracy) over multiple epochs.
- **`from typing import Dict, List`**: Used for type annotations to specify that `train()` returns a `Dict[str, List[float]]`, which improves readability and enables static type checking.

This approach ensures that the training and testing metrics are systematically recorded and can be easily analyzed or visualized after the training process.
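
As one possible way to analyze the returned dictionary, the sketch below finds the epoch with the lowest test loss and the gap between train and test accuracy per epoch. The values are the illustrative ones from the `train()` docstring (epochs=2), not the output of a real run, and the variable names are assumptions chosen for this example.

```python
from typing import Dict, List

# Example values copied from the train() docstring (epochs=2), not from a real run.
results: Dict[str, List[float]] = {
    "train_loss": [2.0616, 1.0537],
    "train_acc": [0.3945, 0.3945],
    "test_loss": [1.2641, 1.5706],
    "test_acc": [0.3400, 0.2973],
}

# Find the 0-based index of the lowest test loss, then report it as a 1-based epoch.
test_losses = results["test_loss"]
best_index = min(range(len(test_losses)), key=test_losses.__getitem__)
best_epoch = best_index + 1

# Train/test accuracy gap per epoch; a widening gap can hint at overfitting.
acc_gap = [round(train - test, 4)
           for train, test in zip(results["train_acc"], results["test_acc"])]

print(f"Best epoch by test loss: {best_epoch}")
print(f"Train/test accuracy gap per epoch: {acc_gap}")
```

Because every metric is a list indexed by epoch, the same pattern works for plotting loss curves or comparing runs.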

## 5. Creating a function to save the model



Let's set up a function to save our model to a directory.

In [16]:
from pathlib import Path

def save_model(model: torch.nn.Module,
               target_dir: str,
               model_name: str):
  """Saves a PyTorch model to a target directory.

  Args:
    model: A target PyTorch model to save.
    target_dir: A directory for saving the model to.
    model_name: A filename for the saved model. Should include
      either ".pth" or ".pt" as the file extension.

  Example usage:
    save_model(model=model_0,
               target_dir="models",
               model_name="05_going_modular_tinyvgg_model.pth")
  """
  # Create target directory
  target_dir_path = Path(target_dir)
  target_dir_path.mkdir(parents=True,
                        exist_ok=True)

  # Create model save path
  assert model_name.endswith(".pth") or model_name.endswith(".pt"), "model_name should end with '.pt' or '.pth'"
  model_save_path = target_dir_path / model_name

  # Save the model state_dict()
  print(f"[INFO] Saving model to: {model_save_path}")
  torch.save(obj=model.state_dict(),
             f=model_save_path)
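
The path handling inside `save_model()` can be tried on its own without a model. The sketch below uses a hypothetical helper, `build_save_path()`, that mirrors only the directory and filename logic. It writes into a temporary directory so nothing is left on disk, and it raises `ValueError` instead of using `assert`, which is a swapped-in choice for this example because assertions are skipped when Python runs with `-O`.

```python
import tempfile
from pathlib import Path

def build_save_path(target_dir: str, model_name: str) -> Path:
    """Hypothetical helper mirroring the path logic in save_model()."""
    if not model_name.endswith((".pth", ".pt")):
        raise ValueError("model_name should end with '.pt' or '.pth'")
    target_dir_path = Path(target_dir)
    # parents=True creates missing parent directories,
    # exist_ok=True means calling this again does not raise an error.
    target_dir_path.mkdir(parents=True, exist_ok=True)
    return target_dir_path / model_name

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "models" / "experiments"
    first = build_save_path(str(nested), "tinyvgg.pth")
    second = build_save_path(str(nested), "tinyvgg.pth")  # directory already exists
    dir_created = nested.is_dir()
    try:
        build_save_path(str(nested), "tinyvgg.txt")
        rejected = False
    except ValueError:
        rejected = True

print(first.name, dir_created, rejected)
```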

## 6. Train, evaluate and save the model



Let's leverage the functions we've got above to train, test and save a model to file.


In [17]:
# Set random seeds
torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Set number of epochs
NUM_EPOCHS = 5

# Recreate an instance of TinyVGG
model_0 = TinyVGG(input_shape=3, # number of color channels (3 for RGB)
                  hidden_units=10,
                  output_shape=len(train_data.classes)).to(device)

# Setup loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model_0.parameters(), lr=0.001)

# Start the timer
from timeit import default_timer as timer
start_time = timer()

# Train model_0
model_0_results = train(model=model_0,
                        train_dataloader=train_dataloader,
                        test_dataloader=test_dataloader,
                        optimizer=optimizer,
                        loss_fn=loss_fn,
                        epochs=NUM_EPOCHS,
                        device=device)

# End the timer and print out how long it took
end_time = timer()
print(f"[INFO] Total training time: {end_time-start_time:.3f} seconds")

# Save the model
save_model(model=model_0,
           target_dir="models",
           model_name="05_going_modular_cell_mode_tinyvgg_model.pth")

Epoch: 1 | train_loss: 1.0928 | train_acc: 0.3956 | test_loss: 1.0724 | test_acc: 0.4267
Epoch: 2 | train_loss: 1.0123 | train_acc: 0.4800 | test_loss: 0.9924 | test_acc: 0.4267
Epoch: 3 | train_loss: 0.9822 | train_acc: 0.5644 | test_loss: 0.9849 | test_acc: 0.4800
Epoch: 4 | train_loss: 0.9120 | train_acc: 0.5867 | test_loss: 0.9866 | test_acc: 0.4267
Epoch: 5 | train_loss: 0.8853 | train_acc: 0.5733 | test_loss: 0.9952 | test_acc: 0.4800
[INFO] Total training time: 8.485 seconds
[INFO] Saving model to: models/05_going_modular_cell_mode_tinyvgg_model.pth


We finish with a saved image classification model at `models/05_going_modular_cell_mode_tinyvgg_model.pth`.
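
Since `save_model()` stores only the `state_dict()`, loading it back means recreating the architecture first and then loading the parameters into it. The following is a minimal sketch of that round trip under stated assumptions: it uses a single `nn.Linear` layer as a stand-in for TinyVGG, saves to a temporary directory rather than `models/`, and passes `weights_only=True` to `torch.load()`, which assumes a reasonably recent PyTorch version.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn

# Stand-in model (assumption): a single linear layer instead of TinyVGG.
torch.manual_seed(42)
saved_model = nn.Linear(in_features=4, out_features=3)

with tempfile.TemporaryDirectory() as tmp:
    save_path = Path(tmp) / "stand_in_model.pth"
    torch.save(obj=saved_model.state_dict(), f=save_path)

    # Recreate the same architecture, then load the saved parameters into it.
    loaded_model = nn.Linear(in_features=4, out_features=3)
    state_dict = torch.load(f=save_path, weights_only=True)
    loaded_model.load_state_dict(state_dict)

# Both models should now produce identical outputs for the same input.
loaded_model.eval()
with torch.inference_mode():
    X = torch.rand(2, 4)
    outputs_match = torch.equal(saved_model(X), loaded_model(X))

print(f"Outputs match after loading: {outputs_match}")
```

For the model saved in this notebook, the same steps would apply with a fresh `TinyVGG` instance created using the same `input_shape`, `hidden_units` and `output_shape` values.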

The code continues in [05. Going Modular: Part 2 (script mode)](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/going_modular/05_pytorch_going_modular_script_mode.ipynb).