<a href="https://colab.research.google.com/github/aRod209/pytorch-for-deep-learning/blob/main/exercises/05_pytorch_going_modular_exercises.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 05. PyTorch Going Modular Exercises

Welcome to the 05. PyTorch Going Modular exercise template notebook.

There are several questions in this notebook and it's your goal to answer them by writing Python and PyTorch code.

> **Note:** There may be more than one solution to each exercise, so don't worry too much about the *exact* right answer. Try to write some code that works first, then improve it if you can.

## Resources and solutions

* These exercises/solutions are based on [section 05. PyTorch Going Modular](https://www.learnpytorch.io/05_pytorch_going_modular/) of the Learn PyTorch for Deep Learning course by Zero to Mastery.

**Solutions:**

Try to complete the code below *before* looking at these.

* See a live [walkthrough of the solutions (errors and all) on YouTube](https://youtu.be/ijgFhMK3pp4).
* See an example [solutions notebook for these exercises on GitHub](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/extras/solutions/05_pytorch_going_modular_exercise_solutions.ipynb).

## 1. Turn the code to get the data (from section 1. Get Data) into a Python script, such as `get_data.py`.

* When you run the script using `python get_data.py` it should check if the data already exists and skip downloading if it does.
* If the data download is successful, you should be able to access the `pizza_steak_sushi` images from the `data` directory.

In [None]:
%%writefile get_data.py
"""
Downloads a zipfile of data consisting of pizza, steak, and sushi images.
The zipfile is unzipped and the images are stored in an image path directory.
The zipfile is then removed from the directory.
"""
import os
import zipfile

from pathlib import Path

import requests

# Setup path to data folder
data_path = Path('data/')
image_path = data_path / 'pizza_steak_sushi'
image_zip_path = data_path / 'pizza_steak_sushi.zip'

# If the image folder already exists, skip the download.
if image_path.is_dir():
  print(f'{image_path} directory exists, skipping download.')
else:
  print(f'Did not find {image_path} directory, creating one...')
  image_path.mkdir(parents=True, exist_ok=True)

  # Download pizza, steak, and sushi data
  print('Downloading pizza, steak, and sushi data...')
  response = requests.get('https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip')
  response.raise_for_status()  # Stop early if the download failed
  with open(image_zip_path, 'wb') as f:
    f.write(response.content)

  # Unzip pizza, steak, and sushi data
  with zipfile.ZipFile(image_zip_path, 'r') as zip_ref:
    print('Unzipping pizza, steak, and sushi data...')
    zip_ref.extractall(image_path)

  # Remove zip file
  os.remove(image_zip_path)

Writing get_data.py


In [None]:
# Example running of get_data.py
!python get_data.py

Did not find data/pizza_steak_sushi directory, creating one...
Downloading pizza, steak, and sushi data...
Unzipping pizza, steak, and sushi data...


## 2. Use [Python's `argparse` module](https://docs.python.org/3/library/argparse.html) to be able to send the `train.py` custom hyperparameter values for training procedures.
* Add an argument flag for using a different:
  * Training/testing directory
  * Learning rate
  * Batch size
  * Number of epochs to train for
  * Number of hidden units in the TinyVGG model
* Keep the default values for each of the above arguments as they already are (as in notebook 05).
* For example, you should be able to run something similar to the following line to train a TinyVGG model with a learning rate of 0.003 and a batch size of 64 for 20 epochs: `python train.py --learning_rate 0.003 --batch_size 64 --num_epochs 20`.
* **Note:** Since `train.py` leverages the other scripts we created in section 05, such as `model_builder.py`, `utils.py` and `engine.py`, you'll have to make sure they're available to use too. You can find these in the [`going_modular` folder on the course GitHub](https://github.com/mrdbourke/pytorch-deep-learning/tree/main/going_modular/going_modular).
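
One possible way to wire up the flags listed above is sketched here. It is only a starting point, not the course's reference solution: the flag names `--train_dir` and `--test_dir` are hypothetical, and the default values (5 epochs, batch size 32, 10 hidden units, learning rate 0.001, and the `data/pizza_steak_sushi` paths) are assumed to match notebook 05, so check them against your own copy of `train.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Builds a command-line parser for a train.py-style script."""
    parser = argparse.ArgumentParser(description="Train a TinyVGG model.")
    # Directory flags (names are hypothetical, defaults are assumed).
    parser.add_argument("--train_dir", type=str,
                        default="data/pizza_steak_sushi/train",
                        help="Directory containing the training images.")
    parser.add_argument("--test_dir", type=str,
                        default="data/pizza_steak_sushi/test",
                        help="Directory containing the testing images.")
    # Hyperparameter flags (defaults assumed to match notebook 05).
    parser.add_argument("--learning_rate", type=float, default=0.001,
                        help="Learning rate for the optimizer.")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="Number of samples per batch.")
    parser.add_argument("--num_epochs", type=int, default=5,
                        help="Number of epochs to train for.")
    parser.add_argument("--hidden_units", type=int, default=10,
                        help="Number of hidden units in the TinyVGG model.")
    return parser


# Parse an explicit argument list here so the sketch runs anywhere;
# inside train.py you would call build_parser().parse_args() with no
# arguments so that the values come from the command line.
args = build_parser().parse_args(
    ["--learning_rate", "0.003", "--batch_size", "64", "--num_epochs", "20"]
)
print(args.learning_rate, args.batch_size, args.num_epochs, args.hidden_units)
```

The parsed values can then replace hard-coded constants in `train.py`, for example by passing `args.batch_size` to the function that builds the DataLoaders.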

In [7]:
%%writefile data_setup.py
"""
Sets up the data that is needed for model training and testing.
"""
# Standard library imports
import os
from pathlib import Path

# Third-party imports
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import torchvision


BATCH_SIZE = 32
NUM_WORKERS = os.cpu_count()

def create_data_directories(image_path: Path) -> tuple:
  """Creates Paths for training data and testing data.

  Creates and returns Path objects for the training data directory
  and the testing data directory.

  Args:
  image_path: A parent Path for the image data.

  Returns:
  A tuple of (Path, Path) representing the path directories of the
  training data and testing data respectively.

  Example usage:
  train_dir, test_dir = create_data_directories(image_path=Path("path/to/images"))
  """
  train_dir = image_path / 'train'
  test_dir = image_path / 'test'
  return train_dir, test_dir

def create_data_transform() -> transforms.Compose:
  """Creates a data transform.

  Creates a Compose object that will apply two transforms.
  First the Compose object will resize an image to 64x64 pixels and
  then transform the image to a PyTorch tensor.

  Returns:
  A Compose object that applies two transforms that resizes an image
  and turns the image into a tensor.

  Example usage:
  data_transform = create_data_transform()
  """
  data_transform = transforms.Compose([
      transforms.Resize(size=(64, 64)),
      transforms.ToTensor()])
  return data_transform

def create_datasets(train_dir: Path,
                    test_dir: Path,
                    transform: transforms.Compose) -> tuple:
  """Creates training and testing Datasets.

  Takes in Paths, for the training data directory and testing data directory,
  and a Transform to build a tuple of ImageFolder Datasets.

  Args:
  train_dir: Path to training data directory.
  test_dir: Path to testing data directory.
  transform: torchvision transforms to perform on training and testing data.

  Returns:
  A tuple of (ImageFolder, ImageFolder).

  Example usage:
  train_data, test_data = create_datasets(train_dir=train_path,
    test_dir=test_path,
    transform=data_transform)
  """
  train_data = datasets.ImageFolder(root=train_dir, transform=transform)
  test_data = datasets.ImageFolder(root=test_dir, transform=transform)
  return train_data, test_data

def create_dataloaders(train_data: datasets.ImageFolder,
                       test_data: datasets.ImageFolder,
                       batch_size: int=BATCH_SIZE,
                       num_workers: int=NUM_WORKERS) -> tuple:
  """Creates training and testing DataLoaders.

  Takes in a training Dataset and a testing Dataset and turns them into
  PyTorch DataLoaders.

  Args:
  train_data: Dataset of training data.
  test_data: Dataset of testing data.
  batch_size: Number of samples per batch in each of the DataLoaders.
  num_workers: An integer of number of workers per DataLoader.

  Returns:
  A tuple of (DataLoader, DataLoader).

  Example usage:
  train_dataloader, test_dataloader = create_dataloaders(
    train_data=train_data,
    test_data=test_data,
    batch_size=32,
    num_workers=4
  )
  """

  # Turn datasets into DataLoaders
  train_dataloader = DataLoader(
      dataset=train_data,
      batch_size=batch_size,
      shuffle=True,
      num_workers=num_workers,
      pin_memory=True
  )

  test_dataloader = DataLoader(
      dataset=test_data,
      batch_size=batch_size,
      shuffle=False,
      num_workers=num_workers,
      pin_memory=True
  )

  return train_dataloader, test_dataloader

Overwriting data_setup.py


In [None]:
%%writefile train.py
# Third-party imports
import torch

# Application-specific imports
import data_setup

# Setup hyperparameters

# Setup device agnostic code
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Setup directories


In [None]:
# Example running of train.py
!python train.py --num_epochs 5 --batch_size 128 --hidden_units 128 --learning_rate 0.0003

## 3. Create a Python script to predict (such as `predict.py`) on a target image given a file path with a saved model.

* For example, you should be able to run the command `python predict.py some_image.jpeg` and have a trained PyTorch model predict on the image and return its prediction.
* To see example prediction code, check out the [predicting on a custom image section in notebook 04](https://www.learnpytorch.io/04_pytorch_custom_datasets/#113-putting-custom-image-prediction-together-building-a-function).
* You may also have to write code to load in a trained model.
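
As a rough sketch of the prediction step (not the course's solution), the core of a `predict.py` could look something like the code below. The helper names `load_image` and `predict_image` are hypothetical, the 64x64 input size is assumed from the transform used elsewhere in this notebook, and the class order is assumed to be alphabetical, as `ImageFolder` produces. The tiny linear model is only a stand-in so the sketch runs without saved weights.

```python
import torch
from torch import nn

# Assumed class order: ImageFolder sorts class folders alphabetically.
CLASS_NAMES = ["pizza", "steak", "sushi"]


def load_image(image_path: str, size: int = 64) -> torch.Tensor:
    """Reads an image file into a float tensor of shape [3, size, size] in [0, 1]."""
    # Imported lazily so predict_image below only needs torch.
    from torchvision import transforms
    from torchvision.io import ImageReadMode, read_image

    image = read_image(image_path, mode=ImageReadMode.RGB).float() / 255.0
    return transforms.Resize(size=(size, size))(image)


def predict_image(model: nn.Module,
                  image: torch.Tensor,
                  class_names: list,
                  device: str = "cpu") -> tuple:
    """Returns (predicted class name, probability) for a single image tensor."""
    model.to(device)
    model.eval()
    with torch.inference_mode():
        # Add a batch dimension: [C, H, W] -> [1, C, H, W]
        logits = model(image.unsqueeze(dim=0).to(device))
    probs = torch.softmax(logits, dim=1)
    index = int(probs.argmax(dim=1).item())
    return class_names[index], float(probs[0, index].item())


# Stand-in for a trained model. A real predict.py would rebuild the TinyVGG
# architecture and load saved weights with model.load_state_dict(torch.load(...)),
# then call load_image() on the path given on the command line.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, len(CLASS_NAMES)))
label, prob = predict_image(model, torch.rand(3, 64, 64), CLASS_NAMES)
print(f"Predicted: {label} ({prob:.3f})")
```

The image path could be read with `argparse`, either as a positional argument or as an `--image` flag, depending on which invocation style you prefer.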

In [None]:
# YOUR CODE HERE

In [None]:
# Example running of predict.py
!python predict.py --image data/pizza_steak_sushi/test/sushi/175783.jpg