# Intermediate architectures and advanced PyTorch tools
## TD 4

We are essentially going to use the same `Food101` ([credit where it's due](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/)) data, the same object `ImageDataset`, the same `DataLoader`.

The code below is mainly a copy of the code from the previous TD, except that global variables are now defined separately and everything is wrapped in different functions. This is to make it easier to train the same model with different hyperparameters and architectures, etc ...

In [153]:
# Imports

import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import pathlib
import time
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Set the random seed for reproducibility
_ = torch.manual_seed(25)

In [154]:
# Global variables

# Setup device-agnostic code
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {DEVICE} device")

# Batch size
BATCH_SIZE = 4

# Learning rate
LEARNING_RATE = 1e-3

# Number of epochs
NUM_EPOCHS = 10

# Number of classes
NUM_CLASSES = 3

Using cpu device


In [155]:
def get_data_loaders(batch_size: int = 4) -> tuple[DataLoader, DataLoader]:
    """
    Load the training and test datasets into data loaders.
    """
    data_dir = pathlib.Path("data")
    train_dir = data_dir / "Food-3" / "train"
    test_dir = data_dir / "Food-3" / "test"

    data_transform = transforms.Compose(
        [
            transforms.Resize(size=(64, 64)),  # Resize the images to 64x64*
            transforms.ToTensor()  # Convert the images to tensors
        ]
    )

    train_data = datasets.ImageFolder(
        root=train_dir,  # target folder of images
        transform=data_transform,  # transforms to perform on data (images)
        target_transform=None  # transforms to perform on labels (if necessary)
    ) 

    test_data = datasets.ImageFolder(
        root=test_dir,
        transform=data_transform
    )

    train_dataloader = DataLoader(
        dataset=train_data,
        batch_size=batch_size,  # how many samples per batch?
        shuffle=True  # shuffle the data?
    )

    test_dataloader = DataLoader(
        dataset=test_data,
        batch_size=batch_size,
        shuffle=False
    ) # don't usually need to shuffle testing data


    return train_dataloader, test_dataloader

In [156]:
# Load dataloaders in global variables
TRAIN_DATALOADER, TEST_DATALOADER = get_data_loaders(BATCH_SIZE)

In [157]:
class Net(nn.Module):
    def __init__(self, hidden_units=200):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(64*64*3, hidden_units)
        self.fc2 = nn.Linear(hidden_units, NUM_CLASSES)

    def forward(self, x):
        x = nn.ReLU()(self.fc1(x))
        x = self.fc2(x)
        return x

In [158]:
# Create model
MODEL: Net = Net().to(DEVICE)

In [159]:
def test_our_model() -> float:
    # 0. Put model in eval mode
    MODEL.eval()  # to remove stuff like dropout that's only going to be in the training part

    # 1. Setup test accuracy value
    test_acc = 0

    # 2. Turn on inference context manager
    with torch.no_grad():
        # Loop through DataLoader batches
        for X_test, y_test in TEST_DATALOADER:  # majuscule à X car c'est une "matrice", et y un entier
            # a. Move data to device
            X_test_flattened = X_test.view(BATCH_SIZE, 64*64*3).to(DEVICE)
            y_test = y_test.to(DEVICE)

            # b. Forward pass
            model_output = MODEL(X_test_flattened)

            # c. Calculate and accumulate accuracy
            test_pred_label = model_output.argmax(dim=1)
            test_acc += (test_pred_label == y_test).sum()

    # Adjust metrics to get average loss and accuracy per batch
    test_acc = test_acc / (BATCH_SIZE*len(TEST_DATALOADER))
    return test_acc.item()

In [160]:
# Test our untrained model
print((f"{100*test_our_model():.2f}%"))

36.00%


You should get 36.00% accuracy on the testing set without training and with the default hyperparameters if you used the same seed.

In [161]:
def main_train(loss_fn, optimizer) -> None:
    """
    Train the model and modified the trained model inplace.
    """
    start_time_global = time.time()

    # Put model in train mode
    MODEL.train()

    # Setup train loss and train accuracy values
    train_loss, train_acc = 0, 0

    # Loop through data loader data batches
    for epoch in range(NUM_EPOCHS):
        start_time_epoch = time.time()
        for X, y in TRAIN_DATALOADER:
            # 0. Move data to device
            X = X.view(BATCH_SIZE, 64*64*3).to(DEVICE)
            y = y.to(DEVICE)

            # 1. Forward pass
            y_pred = MODEL(X)

            # 2. Calculate  and accumulate loss
            loss = loss_fn(y_pred, y)
            train_loss += loss.item()

            # 3. Optimizer zero grad
            optimizer.zero_grad()

            # 4. Loss backward
            loss.backward()

            # 5. Optimizer step
            optimizer.step()

            # Calculate and accumulate accuracy metric across all batches
            y_pred_class = y_pred.argmax(dim=1)
            train_acc += (y_pred_class == y).sum()

        # Adjust metrics to get average loss and accuracy per batch
        train_loss = train_loss / (BATCH_SIZE * len(TRAIN_DATALOADER))
        train_acc = train_acc / (BATCH_SIZE * len(TRAIN_DATALOADER))
        print(
            f"epoch {epoch+1}/{NUM_EPOCHS},"
            f" train_loss = {train_loss:.2e},"
            f" train_acc = {100*train_acc.item():.2f}%,"
            f" time spent during this epoch = {time.time() - start_time_epoch:.2f}s"
            f" total time spent = {time.time() - start_time_global:.2f}s"
        )

In [162]:
main_train(nn.CrossEntropyLoss(), torch.optim.SGD(MODEL.parameters(), lr=LEARNING_RATE))

epoch 1/10, train_loss = 2.43e-01, train_acc = 51.00%, time spent during this epoch = 11.76s total time spent = 11.76s
epoch 2/10, train_loss = 2.26e-01, train_acc = 56.43%, time spent during this epoch = 11.95s total time spent = 23.71s
epoch 3/10, train_loss = 2.19e-01, train_acc = 59.21%, time spent during this epoch = 11.53s total time spent = 35.24s
epoch 4/10, train_loss = 2.15e-01, train_acc = 60.02%, time spent during this epoch = 11.60s total time spent = 46.84s
epoch 5/10, train_loss = 2.10e-01, train_acc = 61.54%, time spent during this epoch = 11.48s total time spent = 58.32s
epoch 6/10, train_loss = 2.07e-01, train_acc = 62.73%, time spent during this epoch = 11.43s total time spent = 69.75s
epoch 7/10, train_loss = 2.02e-01, train_acc = 63.76%, time spent during this epoch = 11.49s total time spent = 81.24s
epoch 8/10, train_loss = 2.00e-01, train_acc = 64.73%, time spent during this epoch = 12.20s total time spent = 93.44s
epoch 9/10, train_loss = 1.96e-01, train_acc = 6

In [163]:
print((f"{100*test_our_model():.2f}%"))

58.33%


You should get 58.33% accuracy on the testing set without training and with the default hyperparameters if you used the same seed.

## Let's try to improve this accuracy!

You will need to install the Optuna package (`pip install optuna`) and import it at the beginning of your script. Then, you will need to define a new function that will be used as the objective function for Optuna's optimization. This function should take in the `trial` object from Optuna as an argument and use the `trial` object to define and sample the hyperparameters that you want to optimize. For example, you can use the `trial` object to sample a choice between a convolutional and dense network, and to sample the number of neurons for the chosen network.

We will do this together, and then you'll have to implement optimization of the learning rate and optimizer's choice on your own.

In [164]:
import optuna

In the main_train function, we will need to use the sampled hyperparameters to define the architecture of the model, and to pass it to the optimizer. After training the model, we will need to return the final validation loss or accuracy as the objective function value for Optuna.

Finally, we will need to call the `optuna.create_study()` function to create a new study, and use the `study.optimize()` function to run the optimization, passing the objective function that we defined earlier.

You can find more information about how to use Optuna in the [Optuna documentation](https://optuna.org/docs/index.html).