# Intermediate architectures and advanced PyTorch tools
## TD 4

We are essentially going to use the same `Food101` ([credit where it's due](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/)) data, the same object `ImageDataset`, the same `DataLoader`.

The code below is mainly a copy of the code from the previous TD, except that global variables are now defined separately and everything is wrapped in different functions. This is to make it easier to train the same model with different hyperparameters and architectures, etc ...

For those that can use their GPUs, all the necessary `.to(device)` are already in the code.

If you encounter the error `OutOfMemoryError: CUDA out of memory.`, your GPU does not have enough memory to run the model. You can try reducing the batch size, the number of neurons or layers in the network, the number of filters in the convolutional layers, etc.

In [1]:
# Imports

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import pathlib
import time
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Set the random seed for reproducibility
_ = torch.manual_seed(25)

You can set the `flush` parameter to `True` for all `print()` statements in `Python` by overriding the built-in `print()` function using the `functools.partial()` method. An example of this is:

```py
from functools import partial
print = partial(print, flush=True)
```

We will use this to make sure that the outputs are printed in the correct order and at the correct time (for more info, check [this link](https://www.includehelp.com/python/flush-parameter-in-python-with-print-function.aspx)).

In [2]:
from functools import partial
print = partial(print, flush=True)

In [3]:
# Global variables

# Setup device-agnostic code
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {DEVICE} device")

# Batch size
BATCH_SIZE = 8

# Learning rate
LEARNING_RATE = 2e-2

# Number of epochs
NUM_EPOCHS = 15

# Number of classes
NUM_CLASSES = 3

Using cuda device


In [4]:
def get_datasets_and_dataloaders(
    batch_size: int = 4
) -> tuple[
    datasets.ImageFolder, 
    datasets.ImageFolder, 
    DataLoader, 
    DataLoader
]:
    """
    Load the training and test datasets into data loaders.
    """
    data_dir = pathlib.Path("data")
    train_dir = data_dir / "Food-3" / "train"
    test_dir = data_dir / "Food-3" / "test"

    data_transform = transforms.Compose(
        [
            transforms.Resize(size=(64, 64)),  # Resize the images to 64x64
            transforms.ToTensor()  # Convert the images to tensors
        ]
    )

    train_data = datasets.ImageFolder(
        root=train_dir,  # target folder of images
        transform=data_transform,  # transforms to perform on data (images)
        target_transform=None  # transforms to perform on labels (if necessary)
    ) 

    test_data = datasets.ImageFolder(
        root=test_dir,
        transform=data_transform
    )

    train_dataloader = DataLoader(
        dataset=train_data,
        batch_size=batch_size,  # how many samples per batch?
        shuffle=True  # shuffle the data?
    )

    test_dataloader = DataLoader(
        dataset=test_data,
        batch_size=batch_size,
        shuffle=False
    ) # don't usually need to shuffle testing data


    return train_data, test_data, train_dataloader, test_dataloader

In [5]:
# Load dataloaders in global variables
TRAIN_DATASET, TEST_DATASET, TRAIN_DATALOADER, TEST_DATALOADER = get_datasets_and_dataloaders(BATCH_SIZE)

# We actually don't really need to return the datasets, but it's nice to have them for reference. If you don't,
# you can just return the dataloaders and find the datasets by calling TRAIN_DATALOADER.dataset or TEST_DATALOADER.dataset:
print(TRAIN_DATALOADER.dataset == TRAIN_DATASET)
print(TEST_DATALOADER.dataset == TEST_DATASET)

True
True


In [6]:
class Net(nn.Module):
    def __init__(self, hidden_units=200):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(64*64*3, hidden_units)
        self.fc2 = nn.Linear(hidden_units, NUM_CLASSES)

    def forward(self, x):
        x = nn.ReLU()(self.fc1(x))
        x = self.fc2(x)
        return x

In [7]:
# Create model
MODEL: Net = Net().to(DEVICE)

In [8]:
def test_our_model() -> float:
    # 0. Put model in eval mode
    MODEL.eval()  # to remove stuff like dropout that's only going to be in the training part

    # 1. Setup test accuracy value
    test_acc: float = 0

    # 2. Turn on inference context manager
    with torch.no_grad():
        # Loop through DataLoader batches
        for X_test, y_test in TEST_DATALOADER:  # capital X because it is a "matrix"; y holds integers
            # a. Move data to device
            X_test_flattened = X_test.view(-1, 64*64*3).to(DEVICE) 
            y_test = y_test.to(DEVICE)

            # b. Forward pass
            model_output = MODEL(X_test_flattened)

            # c. Calculate and accumulate accuracy
            test_pred_label = model_output.argmax(dim=1)
            test_acc += (test_pred_label == y_test).sum()

    # Divide by the number of test samples to get the accuracy
    test_acc = test_acc / (len(TEST_DATASET))
    return test_acc.item()

In [9]:
# Test our untrained model
print((f"{100*test_our_model():.2f}%"))

36.00%


You should get 36.00% accuracy on the testing set without training and with the default hyperparameters if you used the same seed.

---

Why does it not work with ` X_test_flattened = X_test.view(BATCH_SIZE, 64*64*3).to(DEVICE)`?

---
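One way to explore the question above is to reproduce the batching on fake data. The sketch below is only an illustration: `NUM_SAMPLES` and `fake_images` are hypothetical, and `torch.split` stands in for a `DataLoader` that keeps the last, incomplete batch.

```python
import torch

BATCH_SIZE = 8      # same value as the global variable above
NUM_SAMPLES = 20    # hypothetical dataset size, chosen so that 20 % 8 != 0

# Fake "images" standing in for the real dataset: 20 tensors of shape (3, 64, 64)
fake_images = torch.zeros(NUM_SAMPLES, 3, 64, 64)

# Split into batches the way a DataLoader that keeps the last batch would
batches = list(torch.split(fake_images, BATCH_SIZE))
batch_sizes = [b.shape[0] for b in batches]
print(batch_sizes)  # the last batch only holds the remaining samples

# -1 lets PyTorch infer the batch dimension, so every batch can be flattened
flattened_shapes = [tuple(b.view(-1, 64 * 64 * 3).shape) for b in batches]
print(flattened_shapes)

# Hard-coding BATCH_SIZE fails on the smaller last batch
try:
    batches[-1].view(BATCH_SIZE, 64 * 64 * 3)
    hardcoded_view_failed = False
except RuntimeError as err:
    hardcoded_view_failed = True
    print(f"view(BATCH_SIZE, ...) failed: {err}")
```

Whenever the number of samples is not a multiple of `BATCH_SIZE`, the last batch is smaller, which is why `-1` is the safer choice.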

In [10]:
def main_train(loss_fn, optimizer) -> None:
    """
    Train the model, modifying it in place.
    """
    start_time_global = time.time()

    # Put model in train mode
    MODEL.train()

    # Loop through data loader data batches
    for epoch in range(NUM_EPOCHS):
        start_time_epoch = time.time()

        # Setup train loss and train accuracy values
        train_loss, train_acc = 0, 0

        for X, y in TRAIN_DATALOADER:
            # 0. Move data to device
            X = X.view(-1, 64*64*3).to(DEVICE)
            y = y.to(DEVICE)

            # 1. Forward pass
            y_pred = MODEL(X)

            # 2. Calculate and accumulate loss
            loss = loss_fn(y_pred, y)
            train_loss += loss.item()

            # 3. Optimizer zero grad
            optimizer.zero_grad()

            # 4. Loss backward
            loss.backward()

            # 5. Optimizer step
            optimizer.step()

            # Calculate and accumulate accuracy metric across all batches
            y_pred_class = y_pred.argmax(dim=1)
            train_acc += (y_pred_class == y).sum()

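        # Normalise the metrics by the number of samples. Note that train_loss sums
        # per-batch mean losses, so this value is roughly the mean loss divided by BATCH_SIZE.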
        train_loss = train_loss / (len(TRAIN_DATASET))
        train_acc = train_acc / (len(TRAIN_DATASET))
        print(
            f"epoch {epoch+1}/{NUM_EPOCHS},"
            f" train_loss = {train_loss:.2e},"
            f" train_acc = {100*train_acc.item():.2f}%,"
            f" time spent during this epoch = {time.time() - start_time_epoch:.2f}s,"
            f" total time spent = {time.time() - start_time_global:.2f}s"
        )

In [11]:
main_train(nn.CrossEntropyLoss(), torch.optim.SGD(MODEL.parameters(), lr=LEARNING_RATE))

epoch 1/15, train_loss = 1.25e-01, train_acc = 49.81%, time spent during this epoch = 370.72s, total time spent = 370.72s
epoch 2/15, train_loss = 1.16e-01, train_acc = 54.41%, time spent during this epoch = 79.50s, total time spent = 450.23s
epoch 3/15, train_loss = 1.12e-01, train_acc = 57.63%, time spent during this epoch = 14.82s, total time spent = 465.05s
epoch 4/15, train_loss = 1.09e-01, train_acc = 59.33%, time spent during this epoch = 10.66s, total time spent = 475.71s
epoch 5/15, train_loss = 1.08e-01, train_acc = 60.41%, time spent during this epoch = 10.58s, total time spent = 486.29s
epoch 6/15, train_loss = 1.05e-01, train_acc = 60.67%, time spent during this epoch = 10.62s, total time spent = 496.92s
epoch 7/15, train_loss = 1.03e-01, train_acc = 61.30%, time spent during this epoch = 10.63s, total time spent = 507.55s
epoch 8/15, train_loss = 1.01e-01, train_acc = 62.48%, time spent during this epoch = 10.57s, total time spent = 518.11s
epoch 9/15, train_loss = 9.84e-

In [12]:
print((f"{100*test_our_model():.2f}%"))

55.67%


You should get 55.67% accuracy on the testing set after training with the default hyperparameters, if you used the same seed. We have almost reached convergence: the loss is no longer decreasing much, and if you train for more epochs, you will see the testing set accuracy decrease. Note that we somewhat cheated by using the testing set to choose the number of epochs; we should instead use validation sets and cross-validation techniques... and we will (today)! No worries.

-----

Is it possible for `train_loss` to decrease whilst `train_acc` decreases at the same time? Look at what happens between epochs 10 and 11 here:

```
epoch 10/15, train_loss = 9.64e-02, train_acc = 65.11%, [...], total time spent = 121.83s
epoch 11/15, train_loss = 9.54e-02, train_acc = 64.78%, [...], total time spent = 134.67s
```

Why is that?

-----
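A small numerical sketch can help with the question above. The probabilities below are hypothetical (they do not come from the model); they are the probabilities given to the true class of four samples in a two-class problem, before and after an imagined update.

```python
import math

def cross_entropy(probs_of_true_class):
    """Mean cross-entropy, given the probability assigned to each sample's true class."""
    return -sum(math.log(p) for p in probs_of_true_class) / len(probs_of_true_class)

def accuracy(probs_of_true_class):
    """Two-class case: a sample is correctly classified when its true class gets p > 0.5."""
    return sum(p > 0.5 for p in probs_of_true_class) / len(probs_of_true_class)

# Hypothetical probabilities given to the true class of 4 samples
before = [0.51, 0.51, 0.51, 0.10]  # 3 barely correct samples, 1 very wrong sample
after = [0.90, 0.90, 0.49, 0.45]   # 2 confident samples, 2 barely wrong samples

print(f"before: loss = {cross_entropy(before):.3f}, acc = {accuracy(before):.2f}")
print(f"after:  loss = {cross_entropy(after):.3f}, acc = {accuracy(after):.2f}")
```

The loss depends on how confident each prediction is, whereas the accuracy only counts which side of the decision threshold each prediction falls on, so the two can move in opposite directions.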

## Let's try to improve this accuracy!

You will need to install the Optuna package (`pip install optuna`) and import it at the beginning of your script. We also import `KFold` from `sklearn.model_selection`, because we will use cross-validation to find the best hyperparameters.

In [13]:
import optuna
from sklearn.model_selection import KFold

The first, easy task is to decide whether to use a convolutional network or a dense network.

We will do this together, and then you will have to implement the optimization of the learning rate\* and of the choice of optimizer on your own.

\* *Careful! Small learning rates are not always better, especially if you do not change the number of epochs. You should try to find the best learning rate for the number of epochs you chose (a number small enough for your computer to handle).*

In [14]:
class AdvancedNet(nn.Module):
    def __init__(self, use_conv: bool, hidden_units: int = 200):
        super(AdvancedNet, self).__init__()
        self.use_conv: bool = use_conv
        if use_conv:
            self.conv = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
            # output of this layer will be ((64+2*1-3)/1)+1 = 64. 
            # -> 64 channels of 64x64 images
            self.fc1 = nn.Linear(64*64*64, hidden_units)  # flattening will be necessary to enter fc1
            self.fc2 = nn.Linear(hidden_units, NUM_CLASSES)
        else:
            self.fc1 = nn.Linear(3*64*64, hidden_units)
            self.fc2 = nn.Linear(hidden_units, NUM_CLASSES)

    def forward(self, x):
        if self.use_conv:
            x = nn.ReLU()(self.conv(x))
            x = x.view(-1, 64*64*64)  # flattening is necessary, and, same as above,
            # we need to use -1 and not BATCH_SIZE because the last batch might be smaller
        x = nn.ReLU()(self.fc1(x))
        x = self.fc2(x)
        return x
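The output size computed in the comments of the cell above can be checked with a few lines of Python. This is a minimal sketch: `conv2d_output_size` is a hypothetical helper (not a PyTorch function) and it assumes a dilation of 1.

```python
def conv2d_output_size(input_size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension (dilation of 1 assumed)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# Same settings as the convolution above: 64x64 input, kernel 3, stride 1, padding 1
side = conv2d_output_size(64, kernel_size=3, stride=1, padding=1)

# 64 output channels of side x side feature maps are flattened before fc1
out_channels = 64
flattened_features = out_channels * side * side
print(side, flattened_features)  # 64 262144
```

The second number is the `in_features` value that `fc1` needs when `use_conv` is `True`.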

 Then, you will need to define a new function that will be used as the objective function for Optuna's optimization. This function should take in the `trial` object from Optuna as an argument and use the `trial` object to define and sample the hyperparameters that you want to optimize. For example, you can use the `trial` object to sample a choice between a convolutional and dense network, and to sample the number of neurons for the chosen network. After training the model, we will need to return the final validation accuracy calculated with cross-validation* as the objective function value for Optuna to maximise.

 \* We use cross-validation here (3-fold) because we want to use the testing set as little as possible. We will use the testing set only once, at the end, to get the final accuracy of the best model. But, cross-validation greatly increases the time required to run the algorithms, so we won't always use cross-validation to optimize hyperparameters.
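To see what `KFold` produces before using it on the real dataset, here is a small sketch on a hypothetical dataset of 9 samples (`n_samples` and `all_valid_idx` are names introduced only for this illustration).

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples = 9  # hypothetical tiny dataset, with samples indexed 0..8
fold = KFold(n_splits=3, shuffle=True, random_state=0)

all_valid_idx = []
for fold_idx, (train_idx, valid_idx) in enumerate(fold.split(np.arange(n_samples))):
    # No index appears in both the training and the validation part of a fold
    assert set(train_idx).isdisjoint(valid_idx)
    print(f"Fold {fold_idx + 1}: train={train_idx}, valid={valid_idx}")
    all_valid_idx.extend(valid_idx.tolist())

# Across the 3 folds, every sample is used for validation exactly once
print(sorted(all_valid_idx) == list(range(n_samples)))
```

`KFold` only returns indices; the actual data is selected afterwards, for example with `torch.utils.data.Subset`.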

In [15]:
def objective(trial: optuna.trial.Trial) -> float:
    print("New trial")

    # Set up cross validation
    n_splits: int = 3
    fold = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = [0]*n_splits

    use_conv: bool = trial.suggest_categorical('use_conv', [True, False])

    # Loop through the cross-validation folds
    for fold_idx, (train_idx, valid_idx) in enumerate(fold.split(range(len(TRAIN_DATASET)))):
        # train_idx and valid_idx are numpy arrays of indices of the training and validation sets for this fold respectively.
        # They do not contain the actual data, but the indices of the data in the dataset.
        # We can use these indices to create a subset of the dataset for this fold with torch.utils.data.Subset.
        # Obviously, if an index is in the validation set, it will not be in the training set. You can
        # check this by printing train_idx and valid_idx and check by yourself.
        
        print(f"Fold {fold_idx+1}/{n_splits}")

        # Create subsets of the dataset for this fold
        sub_train_data = torch.utils.data.Subset(TRAIN_DATASET, train_idx)
        sub_valid_data = torch.utils.data.Subset(TRAIN_DATASET, valid_idx)

        # Create data loaders for this fold
        sub_train_loader = torch.utils.data.DataLoader(sub_train_data, batch_size=BATCH_SIZE, shuffle=True)
        sub_valid_loader = torch.utils.data.DataLoader(sub_valid_data, batch_size=BATCH_SIZE, shuffle=False)
        
        # Generate the model.
        my_model: AdvancedNet = AdvancedNet(use_conv).to(DEVICE)
        
        # Set up the optimizer and the loss function once per fold
        # (with plain SGD, re-creating them at every epoch gives the same result)
        optimizer = torch.optim.SGD(my_model.parameters(), lr=LEARNING_RATE)
        loss_fn = nn.CrossEntropyLoss()

        for epoch in range(NUM_EPOCHS):
            # Training of the model.
            # Put model in train mode
            my_model.train()

            for X, y in sub_train_loader:
                # 0. Reshape data to input to the network
                if use_conv:
                    pass
                else:
                    X = X.view(-1, 64*64*3)

                # 1. Move data to device
                X = X.to(DEVICE)
                y = y.to(DEVICE)

                # 2. Forward pass
                y_pred = my_model(X)

                # 3. Calculate and accumulate loss
                loss = loss_fn(y_pred, y)

                # 4. Optimizer zero grad
                optimizer.zero_grad()

                # 5. Loss backward
                loss.backward()

                # 6. Optimizer step
                optimizer.step()

        # Validation of the model.
        # Put model in eval mode
        my_model.eval()
        
        val_acc = 0
        with torch.no_grad():
            for X, y in sub_valid_loader:
                # 0. Reshape data to input to the network
                if use_conv:
                    pass
                else:
                    X = X.view(-1, 64*64*3)
                
                # 1. Move data to device
                X = X.to(DEVICE)
                y = y.to(DEVICE)

                # 2. Forward pass
                y_pred = my_model(X)
                
                # 3. Compute accuracy
                y_pred_class = y_pred.argmax(dim=1)

                val_acc += (y_pred_class == y).sum()

        # Move the score back to the CPU, otherwise np.mean will not work
        scores[fold_idx] = (val_acc / len(sub_valid_data)).cpu()
        print(f"Fold {fold_idx+1}/{n_splits} accuracy: {scores[fold_idx]}")
    
    return np.mean(scores)

Finally, we will need to call the `optuna.create_study()` function to create a new study, and use the `study.optimize()` function to run the optimization, passing the objective function that we defined earlier.

You can find more information about how to use Optuna in the [Optuna documentation](https://optuna.readthedocs.io/en/stable/index.html).

In [16]:
study = optuna.create_study(direction="maximize")
study.optimize(objective, timeout=1200, n_trials = 2) 
# - timeout=1200 -> stops after 20 minutes; 
# - n_trials = 2 -> here we only try two models, a dense or a convolutional model so
#   we need to make it stop after having trained the two models otherwise it will continue to 
#   loop on those two models unless it reaches the 20 minutes mark*. In practice, you will give
#   a lot of hyperparameters to optimize and you will want to run the optimization for a lot
#   longer than 20 minutes. The timeout parameter is useful in those cases because you won't 
#   know how long it'll take.
#   * e.g., https://i.imgur.com/bCzH1pm.png

pruned_trials = [t for t in study.trials if t.state == optuna.trial.TrialState.PRUNED]
complete_trials = [t for t in study.trials if t.state == optuna.trial.TrialState.COMPLETE]

print("\n")
print("--------------------")
print("--------------------")
print("--------------------")
print("\n")
print("Study statistics: ")
print("  Number of finished trials: ", len(study.trials))
print("  Number of pruned trials: ", len(pruned_trials))
print("  Number of complete trials: ", len(complete_trials))

print("Best trial:")
trial = study.best_trial

print("  Value: ", trial.value)

print("  Params: ")
for key, value in trial.params.items():
    print(f"\t{key}: {value}")

[I 2023-01-24 22:10:17,113] A new study created in memory with name: no-name-0b9a7089-56bf-4e9b-b479-062c95c4ab93


New trial
Fold 1/3
Fold 1/3 accuracy: 0.6088889241218567
Fold 2/3
Fold 2/3 accuracy: 0.6177777647972107
Fold 3/3
Fold 3/3 accuracy: 0.6322222352027893


[I 2023-01-24 22:21:51,511] Trial 0 finished with value: 0.6196296215057373 and parameters: {'use_conv': True}. Best is trial 0 with value: 0.6196296215057373.


New trial
Fold 1/3
Fold 1/3 accuracy: 0.6311111450195312
Fold 2/3
Fold 2/3 accuracy: 0.6200000047683716
Fold 3/3
Fold 3/3 accuracy: 0.6355555653572083


[I 2023-01-24 22:27:25,121] Trial 1 finished with value: 0.6288889050483704 and parameters: {'use_conv': True}. Best is trial 1 with value: 0.6288889050483704.




--------------------
--------------------
--------------------


Study statistics: 
  Number of finished trials:  2
  Number of pruned trials:  0
  Number of complete trials:  2
Best trial:
  Value:  0.6288889050483704
  Params: 
	use_conv: True


Many of you might run into a problem: we only allowed two trials, but `Optuna` tried `False` then `False`, or `True` then `True`. This is because `Optuna` does not check whether it has already used the same set of hyperparameters in a previous trial. To fix this, we can add the following code:

```py
from optuna.trial import TrialState

...

for previous_trial in trial.study.trials:
    if previous_trial.state == TrialState.COMPLETE and trial.params == previous_trial.params:
        print(f"Duplicated trial: {trial.params}, return {previous_trial.value}")
        return previous_trial.value
```

Then set `n_trials` to 5, for example: duplicated trials return immediately, so it becomes unlikely that one of the two architectures is never tried.

In [17]:
from optuna.trial import TrialState

def objective(trial: optuna.trial.Trial) -> float:
    print("New trial")

    # Set up cross validation
    n_splits: int = 3
    fold = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = [0]*n_splits

    use_conv: bool = trial.suggest_categorical('use_conv', [True, False])

    # Check if this trial has already been run before
    for previous_trial in trial.study.trials:
        if previous_trial.state == TrialState.COMPLETE and trial.params == previous_trial.params:
            print(f"Duplicated trial: {trial.params}, return {previous_trial.value}")
            return previous_trial.value

    # Loop through the cross-validation folds
    for fold_idx, (train_idx, valid_idx) in enumerate(fold.split(range(len(TRAIN_DATASET)))):
        # train_idx and valid_idx are numpy arrays of indices of the training and validation sets for this fold respectively.
        # They do not contain the actual data, but the indices of the data in the dataset.
        # We can use these indices to create a subset of the dataset for this fold with torch.utils.data.Subset.
        # Obviously, if an index is in the validation set, it will not be in the training set. You can
        # check this by printing train_idx and valid_idx and check by yourself.
        
        print(f"Fold {fold_idx+1}/{n_splits}")

        # Create subsets of the dataset for this fold
        sub_train_data = torch.utils.data.Subset(TRAIN_DATASET, train_idx)
        sub_valid_data = torch.utils.data.Subset(TRAIN_DATASET, valid_idx)

        # Create data loaders for this fold
        sub_train_loader = torch.utils.data.DataLoader(sub_train_data, batch_size=BATCH_SIZE, shuffle=True)
        sub_valid_loader = torch.utils.data.DataLoader(sub_valid_data, batch_size=BATCH_SIZE, shuffle=False)
        
        # Generate the model.
        my_model: AdvancedNet = AdvancedNet(use_conv).to(DEVICE)
        
        # Set up the optimizer and the loss function once per fold
        # (with plain SGD, re-creating them at every epoch gives the same result)
        optimizer = torch.optim.SGD(my_model.parameters(), lr=LEARNING_RATE)
        loss_fn = nn.CrossEntropyLoss()

        for epoch in range(NUM_EPOCHS):
            # Training of the model.
            # Put model in train mode
            my_model.train()

            for X, y in sub_train_loader:
                # 0. Reshape data to input to the network
                if use_conv:
                    pass
                else:
                    X = X.view(-1, 64*64*3)

                # 1. Move data to device
                X = X.to(DEVICE)
                y = y.to(DEVICE)

                # 2. Forward pass
                y_pred = my_model(X)

                # 3. Calculate and accumulate loss
                loss = loss_fn(y_pred, y)

                # 4. Optimizer zero grad
                optimizer.zero_grad()

                # 5. Loss backward
                loss.backward()

                # 6. Optimizer step
                optimizer.step()

        # Validation of the model.
        # Put model in eval mode
        my_model.eval()
        
        val_acc = 0
        with torch.no_grad():
            for X, y in sub_valid_loader:
                # 0. Reshape data to input to the network
                if use_conv:
                    pass
                else:
                    X = X.view(-1, 64*64*3)
                
                # 1. Move data to device
                X = X.to(DEVICE)
                y = y.to(DEVICE)

                # 2. Forward pass
                y_pred = my_model(X)
                
                # 3. Compute accuracy
                y_pred_class = y_pred.argmax(dim=1)

                val_acc += (y_pred_class == y).sum()

        # Move the score back to the CPU, otherwise np.mean will not work
        scores[fold_idx] = (val_acc / len(sub_valid_data)).cpu()
        print(f"Fold {fold_idx+1}/{n_splits} accuracy: {scores[fold_idx]}")
    
    return np.mean(scores)


study = optuna.create_study(direction="maximize")
study.optimize(objective, timeout=1200, n_trials = 5) 
# - timeout=1200 -> stops after 20 minutes; 

pruned_trials = [t for t in study.trials if t.state == optuna.trial.TrialState.PRUNED]
complete_trials = [t for t in study.trials if t.state == optuna.trial.TrialState.COMPLETE]

print("\n")
print("--------------------")
print("--------------------")
print("--------------------")
print("\n")
print("Study statistics: ")
print("  Number of finished trials: ", len(study.trials))
print("  Number of pruned trials: ", len(pruned_trials))
print("  Number of complete trials: ", len(complete_trials))

print("Best trial:")
trial = study.best_trial

print("  Value: ", trial.value)

print("  Params: ")
for key, value in trial.params.items():
    print(f"\t{key}: {value}")

[I 2023-01-24 22:27:25,175] A new study created in memory with name: no-name-a65c2591-4c31-4f56-8ee2-dc1a55c75211


New trial
Fold 1/3
Fold 1/3 accuracy: 0.6177777647972107
Fold 2/3
Fold 2/3 accuracy: 0.6044444441795349
Fold 3/3
Fold 3/3 accuracy: 0.5699999928474426


[I 2023-01-24 22:32:51,664] Trial 0 finished with value: 0.5974074006080627 and parameters: {'use_conv': False}. Best is trial 0 with value: 0.5974074006080627.


New trial
Fold 1/3
Fold 1/3 accuracy: 0.6100000143051147
Fold 2/3
Fold 2/3 accuracy: 0.643333375453949
Fold 3/3
Fold 3/3 accuracy: 0.6344444751739502


[I 2023-01-24 22:38:25,012] Trial 1 finished with value: 0.6292592883110046 and parameters: {'use_conv': True}. Best is trial 1 with value: 0.6292592883110046.


New trial
Duplicated trial: {'use_conv': True}, return 0.6292592883110046


[I 2023-01-24 22:38:25,020] Trial 2 finished with value: 0.6292592883110046 and parameters: {'use_conv': True}. Best is trial 1 with value: 0.6292592883110046.


New trial
Duplicated trial: {'use_conv': True}, return 0.6292592883110046


[I 2023-01-24 22:38:25,027] Trial 3 finished with value: 0.6292592883110046 and parameters: {'use_conv': True}. Best is trial 1 with value: 0.6292592883110046.


New trial
Duplicated trial: {'use_conv': True}, return 0.6292592883110046


[I 2023-01-24 22:38:25,034] Trial 4 finished with value: 0.6292592883110046 and parameters: {'use_conv': True}. Best is trial 1 with value: 0.6292592883110046.




--------------------
--------------------
--------------------


Study statistics: 
  Number of finished trials:  5
  Number of pruned trials:  0
  Number of complete trials:  5
Best trial:
  Value:  0.6292592883110046
  Params: 
	use_conv: True


Let's now train with the hyperparameters that we found with Optuna. We will use the `study.best_params` attribute to get the best hyperparameters. You need to re-train on the whole training dataset! During cross-validation, each model only saw part of the training data (two folds out of three), so re-training on all of it gives the final model more data to learn from.

In [18]:
# Create model
MODEL: AdvancedNet = AdvancedNet(**study.best_params).to(DEVICE)

In [19]:
print(MODEL)

AdvancedNet(
  (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fc1): Linear(in_features=262144, out_features=200, bias=True)
  (fc2): Linear(in_features=200, out_features=3, bias=True)
)


In [20]:
def main_train_conv(loss_fn, optimizer) -> None:
    """
    Train the model, modifying it in place.
    """
    start_time_global = time.time()

    # Put model in train mode
    MODEL.train()

    # Loop through data loader data batches
    for epoch in range(NUM_EPOCHS):
        start_time_epoch = time.time()

        # Setup train loss and train accuracy values
        train_loss, train_acc = 0, 0

        for X, y in TRAIN_DATALOADER:
            # 0. Reshape data to input to the network
            pass  # we are happy with the shape BATCH_SIZE, 3, 64, 64

            # 1. Move data to device
            X = X.to(DEVICE)
            y = y.to(DEVICE)

            # 2. Forward pass
            y_pred = MODEL(X)

            # 3. Calculate and accumulate loss
            loss = loss_fn(y_pred, y)
            train_loss += loss.item()

            # 4. Optimizer zero grad
            optimizer.zero_grad()

            # 5. Loss backward
            loss.backward()

            # 6. Optimizer step
            optimizer.step()

            # Calculate and accumulate accuracy metric across all batches
            y_pred_class = y_pred.argmax(dim=1)
            train_acc += (y_pred_class == y).sum()

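        # Normalise the metrics by BATCH_SIZE * number of batches. This slightly
        # overcounts the samples when the last batch is smaller than BATCH_SIZE;
        # len(TRAIN_DATASET) would give the exact count.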
        train_loss = train_loss / (BATCH_SIZE * len(TRAIN_DATALOADER))
        train_acc = train_acc / (BATCH_SIZE * len(TRAIN_DATALOADER))
        print(
            f"epoch {epoch+1}/{NUM_EPOCHS},"
            f" train_loss = {train_loss:.2e},"
            f" train_acc = {100*train_acc.item():.2f}%,"
            f" time spent during this epoch = {time.time() - start_time_epoch:.2f}s,"
            f" total time spent = {time.time() - start_time_global:.2f}s"
        )

In [21]:
main_train_conv(nn.CrossEntropyLoss(), torch.optim.SGD(MODEL.parameters(), lr=LEARNING_RATE))

epoch 1/15, train_loss = 1.20e-01, train_acc = 53.25%, time spent during this epoch = 11.29s, total time spent = 11.29s
epoch 2/15, train_loss = 1.05e-01, train_acc = 62.17%, time spent during this epoch = 11.27s, total time spent = 22.56s
epoch 3/15, train_loss = 9.74e-02, train_acc = 66.05%, time spent during this epoch = 11.29s, total time spent = 33.85s
epoch 4/15, train_loss = 9.00e-02, train_acc = 68.16%, time spent during this epoch = 11.26s, total time spent = 45.11s
epoch 5/15, train_loss = 8.10e-02, train_acc = 72.82%, time spent during this epoch = 11.27s, total time spent = 56.39s
epoch 6/15, train_loss = 7.09e-02, train_acc = 76.48%, time spent during this epoch = 11.28s, total time spent = 67.66s
epoch 7/15, train_loss = 5.87e-02, train_acc = 80.95%, time spent during this epoch = 11.37s, total time spent = 79.03s
epoch 8/15, train_loss = 4.42e-02, train_acc = 86.50%, time spent during this epoch = 11.32s, total time spent = 90.35s
epoch 9/15, train_loss = 3.15e-02, train

In [22]:
def test_our_model_conv() -> float:
    # 0. Put model in eval mode
    MODEL.eval()  # to remove stuff like dropout that's only going to be in the training part

    # 1. Setup test accuracy value
    test_acc: float = 0

    # 2. Turn on inference context manager
    with torch.no_grad():
        # Loop through DataLoader batches
        for X_test, y_test in TEST_DATALOADER:  # capital X because it is a "matrix"; y holds integers
            # a. Move data to device
            X_test_flattened = X_test.to(DEVICE)  # no need to flatten here
            y_test = y_test.to(DEVICE)

            # b. Forward pass
            model_output = MODEL(X_test_flattened)

            # c. Calculate and accumulate accuracy
            test_pred_label = model_output.argmax(dim=1)
            test_acc += (test_pred_label == y_test).sum()

    # Divide by the number of test samples to get the accuracy
    test_acc = test_acc / (len(TEST_DATASET))
    return test_acc.item()

In [25]:
print((f"{100*test_our_model_conv():.2f}%"))

62.33%


Most likely some overfitting has happened here (look at the training accuracy!), but we did improve our accuracy: 62.33% now against 55.67% earlier. This is also close to the average validation accuracy (62.92%), which makes sense. This is not amazing though, which is why we should also optimise the learning rate (or the number of epochs), etc., not just the architecture.

Your turn now!

Optimizing learning rate and the number of channels after the first convolution layer:

In [26]:
class AdvancedNet2(nn.Module):
    def __init__(self, use_conv: bool, out_channels: int, hidden_units: int = 200):
        super(AdvancedNet2, self).__init__()
        self.use_conv: bool = use_conv
        self.out_channels: int = out_channels
        if use_conv:
            self.conv = nn.Conv2d(3, out_channels, kernel_size=3, stride=1, padding=1)
            # output of this layer will be ((64+2*1-3)/1)+1 = 64.
            # -> out_channels channels of 64x64 images
            self.fc1 = nn.Linear(out_channels*64*64, hidden_units)  # flattening will be necessary to enter fc1
            self.fc2 = nn.Linear(hidden_units, NUM_CLASSES)
        else:
            self.fc1 = nn.Linear(3*64*64, hidden_units)
            self.fc2 = nn.Linear(hidden_units, NUM_CLASSES)

    def forward(self, x):
        if self.use_conv:
            x = nn.ReLU()(self.conv(x))
            x = x.view(-1, self.out_channels*64*64)  # flattening is necessary, and, same as above,
            # we need to use -1 and not BATCH_SIZE because the last batch might be smaller
        x = nn.ReLU()(self.fc1(x))
        x = self.fc2(x)
        return x

def objective(trial: optuna.trial.Trial) -> float:
    print("New trial")

    # Set up cross validation
    n_splits: int = 5
    fold = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = [0]*n_splits

    use_conv: bool = trial.suggest_categorical('use_conv', [True, False])
    if use_conv:
        out_channels: int = trial.suggest_int('out_channels', 3, 64)
    else:
        out_channels: int = 0
    learning_rate_to_optimise: float = trial.suggest_float('learning_rate', 1e-5, 1e-1, log=True)

    # Check if this trial has already been run before
    for previous_trial in trial.study.trials:
        if previous_trial.state == TrialState.COMPLETE and trial.params == previous_trial.params:
            print(f"Duplicated trial: {trial.params}, return {previous_trial.value}")
            return previous_trial.value

    # Loop through the cross-validation folds
    for fold_idx, (train_idx, valid_idx) in enumerate(fold.split(range(len(TRAIN_DATASET)))):
        # train_idx and valid_idx are numpy arrays of indices of the training and validation sets for this fold respectively.
        # They do not contain the actual data, but the indices of the data in the dataset.
        # We can use these indices to create a subset of the dataset for this fold with torch.utils.data.Subset.
        # Obviously, if an index is in the validation set, it will not be in the training set. You can
        # check this by printing train_idx and valid_idx and check by yourself.
        
        print(f"Fold {fold_idx+1}/{n_splits}")

        # Create subsets of the dataset for this fold
        sub_train_data = torch.utils.data.Subset(TRAIN_DATASET, train_idx)
        sub_valid_data = torch.utils.data.Subset(TRAIN_DATASET, valid_idx)

        # Create data loaders for this fold
        sub_train_loader = torch.utils.data.DataLoader(sub_train_data, batch_size=BATCH_SIZE, shuffle=True)
        sub_valid_loader = torch.utils.data.DataLoader(sub_valid_data, batch_size=BATCH_SIZE, shuffle=False)
        
        # Generate the model.
        my_model: AdvancedNet2 = AdvancedNet2(use_conv, out_channels).to(DEVICE)
        
        # Set up the optimizer and the loss function once per fold
        optimizer = torch.optim.SGD(my_model.parameters(), lr=learning_rate_to_optimise)
        loss_fn = nn.CrossEntropyLoss()

        for epoch in range(NUM_EPOCHS):
            # Training of the model.
            # Put model in train mode
            my_model.train()

            for X, y in sub_train_loader:
                # 0. Reshape data to input to the network
                if not use_conv:
                    X = X.view(-1, 64*64*3)

                # 1. Move data to device
                X = X.to(DEVICE)
                y = y.to(DEVICE)

                # 2. Forward pass
                y_pred = my_model(X)

                # 3. Calculate loss
                loss = loss_fn(y_pred, y)

                # 4. Optimizer zero grad
                optimizer.zero_grad()

                # 5. Loss backward
                loss.backward()

                # 6. Optimizer step
                optimizer.step()

        # Validation of the model.
        # Put model in eval mode
        my_model.eval()
        
        val_acc = 0
        with torch.no_grad():
            for X, y in sub_valid_loader:
                # 0. Reshape data to input to the network
                if not use_conv:
                    X = X.view(-1, 64*64*3)
                
                # 1. Move data to device
                X = X.to(DEVICE)
                y = y.to(DEVICE)

                # 2. Forward pass
                y_pred = my_model(X)
                
                # 3. Count the correct predictions in this batch
                y_pred_class = y_pred.argmax(dim=1)
                val_acc += (y_pred_class == y).sum()

        scores[fold_idx] = (val_acc / len(sub_valid_data)).cpu()
        # Move the result back to the CPU, otherwise np.mean will not work on CUDA tensors
        print(f"Fold {fold_idx+1}/{n_splits} accuracy: {scores[fold_idx]}")
    
    return np.mean(scores)


study = optuna.create_study(direction="maximize")
study.optimize(objective, timeout=36000, n_trials=500)
# timeout=36000 -> stops after 10 hours or 500 trials, whichever comes first

pruned_trials = [t for t in study.trials if t.state == optuna.trial.TrialState.PRUNED]
complete_trials = [t for t in study.trials if t.state == optuna.trial.TrialState.COMPLETE]

print("\n")
print("--------------------")
print("--------------------")
print("--------------------")
print("\n")
print("Study statistics: ")
print("  Number of finished trials: ", len(study.trials))
print("  Number of pruned trials: ", len(pruned_trials))
print("  Number of complete trials: ", len(complete_trials))

print("Best trial:")
trial = study.best_trial

print("  Value: ", trial.value)

print("  Params: ")
for key, value in trial.params.items():
    print(f"\t{key}: {value}")

[I 2023-01-24 22:55:38,580] A new study created in memory with name: no-name-599dea74-db32-4ef3-9189-ba88c961bcad


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5999999642372131
Fold 2/5
Fold 2/5 accuracy: 0.6037036776542664
Fold 3/5
Fold 3/5 accuracy: 0.5592592358589172
Fold 4/5
Fold 4/5 accuracy: 0.5870370268821716
Fold 5/5
Fold 5/5 accuracy: 0.5833333134651184


[I 2023-01-24 23:07:08,969] Trial 0 finished with value: 0.5866666436195374 and parameters: {'use_conv': True, 'out_channels': 31, 'learning_rate': 0.00027216975756670815}. Best is trial 0 with value: 0.5866666436195374.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6092592477798462
Fold 2/5
Fold 2/5 accuracy: 0.585185170173645
Fold 3/5
Fold 3/5 accuracy: 0.5814814567565918
Fold 4/5
Fold 4/5 accuracy: 0.5796296000480652
Fold 5/5
Fold 5/5 accuracy: 0.5833333134651184


[I 2023-01-24 23:17:50,002] Trial 1 finished with value: 0.5877777338027954 and parameters: {'use_conv': False, 'learning_rate': 0.012377683289352596}. Best is trial 1 with value: 0.5877777338027954.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.605555534362793
Fold 2/5
Fold 2/5 accuracy: 0.6722221970558167
Fold 3/5
Fold 3/5 accuracy: 0.6222221851348877
Fold 4/5
Fold 4/5 accuracy: 0.5870370268821716
Fold 5/5
Fold 5/5 accuracy: 0.6611111164093018


[I 2023-01-24 23:28:42,554] Trial 2 finished with value: 0.6296296119689941 and parameters: {'use_conv': True, 'out_channels': 36, 'learning_rate': 0.007463640154675805}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6092592477798462
Fold 2/5
Fold 2/5 accuracy: 0.5777777433395386
Fold 3/5
Fold 3/5 accuracy: 0.555555522441864
Fold 4/5
Fold 4/5 accuracy: 0.5537036657333374
Fold 5/5
Fold 5/5 accuracy: 0.555555522441864


[I 2023-01-24 23:39:26,019] Trial 3 finished with value: 0.5703703165054321 and parameters: {'use_conv': False, 'learning_rate': 0.0001308924002348598}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5259259343147278
Fold 2/5
Fold 2/5 accuracy: 0.5
Fold 3/5
Fold 3/5 accuracy: 0.5370370149612427
Fold 4/5
Fold 4/5 accuracy: 0.5407407283782959
Fold 5/5
Fold 5/5 accuracy: 0.5277777910232544


[I 2023-01-24 23:50:17,523] Trial 4 finished with value: 0.5262962579727173 and parameters: {'use_conv': True, 'out_channels': 15, 'learning_rate': 0.08267596829405642}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.31111109256744385
Fold 2/5
Fold 2/5 accuracy: 0.3333333134651184
Fold 3/5
Fold 3/5 accuracy: 0.32777777314186096
Fold 4/5
Fold 4/5 accuracy: 0.3185185194015503
Fold 5/5
Fold 5/5 accuracy: 0.34074074029922485


[I 2023-01-25 00:02:11,550] Trial 5 finished with value: 0.32629626989364624 and parameters: {'use_conv': False, 'learning_rate': 0.08458397978025306}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6222221851348877
Fold 2/5
Fold 2/5 accuracy: 0.6240740418434143
Fold 3/5
Fold 3/5 accuracy: 0.5925925970077515
Fold 4/5
Fold 4/5 accuracy: 0.5944444537162781
Fold 5/5
Fold 5/5 accuracy: 0.5962963104248047


[I 2023-01-25 00:21:16,374] Trial 6 finished with value: 0.6059259176254272 and parameters: {'use_conv': True, 'out_channels': 55, 'learning_rate': 0.000455960667565156}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6092592477798462
Fold 2/5
Fold 2/5 accuracy: 0.5962963104248047
Fold 3/5
Fold 3/5 accuracy: 0.5962963104248047
Fold 4/5
Fold 4/5 accuracy: 0.5592592358589172
Fold 5/5
Fold 5/5 accuracy: 0.5629629492759705


[I 2023-01-25 00:39:23,061] Trial 7 finished with value: 0.5848148465156555 and parameters: {'use_conv': False, 'learning_rate': 0.010172336852831261}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5796296000480652
Fold 2/5
Fold 2/5 accuracy: 0.614814817905426
Fold 3/5
Fold 3/5 accuracy: 0.585185170173645
Fold 4/5
Fold 4/5 accuracy: 0.6111111044883728
Fold 5/5
Fold 5/5 accuracy: 0.585185170173645


[I 2023-01-25 00:50:09,881] Trial 8 finished with value: 0.5951851606369019 and parameters: {'use_conv': False, 'learning_rate': 0.012866546188185862}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5962963104248047
Fold 2/5
Fold 2/5 accuracy: 0.5870370268821716
Fold 3/5
Fold 3/5 accuracy: 0.5592592358589172
Fold 4/5
Fold 4/5 accuracy: 0.5462962985038757
Fold 5/5
Fold 5/5 accuracy: 0.5629629492759705


[I 2023-01-25 01:01:28,176] Trial 9 finished with value: 0.5703703761100769 and parameters: {'use_conv': False, 'learning_rate': 0.0001263659182092633}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5925925970077515
Fold 2/5
Fold 2/5 accuracy: 0.5703703761100769
Fold 3/5
Fold 3/5 accuracy: 0.5629629492759705
Fold 4/5
Fold 4/5 accuracy: 0.575925886631012
Fold 5/5
Fold 5/5 accuracy: 0.5629629492759705


[I 2023-01-25 01:12:48,010] Trial 10 finished with value: 0.5729629397392273 and parameters: {'use_conv': True, 'out_channels': 50, 'learning_rate': 4.2841416432543415e-05}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6222221851348877
Fold 2/5
Fold 2/5 accuracy: 0.6129629611968994
Fold 3/5
Fold 3/5 accuracy: 0.6185185313224792
Fold 4/5
Fold 4/5 accuracy: 0.585185170173645
Fold 5/5
Fold 5/5 accuracy: 0.6222221851348877


[I 2023-01-25 01:23:52,100] Trial 11 finished with value: 0.6122222542762756 and parameters: {'use_conv': True, 'out_channels': 62, 'learning_rate': 0.0011904814964542464}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6166666746139526
Fold 2/5
Fold 2/5 accuracy: 0.6444444060325623
Fold 3/5
Fold 3/5 accuracy: 0.6333333253860474
Fold 4/5
Fold 4/5 accuracy: 0.5907407402992249
Fold 5/5
Fold 5/5 accuracy: 0.6314814686775208


[I 2023-01-25 01:36:21,251] Trial 12 finished with value: 0.6233333349227905 and parameters: {'use_conv': True, 'out_channels': 36, 'learning_rate': 0.0025089581111183232}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6240740418434143
Fold 2/5
Fold 2/5 accuracy: 0.6333333253860474
Fold 3/5
Fold 3/5 accuracy: 0.5629629492759705
Fold 4/5
Fold 4/5 accuracy: 0.5814814567565918
Fold 5/5
Fold 5/5 accuracy: 0.6203703880310059


[I 2023-01-25 01:47:12,745] Trial 13 finished with value: 0.6044444441795349 and parameters: {'use_conv': True, 'out_channels': 33, 'learning_rate': 0.0019809363071112557}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6074073910713196
Fold 2/5
Fold 2/5 accuracy: 0.6277777552604675
Fold 3/5
Fold 3/5 accuracy: 0.6185185313224792
Fold 4/5
Fold 4/5 accuracy: 0.6129629611968994
Fold 5/5
Fold 5/5 accuracy: 0.6518518328666687


[I 2023-01-25 02:24:21,723] Trial 14 finished with value: 0.6237037181854248 and parameters: {'use_conv': True, 'out_channels': 41, 'learning_rate': 0.002763394800201883}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5888888835906982
Fold 2/5
Fold 2/5 accuracy: 0.5888888835906982
Fold 3/5
Fold 3/5 accuracy: 0.5388888716697693
Fold 4/5
Fold 4/5 accuracy: 0.5222222208976746
Fold 5/5
Fold 5/5 accuracy: 0.5333333015441895


[I 2023-01-25 02:36:07,150] Trial 15 finished with value: 0.554444432258606 and parameters: {'use_conv': True, 'out_channels': 41, 'learning_rate': 1.1773911707213765e-05}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6074073910713196
Fold 2/5
Fold 2/5 accuracy: 0.6555555462837219
Fold 3/5
Fold 3/5 accuracy: 0.614814817905426
Fold 4/5
Fold 4/5 accuracy: 0.5722222328186035
Fold 5/5
Fold 5/5 accuracy: 0.6537036895751953


[I 2023-01-25 02:47:29,351] Trial 16 finished with value: 0.6207407712936401 and parameters: {'use_conv': True, 'out_channels': 15, 'learning_rate': 0.00484976276163504}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6166666746139526
Fold 2/5
Fold 2/5 accuracy: 0.6407407522201538
Fold 3/5
Fold 3/5 accuracy: 0.5981481671333313
Fold 4/5
Fold 4/5 accuracy: 0.5740740895271301
Fold 5/5
Fold 5/5 accuracy: 0.6203703880310059


[I 2023-01-25 02:58:41,385] Trial 17 finished with value: 0.6100000143051147 and parameters: {'use_conv': True, 'out_channels': 24, 'learning_rate': 0.0009586359059350767}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5999999642372131
Fold 2/5
Fold 2/5 accuracy: 0.5777777433395386
Fold 3/5
Fold 3/5 accuracy: 0.6074073910713196
Fold 4/5
Fold 4/5 accuracy: 0.6166666746139526
Fold 5/5
Fold 5/5 accuracy: 0.6259258985519409


[I 2023-01-25 03:09:45,788] Trial 18 finished with value: 0.605555534362793 and parameters: {'use_conv': True, 'out_channels': 46, 'learning_rate': 0.004745910145905113}. Best is trial 2 with value: 0.6296296119689941.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5981481671333313
Fold 2/5
Fold 2/5 accuracy: 0.664814829826355
Fold 3/5
Fold 3/5 accuracy: 0.6314814686775208
Fold 4/5
Fold 4/5 accuracy: 0.6222221851348877
Fold 5/5
Fold 5/5 accuracy: 0.6574074029922485


[I 2023-01-25 03:21:03,870] Trial 19 finished with value: 0.6348148584365845 and parameters: {'use_conv': True, 'out_channels': 25, 'learning_rate': 0.022887831450837906}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6018518209457397
Fold 2/5
Fold 2/5 accuracy: 0.6129629611968994
Fold 3/5
Fold 3/5 accuracy: 0.5962963104248047
Fold 4/5
Fold 4/5 accuracy: 0.5981481671333313
Fold 5/5
Fold 5/5 accuracy: 0.5999999642372131


[I 2023-01-25 03:32:22,557] Trial 20 finished with value: 0.6018518209457397 and parameters: {'use_conv': True, 'out_channels': 5, 'learning_rate': 0.027377406551207698}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5925925970077515
Fold 2/5
Fold 2/5 accuracy: 0.6481481194496155
Fold 3/5
Fold 3/5 accuracy: 0.6462962627410889
Fold 4/5
Fold 4/5 accuracy: 0.5685185194015503
Fold 5/5
Fold 5/5 accuracy: 0.6314814686775208


[I 2023-01-25 03:43:47,703] Trial 21 finished with value: 0.6174073815345764 and parameters: {'use_conv': True, 'out_channels': 24, 'learning_rate': 0.03983890122097303}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6111111044883728
Fold 2/5
Fold 2/5 accuracy: 0.635185182094574
Fold 3/5
Fold 3/5 accuracy: 0.6185185313224792
Fold 4/5
Fold 4/5 accuracy: 0.6074073910713196
Fold 5/5
Fold 5/5 accuracy: 0.6370370388031006


[I 2023-01-25 03:55:02,374] Trial 22 finished with value: 0.6218518018722534 and parameters: {'use_conv': True, 'out_channels': 41, 'learning_rate': 0.005166970386374458}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6203703880310059
Fold 2/5
Fold 2/5 accuracy: 0.6666666269302368
Fold 3/5
Fold 3/5 accuracy: 0.6296296119689941
Fold 4/5
Fold 4/5 accuracy: 0.5962963104248047
Fold 5/5
Fold 5/5 accuracy: 0.6388888955116272


[I 2023-01-25 04:05:58,448] Trial 23 finished with value: 0.6303703188896179 and parameters: {'use_conv': True, 'out_channels': 26, 'learning_rate': 0.027713339707092735}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5981481671333313
Fold 2/5
Fold 2/5 accuracy: 0.6370370388031006
Fold 3/5
Fold 3/5 accuracy: 0.6018518209457397
Fold 4/5
Fold 4/5 accuracy: 0.614814817905426
Fold 5/5
Fold 5/5 accuracy: 0.6592592597007751


[I 2023-01-25 04:16:54,966] Trial 24 finished with value: 0.6222222447395325 and parameters: {'use_conv': True, 'out_channels': 25, 'learning_rate': 0.040406319302466874}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.614814817905426
Fold 2/5
Fold 2/5 accuracy: 0.6259258985519409
Fold 3/5
Fold 3/5 accuracy: 0.6259258985519409
Fold 4/5
Fold 4/5 accuracy: 0.5574073791503906
Fold 5/5
Fold 5/5 accuracy: 0.5999999642372131


[I 2023-01-25 04:27:51,324] Trial 25 finished with value: 0.6048148274421692 and parameters: {'use_conv': True, 'out_channels': 14, 'learning_rate': 0.02285730224886605}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6092592477798462
Fold 2/5
Fold 2/5 accuracy: 0.6296296119689941
Fold 3/5
Fold 3/5 accuracy: 0.6185185313224792
Fold 4/5
Fold 4/5 accuracy: 0.5981481671333313
Fold 5/5
Fold 5/5 accuracy: 0.6666666269302368


[I 2023-01-25 04:38:47,262] Trial 26 finished with value: 0.6244443655014038 and parameters: {'use_conv': True, 'out_channels': 30, 'learning_rate': 0.014457960417158107}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.575925886631012
Fold 2/5
Fold 2/5 accuracy: 0.5944444537162781
Fold 3/5
Fold 3/5 accuracy: 0.5925925970077515
Fold 4/5
Fold 4/5 accuracy: 0.555555522441864
Fold 5/5
Fold 5/5 accuracy: 0.6129629611968994


[I 2023-01-25 04:49:40,838] Trial 27 finished with value: 0.5862962603569031 and parameters: {'use_conv': True, 'out_channels': 20, 'learning_rate': 0.049981134753738526}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6203703880310059
Fold 2/5
Fold 2/5 accuracy: 0.605555534362793
Fold 3/5
Fold 3/5 accuracy: 0.5962963104248047
Fold 4/5
Fold 4/5 accuracy: 0.5777777433395386
Fold 5/5
Fold 5/5 accuracy: 0.6092592477798462


[I 2023-01-25 05:00:32,457] Trial 28 finished with value: 0.6018518209457397 and parameters: {'use_conv': True, 'out_channels': 6, 'learning_rate': 0.008072868356308301}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6296296119689941
Fold 2/5
Fold 2/5 accuracy: 0.6444444060325623
Fold 3/5
Fold 3/5 accuracy: 0.6111111044883728
Fold 4/5
Fold 4/5 accuracy: 0.6166666746139526
Fold 5/5
Fold 5/5 accuracy: 0.6555555462837219


[I 2023-01-25 05:11:24,390] Trial 29 finished with value: 0.6314815282821655 and parameters: {'use_conv': True, 'out_channels': 27, 'learning_rate': 0.023370030204294713}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5962963104248047
Fold 2/5
Fold 2/5 accuracy: 0.6481481194496155
Fold 3/5
Fold 3/5 accuracy: 0.6444444060325623
Fold 4/5
Fold 4/5 accuracy: 0.5777777433395386
Fold 5/5
Fold 5/5 accuracy: 0.6611111164093018


[I 2023-01-25 05:22:15,619] Trial 30 finished with value: 0.6255555748939514 and parameters: {'use_conv': True, 'out_channels': 20, 'learning_rate': 0.02192934902502196}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6240740418434143
Fold 2/5
Fold 2/5 accuracy: 0.6444444060325623
Fold 3/5
Fold 3/5 accuracy: 0.6425926089286804
Fold 4/5
Fold 4/5 accuracy: 0.6111111044883728
Fold 5/5
Fold 5/5 accuracy: 0.6425926089286804


[I 2023-01-25 05:33:06,368] Trial 31 finished with value: 0.6329630017280579 and parameters: {'use_conv': True, 'out_channels': 28, 'learning_rate': 0.02025520278264623}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6314814686775208
Fold 2/5
Fold 2/5 accuracy: 0.6388888955116272
Fold 3/5
Fold 3/5 accuracy: 0.6333333253860474
Fold 4/5
Fold 4/5 accuracy: 0.5796296000480652
Fold 5/5
Fold 5/5 accuracy: 0.6407407522201538


[I 2023-01-25 05:43:57,089] Trial 32 finished with value: 0.6248148083686829 and parameters: {'use_conv': True, 'out_channels': 28, 'learning_rate': 0.016303241420515844}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5703703761100769
Fold 2/5
Fold 2/5 accuracy: 0.5703703761100769
Fold 3/5
Fold 3/5 accuracy: 0.5777777433395386
Fold 4/5
Fold 4/5 accuracy: 0.5425925850868225
Fold 5/5
Fold 5/5 accuracy: 0.6092592477798462


[I 2023-01-25 05:54:47,415] Trial 33 finished with value: 0.5740740895271301 and parameters: {'use_conv': True, 'out_channels': 19, 'learning_rate': 0.05310173837255445}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5740740895271301
Fold 2/5
Fold 2/5 accuracy: 0.6462962627410889
Fold 3/5
Fold 3/5 accuracy: 0.5481481552124023
Fold 4/5
Fold 4/5 accuracy: 0.5444444417953491
Fold 5/5
Fold 5/5 accuracy: 0.5814814567565918


[I 2023-01-25 06:05:26,634] Trial 34 finished with value: 0.5788888931274414 and parameters: {'use_conv': False, 'learning_rate': 0.026845609386660033}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5240740776062012
Fold 2/5
Fold 2/5 accuracy: 0.5018518567085266
Fold 3/5
Fold 3/5 accuracy: 0.49444442987442017
Fold 4/5
Fold 4/5 accuracy: 0.5185185074806213
Fold 5/5
Fold 5/5 accuracy: 0.5351851582527161


[I 2023-01-25 06:16:17,866] Trial 35 finished with value: 0.5148147344589233 and parameters: {'use_conv': True, 'out_channels': 27, 'learning_rate': 0.08641576602056762}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.575925886631012
Fold 2/5
Fold 2/5 accuracy: 0.6574074029922485
Fold 3/5
Fold 3/5 accuracy: 0.6129629611968994
Fold 4/5
Fold 4/5 accuracy: 0.5944444537162781
Fold 5/5
Fold 5/5 accuracy: 0.6111111044883728


[I 2023-01-25 06:27:09,252] Trial 36 finished with value: 0.610370397567749 and parameters: {'use_conv': True, 'out_channels': 35, 'learning_rate': 0.011489379054081298}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6203703880310059
Fold 2/5
Fold 2/5 accuracy: 0.5796296000480652
Fold 3/5
Fold 3/5 accuracy: 0.6074073910713196
Fold 4/5
Fold 4/5 accuracy: 0.6092592477798462
Fold 5/5
Fold 5/5 accuracy: 0.585185170173645


[I 2023-01-25 06:37:49,960] Trial 37 finished with value: 0.6003702878952026 and parameters: {'use_conv': False, 'learning_rate': 0.007849239120912484}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5870370268821716
Fold 2/5
Fold 2/5 accuracy: 0.5740740895271301
Fold 3/5
Fold 3/5 accuracy: 0.6092592477798462
Fold 4/5
Fold 4/5 accuracy: 0.5814814567565918
Fold 5/5
Fold 5/5 accuracy: 0.5722222328186035


[I 2023-01-25 06:48:41,505] Trial 38 finished with value: 0.5848148465156555 and parameters: {'use_conv': True, 'out_channels': 31, 'learning_rate': 0.0659425965101501}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5907407402992249
Fold 2/5
Fold 2/5 accuracy: 0.635185182094574
Fold 3/5
Fold 3/5 accuracy: 0.6240740418434143
Fold 4/5
Fold 4/5 accuracy: 0.5907407402992249
Fold 5/5
Fold 5/5 accuracy: 0.6555555462837219


[I 2023-01-25 06:59:33,297] Trial 39 finished with value: 0.619259238243103 and parameters: {'use_conv': True, 'out_channels': 22, 'learning_rate': 0.0351018937330664}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5833333134651184
Fold 2/5
Fold 2/5 accuracy: 0.6333333253860474
Fold 3/5
Fold 3/5 accuracy: 0.5666666626930237
Fold 4/5
Fold 4/5 accuracy: 0.5925925970077515
Fold 5/5
Fold 5/5 accuracy: 0.6074073910713196


[I 2023-01-25 07:10:14,405] Trial 40 finished with value: 0.596666693687439 and parameters: {'use_conv': False, 'learning_rate': 0.018836004308406654}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6018518209457397
Fold 2/5
Fold 2/5 accuracy: 0.6370370388031006
Fold 3/5
Fold 3/5 accuracy: 0.6111111044883728
Fold 4/5
Fold 4/5 accuracy: 0.6037036776542664
Fold 5/5
Fold 5/5 accuracy: 0.6462962627410889


[I 2023-01-25 07:21:07,299] Trial 41 finished with value: 0.6200000047683716 and parameters: {'use_conv': True, 'out_channels': 36, 'learning_rate': 0.0343526482690169}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5833333134651184
Fold 2/5
Fold 2/5 accuracy: 0.6555555462837219
Fold 3/5
Fold 3/5 accuracy: 0.6259258985519409
Fold 4/5
Fold 4/5 accuracy: 0.6425926089286804
Fold 5/5
Fold 5/5 accuracy: 0.6574074029922485


[I 2023-01-25 07:31:59,397] Trial 42 finished with value: 0.6329630017280579 and parameters: {'use_conv': True, 'out_channels': 31, 'learning_rate': 0.016678893948021103}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5
Fold 2/5
Fold 2/5 accuracy: 0.5148147940635681
Fold 3/5
Fold 3/5 accuracy: 0.5351851582527161
Fold 4/5
Fold 4/5 accuracy: 0.5074074268341064
Fold 5/5
Fold 5/5 accuracy: 0.49444442987442017


[I 2023-01-25 07:42:50,789] Trial 43 finished with value: 0.5103703737258911 and parameters: {'use_conv': True, 'out_channels': 29, 'learning_rate': 0.09079387961250182}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6314814686775208
Fold 2/5
Fold 2/5 accuracy: 0.5907407402992249
Fold 3/5
Fold 3/5 accuracy: 0.6074073910713196
Fold 4/5
Fold 4/5 accuracy: 0.5888888835906982
Fold 5/5
Fold 5/5 accuracy: 0.6388888955116272


[I 2023-01-25 07:53:41,561] Trial 44 finished with value: 0.6114814877510071 and parameters: {'use_conv': True, 'out_channels': 12, 'learning_rate': 0.012863339725357914}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6111111044883728
Fold 2/5
Fold 2/5 accuracy: 0.6592592597007751
Fold 3/5
Fold 3/5 accuracy: 0.6129629611968994
Fold 4/5
Fold 4/5 accuracy: 0.5962963104248047
Fold 5/5
Fold 5/5 accuracy: 0.6777777671813965


[I 2023-01-25 08:04:33,571] Trial 45 finished with value: 0.6314814686775208 and parameters: {'use_conv': True, 'out_channels': 32, 'learning_rate': 0.020164333300676126}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6240740418434143
Fold 2/5
Fold 2/5 accuracy: 0.605555534362793
Fold 3/5
Fold 3/5 accuracy: 0.5962963104248047
Fold 4/5
Fold 4/5 accuracy: 0.5037037134170532
Fold 5/5
Fold 5/5 accuracy: 0.5740740895271301


[I 2023-01-25 08:15:14,238] Trial 46 finished with value: 0.5807406902313232 and parameters: {'use_conv': False, 'learning_rate': 0.017465756788065796}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.6129629611968994
Fold 2/5
Fold 2/5 accuracy: 0.614814817905426
Fold 3/5
Fold 3/5 accuracy: 0.5944444537162781
Fold 4/5
Fold 4/5 accuracy: 0.5685185194015503
Fold 5/5
Fold 5/5 accuracy: 0.5425925850868225


[I 2023-01-25 08:26:06,444] Trial 47 finished with value: 0.5866667032241821 and parameters: {'use_conv': True, 'out_channels': 39, 'learning_rate': 0.059138658313891215}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.5925925970077515
Fold 2/5
Fold 2/5 accuracy: 0.6518518328666687
Fold 3/5
Fold 3/5 accuracy: 0.605555534362793
Fold 4/5
Fold 4/5 accuracy: 0.614814817905426
Fold 5/5
Fold 5/5 accuracy: 0.6333333253860474


[I 2023-01-25 08:36:58,637] Trial 48 finished with value: 0.6196295619010925 and parameters: {'use_conv': True, 'out_channels': 32, 'learning_rate': 0.009877135285585288}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.4981481432914734
Fold 2/5
Fold 2/5 accuracy: 0.5629629492759705
Fold 3/5
Fold 3/5 accuracy: 0.5351851582527161
Fold 4/5
Fold 4/5 accuracy: 0.5388888716697693
Fold 5/5
Fold 5/5 accuracy: 0.520370364189148


[I 2023-01-25 08:47:51,073] Trial 49 finished with value: 0.5311111211776733 and parameters: {'use_conv': True, 'out_channels': 47, 'learning_rate': 0.09997980532176738}. Best is trial 19 with value: 0.6348148584365845.


New trial
Fold 1/5
Fold 1/5 accuracy: 0.614814817905426
Fold 2/5
Fold 2/5 accuracy: 0.664814829826355
Fold 3/5
Fold 3/5 accuracy: 0.6277777552604675
Fold 4/5
Fold 4/5 accuracy: 0.5944444537162781
Fold 5/5
Fold 5/5 accuracy: 0.6129629611968994


[I 2023-01-25 08:58:42,823] Trial 50 finished with value: 0.622963011264801 and parameters: {'use_conv': True, 'out_channels': 32, 'learning_rate': 0.015853111650765268}. Best is trial 19 with value: 0.6348148584365845.




--------------------
--------------------
--------------------


Study statistics: 
  Number of finished trials:  51
  Number of pruned trials:  0
  Number of complete trials:  51
Best trial:
  Value:  0.6348148584365845
  Params: 
	use_conv: True
	out_channels: 25
	learning_rate: 0.022887831450837906
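
To make the cross-validation logic in `objective` more concrete, here is a minimal plain-Python sketch of what a shuffled 5-fold split does with the indices. The function `k_fold_indices` is a hypothetical illustration, not scikit-learn's implementation of `KFold`, and its shuffling will not reproduce the exact folds used above. The last lines recompute the value of trial 0 as the mean of its five fold accuracies, copied from the log.

```python
import random
from statistics import mean
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, n_splits: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, valid_idx) pairs; hypothetical helper, not sklearn's KFold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Cut the shuffled indices into n_splits contiguous folds of (almost) equal size
    base, extra = divmod(n_samples, n_splits)
    folds, start = [], 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size

    # Each fold is used once for validation; all the other folds are used for training
    for i, valid_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, valid_idx


# Toy example with 10 samples (an arbitrary size chosen for readability)
splits = list(k_fold_indices(n_samples=10, n_splits=5))
for train_idx, valid_idx in splits:
    print(sorted(valid_idx), "held out,", len(train_idx), "samples used for training")

# The value Optuna reports for a trial is the mean of the fold accuracies (trial 0 from the log)
trial_0_folds = [0.5999999642372131, 0.6037036776542664, 0.5592592358589172,
                 0.5870370268821716, 0.5833333134651184]
trial_0_value = mean(trial_0_folds)
print(f"{trial_0_value:.4f}")
```

Every sample ends up in exactly one validation fold, so each trial is scored on data the corresponding model never saw during training.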


We then train the model with the best hyperparameters on the whole training set and evaluate it on the test set:

In [29]:
# Create model
MODEL: AdvancedNet2 = AdvancedNet2(out_channels=25, use_conv=True).to(DEVICE)

In [30]:
MODEL

AdvancedNet2(
  (conv): Conv2d(3, 25, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fc1): Linear(in_features=102400, out_features=200, bias=True)
  (fc2): Linear(in_features=200, out_features=3, bias=True)
)
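
The `in_features=102400` reported for `fc1` can be checked by hand with the output-size formula used in the comments of `AdvancedNet2`. The following is a minimal sketch in plain Python; the helper name `conv2d_output_size` is hypothetical (it is not a PyTorch function), and it assumes a dilation of 1 and square 64x64 inputs, as in this TD.

```python
def conv2d_output_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution with dilation 1 (hypothetical helper)."""
    return (size + 2 * padding - kernel_size) // stride + 1


IMAGE_SIZE = 64    # the images are 64x64
OUT_CHANNELS = 25  # best value found by the study

# kernel_size=3, stride=1, padding=1 keeps the spatial size unchanged
side = conv2d_output_size(IMAGE_SIZE, kernel_size=3, stride=1, padding=1)

# After flattening, fc1 receives one value per channel and per pixel
fc1_in_features = OUT_CHANNELS * side * side
print(side, fc1_in_features)
```

This is also why the size of `fc1` grows linearly with `out_channels`: each extra channel adds 64*64 inputs to the first fully connected layer.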

In [31]:
main_train_conv(nn.CrossEntropyLoss(), torch.optim.SGD(MODEL.parameters(), lr=0.022887831450837906))

epoch 1/15, train_loss = 1.20e-01, train_acc = 53.92%, time spent during this epoch = 10.96s, total time spent = 10.96s
epoch 2/15, train_loss = 1.07e-01, train_acc = 60.80%, time spent during this epoch = 10.93s, total time spent = 21.90s
epoch 3/15, train_loss = 1.00e-01, train_acc = 64.20%, time spent during this epoch = 10.93s, total time spent = 32.83s
epoch 4/15, train_loss = 9.43e-02, train_acc = 66.72%, time spent during this epoch = 10.93s, total time spent = 43.77s
epoch 5/15, train_loss = 8.47e-02, train_acc = 71.30%, time spent during this epoch = 10.94s, total time spent = 54.71s
epoch 6/15, train_loss = 7.41e-02, train_acc = 75.26%, time spent during this epoch = 10.94s, total time spent = 65.66s
epoch 7/15, train_loss = 6.42e-02, train_acc = 78.51%, time spent during this epoch = 10.94s, total time spent = 76.60s
epoch 8/15, train_loss = 5.07e-02, train_acc = 83.28%, time spent during this epoch = 10.95s, total time spent = 87.55s
epoch 9/15, train_loss = 3.73e-02, train

In [32]:
print(f"{100*test_our_model_conv():.2f}%")

62.67%


This is 0.34 percentage points better than last time (62.67% against 62.33%), which is better than nothing, but there is still a lot of overfitting. It would be interesting to add, for example, [dropout](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html) to reduce this overfitting!

A commonly used dropout probability for a hidden layer is between 0.2 and 0.5.
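
As a rough illustration of what dropout does, here is a minimal plain-Python sketch of inverted dropout, the variant that `torch.nn.Dropout` implements: during training each value is zeroed with probability `p` and the survivors are scaled by `1/(1-p)`, while in evaluation mode the input is returned unchanged. The function `dropout_sketch` and its arguments are hypothetical names chosen for this example, and it only handles flat lists of floats with `p < 1`.

```python
import random
from typing import List, Optional, Sequence


def dropout_sketch(values: Sequence[float], p: float = 0.5, training: bool = True,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout on a flat list of floats (illustrative, not the PyTorch code)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("this sketch only supports 0 <= p < 1")
    if not training or p == 0.0:
        # Evaluation mode: dropout is the identity
        return list(values)
    rng = rng if rng is not None else random.Random()
    scale = 1.0 / (1.0 - p)  # rescale the survivors so the expected value is unchanged
    return [0.0 if rng.random() < p else v * scale for v in values]


activations = [1.0] * 100_000
dropped = dropout_sketch(activations, p=0.3, training=True, rng=random.Random(25))
kept_in_eval = dropout_sketch(activations, p=0.3, training=False)

mean_after_dropout = sum(dropped) / len(dropped)
print(f"mean after dropout: {mean_after_dropout:.3f}")
```

In `AdvancedNet2`, one option would be to apply an `nn.Dropout` layer after the `ReLU` of `fc1`; the existing calls to `my_model.train()` and `my_model.eval()` would then switch it on and off.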