# Hyperparameter optimisation
## TD 5

The data is available [here](https://drive.google.com/drive/u/1/folders/1kMhMH5pi_jJNgwNPpy1epXsWCI4Hqsve): download `Food-3`, or `Food-3-big` if you want more data.

We are essentially going to use the same `Food101` ([credit where it's due](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/)) data, the same object `ImageDataset`, the same `DataLoader`.

The code below is mainly a copy of the code from the previous TD, except that global variables are now defined separately and everything is wrapped in different functions. This is to make it easier to train the same model with different hyperparameters and architectures, etc ...

For those that can use their GPUs or are on the DCE, all the necessary `.to(device)` are already in the code.

If, for some reason, you encounter the error `OutOfMemoryError: CUDA out of memory.`, your GPU does not have enough memory to run the model. You can try reducing the batch size, the number of neurons or layers in the network, or the number of filters in the convolutional layers.

In [None]:
# Imports
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import pathlib
import time
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Set the random seed for reproducibility
_ = torch.manual_seed(25)

In [None]:
from functools import partial
print = partial(print, flush=True)

In [None]:
# Global variables

# Setup device-agnostic code
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {DEVICE} device")

# Batch size
BATCH_SIZE = 512

# Learning rate
LEARNING_RATE = 2e-2

# Number of epochs
NUM_EPOCHS = 20

# Number of classes
NUM_CLASSES = 3

# Image size
IMAGE_SIZE = 128

In [None]:
def get_datasets_and_dataloaders(
    batch_size: int = 4
) -> tuple[
    DataLoader, 
    DataLoader
]:
    """
    Load the training and test datasets into data loaders.
    """
    data_dir = pathlib.Path(".")
    train_dir = data_dir / "Food-3" / "train"
    test_dir = data_dir / "Food-3" / "test"

    data_transform_train = transforms.Compose(
        [
            transforms.Resize(size=(IMAGE_SIZE, IMAGE_SIZE)),  # Resize the images to 128x128
            transforms.ToTensor(),  # Convert the images to tensors
            transforms.RandomHorizontalFlip(p=0.5),  # Flip the images horizontally with probability 0.5
        ]
    )
    data_transform_test = transforms.Compose(
        [
            transforms.Resize(size=(IMAGE_SIZE, IMAGE_SIZE)),  # Resize the images to 128x128
            transforms.ToTensor(),  # Convert the images to tensors
        ]
    )

    train_data = datasets.ImageFolder(
        root=str(train_dir),  # target folder of images
        transform=data_transform_train,  # transforms to perform on data (images)
        target_transform=None  # transforms to perform on labels (if necessary)
    ) 

    test_data = datasets.ImageFolder(
        root=str(test_dir),
        transform=data_transform_test
    )

    train_dataloader = DataLoader(
        dataset=train_data,
        batch_size=batch_size,  # how many samples per batch?
        shuffle=True  # shuffle the data?
    )

    test_dataloader = DataLoader(
        dataset=test_data,
        batch_size=batch_size,
        shuffle=False
    ) # don't usually need to shuffle testing data


    return train_dataloader, test_dataloader

In [None]:
# Load dataloaders in global variables
TRAIN_DATALOADER, TEST_DATALOADER = get_datasets_and_dataloaders(BATCH_SIZE)

In [None]:
# How can we get the datasets? Did we lose them? No
print(TRAIN_DATALOADER.dataset)
print("---")
print(TEST_DATALOADER.dataset)

In [None]:
class Net(nn.Module):
    def __init__(
        self,
        hidden_units: int = 200,
        batch_norm: bool = False,
        use_conv: bool = False
    ):
        

    def forward(self, x):
        

---

Why does it not work with `x = x.view(BATCH_SIZE, -1)`?
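
One way to explore this question is the small experiment below. It is only a sketch: the dataset of 10 random "images" and the batch size of 4 are made-up values chosen for illustration, not the notebook's. It compares flattening with a hard-coded batch size against flattening with the actual size of each batch, `x.size(0)`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 4  # illustrative value, not the notebook's BATCH_SIZE
images = torch.randn(10, 3, 8, 8)  # 10 fake RGB images of 8x8 pixels
labels = torch.zeros(10, dtype=torch.long)
loader = DataLoader(TensorDataset(images, labels), batch_size=batch_size)

safe_shapes = []
hardcoded_shapes = []
for x, _ in loader:
    # Flatten using the real size of this batch
    safe_shapes.append(tuple(x.view(x.size(0), -1).shape))
    # Flatten using the global batch size: on the last (smaller) batch this
    # either raises an error or silently returns the wrong number of features
    hardcoded_shapes.append(tuple(x.view(batch_size, -1).shape))

print("x.size(0):  ", safe_shapes)
print("hard-coded: ", hardcoded_shapes)
```

Here 10 is not a multiple of 4, so the last batch only holds 2 images. Compare the last shape in each list and think about what the first linear layer expects as input.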

---

In [None]:
# Create model
MODEL: Net = Net(hidden_units=200, batch_norm=False, use_conv=False).to(DEVICE)

In [None]:
def test_our_model() -> float:
    

In [None]:
# Test our untrained model
print((f"{100*test_our_model():.2f}%"))

You should get 34.31% accuracy on the testing set without training and with the default hyperparameters if you used the same seed. That is close to 33%, the expected accuracy of a random classifier on three classes.

We can plot some images like last time

In [None]:
# Get the tensors and put them in the right dimensions for matplotlib
my_pizza = TRAIN_DATALOADER.dataset[0][0]
my_label = TRAIN_DATALOADER.dataset[0][1]
print(my_pizza.shape)
my_pizza_reshaped = my_pizza.permute(1, 2, 0)
print(my_pizza_reshaped.shape)

# Plot the image
plt.figure(figsize=(10, 7))
plt.imshow(my_pizza_reshaped)
plt.axis("off")
plt.title(TRAIN_DATALOADER.dataset.classes[my_label], fontsize=14)

Let's do the training loop now

In [None]:
def main_train(loss_fn, optimizer) -> None:
    """
    Train the model, modifying it in place.
    """
    

In [None]:
main_train(nn.CrossEntropyLoss(), torch.optim.SGD(MODEL.parameters(), lr=LEARNING_RATE))

In [None]:
print((f"{100*test_our_model():.2f}%"))

You should get 58.17% accuracy on the testing set after training with the default hyperparameters if you used the same seed. We have clearly reached convergence: the loss is barely decreasing anymore, and if you train for more epochs, you will see the testing set accuracy decrease. Note that by saying this, we cheated a little: we used the testing set to infer information about the number of epochs. We should instead use validation sets and cross-validation techniques... and we will (today)! No worries.

How could the following class help? How is it a trade-off?
```py
class CachedImageFolder(datasets.ImageFolder):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.cache = {}

    def __getitem__(self, index):
        if index not in self.cache:
            self.cache[index] = super().__getitem__(index)
        return self.cache[index]
```

We're losing something by doing that (unrelated to the trade-off mentioned above), what is it? How could we fix it?
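
One possible direction, sketched below with toy stand-ins rather than real images: cache only the expensive, deterministic step (loading) and keep applying the cheap, random step (the transform) at every access. The class `CachedRawDataset` and the helpers `fake_load` and `random_flip` are hypothetical names invented for this illustration.

```python
import random


class CachedRawDataset:
    """Cache the expensive loading step, but re-apply the transform on every access."""

    def __init__(self, load_fn, length, transform=None):
        self.load_fn = load_fn      # expensive step, e.g. reading a file from disk
        self.length = length
        self.transform = transform  # cheap, possibly random step
        self.cache = {}

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if index not in self.cache:
            self.cache[index] = self.load_fn(index)  # done once per sample
        sample = self.cache[index]
        if self.transform is not None:
            sample = self.transform(sample)  # done at every access
        return sample


load_calls = []


def fake_load(index):
    """Stand-in for reading and decoding an image: returns a tiny list of 'pixels'."""
    load_calls.append(index)
    return [index, index + 1, index + 2]


rng = random.Random(0)


def random_flip(pixels):
    """Stand-in for a random horizontal flip with probability 0.5."""
    return pixels[::-1] if rng.random() < 0.5 else pixels


dataset = CachedRawDataset(fake_load, length=3, transform=random_flip)
views = {tuple(dataset[0]) for _ in range(20)}  # access the same sample 20 times
print(f"loaded {len(load_calls)} time(s), distinct views: {views}")
```

With `datasets.ImageFolder`, the same idea would mean caching the result of `self.loader(path)` and applying `self.transform` afterwards, at the cost of keeping the untransformed images in memory.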

-----

Is it possible for `train_loss` to decrease whilst `train_acc` decreases at the same time? Look at what happens between epochs 14 and 15 in this example (it's a real run, different seed though):
```py
epoch 14/15, train_loss = 1.01e-01, train_acc = 63.61%, time spent ...
epoch 15/15, train_loss = 9.91e-02, train_acc = 63.50%, time spent ...
```

Why is that?
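
A tiny hand-made example can help build intuition. The probabilities below are invented for illustration (they do not come from the run above): three samples, two classes, and the true class is always class 0. Cross-entropy depends on how confident the model is in the true class, whereas accuracy only looks at which class has the highest probability.

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def accuracy(probs, labels):
    """Fraction of samples whose most probable class is the true class."""
    correct = sum(p.index(max(p)) == y for p, y in zip(probs, labels))
    return correct / len(labels)


labels = [0, 0, 0]
# Epoch A: two barely correct predictions, one confidently wrong prediction
probs_a = [[0.51, 0.49], [0.51, 0.49], [0.10, 0.90]]
# Epoch B: one barely wrong, one confidently correct, one mildly wrong
probs_b = [[0.49, 0.51], [0.90, 0.10], [0.40, 0.60]]

loss_a, acc_a = cross_entropy(probs_a, labels), accuracy(probs_a, labels)
loss_b, acc_b = cross_entropy(probs_b, labels), accuracy(probs_b, labels)
print(f"epoch A: loss = {loss_a:.3f}, acc = {acc_a:.2%}")
print(f"epoch B: loss = {loss_b:.3f}, acc = {acc_b:.2%}")
```

Between the two made-up epochs, the loss goes down while the accuracy also goes down.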

-----

## Let's try to improve this accuracy!

You will need to install the Optuna package (`pip install optuna`) and import it at the beginning of your script (no need if you're using the shared environment of the DCE, we installed it for you already). We should also import `KFold` from `sklearn.model_selection`. This is because we will use cross-validation to find the best hyperparameters.

In [None]:
import optuna
from sklearn.model_selection import KFold

The first, easy task is to decide how many neurons there should be in the hidden layer.

We will do this together (optimising the number of hidden neurons), and then you will have to optimise the learning rate* and the choice of optimizer on your own. We will also show you how to choose between a convolutional and a dense network.

\**Careful! Small learning rates are not always better, especially if you do not change the number of epochs. Try to find the best learning rate for the number of epochs you chose, and keep that number of epochs small enough for your computer to handle.*

We will need to define a new function that will be used as the objective function for Optuna's optimization. This function should take in the `trial` object from Optuna as an argument and use the `trial` object to define and sample the hyperparameters that you want to optimize. For example, you can use the `trial` object to sample a choice between a convolutional and dense network, and to sample the number of neurons for the chosen network. After training the model, we will need to return the final validation accuracy calculated with cross-validation* as the objective function value for Optuna to maximise.

\*We use cross-validation here (5-fold) because we want to use the testing set as little as possible. We will use the testing set only once, at the end, to get the final accuracy of the best model. But, cross-validation greatly increases the time required to run the algorithms, so we won't always use cross-validation to optimize hyperparameters.
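
As a minimal sketch of what 5-fold cross-validation does with the training data, the snippet below only manipulates indices. The value `n_samples = 20` is a made-up stand-in for the size of the training set, and no model is trained here.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples = 20  # stand-in for len(TRAIN_DATALOADER.dataset)
kfold = KFold(n_splits=5, shuffle=True, random_state=25)

fold_sizes = []
validated = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(np.arange(n_samples))):
    # In a real run: train a fresh model on train_idx, then evaluate it on val_idx
    fold_sizes.append((len(train_idx), len(val_idx)))
    validated.extend(val_idx.tolist())
    print(f"fold {fold}: train on {len(train_idx)} samples, validate on {len(val_idx)}")

# Every sample is used for validation exactly once across the 5 folds
print(sorted(validated) == list(range(n_samples)))
```

With PyTorch, index arrays like these can be passed to `torch.utils.data.Subset` to build one training and one validation `DataLoader` per fold. The objective value is then typically the mean of the 5 validation accuracies.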

In [None]:
def objective(trial: optuna.trial.Trial) -> float:
    

Finally, we will need to call the `optuna.create_study()` function to create a new study, and use the `study.optimize()` function to run the optimization, passing the objective function that we defined earlier.

You can find more information about how to use Optuna in the [Optuna documentation](https://optuna.readthedocs.io/en/stable/index.html).

In [None]:
study = optuna.create_study(direction="maximize")
study.optimize(objective, timeout=1200, n_trials=15) 
# - timeout = 1200 -> stops after 20 minutes;
# - n_trials = 15 -> tries 15 different values for the hyperparameter.

pruned_trials = [t for t in study.trials if t.state == optuna.trial.TrialState.PRUNED]
complete_trials = [t for t in study.trials if t.state == optuna.trial.TrialState.COMPLETE]

print("\n")
print("--------------------")
print("--------------------")
print("--------------------")
print("\n")
print("Study statistics: ")
print("  Number of finished trials: ", len(study.trials))
print("  Number of pruned trials: ", len(pruned_trials))
print("  Number of complete trials: ", len(complete_trials))

print("Best trial:")
trial = study.best_trial

print("  Value: ", trial.value)

print("  Params: ")
for key, value in trial.params.items():
    print(f"\t{key}: {value}")

In [None]:
import matplotlib.pyplot as plt

hyperparam_values = [t.params['hidden_units'] for t in complete_trials]
accuracies = [t.value for t in complete_trials]

plt.figure(figsize=(6, 4))
plt.scatter(hyperparam_values, accuracies)
plt.title('Hyperparameter Optimization Results')
plt.xlabel('Value of hidden_units')
plt.ylabel('Accuracy')
plt.grid(True)
plt.show()

More neurons in the hidden layer is better (when considering the range 5-50). It makes sense!

A few of you might run into a problem: we only allowed 15 trials, but `Optuna` tried the same hyperparameters twice (unlikely with a big range, but likely if you only allowed the range [5; 8], for example. An example: https://i.imgur.com/DceMhuf.png). This is because `Optuna` does not check whether it has already used a given set of hyperparameters. To fix this, we can add the following code:

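```py
from optuna.trial import TrialState

def objective(trial: optuna.trial.Trial) -> float:
    # Sample the hyperparameters first (trial.suggest_* calls):
    # trial.params stays empty until they have been suggested.
    ...
    # Then reuse the value of any completed trial that used the same hyperparameters.
    for previous_trial in trial.study.trials:
        if previous_trial.state == TrialState.COMPLETE and trial.params == previous_trial.params:
            print(f"Duplicated trial: {trial.params}, return {previous_trial.value}")
            return previous_trial.value
    ...
...
```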

With this check, even with `n_trials` set to 5000, Optuna will not retrain a model for a set of hyperparameters it has already evaluated: the duplicated trial simply returns the stored value.

Let's add this and let's also optimize on the learning rate, the number of epochs, the use of batch normalisation layers, and the optimizer's choice.

We also add some manual pruning.

In [None]:
from optuna.exceptions import TrialPruned
from optuna.trial import TrialState

def objective(trial: optuna.trial.Trial) -> float:
    

In [None]:
study = optuna.create_study(direction="maximize")
study.optimize(objective, timeout=3600, n_trials=20) 
# - timeout = 3600 -> stops after 60 minutes;
# - n_trials = 20 -> tries 20 different combinations of hyperparameters.

pruned_trials = [t for t in study.trials if t.state == optuna.trial.TrialState.PRUNED]
complete_trials = [t for t in study.trials if t.state == optuna.trial.TrialState.COMPLETE]

print("\n")
print("--------------------")
print("--------------------")
print("--------------------")
print("\n")
print("Study statistics: ")
print("  Number of finished trials: ", len(study.trials))
print("  Number of pruned trials: ", len(pruned_trials))
print("  Number of complete trials: ", len(complete_trials))

print("Best trial:")
trial = study.best_trial

print("  Value: ", trial.value)

print("  Params: ")
for key, value in trial.params.items():
    print(f"\t{key}: {value}")

Let's now train the whole model with the optimal hyperparameters that we found with Optuna. We will use the `study.best_params` attribute to get the best hyperparameters. You need to re-train on the whole training dataset!!! Otherwise, you will not get the best accuracy as you're leaving out some data.
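
A possible way to turn the best hyperparameters back into a model and an optimizer is sketched below. The dictionary is a hypothetical stand-in for `study.best_params`: its keys and values are invented here and depend on the names you passed to `trial.suggest_*`. The small `nn.Sequential` network also stands in for our `Net` class.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for `study.best_params` (keys and values are made up)
best_params = {"hidden_units": 32, "lr": 1e-2, "optimizer": "Adam"}

# Rebuild a fresh, untrained model with the selected architecture
final_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 8 * 8, best_params["hidden_units"]),
    nn.ReLU(),
    nn.Linear(best_params["hidden_units"], 3),
)

# Look up the optimizer class by name, e.g. torch.optim.Adam or torch.optim.SGD
optimizer_class = getattr(torch.optim, best_params["optimizer"])
final_optimizer = optimizer_class(final_model.parameters(), lr=best_params["lr"])

print(type(final_optimizer).__name__, final_optimizer.param_groups[0]["lr"])
```

From there, the usual training loop runs on the full training `DataLoader` (no fold is left out), and the testing set is used once at the very end.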

Our performance went up by more than 10%! (Only 20 trials of hyperparameter optimisation)