![Banner](img/AI_Special_Program_Banner.jpg)

## Hyperparameter Optimization (HPO) with Optuna - Exercise
---
Instructions are given in <span style="color:blue">blue</span> color.

For this exercise, your main concern will be the optimization of hyperparameters for a machine learning model.
For this, you are going back to the ```CIFAR-10``` data set.

This exercise combines many of the ideas and methodologies you encountered during the last couple of days. As a result, this notebook contains a base CNN implementation utilizing ```PyTorch``` as well as transfer learning to classify the images.

If you fully execute this notebook, you will see that the pre-selected parameters do not yield a good classifier when training is limited to 5 epochs. What is the best accuracy you can reach within those 5 epochs?

* <div style="color:blue">Alter the existing code so that it utilizes <code>Optuna</code> for automated hyperparameter tuning.</div>
* <div style="color:blue">You should work with all parameters defined below. However, if you think more parameters need tuning, feel free to do so.</div>
* <div style="color:blue">Your code should contain the possibility to prune unpromising trials automatically.</div>
* <div style="color:blue">Keep in mind that machine learning often takes quite some time for training (even when GPU support is enabled). Try to run a meaningful number of trials and grab a coffee if needed.</div>
* <div style="color:blue">What are your best parameters and how well do they perform? Can you achieve more than 85% accuracy after 5 epochs?</div>

In [None]:
import random
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR

from tqdm import tqdm
import matplotlib.pyplot as plt


import torchvision
import torchvision.transforms as transforms

from torchvision import models

In [None]:
# Fixed settings (please don't change)
EPOCHS = 5
RANDOM_SEED = 42
torch.manual_seed(RANDOM_SEED)
random.seed(RANDOM_SEED)

In [None]:
# Hyperparameters
DROPOUT_RATE = 0.5
BATCH_SIZE = 32
OPTIMIZER = 'SGD' # one of 'SGD', 'Adam', 'RMSprop' (spelling must match get_optimizer)
LEARNING_RATE = 0.1
SCHEDULER = False

In [None]:
# CIFAR-10 is directly available via torchvision
def get_data():
    transform = transforms.Compose([transforms.ToTensor()])

    trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE, shuffle=True, num_workers=4)

    testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE, shuffle=False, num_workers=4)
    
    return trainloader, testloader

In [None]:
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

In [None]:
# Enabling GPU support
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

In [None]:
def get_model():
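    # `pretrained=True` is deprecated since torchvision 0.13; these are the same ImageNet weights
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)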
    num_features = model.fc.in_features
    # Replacing final layer of pre-trained model with dropout layer and custom fully connected layer
    model.fc = nn.Sequential(nn.Dropout(DROPOUT_RATE), nn.Linear(num_features, len(classes)))
    model.to(device)
    return model

In [None]:
def get_optimizer(model):
    if OPTIMIZER == 'Adam':
        return torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
    elif OPTIMIZER == 'SGD':
        return torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)
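    elif OPTIMIZER == 'RMSprop':
        return torch.optim.RMSprop(model.parameters(), lr=LEARNING_RATE)
    else:
        # Fail loudly instead of silently returning None for an unknown name
        raise ValueError(f"Unknown optimizer: {OPTIMIZER!r}")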

In [None]:
def train(model, training_batches, testing_batches):
    accuracy = list()
    criterion = nn.CrossEntropyLoss()
    optimizer = get_optimizer(model)
    
    if SCHEDULER:
        # `verbose` is deprecated in recent PyTorch releases; its default was already False
        scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS - 1)
        
    # training
    for epoch in range(EPOCHS):
        model.train()
        training_loop = tqdm(training_batches)
        for images, labels in training_loop:
            images = images.to(device, non_blocking=True)
            labels = labels.to(device, non_blocking=True)
        
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            
        if SCHEDULER:
            scheduler.step()
        
        # testing
        num_samples = 0
        correct_predictions = 0 
        model.eval()
        with torch.no_grad():
            for images, labels in testing_batches:
                images = images.to(device, non_blocking=True)
                labels = labels.to(device, non_blocking=True)
            
                outputs = model(images)
                num_samples += labels.size(0)
                correct_predictions += (outputs.argmax(dim=1) == labels).sum().item()
    
        accuracy.append(100.0 * correct_predictions / num_samples)
    
    return accuracy
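
To see what the optional scheduler in `train` does to the learning rate, the closed-form cosine annealing formula can be evaluated by hand. This is a pure-Python sketch, not PyTorch's implementation; the helper `cosine_annealing_lr` is a hypothetical name, and it assumes the default minimum learning rate of 0 and one scheduler step per epoch, as in the loop above.

```python
import math


def cosine_annealing_lr(base_lr, epoch, t_max, eta_min=0.0):
    """Closed-form cosine annealing: decays from base_lr to eta_min over t_max steps."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / t_max))


EPOCHS = 5
LEARNING_RATE = 0.1

# Learning rate in effect during each epoch when T_max = EPOCHS - 1
schedule = [cosine_annealing_lr(LEARNING_RATE, e, EPOCHS - 1) for e in range(EPOCHS)]
for e, lr in enumerate(schedule):
    print(f"epoch {e}: lr = {lr:.4f}")
```

Under these assumptions, the rate reaches the minimum of 0 exactly at the last epoch index, so the final epoch would train with a learning rate of essentially zero. That may be worth keeping in mind when deciding whether to enable the scheduler.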

In [None]:
model = get_model()
training_batches, testing_batches = get_data()
history = train(model, training_batches, testing_batches)

In [None]:
# last epoch's testing accuracy %
history[-1]
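
Because `train` returns one accuracy value per epoch, those intermediate values are exactly what a pruner compares across trials. The following pure-Python sketch illustrates a simplified median stopping rule. It is an assumption-laden illustration, not `Optuna`'s implementation: the function `should_prune`, its parameters, and the example histories are all hypothetical.

```python
from statistics import median


def should_prune(current_value, step, finished_histories, n_startup_trials=2):
    """Return True if current_value at `step` is below the median of earlier runs.

    finished_histories: list of per-epoch accuracy lists from earlier runs.
    """
    # Do not prune until enough earlier runs exist to compare against
    if len(finished_histories) < n_startup_trials:
        return False
    reference = [h[step] for h in finished_histories if len(h) > step]
    if not reference:
        return False
    return current_value < median(reference)


# Hypothetical accuracy histories of three earlier runs (values are made up)
finished = [[40.0, 55.0, 63.0], [35.0, 50.0, 58.0], [60.0, 72.0, 80.0]]

decision_low = should_prune(45.0, 1, finished)   # median at step 1 is 55.0
decision_high = should_prune(70.0, 1, finished)
print(decision_low, decision_high)
```

A run that lags behind the median of earlier runs at the same epoch is stopped early, which saves the remaining epochs for more promising parameter combinations.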

In [None]:
# Optional plot for a single training session (not needed when optimizing parameters)
plt.plot(history)
plt.ylabel('test accuracy %')
plt.xlabel('epoch index')
plt.grid()