In [1]:
"""
In this fourth example, we will take a closer look at the model optimization.

At first the optimization might seem a bit complicated, but its core is actually rather simple.
What makes it look complicated is all the reporting and tracking of loss and accuracy.
Try to step through the optimization code and see whether you can understand its various parts.

Here we briefly discuss some of the design choices behind this pipeline.

1) Typically you want to train a model many times and easily compare current runs to previous runs, and these runs are often done on a remote server.
    Our current implementation does not really offer this.
    To get something like this we suggest using a reporting framework such as mlflow, tensorboard, or a similar tool that automatically logs metrics and hyperparameters and allows easy visualization and comparison through an API that you can access remotely.
    This also allows your entire team to run various models on various computers and compare them all in a central API.

2) Currently, we only report the loss and accuracy on the validation dataset, but typically you would also want to use it to find the best model.

3) In a more realistic pipeline we also need to be able to save and load the model.
    Saving and loading models is something that needs to fit into the overall framework you run your machine learning models in.
    For information on how saving and loading models might be done see:
    https://pytorch.org/tutorials/beginner/saving_loading_models.html
"""

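Design choices 2 and 3 can be combined: after every epoch, keep a copy of the model state whenever the validation loss improves. The block below is a minimal sketch of that idea, not part of this pipeline. `BestModelTracker` is a hypothetical helper, and the dictionaries in the demo are stand-ins for what `model.state_dict()` would return in PyTorch.

```python
import copy


class BestModelTracker:
    """Keeps a copy of the state that gave the lowest validation loss so far."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_epoch = None
        self.best_state = None

    def update(self, epoch, val_loss, state):
        """Store `state` if `val_loss` is a new minimum; return True if it was stored."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            # Deep copy: a state dict refers to the live parameters,
            # which keep changing during later epochs.
            self.best_state = copy.deepcopy(state)
            return True
        return False


# Demo with illustrative validation losses and dummy states.
tracker = BestModelTracker()
for epoch, val_loss in enumerate([0.9, 0.6, 0.7]):
    tracker.update(epoch, val_loss, {"weights": [epoch]})
```

Inside a training loop this would be called as `tracker.update(epoch, val_loss, model.state_dict())`. Afterwards the best state can be restored with `model.load_state_dict(tracker.best_state)` or written to disk with `torch.save(tracker.best_state, path)`.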

In [2]:
from config.unet import Configuration

# load configuration information
configuration = Configuration()


In [3]:
import torch

from src.dataloader import load_data_wrapper
from src.losses import select_loss_fnc
from src.model import model_loader_wrapper
from src.viz import eval_unlabelled_images, plot_loss_and_accuracy


In [4]:
dataloaders = load_data_wrapper(**configuration)
model = model_loader_wrapper(**configuration)
optimizer = torch.optim.Adam(model.parameters(), lr=configuration['lr'])
loss_fnc = select_loss_fnc(**configuration)


unet loaded, containing 31.04M trainable parameters.


In [5]:
from tqdm import tqdm

def optimize_model(model, dataloaders, optimizer, loss_fnc, epochs, **kwargs):
    """
    Simple tracking of model optimization.
    """
    dataset_types = ['train', 'val']
    assert set(dataset_types).issubset(dataloaders), f'dataloaders are expected to contain {dataset_types}'
    losses = {dataset_type: [] for dataset_type in dataset_types}
    accuracies = {dataset_type: [] for dataset_type in dataset_types}

    for epoch in (pbar := tqdm(range(epochs), desc="Training...")):
        for dataset_type in dataset_types:
            loss, accuracy = train_or_eval_model(
                model, dataloaders[dataset_type], optimizer, loss_fnc,
                is_training=(dataset_type == 'train'),
            )
            losses[dataset_type].append(loss)
            accuracies[dataset_type].append(accuracy)
        pbar.set_postfix(
            loss_train=losses['train'][-1], loss_val=losses['val'][-1],
            accuracy_train=accuracies['train'][-1], accuracy_val=accuracies['val'][-1],
        )
    return losses, accuracies

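def train_or_eval_model(model, dataloader, optimizer, loss_fnc, is_training):
    """
    Runs a dataloader through a model.
    Supports both training and evaluation mode.
    """
    assert len(dataloader) > 0, "The dataloader should contain at least one batch of samples."
    model.train(is_training)
    loss_agg = 0.0
    true_predictions = 0
    predictions = 0
    samples = 0
    # Track gradients only while training; the previous global setting is restored on exit.
    with torch.set_grad_enabled(is_training):
        for x, target in dataloader:
            prediction_prob = model(x)
            loss = loss_fnc(prediction_prob, target)
            if is_training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            loss_agg += loss.item()
            # The class dimension is dim=1, so argmax gives one predicted label per target element.
            prediction = torch.argmax(prediction_prob, dim=1)
            true_predictions += (prediction == target).sum().item()
            # Count every compared element (every pixel for segmentation), not only the batch size,
            # so that the accuracy stays within [0, 1].
            predictions += target.numel()
            samples += target.shape[0]

    accuracy_average = true_predictions / predictions
    # Loss per sample. If loss_fnc already averages over the batch, divide by len(dataloader) instead.
    loss_average = loss_agg / samples
    return loss_average, accuracy_average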
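
How the accumulated loss should be normalised depends on what `loss_fnc` returns, which this notebook does not show. The following is a small worked sketch, assuming a loss that returns the mean over each batch (the default reduction of most PyTorch losses). `epoch_average` is a hypothetical helper and the numbers are made up for illustration. It shows that per-batch means must be weighted by batch size to get the exact per-sample average when the last batch is smaller.

```python
def epoch_average(batch_means, batch_sizes):
    """Exact per-sample average loss over an epoch, given per-batch mean losses."""
    if len(batch_means) != len(batch_sizes):
        raise ValueError("batch_means and batch_sizes must have the same length")
    # Multiplying a batch mean by its batch size recovers the summed loss of that batch.
    total_loss = sum(mean * size for mean, size in zip(batch_means, batch_sizes))
    return total_loss / sum(batch_sizes)


# Illustrative values: two full batches of 4 samples and a final batch of 2.
batch_means = [0.5, 0.25, 1.0]
batch_sizes = [4, 4, 2]

weighted = epoch_average(batch_means, batch_sizes)   # (2.0 + 1.0 + 2.0) / 10
unweighted = sum(batch_means) / len(batch_means)     # 1.75 / 3, over-weights the small batch
```

When all batches have the same size, the two averages coincide, and dividing the summed batch means by `len(dataloader)` is enough.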




In [6]:
losses, accuracies = optimize_model(model, dataloaders, optimizer, loss_fnc, **configuration)



In [None]:
plot_loss_and_accuracy(losses, accuracies)
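
To compare the current run with previous runs, the `losses` and `accuracies` dictionaries need to be stored somewhere. The sketch below is a deliberately simple stand-in for a reporting framework such as mlflow or tensorboard, using only the standard library. `save_run` and `best_val_loss_per_run` are hypothetical helpers, the hyperparameters are assumed to be a plain JSON-serialisable dictionary, and the numbers in the demo are made up.

```python
import json
import tempfile
from pathlib import Path


def save_run(run_dir, hyperparameters, losses, accuracies):
    """Write one training run to a new JSON file in run_dir and return its path."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    # Number runs consecutively: run_0000.json, run_0001.json, ...
    run_index = len(list(run_dir.glob("run_*.json")))
    path = run_dir / f"run_{run_index:04d}.json"
    record = {"hyperparameters": hyperparameters, "losses": losses, "accuracies": accuracies}
    path.write_text(json.dumps(record, indent=2))
    return path


def best_val_loss_per_run(run_dir):
    """Return {file name: lowest validation loss} for every stored run."""
    summary = {}
    for path in sorted(Path(run_dir).glob("run_*.json")):
        record = json.loads(path.read_text())
        summary[path.name] = min(record["losses"]["val"])
    return summary


# Demo with two illustrative runs stored in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    save_run(tmp, {"lr": 1e-3}, {"train": [0.9, 0.7], "val": [0.8, 0.6]},
             {"train": [0.50, 0.60], "val": [0.55, 0.65]})
    save_run(tmp, {"lr": 1e-4}, {"train": [0.9, 0.6], "val": [0.7, 0.5]},
             {"train": [0.50, 0.70], "val": [0.60, 0.75]})
    summary = best_val_loss_per_run(tmp)
```

A dedicated reporting framework adds what this sketch lacks: remote access, plots, and a shared place for the whole team.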


# Questions/exercises:

1. As mentioned, the functions in optimizer.py contain a lot of reporting code that is not essential for training a model.
    Try to make a copy of the two functions in optimizer.py, remove all the reporting code, and make the functions as simple as possible.
    Can you train your model with these new functions?

2. At each epoch of the optimization, the model goes through the training dataset followed by the validation dataset.
    As previously stated, the purpose of the validation dataset is to help find the best model. How should the validation dataset help with this, and how would you incorporate that into your optimization function?

3. How does the validation dataset differ from the test dataset?

4. Why is the validation loss lower than the training loss?