(using_sweeps)=
# Hyperparameter sweeps with Weights & Biases

Many parameters can affect whether a neural network can learn a task and whether its predictions generalize well to an entirely new test dataset. Some of these factors, such as the problem at hand and the data, are given and cannot be changed. Others are settings you choose yourself: for example, some learning rates work better than others.

In machine learning, we therefore split the data into training, validation, and test sets and run hyperparameter sweeps to find the best hyperparameters, such as the learning rate, the batch size, the number of layers, and so on.
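
As a minimal sketch of such a split, we can shuffle the sample indices once and cut them into three disjoint parts. The helper `split_indices`, the 1,000 samples, and the 70/15/15 ratio are all made up for illustration; they are not part of delphi and are not used elsewhere in this notebook.

```python
import random


def split_indices(n_samples, val_pct=15, test_pct=15, seed=42):
    """Shuffle the sample indices and cut them into train/validation/test parts."""
    indices = list(range(n_samples))
    # a local random generator keeps the global random state untouched
    random.Random(seed).shuffle(indices)

    # integer arithmetic avoids floating-point rounding surprises
    n_test = n_samples * test_pct // 100
    n_val = n_samples * val_pct // 100

    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 700 150 150
```

The important property is that the three parts never overlap: the test indices stay untouched until the very end, while the validation part is what we would compare hyperparameter settings on.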

This multifactorial search can be tedious, and it is easy to get lost. The Weights & Biases library helps you keep track of it.

Remember, our goal is to train a model that a) can learn the task from the training data and that b) generalizes well, i.e. performs well on new data that have not been used during training.


In [None]:
!git clone https://github.com/PhilippS893/delphi
import sys
sys.path.insert(1, './delphi')

!pip install torchinfo 
!pip install pyyaml
!pip install -U PyYAML
# install Weights & Biases (wandb), a library that helps us track the results of
# parameter sweeps
!pip install wandb -qU

In [1]:
import os
import torch
import numpy as np
from delphi.utils.train_fns import standard_train
from delphi.networks.LinearNets import SimpleLinearModel
from torch.utils.data import DataLoader, Dataset
from torchvision.transforms import ToTensor
from torchvision.datasets import MNIST

# NEW IMPORTANT IMPORTS
from delphi.utils.tools import compute_accuracy, convert_wandb_config, read_config
import wandb

# this variable contains information whether a GPU can be used for training. If not, we automatically use the CPU.
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [None]:
# Log in to your W&B account
import wandb
wandb.login()

Setting all random seeds for reproducibility.

In [2]:
# set the random seed for reproducibility
def set_random_seed(seed):
    import random 
    
    torch.manual_seed(seed)
    random.seed(seed)
    np.random.seed(seed)
    g = torch.Generator() # can be used in pytorch dataloaders for reproducible sample selection when shuffle=True
    g.manual_seed(seed)
    
    return g

g = set_random_seed(42)

Let us already load the MNIST data such that we do not forget it before running the parameter sweep :).

In [3]:
# get the MNIST dataset
mnist_train = MNIST('./data/', train=True, download=not os.path.exists('./data/MNIST'), transform=ToTensor())
mnist_test = MNIST('./data/', train=False, download=False, transform=ToTensor())

# create the dataloaders
dl_train = DataLoader(mnist_train, batch_size=256, shuffle=True, generator=g)
dl_test = DataLoader(mnist_test, batch_size=256, shuffle=True, generator=g)

## What are hyperparameters?

A __hyperparameter__ is a variable or setting that controls the learning process. Hyperparameters are "known" before the learning process begins and are not changed during training. They should never be confused with the __model parameters__ (e.g., weights and biases). 

Typical hyperparameters are the following:
* the learning rate
* the number of hidden layers 
* the number of neurons per hidden layer
* the kernel size per convolutional layer
* the number of channels per convolutional layer
* the cost function
* the optimization algorithm
* the activation function
* the number of epochs we use for training
* the batch size
* even the ratio of train/validation splits
* etc.

I am sure you get the idea: anything that changes the way a network learns is considered a hyperparameter. The search space of possible settings grows multiplicatively with every hyperparameter we add, so it quickly explodes. Finding good settings by hand is therefore pretty much impossible, and we need some help in determining the best parameters in our search space.
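
To get a feeling for how fast the search space grows, here is a small sketch that counts the combinations of the discrete values we sweep over later in this notebook. The learning rate is left out because it is drawn from a continuous range, which makes the real search space even larger.

```python
import math

# discrete candidate values, taken from the sweep configuration used in this notebook
search_space = {
    "lin_neurons1": [512, 256, 128, 64, 32, 16, 8],
    "lin_neurons2": [512, 256, 128, 64, 32, 16, 8],
    "lin_neurons3": [512, 256, 128, 64, 32, 16, 8],
    "epochs": [5, 10, 20, 30],
}

# a full grid contains one run per element of the Cartesian product
n_combinations = math.prod(len(values) for values in search_space.values())
print(n_combinations)  # 7 * 7 * 7 * 4 = 1372
```

Adding just ten candidate learning rates would multiply this number by ten again, which is why trying every combination by hand is not realistic.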

This is where the [weights&biases](https://www.wandb.ai) (wandb) package comes into play. 

## Using weights&biases (wandb)

What you will see in this Jupyter Book is quite condensed, and in certain cases you may need additional information that we do not provide here yet. In that case, check out the official documentation of hyperparameter sweeps with wandb [here](https://docs.wandb.ai/guides/sweeps). 

How we use wandb (adapted from [docs](https://docs.wandb.ai/guides/sweeps)):
1. Write the config: Define the variables and ranges to sweep over and choose the search strategy. Wandb offers a few options:
    * grid: run all possible combinations
    * random: randomly sample parameter combinations; the number of runs is set by the user
    * bayes: Bayesian search, which uses the results of previous runs to choose the next combination
2. Initialize the sweep: Wandb hosts a central controller that coordinates the agent(s) executing the sweep. The agents can run locally or distributed across machines.
3. Launch the agent(s): Each agent asks the sweep controller which hyperparameter combination to try next and then executes that run. To use multiple computers, we start an agent with the same command on each of them.
4. Visualize the results: we can do this locally on our computers or on the wandb platform.
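
To make the random strategy more concrete, here is a plain-Python sketch of what drawing parameter combinations means conceptually. It does not use wandb and is not how wandb implements its controller: `sample_config` is a hypothetical helper, the parameter space is shortened for readability, and drawing uniformly from a `min`/`max` range is an assumption of this sketch.

```python
import random


def sample_config(parameters, rng):
    """Draw one hyperparameter combination from a sweep-style parameter description."""
    config = {}
    for name, spec in parameters.items():
        if "values" in spec:
            # discrete parameter: pick one of the listed values
            config[name] = rng.choice(spec["values"])
        else:
            # continuous parameter: draw uniformly between min and max (assumption)
            config[name] = rng.uniform(spec["min"], spec["max"])
    return config


parameters = {
    "lin_neurons1": {"values": [512, 256, 128]},
    "learning_rate": {"min": 0.0001, "max": 0.1},
    "epochs": {"values": [5, 10]},
}

rng = random.Random(0)
configs = [sample_config(parameters, rng) for _ in range(3)]  # three "runs"
for config in configs:
    print(config)
```

In a real sweep, the controller plays the role of `sample_config`, and each agent receives one such combination per run.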

Let's get into it then.

First, we need to set ourselves a goal. Let's say we want to use a ```SimpleLinearModel``` to classify handwritten digits. We saw that the default settings of the ```SimpleLinearModel``` worked quite well, but we cannot be sure that those parameters yield the best results. We therefore run a parameter search over the number of neurons per layer, the learning rate, and the number of epochs. 

There are two ways to define the sweep configuration:
1. Defining a python ```dict```
2. Loading a .yaml file

In the cells below you see how both ways work.

### Setting up a sweep config with a python dict

In [4]:
sweep_config = {
    "name": "linear-mnist-sweep",
    "method": "random",
    "metric": {
        "name": "test_acc"
    },
    "parameters": {
        "lin_neurons1": {
            "values": [512, 256, 128, 64, 32, 16, 8] # the possible values for the first linear layer
        },
        "lin_neurons2": {
            "values": [512, 256, 128, 64, 32, 16, 8] # the possible values for the second linear layer
        },
        "lin_neurons3": {
            "values": [512, 256, 128, 64, 32, 16, 8] # the possible values for the third linear layer
        },
        "learning_rate": { # describes the range of possible values for the learning rate
            "min": .0001,
            "max": .1
        },
        "epochs": {
            "values": [5, 10, 20, 30] # the possible values for how many epochs to train the network
        }
    }
}

### Setting up a sweep config using a yaml file

In [5]:
sweep_config = read_config("delphi/mnist_sweep_config.yaml")
print(sweep_config)

{'name': 'linear-mnist-sweep', 'method': 'random', 'metric': {'name': 'test_acc'}, 'parameters': {'lin_neurons1': {'values': [512, 256, 128, 64, 32, 16, 8]}, 'lin_neurons2': {'values': [512, 256, 128, 64, 32, 16, 8]}, 'lin_neurons3': {'values': [512, 256, 128, 64, 32, 16, 8]}, 'learning_rate': {'min': 0.0001, 'max': 0.1}, 'epochs': {'values': [5, 10, 20, 30]}}}
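
The content of `mnist_sweep_config.yaml` is not shown in this notebook. As a sketch, a YAML text that produces the printed ```dict``` could look like the string below. It is reconstructed from the output above, not copied from the delphi repository, and it is parsed with PyYAML's `yaml.safe_load` directly, because how `read_config` works internally is not covered here.

```python
import yaml  # provided by the PyYAML package installed at the top of this notebook

# hypothetical file content, reconstructed from the printed sweep_config
yaml_text = """
name: linear-mnist-sweep
method: random
metric:
  name: test_acc
parameters:
  lin_neurons1:
    values: [512, 256, 128, 64, 32, 16, 8]
  lin_neurons2:
    values: [512, 256, 128, 64, 32, 16, 8]
  lin_neurons3:
    values: [512, 256, 128, 64, 32, 16, 8]
  learning_rate:
    min: 0.0001
    max: 0.1
  epochs:
    values: [5, 10, 20, 30]
"""

# safe_load turns the YAML text into nested Python dicts and lists
config = yaml.safe_load(yaml_text)
print(config["parameters"]["learning_rate"])
```

A YAML file has the advantage that the sweep definition lives outside the code and can be versioned or swapped without touching the notebook.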


You probably noticed that this ```dict``` (or config) is not in the same format as the one we need to configure our neural networks. 
Unfortunately, [wandb](https://www.wandb.ai) does not yet support nested values in hyperparameter searches (at least not to my knowledge). But do not be alarmed: I took care of this issue for now by writing a converter function called ```convert_wandb_config```. You can find it in the ```delphi.utils.tools``` module, from which we imported it above. 

You will see this function in action in the sections below.
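
To give a rough idea of what such a conversion involves, here is a sketch that groups numbered keys like `lin_neurons1`, `lin_neurons2`, and `lin_neurons3` into a single list. The function `group_numbered_keys` is hypothetical and is not the delphi implementation of ```convert_wandb_config```, which may work differently and takes the required parameters of the model as a second argument.

```python
import re
from collections import defaultdict


def group_numbered_keys(flat_config):
    """Collect keys such as 'lin_neurons1', 'lin_neurons2' into one list under 'lin_neurons'."""
    numbered = defaultdict(dict)
    converted = {}
    for key, value in flat_config.items():
        # a key qualifies if it ends in digits that follow a non-digit base name
        match = re.fullmatch(r"(.*\D)(\d+)", key)
        if match:
            base, position = match.group(1), int(match.group(2))
            numbered[base][position] = value
        else:
            converted[key] = value

    # order the grouped values by their numeric suffix
    for base, items in numbered.items():
        converted[base] = [items[position] for position in sorted(items)]
    return converted


flat = {
    "lin_neurons1": 512,
    "lin_neurons2": 8,
    "lin_neurons3": 128,
    "learning_rate": 0.00791742,
    "epochs": 5,
}
converted = group_numbered_keys(flat)
print(converted["lin_neurons"])  # [512, 8, 128]
```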

## Setting up the sweep

We are almost there.

Wandb also requires a function for your agents to call, at least in a Jupyter notebook like this one.

What you will find in the next code sections are two new functions: ```train_net()``` and ```run_train()```

We will now look in detail at what each of them does:

In the new ```train_net()``` function defined below, you should notice that it now works for any network you supply to it. There is also something new in there: the ```wandb.log()``` function, which takes a ```dict``` with the loss and accuracy scores as its argument. This function is part of the weights&biases package; it logs the values we supply to it and creates plots from them in real time. 

In [6]:
def train_net(model, n_epochs, lr, logwandb=True):

    # loop for the above set number of epochs
    for epoch in range(0, n_epochs):

        # THIS IS WHERE THE MAGIC HAPPENS
        # calling the model.fit() function will execute the 'standard_train' function that we imported above.
        train_loss, train_stats = model.fit(dl_train, lr=lr, device=DEVICE)
        train_acc = compute_accuracy(train_stats[:, -2], train_stats[:, -1])

        # for validating or testing we disable gradient tracking; train=False is expected to put the network
        # into evaluation mode such that layers like dropout are not active
        with torch.no_grad():
            test_loss, test_stats = model.fit(dl_test, device=DEVICE, train=False)
            test_acc = compute_accuracy(test_stats[:, -2], test_stats[:, -1])

        print('epoch=%03d, train_loss=%1.3f, train_acc=%1.3f, test_loss=%1.3f, test_acc=%1.3f' % 
             (epoch, train_loss, train_acc, test_loss, test_acc))

        # LOG PARAMETERS WITH WANDB
        # Please keep in mind that the code below might be better placed somewhere else
        # in case you want to use this function without weights and biases or use the
        # logwandb flag like here
        if logwandb:
            wandb.log({
                "train_loss": train_loss,
                "train_acc": train_acc,
                "test_loss": test_loss,
                "test_acc": test_acc,
            })

The ```run_train()``` function defined below might seem a bit redundant. Even though all of its code could be moved into the ```train_net()``` function implemented above, it is best to keep separate pieces of functionality apart as much as possible. The way I programmed it allows me to reuse the ```train_net()``` function in many different settings, whereas the ```run_train()``` function is currently specific to the ```SimpleLinearModel``` class. It is also the function I supply to the sweep agents.

In [7]:
# define the training function with the wandb init
def run_train():
    
    # here we initialize weights&biases. 
    with wandb.init() as run:
        
        #Within this context we have access to the parameters the agent chose.
        #It would look something like this:
        #wandb.config.epochs = 5
        #wandb.config.lin_neurons1 = 512
        #wandb.config.lin_neurons2 = 8
        #wandb.config.lin_neurons3 = 128
        #wandb.config.learning_rate = 0.00791742
        
        # here's the promised conversion of the wandb.config
        # this results into a dict that contains key-value pairs that we can use to configure our network:
        # converted_config['lin_neurons'] = [512, 8, 128]
        converted_config = convert_wandb_config(wandb.config, SimpleLinearModel._REQUIRED_PARAMS)
        
        model = SimpleLinearModel(784, 10, converted_config)
        
        # We do not necessarily need this line but it is nice to update the config.
        wandb.config.update(model.config, allow_val_change=True)
        
        # now train the network, yay!
        train_net(model, wandb.config.epochs, wandb.config.learning_rate)


It is now time to create the sweep and thus the central controller:

In [8]:
# set the wandb sweep config
#os.environ['WANDB_MODE'] = 'offline'
os.environ['WANDB_ENTITY'] = "philis893" # this is my wandb account name. This can also be a group name, for example
os.environ['WANDB_PROJECT'] = "test-jupytersweep" # this is simply the project name where we want to store the sweep logs and plots
sweep_id = wandb.sweep(sweep_config)

Already ran. Skipping to save time.


In [9]:
count = 20
wandb.agent(sweep_id, function=run_train, count=count)

Already ran.


Check out the results: https://wandb.ai/philis893/test-jupytersweep/sweeps/ybwt9udl?workspace=user-philis893

## Exercises

Try running a hyperparameter sweep over different hyperparameters for the ```Simple2dCnnClassifier```.