<a href="https://colab.research.google.com/github/JayThibs/effective-ml-mini-projects/blob/main/hyperparameter_sweeps_using_wandb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to do a Hyperparameter Sweep in Weights and Biases

A hyperparameter sweep is how we tune the hyperparameters of a machine learning model: we train the model many times with different settings and compare the results. Weights and Biases makes sweeps fairly easy to run and produces some beautiful graphs that show how well our model did with each set of hyperparameters.

# An Overview of Sweeps

There are 3 steps to running a sweep with Weights and Biases:

1. **Define the sweep:** create a dictionary or YAML file that specifies the parameters to search through, the search strategy, the optimization strategy, etc.

2. **Initialize the sweep:** initialize the sweep and pass the dictionary of sweep configurations with: `sweep_id = wandb.sweep(sweep_config)`.

3. **Run the sweep agent:** call `wandb.agent(sweep_id, function=train)`, where `function` defines the model architecture and trains it.

If you decide to create a YAML file for your sweep, it should look something like this for a **deep learning** model:

    # sweep.yaml
    program: train.py
    method: random
    metric:
     name: val_loss
     goal: minimize
    parameters:
     learning-rate:
       min: 0.00001
       max: 0.1
     optimizer:
       values: ["adam", "sgd"]
     hidden_layer_size:
       values: [96, 128, 148]
     epochs:
       value: 27
    early_terminate:
       type: hyperband
       s: 2
       eta: 3
       max_iter: 27

And if you use a Python dictionary, it should look like this for an XGBoost model:

    sweep_config = {
        "method": "random", # try grid or random
        "metric": {
          "name": "accuracy",
          "goal": "maximize"   
        },
        "parameters": {
            "booster": {
                "values": ["gbtree","gblinear"]
            },
            "max_depth": {
                "values": [3, 6, 9, 12]
            },
            "learning_rate": {
                "values": [0.1, 0.05, 0.2]
            },
            "subsample": {
                "values": [1, 0.5, 0.3]
            }
        }
    }
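To get a feel for the difference between `grid` and `random`, it helps to count what a grid search over the configuration above would have to train. The sketch below uses only the standard library; `parameters` copies the values from the dictionary above, and `grid_points` is a name introduced here for illustration (W&B does this enumeration for you when `method` is `grid`).

```python
import itertools

# Same parameter values as the XGBoost sweep_config above.
parameters = {
    "booster": ["gbtree", "gblinear"],
    "max_depth": [3, 6, 9, 12],
    "learning_rate": [0.1, 0.05, 0.2],
    "subsample": [1, 0.5, 0.3],
}

# Every combination a grid search would visit (illustration only).
names = list(parameters)
grid_points = [
    dict(zip(names, combo))
    for combo in itertools.product(*parameters.values())
]

print(len(grid_points))  # 2 * 4 * 3 * 3 = 72
print(grid_points[0])
```

Each extra hyperparameter multiplies the grid size, which is why random search is usually the more practical choice once the search space grows.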

If you want to run a sweep from the command-line, you can run the following commands:

1. **Set up a new sweep:** `wandb sweep sweep.yaml` creates your sweep and returns both a unique identifier (SWEEP_ID) and a URL to track all your runs.

2. **Launch the sweep:** `wandb agent SWEEP_ID` starts the hyperparameter sweep and returns the URL where you can track the sweep's progress. You can also launch multiple agents (on separate GPUs / CPUs) concurrently. Each of these agents will fetch parameters from the W&B server and use them to train the next model.
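When launched this way, the agent runs the `program` named in `sweep.yaml` and, by default, passes each chosen hyperparameter as a `--name=value` command-line argument. Here is a minimal sketch of how a `train.py` might read them, assuming the parameter names from the YAML above. `parse_sweep_args` is a hypothetical helper and the default values are placeholders.

```python
import argparse


def parse_sweep_args(argv=None):
    """Parse hyperparameters passed as --name=value flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Illustrative train.py entry point")
    # Names mirror the keys under `parameters` in sweep.yaml above.
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--optimizer", choices=["adam", "sgd"], default="adam")
    parser.add_argument("--hidden_layer_size", type=int, default=128)
    parser.add_argument("--epochs", type=int, default=27)
    return parser.parse_args(argv)


# Simulate the arguments an agent might pass for one run.
args = parse_sweep_args(
    ["--learning-rate=0.01", "--optimizer=sgd", "--hidden_layer_size=96", "--epochs=27"]
)
# argparse exposes "--learning-rate" as the attribute `learning_rate`.
print(args.learning_rate, args.optimizer, args.hidden_layer_size, args.epochs)
```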

Documentation on sweeps can be found here: https://docs.wandb.ai/guides/sweeps/quickstart

**Tips:** 

1. You are probably going to end up doing more than one hyperparameter sweep, so start out broad and then narrow each following sweep to the region of the hyperparameter space with the best performance.

2. Try to use log-distributed sweeps (especially for batch size, learning rate, and hidden layer size). Instead of sampling uniformly between 1 and 1000, which almost always lands in the hundreds, sample so that each order of magnitude (1 to 10, 10 to 100, 100 to 1000) is equally likely. For example, `q_log_uniform` draws from every order of magnitude with equal probability.
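The effect of the second tip is easy to see with a few lines of plain Python. This is a rough sketch using only the standard library, not W&B's sampler; `sample_uniform`, `sample_log_uniform`, and `fraction_below` are names made up for this illustration.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def sample_uniform(low, high):
    """Draw uniformly between low and high."""
    return rng.uniform(low, high)


def sample_log_uniform(low, high):
    """Draw so that log(x) is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def fraction_below(samples, threshold):
    """Share of samples that fall below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)


uniform_draws = [sample_uniform(1, 1000) for _ in range(10_000)]
log_draws = [sample_log_uniform(1, 1000) for _ in range(10_000)]

# Share of draws that land in the lowest order of magnitude, [1, 10).
print(f"uniform:     {fraction_below(uniform_draws, 10):.3f}")
print(f"log-uniform: {fraction_below(log_draws, 10):.3f}")
```

With uniform sampling only about 1% of the draws fall between 1 and 10, while log-uniform sampling puts roughly a third of them there.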

Now, let's get started with code!

## Setup





In [1]:
!pip install wandb --upgrade

import wandb

wandb.login()


# Step 1. Define the Sweep

Fundamentally, a Sweep combines a strategy for trying out a bunch of hyperparameter values with the code that evaluates them. We configure it with the hyperparameters to search and a method such as Bayesian optimization (`bayes`) or `random` search.

Usually we choose either random search or Bayesian search. Bayesian search is effective, but it scales poorly as the number of hyperparameters grows, so it does not do so well when you have a ton of them.
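Random search treats every run independently: each hyperparameter is drawn from its own specification with no memory of earlier results. The sketch below mimics that idea for parameter specs shaped like the sweep configurations in this notebook. It is an illustration only, not W&B's implementation; `sample_parameter` and `sample_config` are hypothetical helpers.

```python
import random


def sample_parameter(spec, rng):
    """Draw one value from a single parameter spec (illustration only)."""
    if "value" in spec:  # fixed hyperparameter
        return spec["value"]
    if "values" in spec:  # pick uniformly among discrete options
        return rng.choice(spec["values"])
    if spec.get("distribution", "uniform") == "uniform":
        return rng.uniform(spec["min"], spec["max"])
    raise ValueError(f"unsupported spec: {spec}")


def sample_config(parameters, rng):
    """Draw one full set of hyperparameters, one independent draw per name."""
    return {name: sample_parameter(spec, rng) for name, spec in parameters.items()}


parameters = {
    "optimizer": {"values": ["adam", "sgd"]},
    "dropout": {"values": [0.3, 0.4, 0.5]},
    "epochs": {"value": 1},
    "learning_rate": {"distribution": "uniform", "min": 0, "max": 0.1},
}

rng = random.Random(42)
for _ in range(3):
    print(sample_config(parameters, rng))
```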

In [2]:
import math

sweep_config = {
    'method': 'random',
    'metric': {
        'name': 'loss',
        'goal': 'minimize'
    },
    'parameters': {
        'optimizer': {
            'values': ['adam', 'sgd']
        },
        'fc_layer_size': {
            'values': [128, 256, 512]
        },
        'dropout': {
            'values': [0.3, 0.4, 0.5]
        },
        'epochs': {
            'value': 1
        },
        'learning_rate': {
            # a flat distribution between 0 and 0.1
            'distribution': 'uniform',
            'min': 0,
            'max': 0.1
        },
        'batch_size': {
            # integers between 32 and 256
            # with evenly-distributed logarithms
            # Quantized log uniform: returns round(X / q) * q, where X is
            # log-uniform, so every order of magnitude is equally likely.
            # Here min and max are passed as natural logs of the bounds.
            'distribution': 'q_log_uniform',
            'q': 1,
            'min': math.log(32),
            'max': math.log(256),
        }
    }
}

# We can also include hyperband for early stopping
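As a sketch of what that could look like in dictionary form, the block below adds an `early_terminate` entry with the same Hyperband settings as the YAML example earlier in this notebook. `base_config` is a trimmed-down stand-in for `sweep_config` (with 27 epochs, since early stopping has nothing to cut short in a 1-epoch run), and `hyperband_brackets` is a hypothetical helper that reflects our reading of the W&B docs: with `max_iter` and `s` set, runs are compared at `max_iter / eta`, `max_iter / eta**2`, and so on.

```python
import copy

# Trimmed-down stand-in for the sweep_config defined above.
base_config = {
    "method": "random",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {"epochs": {"value": 27}},
}

# Same Hyperband settings as the YAML example earlier in this notebook.
early_terminate = {"type": "hyperband", "s": 2, "eta": 3, "max_iter": 27}

config_with_stopping = copy.deepcopy(base_config)
config_with_stopping["early_terminate"] = early_terminate


def hyperband_brackets(max_iter, eta, s):
    """Iterations at which runs are compared (assumed from the W&B docs)."""
    return sorted(max_iter // eta**k for k in range(1, s + 1))


print(hyperband_brackets(27, 3, 2))  # [3, 9]
```

Under these assumptions, poorly performing runs would be candidates for stopping after 3 and 9 logged iterations instead of running all 27.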

In [3]:
import pprint

pprint.pprint(sweep_config)

{'method': 'random',
 'metric': {'goal': 'minimize', 'name': 'loss'},
 'parameters': {'batch_size': {'distribution': 'q_log_uniform',
                               'max': 5.545177444479562,
                               'min': 3.4657359027997265,
                               'q': 1},
                'dropout': {'values': [0.3, 0.4, 0.5]},
                'epochs': {'value': 1},
                'fc_layer_size': {'values': [128, 256, 512]},
                'learning_rate': {'distribution': 'uniform',
                                  'max': 0.1,
                                  'min': 0},
                'optimizer': {'values': ['adam', 'sgd']}}}


# Step 2. Initialize the Sweep

Weights and Biases has something called the Sweep Controller that handles the Sweep and issues a new set of instructions describing a new run to execute locally on our machines.

In [4]:
sweep_id = wandb.sweep(sweep_config, project='pytorch-sweeps-demo')

Create sweep with ID: 1an35jw5
Sweep URL: https://wandb.ai/jacquesthibs/pytorch-sweeps-demo/sweeps/1an35jw5


# Step 3. Run the Sweep agent

Before we can actually execute the sweep,
we need to define the training procedure that uses those values.

In the functions below, we define a simple fully-connected neural network in PyTorch, and add the following `wandb` tools to log model metrics, visualize performance and output and track our experiments:
* [**`wandb.init()`**](https://docs.wandb.com/library/init) – Initialize a new W&B Run. Each Run is a single execution of the training function.
* [**`wandb.config`**](https://docs.wandb.com/library/config) – Save all your hyperparameters in a configuration object so they can be logged. Read more about how to use `wandb.config` [here](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/wandb-config/Configs_in_W%26B.ipynb).
* [**`wandb.log()`**](https://docs.wandb.com/library/log) – log model behavior to W&B. Here, we just log the performance; see [this Colab](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/wandb-log/Log_(Almost)_Anything_with_W%26B_Media.ipynb) for all the other rich media that can be logged with `wandb.log`.

For more details on instrumenting W&B with PyTorch, see [this Colab](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/pytorch/Simple_PyTorch_Integration.ipynb).

In [6]:
import torch
import torch.optim as optim
import torch.nn.functional as F
import torch.nn as nn
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def train(config=None):
    # Initialize a new wandb run
    with wandb.init(config=config):
        # If called by wandb.agent, as below,
        # this config will be set by Sweep Controller
        config = wandb.config

        loader = build_dataset(config.batch_size)
        network = build_network(config.fc_layer_size, config.dropout)
        optimizer = build_optimizer(network, config.optimizer, config.learning_rate)

        for epoch in range(config.epochs):
            avg_loss = train_epoch(network, loader, optimizer)
            wandb.log({"loss": avg_loss, "epoch": epoch})

This cell defines the four pieces of our training procedure:
`build_dataset`, `build_network`, `build_optimizer`, and `train_epoch`. These are all part of the standard PyTorch pipeline.

In [8]:
def build_dataset(batch_size): # input is in sweep config

    # convert images to tensors, then normalize with the MNIST mean and std
    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.1307,), (0.3081,))]
    )
    # download MNIST training dataset
    dataset = datasets.MNIST('.', train=True, download=True,
                            transform=transform)
    sub_dataset = torch.utils.data.Subset(
        dataset, indices=range(0, len(dataset), 5)
    )
    loader = torch.utils.data.DataLoader(sub_dataset, batch_size=batch_size)

    return loader

def build_network(fc_layer_size, dropout): # inputs are in sweep config
    network = nn.Sequential(
        nn.Flatten(), # flatten image
        nn.Linear(784, fc_layer_size), nn.ReLU(), # fully-connected layer with activation function
        nn.Dropout(dropout),
        nn.Linear(fc_layer_size, 10),
        nn.LogSoftmax(dim=1)
    )

    return network.to(device)

def build_optimizer(network, optimizer, learning_rate):
    # `optimizer` arrives as a string from the sweep config
    if optimizer == "sgd":
        optimizer = optim.SGD(network.parameters(),
                              lr=learning_rate, momentum=0.9)
    elif optimizer == "adam":
        optimizer = optim.Adam(network.parameters(),
                               lr=learning_rate)
    else:
        raise ValueError(f"Unknown optimizer: {optimizer}")

    return optimizer

def train_epoch(network, loader, optimizer):
    cumu_loss = 0
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()

        # Forward pass
        loss = F.nll_loss(network(data), target)
        cumu_loss += loss.item()

        # Backward pass + weight update
        loss.backward()
        optimizer.step()

        # Log the per-batch loss as a number (note the call to .item())
        wandb.log({"batch loss": loss.item()})

    # Average loss over all batches in the epoch
    return cumu_loss / len(loader)