# Wandb tutorial

Yue Chen, Xin Zheng, and Tatsuo Okubo

2024/07/03

This notebook demonstrates the basic features of Wandb sweeps.

## Google Colab preparation

Please run the code below if you are using Google Colab.

There is no need to run the code if you are using a local machine or server.

In [None]:
!git clone https://github.com/CIBR-Okubo-Lab/wandb_tutorial_public.git

In [None]:
!pip install wandb

In [None]:
import sys
sys.path.insert(0,"/content/wandb_tutorial_public")

## Start sweeps tutorial

In [1]:
import datetime
import torch
from utils import create_dataloaders, cnn, train_epoch, eval_epoch
import wandb

## Wandb login
To log in to wandb, visit https://wandb.ai/authorize and copy your API key.

In [10]:
wandb.login(key='PASTE YOUR API KEY HERE')
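If you paste the key into the cell above, it is saved in the notebook file. A safer option is to store the key in the `WANDB_API_KEY` environment variable, which wandb itself checks, and fall back to a hidden prompt. The helper below is a minimal sketch of this approach. `get_wandb_api_key` is a hypothetical name and is not part of wandb.

```python
import os
from getpass import getpass


def get_wandb_api_key(env_var: str = "WANDB_API_KEY") -> str:
    """Return the W&B API key from an environment variable, or prompt for it.

    Hypothetical helper (not part of wandb). getpass hides the typed key,
    so it never appears in the notebook's saved output.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        key = getpass("Paste your W&B API key: ").strip()
    return key
```

You would then call `wandb.login(key=get_wandb_api_key())`.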

## Step 1: Define the search space with a sweep configuration

A sweep runs your training code many times, each time with a different combination of hyperparameters. To run a sweep, first define the search space with a sweep configuration.

A sweep configuration can be defined as a nested dictionary in a Jupyter Notebook or as a [YAML](https://docs.wandb.ai/guides/sweeps/define-sweep-configuration) file for scripts. In this tutorial, we use a nested dictionary.


Main keys in a sweep configuration:
1. `name` (**optional**): the name of the sweep.

2. `method` (**required**): the hyperparameter search strategy: `grid`, `random`, or `bayes`.

3. `metric` (**optional**): the metric to optimize, given as a `name` and a `goal` (`minimize` or `maximize`). It is optional for grid and random search but required for Bayesian search. It appears as the last column of the parallel coordinates chart.

4. `parameters` (**required**): the hyperparameters to tune. Each hyperparameter can be given a single fixed `value`, a list of `values` to choose from, or a `distribution` to sample from.
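The rules in this list can be checked before a sweep is created. The function below is a minimal sketch, assuming only the rules stated above; `check_sweep_config` is a hypothetical helper, not part of the wandb API, and wandb performs its own, more complete validation.

```python
VALID_METHODS = {"grid", "random", "bayes"}


def check_sweep_config(config: dict) -> list:
    """Return a list of problems found in a sweep configuration (empty if none).

    Hypothetical helper: checks only the required keys and the
    'metric is required for Bayesian search' rule.
    """
    problems = []
    method = config.get("method")
    if method not in VALID_METHODS:
        problems.append(f"'method' must be one of {sorted(VALID_METHODS)}, got {method!r}")
    if not config.get("parameters"):
        problems.append("'parameters' is required and must not be empty")
    if method == "bayes" and "name" not in config.get("metric", {}):
        problems.append("Bayesian search requires a 'metric' with a 'name'")
    return problems
```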

### Sample code

In [2]:
# example of a sweep configuration
sweep_config = {
    'name': 'random_exp',
    'method': 'random',
    'metric': {
        'name': 'val/acc',
        'goal': 'maximize'  
    },
    'parameters': {
        'hidden_layer_width': {
            'values': [32, 64, 128, 256],
        },
        'dropout_rate': {
            'values': [0.0, 0.2, 0.4, 0.6, 0.8],
        },
        'epochs': {
            'value': 5
        },
        'lr': {
            'distribution': 'log_uniform_values',
            'min': 1e-6,
            'max': 0.1,
        },
        'batch_size': {
            'values': [128, 256, 512, 1024]
        }
    }
    
}

In [1]:
# print configuration 
import pprint
pprint.pprint(sweep_config)
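To see what each run in a random sweep receives, the sketch below draws one hyperparameter combination from a `parameters` dictionary like the one above. It is an illustration, not wandb's implementation, and `sample_random_config` is a hypothetical name. It handles only the three kinds of entries used in this tutorial: a fixed `value`, a list of `values`, and `log_uniform_values`, which samples so that the logarithm of the value is uniformly distributed between `min` and `max`.

```python
import math
import random


def sample_random_config(parameters: dict, rng: random.Random) -> dict:
    """Draw one hyperparameter combination, roughly as random search would."""
    config = {}
    for name, spec in parameters.items():
        if "value" in spec:
            # Fixed hyperparameter: every run gets the same value.
            config[name] = spec["value"]
        elif "values" in spec:
            # Categorical hyperparameter: pick one entry uniformly.
            config[name] = rng.choice(spec["values"])
        elif spec.get("distribution") == "log_uniform_values":
            # Uniform in log space, so 1e-6..1e-5 is as likely as 1e-2..1e-1.
            log_v = rng.uniform(math.log(spec["min"]), math.log(spec["max"]))
            config[name] = math.exp(log_v)
        else:
            raise ValueError(f"unsupported spec for {name!r}: {spec}")
    return config
```

Sampling the learning rate on a log scale spreads the trials evenly across orders of magnitude; a plain uniform draw between 1e-6 and 0.1 would almost never produce values below 1e-3.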

## Quiz 5: Define a sweep configuration

### 5.1 Define a sweep configuration with `name`, `method`, and `metric`

* **name**: Assign a meaningful name, such as "mnist_random".

* **method**: Use `random` search for this exercise. 

* **metric**: Specify the `metric` with `name` being `"val/acc"` and `goal` being `"maximize"`. 

* Defining `metric` will allow wandb to automatically create the following: 

    * a "metric vs. created time" scatter plot

    * a hyperparameter importance plot

    * the last column of the parallel coordinates chart

In [None]:
# Define a sweep configuration
sweep_config = {

}

### 5.2 Add the learning rate to the hyperparameter search space

* Given the existing hyperparameter configuration, update it to include the learning rate. The key for learning rate is `"lr"`. 
    * `"distribution"`: "log_uniform_values"

    * `"min"`: 1e-6
    
    * `"max"`: 0.1 

* Store this hyperparameter search space in the sweep configuration under the `"parameters"` key of `sweep_config`.

You can print out `sweep_config` to verify it matches your expectations. 

In [6]:
# parameter configuration
parameters_dict = {
    'hidden_layer_width': {
        'values': [32, 64, 128, 256],
    },
    'dropout_rate': {
        'values': [0.0, 0.2, 0.4, 0.6, 0.8],
    },
    'epochs': {
        'value': 5
    },
    'batch_size': {
        'values': [128, 256, 512, 1024]
    }
}

In [5]:
''' 
Modify the code below. 
'''
# 1). update with learning rate
parameters_dict['lr'] = {

}

# 2). update sweep config




## Step 2: Initialize the sweep

To initialize a sweep, call `wandb.sweep()` with the sweep configuration, the project name, and, if applicable, the entity (your W&B username or team).

`wandb.sweep()` returns a `sweep_id`, which you use to start or resume the sweep.
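Because the sweep ID is what lets you resume a sweep, it can help to save it so that a new session (for example, after a Colab runtime restart) does not create a second sweep. The helpers below are a minimal sketch; `save_sweep_id`, `load_sweep_id`, and the file name `sweep_id.txt` are hypothetical choices, not part of wandb.

```python
from pathlib import Path
from typing import Optional


def save_sweep_id(sweep_id: str, path: str = "sweep_id.txt") -> None:
    """Write the sweep ID to a text file so a later session can reuse it."""
    Path(path).write_text(sweep_id + "\n")


def load_sweep_id(path: str = "sweep_id.txt") -> Optional[str]:
    """Return the saved sweep ID, or None if no file exists yet."""
    p = Path(path)
    return p.read_text().strip() if p.exists() else None
```

In a new session, call `load_sweep_id()` first. Only if it returns `None` would you call `wandb.sweep()` and save the result. Otherwise, pass the loaded ID directly to `wandb.agent()`.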

### Sample code

In [2]:
sweep_id = wandb.sweep(sweep=sweep_config, project="wandb-demo")

## Quiz 6: Initialize a sweep

* Initialize a sweep using the configuration you just defined. Specify the project name. 

* Assign the returned sweep id to a variable named `sweep_id`. 

In [3]:
## initialize a sweep here


## Step 3: Define the training function

In your training code, initialize a W&B run with `wandb.init()`.

The sweep controller supplies each run's configuration, so you do not need to pass the configuration, project name, or entity to `wandb.init()`.

Read the hyperparameter values from `wandb.config` and use them in the training code.
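Every key that the training code reads from `wandb.config` must be defined under `parameters` in the sweep configuration. Otherwise the run fails when it looks the key up (for example, a training function that reads an `optimizer` key that the sweep never defines). The check below is a minimal sketch. `missing_hyperparameters` is a hypothetical helper, and the key list assumes that `create_dataloaders` and `cnn` in `utils` read `batch_size`, `hidden_layer_width`, and `dropout_rate`. Adjust it to match what your code actually reads.

```python
def missing_hyperparameters(used_keys, sweep_config: dict) -> list:
    """Return the keys the training code reads but the sweep never defines."""
    defined = set(sweep_config.get("parameters", {}))
    return sorted(set(used_keys) - defined)


# Keys read by train(); the utils helpers are ASSUMED to read the last three.
TRAIN_KEYS = ["lr", "epochs", "batch_size", "hidden_layer_width", "dropout_rate"]
```

You could run `missing_hyperparameters(TRAIN_KEYS, sweep_config)` before starting the agent and expect an empty list.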

### Sample code

In [4]:
def train(config=None):
    
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d-%H:%M:%S')

    # config for specific runs will be handled by Wandb Controller
    wandb.init(name=nowtime)
    config = wandb.config

    train_loader, test_loader = create_dataloaders(config)
    model = cnn(config) 
    optimizer = torch.optim.Adam(model.parameters(),
                                lr=config['lr'])

    for epoch in range(1, config['epochs']+1):
        model, train_loss = train_epoch(model, train_loader, optimizer)
        val_acc, val_loss = eval_epoch(model, test_loader)
        print(f"epoch {epoch}: train_loss={train_loss:.2f}, val_acc= {100 * val_acc:.2f}%")   
        wandb.log({"train/loss": train_loss, "val/loss": val_loss, "val/acc": val_acc})

    wandb.finish() # Notify wandb that your run has ended and upload all log data to wandb

    return model

## Quiz 7: Modify the training code to initialize wandb runs

1. Initialize a W&B run using `wandb.init()` and set the run name to the `nowtime` variable.

2. Retrieve the configuration using `wandb.config` and assign it to a variable named `config`.

In [3]:
def train(config=None):
    
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d-%H:%M:%S')

    # config for specific runs will be handled by Wandb Controller
    '''
    Initialize wandb run here.
    '''


    train_loader, test_loader = create_dataloaders(config)
    model = cnn(config) 
    # Adam, as in the sample code; the sweep does not define an 'optimizer' parameter
    optimizer = torch.optim.Adam(model.parameters(), lr=config['lr'])

    for epoch in range(1, config['epochs']+1):
        model, train_loss = train_epoch(model, train_loader, optimizer)
        val_acc, val_loss = eval_epoch(model, test_loader)
        print(f"epoch {epoch}: train_loss={train_loss:.2f}, val_acc= {100 * val_acc:.2f}%")   
        wandb.log({"train/loss": train_loss, "val/loss": val_loss, "val/acc": val_acc})

    wandb.finish() # Notify wandb that your run has ended and upload all log data to wandb

    return model

## Step 4: Use an agent to start the sweep

Use `wandb.agent()` to start the sweep. Specify the following:

* `sweep_id`: the ID returned by `wandb.sweep()`.

* `function`: the function the agent calls for each run (here, `train`).

* `count`: the number of runs the agent executes, that is, how many times it calls the function.
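The agent repeatedly asks the sweep controller for a new configuration and calls `function` once for each run, stopping after `count` runs. The toy loop below is a local stand-in that makes this behavior concrete without contacting W&B. It is not how wandb implements the agent, and `toy_agent` is a hypothetical name. One difference: the real agent calls `function` with no arguments, and the function reads its hyperparameters from `wandb.config`. This toy passes the sampled dictionary in directly.

```python
import random


def toy_agent(parameters: dict, function, count: int, seed: int = 0) -> list:
    """Call `function` `count` times, each time with a freshly sampled config.

    Local illustration of random search; handles only 'value' and 'values'
    entries and never talks to W&B.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(count):
        config = {
            name: spec["value"] if "value" in spec else rng.choice(spec["values"])
            for name, spec in parameters.items()
        }
        results.append(function(config))
    return results
```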

### Sample code

In [5]:
# Uncomment the line below to start the sweep.
# wandb.agent(sweep_id=sweep_id, function=train, count=10)

## Quiz 8: Run training code with agent

* Use `wandb.agent()` to start runs in the sweep.

* Specify `sweep_id`, `function`, and `count`.

* If you have access to a GPU, set `count` to 20. If you are using a CPU, set `count` to 5. If you are using Google Colab, set `count` to 10.
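To choose `count` in code, you can detect the environment. The sketch below assumes that Colab takes precedence, so a Colab GPU runtime still gets 10 runs. It also relies on two detection heuristics: `google.colab` already appears in `sys.modules` inside a Colab kernel, and `torch.cuda.is_available()` reports a usable GPU. Both `choose_count` and `detect_environment` are hypothetical helpers.

```python
import sys


def choose_count(on_colab: bool, has_gpu: bool) -> int:
    """Number of runs suggested in this quiz for the current hardware."""
    if on_colab:  # ASSUMPTION: Colab wins even when a GPU runtime is attached
        return 10
    return 20 if has_gpu else 5


def detect_environment() -> tuple:
    """Best-effort detection of Google Colab and a CUDA GPU."""
    on_colab = "google.colab" in sys.modules
    try:
        import torch
        has_gpu = torch.cuda.is_available()
    except ImportError:
        has_gpu = False
    return on_colab, has_gpu
```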

In [None]:
# start the sweep using agent


## Quiz 9 (Optional): Run sweep with bayes search

In [2]:
sweep_config = {
    'name': 'mnist_bayes',
    'method': 'bayes',
    'metric': {
        'name': 'val/acc',
        'goal': 'maximize'  
    },
    'parameters': {
        'hidden_layer_width': {
            'values': [32, 64, 128, 256],
        },
        'dropout_rate': {
            'values': [0.0, 0.2, 0.4, 0.6, 0.8],
        },
        'epochs': {
            'value': 5
        },
        'lr': {
            'distribution': 'log_uniform_values',
            'min': 1e-6,
            'max': 0.1,
        },
        'batch_size': {
            'values': [128, 256, 512, 1024]
        }
    }
    
}
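Apart from `name` and `method`, this Bayesian configuration is identical to the random-search sample in Step 1. You could derive one configuration from the other instead of retyping it. The sketch below uses `copy.deepcopy` so that changing the nested `parameters` of one configuration does not silently change the other. `with_method` is a hypothetical helper.

```python
import copy


def with_method(base_config: dict, method: str, name: str) -> dict:
    """Return a deep copy of a sweep config with a different method and name."""
    new_config = copy.deepcopy(base_config)
    new_config["method"] = method
    new_config["name"] = name
    return new_config
```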

In [6]:
sweep_id = wandb.sweep(sweep=sweep_config, project="wandb-demo")

In [7]:
def train(config=None):
    
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d-%H:%M:%S')

    # The config for each run is supplied by the wandb sweep controller.
    # Answer to Quiz 7:
    wandb.init(name=nowtime)
    config = wandb.config

    train_loader, test_loader = create_dataloaders(config)
    model = cnn(config) 
    optimizer = torch.optim.Adam(model.parameters(),
                                lr=config['lr'])

    for epoch in range(1, config['epochs']+1):
        model, train_loss = train_epoch(model, train_loader, optimizer)
        val_acc, val_loss = eval_epoch(model, test_loader)
        print(f"epoch {epoch}: train_loss={train_loss:.2f}, val_acc= {100 * val_acc:.2f}%")   
        wandb.log({"train/loss": train_loss, "val/loss": val_loss, "val/acc": val_acc})

    wandb.finish() # Notify wandb that your run has ended and upload all log data to wandb

    return model

In [8]:
wandb.agent(sweep_id, train, count=10)