# Wandb tutorial (with answers)

Yue Chen, Xin Zheng, and Tatsuo Okubo

2024/06/26

This notebook demonstrates the basic functions of `wandb`.

## Step 0: Preparation (terminal)

We need to install `wandb`, register an account and log in.

### Install

Install `wandb` using `pip`.

In [None]:
!pip install wandb --quiet

### Log in to your wandb account

- Visit the [wandb website](https://wandb.ai/) to sign up for an account
- Get the [API key](https://wandb.ai/authorize) and copy it to the clipboard
- Log in (paste the API key when prompted)

In [None]:
!wandb login

To switch to a different account, force a re-login:

In [None]:
# !wandb login --relogin

## Step 1: Initialize wandb and record configurations

### Experiment example

Below is example PyTorch training code for the MNIST dataset, without `wandb`.

In [None]:
import datetime
import torch
from utils import create_dataloaders, cnn, train_epoch, eval_epoch, show_cases  # utility functions for the tutorial

# Settings
config = {'batch_size': 512,
          'hidden_layer_width': 64, 
          'dropout_rate': 0.2,
          'lr': 1e-4,
          'optimizer': 'Adam',
          'epochs': 20}

def train(config):
    train_loader, test_loader = create_dataloaders(config)
    model = cnn(config) 
    optimizer = torch.optim.__dict__[config['optimizer']](params=model.parameters(), lr=config['lr'])
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    for epoch in range(1, config['epochs']+1):
        model, train_loss = train_epoch(model, train_loader, optimizer)
        val_acc, val_loss = eval_epoch(model, test_loader)
        print(f"epoch {epoch}: train_loss={train_loss:.2f}, val_loss={val_loss:.2f}, val_acc= {100 * val_acc:.2f}%")
    return model

### Initialize run

We use the [`wandb.init()`](https://docs.wandb.ai/ref/python/init) method to start a new run that tracks and logs to W&B.

During this step, we can specify the project, group, run name, etc., and we can also save the configuration.

We can also control wandb with [environment variables](https://docs.wandb.ai/guides/track/environment-variables).
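
As a minimal sketch of the environment-variable route, the snippet below builds a small set of variables without applying them. `WANDB_PROJECT`, `WANDB_MODE`, and `WANDB_SILENT` are documented W&B variables; the helper name `wandb_env` and the chosen values are assumptions made for illustration.

```python
import os

VALID_MODES = {"online", "offline", "disabled"}

def wandb_env(project, mode="online", silent=False):
    """Build W&B environment variables as a plain dict (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    env = {
        "WANDB_PROJECT": project,  # default project for new runs
        "WANDB_MODE": mode,        # "offline" logs locally, "disabled" turns logging off
    }
    if silent:
        env["WANDB_SILENT"] = "true"  # suppress wandb console output
    return env

env = wandb_env("wandb_demo")
# os.environ.update(env)  # uncomment to apply; do this before calling wandb.init()
print(env)
```

The update is left commented out so that running the snippet does not change how the rest of this notebook logs.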

In [None]:
# import os
# os.environ["WANDB_ENTITY"] = 'okubo-lab-org'
import wandb

Call `wandb.init()` at the beginning of an experiment, before the training loop.

### Quiz 1

Please modify the code to specify the project name and run name as follows:
- project name: "wandb_demo"
- run name: current time (`nowtime`)

Please also record the experiment hyperparameters stored in the Python dictionary `config`.

In [None]:
def train(config):
    train_loader, test_loader = create_dataloaders(config)
    model = cnn(config) 
    optimizer = torch.optim.__dict__[config['optimizer']](params=model.parameters(), lr=config['lr'])
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')

    '''
    Please modify here
    '''

    for epoch in range(1, config['epochs']+1):
        model, train_loss = train_epoch(model, train_loader, optimizer)
        val_acc, val_loss = eval_epoch(model, test_loader)
        print(f"epoch {epoch}: train_loss={train_loss:.2f}, val_loss={val_loss:.2f}, val_acc= {100 * val_acc:.2f}%")

    wandb.finish() # Notify wandb that your run has ended and upload all log data to wandb
    
    return model

# model = train(config)

After modifying the code, please uncomment the last line and run the code above.

#### Answer

In [None]:
def train(config):
    train_loader, test_loader = create_dataloaders(config)
    model = cnn(config) 
    optimizer = torch.optim.__dict__[config['optimizer']](params=model.parameters(), lr=config['lr'])
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    
    '''
    Here is the answer
    '''
    wandb.init(project='wandb_demo', name=nowtime, config=config)

    for epoch in range(1, config['epochs']+1):
        model, train_loss = train_epoch(model, train_loader, optimizer)
        val_acc, val_loss = eval_epoch(model, test_loader)
        print(f"epoch {epoch}: train_loss={train_loss:.2f}, val_loss={val_loss:.2f}, val_acc= {100 * val_acc:.2f}%")

    wandb.finish() # Notify wandb that your run has ended and upload all log data to wandb
    
    return model

# model = train(config)

Now go to the wandb page.
- Can you see the training loss curve and the validation accuracy curve? Why not?
- What OS are you using? What Python version are you using? Check the `Overview` tab on the left.
- You can check the output of the print function in `Logs`.

## Step 2: Track experimental results

We can use [`wandb.log()`](https://docs.wandb.ai/ref/python/log) to log a dictionary of data to the current run's history.  

The most basic usage is to pass a Python dictionary `{"name": value}` to `wandb.log()`, for example `wandb.log({"train-loss": 0.5, "accuracy": 0.9})`.   
Note that, unlike configuration values, which stay fixed within a single experiment, these values are updated dynamically during the training loop.
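
To make the contrast between fixed configuration and changing metrics concrete, here is a small sketch that logs a decaying learning rate once per epoch. The helper `log_schedule`, its `log_fn` parameter, and the `decay` setting are hypothetical names introduced for this illustration. `log_fn` defaults to `print` so the snippet runs without a W&B account; inside an active run, passing `wandb.log` would send the same dictionaries to the dashboard.

```python
def log_schedule(demo_config, epochs, log_fn=print):
    """Log one dictionary per epoch; any callable that takes a dict works as log_fn."""
    logged = []
    for epoch in range(1, epochs + 1):
        # demo_config never changes, but the value derived from it does
        current_lr = demo_config["lr"] * demo_config["decay"] ** (epoch - 1)
        record = {"epoch": epoch, "lr": current_lr}
        log_fn(record)
        logged.append(record)
    return logged

# Illustrative settings, unrelated to the tutorial's own `config`
demo_config = {"lr": 1.0, "decay": 0.5}
records = log_schedule(demo_config, epochs=3)
```

Each call to `log_fn` adds one point to the metric's history, which is what produces a curve on the dashboard.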

### Quiz 2

Please modify the code to record training loss, validation loss, and validation accuracy.

In [None]:
def train(config):
    train_loader, test_loader = create_dataloaders(config)
    model = cnn(config) 
    optimizer = torch.optim.__dict__[config['optimizer']](params=model.parameters(), lr=config['lr'])
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    
    wandb.init(project='wandb_demo', name=nowtime, config=config)

    for epoch in range(1, config['epochs']+1):
        model, train_loss = train_epoch(model, train_loader, optimizer)
        val_acc, val_loss = eval_epoch(model, test_loader)
        print(f"epoch {epoch}: train_loss={train_loss:.2f}, val_loss={val_loss:.2f}, val_acc= {100 * val_acc:.2f}%")
        
        '''
        Please modify here
        '''    

    wandb.finish() # Notify wandb that your run has ended and upload all log data to wandb
    
    return model

# model = train(config)

After modifying the code, please uncomment the last line and run the code above.

#### Answer

In [None]:
def train(config):
    train_loader, test_loader = create_dataloaders(config)
    model = cnn(config) 
    optimizer = torch.optim.__dict__[config['optimizer']](params=model.parameters(), lr=config['lr'])
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    
    wandb.init(project='wandb_demo', name=nowtime, config=config)

    for epoch in range(1, config['epochs']+1):
        model, train_loss = train_epoch(model, train_loader, optimizer)
        val_acc, val_loss = eval_epoch(model, test_loader)
        print(f"epoch {epoch}: train_loss={train_loss:.2f}, val_loss={val_loss:.2f}, val_acc= {100 * val_acc:.2f}%")
        
        '''
        Here is the answer
        '''    
        wandb.log({"train/loss": train_loss, "val/loss": val_loss, "val/acc": val_acc})

    wandb.finish() # Notify wandb that your run has ended and upload all log data to wandb
    
    return model

# model = train(config)

## Step 3: Version management

We can use [`wandb.Artifact`](https://docs.wandb.ai/ref/python/artifact) to save experiment-related datasets, code, and models to the server. This makes it much easier for us or others to reproduce the experiment.

### Example: save the dataset

In [None]:
def train(config):
    train_loader, test_loader = create_dataloaders(config)
    model = cnn(config) 
    optimizer = torch.optim.__dict__[config['optimizer']](params=model.parameters(), lr=config['lr'])
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    
    wandb.init(project='wandb_demo', name=nowtime, config=config)

    for epoch in range(1, config['epochs']+1):
        model, train_loss = train_epoch(model, train_loader, optimizer)
        val_acc, val_loss = eval_epoch(model, test_loader)
        print(f"epoch {epoch}: train_loss={train_loss:.2f}, val_loss={val_loss:.2f}, val_acc= {100 * val_acc:.2f}%")  
        wandb.log({"train/loss": train_loss, "val/loss": val_loss, "val/acc": val_acc})

    arti_dataset = wandb.Artifact('mnist', type='dataset')
    arti_dataset.add_dir('data/')
    wandb.log_artifact(arti_dataset)

    wandb.finish() # Notify wandb that your run has ended and upload all log data to wandb
    
    return model

# model = train(config)
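
Once the dataset artifact has been logged, it can be retrieved in a later run. The sketch below assumes `run` is the object returned by `wandb.init()` and relies on its `use_artifact` method and the artifact's `download` method; the helper name `fetch_artifact` is hypothetical.

```python
def fetch_artifact(run, name="mnist", alias="latest"):
    """Declare an artifact as an input of `run` and download it locally.

    Returns the local directory path reported by the artifact's download().
    """
    artifact = run.use_artifact(f"{name}:{alias}")  # e.g. "mnist:latest"
    return artifact.download()

# Usage inside a W&B session (commented out because it needs a login and network access):
# import wandb
# run = wandb.init(project="wandb_demo")
# data_dir = fetch_artifact(run)
# run.finish()
```

Using the `latest` alias picks up the most recent version; a specific version alias such as `v0` can be passed instead to pin the exact data used in an earlier experiment.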

### Quiz 3

Please modify the code to save the Jupyter notebook file.

Hint:
- Create a new `Artifact` object, specify name and type.
- Use `add_file` instead of `add_dir`.

In [None]:
def train(config):
    train_loader, test_loader = create_dataloaders(config)
    model = cnn(config) 
    optimizer = torch.optim.__dict__[config['optimizer']](params=model.parameters(), lr=config['lr'])
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    
    wandb.init(project='wandb_demo', name=nowtime, config=config)

    for epoch in range(1, config['epochs']+1):
        model, train_loss = train_epoch(model, train_loader, optimizer)
        val_acc, val_loss = eval_epoch(model, test_loader)
        print(f"epoch {epoch}: train_loss={train_loss:.2f}, val_loss={val_loss:.2f}, val_acc= {100 * val_acc:.2f}%")  
        wandb.log({"train/loss": train_loss, "val/loss": val_loss, "val/acc": val_acc})

    '''
    Please modify here
    '''     

    wandb.finish() # Notify wandb that your run has ended and upload all log data to wandb
    
    return model

# model = train(config)

After modifying the code, please uncomment the last line and run the code above.

#### Answer

In [None]:
def train(config):
    train_loader, test_loader = create_dataloaders(config)
    model = cnn(config) 
    optimizer = torch.optim.__dict__[config['optimizer']](params=model.parameters(), lr=config['lr'])
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    
    wandb.init(project='wandb_demo', name=nowtime, config=config)

    for epoch in range(1, config['epochs']+1):
        model, train_loss = train_epoch(model, train_loader, optimizer)
        val_acc, val_loss = eval_epoch(model, test_loader)
        print(f"epoch {epoch}: train_loss={train_loss:.2f}, val_loss={val_loss:.2f}, val_acc= {100 * val_acc:.2f}%")  
        wandb.log({"train/loss": train_loss, "val/loss": val_loss, "val/acc": val_acc})

    '''
    Here is the answer
    '''
    arti_code = wandb.Artifact('ipynb', type='code')
    arti_code.add_file('wandb_tutorial_answer.ipynb')
    wandb.log_artifact(arti_code) 

    wandb.finish() # Notify wandb that your run has ended and upload all log data to wandb
    
    return model

# model = train(config)

## Step 4: Case analysis

Using [`wandb.Table`](https://docs.wandb.ai/guides/tables), we can perform interactive, visual case analysis on the dashboard.

### Example

```
my_table = wandb.Table(columns=["a", "b"], data=[["a1", "b1"], ["a2", "b2"]])
wandb.log({"Table Name": my_table})
```

### Quiz 4

Please modify the code to create and log a table to show 10 validation results.

The table should include:
- Image: original input image
- Target: target class of the image
- Prediction: predicted class of the image

Note that `results` has already been formatted by `show_cases`.  
Format of `results`:  
- a list of lists, one inner list per case
- `n_cases` rows, each of the form `[Image, Target, Prediction]`
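
This row format matters because each inner list becomes one table row, with one entry per column. Below is a small sketch of a sanity check, using placeholder strings where the real rows hold images. `check_rows` is a hypothetical helper, not part of `wandb` or of this tutorial's `utils`.

```python
def check_rows(columns, rows):
    """Return True if every row has exactly one value per column."""
    return all(len(row) == len(columns) for row in rows)

columns = ["Image", "Target", "Prediction"]
# Placeholder rows: the strings stand in for the actual images
rows = [["<image 0>", 7, 7], ["<image 1>", 2, 3]]

print(check_rows(columns, rows))               # True
print(check_rows(columns, [["<image 2>", 7]]))  # False: the Prediction entry is missing
```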

In [None]:
# `model` must already exist: run one of the `model = train(config)` lines above first
results = show_cases(model, show_num=3)
results

In [None]:
def train(config):
    train_loader, test_loader = create_dataloaders(config)
    model = cnn(config) 
    optimizer = torch.optim.__dict__[config['optimizer']](params=model.parameters(), lr=config['lr'])
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    
    wandb.init(project='wandb_demo', name=nowtime, config=config)

    for epoch in range(1, config['epochs']+1):
        model, train_loss = train_epoch(model, train_loader, optimizer)
        val_acc, val_loss = eval_epoch(model, test_loader)
        print(f"epoch {epoch}: train_loss={train_loss:.2f}, val_loss={val_loss:.2f}, val_acc= {100 * val_acc:.2f}%")  
        wandb.log({"train/loss": train_loss, "val/loss": val_loss, "val/acc": val_acc})

    
    # results to show
    results = show_cases(model, 10)
    '''
    Please modify here
    '''

    wandb.finish() # Notify wandb that your run has ended and upload all log data to wandb
    
    return model

# model = train(config)

After modifying the code, please uncomment the last line and run the code above.

#### Answer

In [None]:
from utils import show_cases

def train(config):
    train_loader, test_loader = create_dataloaders(config)
    model = cnn(config) 
    optimizer = torch.optim.__dict__[config['optimizer']](params=model.parameters(), lr=config['lr'])
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    
    wandb.init(project='wandb_demo', name=nowtime, config=config)

    for epoch in range(1, config['epochs']+1):
        model, train_loss = train_epoch(model, train_loader, optimizer)
        val_acc, val_loss = eval_epoch(model, test_loader)
        print(f"epoch {epoch}: train_loss={train_loss:.2f}, val_loss={val_loss:.2f}, val_acc= {100 * val_acc:.2f}%")  
        wandb.log({"train/loss": train_loss, "val/loss": val_loss, "val/acc": val_acc})

    
    # results to show
    results = show_cases(model, 10)
    '''
    Here is the answer
    '''
    column_name = ['Image', 'Target', 'Prediction']
    cases = wandb.Table(columns=column_name, data=results)
    wandb.log({'cases':cases})

    wandb.finish() # Notify wandb that your run has ended and upload all log data to wandb
    
    return model

# model = train(config)