<a href="https://colab.research.google.com/github/aditya2kahol/wandb-model-dev-course/blob/main/demo/Assignment_2_wandb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 2
# Hyperparameter optimization using WandB Sweeps

## Dataset used: Imagenette database

The dataset is available from the [fast.ai datasets page](https://course.fast.ai/datasets).

This notebook is a continuation of the previous assignment.

The report for Assignment 1 is available [here](https://wandb.ai/adi001/imagenette-project/reports/Assignment-1-Imagenette-Classification--VmlldzoyMjY1ODMx?accessToken=lel4cqgd1tkn32q9uohd9odys5iczycl8mhllfabemrkjqgj42ssn02ebugjizw3).

In [1]:
#set project name that you were working on.
PROJECT_NAME = 'imagenette-project'
ENTITY = None

In [2]:
!pip install wandb --upgrade --quiet


In [3]:
import os
import wandb
import tarfile
import numpy as np
import pandas as pd

In [4]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.optim.lr_scheduler import OneCycleLR

import torchvision
import torchvision.transforms as T
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torchvision.utils import make_grid
from torch.utils.data import random_split
from torchvision.datasets import ImageFolder
from torchvision.datasets.utils import download_url

In [5]:
torch.set_default_dtype(d = torch.float32)

In [6]:
# Download dataset
dataset_url = "https://s3.amazonaws.com/fast-ai-imageclas/imagenette-160.tgz"
download_url(dataset_url, '.')

Downloading https://s3.amazonaws.com/fast-ai-imageclas/imagenette-160.tgz to ./imagenette-160.tgz



In [7]:
# Extract from archive
with tarfile.open('./imagenette-160.tgz', 'r:gz') as tar:
    tar.extractall(path='./data')

In [8]:
data_dir = './data/imagenette-160/train'
test_dir = './data/imagenette-160/val'

In [9]:
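#channel-wise (mean, std) used for normalization
#note: these are the widely used CIFAR-10 statistics, not values computed on Imagenette
stats = ((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))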
Transform = T.Compose([T.ToTensor(),
                       T.Normalize(*stats, inplace = True),
                       T.Resize(size = (80,80))])

In [10]:
dataset = ImageFolder(root = data_dir, transform = Transform)
test_dataset = ImageFolder(root = test_dir, transform = Transform)
print(f"Dataset length = {len(dataset)}")

Dataset length = 12894


In [11]:
#decide validation and training size
val_size = int(0.1 * len(dataset))
train_size = len(dataset) - val_size

#split the data at random
train_ds, val_ds = random_split(dataset, [train_size, val_size])

len(train_ds), len(val_ds)

(11605, 1289)

## Login to wandb

In [12]:
wandb.login()

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

## Setup sweep configuration

<b>Useful beginner-friendly resources used for this notebook</b>
1. [Youtube](https://www.youtube.com/watch?v=9zrmUIlScdY&list=PLD80i8An1OEGajeVo15ohAQYF1Ttle0lk&index=6)
2. [Colab notebook](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/pytorch/Organizing_Hyperparameter_Sweeps_in_PyTorch_with_W%26B.ipynb#scrollTo=Tn0Kr1EHDWak)

The following search strategies are available for sweeps:
- grid search
- random search
- Bayesian search

We will use ``random`` search.

In [13]:
#set search strategy method
sweep_config = {
    'method': 'random'
    }

#set metric information --> we want to maximize the validation accuracy.
metric = {
    'name': 'Validation accuracy',
    'goal': 'maximize'   
    }
sweep_config['metric'] = metric

#set all other hyperparameters
parameters_dict = {
    'epochs':{
        'values': [10,15,20]
    },
    'optimizer':{
        'values': ['sgd','adam']
    },
    'momentum':{
        'distribution': 'uniform',
        'min': 0.5,
        'max': 0.99
    },
    'dropout': {
          'values': [0, 0.1, 0.2]
    },
    'batch_size':{
        'distribution': 'q_log_uniform_values',
        'q': 8,
        'min': 70,
        'max': 110 
    }
    }
sweep_config['parameters'] = parameters_dict

#set early stopping criteria
early_stop_dict = {
    'type': 'hyperband',
    'max_iter': 20,
    's': 2
}
sweep_config['early_terminate'] = early_stop_dict
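
The `early_terminate` block above uses Hyperband. As a rough sketch of what `max_iter` and `s` mean: according to the W&B sweep documentation, brackets are placed at `max_iter / eta**k` for `k = 1..s`, with `eta` defaulting to 3. The helper below (`hyperband_brackets` is a hypothetical name, not part of wandb) only reproduces that arithmetic; how W&B rounds non-integer brackets is not covered here.

```python
def hyperband_brackets(max_iter, s, eta=3):
    """Return the iteration counts at which runs are compared (largest first)."""
    return [max_iter / eta ** k for k in range(1, s + 1)]

# Example from the W&B documentation: max_iter=27, s=2.
doc_brackets = hyperband_brackets(27, 2)
print(doc_brackets)

# Values used in this notebook: max_iter=20, s=2.
brackets = hyperband_brackets(20, 2)
print([round(b, 2) for b in brackets])
```

Hyperband compares runs on the sweep metric logged during training. Since this notebook logs `Validation accuracy` only once, after training finishes, early termination likely has little to act on here.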

In [14]:
from pprint import pprint
pprint(sweep_config)

{'early_terminate': {'max_iter': 20, 's': 2, 'type': 'hyperband'},
 'method': 'random',
 'metric': {'goal': 'maximize', 'name': 'Validation accuracy'},
 'parameters': {'batch_size': {'distribution': 'q_log_uniform_values',
                               'max': 110,
                               'min': 70,
                               'q': 8},
                'dropout': {'values': [0, 0.1, 0.2]},
                'epochs': {'values': [10, 15, 20]},
                'momentum': {'distribution': 'uniform',
                             'max': 0.99,
                             'min': 0.5},
                'optimizer': {'values': ['sgd', 'adam']}}}
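
To make the search space concrete, here is a minimal pure-Python sketch of how one random-search trial could be drawn from the configuration above. It is not W&B's implementation: `sample_config` and `q_log_uniform_values` are illustrative helpers, and the quantization rule `round(X / q) * q` follows the description in the W&B sweep documentation.

```python
import math
import random

def q_log_uniform_values(rng, low, high, q):
    """Draw X log-uniformly from [low, high], then quantize to a multiple of q."""
    x = math.exp(rng.uniform(math.log(low), math.log(high)))
    return int(round(x / q) * q)

def sample_config(rng):
    """Draw one hyperparameter set from the search space defined in this notebook."""
    return {
        "epochs": rng.choice([10, 15, 20]),
        "optimizer": rng.choice(["sgd", "adam"]),
        "momentum": rng.uniform(0.5, 0.99),
        "dropout": rng.choice([0, 0.1, 0.2]),
        "batch_size": q_log_uniform_values(rng, 70, 110, 8),
    }

rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_config(rng) for _ in range(1000)]
batch_sizes = sorted({s["batch_size"] for s in samples})
print(batch_sizes)
```

Because the sampled value is rounded to the nearest multiple of 8, the batch size can land slightly outside `[min, max]`, which would explain a run with `batch_size: 112` even though `max` is 110.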


## Initialize sweep configuration

In [15]:
sweep_id = wandb.sweep(sweep_config, project= PROJECT_NAME, entity = ENTITY)

Create sweep with ID: 3ud9e51t
Sweep URL: https://wandb.ai/adi001/imagenette-project/sweeps/3ud9e51t


In [16]:
print(sweep_id)

3ud9e51t


## Define your training pipeline

In [17]:
#import ResNet9 for the training pipeline
from model import ResNet9

In [18]:
#set device
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
device

device(type='cuda')

In [28]:
def pipeline():
  #set default configuration for hyperparameters
  config_dict = dict(
      batch_size = 80,
      epochs = 2,
      optimizer = 'sgd',
      momentum = 0.9,
      dropout = 0.1
  )
  #set some other important parameters
  max_lr = 0.1 #we will manually set a learning rate scheduler.
  grad_clip = 0.7 #gradient clipping is performed manually.
  weight_decay = 1e-4 #weight decay is set manually.

  with wandb.init(config = config_dict):
    #set configuration
    config = wandb.config

    #build dataloaders
    train_dl, val_dl = build_dataloaders(train_ds, val_ds, config.batch_size)

    #build model
    model = build_model(config.dropout)
    
    #build optimizer
    optimizer = build_optimizer(model, max_lr, config.optimizer, config.momentum, weight_decay) 
    
    #set loss criterion
    criterion = nn.CrossEntropyLoss()

    #set scheduler
    scheduler = OneCycleLR(optimizer,
                           max_lr = max_lr,
                           steps_per_epoch = len(train_dl),
                           epochs = config.epochs) 

    #train the model
    torch.cuda.empty_cache()
    for epoch in range(config.epochs):
      
      model.train()
      for batch in train_dl:
        train_batch(batch, model, optimizer, criterion, grad_clip = grad_clip)
        scheduler.step() #OneCycleLR is stepped once per batch
      
      model.eval()
      for batch in val_dl:
        evaluate_batch(batch, model, criterion)
    
    #evaluate on the whole validation set, then log the final accuracy together with the run's hyperparameters
    Validation_acc = evaluate_model(model, val_dl)
    wandb.log({"Validation accuracy": Validation_acc,
               "Optimizer": config.optimizer,
               "Batch Size": config.batch_size,
               "Epochs": config.epochs,
               "Dropout":config.dropout,
               "Momentum": config.momentum,
               "Max-LR":max_lr})
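
The pipeline steps `OneCycleLR` once per batch, so the schedule spans `steps_per_epoch * epochs` steps. The pure-Python sketch below approximates the two-phase cosine schedule under PyTorch's default settings (`pct_start=0.3`, `div_factor=25`, `final_div_factor=1e4`); using the defaults is an assumption based on the notebook not overriding them, and `one_cycle_lr` is a hypothetical helper for illustration, not the library code.

```python
import math

def cosine_anneal(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)

def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at a given step of a two-phase one-cycle schedule."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    peak_step = pct_start * total_steps - 1  # last step of the warm-up phase
    if step <= peak_step:
        return cosine_anneal(initial_lr, max_lr, step / peak_step)
    pct = (step - peak_step) / (total_steps - 1 - peak_step)
    return cosine_anneal(max_lr, min_lr, pct)

# Example: 10 epochs, 11605 training images, batch size 80.
steps_per_epoch = math.ceil(11605 / 80)
total_steps = 10 * steps_per_epoch
lrs = [one_cycle_lr(s, total_steps, max_lr=0.1) for s in range(total_steps)]
print(total_steps, lrs[0], max(lrs), lrs[-1])
```

The learning rate starts at `max_lr / 25`, rises to `max_lr` over roughly the first 30% of the steps, and then decays to a value far below the starting rate.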

### Define all utility functions

In [29]:
def build_dataloaders(train_ds, val_ds, batch_size):
  train_dl = DataLoader(dataset = train_ds,
                        batch_size = batch_size,
                        shuffle = True,
                        num_workers = 2,
                        pin_memory = True)
  val_dl = DataLoader(dataset = val_ds,
                      batch_size = 2*batch_size,
                      num_workers = 2,
                      pin_memory = True)
  
  return train_dl, val_dl

In [30]:
def build_model(dropout):
  model = ResNet9(3,10,dropout = dropout)
  return model.to(device)

In [31]:
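def build_optimizer(model, max_lr, opt, momentum, weight_decay):
  if opt == 'sgd':
    optimizer = optim.SGD(model.parameters(),
                          lr = max_lr,
                          momentum = momentum,
                          weight_decay = weight_decay)
  elif opt == 'adam':
    #the sampled momentum value is not used by Adam
    optimizer = optim.Adam(model.parameters(),
                           lr = max_lr,
                           weight_decay = weight_decay)
  else:
    #fail early instead of returning an undefined optimizer
    raise ValueError(f"Unsupported optimizer: {opt}")
  
  return optimizer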

In [32]:
def train_batch(batch, model, optimizer, criterion, grad_clip = None):
  images, labels = batch
  images = images.to(device)
  labels = labels.to(device)
  
  optimizer.zero_grad()

  #forward pass
  probs = model(images)
  #evaluate loss
  loss = criterion(probs, labels)
  #backwards pass
  loss.backward()
  #gradient clipping
  if grad_clip:
    nn.utils.clip_grad_value_(model.parameters(), grad_clip)
  #weight update step
  optimizer.step()

  #log batch loss
  wandb.log({"BATCH-LOSS": loss.item()})
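
`nn.utils.clip_grad_value_` clamps every gradient element into `[-grad_clip, grad_clip]` independently, which differs from norm-based clipping, where the whole gradient vector is rescaled. A small pure-Python sketch of the two behaviours on a toy gradient (the helper names are illustrative, not PyTorch functions):

```python
import math

def clip_by_value(grads, clip):
    """Clamp each element into [-clip, clip]; the gradient direction may change."""
    return [max(-clip, min(clip, g)) for g in grads]

def clip_by_norm(grads, max_norm):
    """Rescale the whole vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grads]

toy_grads = [3.0, -0.2, 0.5]
by_value = clip_by_value(toy_grads, 0.7)  # only the first element is changed
by_norm = clip_by_norm(toy_grads, 0.7)    # all elements shrink by the same factor
print(by_value)
print(by_norm)
```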

In [33]:
@torch.no_grad()
def evaluate_batch(batch, model, criterion):
  images, labels = batch
  probs = model(images.to(device))
  #evaluate loss
  loss = criterion(probs, labels.to(device))

  #evaluate accuracy
  _, preds = torch.max(probs, dim = 1)
  accuracy = torch.sum(preds == labels.to(device)).item() / len(preds)

  wandb.log({"VAL-LOSS":loss.item(),
             "VAL-ACC": accuracy})

In [34]:
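@torch.no_grad()
def evaluate_model(model, dl):
  #switch to evaluation mode (e.g. disables dropout); gradients are not needed here
  model.eval()
  acc = []
  for images, labels in dl:
    probs = model(images.to(device))
    _, preds = torch.max(probs, dim = 1)
    acc.append(torch.sum(preds == labels.to(device)).item() / len(preds))
  
  #mean of the per-batch accuracies
  final_acc = np.mean(acc)
  print(f"Final Validation Accuracy: {final_acc}")
  
  return final_acc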
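
`evaluate_model` averages per-batch accuracies, so a smaller last batch counts as much as a full one. With 1289 validation images and a validation batch size of 160 (twice a training batch size of 80), the last batch holds only 9 images. The sketch below uses made-up per-batch counts to contrast that average with the sample-weighted accuracy; both helper names are illustrative.

```python
def mean_of_batch_accuracies(batches):
    """Average the per-batch accuracies, ignoring batch sizes."""
    return sum(correct / size for correct, size in batches) / len(batches)

def sample_weighted_accuracy(batches):
    """Total correct predictions divided by total number of samples."""
    total_correct = sum(correct for correct, _ in batches)
    total_size = sum(size for _, size in batches)
    return total_correct / total_size

# Hypothetical (correct, size) pairs: 8 full batches of 160 and a final batch of 9.
batches = [(128, 160)] * 8 + [(3, 9)]
unweighted = mean_of_batch_accuracies(batches)
weighted = sample_weighted_accuracy(batches)
print(unweighted)
print(weighted)
```

When the last batch is small and unusually easy or hard, the two numbers can differ noticeably; dividing total correct predictions by the dataset size avoids that.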

## Let's sweep

In [35]:
wandb.agent(sweep_id, pipeline, count = 4)

wandb: Agent Starting Run: h7gx9ue9 with config:
wandb:   batch_size: 80
wandb:   dropout: 0
wandb:   epochs: 10
wandb:   momentum: 0.9549371687257296
wandb:   optimizer: adam


Final Validation Accuracy: 0.6203703703703703


Run history (metric, trend over the run):
BATCH-LOSS,▇▇█▆▅▄▄▄▄▃▃▃▄▄▃▃▄▃▃▃▃▂▃▃▃▃▃▃▃▃▂▂▂▂▂▂▁▁▁▂
Batch Size,▁
Dropout,▁
Epochs,▁
Max-LR,▁
Momentum,▁
VAL-ACC,▁▂▂▂▂▂▂▁▁▁▁▂▄▄▄▅▄▄▆▅▄▄▄▄▅▃▅▅▆▆█▆▇▇█▇▇▇█▃
VAL-LOSS,▃▃▃▃▃▃▄▄█▇█▇▂▃▃▂▂▂▂▂▃▃▂▃▂▃▂▂▂▂▁▂▁▁▁▁▁▁▁▃
Validation accuracy,▁

Run summary (metric, final value):
BATCH-LOSS,0.62084
Batch Size,80
Dropout,0
Epochs,10
Max-LR,0.1
Momentum,0.95494
Optimizer,adam
VAL-ACC,0.33333
VAL-LOSS,1.84968
Validation accuracy,0.62037


wandb: Agent Starting Run: s6cpmt5i with config:
wandb:   batch_size: 96
wandb:   dropout: 0.1
wandb:   epochs: 10
wandb:   momentum: 0.5809982376212655
wandb:   optimizer: adam


Final Validation Accuracy: 0.7593413277719847


Run history (metric, trend over the run):
BATCH-LOSS,█▆▅▅▅▄▄▄▃▃▃▄▃▃▃▃▄▃▃▃▃▃▃▃▃▂▃▂▂▂▂▁▂▁▁▂▁▁▁▁
Batch Size,▁
Dropout,▁
Epochs,▁
Max-LR,▁
Momentum,▁
VAL-ACC,▁▂▁▁▄▄▃▄▅▅▄▄▃▄▃▃▅▅▄▄▄▅▅▅▅▆▇▆▇▇█▇▇██▇▇███
VAL-LOSS,▇▇██▄▄▅▄▃▃▄▄▆▅▅▅▄▃▄▄▄▃▃▃▂▂▂▃▂▂▁▂▁▁▁▂▁▁▁▁
Validation accuracy,▁

Run summary (metric, final value):
BATCH-LOSS,0.57232
Batch Size,96
Dropout,0.1
Epochs,10
Max-LR,0.1
Momentum,0.581
Optimizer,adam
VAL-ACC,0.77372
VAL-LOSS,0.70523
Validation accuracy,0.75934


wandb: Agent Starting Run: 93ruerw8 with config:
wandb:   batch_size: 96
wandb:   dropout: 0.2
wandb:   epochs: 20
wandb:   momentum: 0.5702403101283393
wandb:   optimizer: sgd


Final Validation Accuracy: 0.8444723236009731


Run history (metric, trend over the run):
BATCH-LOSS,██▇▇▅▅▄▄▃▃▃▃▂▃▂▃▂▂▂▂▂▃▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁
Batch Size,▁
Dropout,▁
Epochs,▁
Max-LR,▁
Momentum,▁
VAL-ACC,▁▁▂▂▅▄▃▄▆▆▆▆▇▆▅▆▇▇▇▇▆▆▇▇▇▇▇▇▇▇████▇██▇██
VAL-LOSS,▇▇▇█▃▄▆▅▂▃▂▂▂▂▃▂▁▂▂▂▂▃▂▁▁▁▂▁▂▂▂▁▂▂▂▁▁▂▂▂
Validation accuracy,▁

Run summary (metric, final value):
BATCH-LOSS,0.0054
Batch Size,96
Dropout,0.2
Epochs,20
Max-LR,0.1
Momentum,0.57024
Optimizer,sgd
VAL-ACC,0.85401
VAL-LOSS,0.89105
Validation accuracy,0.84447


wandb: Agent Starting Run: bzwp5u69 with config:
wandb:   batch_size: 112
wandb:   dropout: 0.2
wandb:   epochs: 15
wandb:   momentum: 0.5265038404227127
wandb:   optimizer: sgd


Final Validation Accuracy: 0.8446349323753171


Run history (metric, trend over the run):
BATCH-LOSS,█▆▆▅▅▄▄▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁
Batch Size,▁
Dropout,▁
Epochs,▁
Max-LR,▁
Momentum,▁
VAL-ACC,▂▁▁▃▃▃▄▃▄▅▄▇▆▆▆▆▆▅▆▆▆▇▇▇▆▇▇▇▇▇▇█████▇▇██
VAL-LOSS,▇██▆▅▅▄▅▄▄▄▂▂▂▃▃▃▃▃▂▂▂▂▂▂▁▂▁▁▂▂▂▁▁▂▁▂▂▂▂
Validation accuracy,▁

Run summary (metric, final value):
BATCH-LOSS,0.02506
Batch Size,112
Dropout,0.2
Epochs,15
Max-LR,0.1
Momentum,0.5265
Optimizer,sgd
VAL-ACC,0.85799
VAL-LOSS,0.73016
Validation accuracy,0.84463
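
The four runs can be compared directly from their reported summaries. As a small illustration, the snippet below copies the final `Validation accuracy` values from the output above and picks the best configuration; the `runs` list is transcribed by hand, not fetched from W&B.

```python
# Final results transcribed from the sweep output in this notebook.
runs = [
    {"id": "h7gx9ue9", "optimizer": "adam", "batch_size": 80,
     "epochs": 10, "dropout": 0, "val_acc": 0.62037},
    {"id": "s6cpmt5i", "optimizer": "adam", "batch_size": 96,
     "epochs": 10, "dropout": 0.1, "val_acc": 0.75934},
    {"id": "93ruerw8", "optimizer": "sgd", "batch_size": 96,
     "epochs": 20, "dropout": 0.2, "val_acc": 0.84447},
    {"id": "bzwp5u69", "optimizer": "sgd", "batch_size": 112,
     "epochs": 15, "dropout": 0.2, "val_acc": 0.84463},
]

# Rank the runs by final validation accuracy, best first.
ranked = sorted(runs, key=lambda r: r["val_acc"], reverse=True)
best = ranked[0]
for r in ranked:
    print(r["id"], r["optimizer"], r["val_acc"])
```

In this small sample both SGD runs score higher than both Adam runs, but with only four trials this is suggestive rather than conclusive. One possible factor is that the fixed `max_lr` of 0.1 is far above Adam's default learning rate of 1e-3.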
