# Q6: Improve Performance (20 pts)

Many techniques have been proposed in the literature to improve classification performance for deep networks. In this section, we try a recently proposed technique called [mixup](https://arxiv.org/abs/1710.09412). The main idea is to augment the training set with convex combinations of pairs of images and their labels. Read through the paper and modify your model to implement mixup. Report your performance, along with training/test curves and a comparison with the baseline, in the report.
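
For reference, the mixup construction can be written as below. The symbols follow the paper's notation rather than anything defined in this notebook: $(x_i, y_i)$ and $(x_j, y_j)$ are two training examples, $\lambda$ is the mixing coefficient, and $\alpha$ is the hyperparameter of the Beta distribution it is drawn from.

$$
\tilde{x} = \lambda x_i + (1 - \lambda) x_j, \qquad
\tilde{y} = \lambda y_i + (1 - \lambda) y_j, \qquad
\lambda \sim \mathrm{Beta}(\alpha, \alpha), \quad \alpha \in (0, \infty)
$$

In the multi-label VOC setting used here, $y$ is a 20-dimensional multi-hot vector rather than a one-hot vector, so the mixed label $\tilde{y}$ is a vector of soft targets in $[0, 1]$.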

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
!wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
!tar -xf VOCtrainval_06-Nov-2007.tar

!wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar 
!tar -xf VOCtest_06-Nov-2007.tar

--2022-03-01 18:33:54--  http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
Resolving host.robots.ox.ac.uk (host.robots.ox.ac.uk)... 129.67.94.152
Connecting to host.robots.ox.ac.uk (host.robots.ox.ac.uk)|129.67.94.152|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 460032000 (439M) [application/x-tar]
Saving to: ‘VOCtrainval_06-Nov-2007.tar’


2022-03-01 18:34:08 (30.9 MB/s) - ‘VOCtrainval_06-Nov-2007.tar’ saved [460032000/460032000]

--2022-03-01 18:34:10--  http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
Resolving host.robots.ox.ac.uk (host.robots.ox.ac.uk)... 129.67.94.152
Connecting to host.robots.ox.ac.uk (host.robots.ox.ac.uk)|129.67.94.152|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 451020800 (430M) [application/x-tar]
Saving to: ‘VOCtest_06-Nov-2007.tar’


2022-03-01 18:34:24 (30.8 MB/s) - ‘VOCtest_06-Nov-2007.tar’ saved [451020800/451020800]



In [3]:
%cd drive/MyDrive/spring22/16824/hw1

/content/drive/MyDrive/spring22/16824/hw1


In [4]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
%matplotlib inline
import trainer
import utils

from collections import OrderedDict
from simple_cnn import SimpleCNN
from torch.distributions import Beta
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from torchvision import models
from utils import ARGS
from voc_dataset import VOCDataset

In [5]:
def collate_fn_mixup(data):
    """
    Collate a batch and apply mixup to it.

    data: a list of (image, label, wgt) tuples, one per sample.
    Each sample i is blended with a randomly drawn partner j from the same
    batch, using a mixing coefficient lam ~ Beta(0.5, 0.5).
    """

    n = len(data)
    beta = Beta(torch.tensor([0.5]), torch.tensor([0.5]))

    # One mixing coefficient per sample, shape (n,).
    lams = beta.sample((n,)).squeeze(1)
    # Partner index for each sample, drawn with replacement (j may equal i).
    indices = torch.randint(low=0, high=n, size=(n,))

    images = []
    labels = []
    wgts = []

    for i in range(n):

        lam, j = lams[i], indices[i]

        # The same coefficient blends the image, the label and the weight.
        image = lam * data[i][0] + (1. - lam) * data[j][0]
        label = lam * data[i][1] + (1. - lam) * data[j][1]
        wgt = lam * data[i][2] + (1. - lam) * data[j][2]

        images.append(image)
        labels.append(label)
        wgts.append(wgt)

    return (torch.stack(images), torch.stack(labels), torch.stack(wgts))

def get_data_loader(name='voc', 
                    train=True, 
                    batch_size=64, 
                    split='train', 
                    inp_size=224,
                    perform_transforms=True):
    if name == 'voc':
        from voc_dataset import VOCDataset
        dataset = VOCDataset(split, 
                             inp_size, 
                             perform_transforms=perform_transforms)
    else:
        raise NotImplementedError

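    if split == 'trainval' or split == 'train':

        # Mixup is applied only when collating training batches.
        loader = DataLoader(
            dataset,
            batch_size=batch_size,
            shuffle=train,
            collate_fn=collate_fn_mixup,
            num_workers=2,
        )

    elif split == 'test':

        # Test batches use the default collate function (no mixup).
        loader = DataLoader(
            dataset,
            batch_size=batch_size,
            shuffle=train,
            num_workers=2,
        )

    else:
        # Without this branch, `loader` would be undefined for other splits.
        raise ValueError('Unknown split: {}'.format(split))

    return loader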
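
The per-sample loop in `collate_fn_mixup` can also be expressed with tensor operations on an already collated batch. The block below is a minimal sketch of that idea, not the code used for the results in this notebook: `mixup_batch` is a hypothetical helper, the toy tensors stand in for VOC images and labels, and partners are chosen with a random permutation (so each sample is used as a partner exactly once) instead of the with-replacement `torch.randint` draw above.

```python
import torch
from torch.distributions import Beta


def mixup_batch(images, labels, alpha=0.5):
    """Mix a batch with a shuffled copy of itself (hypothetical helper).

    images: float tensor of shape (N, C, H, W)
    labels: float tensor of shape (N, num_classes)
    alpha:  Beta(alpha, alpha) concentration; 0.5 mirrors collate_fn_mixup.
    """
    n = images.shape[0]
    # One mixing coefficient per sample, shape (N,).
    lam = Beta(torch.tensor(alpha), torch.tensor(alpha)).sample((n,))
    # Random permutation: every sample is used as a partner exactly once.
    perm = torch.randperm(n)

    lam_img = lam.view(n, 1, 1, 1)  # broadcast over C, H, W
    lam_lbl = lam.view(n, 1)        # broadcast over classes
    mixed_images = lam_img * images + (1.0 - lam_img) * images[perm]
    mixed_labels = lam_lbl * labels + (1.0 - lam_lbl) * labels[perm]
    return mixed_images, mixed_labels


# Toy batch: 4 RGB images of size 8x8 with 20-way multi-hot labels.
images = torch.rand(4, 3, 8, 8)
labels = torch.randint(0, 2, (4, 20)).float()
mixed_images, mixed_labels = mixup_batch(images, labels)
print(mixed_images.shape, mixed_labels.shape)
```

Because this version operates on tensors rather than on a list of samples, it could in principle be applied inside the training loop after the batch has been moved to the GPU, leaving the `DataLoader` with its default collate function.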

In [6]:
from trainer import save_this_epoch, save_model

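def customBCEWithLogitsLoss(y_hat, y, eps=1e-7):
    """Binary cross-entropy on logits; y may hold soft (mixed) labels in [0, 1]."""

    y_hat = torch.sigmoid(y_hat)

    # eps keeps log() finite when a probability saturates at 0 or 1.
    return - torch.mean(y * torch.log(y_hat + eps) +
                        (1 - y) * torch.log(1 - y_hat + eps))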

def train(args,
          model, 
          optimizer, 
          scheduler=None, 
          model_name='model', 
          perform_transforms=True):
    # TODO Q1.5: Initialize your tensorboard writer here!
    train_loader = get_data_loader('voc',
                                   train=True,
                                   batch_size=args.batch_size,
                                   split='trainval',
                                   inp_size=args.inp_size,
                                   perform_transforms=perform_transforms)
    test_loader = get_data_loader('voc', 
                                  train=False, 
                                  batch_size=args.test_batch_size, 
                                  split='test', 
                                  inp_size=args.inp_size,
                                  perform_transforms=perform_transforms)
    train_writer = SummaryWriter('runs/{}/train/'.format(model_name))
    test_writer = SummaryWriter('runs/{}/test/'.format(model_name))

    # Ensure model is in correct mode and on right device
    model.train()
    model = model.to(args.device)
    criterion = customBCEWithLogitsLoss

    cnt = 0
    for epoch in range(args.epochs):
        for batch_idx, (data, target, wgt) in enumerate(train_loader):

            # Move the mixed batch to the training device.
            # wgt is carried along but is not used by the loss below.
            data = data.to(args.device)
            target = target.to(args.device)
            wgt = wgt.to(args.device)

            optimizer.zero_grad()
            # Forward pass
            output = model(data)
            # Calculate the loss
            # TODO Q1.4: your loss for multi-label classification
            loss = criterion(output, target)
            # Calculate gradient w.r.t the loss
            loss.backward()
            # Optimizer takes one step
            optimizer.step()
            # Log info
            if cnt % args.log_every == 0:
                # TODO Q1.5: Log training loss to tensorboard
                print('Train Epoch: {} [{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                    epoch, cnt, 100. * batch_idx / len(train_loader), loss.item()))
                train_writer.add_scalar('Loss', loss.item(), cnt)
                # TODO Q3.2: Log histogram of gradients
                for name, param in model.named_parameters():
                    # Log the gradient, not the parameter values themselves.
                    if param.grad is not None:
                        train_writer.add_histogram(name, param.grad, cnt)

            # Validation iteration
            if cnt % args.val_every == 0:
                model.eval()
                # mean_ap avoids shadowing the built-in map().
                ap, mean_ap = utils.eval_dataset_map(model, args.device, test_loader)
                # TODO Q1.5: Log MAP to tensorboard
                print("Test MAP: {}".format(mean_ap))
                test_writer.add_scalar('MAP', mean_ap, cnt)
                model.train()
            cnt += 1

        # TODO Q3.2: Log Learning rate
        if scheduler is not None:
            scheduler.step()
            train_writer.add_scalar('Learning Rate',
                                    scheduler.get_last_lr()[0],
                                    epoch)

        # save model
        if save_this_epoch(args, epoch):
            save_model(epoch, model_name, model)

    train_writer.close()
    test_writer.close()

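    # Final evaluation on the test set (switch to eval mode first, since the
    # training loop leaves the model in train mode).
    model.eval()
    test_loader = utils.get_data_loader('voc', train=False, batch_size=args.test_batch_size, split='test', inp_size=args.inp_size)
    ap, mean_ap = utils.eval_dataset_map(model, args.device, test_loader)
    return ap, mean_ap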
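
The loss in `customBCEWithLogitsLoss` is ordinary binary cross-entropy evaluated against soft targets. As a sanity check, the sketch below compares the same manual formula with PyTorch's built-in `F.binary_cross_entropy_with_logits`, which also accepts float targets. The logits and targets here are random toy tensors, not outputs of the model in this notebook.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 20)        # raw model outputs: 8 samples, 20 classes
soft_targets = torch.rand(8, 20)   # mixed labels in [0, 1], as mixup produces

# Manual formula, written the same way as customBCEWithLogitsLoss.
eps = 1e-7
probs = torch.sigmoid(logits)
manual = -torch.mean(soft_targets * torch.log(probs + eps)
                     + (1 - soft_targets) * torch.log(1 - probs + eps))

# Built-in loss: takes logits directly and accepts float targets.
builtin = F.binary_cross_entropy_with_logits(logits, soft_targets)

print(manual.item(), builtin.item())
```

The two values should agree up to the small `eps` offset. The built-in version works on the logits without an explicit sigmoid followed by a log, so it may be the more numerically robust choice when logits become very large in magnitude.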

In [7]:
# Pre-trained weights up to second-to-last layer
# final layers should be initialized from scratch!
class PretrainedResNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.resnet = models.resnet18(pretrained=True)
        n_inputs = self.resnet.fc.in_features

        # Initializing a new final layer.
        classifier = nn.Sequential(OrderedDict([
            ('fc1', nn.Linear(n_inputs, 20))
        ]))
        self.resnet.fc = classifier
    
    def forward(self, x):

        # Just running the entire model on the data.
        return self.resnet(x)

args = ARGS(epochs=10,
            batch_size=32,
            lr=0.0001,
            use_cuda=True,
            step_size=5,
            save_freq=5,
            save_at_end=True,
            val_every=100)
model_name = 'PretrainedResNetMixup3'
model = PretrainedResNet()
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                            step_size=args.step_size,
                                            gamma=args.gamma)
test_ap, test_map = train(args,
                          model,
                          optimizer,
                          scheduler,
                          model_name=model_name)
print('test map:', test_map)

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth



Test MAP: 0.07488589076267867
Test MAP: 0.6917148356571825
Test MAP: 0.756920050627253
Test MAP: 0.7751576088381446
Test MAP: 0.7775135723754663
Test MAP: 0.8035716481726333
Test MAP: 0.8028374220870322
Test MAP: 0.8027681345277162
Test MAP: 0.7992594607378356
Test MAP: 0.8118295254674928
Test MAP: 0.8165006090347499
Test MAP: 0.8198434258352195
Test MAP: 0.8146744729693138
Test MAP: 0.824499993346393
Test MAP: 0.8172020622557616
Test MAP: 0.8250154289308755
test map: 0.8253699763231044


**Plots**

![image](images/mixup.png)

I implemented mixup as a collate function, so the augmentation is applied each time the DataLoader assembles a training batch. I did not see a performance improvement over the baseline, which I believe is mostly due to the transforms I am already using. I expect mixup would have helped more if I had been training the network from scratch rather than fine-tuning pretrained weights. I also think that a smarter way of choosing which images to pair would benefit training with this approach.