# Q6: Improve Performance (20 pts)

Many techniques have been proposed in the literature to improve the classification performance of deep networks. In this section, we use a recently proposed technique called [mixup](https://arxiv.org/abs/1710.09412). The main idea is to augment the training set with convex combinations of pairs of images and their labels. Read through the paper and modify your model to implement mixup. Report your performance, along with training/test curves and a comparison with the baseline.
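As a rough illustration of the idea, the sketch below mixes a batch with a shuffled copy of itself. It is not the code in `q6_utils.train`, which is not shown here: the function name `mixup_batch`, the default `alpha=0.2`, and the use of a binary cross-entropy loss on soft multi-label targets are all assumptions made for this example.

```python
import torch
import torch.nn.functional as F


def mixup_batch(images, targets, alpha=0.2):
    """Mix a batch with a randomly permuted copy of itself (hypothetical helper).

    images:  float tensor of shape (N, C, H, W)
    targets: float tensor of shape (N, num_classes), multi-hot labels
    alpha:   Beta distribution parameter (assumed value; alpha <= 0 disables mixing)
    """
    if alpha > 0:
        # One mixing coefficient per batch, lam ~ Beta(alpha, alpha)
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
    else:
        lam = 1.0
    # Pair each example with another example from the same batch
    perm = torch.randperm(images.size(0), device=images.device)
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_targets = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_images, mixed_targets, lam


# Usage with random stand-in data (20 classes, as in PASCAL VOC)
images = torch.rand(8, 3, 32, 32)
targets = (torch.rand(8, 20) > 0.5).float()
mixed_images, mixed_targets, lam = mixup_batch(images, targets)

# Stand-in for model(mixed_images); the soft targets go straight into the loss
logits = torch.randn(8, 20)
loss = F.binary_cross_entropy_with_logits(logits, mixed_targets)
```

Mixing is applied only during training; evaluation uses the original images and labels.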

In [1]:
# implement mixup regularization here and show performance
import torch
from utils import ARGS
import nbimporter  # enables importing .ipynb notebooks as Python modules
from q6_utils import train
from q4_imagenet_finetune_pascal import PretrainedResNet
from resnet import ResNet
from q2_caffenet_pascal import CaffeNet


## Pre-trained ResNet

In [2]:
args = ARGS(batch_size=64, epochs=10, lr=0.0001, save_at_end=True,
            save_freq=10, use_cuda=True, val_every=100, gamma=0.5, step_size=2, log_every=100)
model = PretrainedResNet()
optimizer = torch.optim.Adam(model.parameters(), args.lr)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, args.step_size, args.gamma)
test_ap, test_map = train(args, model, optimizer, scheduler, 'resnet18_pretrained_mixup')
print('test map:', test_map)

test map: 0.8358738446542814


### Test mAP vs iteration
![Screenshot%20from%202022-03-03%2003-23-18.png](attachment:Screenshot%20from%202022-03-03%2003-23-18.png)

### Training loss vs iteration
![Screenshot%20from%202022-03-03%2003-23-19.png](attachment:Screenshot%20from%202022-03-03%2003-23-19.png)

### Learning rate vs iteration
![Screenshot%20from%202022-03-03%2003-23-16.png](attachment:Screenshot%20from%202022-03-03%2003-23-16.png)
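The staircase shape of the learning-rate curve follows from the `StepLR` settings used for this run (`lr=0.0001`, `step_size=2`, `gamma=0.5`). The sketch below reproduces the schedule in closed form. It assumes that `train` calls `scheduler.step()` once per epoch, which cannot be confirmed from this notebook, and `step_lr` is a hypothetical helper name.

```python
def step_lr(base_lr, gamma, step_size, epoch):
    """Closed form of StepLR: decay base_lr by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Settings from the pre-trained ResNet run: lr=0.0001, gamma=0.5, step_size=2
for epoch in range(10):
    print(f"epoch {epoch}: lr = {step_lr(1e-4, 0.5, 2, epoch):.3e}")
```

Under that assumption the learning rate halves every two epochs, ending at 1/16 of its initial value for the last two of the 10 epochs.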

#### Final mAP without mixup = 0.7485, with mixup = 0.8387

## ResNet from scratch

In [2]:
args = ARGS(batch_size=64, epochs=50, lr=0.001, save_at_end=True,
            save_freq=50, use_cuda=True, val_every=250, gamma=0.2, step_size=10)
model = ResNet()
optimizer = torch.optim.Adam(model.parameters(), args.lr)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, args.step_size, args.gamma)
test_ap, test_map = train(args, model, optimizer, scheduler, 'resnet_scratch_mixup')
print('test map:', test_map)

test map: 0.4573248418379056


### Test mAP vs iteration
![Screenshot%20from%202022-03-03%2015-44-57.png](attachment:Screenshot%20from%202022-03-03%2015-44-57.png)

### Training loss vs iteration
![Screenshot%20from%202022-03-03%2015-44-41.png](attachment:Screenshot%20from%202022-03-03%2015-44-41.png)

### Learning rate vs epoch
![Screenshot%20from%202022-03-03%2015-44-14.png](attachment:Screenshot%20from%202022-03-03%2015-44-14.png)

## CaffeNet from scratch

In [2]:
args = ARGS(batch_size=32, epochs=50, lr=0.0001, save_at_end=True,
            save_freq=20, use_cuda=True, val_every=250, step_size=5, gamma=0.8)
model = CaffeNet()
optimizer = torch.optim.Adam(model.parameters(), args.lr)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, args.step_size, args.gamma)
test_ap, test_map = train(args, model, optimizer, scheduler, 'caffenet_mixup')
print('test map:', test_map)

test map: 0.3799856437694381


### Test mAP vs iteration
![MAP.svg](attachment:MAP.svg)

### Training loss vs iteration
![training%20loss.svg](attachment:training%20loss.svg)

### Learning rate vs iteration
![Learning%20Rate%20%281%29.svg](attachment:Learning%20Rate%20%281%29.svg)