# Q3: Even deeper! ResNet-18 for PASCAL classification (15 pts)

Hopefully we all got much better accuracy with the deeper model! Since 2012, much deeper architectures have been proposed, and [ResNet](https://arxiv.org/abs/1512.03385) is one of the most popular. In this task, we attempt to improve performance further with the “very deep” ResNet-18 architecture.


## 3.1 Build ResNet-18 (1 pt)
Write a network module for the ResNet-18 architecture (refer to the original paper). You can use `torchvision.models` for this section, so it should be very easy!
Do not load the pretrained weights for this question; we will get to that in the next question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models
import matplotlib.pyplot as plt
%matplotlib inline

import trainer
from utils import ARGS
from simple_cnn import SimpleCNN
from voc_dataset import VOCDataset


# You could write the whole class yourself...
# ...or use one line :D
# Calling models.resnet18() with no arguments builds a randomly initialized
# network, i.e. no pretrained weights are loaded.
ResNet = models.resnet18
```
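For anyone taking the "write the whole class" route, the following is a from-scratch sketch of ResNet-18 following the layout in the original paper: a 7x7 stem convolution, four stages of two basic residual blocks each (64, 128, 256, 512 channels), global average pooling, and a fully connected classifier, for 1 + 8 × 2 + 1 = 18 weighted layers. The class names `BasicBlock` and `ResNet18` are my own; the attribute names are chosen to mirror torchvision's so that parameter names such as `layer1.1.conv1.weight` and `layer4.0.bn2.bias` line up with the ones used later in this question. Treat it as an illustrative sketch rather than a drop-in replacement for `models.resnet18` (for example, it uses PyTorch's default weight initialization).

```python
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """Two 3x3 convolutions plus a shortcut connection (identity or 1x1 projection)."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # Project the shortcut whenever the spatial size or channel count changes.
        self.downsample = None
        if stride != 1 or in_ch != out_ch:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Residual addition, then the final ReLU of the block.
        return self.relu(out + identity)


class ResNet18(nn.Module):
    def __init__(self, num_classes: int = 20):
        super().__init__()
        # Stem: 7x7/2 convolution followed by 3x3/2 max pooling.
        self.conv1 = nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(3, stride=2, padding=1)
        # Four stages of two BasicBlocks; stages 2-4 halve the spatial size.
        self.layer1 = self._make_stage(64, 64, stride=1)
        self.layer2 = self._make_stage(64, 128, stride=2)
        self.layer3 = self._make_stage(128, 256, stride=2)
        self.layer4 = self._make_stage(256, 512, stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)

    @staticmethod
    def _make_stage(in_ch: int, out_ch: int, stride: int) -> nn.Sequential:
        return nn.Sequential(BasicBlock(in_ch, out_ch, stride), BasicBlock(out_ch, out_ch, 1))

    def forward(self, x):
        x = self.maxpool(self.relu(self.bn1(self.conv1(x))))
        x = self.layer4(self.layer3(self.layer2(self.layer1(x))))
        x = torch.flatten(self.avgpool(x), 1)
        return self.fc(x)
```

With a 224x224 input, `ResNet18(num_classes=20)(torch.randn(2, 3, 224, 224))` returns a tensor of shape `(2, 20)`, one score per PASCAL VOC class. Because the classifier is sized at construction time, this version would not need the `fc` replacement used with the torchvision constructor.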

## 3.2 Add TensorBoard Summaries (6 pts)
You should already have written TensorBoard summary code in `trainer.py` for Q1. However, you probably added only the most basic summaries. Please implement the more advanced summaries listed here:
* training loss (should be done)
* test mAP curves (should be done)
* learning rate
* [histogram of gradients](https://www.tensorflow.org/api_docs/python/tf/summary/histogram)
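One possible way to add the two new summaries is a small helper called from the training loop. The sketch below assumes a writer that behaves like PyTorch's `torch.utils.tensorboard.SummaryWriter` (which provides `add_scalar` and `add_histogram`); the function name `log_advanced_summaries` and the tag names `train/learning_rate` and `grad/<param>` are hypothetical choices, not part of the provided `trainer.py`. It reads the learning rate from the optimizer, since that value already reflects any scheduler updates.

```python
import torch
import torch.nn as nn


def log_advanced_summaries(writer, model: nn.Module, optimizer: torch.optim.Optimizer, step: int) -> None:
    """Hypothetical helper: log the current learning rate and one histogram per parameter gradient.

    `writer` is assumed to expose add_scalar/add_histogram like
    torch.utils.tensorboard.SummaryWriter. Call it after loss.backward()
    and before optimizer.zero_grad(), so that .grad is populated.
    """
    # The optimizer holds the live learning rate, including scheduler decay.
    writer.add_scalar("train/learning_rate", optimizer.param_groups[0]["lr"], step)
    for name, param in model.named_parameters():
        if param.grad is not None:
            # Detach and copy to CPU so logging never touches the autograd graph.
            writer.add_histogram(f"grad/{name}", param.grad.detach().cpu(), step)
```

In the training loop this might be called every `args.log_every` iterations as `log_advanced_summaries(writer, model, optimizer, iteration)`, with `writer = SummaryWriter()` created once before training. Logging every parameter's histogram on every iteration can slow training and bloat the event files, which is why tying it to the logging interval seems sensible.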

## 3.3 Train and Test (8 pts)
Start from the same hyperparameter settings as in Task 2 and train the model for 50 epochs, tuning the hyperparameters as needed to reach an mAP of around 0.5. Report TensorBoard screenshots for *all* of the summaries listed above (for image summaries, show screenshots at $n \geq 3$ iterations). For the histograms, include screenshots of the gradients of `layer1.1.conv1.weight` and `layer4.0.bn2.bias`.
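When the backbone is wrapped as `nn.Sequential(ResNet(), nn.Sigmoid())`, as in the training cell for this question, `named_parameters()` reports names with the Sequential index as a prefix, for example `0.layer1.1.conv1.weight` rather than `layer1.1.conv1.weight`. The sketch below (the helper `select_grads` and the `Toy` module are hypothetical stand-ins, used so the example runs without torchvision) picks out gradients by matching the bare names as dotted suffixes.

```python
import torch
import torch.nn as nn


def select_grads(model: nn.Module, wanted: list[str]) -> dict[str, torch.Tensor]:
    """Hypothetical helper: return {full_parameter_name: gradient} for the requested names."""
    found = {}
    for name, param in model.named_parameters():
        # Accept an exact match or a match after a prefix such as "0.".
        if param.grad is not None and any(name == w or name.endswith("." + w) for w in wanted):
            found[name] = param.grad.detach().cpu()
    return found


class Toy(nn.Module):
    """Hypothetical stand-in that exposes a torchvision-style 'layer1' submodule."""

    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))

    def forward(self, x):
        return self.layer1(x)


model = nn.Sequential(Toy(), nn.Sigmoid())
model(torch.randn(2, 4)).sum().backward()
print(list(select_grads(model, ["layer1.1.weight"])))  # ['0.layer1.1.weight']
```

With the real model, the same call would presumably be `select_grads(model, ["layer1.1.conv1.weight", "layer4.0.bn2.bias"])`, and the histogram tags in TensorBoard would carry the same `0.` prefix.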

**REMEMBER TO SAVE A MODEL AT THE END OF TRAINING**
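A common PyTorch pattern for saving a trained model is to store its `state_dict()` (the learned tensors) and rebuild the architecture in code when loading. The following is a minimal sketch of that round trip; whether `trainer.train` already saves a checkpoint is not shown here, and the helper names and the file name `resnet_scratch.pth` are hypothetical. A small stand-in model is used so the example runs quickly.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(model: nn.Module, path: str) -> None:
    # Save only the parameters and buffers; the architecture is rebuilt in code.
    torch.save(model.state_dict(), path)


def load_checkpoint(model: nn.Module, path: str) -> nn.Module:
    # map_location="cpu" lets a GPU-trained checkpoint load on a CPU-only machine.
    model.load_state_dict(torch.load(path, map_location="cpu"))
    return model


# Round trip on a small stand-in; the real model would be the ResNet from 3.3.
src = nn.Sequential(nn.Linear(512, 20), nn.Sigmoid())
path = os.path.join(tempfile.mkdtemp(), "resnet_scratch.pth")
save_checkpoint(src, path)
dst = load_checkpoint(nn.Sequential(nn.Linear(512, 20), nn.Sigmoid()), path)
```

Saving the `state_dict` rather than pickling the whole module keeps the checkpoint independent of the exact class definitions and import paths in the notebook.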

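```python
# Adam with a StepLR decay: the learning rate is multiplied by gamma every args.step_size scheduler steps.
args = ARGS(lr=0.001, gamma=0.875, epochs=50, log_every=250, val_every=250,
            test_batch_size=512, batch_size=32, use_cuda=True)
# Randomly initialized ResNet-18 with a sigmoid head for multi-label classification.
model = nn.Sequential(ResNet(), nn.Sigmoid())
# Replace the 1000-way ImageNet classifier with a 20-way layer for the PASCAL VOC classes.
model[0].fc = nn.Linear(512, 20)
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, args.step_size, gamma=args.gamma)
test_ap, test_map = trainer.train(args, model, optimizer, scheduler, model_name='resnet_scratch')
print('test map:', test_map)
```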

```
0.08056013123039628
0.19369932969348108
0.24129273070398694
0.2914690526842999
0.32696730589792555
0.33650263841865513
0.3896580987865251
0.41077113781918184
0.4309862924880952
0.43199099463757645
0.45671356130161955
0.46101131060291795
0.48304875963039784
0.4789194153899964
0.48592370297543336
0.48686607085233746
0.49475154572567054
0.4941897346300742
0.49501151668474
0.4953016367471147
0.4958042838856202
0.49785620074452286
0.49991952186519917
0.5015002885558572
0.49734695843173177
0.4984315046008737
0.5004873684450372
0.5004997066357303
0.500199641332797
0.5022052840910909
0.5006379439974482
0.501648418937741
test map: 0.5105096170848368
```
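The learning-rate curve logged in TensorBoard should follow the StepLR schedule configured in the training cell, which in closed form is $\mathrm{lr}_t = 0.001 \cdot 0.875^{\lfloor t / \mathrm{step\_size} \rfloor}$, where $t$ counts scheduler steps. The actual value of `args.step_size` comes from the `ARGS` defaults and is not shown, and whether `trainer.train` steps the scheduler per epoch or per iteration is also not shown, so the sketch below assumes a hypothetical `STEP_SIZE = 5` and one step per epoch. It checks PyTorch's `StepLR` against the closed form.

```python
import math

import torch
import torch.nn as nn

STEP_SIZE = 5  # hypothetical; the notebook uses args.step_size, whose value is not shown

net = nn.Linear(512, 20)
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, STEP_SIZE, gamma=0.875)

lrs = []
for t in range(50):  # assumption: one scheduler.step() per epoch, 50 epochs
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()  # no gradients exist, so Adam leaves the parameters untouched
    scheduler.step()

# Closed form: lr_t = 0.001 * 0.875 ** (t // STEP_SIZE)
closed = [0.001 * 0.875 ** (t // STEP_SIZE) for t in range(50)]
print(all(math.isclose(a, b) for a, b in zip(lrs, closed)))  # True
```

Under these assumptions the learning rate in the last epoch is $0.001 \cdot 0.875^{9} \approx 3.0 \times 10^{-4}$, so the schedule decays gently rather than collapsing; a smaller `step_size` or `gamma` would decay it faster.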


![q3_tensorboard_summary.png](attachment:q3_tensorboard_summary.png)

![q3_layer1.png](attachment:q3_layer1.png)

![q3_layer4.png](attachment:q3_layer4.png)