# Q4 Shoulders of Giants (15 points)
As we have already seen, deep networks can be hard to optimize, and they often overfit heavily on small training sets. Many approaches have been proposed to counter this, e.g., [Krahenbuhl et al. (ICLR’16)](http://arxiv.org/pdf/1511.06856.pdf), self-supervised learning, etc. However, the most effective approach remains pre-training the network on large, well-labeled supervised datasets such as ImageNet.

While training on the full ImageNet data is beyond the scope of this assignment, people have already trained many popular/standard models and released them online. In this task, we will initialize a ResNet-18 model with pre-trained ImageNet weights (from `torchvision`), and fine-tune the network for PASCAL classification.

## 4.1 Load Pre-trained Model (7 pts)
Load the pre-trained weights up to the second-to-last layer, and initialize the last layer (the one that outputs the class scores) from scratch.

The model loading mechanism is based on names of the weights. It is easy to load pretrained models from `torchvision.models`, even when your model uses different names for weights. Please briefly explain how to load the weights correctly if the names do not match ([hint](https://discuss.pytorch.org/t/loading-weights-from-pretrained-model-with-different-module-names/11841)).

**YOUR ANSWER HERE**
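
One common way to handle mismatched names is to rename the keys of the source `state_dict` before calling `load_state_dict`. The sketch below illustrates the idea on two tiny stand-in modules rather than on ResNet-18; `SourceNet`, `TargetNet`, and `remap_state_dict` are hypothetical names introduced only for this illustration, and the prefix mapping is an assumption about how the two models correspond.

```python
from collections import OrderedDict

import torch
import torch.nn as nn


class SourceNet(nn.Module):
    """Stand-in for a pre-trained model (hypothetical, for illustration)."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 4, kernel_size=3)
        self.fc = nn.Linear(4, 10)


class TargetNet(nn.Module):
    """Stand-in for our own model, which uses different attribute names."""
    def __init__(self):
        super().__init__()
        self.backbone_conv = nn.Conv2d(3, 4, kernel_size=3)
        self.classifier = nn.Linear(4, 20)  # new head, trained from scratch


def remap_state_dict(state_dict, prefix_map, skip_prefixes=()):
    """Rename key prefixes and drop keys that should not be loaded."""
    remapped = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(tuple(skip_prefixes)):
            continue  # e.g. the old classification layer
        new_key = key
        for old_prefix, new_prefix in prefix_map.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        remapped[new_key] = value
    return remapped


source, target = SourceNet(), TargetNet()
new_state = remap_state_dict(
    source.state_dict(),
    prefix_map={"conv1.": "backbone_conv."},
    skip_prefixes=("fc.",),
)
# strict=False tolerates the classifier keys that are absent from new_state.
result = target.load_state_dict(new_state, strict=False)
print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)
```

Inspecting `missing_keys` and `unexpected_keys` after loading is a cheap way to confirm that only the layers meant to be trained from scratch were left uninitialized.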

In [8]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models
import matplotlib.pyplot as plt
%matplotlib inline

import trainer
from utils import ARGS
from simple_cnn import SimpleCNN
from voc_dataset import VOCDataset


# Pre-trained weights up to second-to-last layer
# final layers should be initialized from scratch!
class PretrainedResNet(nn.Module):
    def __init__(self):
        super().__init__()
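        # Load ImageNet-pretrained ResNet-18 (newer torchvision versions prefer the `weights=` argument).
        self.model = models.resnet18(pretrained=True)
        # Keep everything except the original 1000-way fc layer.
        self.model = nn.Sequential(*list(self.model.children())[:-1])
        self.flat_dim = 512  # ResNet-18 feature size after global average pooling
        self.fc = nn.Linear(self.flat_dim, 20)  # new head: 20 PASCAL VOC classes, randomly initialized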
    
    def forward(self, x):
        N = x.size(0)
        x = self.model(x)
        flat_x = x.view(N, self.flat_dim)
        out = self.fc(flat_x)
        return out

Train the model with a similar hyperparameter setup as in the scratch case. No need to freeze the loaded weights. Show the learning curves (training loss, testing MAP) for 10 epochs. Please evaluate your model to calculate the MAP on the testing dataset every 100 iterations. Also feel free to tune the hyperparameters to improve performance.
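
For reference, mean average precision can be computed per class and then averaged. The sketch below is a minimal NumPy version using non-interpolated average precision; it is an assumption about the metric's general form, and the evaluation code provided with the assignment may use a different variant (for example an interpolated one). The function names are hypothetical.

```python
import numpy as np


def average_precision(scores, labels):
    """Non-interpolated AP for one class: mean precision at each positive."""
    order = np.argsort(-scores)              # rank predictions, highest score first
    hits = labels[order].astype(bool)
    if not hits.any():
        return float("nan")                  # class has no positives
    cum_hits = np.cumsum(hits)
    precision_at_k = cum_hits / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())


def mean_average_precision(scores, labels):
    """Average the per-class APs, ignoring classes without positives."""
    aps = [average_precision(scores[:, c], labels[:, c])
           for c in range(scores.shape[1])]
    return float(np.nanmean(aps))


# Toy example: 4 images, 2 classes.
scores = np.array([[0.9, 0.1],
                   [0.8, 0.7],
                   [0.7, 0.2],
                   [0.6, 0.9]])
labels = np.array([[1, 0],
                   [0, 1],
                   [1, 0],
                   [0, 1]])
ap_class0 = average_precision(scores[:, 0], labels[:, 0])
map_value = mean_average_precision(scores, labels)
print(ap_class0, map_value)
```

In the toy data, class 0 has positives at ranks 1 and 3, giving precisions 1 and 2/3, while class 1 ranks both positives first.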

**REMEMBER TO SAVE MODEL AT END OF TRAINING**

In [None]:
args = ARGS(epochs=50, batch_size=32, lr=0.0001, log_every=250, val_every=250, use_cuda=True)
model = PretrainedResNet()
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=args.step_size, gamma=args.gamma)
test_ap, test_map = trainer.train(args, model, optimizer, scheduler, 'ResNet_Pretrained_True')
print('test map:', test_map)

Validation MAP =  0.07555371783464107
Validation MAP =  0.803977872501467
Validation MAP =  0.8221376870145249
Validation MAP =  0.8257848303149841
Validation MAP =  0.8255479901556283
Validation MAP =  0.8258925095347417
Validation MAP =  0.8255822846764772
Validation MAP =  0.8251914919560971


**YOUR TENSORBOARD SCREENSHOTS HERE**