# [Deep Learning](https://www.cc.gatech.edu/~hays/compvision/proj6/)

## Setup

In [1]:
%matplotlib notebook
%load_ext autoreload
%autoreload 2
import cv2
import numpy as np
import random
import torch.nn as nn
import torch.optim as optim
import os.path as osp
import matplotlib.pyplot as plt
from utils import *
import student_code as sc
from torchvision.models import alexnet

data_path = osp.join('../data', '15SceneData')
num_classes = 15

# If you have a good Nvidia GPU with an appropriate environment,
# try setting the use_GPU flag to True. The environment provided does
# not support GPUs, and we will not provide any support for GPU
# computation in this project. Please also note that
# we will evaluate your implementations only in CPU mode, so even if
# you use a GPU, make sure your code runs in CPU mode using the
# environment we provided.
use_GPU = False
if use_GPU:
    from utils_gpu import *

To train a network in PyTorch, we need 4 components:
1. Dataset - an object which can load the data and labels given an index.
2. Model - an object that contains the network architecture definition.
3. Loss function - a function that measures how far the network output is from the ground truth label.
4. Optimizer - an object that optimizes the network parameters to reduce the loss value.
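As a rough illustration of how these four pieces fit together, here is a dependency-free toy in plain Python rather than PyTorch. Every name in it (`ToyDataset`, `ToyModel`, `squared_error`, `ToySGD`, `train`) is hypothetical and not part of `utils` or `student_code`; the real project uses PyTorch datasets, `nn.Module` models, `nn.CrossEntropyLoss`, and `optim.SGD` instead.

```python
import random


class ToyDataset:
    """Dataset: returns (input, label) for a given index."""

    def __init__(self, n=20, seed=0):
        rng = random.Random(seed)
        self.xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        self.ys = [3.0 * x for x in self.xs]  # ground truth is y = 3x

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, idx):
        return self.xs[idx], self.ys[idx]


class ToyModel:
    """Model: a single learnable weight, prediction = w * x."""

    def __init__(self):
        self.w = 0.0

    def __call__(self, x):
        return self.w * x


def squared_error(pred, target):
    """Loss function: how far the prediction is from the label."""
    return (pred - target) ** 2


class ToySGD:
    """Optimizer: moves the weight against the gradient of the loss."""

    def __init__(self, model, lr):
        self.model = model
        self.lr = lr

    def step(self, grad_w):
        self.model.w -= self.lr * grad_w


def train(n_epochs=50, lr=0.1):
    dataset, model = ToyDataset(), ToyModel()
    optimizer = ToySGD(model, lr)
    for _ in range(n_epochs):
        for i in range(len(dataset)):
            x, y = dataset[i]
            pred = model(x)
            loss = squared_error(pred, y)   # value shown for completeness
            grad_w = 2.0 * (pred - y) * x   # d(loss)/dw, derived by hand
            optimizer.step(grad_w)
    return model.w


learned_w = train()  # should end up close to the true slope 3.0
```

In PyTorch the hand-derived gradient is replaced by `loss.backward()`, and the provided `Trainer` class presumably wraps a loop of this shape with batching, validation, and checkpointing on top.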

This project has two main parts. In Part 1, you will train a deep network from scratch. In Part 2, you will "fine-tune" a trained network. 

## Part 0. Warm up! Training a Deep Network from Scratch

In [2]:
# Fix random seeds so that results will be reproducible
set_seed(0, use_GPU)

You do not need to code anything for this part. You will simply run the code we provided, but we want you to report the result you got. This section will also familiarize you with the steps of training a deep network from scratch. 

In [2]:
# Training parameters.
input_size = (64, 64)
RGB = False  
base_lr = 1e-2  # may try a smaller lr if not using batch norm
weight_decay = 5e-4
momentum = 0.9

We will first create our datasets by calling the create_datasets function from student_code. This function returns a separate dataset loader for each of the training and the testing/validation datasets. Each dataset loader is then used to load its dataset after applying some pre-processing transforms. In Part 1, you will be asked to add a few more pre-processing transforms to the dataset loaders by modifying this function.

In [4]:
# Create the training and testing datasets.
train_dataset, test_dataset = sc.create_datasets(data_path=data_path, input_size=input_size, rgb=RGB)
assert test_dataset.classes == train_dataset.classes

Computing pixel mean and stdev...
Batch 0 / 30
Batch 20 / 30
Done, mean = 
[0.45579668]
std = 
[0.23624939]
Computing pixel mean and stdev...
Batch 0 / 60
Batch 20 / 60
Batch 40 / 60
Done, mean = 
[0.45517009]
std = 
[0.2350788]


Now we will create our network model using the SimpleNet class from student_code. The implementation provided in the SimpleNet class gives you a basic network. In Part 1, you will be asked to add a few more layers to this network.

In [5]:
# Create the network model.
model = sc.SimpleNet(num_classes=num_classes, rgb=False, verbose=False)
if use_GPU:
    model = model.cuda()
print(model)

SimpleNet(
  (features): Sequential(
    (0): Conv2d(1, 10, kernel_size=(9, 9), stride=(1, 1), bias=False)
    (1): MaxPool2d(kernel_size=7, stride=7, padding=0, dilation=1, ceil_mode=False)
    (2): ReLU()
  )
  (classifier): Conv2d(10, 15, kernel_size=(8, 8), stride=(1, 1))
)
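The printout above can be checked with the usual output-size formula for convolution and pooling layers without dilation, `floor((size + 2 * padding - kernel) / stride) + 1`. The helper `out_size` below is a hypothetical name used only for this sketch; it traces a 64x64 input through the three layers and shows why the classifier kernel is 8x8.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or max-pool layer (floor mode, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


input_side = 64                                      # input_size = (64, 64)
after_conv = out_size(input_side, kernel=9)          # Conv2d 9x9, stride 1
after_pool = out_size(after_conv, kernel=7, stride=7)  # MaxPool2d 7, stride 7
after_classifier = out_size(after_pool, kernel=8)    # classifier Conv2d 8x8

print(after_conv, after_pool, after_classifier)  # 56 8 1
```

Because the feature map entering the classifier is 8x8, an 8x8 classifier kernel collapses it to a single spatial position with 15 channels, one score per class.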


Next we will create the loss function and the optimizer. 

In [6]:
# Create the loss function.
# see http://pytorch.org/docs/0.3.0/nn.html#loss-functions for a list of available loss functions
loss_function = nn.CrossEntropyLoss()
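
For reference, `nn.CrossEntropyLoss` applies a log-softmax to the raw network outputs and then takes the negative log-likelihood of the correct class, averaged over the batch by default. The single-example sketch below reproduces that computation in plain Python; `cross_entropy` is a hypothetical helper, not a PyTorch function.

```python
import math


def cross_entropy(logits, label):
    """Negative log of the softmax probability assigned to the true class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


uniform_loss = cross_entropy([0.0] * 15, label=4)       # no class preferred
confident_loss = cross_entropy([10.0, 0.0, 0.0], label=0)  # strongly correct
```

A network that assigns equal scores to all 15 classes has a loss of ln(15), about 2.71, which is a useful reference point when reading the training logs.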

In [7]:
# Create the optimizer and a learning rate scheduler
optimizer = optim.SGD(params=model.parameters(), lr=base_lr, weight_decay=weight_decay, momentum=momentum)
# Currently a simple step scheduler.
# See http://pytorch.org/docs/0.3.0/optim.html#how-to-adjust-learning-rate for various LR schedulers
# and how to use them
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.1)
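
`StepLR` multiplies the learning rate by `gamma` every `step_size` epochs. Assuming the scheduler is stepped once per epoch (the `Trainer` internals are not shown here, so this is an assumption), the schedule for the settings above can be sketched in closed form. `step_lr` is a hypothetical helper.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate at a given epoch under a step schedule."""
    return base_lr * gamma ** (epoch // step_size)


# Settings from the cell above: base_lr = 1e-2, step_size = 60, gamma = 0.1
schedule = [step_lr(1e-2, epoch, step_size=60, gamma=0.1) for epoch in range(100)]
epochs_at_base_lr = sum(1 for lr in schedule if abs(lr - 1e-2) < 1e-12)
```

With `n_epochs = 100` this gives 60 epochs at 1e-2 followed by 40 epochs at 1e-3.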

Finally we are ready to train our network! We will start a local server to see the training progress of our network. Open a new terminal and activate the environment for this project. Then run the following command: ***"python -m visdom.server"***. This will start a local server. The terminal output should print a link like "http://localhost:8097". Open this link in your browser. After you run the following block, visit this link again, and you will be able to see graphs showing the progress of your training! If you do not see any graphs, select Part 1 on the top left bar where it says Environment (only select Part 1, do not check main or Part 2).

In [9]:
# train the network!
params = {'n_epochs': 100, 'batch_size': 50, 'experiment': 'part1'}
trainer = Trainer(train_dataset, test_dataset, model, loss_function, optimizer, lr_scheduler, params)
best_prec1 = trainer.train_val()
print('Best top-1 Accuracy = {:4.3f}'.format(best_prec1))
#Best so far: 36.832 (included in report)

---------------------------------------
Experiment: part1
resume_optim: True
experiment: part1
checkpoint_file: None
num_workers: 4
shuffle: True
batch_size: 50
print_freq: 100
n_epochs: 100
do_val: True
val_freq: 1
---------------------------------------
part1 Epoch 0 / 100
train part1: batch 0/29, loss 0.519, top-1 accuracy 86.000, top-5 accuracy 96.000
train part1: loss 0.419282
val part1: batch 0/59, loss 4.863, top-1 accuracy 18.000, top-5 accuracy 46.000
val part1: loss 2.913104
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part1 Epoch 1 / 100
train part1: batch 0/29, loss 0.448, top-1 accuracy 88.000, top-5 accuracy 100.000
train part1: loss 0.417710
val part1: batch 0/59, loss 4.490, top-1 accuracy 18.000, top-5 accuracy 56.000
val part1: loss 2.935968
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part1 Epoch 2 / 100
train part1: batch 0/29, loss 0.451, top-1 accuracy 94.000, top-5 accuracy 98.000
train part1: loss 0.416379
val part1: batch 0/59, loss 4.483, top-1 accuracy 18.000, to

train part1: batch 0/29, loss 0.410, top-1 accuracy 94.000, top-5 accuracy 100.000
train part1: loss 0.372401
val part1: batch 0/59, loss 4.629, top-1 accuracy 18.000, top-5 accuracy 56.000
val part1: loss 3.001126
Checkpoint saved
part1 Epoch 32 / 100
train part1: batch 0/29, loss 0.316, top-1 accuracy 94.000, top-5 accuracy 100.000
train part1: loss 0.372172
val part1: batch 0/59, loss 4.624, top-1 accuracy 18.000, top-5 accuracy 56.000
val part1: loss 3.001546
Checkpoint saved
part1 Epoch 33 / 100
train part1: batch 0/29, loss 0.473, top-1 accuracy 90.000, top-5 accuracy 100.000
train part1: loss 0.372061
val part1: batch 0/59, loss 4.598, top-1 accuracy 18.000, top-5 accuracy 58.000
val part1: loss 3.001674
Checkpoint saved
part1 Epoch 34 / 100
train part1: batch 0/29, loss 0.330, top-1 accuracy 94.000, top-5 accuracy 100.000
train part1: loss 0.371917
val part1: batch 0/59, loss 4.592, top-1 accuracy 18.000, top-5 accuracy 56.000
val part1: loss 3.003110
Checkpoint saved
part1 Epo

val part1: batch 0/59, loss 4.633, top-1 accuracy 18.000, top-5 accuracy 58.000
val part1: loss 3.014991
Checkpoint saved
part1 Epoch 64 / 100
train part1: batch 0/29, loss 0.317, top-1 accuracy 98.000, top-5 accuracy 100.000
train part1: loss 0.367052
val part1: batch 0/59, loss 4.576, top-1 accuracy 18.000, top-5 accuracy 58.000
val part1: loss 3.016247
Checkpoint saved
part1 Epoch 65 / 100
train part1: batch 0/29, loss 0.256, top-1 accuracy 96.000, top-5 accuracy 100.000
train part1: loss 0.366855
val part1: batch 0/59, loss 4.675, top-1 accuracy 18.000, top-5 accuracy 56.000
val part1: loss 3.012013
Checkpoint saved
part1 Epoch 66 / 100
train part1: batch 0/29, loss 0.310, top-1 accuracy 98.000, top-5 accuracy 100.000
train part1: loss 0.366448
val part1: batch 0/59, loss 4.628, top-1 accuracy 18.000, top-5 accuracy 58.000
val part1: loss 3.014980
Checkpoint saved
part1 Epoch 67 / 100
train part1: batch 0/29, loss 0.385, top-1 accuracy 92.000, top-5 accuracy 100.000
train part1: lo

train part1: batch 0/29, loss 0.488, top-1 accuracy 86.000, top-5 accuracy 98.000
train part1: loss 0.362929
val part1: batch 0/59, loss 4.645, top-1 accuracy 18.000, top-5 accuracy 58.000
val part1: loss 3.020581
Checkpoint saved
part1 Epoch 97 / 100
train part1: batch 0/29, loss 0.437, top-1 accuracy 90.000, top-5 accuracy 98.000
train part1: loss 0.362864
val part1: batch 0/59, loss 4.641, top-1 accuracy 18.000, top-5 accuracy 58.000
val part1: loss 3.020832
Checkpoint saved
part1 Epoch 98 / 100
train part1: batch 0/29, loss 0.372, top-1 accuracy 94.000, top-5 accuracy 98.000
train part1: loss 0.362878
val part1: batch 0/59, loss 4.647, top-1 accuracy 18.000, top-5 accuracy 58.000
val part1: loss 3.020657
Checkpoint saved
part1 Epoch 99 / 100
train part1: batch 0/29, loss 0.231, top-1 accuracy 96.000, top-5 accuracy 100.000
train part1: loss 0.362833
val part1: batch 0/59, loss 4.644, top-1 accuracy 18.000, top-5 accuracy 58.000
val part1: loss 3.020859
Checkpoint saved
Best top-1 A

Expect this code to take around 5 minutes if not using a GPU (with a GPU it may run in around 3 minutes). Now you are ready to actually modify the functions we used to train our model. Before you move on, make sure to record the accuracy of your network from Part 0, and report it in your write-up.
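
The training log reports top-1 and top-5 accuracy. A minimal sketch of how such numbers are commonly computed is given below; `topk_accuracy` is a hypothetical helper and is not necessarily how `Trainer` implements it. A prediction counts as correct under top-k if the true label is among the k highest-scoring classes.

```python
def topk_accuracy(scores, labels, k=1):
    """Percentage of examples whose true label is among the k best scores."""
    correct = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        correct += label in ranked[:k]
    return 100.0 * correct / len(labels)


# Three examples, three classes (made-up scores for illustration only).
scores = [[0.1, 0.7, 0.2],
          [0.5, 0.3, 0.2],
          [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
top1 = topk_accuracy(scores, labels, k=1)  # only the first example is right
top2 = topk_accuracy(scores, labels, k=2)  # the second one is rescued by k=2
```

For 15 classes, uniformly random guessing gives about 6.7% top-1 and 33% top-5 accuracy in expectation, so those are the baselines to compare against.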

## Part 1. Modifying the Dataloaders and the Simple Network

In [3]:
# Fix random seeds so that results will be reproducible
set_seed(0, use_GPU)

Now you will modify the create_datasets function from the student_code. You will add random left-right mirroring and normalization to the transformations applied to the training dataset. You will add normalization to the transformations applied to the testing dataset. 
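
The two transforms can be sketched in plain Python on an image stored as a list of rows. This is only an illustration of what the operations do: `random_hflip` and `normalize` are hypothetical helpers, and in `create_datasets` you would more likely compose the torchvision counterparts `transforms.RandomHorizontalFlip` and `transforms.Normalize`. The mean and standard deviation are the training-set values printed by the notebook in Part 0.

```python
import random


def random_hflip(image, rng, p=0.5):
    """Mirror the image left-right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image


def normalize(image, mean, std):
    """Shift and scale every pixel: (pixel - mean) / std."""
    return [[(px - mean) / std for px in row] for row in image]


rng = random.Random(0)
image = [[0.0, 1.0],
         [0.5, 0.25]]

train_mean, train_std = 0.45579668, 0.23624939  # from the notebook output
train_sample = normalize(random_hflip(image, rng), train_mean, train_std)
test_sample = normalize(image, train_mean, train_std)  # no mirroring at test time
```

Mirroring is applied only to the training set because it is a form of data augmentation, while normalization is applied to both sets so that the network sees inputs on the same scale at test time.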

In [4]:
# Create the training and testing datasets.
train_dataset, test_dataset = sc.create_datasets(data_path=data_path, input_size=input_size, rgb=RGB)
assert test_dataset.classes == train_dataset.classes

Computing pixel mean and stdev...
Batch 0 / 30
Batch 20 / 30
Done, mean = 
[0.45579668]
std = 
[0.23624939]
Computing pixel mean and stdev...
Batch 0 / 60
Batch 20 / 60
Batch 40 / 60
Done, mean = 
[0.45517009]
std = 
[0.2350788]


Now you will modify the SimpleNet by adding dropout, batch normalization, and additional convolution/maxpool/relu layers. You should exceed an accuracy of **50%**. Make sure your network passes this threshold; it is required for full credit on this section!
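
Of the three additions, dropout is the simplest to write down. The sketch below shows the inverted-dropout behaviour that `nn.Dropout` follows: during training each activation is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`, while in evaluation mode the input passes through unchanged. The function `dropout` and its `rng` argument are hypothetical names for this illustration.

```python
import random


def dropout(values, p, training, rng):
    """Inverted dropout on a flat list of activations."""
    if not training or p == 0.0:
        return list(values)            # evaluation mode: identity
    if p >= 1.0:
        return [0.0 for _ in values]   # everything dropped
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, p=0.5, training=True, rng=rng)
eval_out = dropout(activations, p=0.5, training=False, rng=rng)
```

The rescaling keeps the expected value of each activation the same in training and evaluation, which is why no extra correction is needed at test time.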

You can also use the following two blocks to determine the structure of your network.

In [19]:
# create the network model
model = sc.SimpleNet(num_classes=num_classes, rgb=False, verbose=False)
if use_GPU:
    model = model.cuda()
print(model)

SimpleNet(
  (features): Sequential(
    (0): Conv2d(1, 10, kernel_size=(7, 7), stride=(1, 1), bias=False)
    (1): BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): MaxPool2d(kernel_size=4, stride=3, padding=0, dilation=1, ceil_mode=False)
    (3): ReLU()
    (4): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (5): BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
    (7): ReLU()
    (8): Dropout(p=0.5)
  )
  (classifier): Conv2d(10, 15, kernel_size=(5, 5), stride=(1, 1))
)


In [20]:
# Use this block to determine the kernel size of the conv2d layer in the classifier.
# First, set the kernel size of that conv2d layer to 1, and run this block.
# Then, use the size of the input to the classifier printed by this block to
# go back and update the kernel size of the conv2d layer in the classifier.
# Finally, run this block again and verify that the network output has no
# spatial dimensions left (its size should be [num_classes]).
# Don't forget to re-run the block above every time you update the SimpleNet class!
from torch.autograd import Variable
data, _ = train_dataset[0]
s = data.size()
data = Variable(data.view(1, *s))
if use_GPU:
    data = data.cuda()
out = model(data)
print('Network output size is', out.size())

Network output size is torch.Size([15])


Next we will create the loss function and the optimizer. You do not have to modify custom_part1_trainer in student_code if you use the same loss_function, optimizer, scheduler and parameters (n_epochs, batch_size etc.) as provided in this notebook to hit the required threshold of 50% accuracy. If you changed any of these values, it is important that you modify this function in student_code, since your submission will be autograded (we are not going to open your notebook).

In [21]:
# Set up the trainer. You can modify custom_part1_trainer in
# student_code.py if you want to try different learning settings.
custom_part1_trainer = sc.custom_part1_trainer(model)

if custom_part1_trainer is None:
    # Create the loss function.
    # see http://pytorch.org/docs/0.3.0/nn.html#loss-functions for a list of available loss functions
    loss_function = nn.CrossEntropyLoss()

    # Create the optimizer and a learning rate scheduler.
    optimizer = optim.SGD(params=model.parameters(), lr=base_lr, weight_decay=weight_decay, momentum=momentum)
    # Currently a simple step scheduler, but you can get creative.
    # See http://pytorch.org/docs/0.3.0/optim.html#how-to-adjust-learning-rate for various LR schedulers
    # and how to use them
    lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.1)

    params = {'n_epochs': 100, 'batch_size': 50, 'experiment': 'part1'}
    
else:
    if 'loss_function' in custom_part1_trainer:
        loss_function = custom_part1_trainer['loss_function']
    if 'optimizer' in custom_part1_trainer:
        optimizer = custom_part1_trainer['optimizer']
    if 'lr_scheduler' in custom_part1_trainer:
        lr_scheduler = custom_part1_trainer['lr_scheduler']
    if 'params' in custom_part1_trainer:
        params = custom_part1_trainer['params']
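
The `else` branch above copies only the keys that the returned dictionary actually contains. A closely related pattern, sketched here with placeholder strings instead of real loss and optimizer objects, merges the custom dictionary over a full set of defaults. `resolve_settings` is a hypothetical helper and differs from the cell above in one respect: any key missing from the custom dictionary falls back to the default rather than to whatever value an earlier cell left in the variable.

```python
def resolve_settings(defaults, custom):
    """Return the defaults, overridden by whatever keys custom provides."""
    if custom is None:
        return dict(defaults)
    return {**defaults, **custom}


# Placeholder strings stand in for the real PyTorch objects.
defaults = {
    "loss_function": "CrossEntropyLoss",
    "optimizer": "SGD(lr=1e-2)",
    "lr_scheduler": "StepLR(step_size=60, gamma=0.1)",
    "params": {"n_epochs": 100, "batch_size": 50, "experiment": "part1"},
}

# Hypothetical custom settings that change only the epoch count.
custom = {"params": {"n_epochs": 150, "batch_size": 50, "experiment": "part1"}}

settings = resolve_settings(defaults, custom)
```

If your custom trainer returns only some of the four keys, double-check that the remaining variables in the notebook still hold the values you intend.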

We are ready to train our network! As before, we will start a local server to see the training progress of our network (if your server is already running, you should not start another one). Open a new terminal and activate the environment for this project. Then run the following command: ***"python -m visdom.server"***. This will start a local server. The terminal output should print a link like "http://localhost:8097". Open this link in your browser. After you run the following block, visit this link again, and you will be able to see graphs showing the progress of your training! If you do not see any graphs, select Part 1 on the top left bar where it says Environment (only select Part 1, do not check main or Part 2).

In [22]:
# Train the network!
trainer = Trainer(train_dataset, test_dataset, model, loss_function, optimizer, lr_scheduler, params)
best_prec1 = trainer.train_val()
print('Best top-1 Accuracy = {:4.3f}'.format(best_prec1))

#P1 - 44.054 (jittering)
#P2 - 49.414 (jittering + normalization)
#P3 - 60.335 (jittering + normalization + dropout)
#P4 - 54.908 (jittering + normalization + dropout + deeper model)
#P5 - 59.866 (jittering + normalization + dropout + deeper model + batch normalization)

---------------------------------------
Experiment: part1
val_freq: 1
checkpoint_file: None
do_val: True
num_workers: 4
batch_size: 50
n_epochs: 100
shuffle: True
print_freq: 100
resume_optim: True
experiment: part1
---------------------------------------
part1 Epoch 0 / 100
train part1: batch 0/29, loss 2.751, top-1 accuracy 8.000, top-5 accuracy 38.000
train part1: loss 2.614287
val part1: batch 0/59, loss 1.698, top-1 accuracy 58.000, top-5 accuracy 86.000
val part1: loss 2.481616
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part1 Epoch 1 / 100
train part1: batch 0/29, loss 2.159, top-1 accuracy 36.000, top-5 accuracy 72.000
train part1: loss 2.302848
val part1: batch 0/59, loss 2.359, top-1 accuracy 16.000, top-5 accuracy 56.000
val part1: loss 2.140824
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part1 Epoch 2 / 100
train part1: batch 0/29, loss 2.332, top-1 accuracy 28.000, top-5 accuracy 64.000
train part1: loss 2.144364
val part1: batch 0/59, loss 2.131, top-1 accuracy 24.000, top-

train part1: batch 0/29, loss 1.335, top-1 accuracy 48.000, top-5 accuracy 96.000
train part1: loss 1.457010
val part1: batch 0/59, loss 1.946, top-1 accuracy 20.000, top-5 accuracy 88.000
val part1: loss 1.481050
Checkpoint saved
part1 Epoch 31 / 100
train part1: batch 0/29, loss 1.438, top-1 accuracy 54.000, top-5 accuracy 90.000
train part1: loss 1.476745
val part1: batch 0/59, loss 1.598, top-1 accuracy 38.000, top-5 accuracy 92.000
val part1: loss 1.470819
Checkpoint saved
part1 Epoch 32 / 100
train part1: batch 0/29, loss 1.324, top-1 accuracy 60.000, top-5 accuracy 96.000
train part1: loss 1.488961
val part1: batch 0/59, loss 1.866, top-1 accuracy 22.000, top-5 accuracy 82.000
val part1: loss 1.415109
Checkpoint saved
part1 Epoch 33 / 100
train part1: batch 0/29, loss 1.355, top-1 accuracy 52.000, top-5 accuracy 92.000
train part1: loss 1.408670
val part1: batch 0/59, loss 2.281, top-1 accuracy 8.000, top-5 accuracy 72.000
val part1: loss 1.354949
Checkpoint saved
BEST TOP1 ACCU

train part1: loss 1.238643
val part1: batch 0/59, loss 1.746, top-1 accuracy 28.000, top-5 accuracy 94.000
val part1: loss 1.284885
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part1 Epoch 63 / 100
train part1: batch 0/29, loss 1.295, top-1 accuracy 50.000, top-5 accuracy 96.000
train part1: loss 1.221270
val part1: batch 0/59, loss 1.804, top-1 accuracy 26.000, top-5 accuracy 88.000
val part1: loss 1.275647
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part1 Epoch 64 / 100
train part1: batch 0/29, loss 1.368, top-1 accuracy 50.000, top-5 accuracy 94.000
train part1: loss 1.248082
val part1: batch 0/59, loss 1.797, top-1 accuracy 24.000, top-5 accuracy 88.000
val part1: loss 1.274571
Checkpoint saved
part1 Epoch 65 / 100
train part1: batch 0/29, loss 1.126, top-1 accuracy 62.000, top-5 accuracy 94.000
train part1: loss 1.239270
val part1: batch 0/59, loss 1.828, top-1 accuracy 24.000, top-5 accuracy 84.000
val part1: loss 1.272355
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part1 Epoch 66 / 

train part1: loss 1.168942
val part1: batch 0/59, loss 1.661, top-1 accuracy 32.000, top-5 accuracy 90.000
val part1: loss 1.259962
Checkpoint saved
part1 Epoch 95 / 100
train part1: batch 0/29, loss 1.211, top-1 accuracy 66.000, top-5 accuracy 94.000
train part1: loss 1.178277
val part1: batch 0/59, loss 1.777, top-1 accuracy 24.000, top-5 accuracy 86.000
val part1: loss 1.265139
Checkpoint saved
part1 Epoch 96 / 100
train part1: batch 0/29, loss 1.075, top-1 accuracy 62.000, top-5 accuracy 100.000
train part1: loss 1.165763
val part1: batch 0/59, loss 1.758, top-1 accuracy 24.000, top-5 accuracy 88.000
val part1: loss 1.256746
Checkpoint saved
part1 Epoch 97 / 100
train part1: batch 0/29, loss 1.099, top-1 accuracy 66.000, top-5 accuracy 96.000
train part1: loss 1.174997
val part1: batch 0/59, loss 1.702, top-1 accuracy 30.000, top-5 accuracy 90.000
val part1: loss 1.257786
Checkpoint saved
part1 Epoch 98 / 100
train part1: batch 0/29, loss 1.271, top-1 accuracy 70.000, top-5 accurac

Make sure you get at least 50% accuracy in this section! If you used settings other than the ones provided to reach 50%, you should modify custom_part1_trainer in student_code to return a dictionary with your changed settings.

## Part 2. Fine-Tuning a Pre-Trained Network

In [23]:
# Fix random seeds so that results will be reproducible
set_seed(0, use_GPU)

Training a network from scratch takes a lot of time. Instead of training from scratch, we can take a pre-trained model and fine-tune it for our purposes. This is the goal of Part 2. You will fine-tune a pre-trained network and achieve at least 80% accuracy.

In [29]:
# training parameters
input_size = (224, 224)
RGB = True
base_lr = 1e-3
weight_decay = 5e-4
momentum = 0.9
backprop_depth = 3

In [30]:
# Create the training and testing datasets.
train_dataset, test_dataset = sc.create_datasets(data_path=data_path, input_size=input_size, rgb=RGB)
assert test_dataset.classes == train_dataset.classes

Computing pixel mean and stdev...
Batch 0 / 30
Batch 20 / 30
Done, mean = 
[0.45611589 0.45611589 0.45611589]
std = 
[0.24786406 0.24786406 0.24786406]
Computing pixel mean and stdev...
Batch 0 / 60
Batch 20 / 60
Batch 40 / 60
Done, mean = 
[0.45549639 0.45549639 0.45549639]
std = 
[0.24698076 0.24698076 0.24698076]


The following block loads a pre-trained AlexNet.

In [31]:
# Create the network model.
model = alexnet(pretrained=True)
print(model)

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace)
    (3): Dropout(p=0.5)
    (4): Linear(in_features=4096, out_feature

Now you will modify create_part2_model in student_code so that it changes AlexNet for fine-tuning, and then call it below. As you can see at https://github.com/pytorch/vision/blob/master/torchvision/models/alexnet.py, and in the model printout above, AlexNet has 2 parts: 'features', which consists of conv layers that extract feature maps from the image, and 'classifier', which consists of FC layers that classify the features. We want to replace the last Linear layer in model.classifier so that it outputs num_classes scores.

In [32]:
model = sc.create_part2_model(model, num_classes)
if use_GPU:
    model = model.cuda()
print(model)

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace)
    (3): Dropout(p=0.5)
    (4): Linear(in_features=4096, out_feature

Next we will create the loss function and the optimizer. Just as with Part 1, if you modify any of the settings to hit the required accuracy, you must modify the custom_part2_trainer function to return a dictionary containing your changes.

In [36]:
# Set up the trainer. You can modify custom_part2_trainer in
# student_code.py if you want to try different learning settings.
custom_part2_trainer = sc.custom_part2_trainer(model)

if custom_part2_trainer is None:
    # Create the loss function
    # see http://pytorch.org/docs/0.3.0/nn.html#loss-functions for a list of available loss functions
    loss_function = nn.CrossEntropyLoss()

    # Since we do not want to optimize the whole network, we must extract a list of parameters of interest that will be
    # optimized by the optimizer.
    params_to_optimize = []

    # List of modules in the network
    mods = list(model.features.children()) + list(model.classifier.children())

    # Extract parameters from the last `backprop_depth` modules in the network and collect them in
    # the params_to_optimize list.
    for m in mods[::-1][:backprop_depth]:
        params_to_optimize.extend(list(m.parameters()))

    # Construct the optimizer    
    optimizer = optim.SGD(params=params_to_optimize, lr=base_lr, weight_decay=weight_decay, momentum=momentum)

    # Create a scheduler, currently a simple step scheduler, but you can get creative.
    # See http://pytorch.org/docs/0.3.0/optim.html#how-to-adjust-learning-rate for various LR schedulers
    # and how to use them
    lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
    
    params = {'n_epochs': 4, 'batch_size': 10, 'experiment': 'part2'} 
    
else:
    if 'loss_function' in custom_part2_trainer:
        loss_function = custom_part2_trainer['loss_function']
    if 'optimizer' in custom_part2_trainer:
        optimizer = custom_part2_trainer['optimizer']
    if 'lr_scheduler' in custom_part2_trainer:
        lr_scheduler = custom_part2_trainer['lr_scheduler']
    if 'params' in custom_part2_trainer:
        params = custom_part2_trainer['params']
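
To see which modules `mods[::-1][:backprop_depth]` actually picks, the selection can be replayed on layer names alone. The `features` list below follows the printout above; the `classifier` list assumes that modules 5 and 6 are a ReLU and the final Linear layer, as in torchvision's AlexNet definition, because the printout is cut off before them. `HAS_PARAMS` is a hypothetical helper set for this sketch.

```python
features = ["Conv2d", "ReLU", "MaxPool2d", "Conv2d", "ReLU", "MaxPool2d",
            "Conv2d", "ReLU", "Conv2d", "ReLU", "Conv2d", "ReLU", "MaxPool2d"]
# Modules 5 and 6 are assumed; the printout above is truncated.
classifier = ["Dropout", "Linear", "ReLU", "Dropout", "Linear", "ReLU", "Linear"]

HAS_PARAMS = {"Conv2d", "Linear"}  # module types that carry learnable weights

mods = features + classifier
backprop_depth = 3

selected = mods[::-1][:backprop_depth]               # last three modules, reversed
trainable = [m for m in selected if m in HAS_PARAMS]

print(selected)   # ['Linear', 'ReLU', 'Linear']
print(trainable)  # ['Linear', 'Linear']
```

Under that assumption, `backprop_depth = 3` counts modules rather than weight layers, so the optimizer ends up updating only the last two Linear layers; ReLU and Dropout contribute no parameters.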

We are ready to fine-tune our network! Just like before, we will start a local server to see the training progress of our network. Open a new terminal and activate the environment for this project. Then run the following command: ***"python -m visdom.server"***. This will start a local server. The terminal output should print a link like "http://localhost:8097". Open this link in your browser. After you run the following block, visit this link again, and you will be able to see graphs showing the progress of your training! If you do not see any graphs, select Part 2 on the top left bar where it says Environment (only select Part 2, do not check main or Part 1).

In [37]:
# Train the network!
trainer = Trainer(train_dataset, test_dataset, model, loss_function, optimizer, lr_scheduler, params)
best_prec1 = trainer.train_val()
print('Best top-1 Accuracy = {:4.3f}'.format(best_prec1))

---------------------------------------
Experiment: part2
batch_size: 10
do_val: True
val_freq: 1
shuffle: True
num_workers: 4
n_epochs: 4
experiment: part2
checkpoint_file: None
print_freq: 100
resume_optim: True
---------------------------------------
part2 Epoch 0 / 4
train part2: batch 0/149, loss 3.059, top-1 accuracy 0.000, top-5 accuracy 30.000
train part2: batch 100/149, loss 0.473, top-1 accuracy 70.000, top-5 accuracy 100.000
train part2: loss 0.873899
val part2: batch 0/298, loss 0.450, top-1 accuracy 90.000, top-5 accuracy 100.000
val part2: batch 100/298, loss 0.426, top-1 accuracy 90.000, top-5 accuracy 100.000
val part2: batch 200/298, loss 0.788, top-1 accuracy 70.000, top-5 accuracy 100.000
val part2: loss 0.477183
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part2 Epoch 1 / 4
train part2: batch 0/149, loss 0.648, top-1 accuracy 70.000, top-5 accuracy 100.000
train part2: batch 100/149, loss 0.430, top-1 accuracy 70.000, top-5 accuracy 100.000
train part2: loss 0.373888


Expect this code to take around 10 minutes if not using a GPU (with a GPU it may run in around 30 seconds). You should hit 80% accuracy.