Attempting to apply transfer learning to train a VGG-16 

Importing all the modules needed

In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader
from torch.optim import Adam
import matplotlib.pyplot as plt

Load in the VGG-16 network

In [4]:
vgg16 = models.vgg16(pretrained=True)

Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/checkpoints/vgg16-397923af.pth



Look at the features and the classifier of the VGG-16 network

In [5]:
print(vgg16.features)
print(vgg16.classifier)

Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (6): ReLU(inplace=True)
  (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (8): ReLU(inplace=True)
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace=True)
  (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (13): ReLU(inplace=True)
  (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (15): ReLU(inplace=True)
  (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (17): Conv2d(256, 512, kernel_si

Freeze the network: grab all the parameters and set requires_grad to False on each one, so that no gradients are computed for the pretrained weights.

In [0]:
for param in vgg16.parameters():
  param.requires_grad = False
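
A quick way to confirm that the freeze took effect is to count the parameters that still require gradients. The sketch below uses a tiny stand-in model (`tiny_model` and `count_trainable` are hypothetical names, not part of the notebook) so that it runs without downloading any weights; the same loop should apply unchanged to `vgg16`.

```python
import torch.nn as nn

# Hypothetical stand-in for vgg16, small enough to build instantly.
tiny_model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))

def count_trainable(model):
    """Return the number of scalar parameters with requires_grad=True."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

before = count_trainable(tiny_model)   # parameters are trainable by default

for param in tiny_model.parameters():
    param.requires_grad = False        # same freezing loop as in the cell above

after = count_trainable(tiny_model)
print(before, after)
```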

Remove the last fully connected layer, and treat the rest of the network as a fixed feature extractor.
We then add a new linear layer with two outputs, followed by LogSoftmax, to train on.

In [0]:
vgg16.classifier[-1] = nn.Sequential(
    nn.Linear(in_features=4096, out_features=2),
    nn.LogSoftmax(dim=1)
)

Now look again at the classifier to see what has changed

In [9]:
print(vgg16.classifier)

Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Sequential(
    (0): Linear(in_features=4096, out_features=2, bias=True)
    (1): LogSoftmax()
  )
)
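
Modules created after the freeze have trainable parameters by default, so the new head is the only part of the network left to train. A minimal sketch of handing just those parameters to Adam (imported at the top of the notebook) follows. The stand-in `model` and the learning rate are assumptions chosen for illustration, not values from the notebook.

```python
import torch.nn as nn
from torch.optim import Adam

# Hypothetical stand-in for the pretrained network.
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 3))

for param in model.parameters():
    param.requires_grad = False   # freeze everything

# Replace the last layer; newly created modules are trainable by default.
model[-1] = nn.Sequential(nn.Linear(4, 2), nn.LogSoftmax(dim=1))

# Collect only the parameters that still require gradients.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = Adam(trainable, lr=1e-3)   # lr is an illustrative value

trainable_names = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable_names)
```

Passing only the trainable parameters keeps the optimizer from holding state for weights that will never change.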


Set the criterion

In [0]:
# nn.Sigmoid is an activation, not a loss. NLLLoss pairs with the LogSoftmax output layer.
criterion = nn.NLLLoss()

The model doesn't know the difference between urban water and land.
The VGG-16 model can be retrained to determine the difference if it is given lots of examples.
The loss function takes in what the model predicts and the true label, then computes a value for how far the prediction is from the actual value.

The task is binary classification, so a cross-entropy loss is appropriate. With a single output unit, the last layer would use a sigmoid activation and the loss would be binary cross-entropy (nn.BCELoss). With the two-unit LogSoftmax layer added to the classifier, the matching loss is negative log-likelihood (nn.NLLLoss), which computes the same cross-entropy.
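
For a two-class problem, negative log-likelihood on LogSoftmax outputs and binary cross-entropy on a single sigmoid output compute the same quantity. The check below illustrates this with made-up logits and labels, which are assumptions for the example only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 2)            # made-up raw scores: 4 samples, 2 classes
labels = torch.tensor([0, 1, 1, 0])   # made-up class indices

# Route 1: LogSoftmax followed by NLLLoss (the head used in this notebook).
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, labels)

# Route 2: CrossEntropyLoss applied directly to the raw scores.
ce = nn.CrossEntropyLoss()(logits, labels)

# Route 3: binary cross-entropy on one logit, the difference of the two scores.
bce = nn.BCEWithLogitsLoss()(logits[:, 1] - logits[:, 0], labels.float())

same_nll_ce = torch.allclose(nll, ce, atol=1e-6)
same_ce_bce = torch.allclose(ce, bce, atol=1e-6)
print(nll.item(), ce.item(), bce.item())
```

The third route works because, with two classes, the softmax probability of class 1 equals the sigmoid of the difference between the two scores.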

To minimise the criterion we have to adjust the parameters of the network, that is, its weights and biases.

All of the weights in the network have been frozen except for those of the last fully connected layer. The goal is to train these weights.

This is achieved with gradient descent; in PyTorch, the gradients it needs are computed by Autograd.

The loss function is differentiable with respect to the weights.

In the forward pass we calculate the loss, then in the backward pass we calculate the gradient of the loss with respect to each of the weights. 
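
The forward and backward passes can be sketched as a single training step. Everything here is an assumption made for illustration: the small `head` stands in for the trainable last layer, the inputs and labels are random, and plain SGD with `lr=0.1` is used to keep the update easy to follow.

```python
import torch
import torch.nn as nn
from torch.optim import SGD

torch.manual_seed(0)

# Hypothetical trainable head, standing in for the new last layer.
head = nn.Sequential(nn.Linear(4, 2), nn.LogSoftmax(dim=1))
criterion = nn.NLLLoss()
optimizer = SGD(head.parameters(), lr=0.1)   # illustrative learning rate

features = torch.randn(8, 4)                 # made-up inputs
labels = torch.randint(0, 2, (8,))           # made-up binary labels

optimizer.zero_grad()                        # clear gradients from earlier steps
loss_before = criterion(head(features), labels)   # forward pass: compute the loss
loss_before.backward()                       # backward pass: d(loss)/d(weight)
grads_ready = all(p.grad is not None for p in head.parameters())
optimizer.step()                             # gradient descent update

# Re-evaluate the loss on the same batch after the update.
loss_after = criterion(head(features), labels).item()
print(loss_before.item(), loss_after)
```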

Note:
- Setting .requires_grad to True is essentially the same as unfreezing the network.
- Calling .backward() automatically computes all the gradients.

This only has to be done during the training phase of the model.

Once trained, we turn autograd off. 
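
One common way to turn autograd off at inference time is the `torch.no_grad()` context manager. The sketch below assumes a small stand-in `model` and a random input batch; neither comes from the notebook.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)      # hypothetical stand-in for a trained model
model.eval()                 # put layers such as Dropout into inference mode

x = torch.randn(3, 4)        # made-up input batch
with torch.no_grad():        # operations in this block are not recorded
    preds = model(x)

print(preds.requires_grad)   # False
```

Note that `model.eval()` and `torch.no_grad()` do different jobs: the first changes layer behaviour, the second stops gradient tracking.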




Using Autograd.

Autograd automatically calculates the gradients of tensors.
To make sure a tensor's gradients are tracked, we need to set .requires_grad = True on it.


In [15]:
w = torch.randn(4, 3, requires_grad=True)
w

tensor([[-0.1350, -1.1881, -0.0651],
        [-0.8247, -1.3694, -0.2737],
        [ 1.9208,  2.2549,  0.3004],
        [-0.5083, -0.1567, -0.0266]], requires_grad=True)

Freeze

In [16]:
w.requires_grad_(False)
w

tensor([[-0.1350, -1.1881, -0.0651],
        [-0.8247, -1.3694, -0.2737],
        [ 1.9208,  2.2549,  0.3004],
        [-0.5083, -0.1567, -0.0266]])

Unfreeze

In [17]:
w.requires_grad_(True)
w

tensor([[-0.1350, -1.1881, -0.0651],
        [-0.8247, -1.3694, -0.2737],
        [ 1.9208,  2.2549,  0.3004],
        [-0.5083, -0.1567, -0.0266]], requires_grad=True)

In [18]:
y = torch.exp(w)
print(y)

tensor([[0.8737, 0.3048, 0.9370],
        [0.4384, 0.2543, 0.7606],
        [6.8262, 9.5346, 1.3504],
        [0.6015, 0.8549, 0.9737]], grad_fn=<ExpBackward>)


Define an operation to get a scalar value

In [19]:
output = y.mean()
print(output)

tensor(1.9758, grad_fn=<MeanBackward0>)


We can see that the gradients are currently empty.
w.grad is None because we haven't called the backward function yet.

In [20]:
print(w.grad)

None


So we call backward() on the output.

In [23]:
output.backward()
print(w.grad)

RuntimeError: ignored
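
The error message above is cut off, so its cause cannot be read from the output. One common way to hit a RuntimeError at this step is calling backward() a second time on the same graph, because PyTorch frees the intermediate buffers after the first call unless retain_graph=True is passed. Whether that happened here is an assumption. The sketch below rebuilds the graph from scratch, runs backward() once, and checks the gradient against the analytic value exp(w) / 12.

```python
import torch

torch.manual_seed(0)
w = torch.randn(4, 3, requires_grad=True)

output = torch.exp(w).mean()   # rebuild the graph from scratch
output.backward()              # first backward pass on this graph

# d/dw mean(exp(w)) = exp(w) / (number of elements) = exp(w) / 12
expected = torch.exp(w).detach() / w.numel()
matches = torch.allclose(w.grad, expected)
print(matches)

# A second call on the same graph fails because its buffers were freed.
second_call_failed = False
try:
    output.backward()
except RuntimeError:
    second_call_failed = True
print(second_call_failed)
```

In a notebook, re-running the cells that build `y` and `output` before calling backward() again avoids this failure.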