# Transfer learning for classification

## What is transfer learning?

Transfer learning is the concept of applying knowledge to a new task, after it was gained from learning to accomplish a different task. That is, how do we transfer what we know to a new task. 
In this notebook, we will get a pre-trained image classification model, and use transfer learning to apply what it already knows to a new task: distinguishing between images of dogs and hotdogs, which we have collected in our own dataset.

## Data processing

Imagine you were the one that created the folder called ```data``` which contains folders of images for each class that we want to classify. 
Now we need to turn that into a PyTorch dataset, which consists of examples that can be processed by PyTorch models.

Firstly, let's create a CSV which maps file names to their corresponding integer labels.


In [11]:
import os
import pandas as pd

def create_csv(root='./data/', out_name='labels.csv'):
    """Creates a CSV file where each row contains an image file path and it's corresponding integer class label"""
    subfolders = [f.path for f in os.scandir(root) if f.is_dir()] # get the path of the subfolders in the data root (each of which contains images for certain class)
    df = pd.DataFrame(columns=['file_path', 'label']) # create empty dataframe with file_path and label columns
    for i, path in enumerate(subfolders):
        files = [f.path for f in os.scandir(path) if f.is_file()]
        for f in files:
            df = df.append({'file_path':f, 'label':i}, ignore_index=True) #add each image as a row to the dataframe
    df.to_csv(root+out_name, index=False) #save the dataframe to a csv file

In [12]:
create_csv()

Now let's create our PyTorch dataset. As with all PyTorch datasets, it should inherit from PyTorch's Dataset class and implement a ```__getitem__``` method which defines how indexing the dataset with square brackets works, and a ```__len__``` method which defines how the dataset length should be calculated.

In [13]:
import numpy as np
from PIL import Image
import torch

class ClassificationDataset():
    def __init__(self, csv='./data/labels.csv', transform=None):
        self.csv = pd.read_csv(csv) #read the data csv
        self.transform = transform #save the transform variable as part of the class object

    def __len__(self):
        return len(self.csv)

    def __getitem__(self, idx):
        filepath, label = self.csv['file_path'][idx], self.csv['label'][idx] #get the image filepath and label from at that index from the csv
        img = Image.open(filepath).convert("RGB") #open with PIL and convert to rgb
        if self.transform:
            img, label = self.transform((img, label)) #apply transforms
        return img, label

class SquareCrop():
    """Adjust aspect ratio of image to make it square and crop it to given size"""
    def __init__(self, output_size):
        assert isinstance(output_size, (int)) # assert output_size is integer
        self.output_size = output_size

    def __call__(self, sample):        
        image, label = sample
        h, w = image.size
        if h > w:
            new_w = self.output_size
            scale = new_w/w
            new_h = scale*h
        elif w > h:
            new_h = self.output_size
            scale = new_h/h
            new_w = scale*w
        else:
            new_h, new_w = self.output_size, self.output_size
        new_h, new_w = int(new_h), int(new_w) # account for non-integer computed dimensions (rounds to nearest int)
        image = image.resize((new_h, new_w))
        crop_start_w = np.random.randint((new_w - self.output_size)+1)
        crop_start_h = np.random.randint((new_h - self.output_size)+1)
        image = image.crop((crop_start_h, crop_start_w, crop_start_h + self.output_size, crop_start_w + self.output_size))
        return image, label

class ImageToTensor():
    def __init__(self):
        pass
    
    def __call__(self, sample):
        image, label = sample
        image = np.array(image)/255 #convert to numpy array and normalise between 0-1
        image = image.transpose((2, 0, 1)) #swap channel dimension
        return torch.Tensor(image), label

In [14]:
from torchvision import transforms
from torch.utils.data import DataLoader

create_csv()

classnames = [f.name for f in os.scandir('./data/') if f.is_dir()] #get the class names from the folders
classname_to_id = dict(zip(classnames, range(len(classnames)))) #create the mapping from classname to class id
id_to_classname = dict(zip(classname_to_id.values(), classname_to_id.keys())) # create the reverse mapping from class id to classname
n_classes = len(classnames)
print(id_to_classname)

img_crop_size = 224
train_split = 0.8 # percentage that will be training set
val_split = 0.1 # percentage that will be validation set
batch_size = 16

mytransforms = []
mytransforms.append(SquareCrop(img_crop_size)) #add square crop transform
mytransforms.append(ImageToTensor()) #add to tensor transform
mytransforms = transforms.Compose(mytransforms)

mydataset = ClassificationDataset(csv='./data/labels.csv', transform=mytransforms)

data_size=len(mydataset)
train_size = int(train_split * data_size)
val_size = int(val_split * data_size)
test_size = data_size - (val_size + train_size)
train_data, val_data, test_data = torch.utils.data.random_split(mydataset, [train_size, val_size, test_size])

train_samples = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_samples = DataLoader(val_data, batch_size=batch_size)
test_samples = DataLoader(test_data, batch_size=batch_size)

{0: 'dog', 1: 'hotdog'}


## Loading in pretrained model

Now let's make our classifier model. 
This is where the transfer learning takes place.
For the first part of our model, we pass our data through a pre-trained model. 
For our pre-trained model, we will use [VGG11](https://pytorch.org/hub/pytorch_vision_vgg/).

Here's a diagram of the VGG11 architecture.

![](images/vgg11.png)

We don't pass our data through the whole model though. 
This is because VGG is trained to output a probability distribution over more than 20K classes. 
It's not those class confidences that will be helpful to us, but the features that were combined in the last few layers of the VGG11 neural network to compute them.
As such, we can use the pre-trained model as a **feature extractor** by passing our data not all of the way through model, but through most of it, such that what it outputs is a set of activation maps, which represent where and by how much complex features of the image are present.
These high-level features are the ones which VGG11 learnt when being trained to perform it's original task.
Our hope is that these same features will be useful for our new task.
This here is the transfer learning.

To relate that to the image shown below, what we will do is take the first convolutional layers (shown as grids of circles rather than vectors) which contain the activation maps for high level features, and discard the fully connected layers on the end and replace them with our own.

![](images/features.png)

Once we get the feature maps of the input from VGG11, we will flatten them and stack a few extra linear layers on the end. These layers will culminate in a single node which represents the confidence of an image being a hotdog, as opposed to a dog.

Our input data needs to be transformed to have the input size which VGG11 expects. And our additional layers will have to expect inputs with the same shape as the output from the VGG11 feature extractor. 



## Fine tuning our model

Obviously, our extra linear layers will have initially random weights, so we'll need to train them.
But we know that the VGG model weights are probably pretty close to what we want, so we don't want to adjust them (too much).

### Method 1: Weight freezing

We will **freeze** the VGG weights so that they don't change, whilst we train our final layer weights in the usual way.
We do this by setting the ```requires_grad``` attribute of the VGG weights to false.
This means that when we backpropagate our error, it's rate of change with respect to these weights is not computed, and they are not updated when we call ```loss.backward()```.
By only adjusting the extra layers which we added, we effectively **fine-tune** VGG11 to our task, whilst taking advantage of what it has already learnt.

### Method 2: Discriminative learning rates

Alternatively to freezing the VGG model's weights, we can use a smaller learning rate so that they are still updated, but not as much as our fine-tuning layers.

### Which should I use?

Freezing the weights by setting their ```requires_grad``` attribute to true will remove them from the computational graph and will increase the speed at which you can make updates by preventing you having to compute their gradients, compute the optimisation step and then update their values. However, if your pretrained model was trained on a significantly different dataset, you may need to update its weights to fine-tune it to your specific problem to achieve the desired level of performance. In this case, using discriminitive learning rates may be a better approach.

Let's get our pretrained model from torchvision (PyTorch's computer vision library) and build our model which uses transfer learning.

In [15]:
from torchvision import models
import torch.nn.functional as F

class VGGClassifier(torch.nn.Module):
    def __init__(self, out_size):
        super().__init__()
        self.features = models.vgg11(pretrained=True).features # get the convolutional layers of vgg11. output size is 512x7x7
        self.regressor = torch.nn.Sequential(
            torch.nn.Linear(512*7*7, 4096), # first arg is the size of the flattened output from VGG11
            torch.nn.ReLU(),
            torch.nn.Dropout(),
            torch.nn.Linear(4096, 1024),
            torch.nn.ReLU(),
            torch.nn.Linear(1024, out_size),
            torch.nn.Softmax(dim=1)
        )

    def forward(self, x):
        x = self.features(x)
        x = F.relu(x)
        x = x.reshape(-1, 512*7*7)
        x = self.regressor(x)
        return x

    # def freeze(self):
    #     for param in self.features.parameters():
    #         param.requires_grad=False

    # def unfreeze(self):
    #     for param in self.features.parameters():
    #         param.requires_grad=True

## Training using discriminative learning rates

In [16]:
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

lr = [3e-4, 3e-5] # discriminative learning rate. #lr[0] is main lr #lr[1] is lr of early layers
weight_decay = 0#1e-4
train_split = 0.8
val_split = 0.9

mymodel = VGGClassifier(out_size=n_classes).to(device)

optimizer = torch.optim.Adam([
    {
        'params': mymodel.regressor.parameters(),
        'lr': lr[0]
    },
    {
        'params': mymodel.features.parameters(), 
        'lr': lr[1]
    }
],
# lr=lr[0],
weight_decay=weight_decay
)

Now let's write the training loop

In [17]:
import matplotlib.pyplot as plt

def train(epochs):
    plt.close()
    mymodel.train()
    
    bcosts = []
    ecosts = []
    valcosts = []
    plt.ion()
    fig = plt.figure(figsize=(10, 5))
    ax = fig.add_subplot(121)
    ax2 = fig.add_subplot(122)
    
    plt.show()
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Cost')

    ax2.axis('off')
    img_label_text = ax2.text(0, -5, '', fontsize=15)
    
    for e in range(epochs):
        ecost=0
        valcost=0
        for i, (x, y) in enumerate(train_samples):
            x, y = x.to(device), y.to(device)

            h = mymodel(x) #calculate hypothesis
            cost = F.cross_entropy(h, y, reduction='sum') #calculate cost
            
            optimizer.zero_grad() #zero gradients
            cost.backward() # calculate derivatives of values of filters
            optimizer.step() #update parameters

            bcosts.append(cost.item()/batch_size)
            
            y_ind=0
            im = np.array(x[y_ind]).transpose(1, 2, 0)
            predicted_class = id_to_classname[h.max(1)[1][y_ind].item()]
            ax2.imshow(im)
            img_label_text.set_text('Predicted class: '+ predicted_class)
            
            fig.canvas.draw()
            ecost += cost.item()
            print(f'example {i}\tLoss: {cost.item()}')
        for i, (x, y) in enumerate(val_samples):
            x, y = x.to(device), y.to(device)
            h = mymodel.forward(x) #calculate hypothesis
            cost = F.cross_entropy(h, y, reduction='sum') #calculate cost
            y_ind=0
            im = np.array(x[y_ind]).transpose(1, 2, 0)
            predicted_class = id_to_classname[h.max(1)[1][y_ind].item()]
            ax2.imshow(im)
            img_label_text.set_text('Predicted class: '+ predicted_class)
            fig.canvas.draw()
            valcost += cost.item()
            print(f'val example {i}\tLoss: {valcost}')
        ecost /= train_size
        valcost /= val_size
        ecosts.append(ecost)
        valcosts.append(valcost)
        ax.plot(ecosts, 'b', label='Train cost')
        ax.plot(valcosts, 'r', label='Validation cost')
        if e==0: ax.legend()
        fig.canvas.draw()

        print('Epoch', e, '\tCost', ecost)

In [18]:
%matplotlib notebook
mymodel.freeze()
train(2)
#mymodel.unfreeze()
#train(5)

<IPython.core.display.Javascript object>

example 0	Loss: 11.45760726928711
example 1	Loss: 8.036450386047363
example 2	Loss: 6.07204532623291
example 3	Loss: 5.245741367340088
example 4	Loss: 5.344182014465332
example 5	Loss: 5.027478218078613
example 6	Loss: 5.017707347869873
example 7	Loss: 5.942976951599121
example 8	Loss: 5.01218843460083
example 9	Loss: 5.01219367980957


AttributeError: 'float' object has no attribute 'item'

## Testing our model

In [19]:
def test():
    print('Started evaluation...')
    mymodel.eval() #put model into evaluation mode    
    #calculate the accuracy of our model over the whole test set in batches
    correct = 0
    for x, y in test_samples:
        x, y = x.to(device), y.to(device)
        h = mymodel.forward(x)
        pred = h.data.max(1)[1]
        correct += pred.eq(y).sum().item()
    return round(correct/len(test_data), 4)

In [20]:
acc = test()
print('Test accuracy: ', acc)

Started evaluation...
Test accuracy:  0.95
