# Fine-Tuning an Image Classifier in PyTorch

In this practical, we will fine-tune a pretrained ResNet-50 model on a custom image dataset. The task is to train the model to classify photos by which of 10 cities around the world they were taken in.

Run the cell below to import the necessary dependencies.

In [6]:
import torch
from dataset import CitiesDataset
from torchvision.models import resnet50
from torchvision.models import ResNet50_Weights
from torch.utils.data import DataLoader
import torch.nn.functional as F
from torch.utils.tensorboard import SummaryWriter
import numpy as np
from torch.utils.data import random_split
from torchvision import transforms
from torch.optim import lr_scheduler
%load_ext tensorboard

In the cell below, we define our network. We create a class that inherits from `torch.nn.Module` and call `super().__init__()`, which runs the parent class's constructor so that `nn.Module` can register the sub-layers and parameters we assign later. We also load the `ResNet50` model with its pretrained weights.

- Add code to set `requires_grad` to `False` for all parameters of the `ResNet` layers, so that the pretrained weights stay frozen during training.
- Define a set of three linear layers, with activation functions between them, and assign them to the variable `linear_layers`. These replace the final fully connected layer (`fc`) of `ResNet`.
- The first layer's input size must be 2048, the number of features `ResNet-50` passes to its final layer.
- The middle layer should have an input size of 256 and an output size of 128.
- The output layer should have an output size equal to the number of classes in the dataset.
- Define the forward method.


In [7]:
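class TransferLearning(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Load ResNet-50 with its default pretrained weights (passing the enum class itself raises an error)
        self.layers = resnet50(weights=ResNet50_Weights.DEFAULT)
        # Freeze every ResNet-50 parameter so the pretrained weights are not updated
        for param in self.layers.parameters():
            param.requires_grad = False

        # Three-layer head with ReLU activations between the layers.
        # It is created after the freezing loop, so its parameters remain trainable.
        linear_layers = torch.nn.Sequential(
            torch.nn.Linear(2048, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 10),  # 10 = number of city classes
        )
        self.layers.fc = linear_layers  # replace ResNet-50's final fully connected layer

    def forward(self, x):
        # Run the frozen backbone followed by the new head; returns class logits
        return self.layers(x)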
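
One way to check that freezing works as intended is to count the trainable parameters. The sketch below uses a small hypothetical stand-in model, `TinyBackbone` (not ResNet-50, so nothing needs downloading), and follows the same pattern as the class above: it freezes everything and then replaces the final `fc` layer. Only the new head's parameters should remain trainable.

```python
import torch


class TinyBackbone(torch.nn.Module):
    """Hypothetical stand-in for ResNet-50: a feature layer plus an `fc` layer."""

    def __init__(self):
        super().__init__()
        self.features = torch.nn.Linear(16, 8)
        self.fc = torch.nn.Linear(8, 1000)

    def forward(self, x):
        return self.fc(torch.relu(self.features(x)))


def count_trainable(module):
    # Total number of parameter elements that will receive gradients
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


backbone = TinyBackbone()

# Freeze every existing parameter (same pattern as TransferLearning.__init__)
for param in backbone.parameters():
    param.requires_grad = False

# Replace the head AFTER freezing: newly created layers default to requires_grad=True
backbone.fc = torch.nn.Sequential(
    torch.nn.Linear(8, 4),
    torch.nn.ReLU(),
    torch.nn.Linear(4, 10),
)

head_params = sum(p.numel() for p in backbone.fc.parameters())
print("Trainable:", count_trainable(backbone), "Head:", head_params)  # Trainable: 86 Head: 86
```

If the head were replaced before the freezing loop, it would be frozen as well and the model would have nothing left to train.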


Next, we define a transform and split the data into training, validation and test sets.

- Split the data into train, validation and test sets. The training set should be 70% of the dataset length, with the validation and test sets 15% each.
- Create a dataloader for each split. Set the batch size to 32, and set `shuffle=True` for the training set loader.
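
70% and 15% of the dataset length are rarely whole numbers. `random_split` needs integer lengths that sum exactly to `len(dataset)`, although newer PyTorch releases also accept fractions that sum to 1. A simple approach is to round the train and validation sizes down and give the remainder to the test set. The sketch below does this with a hypothetical helper, `split_lengths`, using integer arithmetic to avoid floating-point surprises.

```python
def split_lengths(n_total, train_pct=70, val_pct=15):
    """Hypothetical helper: integer split sizes that always sum to n_total."""
    n_train = n_total * train_pct // 100  # round 70% down
    n_val = n_total * val_pct // 100      # round 15% down
    n_test = n_total - n_train - n_val    # the remainder goes to the test set
    return n_train, n_val, n_test


print(split_lengths(1000))  # (700, 150, 150)
print(split_lengths(1234))  # (863, 185, 186)
```

You can pass the resulting lengths to `random_split(dataset, [n_train, n_val, n_test])`. If you want the same split on every run, also pass a seeded `torch.Generator` through its `generator` argument.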

In [None]:
size = 128
transform = transforms.Compose([
    transforms.Resize(size),
    transforms.RandomCrop((size, size), pad_if_needed=True),
    transforms.ToTensor(),
    ])

dataset = CitiesDataset(transform=transform)
model = TransferLearning()

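# Split 70/15/15; the test set takes the remainder so the lengths always sum to len(dataset)
n_total = len(dataset)
n_train = int(0.7 * n_total)
n_val = int(0.15 * n_total)
n_test = n_total - n_train - n_val
train_set, val_set, test_set = random_split(dataset, [n_train, n_val, n_test])

# Dataloaders with a batch size of 32; only the training loader shuffles
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)
test_loader = DataLoader(test_set, batch_size=32)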

Before training, it is interesting to see how the classifier performs out of the box. In the cell below, pass a single example from the test set to the model, with the model in evaluation mode. Get the prediction and compare it to the true label. How did the model do? The new linear layers are randomly initialised, so at this stage the prediction is no better than a guess among the 10 cities.

In [20]:
features, label = test_set[1]
features = features.unsqueeze(0)  # add a batch dimension: (C, H, W) -> (1, C, H, W)

# Set the model to evaluation mode
model.eval()
# Pass the features to the model; the outputs are the logits for each of the 10 classes
with torch.no_grad():
    outputs = model(features)
# Get a prediction: the index of the largest logit
prediction = torch.argmax(outputs, dim=1)

print("Prediction label: ", prediction.item())
class_label = dataset.idx_to_city_name[prediction.item()]
print("Prediction category: ", class_label)

print("target label:", label)
target_label = dataset.idx_to_city_name[label]
print("target city ", target_label)

Prediction label:  8
Prediction category:  Sydney, Australia
target label: 0
target city  Beijing, China
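
The pattern in the cell above works for any classifier: add a batch dimension, switch to evaluation mode, and take the argmax of the logits. The sketch below repeats it on a small hypothetical stand-in model instead of ResNet-50. It wraps the forward pass in `torch.no_grad()` so that no computation graph is built. It also applies `softmax` to turn the logits into probabilities, which shows how confident the model is.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in classifier: 3x8x8 images -> 10 class logits
toy_model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 8 * 8, 10),
)

image = torch.rand(3, 8, 8)   # a single example: (C, H, W)
batch = image.unsqueeze(0)    # models expect a batch: (1, C, H, W)

toy_model.eval()              # evaluation mode (matters for dropout and batch norm)
with torch.no_grad():         # inference only: skip gradient tracking
    logits = toy_model(batch)  # shape (1, 10)

probs = torch.softmax(logits, dim=1)      # probabilities summing to 1
prediction = torch.argmax(logits, dim=1)  # same index as the argmax of probs

print("Predicted class:", prediction.item())
print("Confidence:", probs[0, prediction].item())
```

Softmax preserves the ordering of the logits, so taking the argmax before or after it gives the same class.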


Define the training loop. Some of it has been filled in for you already; fill in the sections corresponding to the `TODO` instructions. This training loop also uses a learning rate scheduler (`MultiStepLR`). It multiplies the learning rate by `gamma` each time training reaches one of the epochs listed in `milestones`. You don't need to worry about the details, but it is worth understanding why this helps gradient descent. Larger steps early in training make quick progress, while smaller steps later let the optimiser settle into a minimum instead of overshooting it.
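
`MultiStepLR` multiplies the learning rate by `gamma` at each milestone. The learning rate at a given epoch is therefore the base rate times `gamma` raised to the number of milestones already reached. The hypothetical helper below computes that closed form in plain Python for this notebook's settings (`lr=0.0001`, `milestones=[5, 15]`, `gamma=0.1`). It also shows that, with `epochs=15`, the second milestone is never reached during training. The scheduler is stepped at the end of each epoch, so the drop at 15 would only apply to a 16th epoch.

```python
from bisect import bisect_right


def multistep_lr(base_lr, epoch, milestones, gamma):
    """Hypothetical helper mirroring MultiStepLR's closed form:
    base_lr * gamma ** (number of milestones <= epoch)."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


base_lr, milestones, gamma, epochs = 1e-4, [5, 15], 0.1, 15
for epoch in range(epochs):
    # Epochs 0-4 use the base rate; epochs 5-14 use base_lr * 0.1
    print(f"epoch {epoch:2d}: lr = {multistep_lr(base_lr, epoch, milestones, gamma):.0e}")
```

To check this against PyTorch directly, step a real `lr_scheduler.MultiStepLR` in a loop and print `scheduler.get_last_lr()` each time.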

In [21]:
def train(
    model,
    train_loader,
    val_loader,
    test_loader,
    lr=0.1,
    epochs=15,
    optimiser=torch.optim.SGD
):
    writer = SummaryWriter()  # logs to ./runs by default
    # Initialise the optimiser (frozen parameters get no gradients, so they are never updated)
    optimiser = optimiser(model.parameters(), lr=lr, weight_decay=0.001)
    # Learning rate scheduler: multiplies the LR by gamma at each epoch listed in "milestones"
    scheduler = lr_scheduler.MultiStepLR(optimiser, milestones=[5, 15], gamma=0.1)
    batch_idx = 0
    for epoch in range(epochs):  # for each epoch
        print('Epoch:', epoch, 'LR:', scheduler.get_last_lr())
        model.train()  # evaluate() switches to eval mode, so switch back every epoch

        for batch in train_loader:  # for each batch in the dataloader
            features, labels = batch
            # Make a prediction by passing the features to the model
            prediction = model(features)
            # Calculate the cross-entropy loss against the true labels
            loss = F.cross_entropy(prediction, labels)
            # Calculate the gradient of the loss with respect to each trainable parameter
            loss.backward()
            # Use the optimiser to update the model parameters using those gradients
            optimiser.step()
            print("Epoch:", epoch, "Batch:", batch_idx,
                  "Loss:", loss.item())  # log the loss

            # Zero the gradients so they do not accumulate into the next batch
            optimiser.zero_grad()
            writer.add_scalar("Loss/Train", loss.item(), batch_idx)
            batch_idx += 1

        scheduler.step()  # step the learning rate scheduler once per epoch
        print('Evaluating on validation set')
        val_loss, val_acc = evaluate(model, val_loader)
        writer.add_scalar("Loss/Val", val_loss, batch_idx)
        writer.add_scalar("Accuracy/Val", val_acc, batch_idx)

    print('Evaluating on test set')
    test_loss, _ = evaluate(model, test_loader)  # evaluate returns (loss, accuracy)
    model.test_loss = test_loss
    writer.close()

    return model  # return the trained model


def evaluate(model, dataloader):
    model.eval()  # evaluation mode: batch norm uses its running statistics
    losses = []
    correct = 0
    n_examples = 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for batch in dataloader:
            features, labels = batch
            prediction = model(features)
            loss = F.cross_entropy(prediction, labels)
            losses.append(loss.item())
            correct += torch.sum(torch.argmax(prediction, dim=1) == labels).item()
            n_examples += len(labels)
    avg_loss = np.mean(losses)
    accuracy = correct / n_examples
    print("Loss:", avg_loss, "Accuracy:", accuracy)
    return avg_loss, accuracy
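
The order of the steps inside the batch loop matters because PyTorch accumulates gradients. Each call to `loss.backward()` adds to the `.grad` already stored on every parameter rather than overwriting it. The gradients must therefore be zeroed once between consecutive `backward()` calls. Otherwise each update would use the sum of all previous batches' gradients. The minimal sketch below shows this accumulation on a single hypothetical weight.

```python
import torch

# One trainable weight and a toy loss: loss = (w * x - y)^2
w = torch.tensor([2.0], requires_grad=True)
x, y = torch.tensor([3.0]), torch.tensor([0.0])
optimiser = torch.optim.SGD([w], lr=0.01)


def compute_loss():
    return ((w * x - y) ** 2).sum()


compute_loss().backward()
first_grad = w.grad.clone()        # d/dw = 2 * x * (w * x - y) = 2 * 3 * 6 = 36

compute_loss().backward()          # no zero_grad() in between
accumulated_grad = w.grad.clone()  # 36 + 36 = 72: the gradients were summed

optimiser.zero_grad()              # reset before the next batch
compute_loss().backward()
fresh_grad = w.grad.clone()        # back to 36

print(first_grad.item(), accumulated_grad.item(), fresh_grad.item())  # 36.0 72.0 36.0
```

You can call `zero_grad()` at the start of each iteration or after `optimiser.step()`, as `train` does. Both work, provided it runs once between consecutive `backward()` calls.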

Run the code block below to train the model, and leave it running until it completes.

In [None]:
trained_model=train(
                model,
                train_loader,
                val_loader,
                test_loader,
                epochs=15,
                lr=0.0001,
                optimiser=torch.optim.AdamW
                )


Now, let's view training performance in `tensorboard`. Run the cell below to open a `tensorboard` instance. Then select `time series` from the dropdown and press refresh to view the training loss, along with the validation loss and accuracy.

In [None]:
%tensorboard --logdir runs

Finally, let's re-run our prediction code to see how the trained model performs.

In [None]:
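features, label = test_set[1]
features = features.unsqueeze(0)  # add a batch dimension
model.eval()
with torch.no_grad():
    outputs = model(features)

_, pred = torch.max(outputs, 1)  # torch.max returns (values, indices); keep the indices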
print("Prediction label: ", pred.item())
class_label = dataset.idx_to_city_name[pred.item()]
print("Prediction category: ", class_label)

print("target label:", label)
target_label = dataset.idx_to_city_name[label]
print("target city ", target_label)
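
Looking only at the top prediction hides how close the alternatives were. The sketch below lists the three most likely classes with `torch.topk`. The logits and the `idx_to_city_name` mapping are made-up placeholders. In the notebook itself, `outputs` would come from `model(features)` and the mapping from `dataset.idx_to_city_name`.

```python
import torch

# Placeholder logits for one image (batch of 1, 10 classes) and a hypothetical label mapping
outputs = torch.tensor([[0.2, 1.5, -0.3, 0.0, 2.1, 0.4, -1.0, 0.9, 1.8, 0.1]])
idx_to_city_name = {i: f"city_{i}" for i in range(10)}

probs = torch.softmax(outputs, dim=1)
top_probs, top_idx = torch.topk(probs, k=3, dim=1)  # sorted, largest first

for p, i in zip(top_probs[0], top_idx[0]):
    print(f"{idx_to_city_name[i.item()]}: {p.item():.3f}")
```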