# Fine-Tuning an Image Classifier in PyTorch

In this practical, we will fine-tune a pretrained ResNet-50 model on a custom image dataset. The task is to classify photos according to which of 10 cities around the world they were taken in.

In [None]:
#@title # Run the following cell to download the necessary files for this practical. { display-mode: "form" } 
#@markdown Don't worry about what's in this collapsed cell

print('Downloading dataset.py...')
!wget https://s3-eu-west-1.amazonaws.com/aicore-portal-public-dev-524288083424/practicals_files/41982379-5961-4188-91d7-22fcb7f1c6ef/dataset.py -q
import dataset
print('Downloading images.zip...')
!wget https://s3-eu-west-1.amazonaws.com/aicore-portal-public-dev-524288083424/practicals_files/41982379-5961-4188-91d7-22fcb7f1c6ef/images.zip -q
!unzip images.zip > /dev/null
!rm images.zip


Run the cell below to import the necessary dependencies.

In [6]:
import torch
from dataset import CitiesDataset
from torchvision.models import resnet50
from torchvision.models import ResNet50_Weights
from torch.utils.data import DataLoader
import torch.nn.functional as F
from torch.utils.tensorboard import SummaryWriter
import numpy as np
from torch.utils.data import random_split
from torchvision import transforms
from torch.optim import lr_scheduler
%load_ext tensorboard

In the cell below, we define our network. We create a class that inherits from `torch.nn.Module` and call `super().__init__()` so that the parent class is initialised before we register any layers. The class also loads the `ResNet50` model with its pretrained weights.

- Add code to set the `requires_grad` attribute to `False` for all the `ResNet` parameters.
- Define a set of three linear layers for the model, assigning them to the variable `linear_layers`.
  - The first layer takes the 2048 features that feed the final (`fc`) layer of `ResNet50` and should have an output size of 256.
  - The middle layer should have an input size of 256 and an output size of 128.
  - The output layer should have an output size equal to the number of classes in the dataset.
- Define the forward method.


In [7]:
class TransferLearning(torch.nn.Module):
    def __init__(self):
        super().__init__()
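        # load ResNet-50 with pretrained ImageNet weights (pass an enum member, not the enum class)
        self.layers = resnet50(weights=ResNet50_Weights.DEFAULT)
        # freeze the pretrained layers so that only the new head is trained
        for param in self.layers.parameters():
            param.requires_grad = False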
        linear_layers = torch.nn.Sequential(
            torch.nn.Linear(2048, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 10),
        )
        self.layers.fc = linear_layers
        # print(self.layers)

    def forward(self, x):
        return self.layers(x)
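
To check that freezing worked, one option is to count the parameters that still have `requires_grad=True`; only the new head should remain trainable. The sketch below uses a small stand-in network rather than `ResNet50` so that it runs without downloading weights. `TinyBackbone` and `count_parameters` are hypothetical names introduced for illustration.

```python
import torch


class TinyBackbone(torch.nn.Module):
    """Hypothetical stand-in for a pretrained network with an `fc` head."""

    def __init__(self):
        super().__init__()
        self.features = torch.nn.Linear(8, 4)
        self.fc = torch.nn.Linear(4, 3)

    def forward(self, x):
        return self.fc(torch.relu(self.features(x)))


def count_parameters(model):
    """Return (trainable, frozen) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    return trainable, frozen


backbone = TinyBackbone()
# Freeze every existing parameter
for param in backbone.parameters():
    param.requires_grad = False
# Replace the head; freshly created layers are trainable by default
backbone.fc = torch.nn.Linear(4, 2)

trainable, frozen = count_parameters(backbone)
print("Trainable:", trainable, "Frozen:", frozen)
```

The same `count_parameters` helper could be called on a `TransferLearning` instance; if the trainable count equals the total parameter count, the backbone has not been frozen.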


In [9]:
size = 128
transform = transforms.Compose([
    transforms.Resize(size),
    transforms.RandomCrop((size, size), pad_if_needed=True),
    transforms.ToTensor(),
    ])

dataset = CitiesDataset(transform=transform)
model = TransferLearning()

# TODO - Split the dataset into train, validation and test sets. The train set should be 70% of dataset length, and the validation and test sets 15% each.
# TODO - Create dataloaders for train, validation and test sets. Set batch size to 32, and make sure shuffle=True in the train loader.

train_set_len = round(0.7*len(dataset))
val_set_len = round(0.15*len(dataset))
test_set_len = len(dataset) - val_set_len - train_set_len
split_lengths = [train_set_len, val_set_len, test_set_len]
# split the data to get validation and test sets
train_set, val_set, test_set = random_split(dataset, split_lengths)

batch_size = 32
train_loader = DataLoader(train_set, shuffle=True, batch_size=batch_size)
val_loader = DataLoader(val_set, batch_size=batch_size)
test_loader = DataLoader(test_set, batch_size=batch_size)
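
`random_split` draws a fresh random permutation each time it is called, so the three sets change between runs. If a reproducible split is wanted, one option is to pass a seeded `torch.Generator`. The sketch below uses a hypothetical 103-item `TensorDataset` in place of `CitiesDataset`, and the seed value 42 is an arbitrary choice.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in dataset with 103 examples
toy_dataset = TensorDataset(torch.arange(103))

train_len = round(0.7 * len(toy_dataset))
val_len = round(0.15 * len(toy_dataset))
# Give the remainder to the test set so the lengths always sum to the total
test_len = len(toy_dataset) - train_len - val_len


def split(seed):
    """Split the toy dataset using a generator seeded with `seed`."""
    generator = torch.Generator().manual_seed(seed)
    return random_split(
        toy_dataset, [train_len, val_len, test_len], generator=generator
    )


train_a, val_a, test_a = split(42)
train_b, val_b, test_b = split(42)

print(len(train_a), len(val_a), len(test_a))
print(train_a.indices == train_b.indices)  # the same seed gives the same split
```

Computing the test length as a remainder, as the cell above also does, avoids a length mismatch when rounding the 70% and 15% shares does not add up exactly.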





Before training, it is worth seeing how the classifier performs straight out of the box. In the cell below, pass a single example from the test set to the model, with the model in evaluation mode. Get the prediction and compare it to the true label. How did the model do?

In [20]:
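features, label = test_set[1]
features = features.unsqueeze(0)  # add a batch dimension: (C, H, W) -> (1, C, H, W)
model.eval()  # use evaluation behaviour for the batch norm layers
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(features)

# the index of the largest output is the predicted class
_, pred = torch.max(outputs, 1)
print("Prediction label: ", pred.item())
class_label = dataset.idx_to_city_name[pred.item()]
print("Prediction category: ", class_label)

print("target label:", label)
target_label = dataset.idx_to_city_name[label]
print("target city ", target_label)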

Prediction label:  8
Prediction category:  Sydney, Australia
target label: 0
target city  Beijing, China
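
The model outputs one raw score (logit) per class, and `torch.max` simply picks the largest. Applying a softmax turns the logits into probabilities, which shows how confident the prediction is. With a randomly initialised head and 10 classes, a prediction made before training is expected to be little better than a guess. A minimal sketch with made-up logits for three classes (the values are illustrative, not taken from the model above):

```python
import torch

# Made-up logits for a batch of one example and three classes
logits = torch.tensor([[2.0, 0.5, -1.0]])

probabilities = torch.softmax(logits, dim=1)  # each row sums to 1
confidence, pred = torch.max(probabilities, dim=1)

print("Probabilities:", probabilities)
print("Predicted class:", pred.item(), "Confidence:", round(confidence.item(), 3))
```

Softmax preserves the ordering of the logits, so the predicted class is the same whether `torch.max` is applied before or after it.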


Next, define the training loop and an evaluation function.

In [21]:
def train(
    model,
    train_loader,
    val_loader,
    test_loader,
    lr=0.1,
    epochs=20,
    optimiser=torch.optim.SGD
):

    writer = SummaryWriter()
    # initialise an optimiser
    optimiser = optimiser(model.parameters(), lr=lr, weight_decay=0.001)
    # reduce the learning rate by a factor of 10 at epochs 5 and 15
    scheduler = lr_scheduler.MultiStepLR(optimiser, milestones=[5, 15], gamma=0.1)
    batch_idx = 0
    for epoch in range(epochs):  # for each epoch
        print('Epoch:', epoch, 'LR:', scheduler.get_last_lr())

        for batch in train_loader:  # for each batch in the dataloader
            model.train()  # evaluate() switches to eval mode, so switch back before each step
            features, labels = batch
            prediction = model(features)  # make a prediction
            # compare the prediction to the label to calculate the loss (how bad is the model)
            loss = F.cross_entropy(prediction, labels)
            loss.backward()  # calculate the gradient of the loss with respect to each model parameter
            optimiser.step()  # use the optimiser to update the model parameters using those gradients
            print("Epoch:", epoch, "Batch:", batch_idx,
                  "Loss:", loss.item())  # log the loss
            optimiser.zero_grad()  # reset the gradients so they do not accumulate across batches
            writer.add_scalar("Loss/Train", loss.item(), batch_idx)
            batch_idx += 1
            if batch_idx % 25 == 0:
                print('Evaluating on validation set')
                # evaluate the validation set performance
                val_loss, val_acc = evaluate(model, val_loader)
                writer.add_scalar("Loss/Val", val_loss, batch_idx)
                writer.add_scalar("Accuracy/Val", val_acc, batch_idx)

        scheduler.step()  # advance the learning rate schedule once per epoch

    # evaluate the final test set performance
    print('Evaluating on test set')
    test_loss, test_acc = evaluate(model, test_loader)  # evaluate returns (loss, accuracy)
    writer.add_scalar("Loss/Test", test_loss, batch_idx)
    model.test_loss = test_loss

    return model   # return trained model


def evaluate(model, dataloader):
    losses = []
    correct = 0
    n_examples = 0
    model.eval()  # use evaluation behaviour for the batch norm layers
    with torch.no_grad():  # gradients are not needed for evaluation
        for batch in dataloader:
            features, labels = batch
            prediction = model(features)
            loss = F.cross_entropy(prediction, labels)
            losses.append(loss.item())  # store plain floats so np.mean works directly
            correct += torch.sum(torch.argmax(prediction, dim=1) == labels).item()
            n_examples += len(labels)
    avg_loss = np.mean(losses)
    accuracy = correct / n_examples
    print("Loss:", avg_loss, "Accuracy:", accuracy)
    return avg_loss, accuracy
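
The accuracy in `evaluate` is the fraction of examples whose highest-scoring class matches the label. A small worked example with made-up predictions for four examples and three classes:

```python
import torch

# Made-up logits: 4 examples, 3 classes
prediction = torch.tensor([
    [0.1, 2.0, 0.3],   # highest score at index 1
    [1.5, 0.2, 0.1],   # highest score at index 0
    [0.0, 0.1, 3.0],   # highest score at index 2
    [0.9, 0.8, 0.7],   # highest score at index 0
])
labels = torch.tensor([1, 0, 1, 0])

predicted_classes = torch.argmax(prediction, dim=1)
# Comparing to the labels gives a boolean tensor; summing it counts the matches
correct = torch.sum(predicted_classes == labels).item()
accuracy = correct / len(labels)

print("Predicted:", predicted_classes.tolist())
print("Correct:", correct, "Accuracy:", accuracy)
```

Here three of the four predictions match their labels, so the accuracy is 0.75.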







 

In [None]:



trained_model = train(
    model,
    train_loader,
    val_loader,
    test_loader,
    epochs=5,
    lr=0.001,
    optimiser=torch.optim.AdamW
)
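
Note how the learning rate schedule interacts with the number of epochs. `MultiStepLR` multiplies the learning rate by `gamma` once the epoch count reaches each milestone, so with milestones at 5 and 15 a 5-epoch run finishes before the first reduction is ever used. The sketch below reproduces that rule in plain Python; `lr_at_epoch` is a hypothetical helper, not part of PyTorch.

```python
from bisect import bisect_right


def lr_at_epoch(initial_lr, epoch, milestones=(5, 15), gamma=0.1):
    """Learning rate used during `epoch` (0-indexed) under a multi-step decay."""
    # Count how many milestones have been reached by this epoch
    n_decays = bisect_right(milestones, epoch)
    return initial_lr * gamma ** n_decays


# The call above: 5 epochs starting from lr=0.001
five_epoch_run = [lr_at_epoch(0.001, epoch) for epoch in range(5)]
print(five_epoch_run)

# The defaults of train(): 20 epochs starting from lr=0.1
twenty_epoch_run = [lr_at_epoch(0.1, epoch) for epoch in range(20)]
print(twenty_epoch_run[4], twenty_epoch_run[5], twenty_epoch_run[15])
```

For the decay to have any effect, training would need to run for more than 5 epochs or the milestones would need to be moved earlier.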


Now, let's view the training performance in `tensorboard`. Run the cell below to open a `tensorboard` instance, then select `Time Series` from the dropdown and press refresh to view the training and validation curves.

In [None]:
%tensorboard --logdir runs

Finally, let's re-run our prediction code to see how the trained model performs.

In [None]:
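features, label = test_set[1]
features = features.unsqueeze(0)  # add a batch dimension: (C, H, W) -> (1, C, H, W)
model.eval()  # use evaluation behaviour for the batch norm layers
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(features)

# the index of the largest output is the predicted class
_, pred = torch.max(outputs, 1)
print("Prediction label: ", pred.item())
class_label = dataset.idx_to_city_name[pred.item()]
print("Prediction category: ", class_label)

print("target label:", label)
target_label = dataset.idx_to_city_name[label]
print("target city ", target_label)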