# Fine-tuning AlexNet

In this lab session we are going to fetch a **pre-trained** version of the [AlexNet](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) architecture and **fine-tune** it for the task of **object recognition**. In particular, the network has been pre-trained on the [ILSVRC-2012](https://image-net.org/challenges/LSVRC/2012/) dataset, which contains more than **1 million** images portraying objects belonging to 1000 classes. The code for AlexNet is publicly available [here](https://pytorch.org/hub/pytorch_vision_alexnet/).

![Alexnet architecture](https://www.oreilly.com/library/view/tensorflow-for-deep/9781491980446/assets/tfdl_0106.png)

## The Office-Home dataset
This dataset contains images belonging to 4 different **domains**, each containing 65 categories. In this particular instance, we are going to use the *Real World* domain. 

![Office-Home](http://hemanthdv.github.io/profile/images/DataCollage.jpg)

## Fine-tuning pipeline
In order to fine-tune our model on this dataset, we will be going through the following steps:
1. **discard** the output layer of AlexNet, which is originally structured for 1000 classes
2. randomly **initialize** the parameters of a new output layer with output dimensionality equal to the number of classes of our dataset (i.e. 65) using `torch.nn.Linear`. We will keep all other layers untouched
3. **train** the network, using a **low** learning rate for the pretrained layers and a **higher** one for the newly defined one

## Mount your Google Drive folder on Colab
We will store the data in our Google Drive account and copy it to the Colab local drive for our experiment.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

## Download the dataset
We will now **download** the Office-Home dataset. Please place [this tar file](https://drive.google.com/file/d/1OnhVN2T5sB_Jo2jCbt-IeM5tXYxAlGDF/view?usp=sharing) in your UniTN Google Drive storage. We will then **copy** it to the local Colab drive to **speed up data loading**. First, let's create a directory in our current path; then, we will copy to the local drive and check the content.

In [None]:
!mkdir dataset
!cp "gdrive/My Drive/datasets/OfficeHomeDataset.tar" dataset/
!ls dataset

Let us now extract the tar archive

In [None]:
!tar -xf dataset/OfficeHomeDataset.tar -C dataset
!ls dataset

We can now get started with the actual coding. As usual, let's start by importing the necessary libraries

In [None]:
import torch
import torchvision
import torch.nn.functional as F
import torchvision.transforms as T
from torch.utils.tensorboard import SummaryWriter

## Loading data from the file system
We know that PyTorch provides built-in dataset utilities for many existing benchmarks, but we will often still need to access data stored in our local file system. When it comes to images, the toolkit provides a generic `Dataset` class: [`torchvision.datasets.ImageFolder`](https://pytorch.org/vision/stable/generated/torchvision.datasets.ImageFolder.html#torchvision.datasets.ImageFolder). This class spares us from writing our own custom `Dataset`, but it is important to keep in mind that it assumes the images are stored in the following fashion:

        |
        |--- Alarm_Clock
        |      |--- 00046.jpg
        |      |--- 00050.jpg
        |
        |--- Couch
               |--- 00007.jpg
               |--- 00023.jpg

In other words, we are going to have a parent folder (in our case, the *Real World* domain folder inside `OfficeHomeDataset`) which contains a sub-folder for each category. Each of these sub-folders contains all the images of the dataset corresponding to that category. PyTorch will take care of assigning the labels accordingly at data loading time.
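
To make the labelling rule concrete, here is a minimal standard-library sketch of how class folders can be turned into integer labels: the sub-folder names are sorted alphabetically and numbered from 0. This matches our understanding of what `ImageFolder` does internally; the helper name `class_to_index` and the temporary folders are hypothetical, used only for illustration.

```python
import os
import tempfile


def class_to_index(root):
    """Map each class sub-folder name to an integer label, in sorted order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


# build a throwaway folder tree with three made-up class folders
with tempfile.TemporaryDirectory() as root:
    for name in ["Couch", "Alarm_Clock", "Backpack"]:
        os.makedirs(os.path.join(root, name))
    mapping = class_to_index(root)

print(mapping)  # {'Alarm_Clock': 0, 'Backpack': 1, 'Couch': 2}
```

On a real `ImageFolder` instance, the same mapping can be inspected through its `class_to_idx` attribute.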

## Data loading
Let's define a method to compactly return the dataloaders that we need, introducing some transformations.

In [None]:
'''
Input arguments:
  batch_size: mini batch size used during training
  img_root: path to the dataset parent folder. 
            The folder just above the sub-folders or class folders
'''

def get_data(batch_size, img_root):
  
  # prepare data transformations for the train loader
  transform = list()
  transform.append(T.Resize((256, 256)))                      # resize each PIL image to 256 x 256
  transform.append(T.RandomCrop((224, 224)))                  # randomly crop a 224 x 224 patch
  transform.append(T.ToTensor())                              # convert the PIL image to a PyTorch tensor in [0, 1]
  transform.append(T.Normalize(mean=[0.485, 0.456, 0.406], 
                               std=[0.229, 0.224, 0.225]))    # normalize with ImageNet mean and std
  transform = T.Compose(transform)                            # compose the above transformations into one
    
  # load data
  officehome_dataset = torchvision.datasets.ImageFolder(root=img_root, transform=transform)
  
  # create train and test splits (80/20)
  num_samples = len(officehome_dataset)
  training_samples = int(num_samples * 0.8 + 1)
  test_samples = num_samples - training_samples

  training_data, test_data = torch.utils.data.random_split(officehome_dataset, 
                                                           [training_samples, test_samples])

  # initialize dataloaders
  train_loader = torch.utils.data.DataLoader(training_data, batch_size, shuffle=True)
  test_loader = torch.utils.data.DataLoader(test_data, batch_size, shuffle=False)
  
  return train_loader, test_loader
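
Note that `random_split` draws a different split on every run unless a seeded generator is supplied, and that in `get_data` both splits share the same transform, so test images are also randomly cropped. The sketch below shows, under stated assumptions, how the same split arithmetic can be made reproducible: the `TensorDataset` of 100 integers is a stand-in for the image dataset, and the helper `split` and the seed value are hypothetical.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# stand-in dataset with 100 samples (the real one would be the ImageFolder)
dataset = TensorDataset(torch.arange(100))

# same arithmetic as in get_data
num_samples = len(dataset)
training_samples = int(num_samples * 0.8 + 1)
test_samples = num_samples - training_samples


def split(seed):
    # a seeded generator makes the random permutation repeatable
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [training_samples, test_samples],
                        generator=generator)


train_a, test_a = split(0)
train_b, test_b = split(0)

print(len(train_a), len(test_a))
print(list(train_a.indices) == list(train_b.indices))
```

With a fixed seed, the two calls select the same samples, which makes results easier to compare across runs.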

## Define the AlexNet model
As previously mentioned, PyTorch provides several models that can optionally be loaded with the parameters trained on ImageNet. If interested, take a look [here](https://pytorch.org/vision/stable/models.html). We will use the `torchvision` library to access the model, and we will then apply the aforementioned modifications. Before that, let's take a look at the original [code](https://github.com/pytorch/vision/blob/master/torchvision/models/alexnet.py). 

In [None]:
'''
Input arguments
  num_classes: number of classes in the dataset.
               This is equal to the number of output neurons.
'''

def initialize_alexnet(num_classes):

  # load the pre-trained AlexNet
  # (recent torchvision releases deprecate `pretrained=True` in favor of the `weights` argument)
  alexnet = torchvision.models.alexnet(pretrained=True)
  
  # get the number of neurons in the second last layer
  in_features = alexnet.classifier[6].in_features
  
  # re-initialize the output layer
  alexnet.classifier[6] = torch.nn.Linear(in_features=in_features, 
                                          out_features=num_classes)
  
  return alexnet

Let us now print the description of the model we just instantiated and customized

In [None]:
print(initialize_alexnet(65))
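
Besides reading the printed description, a quick forward pass is a cheap way to check that the head was replaced correctly. The sketch below does this on a tiny stand-in that only mimics the *layout* of AlexNet's classifier (Dropout, Linear, ReLU, Dropout, Linear, ReLU, Linear); the layer sizes are made up so that it runs instantly on CPU and needs no download.

```python
import torch

# hypothetical stand-in: same layer order as AlexNet's classifier, tiny sizes
classifier = torch.nn.Sequential(
    torch.nn.Dropout(), torch.nn.Linear(32, 16), torch.nn.ReLU(),
    torch.nn.Dropout(), torch.nn.Linear(16, 16), torch.nn.ReLU(),
    torch.nn.Linear(16, 1000),
)

# swap the last layer exactly as initialize_alexnet does
in_features = classifier[6].in_features
classifier[6] = torch.nn.Linear(in_features=in_features, out_features=65)

# forward a dummy batch of 4 feature vectors and inspect the output shape
classifier.eval()
with torch.no_grad():
    out = classifier(torch.randn(4, 32))

print(out.shape)  # torch.Size([4, 65])
```

The same check applies to the real model by feeding a dummy batch of shape `(N, 3, 224, 224)` and expecting an output of shape `(N, 65)`.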

## Cost function
Since this is a standard classification task, we opt for a cross-entropy loss

In [None]:
def get_cost_function():
  cost_function = torch.nn.CrossEntropyLoss()
  return cost_function

## Optimizer
Unlike previous cases, where we used a single learning rate, here we need some additional code to apply the per-layer distinction mentioned above. In particular, the pre-trained layers need to be updated at a lower rate than the newly initialized layer. Details are available [here](https://pytorch.org/docs/stable/optim.html).

In order to achieve what we want, we will create **two groups of parameters/weights**, one for the newly initialized layer and the other for the rest of the network. We will then assign two distinct learning rates accordingly.

In [None]:
def get_optimizer(model, lr, wd, momentum):
  
  # we will create two groups of weights, one for the newly initialized layer
  # and the other for rest of the layers of the network
  
  final_layer_weights = []
  rest_of_the_net_weights = []
  
  # iterate through the layers of the network
  for name, param in model.named_parameters():
    if name.startswith('classifier.6'):
      final_layer_weights.append(param)
    else:
      rest_of_the_net_weights.append(param)
  
  # assign the distinct learning rates to each group of parameters
  optimizer = torch.optim.SGD([
      {'params': rest_of_the_net_weights},
      {'params': final_layer_weights, 'lr': lr}
  ], lr=lr / 10, weight_decay=wd, momentum=momentum)
  
  return optimizer
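
To see what the two groups look like in practice, the sketch below repeats the same grouping logic on a small stand-in network and prints the learning rate of each group through `optimizer.param_groups`. `TinyNet` is hypothetical: it is built only so that its last layer is named `classifier.6`, like in AlexNet.

```python
import torch


class TinyNet(torch.nn.Module):
    """Hypothetical stand-in whose final layer is named 'classifier.6'."""

    def __init__(self):
        super().__init__()
        self.features = torch.nn.Linear(8, 8)
        self.classifier = torch.nn.Sequential(
            *[torch.nn.Identity() for _ in range(6)],  # placeholders for indices 0..5
            torch.nn.Linear(8, 3),                     # index 6: the "new" output layer
        )


model = TinyNet()
lr = 0.001

# same split as in get_optimizer
final_layer_weights, rest_of_the_net_weights = [], []
for name, param in model.named_parameters():
    if name.startswith('classifier.6'):
        final_layer_weights.append(param)
    else:
        rest_of_the_net_weights.append(param)

# the first group inherits the default lr (lr / 10), the second overrides it
optimizer = torch.optim.SGD([
    {'params': rest_of_the_net_weights},
    {'params': final_layer_weights, 'lr': lr}
], lr=lr / 10, momentum=0.9)

for group in optimizer.param_groups:
    print(len(group['params']), group['lr'])
```

Any option not set inside a group dictionary (here `lr` for the first group, `momentum` for both) falls back to the keyword arguments passed to the optimizer.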

## Training and test steps
Similarly to the previous sessions, let's now define our loops.

In [None]:
def training_step(net, data_loader, optimizer, cost_function, device='cuda'):

  samples = 0.
  cumulative_loss = 0.
  cumulative_accuracy = 0.

  # set the network to training mode: particularly important when using dropout!
  net.train() 

  # iterate over the training set
  for batch_idx, (inputs, targets) in enumerate(data_loader):

    # load data into GPU
    inputs = inputs.to(device)
    targets = targets.to(device)
      
    # forward pass
    outputs = net(inputs)

    # loss computation
    loss = cost_function(outputs,targets)

    # backward pass
    loss.backward()
    
    # parameters update
    optimizer.step()
    
    # gradients reset
    optimizer.zero_grad()

    # fetch prediction and loss value
    samples += inputs.shape[0]
    cumulative_loss += loss.item()
    _, predicted = outputs.max(dim=1) # max() returns (maximum_value, index_of_maximum_value)

    # compute training accuracy
    cumulative_accuracy += predicted.eq(targets).sum().item()

  return cumulative_loss/samples, cumulative_accuracy/samples*100

def test_step(net, data_loader, cost_function, device='cuda'):

  samples = 0.
  cumulative_loss = 0.
  cumulative_accuracy = 0.

  # set the network to evaluation mode
  net.eval() 

  # disable gradient computation (we are only testing, we do not want our model to be modified in this step!)
  with torch.no_grad():

    # iterate over the test set
    for batch_idx, (inputs, targets) in enumerate(data_loader):
      
      # load data into GPU
      inputs = inputs.to(device)
      targets = targets.to(device)
        
      # forward pass
      outputs = net(inputs)

      # loss computation
      loss = cost_function(outputs, targets)

      # fetch prediction and loss value
      samples+=inputs.shape[0]
      cumulative_loss += loss.item() # Note: the .item() is needed to extract scalars from tensors
      _, predicted = outputs.max(1)

      # compute accuracy
      cumulative_accuracy += predicted.eq(targets).sum().item()

  return cumulative_loss/samples, cumulative_accuracy/samples*100
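
A remark on the reported loss: `torch.nn.CrossEntropyLoss` averages over the mini-batch by default (`reduction='mean'`), so each `loss.item()` is already a per-sample mean for its batch. Dividing the sum of these means by the number of samples, as the two functions above do, gives a value scaled down by roughly the batch size. This is fine for comparing curves, but it is not the average loss per sample. If that quantity is wanted, one option is to weight each batch loss by its batch size, as in this small pure-Python sketch with made-up numbers:

```python
# made-up per-batch mean losses and batch sizes, for illustration only
batch_losses = [2.0, 1.0]
batch_sizes = [4, 2]

samples = sum(batch_sizes)

# what the loops above report: sum of batch means divided by the sample count
reported = sum(batch_losses) / samples

# sample-weighted average: each batch mean is multiplied by its batch size
weighted = sum(loss * n for loss, n in zip(batch_losses, batch_sizes)) / samples

print(reported)            # 0.5
print(round(weighted, 4))  # 1.6667
```

In the loops this would correspond to accumulating `loss.item() * inputs.shape[0]` instead of `loss.item()`.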

## Put it together
As usual, we will wrap our pipeline in a main function which takes care of initializing all our components and hyperparameters, performing a loop over multiple epochs and logging the results.

In [None]:
'''
Input arguments
  batch_size: Size of a mini-batch
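  device: GPU where you want to train your network
  learning_rate: Learning rate of the new output layer (the pre-trained layers use learning_rate / 10)
  weight_decay: Weight decay coefficient for regularization of weights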
  momentum: Momentum for SGD optimizer
  epochs: Number of epochs for training the network
  num_classes: Number of classes in your dataset
  visualization_name: Name of the visualization folder
  img_root: The root folder of images
'''

def main(batch_size=128, 
         device='cuda:0', 
         learning_rate=0.001, 
         weight_decay=0.000001, 
         momentum=0.9, 
         epochs=50, 
         num_classes=65, 
         visualization_name='alexnet_sgd', 
         img_root=None):
  
  # log each run in its own sub-folder of runs/, named after visualization_name
  writer = SummaryWriter(log_dir="runs/" + visualization_name)

  # instantiates dataloaders
  train_loader, test_loader = get_data(batch_size=batch_size, img_root=img_root)
  
  # instantiates the model
  net = initialize_alexnet(num_classes=num_classes).to(device)
  
  # instantiates the optimizer
  optimizer = get_optimizer(net, learning_rate, weight_decay, momentum)
  
  # instantiates the cost function
  cost_function = get_cost_function()

  # perform a preliminary evaluation step (test_step does not take the optimizer)
  print('Before training:')
  train_loss, train_accuracy = test_step(net, train_loader, cost_function, device)
  test_loss, test_accuracy = test_step(net, test_loader, cost_function, device)

  print('\t Training loss {:.5f}, Training accuracy {:.2f}'.format(train_loss, train_accuracy))
  print('\t Test loss {:.5f}, Test accuracy {:.2f}'.format(test_loss, test_accuracy))
  print('-----------------------------------------------------')
  
  # add values to logger
  writer.add_scalar('Loss/train_loss', train_loss, 0)
  writer.add_scalar('Loss/test_loss', test_loss, 0)
  writer.add_scalar('Accuracy/train_accuracy', train_accuracy, 0)
  writer.add_scalar('Accuracy/test_accuracy', test_accuracy, 0)

  # range over the number of epochs
  for e in range(epochs):
    train_loss, train_accuracy = training_step(net, train_loader, optimizer, cost_function, device)
    test_loss, test_accuracy = test_step(net, test_loader, cost_function, device)
    print('Epoch: {:d}'.format(e+1))
    print('\t Training loss {:.5f}, Training accuracy {:.2f}'.format(train_loss, train_accuracy))
    print('\t Test loss {:.5f}, Test accuracy {:.2f}'.format(test_loss, test_accuracy))
    print('-----------------------------------------------------')
    
    # add values to logger
    writer.add_scalar('Loss/train_loss', train_loss, e + 1)
    writer.add_scalar('Loss/test_loss', test_loss, e + 1)
    writer.add_scalar('Accuracy/train_accuracy', train_accuracy, e + 1)
    writer.add_scalar('Accuracy/test_accuracy', test_accuracy, e + 1)

  # perform final test step and print the final metrics
  print('After training:')
  train_loss, train_accuracy = test_step(net, train_loader, cost_function, device)
  test_loss, test_accuracy = test_step(net, test_loader, cost_function, device)

  print('\t Training loss {:.5f}, Training accuracy {:.2f}'.format(train_loss, train_accuracy))
  print('\t Test loss {:.5f}, Test accuracy {:.2f}'.format(test_loss, test_accuracy))
  print('-----------------------------------------------------')

  # close the logger
  writer.close()

## Let's make it happen!

In [None]:
main(visualization_name='alexnet_sgd_0.01_RW', img_root = '/content/dataset/OfficeHomeDataset_10072016/Real World')

Let's now plot the performance using Tensorboard

In [None]:
%load_ext tensorboard
%tensorboard --logdir=runs