# **Task \#4 A**: Machine Learning MC886/MO444
## **Convolution Models and Transfer Learning**

In [None]:
print('Gabriel Borges Gutierrez' + ' 237300')

## Objective:

The objective of this project is to implement alternative approaches to **Convolutional Neural Networks** (CNNs) and **Transfer Learning Techniques** in order to devise the most effective model for addressing the given problems.

**Note: In this work, you can use scikit-learn and PyTorch.**

## Dataset

The COCO (Common Objects in Context) dataset is a widely used benchmark dataset in computer vision research. It serves as a valuable resource for various tasks including object recognition, segmentation, and captioning. The dataset comprises a vast collection of images, each meticulously annotated with detailed information about the objects present in the image. It covers a diverse range of object categories, encompassing everyday objects such as people, animals, vehicles, and household items.

Dataset Information:

- The dataset consists of approximately 115,000 images. However, for your convenience, you can work with a subset that contains at least 30,000 images. You can utilize the function get_partial_dataset to create this partial dataset.

- The following code cell will download the dataset, but please note that if the runtime gets disconnected, you will need to download it again. In case the authorization key doesn't work, you can download the dataset from the links provided below.

- The data is available at: ([Link of the Dataset](https://drive.google.com/drive/folders/12dZ4lkKkAZ6CKcvDtwzXSLrYOy_avWW8?usp=sharing)): ```Multiclass Classfication``` and ```COCO JSON```


More information about the dataset: *Lin, Tsung-Yi, et al. "Microsoft coco: Common objects in context." Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer International Publishing, 2014.*

## Libraries

In [None]:
import os
import cv2
import json
import torch
import numpy as np
import glob as glob
import pandas as pd
import torch.nn as nn
import albumentations as A
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import torch.optim as optim

from PIL import Image
from torchvision import models, transforms
from tqdm.notebook import tqdm
from albumentations.pytorch import ToTensorV2
from torchvision.transforms import Resize, Compose, ToTensor
from torch.utils.data import Dataset, DataLoader
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from torch_snippets import *

# suppress warnings
import warnings
warnings.filterwarnings("ignore")

## Classification Task with COCO

In the COCO dataset, each sample can have multiple labels. Therefore, using the CrossEntropy loss function, which relies on softmax activation, is not suitable for the multi-label classification problem. Let's explore why CrossEntropy is not appropriate in this case.

![loss_definition_1](https://drive.google.com/uc?export=view&id=1BDkR2n6aNq6VvXnQNYw7dxtzfveijysB)

The above image illustrates how we calculate the CrossEntropy loss in a simple multi-class classification scenario, where the target labels are mutually exclusive. The loss computation focuses on the logit corresponding to the true target label and its relative magnitude compared to other labels. However, softmax ensures that all predicted probabilities sum to 1, making it impossible to have several correct answers.

![loss_definition_2](https://drive.google.com/uc?export=view&id=1tMQ0WFY1HAIlBnp3bSVic4gy1GJuJyc4)

To address this, we need to treat each prediction independently. One solution is to use the Sigmoid function as a normalizer for each logit value individually. This way, we can have multiple correct labels and their respective predicted probabilities for each label. We can then compare these probabilities with the probabilities of the correct labels (set to 1) using the BinaryCrossEntropy loss.

![loss_definition_3](https://drive.google.com/uc?export=view&id=1Mp5lo3EFEM7vMNE_5TM-Zts1NgY8oTrn)

Hence, the appropriate solution is to use the BinaryCrossEntropy loss.

**Consequently, models should have sigmoid as the last activation function to handle multi-label classification tasks correctly.**
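The argument above can be checked numerically. The following is a minimal sketch in plain Python (no PyTorch needed), assuming a hypothetical sample with three classes of which the first two are both correct; the logit values are made up for illustration. It shows that softmax cannot give both correct classes a high probability, while per-logit sigmoid can, and that binary cross-entropy is then small.

```python
import math

def softmax(logits):
    # subtract the max for numerical stability; probabilities sum to 1
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    # squashes a single logit into (0, 1), independently of the other logits
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(probs, targets):
    # mean over labels of the per-label binary cross-entropy
    terms = [t * math.log(p) + (1 - t) * math.log(1 - p)
             for p, t in zip(probs, targets)]
    return -sum(terms) / len(terms)

# hypothetical logits: the model is confident about classes 0 and 1
logits = [4.0, 4.0, -4.0]
targets = [1.0, 1.0, 0.0]

softmax_probs = softmax(logits)
sigmoid_probs = [sigmoid(z) for z in logits]
bce = binary_cross_entropy(sigmoid_probs, targets)

print("softmax:", [round(p, 4) for p in softmax_probs])
print("sigmoid:", [round(p, 4) for p in sigmoid_probs])
print("BCE on sigmoid outputs:", round(bce, 4))
```

With softmax, the two correct classes have to share the probability mass, so neither can exceed 0.5 here. With sigmoid, each one is scored on its own.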

In [None]:
## ----- Global Variables ----- ##
batch_size      = 100
learning_rate   = 0.001
epochs          = 10
evaluate_period = 2

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f'Device: {device}')

### Auxiliary functions


In [None]:
def get_partial_dataset(path, save_filename='partial_dataset', n_samples=30000):
  '''
    Creates a partial dataset for training

    Parameters
    ----------
    path : str
      Path to the _classes.csv file.

    save_filename : str
      Name of the file to be saved.

    n_samples : int
      Specifies the number of samples for training.
  '''

  df = pd.read_csv(path)

  # --- Remove samples without class labels --- #
  # The first column is the filename; the remaining columns are 0/1 class flags.
  df = df[df.iloc[:, 1:].sum(axis=1) > 0]

  # --- Randomly keep at most n_samples rows --- #
  # Capping at len(df) avoids an error when fewer than n_samples rows remain.
  df = df.sample(n=min(n_samples, len(df))).reset_index(drop=True)

  # --- Save locally --- #
  # Include the Google Drive path to ensure the preservation of this information!
  df.to_csv(f'{save_filename}.csv', index=False)

# get_partial_dataset('COCO-multiclass/train/full_dataset.csv')

### Class Dataset and DataLoader

*Note: Learn more in the [Dataset and Dataloader Tutorial Pytorch](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html)*

In [None]:
class COCOMulticlass(Dataset):
  '''
    Dataset class

    Parameters:
    -----------
    __init__():
      annotations_file : str
        Path to the _classes.csv file or partial_dataset.csv file

      img_dir : str
        Path to the directory containing the images

      transform : torchvision.transforms
        Image transformations from the torchvision library.
  '''

  def __init__(self, annotations_file, img_dir, transform=None):
      self.img_labels = pd.read_csv(annotations_file)
      self.img_dir    = img_dir
      self.transform  = transform
      self.classes_names = self.img_labels.columns[1:]

  def __len__(self):
      return len(self.img_labels)

  def __getitem__(self, idx):
      img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
      # force 3 channels: grayscale images would otherwise break batching
      image    = Image.open(img_path).convert('RGB')
      label    = torch.Tensor(self.img_labels.iloc[idx, 1:].values.astype(float))

      if self.transform:
          image = self.transform(image)

      return image, label

In [None]:
# --- Image transformations --- #
data_transform = Compose([Resize((224,224)), ToTensor()])

# --- Datasets --- #
train_dataset = COCOMulticlass('COCO-multiclass/train/partial_dataset.csv', 'COCO-multiclass/train', transform=data_transform)
valid_dataset = COCOMulticlass('COCO-multiclass/valid/_classes.csv', 'COCO-multiclass/valid', transform=data_transform)

# --- DataLoaders --- #
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
valid_loader = DataLoader(valid_dataset, batch_size=batch_size, shuffle=False)

# --- Classes --- #
class_names = train_dataset.classes_names

In [None]:
## ------ Plot Data ----- ##
fig, axes = plt.subplots(4, 10, figsize=(30,15), subplot_kw={'xticks':[], 'yticks':[]})
for i, ax in enumerate(axes.flat):
    data, target = train_dataset[i*10]
    # images are RGB tensors (C, H, W); move channels last for matplotlib
    ax.imshow(data.permute(1,2,0), interpolation='nearest')
    ax.set_title(', '.join(class_names[target.numpy() == 1]))

### Train and Evaluate functions

In [None]:
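def Criterion(preds, targets):
    # preds are sigmoid probabilities in [0, 1]; BCELoss clamps its log terms
    # internally, so no epsilon is needed
    loss = nn.BCELoss()(preds, targets)
    # threshold at 0.5 to obtain hard multi-label predictions
    y_true = targets.cpu().numpy()
    y_pred = (preds > 0.5).float().cpu().numpy()
    # subset accuracy: a sample counts only if every label matches
    acc = accuracy_score(y_true, y_pred)
    # 'samples' averaging scores each image separately, then averages over images
    f1 = f1_score(y_true, y_pred, average='samples')
    precision = precision_score(y_true, y_pred, average='samples')
    recall = recall_score(y_true, y_pred, average='samples')
    return loss, acc, f1, precision, recall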
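To make the `average='samples'` metrics more concrete, here is a small hand computation in plain Python. It is a sketch of the idea as I understand it (per-image precision, recall and F1, then a mean over images), not a replacement for scikit-learn; the two label vectors are made-up examples, and empty denominators are treated as 0.

```python
def sample_metrics(y_true, y_pred):
    """Precision, recall and F1 for one sample's 0/1 label vectors."""
    tp = sum(t * p for t, p in zip(y_true, y_pred))  # labels predicted and correct
    n_pred = sum(y_pred)                             # labels predicted
    n_true = sum(y_true)                             # labels actually present
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def mean(values):
    return sum(values) / len(values)

# hypothetical batch of two images with four classes
y_true = [[1, 1, 0, 0], [0, 0, 1, 0]]
y_pred = [[1, 0, 0, 0], [0, 0, 1, 1]]

per_sample = [sample_metrics(t, p) for t, p in zip(y_true, y_pred)]
avg_precision = mean([m[0] for m in per_sample])
avg_recall = mean([m[1] for m in per_sample])
avg_f1 = mean([m[2] for m in per_sample])

# subset accuracy: the whole label vector must match exactly
subset_acc = mean([1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred)])

print(avg_precision, avg_recall, round(avg_f1, 4), subset_acc)
```

Here every image has one mistake, so the subset accuracy is 0 even though precision, recall and F1 are well above 0. This is why accuracy tends to look pessimistic for multi-label problems.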

In [None]:
def train_batch(model, data, optimizer, criterion, device):
    model.train()
    ims, targets = data
    ims     = ims.to(device=device)
    targets = targets.to(device=device)
    preds   = model(ims)
    loss, acc, f1, pre, rec = criterion(preds, targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # sklearn metrics may be plain Python floats, which have no .item()
    return loss.item(), float(acc), float(f1), float(pre), float(rec)

@torch.no_grad()
def validate_batch(model, data, criterion, device):
    model.eval()
    ims, targets = data
    ims     = ims.to(device=device)
    targets = targets.to(device=device)
    preds   = model(ims)
    loss, acc, f1, pre, rec = criterion(preds, targets)
    # sklearn metrics may be plain Python floats, which have no .item()
    return loss.item(), float(acc), float(f1), float(pre), float(rec)

def save_model(model, best_loss, current_loss, suffix):
  '''
    Save the best model weights.
    This function saves the weights locally.
    To prevent data loss, consider adding the Google Drive path in the `torch.save()` function.

    Parameters:
    -----------
    model : nn.Module
      Model to save the weights.

    best_loss : float
      Best loss achieved so far (None on the first call).

    current_loss : float
      Current loss to compare with the best loss.

    suffix : str
      Suffix appended to the weights filename (e.g. '_v1' gives 'weights_v1.pth').
  '''
  # save on the first call or whenever the loss improves
  if best_loss is None or current_loss < best_loss:
    best_loss = current_loss
    torch.save(model.state_dict(), 'weights' + suffix + '.pth')

  return best_loss

def load_model(path, model):
  '''
    Load the model weights.

    Parameters:
    -----------
    path : str
      Path to the .pth file containing the weights.

    model : nn.Module
      Model to load the weights into.
  '''
  model.load_state_dict(torch.load(path))
  return model

### 1. (3 points) Build and train a Convolutional Neural Network (CNN) for Multi-Label Image Classification.

*Tip 1: Apply weight regularization to avoid overfitting and improve the performance of the CNN (for example, L1, L2, or both).*

*Tip 2: Remember to use regularization layers, such as Dropout, BatchNorm and LayerNorm.*
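As a rough illustration of Tip 1, the sketch below shows in plain Python how L1 and L2 penalties are added to a data loss. The weights, the loss value, and the `l1_lambda` / `l2_lambda` coefficients are hypothetical. In PyTorch, a closely related L2 effect is commonly obtained through the `weight_decay` argument of the optimizer, whereas an L1 term usually has to be summed over `model.parameters()` and added to the loss by hand.

```python
def l1_penalty(weights):
    # sum of absolute values: pushes weights towards exactly zero
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    # sum of squares: discourages large weights
    return sum(w * w for w in weights)

def regularized_loss(data_loss, weights, l1_lambda=0.0, l2_lambda=0.0):
    # total objective = data term + weighted penalties
    return (data_loss
            + l1_lambda * l1_penalty(weights)
            + l2_lambda * l2_penalty(weights))

# hypothetical flattened weights and data loss
weights = [0.5, -1.0, 2.0]
total = regularized_loss(0.30, weights, l1_lambda=1e-3, l2_lambda=1e-2)
print(total)
```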

In [None]:
def conv_block(nchannels_in, nchannels_out, stride_val, conv_kernel_size=3, pool_kernel_size=2):
    return nn.Sequential(
        # defining convolutional layer
        nn.Conv2d(
            in_channels=nchannels_in,
            out_channels=nchannels_out,
            kernel_size=conv_kernel_size,
            stride=1,
            padding=1,
            bias=True,
        ),
        # defining activation layer
        nn.ReLU(),
        # defining a pooling layer
        nn.MaxPool2d(kernel_size=pool_kernel_size,
                     stride=stride_val, padding=1),
    )


class HandcraftCNN_v1(nn.Module):
    def __init__(self):
        super(HandcraftCNN_v1, self).__init__()
        self.features = nn.Sequential(
            conv_block(3, 16, 2),
            conv_block(16, 64, 2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(in_features=207936, out_features=64, bias=True),
            nn.ReLU(),
            nn.Dropout(0.20),
            nn.Linear(64, 128, bias=True),
            nn.ReLU(),
            nn.Dropout(0.20),
            nn.Linear(128, len(class_names)),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.features(x)
        # transforms outputs into a 2D tensor
        x = torch.flatten(x, start_dim=1)
        # classifies features
        y = self.classifier(x)
        return y


class HandcraftCNN_v2(nn.Module):
    def __init__(self):
        super(HandcraftCNN_v2, self).__init__()
        self.features = nn.Sequential(
            conv_block(3, 16, 1),
            conv_block(16, 32, 2),
            conv_block(32, 64, 3)
        )
        self.classifier = nn.Sequential(
            nn.Linear(in_features=92416, out_features=128, bias=True),
            nn.ReLU(),
            nn.Dropout(0.20),
            nn.Linear(128, 256, bias=True),
            nn.ReLU(),
            nn.Dropout(0.20),
            nn.Linear(256, 512, bias=True),
            nn.ReLU(),
            nn.Dropout(0.20),
            nn.Linear(512, len(class_names)),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.features(x)
        # transforms outputs into a 2D tensor
        x = torch.flatten(x, start_dim=1)
        # classifies features
        y = self.classifier(x)
        return y


model_v1 = HandcraftCNN_v1()
model_v2 = HandcraftCNN_v2()
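The `in_features` values of the first `nn.Linear` layers (207936 and 92416) follow from the 224x224 input and the block settings. The sketch below re-derives them in plain Python with the standard output-size formula for convolution and pooling layers (floor division, as with the default `ceil_mode=False`). The helper names are hypothetical and not part of the notebook.

```python
def out_size(size, kernel, stride, padding):
    # spatial output size of a conv or pooling layer (floor mode)
    return (size + 2 * padding - kernel) // stride + 1

def flattened_features(input_size, pool_strides, out_channels,
                       conv_kernel=3, pool_kernel=2):
    # each conv_block is Conv2d(kernel=3, stride=1, padding=1)
    # followed by MaxPool2d(kernel=2, stride=stride_val, padding=1)
    size = input_size
    for stride in pool_strides:
        size = out_size(size, conv_kernel, stride=1, padding=1)
        size = out_size(size, pool_kernel, stride=stride, padding=1)
    return out_channels * size * size

v1_features = flattened_features(224, [2, 2], out_channels=64)
v2_features = flattened_features(224, [1, 2, 3], out_channels=64)
print(v1_features, v2_features)
```

If the input resolution or a stride changes, the same calculation gives the new `in_features` value without running the network.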

In [None]:

optimizer = torch.optim.Adam(model_v1.parameters(), lr=learning_rate)
criterion = Criterion
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.1)
log = Report(epochs)

model_v1.to(device)

best_loss = None

for epoch in range(0, epochs):
    # --- Train Model --- #
    N = len(train_loader)
    for bx, data in enumerate(train_loader):
        loss, acc, f1, pre, rec = train_batch(
            model_v1, data, optimizer, criterion, device)
        # report results for the batch
        log.record((epoch+(bx+1)/N), trn_loss=loss, trn_acc=acc,
                   trn_f1=f1, trn_pre=pre, trn_rec=rec, end='\r')

    # --- Validate Model --- #
    val_losses = []
    N = len(valid_loader)
    for bx, data in enumerate(valid_loader):
        loss, acc, f1, pre, rec = validate_batch(
            model_v1, data, criterion, device)
        val_losses.append(loss)
        log.record((epoch+(bx+1)/N), val_loss=loss, val_acc=acc,
                   val_f1=f1, val_pre=pre, val_rec=rec, end='\r')

    # --- Evaluate model in N epochs --- #
    if (epoch + 1) % evaluate_period == 0:
        log.report_avgs(epoch+1)
        # --- Save best model weights --- #
        # compare the mean validation loss, not only the last batch
        best_loss = save_model(model_v1, best_loss, np.mean(val_losses), '_v1')

    lr_scheduler.step()
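A note on the scheduler settings used in this loop: `StepLR` multiplies the learning rate by `gamma` once every `step_size` epochs, so with `step_size=25` and only 10 epochs the rate never changes. The sketch below reproduces that schedule in plain Python using the closed form `base_lr * gamma ** (epoch // step_size)`. The alternative `step_size=4` is only a hypothetical value for comparison.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    # learning rate in effect during the given (0-based) epoch
    return base_lr * gamma ** (epoch // step_size)

epochs = 10
base_lr = 0.001

# settings used in the notebook: no decay happens within 10 epochs
current = [step_lr(base_lr, e, step_size=25, gamma=0.1) for e in range(epochs)]

# hypothetical alternative: decay after every 4 epochs
alternative = [step_lr(base_lr, e, step_size=4, gamma=0.1) for e in range(epochs)]

print(current)
print(alternative)
```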

In [None]:
log.plot_epochs(['trn_loss','val_loss'])
log.plot_epochs(['trn_acc','val_acc'])
log.plot_epochs(['trn_f1','val_f1'])
log.plot_epochs(['trn_pre','val_pre'])
log.plot_epochs(['trn_rec','val_rec'])

### Visualize model predictions

In [None]:
def show_prediction(model, dataloader, class_names):
  '''
  Show a sample prediction.

  Parameters:
  -----------
  model : nn.Module
    Model to be evaluated.

  dataloader : dataloader
    DataLoader for the example.

  class_names : list
    List containing the class names.

  '''
  data, target = next(iter(dataloader))
  data = data.to(device)
  # the model ends in a sigmoid, so its outputs are per-class probabilities
  with torch.no_grad():
    probs = model(data.type(torch.float)).cpu().numpy()
  np.set_printoptions(precision=5, suppress=True)
  print(probs[0])
  pred = probs > .5
  plt.imshow(data[0].cpu().permute(1,2,0))
  plt.axis('off')
  plt.show()

  target = ', '.join(class_names[target[0].numpy() == 1])
  pred   = ', '.join(class_names[pred[0]])
  print(f"Target: {target}\nPred: {pred if len(pred) != 0 else 'NONE'}")

In [None]:
model_v1.eval()
show_prediction(model_v1, valid_loader, class_names)

In [None]:

optimizer = torch.optim.Adam(model_v2.parameters(), lr=learning_rate)
criterion = Criterion
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.1)
log = Report(epochs)

model_v2.to(device)

best_loss = None

for epoch in range(0, epochs):
    # --- Train Model --- #
    N = len(train_loader)
    for bx, data in enumerate(train_loader):
        loss, acc, f1, pre, rec = train_batch(
            model_v2, data, optimizer, criterion, device)
        # report results for the batch
        log.record((epoch+(bx+1)/N), trn_loss=loss, trn_acc=acc,
                   trn_f1=f1, trn_pre=pre, trn_rec=rec, end='\r')

    # --- Validate Model --- #
    val_losses = []
    N = len(valid_loader)
    for bx, data in enumerate(valid_loader):
        # validate the model being trained here (model_v2, not model_v1)
        loss, acc, f1, pre, rec = validate_batch(
            model_v2, data, criterion, device)
        val_losses.append(loss)
        log.record((epoch+(bx+1)/N), val_loss=loss, val_acc=acc,
                   val_f1=f1, val_pre=pre, val_rec=rec, end='\r')

    # --- Evaluate model in N epochs --- #
    if (epoch + 1) % evaluate_period == 0:
        log.report_avgs(epoch+1)
        # --- Save best model weights --- #
        # compare the mean validation loss, not only the last batch
        best_loss = save_model(model_v2, best_loss, np.mean(val_losses), '_v2')

    lr_scheduler.step()

In [None]:
log.plot_epochs(['trn_loss','val_loss'])
log.plot_epochs(['trn_acc','val_acc'])
log.plot_epochs(['trn_f1','val_f1'])
log.plot_epochs(['trn_pre','val_pre'])
log.plot_epochs(['trn_rec','val_rec'])

In [None]:
model_v2.eval()
show_prediction(model_v2, valid_loader, class_names)

 > What are the conclusions? Was this model sufficient for the task? Do the hyperparameters, such as learning rate, batch size, and others, impact the final result? (1-2 paragraphs)

### 2. (3 points) Apply the Transfer Learning Technique by utilizing one of the pre-trained CNN models available in PyTorch as backbone.

In [None]:
def TransferLearningModel(nclasses):
    # get the vgg16 model pretrained on ImageNet
    model = models.vgg16(weights="IMAGENET1K_V1")
    # Specify you do not want to train the parameters of the model
    for param in model.parameters():
        param.requires_grad = False
    # The vgg16 model consists of three modules: features, avgpool, and classifier.
    # Change avgpool to return a feature map of size 1x1 instead of 7x7. This will create
    # batches with 512x1x1 tensors.
    model.avgpool = nn.AdaptiveAvgPool2d(output_size=(1, 1))
    # Change the classifier to one suitable for your dataset
    model.classifier = nn.Sequential(
        nn.Flatten(),
        nn.Linear(512, 128),
        nn.ReLU(),
        nn.Dropout(0.2),
        nn.Linear(128, nclasses),
        nn.Sigmoid(),
    )  
    criterion = Criterion
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    # Return the complete model information for training and evaluation
    return (model.to(device), criterion, optimizer)


model_tf, criterion, optimizer = TransferLearningModel(len(class_names))
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.1)

In [None]:
model_tf.to(device)
# fresh log so these curves are not mixed with the previous model's records
log = Report(epochs)

best_loss = None

for epoch in range(0, epochs):
    # --- Train Model --- #
    N = len(train_loader)
    for bx, data in enumerate(train_loader):
        loss, acc, f1, pre, rec = train_batch(
            model_tf, data, optimizer, criterion, device)
        # report results for the batch
        log.record((epoch+(bx+1)/N), trn_loss=loss, trn_acc=acc,
                   trn_f1=f1, trn_pre=pre, trn_rec=rec, end='\r')

    # --- Validate Model --- #
    val_losses = []
    N = len(valid_loader)
    for bx, data in enumerate(valid_loader):
        loss, acc, f1, pre, rec = validate_batch(
            model_tf, data, criterion, device)
        val_losses.append(loss)
        log.record((epoch+(bx+1)/N), val_loss=loss, val_acc=acc,
                   val_f1=f1, val_pre=pre, val_rec=rec, end='\r')

    # --- Evaluate model in N epochs --- #
    if (epoch + 1) % evaluate_period == 0:
        log.report_avgs(epoch+1)
        # --- Save best model weights --- #
        # compare the mean validation loss, not only the last batch
        best_loss = save_model(model_tf, best_loss, np.mean(val_losses), '_tf')

    lr_scheduler.step()

In [None]:
log.plot_epochs(['trn_loss','val_loss'])
log.plot_epochs(['trn_acc','val_acc'])
log.plot_epochs(['trn_f1','val_f1'])
log.plot_epochs(['trn_pre','val_pre'])
log.plot_epochs(['trn_rec','val_rec'])

### Visualize model predictions

In [None]:
model_tf.eval()
show_prediction(model_tf, valid_loader, class_names)

 > What are the conclusions? Does the performance improve? Is it better to freeze the entire model or update all the weights in this case? (1-2 paragraphs)

### 3. (3 points) Apply the Data Augmentation technique to either the handcrafted model or the transfer learning model.

*Tip: Be careful to choose appropriate transformations that do not destroy the information of the sample.*

In [None]:
## TODO: Implement data augmentation during training. Choose appropriate transformations.
# Link: https://pytorch.org/vision/stable/transforms.html

train_transforms = transforms.Compose([
    transforms.Resize((300,300), interpolation=transforms.InterpolationMode.BILINEAR, 
                      max_size=None, antialias=True),
    transforms.RandomAffine(degrees=10, translate=(0.05,0.10), scale=(0.9,1.1), shear=(-2,2),
                            interpolation=transforms.InterpolationMode.BILINEAR, 
                            fill=0),
    transforms.CenterCrop(250),
    transforms.Resize((224,224), interpolation=transforms.InterpolationMode.BILINEAR, 
                      max_size=None, antialias=True),
    transforms.ToTensor()
])
# Validation keeps only the deterministic steps (no random augmentation),
# so metrics are comparable across epochs
valid_transforms = transforms.Compose([
    transforms.Resize((300,300), interpolation=transforms.InterpolationMode.BILINEAR, 
                      max_size=None, antialias=True),
    transforms.CenterCrop(250),
    transforms.Resize((224,224), interpolation=transforms.InterpolationMode.BILINEAR, 
                      max_size=None, antialias=True),
    transforms.ToTensor()
])

train_dataset = COCOMulticlass('COCO-multiclass/train/partial_dataset.csv', 'COCO-multiclass/train', transform=train_transforms)
valid_dataset = COCOMulticlass('COCO-multiclass/valid/_classes.csv', 'COCO-multiclass/valid', transform=valid_transforms)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
valid_loader = DataLoader(valid_dataset, batch_size=batch_size, shuffle=False)
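The tip about not destroying information can be checked roughly for this pipeline. Resizing to 300x300 and then taking a 250x250 center crop discards a border of the image while the labels stay unchanged, so an object that lies only in that border could disappear while its label remains. The sketch below is a back-of-the-envelope calculation in plain Python that ignores the random affine step; the helper names are hypothetical.

```python
def kept_area_fraction(resize_side, crop_side):
    # fraction of the resized image that survives a square center crop
    return (crop_side / resize_side) ** 2

def border_removed(resize_side, crop_side):
    # pixels cut from each side by the center crop
    return (resize_side - crop_side) // 2

frac = kept_area_fraction(300, 250)
border = border_removed(300, 250)
print(f"kept area: {frac:.1%}, border removed per side: {border} px")
```

Roughly 30% of the image area is dropped here, which may be worth keeping in mind when choosing the crop size for a multi-label dataset.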

In [None]:
model_tf_aug, criterion, optimizer = TransferLearningModel(len(class_names))
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.1)

In [None]:
model_tf_aug.to(device)
# fresh log so these curves are not mixed with the previous model's records
log = Report(epochs)

best_loss = None

for epoch in range(0, epochs):
    # --- Train Model --- #
    N = len(train_loader)
    for bx, data in enumerate(train_loader):
        loss, acc, f1, pre, rec = train_batch(
            model_tf_aug, data, optimizer, criterion, device)
        # report results for the batch
        log.record((epoch+(bx+1)/N), trn_loss=loss, trn_acc=acc,
                   trn_f1=f1, trn_pre=pre, trn_rec=rec, end='\r')

    # --- Validate Model --- #
    val_losses = []
    N = len(valid_loader)
    for bx, data in enumerate(valid_loader):
        loss, acc, f1, pre, rec = validate_batch(
            model_tf_aug, data, criterion, device)
        val_losses.append(loss)
        log.record((epoch+(bx+1)/N), val_loss=loss, val_acc=acc,
                   val_f1=f1, val_pre=pre, val_rec=rec, end='\r')

    # --- Evaluate model in N epochs --- #
    if (epoch + 1) % evaluate_period == 0:
        log.report_avgs(epoch+1)
        # --- Save best model weights --- #
        # compare the mean validation loss, not only the last batch
        best_loss = save_model(model_tf_aug, best_loss, np.mean(val_losses), '_tf_aug')

    lr_scheduler.step()

In [None]:
log.plot_epochs(['trn_loss','val_loss'])
log.plot_epochs(['trn_acc','val_acc'])
log.plot_epochs(['trn_f1','val_f1'])
log.plot_epochs(['trn_pre','val_pre'])
log.plot_epochs(['trn_rec','val_rec'])

In [None]:
model_tf_aug.eval()
show_prediction(model_tf_aug, valid_loader, class_names)

 > What are the conclusions? Does the performance improve? (1-2 paragraphs)