# HW2P2: Face Classification and Verification


Congrats on coming to the second homework in 11785: Introduction to Deep Learning. This homework is significantly longer and tougher than the previous homework. You have 2 sub-parts as outlined below. Please start early!


*   Face Recognition: You will write your own CNN model to tackle a classification problem with 7001 identities
*   Face Verification: You will use the model trained for classification to evaluate the quality of its feature embeddings, by comparing the similarity of known and unknown identities

Common errors you may face in this homework (because of the size of the model):


*   CUDA Out of Memory (OOM): You can tackle this problem by (1) Reducing the batch size (2) Calling `torch.cuda.empty_cache()` and `gc.collect()` (3) Finally restarting the runtime
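
As a rough illustration of option (1), the sketch below halves the batch size until a single step succeeds. `find_fitting_batch_size` and `fake_step` are hypothetical names that are not part of the homework code, and the memory limit is simulated. The sketch relies on the fact that PyTorch reports a CUDA OOM as a `RuntimeError` (or a subclass of it).

```python
def find_fitting_batch_size(run_one_step, start_batch_size, min_batch_size=1):
    """Halve the batch size until `run_one_step(batch_size)` stops raising."""
    batch_size = start_batch_size
    while batch_size >= min_batch_size:
        try:
            run_one_step(batch_size)
            return batch_size
        except RuntimeError:
            # In a real run you would also call gc.collect() and
            # torch.cuda.empty_cache() here before retrying.
            batch_size //= 2
    raise RuntimeError("no batch size fits in memory")


def fake_step(batch_size, limit=100):
    """Hypothetical stand-in for one training step with a simulated memory limit."""
    if batch_size > limit:
        raise RuntimeError("CUDA out of memory (simulated)")


print(find_fitting_batch_size(fake_step, 256))
```

In practice `run_one_step` would build a data loader with the given batch size and run one forward and backward pass.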



# FINAL MODEL ARCHITECTURE USED: SEResNet (Squeeze-and-Excitation Residual Network)
The SEResNet is an architectural variant of the popular Residual Network (ResNet) that incorporates attention mechanisms. It was proposed to improve the performance of convolutional neural networks by enhancing feature interactions and adaptively reweighting feature maps.

Here are the key components of SE-ResNet:

*   Residual Blocks: SE-ResNet employs residual blocks similar to the standard ResNet architecture. These blocks consist of a stack of convolutional layers, batch normalization, and ReLU activations.
*   Squeeze-and-Excitation (SE) Module: The distinctive feature of SE-ResNet is the SE module. It's introduced within each residual block and aims to capture channel-wise dependencies and adaptively recalibrate the feature maps. The SE module has two steps:
    *   Squeeze: Global Average Pooling (GAP) is applied to reduce the spatial dimensions of feature maps into a single value per channel. This step compresses the channel-wise information.
    *   Excitation: A fully connected network (usually a couple of dense layers) is applied to the output of the squeeze step. This network learns to assign a weight to each channel.
*   Skip Connections: Skip connections are preserved throughout the network, allowing gradients to flow easily during training. The combination of residual blocks and skip connections helps in mitigating the vanishing gradient problem.

The SE-ResNet architecture has demonstrated improved performance in various computer vision tasks, such as image classification, object detection, and semantic segmentation. By incorporating the SE module, the network can adaptively focus on more informative channels and suppress less relevant ones, making it more efficient and accurate.
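
The squeeze and excitation steps described above can be sketched in plain Python, without any framework, to make the arithmetic visible. This is only an illustration under simplifying assumptions: biases are omitted, and the weights `w_reduce` and `w_expand` are made-up values, not trained ones.

```python
import math


def se_recalibrate(feature_maps, w_reduce, w_expand):
    """feature_maps: list of C channels, each an H x W nested list.
    w_reduce: (C // r) x C weights, w_expand: C x (C // r) weights (no biases)."""
    # Squeeze: global average pooling gives one number per channel
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_maps]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_reduce]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Recalibration: every value in a channel is multiplied by that channel's gate
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature_maps, gates)]
    return gates, scaled


# Two 2x2 channels with illustrative (hypothetical) weights
feature_maps = [[[1.0, 1.0], [1.0, 1.0]],
                [[2.0, 2.0], [2.0, 2.0]]]
w_reduce = [[0.5, 0.5]]        # 2 channels -> 1 hidden unit
w_expand = [[2.0], [-2.0]]     # 1 hidden unit -> 2 channel gates
gates, scaled = se_recalibrate(feature_maps, w_reduce, w_expand)
print(gates)
```

With these weights the first channel receives a gate close to 1 and the second a gate close to 0, which is the "emphasize some channels, suppress others" behavior the SE module is meant to learn.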

## SEResNet explanation in terms of face classification and verification

SEResNet, or Squeeze-and-Excitation Residual Network, is a convolutional neural network architecture that incorporates Squeeze-and-Excitation (SE) blocks to enhance feature recalibration and improve network performance. 
In the context of face classification and verification, SEResNet is applied as follows:

1. Face Classification typically involves the task of categorizing faces into predefined classes, such as identifying individuals based on their facial features.
SEResNet can be used as a deep learning model for face classification tasks. It excels at learning discriminative features from facial images, making it effective for identifying people in images.
The Squeeze-and-Excitation blocks within SEResNet help the network focus on important facial features while reducing the impact of less relevant details, thus improving the accuracy of face classification.

2. Face Verification is the task of determining whether two facial images belong to the same individual or not. It's often used for applications like face-based access control or user authentication.
SEResNet can be used in face verification by training it to learn feature representations of faces and then comparing these representations to determine if the faces are from the same person or not.
The SE blocks in SEResNet can help capture subtle differences and similarities in facial features, making it suitable for fine-grained face verification tasks.

Overall, in both face classification and verification scenarios, SEResNet benefits from its ability to automatically recalibrate features within the network using the SE blocks. This recalibration helps focus on important facial characteristics, which is critical for achieving high accuracy in these tasks. However, the model's performance depends heavily on the quality and diversity of the training data, as well as on fine-tuning and hyperparameter tuning to suit the specific requirements of the face classification or verification task.

This project aimed to explore CNN architectures for the task of face classification and verification. I also learned to use hyperparameter tuning to make sure the model performs well.

The hyperparameters that gave the best score were:
300 epochs, batch_size of 128, learning rate of 1e-3, rotation_angle of 30, and horizontal_flip of 0.5.

The data loaders apply augmentation transforms such as rotation, horizontal flipping, and other augmentations. These techniques increase the variety of the training data so that the model generalizes better.

# Results

*   Classification task training accuracy: 99.89%
*   Classification task validation accuracy: 90.346%
*   Verification task accuracy: 54.44%
*   Verification task validation accuracy: 56.944%

# Fine tuning
Trials made: batch size: 64, 256, 128; rotation angle: 25, 30; learning rate: 0.1, 0.0001, 0.001; brightness: 0.2, 0.2, 0.2

# Running the code

The code can be run on Kaggle with a GPU runtime.
Run the code cells in order, starting from the beginning.

# Preliminaries

In [None]:
!nvidia-smi # to see what GPU you have

In [None]:
!pip install wandb --quiet

In [None]:
!pip install --upgrade wandb --quiet

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
# from torchsummary import summary
import torchvision #This library is used for image-based operations (Augmentations)
import os
import gc
from tqdm import tqdm
from PIL import Image
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
import glob
import io
import wandb
import matplotlib.pyplot as plt
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
print("Device: ", DEVICE)

In [None]:
# from google.colab import drive # Link your drive if you are a colab user
# drive.mount('/content/drive') # Models in this HW take a long time to get trained and make sure to save it here

# TODOs
As you go, please read the code and keep an eye out for TODOs!

# Download Data from Kaggle

In [None]:
# TODO: Use the same Kaggle code from HW1P2
!pip install --upgrade --force-reinstall --no-deps kaggle==1.5.8
!mkdir -p /root/.kaggle

with open("/root/.kaggle/kaggle.json", "w+") as f:
    f.write('{"username":"<your-kaggle-username>","key":"<your-kaggle-key>"}')
    # Put your kaggle username & key here; never commit real credentials

!chmod 600 /root/.kaggle/kaggle.json

In [None]:
# !mkdir '/content/data'

# !kaggle competitions download -c 11-785-f23-hw2p2-classification
# !unzip -qo '11-785-f23-hw2p2-classification.zip' -d '/content/data'

# !kaggle competitions download -c 11-785-f23-hw2p2-verification
# !unzip -qo '11-785-f23-hw2p2-verification.zip' -d '/content/data'

In [None]:
# !mkdir '/kaggle/working'
!kaggle competitions download -c 11-785-f23-hw2p2-classification
!unzip -qo '11-785-f23-hw2p2-classification.zip' -d '/kaggle/working'

!kaggle competitions download -c 11-785-f23-hw2p2-verification
!unzip -qo '11-785-f23-hw2p2-verification.zip' -d '/kaggle/working'

# Configs

In [None]:
config = {
    'batch_size': 128,  # Increase this if your GPU can handle it
    'lr': 0.001,       
    'epochs': 300,      # 20 epochs is recommended ONLY for the early submission - you will have to train for much longer typically
    'rotation_angle': 30,
    'horizontal_flip': 0.5,
    'brightness': 0.15,
    'contrast': 0.15,
    'saturation': 0, 
    'hue': 0,
    # Include other parameters as needed.
} 


# Classification Dataset

In [None]:
DATA_DIR    = '/kaggle/working/11-785-f23-hw2p2-classification'# TODO: Path where you have downloaded the data
TRAIN_DIR   = os.path.join(DATA_DIR, "train")
VAL_DIR     = os.path.join(DATA_DIR, "dev")
TEST_DIR    = os.path.join(DATA_DIR, "test")



# Transforms using torchvision - Refer https://pytorch.org/vision/stable/transforms.html

train_transforms = torchvision.transforms.Compose([
    torchvision.transforms.ColorJitter(brightness=config['brightness'], contrast=config['contrast'], saturation=config['saturation'], hue=config['hue']),
    torchvision.transforms.RandomPerspective(0.4, 0.4),
    torchvision.transforms.RandomRotation(degrees=config['rotation_angle']),
    torchvision.transforms.RandomHorizontalFlip(p=config['horizontal_flip']),
    torchvision.transforms.RandAugment(3,8),
    torchvision.transforms.ToTensor() ])# Implementing the right train transforms/augmentation methods is key to improving performance.

# Most torchvision transforms are done on PIL images. So you convert it into a tensor at the end with ToTensor()
# But there are some transforms which are performed after ToTensor() : e.g - Normalization
# Normalization Tip - Do not blindly use normalization that is not suitable for this dataset

valid_transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor()
])


train_dataset   = torchvision.datasets.ImageFolder(TRAIN_DIR, transform= train_transforms)
valid_dataset   = torchvision.datasets.ImageFolder(VAL_DIR, transform= valid_transforms)
# You should NOT have data augmentation on the validation set. Why?
# The goal of data augmentation is to improve the model's ability to generalize and perform well on unseen data 
# by exposing it to a wider variety of input variations during training.
# The validation set, on the other hand, is used to estimate how well the model generalizes to unseen data. 
# It serves as an independent dataset to evaluate the model's performance.
# Overall, to properly assess a model's performance and its ability to generalize to real-world scenarios, 
# it's important to keep the validation set separate and unaltered, without any form of data augmentation.


# Create data loaders
train_loader = torch.utils.data.DataLoader(
    dataset     = train_dataset,
    batch_size  = config['batch_size'],
    shuffle     = True,
    num_workers = 2,
    pin_memory  = True
)

valid_loader = torch.utils.data.DataLoader(
    dataset     = valid_dataset,
    batch_size  = config['batch_size'],
    shuffle     = False,
    num_workers = 2
)

In [None]:
# You can do this with ImageFolder as well, but it requires some tweaking
class ClassificationTestDataset(torch.utils.data.Dataset):

    def __init__(self, data_dir, transforms):
        self.data_dir   = data_dir
        self.transforms = transforms

        # This one-liner basically generates a sorted list of full paths to each image in the test directory
        self.img_paths  = list(map(lambda fname: os.path.join(self.data_dir, fname), sorted(os.listdir(self.data_dir))))

    def __len__(self):
        return len(self.img_paths)

    def __getitem__(self, idx):
        return self.transforms(Image.open(self.img_paths[idx]))

In [None]:
test_dataset = ClassificationTestDataset(TEST_DIR, transforms = valid_transforms) #Why are we using val_transforms for Test Data?
# We use valid_transforms for the test data so that it is processed exactly like the validation data, without introducing any bias or artifacts at test time.
# Test data, like validation data, is unseen data, so it must be in the same format as the data used to evaluate the model.
# In brief, applying the same transformation to test and validation data keeps model evaluation consistent, so the reported results reliably reflect the model's performance on unseen data.
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = config['batch_size'], shuffle = False,
                         drop_last = False, num_workers = 2)

In [None]:
print("Number of classes    : ", len(train_dataset.classes))
print("No. of train images  : ", train_dataset.__len__())
print("Shape of image       : ", train_dataset[0][0].shape)
print("Batch size           : ", config['batch_size'])
print("Train batches        : ", train_loader.__len__())
print("Val batches          : ", valid_loader.__len__())

## Data visualization

In [None]:
# Visualize a few images in the dataset
# You can write your own code, and you don't need to understand the code
# It is highly recommended that you visualize your data augmentation as sanity check

r, c    = [5, 5]
fig, ax = plt.subplots(r, c, figsize= (15, 15))

k       = 0
dtl     = torch.utils.data.DataLoader(
    dataset     = torchvision.datasets.ImageFolder(TRAIN_DIR, transform= train_transforms), # train_transforms are applied so the augmentations can be inspected
    batch_size  = config['batch_size'],
    shuffle     = True,
)

for data in dtl:
    x, y = data

    for i in range(r):
        for j in range(c):
            img = x[k].numpy().transpose(1, 2, 0)
            ax[i, j].imshow(img)
            ax[i, j].axis('off')
            k+=1
    break

del dtl

# Very Simple Network (for Mandatory Early Submission)

In [None]:
class Squeeze_Excitation_Block(nn.Module):
    def __init__(self, in_channels, reduction_ratio=16):
        super(Squeeze_Excitation_Block, self).__init__()
        # Global average pooling layer
        self.glob_avg_pool_lay = nn.AdaptiveAvgPool2d(1)
        # First fully connected layer to reduce dimensionality
        self.fc_reduce = nn.Linear(in_channels, in_channels // reduction_ratio)
        # ReLU activation function
        self.relu = nn.ReLU(inplace=True)
        # Second fully connected layer to restore dimensionality
        self.fc_expand = nn.Linear(in_channels // reduction_ratio, in_channels)
        # Sigmoid activation function
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Global average pooling to obtain channel-wise statistics
        out_pooled = self.glob_avg_pool_lay(x).squeeze(-1).squeeze(-1)
        out_reduced = self.fc_reduce(out_pooled)
        out_activated = self.relu(out_reduced)
        out_expanded = self.fc_expand(out_activated)
        out_recal = self.sigmoid(out_expanded)
        out_recalibrated = out_recal.unsqueeze(-1).unsqueeze(-1)
        output = x * out_recalibrated 
        return output

class Residual_Block(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(Residual_Block, self).__init__()
        # First convolutional layer
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        # Batch normalization after the first convolution
        self.bn1 = nn.BatchNorm2d(out_channels)
        # ReLU activation function
        self.relu = nn.ReLU(inplace=True)
        # Second convolutional layer
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        # Batch normalization after the second convolution
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Squeeze-Excitation block
        self.se_block = Squeeze_Excitation_Block(out_channels)
        # Downsample operation to match dimensions if needed
        self.downsample = downsample

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        # Squeeze-Excitation block to recalibrate feature importance
        out = self.se_block(out)
        # When needed, adjust residual dimensions
        if self.downsample is not None:
            residual = self.downsample(x)
        # Add the residual and processed output
        out += residual
        out = self.relu(out)

        return out

class SEResNet(nn.Module):
    def __init__(self, block, layers, num_classes):
        super(SEResNet, self).__init__()
        self.in_channels = 64
        # Initial convolutional layer
        self.conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # Batch normalization after the initial convolution
        self.bn = nn.BatchNorm2d(64)
        # ReLU activation function
        self.relu = nn.ReLU(inplace=True)
        # Max-pooling layer
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Residual blocks for each stage
        self.layer1 = self.createlayer(block, 64, layers[0])
        self.layer2 = self.createlayer(block, 128, layers[1], stride=2)
        self.layer3 = self.createlayer(block, 256, layers[2], stride=2)
        self.layer4 = self.createlayer(block, 512, layers[3], stride=2)

        # Global average pooling layer
        self.glob_avg_pool_lay = nn.AdaptiveAvgPool2d(1)
        # Fully connected layer for classification
        self.fc_lay = nn.Linear(512, num_classes)

    def createlayer(self, block, out_channels, blocks, stride=1):
        downsample = None
        if stride != 1 or self.in_channels != out_channels:
            # Downsample operation to match dimensions
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

        layers = [block(self.in_channels, out_channels, stride, downsample)]
        self.in_channels = out_channels

        for _ in range(1, blocks):
            layers.append(block(out_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x, return_feats=False):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.glob_avg_pool_lay(x)
        x = x.view(x.size(0), -1)
        output = self.fc_lay(x)

        if return_feats:
            return output, x
        else:
            return output

# Create an instance of the SEResNet model
model = SEResNet(Residual_Block, [4, 5, 6, 2], num_classes=7001).to(DEVICE)
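
To see how the strides above shrink the feature maps, the spatial size can be traced with the standard convolution output formula, floor((n + 2p - k) / s) + 1. The sketch below mirrors the kernel sizes, strides, and paddings in the model definition. The 224x224 input size is an assumption used for illustration only; substitute the shape printed earlier by the notebook.

```python
def conv_out(size, kernel, stride, padding):
    """Output width/height of a conv or max-pool layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_spatial_size(input_size):
    """Follow one spatial dimension through the stem and the four stages."""
    sizes = {}
    s = conv_out(input_size, kernel=7, stride=2, padding=3)   # self.conv
    sizes["conv"] = s
    s = conv_out(s, kernel=3, stride=2, padding=1)            # self.maxpool
    sizes["maxpool"] = s
    for name, stride in [("layer1", 1), ("layer2", 2), ("layer3", 2), ("layer4", 2)]:
        # Only the first 3x3 conv of each stage carries the stride;
        # the remaining convs in the stage keep the size unchanged.
        s = conv_out(s, kernel=3, stride=stride, padding=1)
        sizes[name] = s
    return sizes


print(trace_spatial_size(224))  # 224 is an assumed input size
```

Whatever the final spatial size is, `nn.AdaptiveAvgPool2d(1)` reduces it to 1x1, so the feature vector returned with `return_feats=True` has 512 values per image.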


In [None]:
# checking number of parameters used
number_parameters=[param.numel() for param in model.parameters() if param.requires_grad==True]
sum(number_parameters)

# Setup everything for training

In [None]:
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.2) # TODO: What loss do you need for a multi class classification problem?
optimizer = torch.optim.SGD(model.parameters(), lr=config['lr'], momentum=0.9, weight_decay=1e-4)
# TODO: Implement a scheduler (Optional but Highly Recommended)
# scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', patience=3, factor=0.5, verbose=True)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=30, eta_min=0.00008)
# You can try ReduceLRonPlateau, StepLR, MultistepLR, CosineAnnealing, etc.
scaler = torch.cuda.amp.GradScaler() # Good news. We have FP16 (Mixed precision training) implemented for you
# It is useful only in the case of compatible GPUs such as T4/V100
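
For intuition about the scheduler above, the closed-form cosine annealing expression can be evaluated by hand. The sketch below uses the values from this notebook (`lr=0.001`, `eta_min=0.00008`, `T_max=30`) and is a plain-Python approximation of the schedule, not a replacement for the PyTorch scheduler. Note that with `T_max=30` and 300 epochs the learning rate should not decay just once: it reaches `eta_min` at epoch 30 and climbs back to the base rate by epoch 60, repeating with a period of 60 epochs.

```python
import math


def cosine_annealing_lr(epoch, base_lr=0.001, eta_min=0.00008, t_max=30):
    """Closed-form cosine annealing: eta_min + (base - eta_min) * (1 + cos(pi * t / T_max)) / 2."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


for epoch in (0, 15, 30, 45, 60):
    print(epoch, "{:.6f}".format(cosine_annealing_lr(epoch)))
```

Printing with six decimals matters here: a four-decimal format rounds `eta_min=0.00008` to `0.0001`.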

# Let's train!

In [None]:
def train(model, dataloader, optimizer, criterion):

    model.train()

    # Progress Bar
    batch_bar   = tqdm(total=len(dataloader), dynamic_ncols=True, leave=False, position=0, desc='Train', ncols=5)

    num_correct = 0
    total_loss  = 0

    for i, (images, labels) in enumerate(dataloader):

        optimizer.zero_grad() # Zero gradients

        images, labels = images.to(DEVICE), labels.to(DEVICE)

        with torch.cuda.amp.autocast(): # This implements mixed precision. Thats it!
            outputs = model(images)
            loss    = criterion(outputs, labels)

        # Update no. of correct predictions & loss as we iterate
        num_correct     += int((torch.argmax(outputs, axis=1) == labels).sum())
        total_loss      += float(loss.item())

        # tqdm lets you add some details so you can monitor training as you train.
        batch_bar.set_postfix(
            acc         = "{:.04f}%".format(100 * num_correct / (config['batch_size']*(i + 1))),
            loss        = "{:.04f}".format(float(total_loss / (i + 1))),
            num_correct = num_correct,
            lr          = "{:.04f}".format(float(optimizer.param_groups[0]['lr']))
        )

        scaler.scale(loss).backward() # This is a replacement for loss.backward()
        scaler.step(optimizer) # This is a replacement for optimizer.step()
        scaler.update()

        # TODO? Depending on your choice of scheduler,
        # you may want to call some schedulers inside the train function. What are these?

        batch_bar.update() # Update tqdm bar

    batch_bar.close() # You need this to close the tqdm bar

    acc         = 100 * num_correct / (config['batch_size']* len(dataloader))
    total_loss  = float(total_loss / len(dataloader))

    return acc, total_loss

In [None]:
def validate(model, dataloader, criterion):

    model.eval()
    batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, position=0, leave=False, desc='Val', ncols=5)

    num_correct = 0.0
    total_loss = 0.0

    for i, (images, labels) in enumerate(dataloader):

        # Move images to device
        images, labels = images.to(DEVICE), labels.to(DEVICE)

        # Get model outputs
        with torch.inference_mode():
            outputs = model(images)
            loss = criterion(outputs, labels)

        num_correct += int((torch.argmax(outputs, axis=1) == labels).sum())
        total_loss += float(loss.item())

        batch_bar.set_postfix(
            acc="{:.04f}%".format(100 * num_correct / (config['batch_size']*(i + 1))),
            loss="{:.04f}".format(float(total_loss / (i + 1))),
            num_correct=num_correct)

        batch_bar.update()

    batch_bar.close()
    acc = 100 * num_correct / (config['batch_size']* len(dataloader))
    total_loss = float(total_loss / len(dataloader))
    return acc, total_loss
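
One detail worth checking in the two functions above: they divide by `config['batch_size'] * len(dataloader)`, which overcounts whenever the last batch is smaller than the others (the loaders here do not set `drop_last=True`). The sketch below compares that denominator with the exact sample count. The numbers are made up for illustration, and `compare_denominators` is a hypothetical helper.

```python
def compare_denominators(num_correct, num_samples, batch_size):
    """Return (accuracy using batch_size * num_batches, accuracy using num_samples)."""
    num_batches = -(-num_samples // batch_size)      # ceiling division
    approx = 100.0 * num_correct / (batch_size * num_batches)
    exact = 100.0 * num_correct / num_samples
    return approx, exact


# 1000 samples, all predicted correctly, batch size 128 -> 8 batches, last one partial
approx, exact = compare_denominators(num_correct=1000, num_samples=1000, batch_size=128)
print("approx {:.4f}%  exact {:.4f}%".format(approx, exact))
```

One possible fix, if exact numbers are needed, is to divide by `len(dataloader.dataset)` at the end of the epoch and by a running count of samples seen inside the loop.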

In [None]:
gc.collect() # These commands help you when you face CUDA OOM error
torch.cuda.empty_cache()

# Wandb

In [None]:
wandb.login(key="<your-wandb-api-key>") # The API key is in your wandb account, under settings (wandb.ai/settings); never commit a real key

In [None]:
# Create your wandb run
run = wandb.init(
    name = "final_fin-submission", ## Wandb creates random run names if you skip this field
    #     reinit = True, ### Allows reinitalizing runs when you re-run this cell
    id = "bt064ps8", ### Insert specific run id here if you want to resume a previous run
    resume = "must", ### You need this to resume previous runs, but comment out reinit = True when using this
    project = "hw2p2-ablations", ### Project should be created in your wandb account 
    config = config ### Wandb Config for your run
)

# Experiments

In [None]:
# path="/kaggle/input/cghnvjhb/checkpoint.pth"
# checkpoint = torch.load(path)
# model.load_state_dict(checkpoint['model_state_dict'])

In [None]:
best_valacc = 0.0
# root = '/kaggle/working/'

# model_directory = os.path.join(root, "models")

for epoch in range(config['epochs']):

    curr_lr = float(optimizer.param_groups[0]['lr'])

    train_acc, train_loss = train(model, train_loader, optimizer, criterion)

    print("\nEpoch {}/{}: \nTrain Acc {:.04f}%\t Train Loss {:.04f}\t Learning Rate {:.04f}".format(
        epoch + 1,
        config['epochs'],
        train_acc,
        train_loss,
        curr_lr))

    val_acc, val_loss = validate(model, valid_loader, criterion)

    print("Val Acc {:.04f}%\t Val Loss {:.04f}".format(val_acc, val_loss))

    wandb.log({"train_loss":train_loss, 'train_Acc': train_acc, 'validation_Acc':val_acc,
               'validation_loss': val_loss, "learning_Rate": curr_lr})

    # If you are using a scheduler in your train function within your iteration loop, you may want to log
    # your learning rate differently
    scheduler.step()

    # Save the model if val_acc is at least as good as the best recorded val_acc
    if val_acc >= best_valacc:
        checkpoint_path = './checkpoint.pth'
        print("Saving model")
        torch.save({'model_state_dict':model.state_dict(),
                  'optimizer_state_dict':optimizer.state_dict(),
                  'scheduler_state_dict':scheduler.state_dict(),
                  'val_acc': val_acc,
                  'epoch': epoch}, checkpoint_path)
        best_valacc = val_acc
        wandb.save(checkpoint_path)  # `path` was never defined; use the file just written
    # You may find it interesting to explore Wandb Artifacts to version your models
run.finish()

# Classification Task: Testing

In [None]:
def test(model, dataloader):
    model.eval()
    batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, position=0, leave=False, desc='Test')
    test_results = []

    for i, (images) in enumerate(dataloader):
        # TODO: Finish predicting on the test set.
        images = images.to(DEVICE)

        # Set the model to evaluation mode and use torch.no_grad()
        with torch.no_grad():
            outputs = model(images)

        outputs = torch.argmax(outputs, axis=1).detach().cpu().numpy().tolist()
        test_results.extend(outputs)

        batch_bar.update()

    batch_bar.close()
    return test_results


In [None]:
test_results = test(model, test_loader)

## Generate csv to submit to Kaggle

In [None]:
with open("classification_early_submission.csv", "w+") as f:
    f.write("id,label\n")
    for i in range(len(test_dataset)):
        f.write("{},{}\n".format(str(i).zfill(6) + ".jpg", test_results[i]))

In [None]:
# !kaggle competitions submit -c 11-785-f23-hw2p2-classification -f classification_early_submission.csv -m "early submission"

# Verification Task: Validation

The verification task consists of the following generalized scenario:
- You are given X unknown identities
- You are given Y known identities
- Your goal is to match X unknown identities to Y known identities.

We have given you a verification dataset that consists of 960 known identities and 1080 unknown identities. The 1080 unknown identities are split into dev (360) and test (720). Your goal is to compare the unknown identities to the 960 known identities and assign an identity to each image from the set of unknown identities. Some unknown identities have no corresponding known identity; you also need to identify these and label them with the special label n000000.

You will use/finetune your model trained for classification to compare images between known and unknown identities using a similarity metric and assign labels to the unknown identities.

This will judge your model's performance in terms of the quality of embeddings/features it generates on images/faces it has never seen during training for classification.
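
The matching scenario above can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions: the 2-dimensional "embeddings", the labels `n000001` and `n000002`, and the threshold of 0.5 are all made up, and `assign_identity` is a hypothetical helper. In practice the embeddings come from the trained model and the threshold is tuned on the dev set.

```python
import math


def cosine_similarity(a, b, eps=1e-6):
    """Cosine of the angle between two vectors, guarded against zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def assign_identity(unknown_feat, known_feats, known_labels, threshold,
                    no_match_label="n000000"):
    """Pick the most similar known identity, or the no-match label below the threshold."""
    sims = [cosine_similarity(unknown_feat, k) for k in known_feats]
    best = max(range(len(sims)), key=sims.__getitem__)
    return known_labels[best] if sims[best] >= threshold else no_match_label


known_feats = [[1.0, 0.0], [0.0, 1.0]]          # toy embeddings
known_labels = ["n000001", "n000002"]           # hypothetical identity names
print(assign_identity([0.9, 0.1], known_feats, known_labels, threshold=0.5))
print(assign_identity([-1.0, -1.0], known_feats, known_labels, threshold=0.5))
```

The first query is close to the first known embedding and gets its label; the second is dissimilar to every known embedding and falls back to `n000000`.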

In [None]:
# This obtains the list of known identities from the known folder
known_regex = "/kaggle/working/11-785-f23-hw2p2-verification/known/*/*"
known_paths = [i.split('/')[-2] for i in sorted(glob.glob(known_regex))]

# Obtain a list of images from unknown folders
unknown_dev_regex = "/kaggle/working/11-785-f23-hw2p2-verification/unknown_dev/*"
unknown_test_regex = "/kaggle/working/11-785-f23-hw2p2-verification/unknown_test/*"

# We load the images from known and unknown folders
unknown_dev_images = [Image.open(p) for p in tqdm(sorted(glob.glob(unknown_dev_regex)))]
unknown_test_images = [Image.open(p) for p in tqdm(sorted(glob.glob(unknown_test_regex)))]
known_images = [Image.open(p) for p in tqdm(sorted(glob.glob(known_regex)))]

# Why do you need only ToTensor() here?
# These images are only used for inference, so no augmentation should be applied.
# We only need to convert the images from their original (PIL) format to PyTorch tensors
# so that all image data has a consistent type and can be passed through the model,
# whose embeddings are then compared with cosine similarity.
transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor()])

unknown_dev_images = torch.stack([transforms(x) for x in unknown_dev_images])
unknown_test_images = torch.stack([transforms(x) for x in unknown_test_images])
known_images  = torch.stack([transforms(y) for y in known_images ])
#Print your shapes here to understand what we have done
print('unknown_dev_images\t', unknown_dev_images.shape)
print('unknown_test_images\t', unknown_test_images.shape)
print('known_images\t', known_images.shape)
# You can use other similarity metrics like Euclidean Distance if you wish
similarity_metric = torch.nn.CosineSimilarity(dim= 1, eps= 1e-6)

In [None]:
class CenterLoss(nn.Module):
    """Center Loss
    Center Loss Paper:
    https://ydwen.github.io/papers/WenECCV16.pdf
    Args:
        num_classes (int): The number of classes for your model.
        feat_dim (int): The dimension of your output feature.
    """
    def __init__(self, num_classes=7001, feat_dim=512):
        super(CenterLoss, self).__init__()
        self.num_classes = num_classes
        self.feat_dim = feat_dim

        # Initialize centers for each class.
        # The centers are learnable parameters, and you need to use the nn.Parameter
        # so that they are registered as model parameters.
        # We initialize them using random values, and they are moved to GPU.
        self.centers = nn.Parameter(torch.randn(self.num_classes, self.feat_dim).cuda())

    def forward(self, x, labels):
        """
        Args:
            x: Feature matrix with shape (batch_size, feat_dim).
            labels: Ground truth labels with shape (batch_size).
        """
        # Broadcast the centers for each input based on the labels.
        # This will create a tensor where centers[i] will contain the center of the true label of x[i].
        centers_batch = self.centers[labels]

        # Calculate the squared Euclidean distances between inputs and current centers.
        dist = torch.sum((x - centers_batch) ** 2, dim=1)

        # Clamp the distances to a finite range for numerical stability.
        dist = torch.clamp(dist, min=1e-12, max=1e+12)

        # Calculate the mean loss across the batch.
        loss = torch.mean(dist)

        return loss
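
As a small worked example of what the class above computes, the sketch below evaluates the same quantity (the mean squared Euclidean distance between each feature and the center of its label) on toy numbers in plain Python. The centers and features are made-up values, and the clamp is omitted for brevity.

```python
def center_loss_value(features, labels, centers):
    """Mean over the batch of ||x_i - c_{y_i}||^2."""
    dists = [sum((f - c) ** 2 for f, c in zip(feat, centers[label]))
             for feat, label in zip(features, labels)]
    return sum(dists) / len(dists)


centers = [[0.0, 0.0], [1.0, 1.0]]      # one toy center per class
features = [[1.0, 0.0], [1.0, 3.0]]     # toy batch of two feature vectors
labels = [0, 1]
# Distances: (1-0)^2 + (0-0)^2 = 1 and (1-1)^2 + (3-1)^2 = 4, so the mean is 2.5
print(center_loss_value(features, labels, centers))
```

Minimizing this value pulls features of the same identity toward a shared center, which is why it is used here alongside cross-entropy to tighten the embeddings for verification.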


In [None]:
# Initialize the CenterLoss and its optimizer
center_loss = CenterLoss(num_classes=7001, feat_dim=512)
optimizer_center_loss = torch.optim.SGD(center_loss.parameters(), lr=0.1)
scheduler_center_loss = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer_center_loss, T_max=30, eta_min=0.001)

In [None]:
def train(model: nn.Module, 
          train_loader: torch.utils.data.DataLoader, 
          optimizer: torch.optim.Optimizer, 
          optimizer_center_loss: torch.optim.Optimizer, 
          criterion: nn.Module, 
          fine_tuning_loss: nn.Module,  # Center Loss as fine_tuning_loss
          loss_weight, 
          
          scaler: torch.cuda.amp.GradScaler, 
          device):
    
    model.train()
    batch_bar   = tqdm(total=len(train_loader), dynamic_ncols=True, leave=False, position=0, desc='Train', ncols=6)

    num_correct = 0
    total_loss_ft  = 0
    total_loss  = 0
    
    for i, (images, labels) in enumerate(train_loader):
        
        optimizer.zero_grad()
        optimizer_center_loss.zero_grad()
        
        images, labels = images.to(device), labels.to(device)

        with torch.cuda.amp.autocast():
            outputs, feats = model(images, return_feats=True)
            loss0 = criterion(outputs, labels)  # Calculate cross-entropy loss
            loss1 = fine_tuning_loss(feats, labels) * loss_weight  # Calculate weighted fine-tuning loss (Center Loss)
            
            
        # Update no. of correct predictions & loss as we iterate
        num_correct     += int((torch.argmax(outputs, axis=1) == labels).sum())
        total_loss      += float(loss0.item())
        total_loss_ft      += float(loss1.item())
        
        batch_bar.set_postfix(
            acc         = "{:.04f}%".format(100 * num_correct / (config['batch_size']*(i + 1))),
            loss        = "{:.04f}".format(float(total_loss / (i + 1))),
            loss_ft        = "{:.04f}".format(float(total_loss_ft / (i + 1))),
            num_correct = num_correct,
            lr          = "{:.04f}".format(float(optimizer_center_loss.param_groups[0]['lr']))
        )
        
        

        scaler.scale(loss0).backward(retain_graph=True)  # Backward pass for the classification loss
        scaler.scale(loss1).backward()  # Backward pass for the fine-tuning loss
        
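        # Rescale the gradients of the fine-tuning loss parameters (the class centers).
        # loss1 was multiplied by loss_weight, so dividing the gradients by loss_weight
        # lets the centers update as if the center loss were unweighted.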
        for parameter in fine_tuning_loss.parameters():
            parameter.grad.data *= (1.0 / loss_weight)

        scaler.step(optimizer_center_loss)  # Step optimizer for fine-tuning loss
        scaler.step(optimizer)  # Step optimizer for classification loss
        scaler.update()
    
        
        batch_bar.update() # Update tqdm bar
        
        del images, labels, outputs, loss0, loss1
        torch.cuda.empty_cache()

    batch_bar.close() # You need this to close the tqdm bar

    acc         = 100 * num_correct / (config['batch_size']* len(train_loader))
    total_loss  = float(total_loss / len(train_loader))
    total_loss_ft  = float(total_loss_ft / len(train_loader))
    
    return acc, total_loss, total_loss_ft
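
The gradient rescaling inside `train` can be illustrated with a one-dimensional toy problem. This is a minimal sketch with hypothetical values and helper names; it uses the analytic gradient of a weighted mean squared distance rather than autograd.

```python
def grad_wrt_center(xs, center, weight=1.0):
    """Analytic gradient of weight * mean((x - center)^2) with respect to center."""
    return weight * sum(-2.0 * (x - center) for x in xs) / len(xs)


xs = [1.0, 3.0]      # hypothetical 1-D features of one class
center = 0.0         # hypothetical current class center
loss_weight = 0.002  # same value as in the training loop

weighted_grad = grad_wrt_center(xs, center, loss_weight)  # what backward() on loss1 would give
rescaled_grad = weighted_grad * (1.0 / loss_weight)       # what the rescaling loop produces
unweighted_grad = grad_wrt_center(xs, center)             # gradient of the raw center loss

print(weighted_grad, rescaled_grad, unweighted_grad)
```

After rescaling, the centers see the same gradient regardless of `loss_weight`, so the weight only controls how strongly the center loss pulls on the backbone features.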

In [None]:
def validate_l(model, dataloader, criterion, fine_tuning_loss, loss_weight):

    model.eval()
    batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, position=0, leave=False, desc='Val', ncols=6)

    num_correct = 0.0
    total_loss = 0.0
    total_loss_ft = 0.0

    for i, (images, labels) in enumerate(dataloader):

        # Move images to device
        images, labels = images.to(DEVICE), labels.to(DEVICE)

        # Get model outputs
        with torch.inference_mode():
            outputs, feats = model(images, return_feats=True)
            loss = criterion(outputs, labels)
            loss_ft = fine_tuning_loss(feats, labels)* loss_weight

        num_correct += int((torch.argmax(outputs, axis=1) == labels).sum())
        total_loss += float(loss.item())
        total_loss_ft      += float(loss_ft.item())

        batch_bar.set_postfix(
            acc="{:.04f}%".format(100 * num_correct / (config['batch_size']*(i + 1))),
            loss="{:.04f}".format(float(total_loss / (i + 1))),
            loss_ft="{:.04f}".format(float(total_loss_ft / (i + 1))),
            num_correct=num_correct)

        batch_bar.update()

    batch_bar.close()
    acc = 100 * num_correct / (config['batch_size']* len(dataloader))
    total_loss = float(total_loss / len(dataloader))
    total_loss_ft  = float(total_loss_ft / len(dataloader))
    return acc, total_loss, total_loss_ft
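
Both `train` and `validate_l` divide by `config['batch_size']` times the number of batches. If the loader keeps a smaller final batch, that denominator overcounts the samples and the reported accuracy comes out too low. The sketch below compares the two ways of counting on hypothetical batch sizes; the helper names are illustrative only.

```python
def accuracy_fixed_denominator(correct_per_batch, batch_size):
    """Accuracy as computed above: assumes every batch is full."""
    return 100 * sum(correct_per_batch) / (batch_size * len(correct_per_batch))


def accuracy_true_count(correct_per_batch, sizes):
    """Accuracy using the actual number of samples seen."""
    return 100 * sum(correct_per_batch) / sum(sizes)


sizes = [4, 4, 2]    # hypothetical loader: the last batch is smaller
correct = [4, 3, 2]  # hypothetical correct predictions per batch

fixed = accuracy_fixed_denominator(correct, batch_size=4)
exact = accuracy_true_count(correct, sizes)
print(fixed, exact)  # 75.0 90.0
```

Accumulating `labels.size(0)` inside the loop would be one way to obtain the exact count.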

In [None]:
wandb.login(key="<YOUR_WANDB_API_KEY>")  # Paste your own API key here; avoid committing real keys

In [None]:
# Create your wandb run
run = wandb.init(
    name = "final_fin-submission", ## Wandb creates random run names if you skip this field
    #     reinit = True, ### Allows reinitializing runs when you re-run this cell
    id = "bt064ps8", ### Insert specific run id here if you want to resume a previous run
    resume = "must", ### You need this to resume previous runs, but comment out reinit = True when using this
    project = "hw2p2-ablations", ### Project should be created in your wandb account 
    config = config ### Wandb Config for your run
)

In [None]:
# path ="/kaggle/input/cghnvjhb/checkpoint.pth"
# model.load_state_dict(torch.load(path)['model_state_dict'])

In [None]:
loss_weight = 0.002

best_loss = 2.5

for epoch in range(config['epochs']):

    curr_lr = float(optimizer_center_loss.param_groups[0]['lr'])

    train_acc, train_loss, ft_loss = train(model, train_loader, optimizer, optimizer_center_loss, criterion, center_loss, loss_weight, scaler, DEVICE)


    print("\nEpoch {}/{}: \nTrain Acc {:.04f}%\t Train Loss {:.04f}\t FT train loss {:.04f}\t Learning Rate ft {:.04f}".format(
        epoch + 1,
        config['epochs'],
        train_acc,
        train_loss,
        ft_loss,
        curr_lr))

    val_acc, val_loss, val_loss_ft = validate_l(model, valid_loader, criterion, center_loss, loss_weight)

    print("Val Acc {:.04f}%\t Val Loss {:.04f}\t FT Val Loss {:.04f}".format(val_acc, val_loss, val_loss_ft))

    wandb.log({"train_loss":train_loss, 'train_Acc': train_acc, 'validation_Acc':val_acc, 'ft_loss' :ft_loss,
               'validation_loss': val_loss, "learning_Rate_ft": curr_lr})

    # If you are using a scheduler in your train function within your iteration loop, you may want to log
    # your learning rate differently
    
    scheduler_center_loss.step()

    # Save a checkpoint whenever the weighted validation center loss matches or beats the best value so far
    if val_loss_ft <= best_loss:
        path = './checkpoint.pth'

        print("Saving model")
        torch.save({'model_state_dict':model.state_dict(),
                  'optimizer_state_dict':optimizer.state_dict(),
                  'optimizer_center_loss_state_dict':optimizer_center_loss.state_dict(),
                  'scheduler_center_loss_state_dict':scheduler_center_loss.state_dict(),
                  'ft_loss': ft_loss,
                  'epoch': epoch}, path)
        best_loss = val_loss_ft
        wandb.save(path)
        # You may find it interesting to explore Wandb Artifacts to version your models
run.finish()
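
The checkpointing rule in the loop can be traced on a made-up loss history. This is a sketch with hypothetical loss values and a hypothetical helper name; it only reproduces the `<=` comparison against a running best that starts at 2.5.

```python
def epochs_that_save(val_losses, initial_best=2.5):
    """Return the epochs at which a checkpoint would be written, plus the final best loss."""
    best = initial_best
    saved = []
    for epoch, loss in enumerate(val_losses):
        if loss <= best:  # same comparison as in the training loop
            saved.append(epoch)
            best = loss
    return saved, best


# Hypothetical validation losses, one per epoch
saved, best = epochs_that_save([3.1, 2.4, 2.6, 2.4, 1.9])
print(saved, best)  # [1, 3, 4] 1.9
```

Nothing is saved until the loss first drops to 2.5 or below, and a tie with the current best also triggers a save.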

In [None]:
# path ="/kaggle/input/ghgghbjhb/checkpoint.pth"
# model.load_state_dict(torch.load(path)['model_state_dict'])

In [None]:
def eval_verification(unknown_images, known_images, model, similarity, batch_size= config['batch_size'], mode='val'):

    unknown_feats, known_feats = [], []

    batch_bar = tqdm(total=len(unknown_images)//batch_size, dynamic_ncols=True, position=0, leave=False, desc=mode)
    model.eval()

    # We load the images as batches for memory optimization and avoiding CUDA OOM errors
    for i in range(0, unknown_images.shape[0], batch_size):
        unknown_batch = unknown_images[i:i+batch_size] # Slice a given portion upto batch_size

        with torch.no_grad():
            _, unknown_feat = model(unknown_batch.float().to(DEVICE), return_feats=True) #Get features from model
        unknown_feats.append(unknown_feat)
        batch_bar.update()

    batch_bar.close()

    batch_bar = tqdm(total=len(known_images)//batch_size, dynamic_ncols=True, position=0, leave=False, desc=mode)

    for i in range(0, known_images.shape[0], batch_size):
        known_batch = known_images[i:i+batch_size]
        with torch.no_grad():
            _, known_feat = model(known_batch.float().to(DEVICE), return_feats=True)

        known_feats.append(known_feat)
        batch_bar.update()

    batch_bar.close()

    # Concatenate all the batches
    unknown_feats = torch.cat(unknown_feats, dim=0)
    known_feats = torch.cat(known_feats, dim=0)

    similarity_values = torch.stack([similarity(unknown_feats, known_feature) for known_feature in known_feats])
    # Print the inner list comprehension in a separate cell - what is really happening?

    max_similarity_values, predictions = similarity_values.max(0)
    # Why take a max over dim 0? similarity_values holds one row per known image and one column per
    # unknown image, so the max returns, for each unknown image, its highest similarity to any known
    # image (max_similarity_values) and the index of that best-matching known image (predictions).
    # The index is mapped to an identity string below, and the value is compared against a threshold.
    max_similarity_values, predictions = max_similarity_values.cpu().numpy(), predictions.cpu().numpy()

    # Note that some unknown identities have no correspondence among the known identities.
    # These identities should not be similar to any known identity, i.e. their max similarity will fall
    # below a certain threshold, unlike identities that do have a correspondence.

    # In early submission, you can ignore identities without correspondence, simply taking identity with max similarity value
    # pred_id_strings = [known_paths[i] for i in predictions] # Map argmax indices to identity strings

    # After early submission, remove the previous line and uncomment the following code

    threshold = 0.5 # Choose a proper threshold
    NO_CORRESPONDENCE_LABEL = 'n000000'
    pred_id_strings = []
    for idx, prediction in enumerate(predictions):
        if max_similarity_values[idx] < threshold: # why < ? Think about what is your similarity metric
            pred_id_strings.append(NO_CORRESPONDENCE_LABEL)
        else:
            pred_id_strings.append(known_paths[prediction])

    if mode == 'val':
        true_ids = pd.read_csv('/kaggle/input/11-785-f23-hw2p2-verification/11-785-f23-hw2p2-verification/verification_dev.csv')['label'].tolist()
        accuracy = accuracy_score(true_ids, pred_id_strings)
        print("Verification Accuracy = {}".format(accuracy))

    return pred_id_strings
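
The matching logic at the end of `eval_verification` can be sketched without tensors. This assumes the similarity metric is cosine similarity, which is not shown in this part of the notebook; the feature vectors, the identity strings `id_a` and `id_b`, and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_identities(unknown_feats, known_feats, known_ids, threshold=0.5, no_match='n000000'):
    """For each unknown feature, pick the most similar known identity or fall back to no_match."""
    preds = []
    for u in unknown_feats:
        sims = [cosine_similarity(u, k) for k in known_feats]
        best = max(range(len(sims)), key=sims.__getitem__)  # argmax over known images
        preds.append(known_ids[best] if sims[best] >= threshold else no_match)
    return preds


known_feats = [[1.0, 0.0], [0.0, 1.0]]
known_ids = ['id_a', 'id_b']
unknown_feats = [[2.0, 0.1], [0.1, 3.0], [-1.0, -1.0]]

preds = match_identities(unknown_feats, known_feats, known_ids)
print(preds)  # ['id_a', 'id_b', 'n000000']
```

The third vector points away from both known features, so its best similarity is negative and it receives the no-correspondence label. With a distance metric instead of a similarity, the comparison direction would flip.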

In [None]:
# verification eval
pred_id_strings = eval_verification(unknown_dev_images, known_images, model, similarity_metric, config['batch_size'], mode='val')
# verification test
pred_id_strings = eval_verification(unknown_test_images, known_images, model, similarity_metric, config['batch_size'], mode='test')

In [None]:
# add your finetune/retrain code here

## Generate csv to submit to Kaggle

In [None]:
with open("verification_early_submission.csv", "w+") as f:
    f.write("id,label\n")
    for i in range(len(pred_id_strings)):
        f.write("{},{}\n".format(i, pred_id_strings[i]))

In [None]:
# !kaggle competitions submit -c 11-785-f23-hw2p2-verification -f verification_early_submission.csv -m "early submission"