# HW2P2: Face Classification and Verification


Congrats on coming to the second homework in 11785: Introduction to Deep Learning. This homework is significantly longer and tougher than the previous one. It has 2 sub-parts, outlined below. Please start early!


*   Face Recognition: You will write your own CNN model to tackle a classification problem over 7000 identities.
*   Face Verification: You will use the model trained for classification and evaluate the quality of its feature embeddings by comparing the similarity of known and unknown identities.

For this HW, you only have to write code to implement your model architecture. Everything else has been provided for you, on the assumption that most of your time will go into developing a model architecture that achieves satisfactory performance.

Common errors you may face in this homework (because of the size of the model):


*   CUDA Out of Memory (OOM): You can tackle this problem by (1) Reducing the batch size (2) Calling `torch.cuda.empty_cache()` and `gc.collect()` (3) Finally restarting the runtime
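One way to apply step (1) automatically is to retry a failed step with a smaller batch. The sketch below is a hypothetical helper, not part of the starter code: `run_step` stands in for whatever function runs one pass at a given batch size, and the check on the error message assumes the failure surfaces as a `RuntimeError` whose text contains "out of memory".

```python
def run_with_backoff(run_step, batch_size, min_batch_size=1):
    """Call run_step(batch_size), halving the batch size after each OOM error.

    run_step is a hypothetical callable supplied by the caller; it is expected
    to raise RuntimeError with "out of memory" in the message on a CUDA OOM.
    """
    while True:
        try:
            return batch_size, run_step(batch_size)
        except RuntimeError as err:
            if "out of memory" not in str(err).lower():
                raise  # unrelated error: do not hide it
            if batch_size <= min_batch_size:
                raise  # cannot shrink any further
            batch_size //= 2
            # In a real notebook, this is where gc.collect() and
            # torch.cuda.empty_cache() would be called before retrying.


# Toy stand-in: pretend anything above 16 samples per batch does not fit.
def fake_step(batch_size):
    if batch_size > 16:
        raise RuntimeError("CUDA out of memory (simulated)")
    return "ok"


used_batch_size, result = run_with_backoff(fake_step, 64)
print(used_batch_size, result)
```

Halving is only one possible policy; a fixed list of candidate batch sizes would work just as well.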



# Preliminaries

In [None]:
!nvidia-smi # to see what GPU you have

Wed Oct 26 22:56:54 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  A100-SXM4-40GB      Off  | 00000000:00:04.0 Off |                    0 |
| N/A   31C    P0    47W / 400W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
!pip install wandb --quiet


In [None]:
import torch
from torchsummary import summary
import torchvision #This library is used for image-based operations (Augmentations)
import os
import gc
from tqdm import tqdm
from PIL import Image
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
import glob
# from pytorch_metric_learning.losses import ArcFaceLoss
import wandb
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print("Device: ", device)

Device:  cuda


# TODOs
As you go, please read the code and keep an eye out for TODOs!

# Download Data from Kaggle

In [None]:
# !mkdir '/content/data'

!kaggle competitions download -c 11-785-f22-hw2p2-classification
!unzip -qo '11-785-f22-hw2p2-classification.zip' -d '/content/data'

!kaggle competitions download -c 11-785-f22-hw2p2-verification
!unzip -qo '11-785-f22-hw2p2-verification.zip' -d '/content/data'

Downloading 11-785-f22-hw2p2-classification.zip to /content
 99% 2.35G/2.37G [00:10<00:00, 274MB/s]
100% 2.37G/2.37G [00:11<00:00, 230MB/s]
Downloading 11-785-f22-hw2p2-verification.zip to /content
 71% 12.0M/16.8M [00:00<00:00, 124MB/s]
100% 16.8M/16.8M [00:00<00:00, 154MB/s]


# Configs

In [None]:
config = {
    'batch_size': 64, # Increase this if your GPU can handle it
    'lr': 0.1,
    'epochs': 120, 
    # "step_lr_step_size": 35,
    # "step_lr_gamma": 0.1 ,
    # "cosine_annealing_step_size": 20, 
}

# Classification Dataset

In [None]:
DATA_DIR = '/content/data/11-785-f22-hw2p2-classification/'
TRAIN_DIR = os.path.join(DATA_DIR, "classification/train") 
VAL_DIR = os.path.join(DATA_DIR, "classification/dev")
TEST_DIR = os.path.join(DATA_DIR, "classification/test")

# Transforms using torchvision - Refer https://pytorch.org/vision/stable/transforms.html

# Implementing the right transforms/augmentation methods is key to improving performance.
train_transforms = torchvision.transforms.Compose([
    torchvision.transforms.RandomHorizontalFlip(p=0.5),
    torchvision.transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    torchvision.transforms.RandomRotation(5),
    torchvision.transforms.ToTensor()
])

val_transforms = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])

train_dataset = torchvision.datasets.ImageFolder(TRAIN_DIR, transform = train_transforms)
val_dataset = torchvision.datasets.ImageFolder(VAL_DIR, transform = val_transforms)

# Create data loaders
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = config['batch_size'], 
                                           shuffle = True,num_workers = 4, pin_memory = True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size = config['batch_size'], 
                                         shuffle = False, num_workers = 2)

In [None]:
class ClassificationTestDataset(torch.utils.data.Dataset):
    def __init__(self, data_dir, transforms):
        self.data_dir   = data_dir
        self.transforms = transforms
        self.img_paths  = list(map(lambda fname: os.path.join(self.data_dir, fname), sorted(os.listdir(self.data_dir))))

    def __len__(self):
        return len(self.img_paths)
    
    def __getitem__(self, idx):
        return self.transforms(Image.open(self.img_paths[idx]))

In [None]:
test_dataset = ClassificationTestDataset(TEST_DIR, transforms = val_transforms)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = config['batch_size'], shuffle = False,
                         drop_last = False, num_workers = 2)

In [None]:
print("Number of classes: ", len(train_dataset.classes))
print("No. of train images: ", train_dataset.__len__())
print("Shape of image: ", train_dataset[0][0].shape)
print("Batch size: ", config['batch_size'])
print("Train batches: ", train_loader.__len__())
print("Val batches: ", val_loader.__len__())

Number of classes:  7000
No. of train images:  140000
Shape of image:  torch.Size([3, 224, 224])
Batch size:  64
Train batches:  2188
Val batches:  547
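The batch counts above follow from the image count and the batch size. The short calculation below reproduces the train figure and also shows why dividing by `config['batch_size'] * len(dataloader)`, as the `train` and `validate` functions in this notebook do, slightly understates accuracy when the last batch is not full. It assumes the default `drop_last=False` behaviour of `DataLoader`.

```python
import math

num_train_images = 140000   # "No. of train images" printed above
batch_size = 64             # config['batch_size']

# With drop_last=False a DataLoader yields ceil(N / batch_size) batches.
num_batches = math.ceil(num_train_images / batch_size)
last_batch = num_train_images - (num_batches - 1) * batch_size

print(num_batches)   # number of train batches
print(last_batch)    # size of the final, partial batch

# Dividing by batch_size * num_batches instead of the true image count
# caps the reported accuracy slightly below 100%.
padded_count = batch_size * num_batches
max_reported_acc = 100 * num_train_images / padded_count
print(padded_count, round(max_reported_acc, 3))
```

Dividing by `len(dataloader.dataset)` instead would remove this small bias.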


# Very Simple Network (for Mandatory Early Submission)

# SE Resnet50

In [None]:
import torch.nn.functional as F

class SEBlock(torch.nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, skip_connection=None):
        super().__init__()
        self.conv_layers = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1),
            torch.nn.BatchNorm2d(out_channels),
            torch.nn.ReLU(),
            torch.nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1),
            torch.nn.BatchNorm2d(out_channels),
            torch.nn.ReLU(),
            torch.nn.Conv2d(out_channels, out_channels * 4, kernel_size=1, stride=1),
            torch.nn.BatchNorm2d(out_channels * 4),
        )
        self.se_scaler = torch.nn.Sequential(
            torch.nn.AdaptiveAvgPool2d(1),
            torch.nn.Conv2d(out_channels * 4, out_channels * 4 // 16, kernel_size=1, stride=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(out_channels * 4 // 16, out_channels * 4, kernel_size=1, stride=1),
            torch.nn.Sigmoid()
        ) 
        self.skip_connection = torch.nn.Identity()
        if skip_connection is not None:
            self.skip_connection = skip_connection
        self.act = torch.nn.ReLU()

    def forward(self, input):
        output = self.conv_layers(input)
        scale = self.se_scaler(output)
        residual = self.skip_connection(input)
        return self.act(scale * output + residual)

class SEResnet50(torch.nn.Module):
    def __init__(self, in_channels, block, layers, num_classes=7000):
        super().__init__()
        self.curr_channels = 64
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3),
            torch.nn.BatchNorm2d(64),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.pre_final = torch.nn.Sequential(
            torch.nn.AdaptiveAvgPool2d((1,1)),
            torch.nn.Flatten()
        )
        self.cls_layer = torch.nn.Sequential(
            torch.nn.Linear(in_features=2048, out_features=num_classes),
        )
        # self.dropblock = DropBlock(7, 0.9)

    def _make_layer(self, block, layer_channels, num_blocks, stride=1):
        skip_connection = None
        if stride != 1 or self.curr_channels != layer_channels * 4:
            skip_connection = torch.nn.Sequential(
                torch.nn.Conv2d(self.curr_channels, layer_channels * 4, kernel_size=1, stride=stride),
                torch.nn.BatchNorm2d(layer_channels * 4),
            )
        layers = []
        layers.append(block(self.curr_channels, layer_channels, stride, skip_connection))
        self.curr_channels = layer_channels * 4
        for i in range(1, num_blocks):
            layers.append(block(self.curr_channels, layer_channels))

        return torch.nn.Sequential(*layers)

    def forward(self, input, return_feats=False):
        feats = self.conv1(input)
        feats = self.layer1(feats)
        feats = self.layer2(feats)
        feats = self.layer3(feats)
        feats = self.layer4(feats)
        feats = self.pre_final(feats)
        if return_feats:
            return feats
        out = self.cls_layer(feats)
        return out

# model = SEResnet50(3, SEBlock, [3, 4, 6, 3]).to(device)
# summary(model, (3, 224, 224))
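To sanity-check the strides and paddings in `SEResnet50`, it can help to trace the spatial size of a 224x224 input by hand. The sketch below uses the standard convolution output-size formula (floor mode, no dilation) and checks that the strided 3x3 convolution on the main path and the strided 1x1 convolution on the skip path produce the same size, which they must for the residual addition to work. `conv_out` is a helper introduced here for illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a conv/pool layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
size = conv_out(size, kernel=7, stride=2, padding=3)   # stem 7x7 conv
size = conv_out(size, kernel=3, stride=2, padding=1)   # stem max pool
sizes = [size]                                         # layer1 keeps the size (stride 1)
for _ in range(3):                                     # layer2..layer4 each start with stride 2
    main_path = conv_out(size, kernel=3, stride=2, padding=1)  # strided 3x3 conv
    skip_path = conv_out(size, kernel=1, stride=2, padding=0)  # strided 1x1 skip conv
    assert main_path == skip_path  # both branches must agree before they are added
    size = main_path
    sizes.append(size)

print(sizes)
```

The final feature map is 32 times smaller than the input along each side, after which `AdaptiveAvgPool2d` reduces it to 1x1.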

# Resnet34

In [None]:
# class ResidualBlock(torch.nn.Module):
#     def __init__(self, in_channels, out_channels, stride=1):
#         super().__init__()
#         self.conv_layers = torch.nn.Sequential(
#             torch.nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1),
#             torch.nn.BatchNorm2d(out_channels),
#             torch.nn.ReLU(),
#             torch.nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1),
#             torch.nn.BatchNorm2d(out_channels),
#         )
#         self.skip_connection = None
#         if in_channels == out_channels:
#             self.skip_connection = torch.nn.Identity()
#         else:
#             self.skip_connection = torch.nn.Sequential(
#                 torch.nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, padding=0),
#                 torch.nn.BatchNorm2d(out_channels),
#             )
#         self.act = torch.nn.ReLU()
    
#     def forward(self, input):
#         output = self.conv_layers(input)
#         output = self.act(output + self.skip_connection(input))
#         return output

In [None]:
# class Resnet34(torch.nn.Module):
#     def __init__(self, residual_block, in_channels=3, num_classes=7000):
#         super().__init__()
#         self.conv1 = torch.nn.Sequential(
#             torch.nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3),
#             torch.nn.BatchNorm2d(64),
#             torch.nn.ReLU(),
#             torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
#         )
#         self.conv2x = torch.nn.Sequential(
#             residual_block(in_channels=64, out_channels=64),
#             residual_block(in_channels=64, out_channels=64),
#             residual_block(in_channels=64, out_channels=64),
#         )
#         self.conv3x = torch.nn.Sequential(
#             residual_block(in_channels=64, out_channels=128, stride=2),
#             residual_block(in_channels=128, out_channels=128),
#             residual_block(in_channels=128, out_channels=128),
#             residual_block(in_channels=128, out_channels=128),
#         )
#         self.conv4x = torch.nn.Sequential(
#             residual_block(in_channels=128, out_channels=256, stride=2),
#             residual_block(in_channels=256, out_channels=256),
#             residual_block(in_channels=256, out_channels=256),
#             residual_block(in_channels=256, out_channels=256),
#             residual_block(in_channels=256, out_channels=256),
#             residual_block(in_channels=256, out_channels=256),
#         )
#         self.conv5x = torch.nn.Sequential(
#             residual_block(in_channels=256, out_channels=512, stride=2),
#             residual_block(in_channels=512, out_channels=512),
#             residual_block(in_channels=512, out_channels=512),
#         )
#         self.pre_final = torch.nn.Sequential(
#             torch.nn.AdaptiveAvgPool2d((1, 1)),
#             torch.nn.Flatten()
#         )
#         self.cls_layer = torch.nn.Linear(in_features=512, out_features=num_classes)

#     def forward(self, input, return_feats=False):
#         feats = self.conv1(input)
#         feats = self.conv2x(feats)
#         feats = self.conv3x(feats)
#         feats = self.conv4x(feats)
#         feats = self.conv5x(feats)
#         feats = self.pre_final(feats)
#         if return_feats:
#             return feats
#         out = self.cls_layer(feats)
#         return out

# model = Resnet34(ResidualBlock).to(device)
# summary(model, (3, 224, 224))

In [None]:
# class Network(torch.nn.Module):
#     """
#     The Very Low early deadline architecture is a 4-layer CNN.

#     The first Conv layer has 64 channels, kernel size 7, and stride 4.
#     The next three have 128, 256, and 512 channels. Each has kernel size 3 and stride 2.
    
#     Think about strided convolutions from the lecture as convolution with stride 1 followed by downsampling.
#     For a stride-1 convolution, what padding do you need to preserve the spatial resolution?
#     (Hint: padding = kernel_size // 2. Why?)

#     Each Conv layer is accompanied by a Batchnorm and ReLU layer.
#     Finally, you want to average pool over the spatial dimensions to reduce them to 1 x 1. Use AdaptiveAvgPool2d.
#     Then, remove (Flatten?) these trivial 1x1 dimensions away.
#     Look through https://pytorch.org/docs/stable/nn.html 
    
#     TODO: Fill out the model definition below! 

#     Why does a very simple network have 4 convolutions?
#     Input images are 224x224. Note that each of these convolutions downsample.
#     Downsampling 2x effectively doubles the receptive field, increasing the spatial
#     region each pixel extracts features from. Downsampling 32x is standard
#     for most image models.

#     Why does a very simple network have high channel sizes?
#     Every time you downsample 2x, you do 4x less computation (at same channel size).
#     To maintain the same level of computation, you 2x increase # of channels, which 
#     increases computation by 4x. So, balances out to same computation.
#     Another intuition is - as you downsample, you lose spatial information. We want
#     to preserve some of it in the channel dimension.
#     """
#     def __init__(self, num_classes=7000):
#         super().__init__()
#         self.backbone = torch.nn.Sequential(
#             torch.nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=4),
#             torch.nn.BatchNorm2d(num_features=64),
#             torch.nn.ReLU(),
#             torch.nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=2),
#             torch.nn.BatchNorm2d(num_features=128),
#             torch.nn.ReLU(),
#             torch.nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=2),
#             torch.nn.BatchNorm2d(num_features=256),
#             torch.nn.ReLU(),
#             torch.nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, stride=2),
#             torch.nn.BatchNorm2d(num_features=512),
#             torch.nn.ReLU(),
#             torch.nn.AdaptiveAvgPool2d((1, 1)),
#             torch.nn.Flatten()
#         )
#         self.cls_layer = torch.nn.Linear(in_features=512, out_features=num_classes)
    
#     def forward(self, x, return_feats=False):
#         """
#         What is return_feats? It essentially returns the second-to-last-layer
#         features of a given image. It's a "feature encoding" of the input image,
#         and you can use it for the verification task. You would use the outputs
#         of the final classification layer for the classification task.

#         You might also find that the classification outputs are sometimes better
#         for verification too - try both.
#         """
#         feats = self.backbone(x)
#         out = self.cls_layer(feats)
#         if return_feats:
#             return feats
#         else:
#             return out
            
# model = Network().to(device)
# summary(model, (3, 224, 224))
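The docstring's claim about computation can be checked with a rough multiply-accumulate count for a dense convolution: output height x output width x input channels x output channels x kernel area. The feature-map sizes and channel counts below are illustrative values chosen for the check, not taken from the network above.

```python
def conv_macs(height, width, c_in, c_out, kernel):
    """Multiply-accumulates of a dense conv layer producing a height x width output."""
    return height * width * c_in * c_out * kernel * kernel


base = conv_macs(56, 56, 64, 64, kernel=3)
downsampled = conv_macs(28, 28, 64, 64, kernel=3)     # half the resolution, same channels
rebalanced = conv_macs(28, 28, 128, 128, kernel=3)    # half the resolution, double the channels

print(base // downsampled)   # a quarter of the work after 2x downsampling
print(rebalanced == base)    # doubling both channel counts restores it
```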

# Setup everything for training

In [None]:
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
# lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode=config["lr_rplt_mode"], factor=0.1, patience=config["lr_rplt_patience"], threshold=0.0001)
# lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=config['step_lr_step_size'], gamma=config['step_lr_gamma'])
cosine_annealing_t_max = len(train_loader) * config['epochs']
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=cosine_annealing_t_max)
# You can try ReduceLRonPlateau, StepLR, MultistepLR, CosineAnnealing, etc.
scaler = torch.cuda.amp.GradScaler() # Good news. We have FP16 (Mixed precision training) implemented for you
# It is useful only in the case of compatible GPUs such as T4/V100
# print(criterion)
# print(optimizer)
# print(lr_scheduler.state_dict())

CrossEntropyLoss()
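With `label_smoothing=0.2` the loss can no longer approach zero, which is worth keeping in mind when reading the loss values logged during training. The sketch below computes the smallest achievable per-sample loss, assuming the smoothing scheme described in the PyTorch documentation, where the smoothing mass is spread uniformly over all classes (including the true one). `smoothed_ce_floor` is a name introduced here for illustration.

```python
import math


def smoothed_ce_floor(num_classes, smoothing):
    """Smallest possible cross-entropy against a label-smoothed target.

    The target puts (1 - smoothing) on the true class plus smoothing / K on
    every class; the loss is minimized when the prediction equals the target,
    and the minimum is the entropy of that target distribution.
    """
    off = smoothing / num_classes          # mass on each wrong class
    on = 1.0 - smoothing + off             # mass on the true class
    return -(on * math.log(on) + (num_classes - 1) * off * math.log(off))


floor = smoothed_ce_floor(num_classes=7000, smoothing=0.2)
print(round(floor, 2))
```

For 7000 classes this floor comes out at roughly 2.27, so a validation loss above 3 can still go together with high accuracy.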


# Let's train!

In [None]:
def train(model, dataloader, optimizer, criterion):
    model.train()
    # Progress Bar 
    batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, leave=False, position=0, desc='Train', ncols=5) 
    num_correct = 0
    total_loss = 0
    for i, (images, labels) in enumerate(dataloader):
        optimizer.zero_grad() # Zero gradients
        images, labels = images.to(device), labels.to(device)
        with torch.cuda.amp.autocast(): # This implements mixed precision. That's it!
            outputs_one = model(images)
            # outputs_two = model(images, return_feats=True) # Second forward pass; only needed if the feature-based loss below is enabled
            loss = criterion(outputs_one, labels)
            # loss_two = criterion_two(outputs_two, labels)
            # loss = 0.7 * loss_one + 0.3 * loss_two
        # Update no. of correct predictions & loss as we iterate
        num_correct += int((torch.argmax(outputs_one, axis=1) == labels).sum())
        total_loss += float(loss.item())
        # tqdm lets you add some details so you can monitor training as you train.
        batch_bar.set_postfix(
            acc="{:.04f}%".format(100 * num_correct / (config['batch_size']*(i + 1))),
            loss="{:.04f}".format(float(total_loss / (i + 1))),
            num_correct=num_correct,
            lr="{:.04f}".format(float(optimizer.param_groups[0]['lr'])))
        scaler.scale(loss).backward() # This is a replacement for loss.backward()
        scaler.step(optimizer) # This is a replacement for optimizer.step()
        scaler.update() 
        batch_bar.update() # Update tqdm bar

    batch_bar.close() # You need this to close the tqdm bar
    acc = 100 * num_correct / (config['batch_size']* len(dataloader))
    total_loss = float(total_loss / len(dataloader))

    return acc, total_loss

In [None]:
def validate(model, dataloader, criterion):
    model.eval()
    batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, position=0, leave=False, desc='Val', ncols=5)
    num_correct = 0.0
    total_loss = 0.0
    for i, (images, labels) in enumerate(dataloader):
        # Move images to device
        images, labels = images.to(device), labels.to(device)
        # Get model outputs
        with torch.inference_mode():
            outputs = model(images)
            loss = criterion(outputs, labels)

        num_correct += int((torch.argmax(outputs, axis=1) == labels).sum())
        total_loss += float(loss.item())
        
        batch_bar.set_postfix(
            acc="{:.04f}%".format(100 * num_correct / (config['batch_size']*(i + 1))),
            loss="{:.04f}".format(float(total_loss / (i + 1))),
            num_correct=num_correct)

        batch_bar.update()
        
    batch_bar.close()
    acc = 100 * num_correct / (config['batch_size']* len(dataloader))
    total_loss = float(total_loss / len(dataloader))
    return acc, total_loss

# Convnext Tiny

In [None]:
class ConvNextBlock(torch.nn.Module):
    def __init__(self, layer_channels, stride=1):
        super().__init__()
        self.conv_layers = torch.nn.Sequential(
            torch.nn.Conv2d(layer_channels, layer_channels,
                            kernel_size=7, stride=1, padding=3, groups=layer_channels),
            torch.nn.BatchNorm2d(layer_channels),
            torch.nn.Conv2d(layer_channels, layer_channels * 4, kernel_size=1,),
            torch.nn.GELU(),
            torch.nn.Conv2d(layer_channels * 4, layer_channels, kernel_size=1,)
        )
        self.skip_connection = torch.nn.Identity()

    def forward(self, input):
        output = self.conv_layers(input)
        residual = self.skip_connection(input)
        return output + residual


class ConvNextT(torch.nn.Module):
    def __init__(self, in_channels, block, layers, num_classes=7000):
        super().__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels, 96, kernel_size=4,
                            stride=4, padding=1,),
            torch.nn.BatchNorm2d(96),
        )
        self.layer1 = self._make_layer(block, 96, layers[0])
        self.downsample1 = torch.nn.Sequential(
            torch.nn.BatchNorm2d(96),
            torch.nn.Conv2d(96, 192, stride=2, kernel_size=2)
        )

        self.layer2 = self._make_layer(block, 192, layers[1])
        self.downsample2 = torch.nn.Sequential(
            torch.nn.BatchNorm2d(192),
            torch.nn.Conv2d(192, 384, stride=2, kernel_size=2)
        )

        self.layer3 = self._make_layer(block, 384, layers[2])
        self.downsample3 = torch.nn.Sequential(
            torch.nn.BatchNorm2d(384),
            torch.nn.Conv2d(384, 768, stride=2, kernel_size=2)
        )

        self.layer4 = self._make_layer(block, 768, layers[3])

        self.pre_final = torch.nn.Sequential(
            torch.nn.AdaptiveAvgPool2d((1, 1)),
            torch.nn.Flatten()
        )

        self.cls_layer = torch.nn.Sequential(
            torch.nn.Linear(in_features=768, out_features=num_classes),
        )

    def _make_layer(self, block, layer_channels, num_blocks):
        layers = []
        for _ in range(num_blocks):
            layers.append(block(layer_channels))
        return torch.nn.Sequential(*layers)

    def forward(self, input, return_feats=False):
        feats = self.conv1(input)
        feats = self.layer1(feats)
        feats = self.downsample1(feats)
        feats = self.layer2(feats)
        feats = self.downsample2(feats)
        feats = self.layer3(feats)
        feats = self.downsample3(feats)
        feats = self.layer4(feats)
        feats = self.pre_final(feats)
        if return_feats:
            return feats
        out = self.cls_layer(feats)
        return out
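The `groups=layer_channels` argument makes the 7x7 convolution depthwise, which is what keeps such a large kernel cheap. The parameter count below compares it with a dense 7x7 convolution and totals one `ConvNextBlock` at 96 channels. It assumes `Conv2d` layers with bias (the PyTorch default) and counts two learnable parameters per channel for `BatchNorm2d`; the helper names are introduced here for illustration.

```python
def conv_params(c_in, c_out, kernel, groups=1, bias=True):
    """Learnable parameters of a Conv2d layer."""
    weights = (c_in // groups) * c_out * kernel * kernel
    return weights + (c_out if bias else 0)


def convnext_block_params(channels):
    depthwise = conv_params(channels, channels, kernel=7, groups=channels)
    batchnorm = 2 * channels                              # scale and shift
    expand = conv_params(channels, channels * 4, kernel=1)
    project = conv_params(channels * 4, channels, kernel=1)
    return depthwise + batchnorm + expand + project


dense_7x7 = conv_params(96, 96, kernel=7)
depthwise_7x7 = conv_params(96, 96, kernel=7, groups=96)
print(dense_7x7, depthwise_7x7)
print(convnext_block_params(96))
```

Most of the block's parameters sit in the two 1x1 convolutions rather than in the 7x7 depthwise layer.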

In [None]:
class MyEnsemble(torch.nn.Module):
    def __init__(self, modelA, modelB, modelC, w1, w2, w3):
        super(MyEnsemble, self).__init__()
        self.modelA = modelA
        self.modelB = modelB
        self.modelC = modelC
        self.w1 = w1
        self.w2 = w2
        self.w3 = w3

    def forward(self, x, return_feats=False):
        if return_feats:
          pred_a = self.modelA(x, return_feats)
          pred_b = self.modelB(x, return_feats)
          # pred_c = self.modelC(x, return_feats)
          return torch.cat((pred_a, pred_b), dim=1)
        else:
          pred_a = self.modelA(x, return_feats)
          pred_b = self.modelB(x, return_feats)
          pred_c = self.modelC(x, return_feats)
          return self.w1 * pred_a + self.w2 * pred_b + self.w3 * pred_c

# Create models and load state_dicts
modelA = SEResnet50(3, SEBlock, [3, 4, 6, 3]).to(device)
modelB = ConvNextT(3, ConvNextBlock, [3, 3, 9, 3]).to(device)
modelC = SEResnet50(3, SEBlock, [3, 4, 6, 3]).to(device)
# Load state dicts
checkpointA = torch.load("./seresnet50_v2_checkpoint.pth")
checkpointB = torch.load("./convnext_checkpoint_v2.pth")
checkpointC = torch.load("./seresnet_checkpoint_v1.pth")

modelA.load_state_dict(checkpointA["model_state_dict"])
modelB.load_state_dict(checkpointB["model_state_dict"])
modelC.load_state_dict(checkpointC["model_state_dict"])

# while True:
#   ws = np.random.uniform(size=(3,))
#   ws = ws / np.sum(ws)
  # print(ws)
model = MyEnsemble(modelA, modelB, modelC, 0.45, 0.45, 0.1) # 0.5, 0.25, 0.25
# model = MyEnsemble(modelA, modelB, modelC)
val_acc, val_loss = validate(model, val_loader, criterion)
print(val_acc, val_loss)

                                                                                                     

91.33055301645338 3.0548557502913956
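The classification branch of `MyEnsemble` is a weighted sum of the three models' class scores, and the commented-out loop hints at sampling the weights at random. A minimal pure-Python sketch of both ideas follows; the toy scores and helper names are made up for illustration, and the real models produce 7000 scores per image.

```python
import random


def weighted_logits(per_model_logits, weights):
    """Combine per-model class scores for one sample with a weighted sum."""
    num_classes = len(per_model_logits[0])
    return [
        sum(w * logits[c] for w, logits in zip(weights, per_model_logits))
        for c in range(num_classes)
    ]


def random_simplex_weights(n, rng):
    """Draw n non-negative weights that sum to 1 (uniform draws, then normalized)."""
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


# Toy scores for one image from three hypothetical models over three classes.
logits = [[2.0, 1.0, 0.0], [0.0, 3.0, 0.0], [1.0, 0.0, 4.0]]
combined = weighted_logits(logits, [0.45, 0.45, 0.1])
prediction = max(range(len(combined)), key=combined.__getitem__)
print(combined, prediction)

weights = random_simplex_weights(3, random.Random(0))
print(weights)
```

A random search would evaluate each sampled weight vector on the validation set and keep the best one. Because the weights are then tuned on that set, the resulting validation accuracy is likely to be somewhat optimistic.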




In [None]:
gc.collect() # These commands help you when you face CUDA OOM error
torch.cuda.empty_cache()

# Wandb

In [None]:
# Create your wandb run
run_name = "med-try20"
run = wandb.init(
    name = run_name, ## Wandb creates random run names if you skip this field
    reinit = True, ### Allows reinitializing runs when you re-run this cell
    # run_id = ### Insert specific run id here if you want to resume a previous run
    # resume = "must", ### You need this to resume previous runs, but comment out reinit = True when using this
    project = "hw2p2", ### Project should be created in your wandb account 
    config = config ### Wandb Config for your run
)

[34m[1mwandb[0m: Currently logged in as: [33mbevani[0m. Use [1m`wandb login --relogin`[0m to force relogin


# Experiments

In [None]:
# # model = SEResnet50(3, SEBlock, [3, 4, 6, 3]).to(device)
# # model = Resnet34(ResidualBlock).to(device)
model = ConvNextT(3, ConvNextBlock, [3, 3, 9, 3]).to(device)
print("Initialized Model")
checkpoint = torch.load("./checkpoint.pth")
model.load_state_dict(checkpoint["model_state_dict"])
# # print("Loaded checkpoint Model")
# # optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
# # print("Loaded checkpoint optimizer")
# # lr_scheduler.load_state_dict(checkpoint['scheduler_state_dict'])
# # print("Loaded checkpoint scheduler")

Initialized Model


<All keys matched successfully>

In [None]:
best_valacc = checkpoint.get('val_acc', 0.0) # Resume from the best validation accuracy stored in the loaded checkpoint
for epoch in range(config['epochs']):
    curr_lr = float(optimizer.param_groups[0]['lr'])
    train_acc, train_loss = train(model, train_loader, optimizer, criterion)
    print("\nEpoch {}/{}: \nTrain Acc {:.04f}%\t Train Loss {:.04f}\t Learning Rate {:.04f}".format(
        epoch + 1,
        config['epochs'],
        train_acc,
        train_loss,
        curr_lr))
    val_acc, val_loss = validate(model, val_loader, criterion)
    print("Val Acc {:.04f}%\t Val Loss {:.04f}".format(val_acc, val_loss))
    wandb.log({"train_loss":train_loss, 'train_Acc': train_acc, 'validation_Acc':val_acc, 
               'validation_loss': val_loss, "learning_Rate": curr_lr})
    # If you are using a scheduler in your train function within your iteration loop, you may want to log
    # your learning rate differently 
    # #Save model in drive location if val_acc is better than best recorded val_acc
    lr_scheduler.step()
    if val_acc > best_valacc:
      #path = os.path.join(root, model_directory, 'checkpoint' + '.pth')
      print("Saving model")
      torch.save({'model_state_dict': model.state_dict(),
                  'optimizer_state_dict': optimizer.state_dict(),
                  # 'scheduler_state_dict': lr_scheduler.state_dict(),
                  'val_acc': val_acc, 
                  'epoch': epoch}, './checkpoint.pth')
      best_valacc = val_acc
      wandb.save('checkpoint.pth')
      # You may find it interesting to explore Wandb Artifacts to version your models
run.finish()




Epoch 1/120: 
Train Acc 82.2569%	 Train Loss 13.1877	 Learning Rate 0.1000
Val Acc 69.9783%	 Val Loss 3.8784

Epoch 2/120: 
Train Acc 83.7887%	 Train Loss 12.9242	 Learning Rate 0.0994
Val Acc 71.2780%	 Val Loss 3.8339
Saving model

Epoch 3/120: 
Train Acc 85.3084%	 Train Loss 12.7099	 Learning Rate 0.0976
Val Acc 73.7431%	 Val Loss 3.7475
Saving model

Epoch 4/120: 
Train Acc 86.7373%	 Train Loss 12.5212	 Learning Rate 0.0946
Val Acc 75.8769%	 Val Loss 3.6567
Saving model

Epoch 5/120: 
Train Acc 88.0192%	 Train Loss 12.3497	 Learning Rate 0.0905
Val Acc 73.0919%	 Val Loss 3.7528

Epoch 6/120: 
Train Acc 89.1239%	 Train Loss 12.1874	 Learning Rate 0.0854
Val Acc 70.8809%	 Val Loss 3.8309

Epoch 7/120: 
Train Acc 90.6014%	 Train Loss 12.0177	 Learning Rate 0.0794
Val Acc 76.9196%	 Val Loss 3.6211
Saving model

Epoch 8/120: 
Train Acc 92.0340%	 Train Loss 11.8427	 Learning Rate 0.0727
Val Acc 77.6423%	 Val Loss 3.6016
Saving model

Epoch 9/120: 
Train Acc 93.3615%	 Train Loss 11.6606	 Learning Rate 0.0655
Val Acc 78.8191%	 Val Loss 3.5468
Saving model

Epoch 10/120: 
Train Acc 94.7333%	 Train Loss 11.4618	 Learning Rate 0.0578
Val Acc 79.2647%	 Val Loss 3.5497
Saving model

Epoch 11/120: 
Train Acc 96.0523%	 Train Loss 11.2464	 Learning Rate 0.0500
Val Acc 81.0700%	 Val Loss 3.4605
Saving model

Epoch 12/120: 
Train Acc 97.1149%	 Train Loss 11.0141	 Learning Rate 0.0422
Val Acc 82.0727%	 Val Loss 3.4550
Saving model

Epoch 13/120: 
Train Acc 98.1490%	 Train Loss 10.7534	 Learning Rate 0.0345
Val Acc 83.4124%	 Val Loss 3.4140
Saving model

Epoch 14/120: 
Train Acc 98.8067%	 Train Loss 10.4747	 Learning Rate 0.0273
Val Acc 84.0379%	 Val Loss 3.3941
Saving model

Epoch 15/120: 
Train Acc 99.3237%	 Train Loss 10.1811	 Learning Rate 0.0206
Val Acc 84.6092%	 Val Loss 3.4466
Saving model

Epoch 16/120: 
Train Acc 99.5858%	 Train Loss 9.8783	 Learning Rate 0.0146
Val Acc 85.6776%	 Val Loss 3.3968
Saving model

Epoch 17/120: 
Train Acc 99.7343%	 Train Loss 9.5863	 Learning Rate 0.0095
Val Acc 85.9918%	 Val Loss 3.4243
Saving model

Epoch 18/120: 
Train Acc 99.8086%	 Train Loss 9.3410	 Learning Rate 0.0054
Val Acc 86.3831%	 Val Loss 3.4286
Saving model

Epoch 19/120: 
Train Acc 99.8536%	 Train Loss 9.1637	 Learning Rate 0.0024
Val Acc 86.5745%	 Val Loss 3.4420
Saving model

Epoch 20/120: 
Train Acc 99.8793%	 Train Loss 9.0651	 Learning Rate 0.0006
Val Acc 86.5973%	 Val Loss 3.4730
Saving model

Epoch 21/120: 
Train Acc 99.8829%	 Train Loss 9.0377	 Learning Rate 0.0000
Val Acc 86.5059%	 Val Loss 3.5000

Epoch 22/120: 
Train Acc 99.8800%	 Train Loss 9.0444	 Learning Rate 0.0006
Val Acc 86.5402%	 Val Loss 3.4672

Epoch 23/120: 
Train Acc 99.8693%	 Train Loss 9.0727	 Learning Rate 0.0024
Val Acc 86.4574%	 Val Loss 3.4796


Train:  47%|████▋     | 1034/2188 [02:52<03:12,  6.00it/s, acc=99.8806%, loss=9.0954, lr=0.0054, num_correct=66097]

KeyboardInterrupt: ignored
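
The learning rates logged above are consistent with a cosine annealing schedule: they pass 0.0500 at epoch 11, reach 0.0000 at epoch 21, and then climb again. The sketch below reproduces those values under the assumption of a peak rate of 0.1, a minimum of 0, and 20 scheduler steps from peak to minimum. These three numbers are inferred from the log, not read from the training configuration, and `cosine_lr` is a hypothetical helper name.

```python
import math

# Assumed values, inferred from the logged learning rates (not from the config)
BASE_LR = 0.1      # peak learning rate
MIN_LR = 0.0       # lowest learning rate reached
HALF_PERIOD = 20   # scheduler steps from peak to minimum

def cosine_lr(step):
    """Closed-form cosine annealing: learning rate after `step` scheduler steps."""
    cosine = math.cos(math.pi * step / HALF_PERIOD)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1 + cosine)

# In this log, the value printed for epoch N matches the formula at step N - 1
for epoch in (6, 11, 21, 22):
    print(f"Epoch {epoch}: lr = {cosine_lr(epoch - 1):.4f}")
# Epoch 6: lr = 0.0854
# Epoch 11: lr = 0.0500
# Epoch 21: lr = 0.0000
# Epoch 22: lr = 0.0006
```

Past the minimum the cosine turns upward again, which matches the rising rates logged for epochs 22 and 23 (0.0006, then 0.0024).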

# Classification Task: Testing

In [None]:
def test(model, dataloader):
  """Return the predicted class index for every image in `dataloader`."""
  model.eval()
  batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, position=0, leave=False, desc='Test')
  test_results = []
  for images in dataloader:
      images = images.to(device)
      with torch.inference_mode():
        outputs = model(images)
      # The predicted class is the index of the largest logit
      outputs = torch.argmax(outputs, dim=1).detach().cpu().numpy().tolist()
      test_results.extend(outputs)
      batch_bar.update()

  batch_bar.close()
  return test_results

In [None]:
test_results = test(model, test_loader)



## Generate csv to submit to Kaggle

In [None]:
with open("classification_submission.csv", "w") as f:
    f.write("id,label\n")
    for i in range(len(test_dataset)):
        f.write("{},{}\n".format(str(i).zfill(6) + ".jpg", test_results[i]))
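
The submission file pairs each zero-padded test file name with its predicted class index. Below is a minimal sketch of the same layout using Python's `csv` module, with made-up predictions standing in for `test_results` and an in-memory buffer instead of a file on disk. The name `toy_results` and its values are hypothetical.

```python
import csv
import io

toy_results = [4021, 17, 6999]  # made-up class indices, not real predictions

buffer = io.StringIO()  # in-memory stand-in for the submission file
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["id", "label"])
for i, label in enumerate(toy_results):
    # zfill(6) pads the index with zeros: 0 -> "000000.jpg"
    writer.writerow([str(i).zfill(6) + ".jpg", label])

print(buffer.getvalue())
```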

In [None]:
### Submit to kaggle competition using kaggle API
!kaggle competitions submit -c 11-785-f22-hw2p2-classification -f ./classification_submission.csv -m "thirteenth submission"

100% 541k/541k [00:01<00:00, 438kB/s]
Successfully submitted to Face Recognition

# Verification Task: Validation

The verification task consists of the following generalized scenario:
- You are given X unknown identities
- You are given Y known identities
- Your goal is to match X unknown identities to Y known identities.

We have given you a verification dataset that consists of 1000 known identities and 1000 unknown identities. The 1000 unknown identities are split into dev (200) and test (800). Your goal is to compare the unknown identities to the 1000 known identities and assign an identity to each image from the set of unknown identities.

You will use or fine-tune your model trained for classification to compare images between known and unknown identities using a similarity metric and assign labels to the unknown identities.

This task measures the quality of the embeddings/features your model generates for faces it never saw during classification training.
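
The matching step can be illustrated without a model. The sketch below is a toy example in plain Python: each identity is represented by a made-up 3-dimensional embedding, and every unknown embedding is assigned the known identity with the highest cosine similarity. The identity names, the vectors, and the helpers `cosine_similarity` and `match` are all hypothetical; in the homework the embeddings come from your CNN.

```python
import math

def cosine_similarity(a, b, eps=1e-6):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)

def match(unknown, known):
    """Return the name of the known identity most similar to `unknown`."""
    return max(known, key=lambda name: cosine_similarity(unknown, known[name]))

# Made-up embeddings for three known identities
known = {
    "n000001": [1.0, 0.0, 0.0],
    "n000002": [0.0, 1.0, 0.0],
    "n000003": [0.0, 0.0, 1.0],
}

# Made-up embeddings for two unknown faces
unknown = [[0.9, 0.1, 0.0], [0.1, 0.2, 2.0]]

print([match(u, known) for u in unknown])  # ['n000001', 'n000003']
```

Cosine similarity depends only on direction, so the second unknown vector still matches `n000003` even though it is about twice as long as the known embedding.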

In [None]:
known_regex = "/content/data/verification/known/*/*"
known_paths = [i.split('/')[-2] for i in sorted(glob.glob(known_regex))] 
# This obtains the list of known identities from the known folder

unknown_regex = "/content/data/verification/unknown_dev/*" #Change the directory accordingly for the test set
unknown_regex_test = "/content/data/verification/unknown_test/*"

# We load the images from known and unknown folders
unknown_images = [Image.open(p) for p in tqdm(sorted(glob.glob(unknown_regex)))]
known_images = [Image.open(p) for p in tqdm(sorted(glob.glob(known_regex)))]
unknown_images_test = [Image.open(p) for p in tqdm(sorted(glob.glob(unknown_regex_test)))]

# Why do you need only ToTensor() here?
transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor()])

unknown_images = torch.stack([transforms(x) for x in unknown_images])
known_images  = torch.stack([transforms(y) for y in known_images])
unknown_images_test = torch.stack([transforms(x) for x in unknown_images_test])

#Print your shapes here to understand what we have done
print(unknown_images.size())
print(known_images.size())
print(unknown_images_test.size())

# You can use other similarity metrics like Euclidean Distance if you wish
similarity_metric = torch.nn.CosineSimilarity(dim= 1, eps= 1e-6)

100%|██████████| 200/200 [00:00<00:00, 6392.49it/s]
100%|██████████| 1000/1000 [00:00<00:00, 12559.19it/s]
100%|██████████| 800/800 [00:00<00:00, 12137.04it/s]


torch.Size([200, 3, 224, 224])
torch.Size([1000, 3, 224, 224])
torch.Size([800, 3, 224, 224])
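
If you swap cosine similarity for a distance such as Euclidean distance, the selection rule flips: the best match is the smallest distance, so an argmax over similarities becomes an argmin over distances. Here is a toy sketch in plain Python with made-up 2-dimensional embeddings; `euclidean_distance` and `nearest_index` are hypothetical helpers, not part of the starter code.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_index(query, candidates):
    """Index of the candidate closest to `query` (an argmin over distances)."""
    distances = [euclidean_distance(query, c) for c in candidates]
    return min(range(len(candidates)), key=distances.__getitem__)

known_feats = [[0.0, 0.0], [3.0, 4.0], [10.0, 0.0]]  # made-up embeddings
unknown_feat = [2.0, 4.0]

print(nearest_index(unknown_feat, known_feats))  # 1
```

Unlike cosine similarity, Euclidean distance is sensitive to the length of the embedding vectors, so the two metrics can rank candidates differently unless the embeddings are normalized first. In PyTorch, `torch.cdist` computes pairwise distances between two batches of vectors.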


In [None]:
def eval_verification(unknown_images, known_images, model, similarity, batch_size= config['batch_size'], mode='val'): 
    unknown_feats, known_feats = [], []
    # Ceil division so the progress bar total matches the number of loop iterations
    batch_bar = tqdm(total=(len(unknown_images) + batch_size - 1)//batch_size, dynamic_ncols=True, position=0, leave=False, desc=mode)
    model.eval()

    # We load the images as batches for memory optimization and avoiding CUDA OOM errors
    for i in range(0, unknown_images.shape[0], batch_size):
        unknown_batch = unknown_images[i:i+batch_size] # Slice the next batch of up to batch_size images
        
        with torch.no_grad():
            unknown_feat = model(unknown_batch.float().to(device), return_feats=True) #Get features from model         
        unknown_feats.append(unknown_feat)
        batch_bar.update()
    
    batch_bar.close()
    
    # Ceil division so the progress bar total matches the number of loop iterations
    batch_bar = tqdm(total=(len(known_images) + batch_size - 1)//batch_size, dynamic_ncols=True, position=0, leave=False, desc=mode)

    for i in range(0, known_images.shape[0], batch_size):
        known_batch = known_images[i:i+batch_size]
        with torch.no_grad():
            known_feat = model(known_batch.float().to(device), return_feats=True) #Get features from model

        known_feats.append(known_feat)
        batch_bar.update()

    batch_bar.close()

    # Concatenate all the batches
    unknown_feats = torch.cat(unknown_feats, dim=0)
    known_feats = torch.cat(known_feats, dim=0)

    similarity_values = torch.stack([similarity(unknown_feats, known_feature) for known_feature in known_feats])
    # Print the inner list comprehension in a separate cell - what is really happening?

    predictions = similarity_values.argmax(0).cpu().numpy() #Why are we doing an argmax here?

    # Map argmax indices to identity strings
    pred_id_strings = [known_paths[i] for i in predictions]
    
    if mode == 'val':
        true_ids = pd.read_csv('/content/data/verification/dev_identities.csv')['label'].tolist()
        # accuracy_score expects (y_true, y_pred)
        accuracy = accuracy_score(true_ids, pred_id_strings)
        print("Verification Accuracy = {}".format(accuracy))
    
    return pred_id_strings
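
The batching loops in `eval_verification` step through the images in strides of `batch_size`, so the last batch may be smaller than the rest. A small sketch using plain index arithmetic makes the batch count explicit. The batch size of 64 is an assumption for illustration only (the real value comes from `config['batch_size']`, which is not shown here), and `batch_slices` is a hypothetical helper.

```python
import math

def batch_slices(num_items, batch_size):
    """(start, stop) index pairs covering num_items in steps of batch_size."""
    return [(start, min(start + batch_size, num_items))
            for start in range(0, num_items, batch_size)]

num_images = 200   # size of the dev set of unknown identities
batch_size = 64    # assumed value for illustration only

slices = batch_slices(num_images, batch_size)
print(slices)  # [(0, 64), (64, 128), (128, 192), (192, 200)]

# The number of batches is the ceiling of num_images / batch_size
print(len(slices), math.ceil(num_images / batch_size))  # 4 4
```

Floor division (`200 // 64`) gives 3 here, one fewer than the number of iterations, which matters for anything sized by the batch count, such as a progress bar total.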

In [None]:
pred_id_strings = eval_verification(unknown_images, known_images, modelB, similarity_metric, config['batch_size'], mode='val')



Verification Accuracy = 0.67


In [None]:
pred_id_strings_test = eval_verification(unknown_images_test, known_images, model, similarity_metric, config['batch_size'], mode='test')



In [None]:
with open("verification_submission.csv", "w") as f:
    f.write("id,label\n")
    for i in range(len(pred_id_strings_test)):
        f.write("{},{}\n".format(i, pred_id_strings_test[i]))

In [None]:
!kaggle competitions submit -c 11-785-f22-hw2p2-verification -f ./verification_submission.csv -m "thirteenth submission"

100% 9.28k/9.28k [00:01<00:00, 5.98kB/s]
Successfully submitted to Face Verification