# HW2P2: Face Classification and Verification


Congrats on coming to the second homework in 11785: Introduction to Deep Learning. This homework is significantly longer and tougher than the previous one. It has two sub-parts, outlined below. Please start early!


*   Face Recognition: You will write your own CNN model for a classification problem with 7000 identities
*   Face Verification: You will use the model trained for classification and evaluate the quality of its feature embeddings by comparing the similarity of known and unknown identities
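
As a rough illustration of the verification step, the sketch below compares feature embeddings and picks the closest known identity. The choice of cosine similarity, the function names, and the toy three-dimensional vectors are all assumptions made for illustration; the text above only says that the similarity of known and unknown identities is compared.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def closest_identity(unknown, known):
    """Return (name, similarity) of the known embedding most similar to `unknown`."""
    scores = {name: cosine_similarity(unknown, emb) for name, emb in known.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical embeddings; real ones would be the model's feature outputs.
known = {"id_a": [1.0, 0.0, 0.0], "id_b": [0.0, 1.0, 0.0]}
name, score = closest_identity([0.9, 0.1, 0.0], known)
print(name, round(score, 3))
```

In practice the embeddings would have hundreds of dimensions, and a similarity threshold would typically decide whether an unknown face matches any known identity at all.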

For this HW, you only have to write code to implement your model architecture. Everything else has been provided for you, on the premise that most of your time will go into developing a model architecture that achieves satisfactory performance.

Common errors you may face in this homework (because of the size of the model):


*   CUDA Out of Memory (OOM): You can tackle this problem by (1) reducing the batch size, (2) calling `torch.cuda.empty_cache()` and `gc.collect()`, and (3) finally, restarting the runtime
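
One way to combine the first two remedies is to retry a step with a smaller batch whenever an out-of-memory error is raised. The sketch below is only an illustration under stated assumptions: `run_with_smaller_batches` and `fake_step` are hypothetical names, the training step is simulated so the code runs without a GPU, and the error is recognised by the phrase "out of memory" in its message.

```python
import gc

def run_with_smaller_batches(step_fn, batch_size, min_batch_size=1):
    """Call step_fn(batch_size); on an out-of-memory error, halve the batch size and retry."""
    while batch_size >= min_batch_size:
        try:
            return batch_size, step_fn(batch_size)
        except RuntimeError as err:
            if "out of memory" not in str(err).lower():
                raise  # unrelated error: do not hide it
            gc.collect()  # drop unreachable Python objects that may still hold tensors
            # On a GPU runtime, torch.cuda.empty_cache() would be called here as well.
            batch_size //= 2
    raise RuntimeError("out of memory even at the minimum batch size")

# Stand-in for a training step: pretend anything above 16 samples does not fit.
def fake_step(batch_size):
    if batch_size > 16:
        raise RuntimeError("CUDA out of memory.")
    return "ok"

print(run_with_smaller_batches(fake_step, 64))
```

If even the smallest batch fails, restarting the runtime remains the fallback.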



# Preliminaries

In [None]:
!nvidia-smi # to see what GPU you have

Thu Oct 27 22:14:23 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   77C    P0    57W /  70W |  11347MiB / 15109MiB |     88%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
!pip install wandb --quiet

In [None]:
import torch
from torchsummary import summary
import torchvision #This library is used for image-based operations (Augmentations)
import os
import gc
from tqdm import tqdm
from PIL import Image
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
import glob
import wandb
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print("Device: ", device)

Device:  cpu


In [None]:
# from google.colab import drive # Link your drive if you are a colab user
# drive.mount('/content/drive') # Models in this HW take a long time to get trained and make sure to save it her

# TODOs
As you go, please read the code and keep an eye out for TODOs!

# Download Data from Kaggle

In [None]:
!pip install --upgrade --force-reinstall --no-deps kaggle==1.5.8
!mkdir /root/.kaggle

with open("/root/.kaggle/kaggle.json", "w+") as f:
    f.write('{"username":"almutwakelhassan","key":"********************"}') 
    # Put your kaggle username & key here

!chmod 600 /root/.kaggle/kaggle.json

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting kaggle==1.5.8
  Downloading kaggle-1.5.8.tar.gz (59 kB)
     |████████████████████████████████| 59 kB 3.3 MB/s 
Building wheels for collected packages: kaggle
  Building wheel for kaggle (setup.py) ... done
  Created wheel for kaggle: filename=kaggle-1.5.8-py3-none-any.whl size=73275 sha256=47e0bf927fc3b76f6678cf22a1c6cf47627965a333a075e7f71f8ff7995f3ab5
  Stored in directory: /root/.cache/pip/wheels/de/f7/d8/c3902cacb7e62cb611b1ad343d7cc07f42f7eb76ae3a52f3d1
Successfully built kaggle
Installing collected packages: kaggle
  Attempting uninstall: kaggle
    Found existing installation: kaggle 1.5.12
    Uninstalling kaggle-1.5.12:
      Successfully uninstalled kaggle-1.5.12
Successfully installed kaggle-1.5.8


In [None]:
!mkdir '/content/data'

!kaggle competitions download -c 11-785-f22-hw2p2-classification
!unzip -qo '11-785-f22-hw2p2-classification.zip' -d '/content/data'

!kaggle competitions download -c 11-785-f22-hw2p2-verification
!unzip -qo '11-785-f22-hw2p2-verification.zip' -d '/content/data'

Downloading 11-785-f22-hw2p2-classification.zip to /content
100% 2.36G/2.37G [00:10<00:00, 256MB/s]
100% 2.37G/2.37G [00:10<00:00, 235MB/s]
Downloading 11-785-f22-hw2p2-verification.zip to /content
 54% 9.00M/16.8M [00:00<00:00, 91.5MB/s]
100% 16.8M/16.8M [00:00<00:00, 142MB/s] 


# Configs

In [None]:
config = {
    'batch_size': 32, # Increase this if your GPU can handle it
    'lr': 0.01,
    'epochs': 20, # 10 epochs is recommended ONLY for the early submission - you will have to train for much longer typically.
    # Include other parameters as needed.
}

# Classification Dataset

In [None]:
from torchvision.transforms.transforms import RandomEqualize
DATA_DIR = '/content/data/11-785-f22-hw2p2-classification/'# TODO: Path where you have downloaded the data
TRAIN_DIR = os.path.join(DATA_DIR, "classification/train") 
VAL_DIR = os.path.join(DATA_DIR, "classification/dev")
TEST_DIR = os.path.join(DATA_DIR, "classification/test")

# Transforms using torchvision - Refer https://pytorch.org/vision/stable/transforms.html

train_transforms = torchvision.transforms.Compose([
                    torchvision.transforms.RandomHorizontalFlip(0.3),
                    torchvision.transforms.ColorJitter(brightness=(0.8, 1.2), contrast=(0.8,1.2), saturation=(0.8,1.2)),
                    torchvision.transforms.GaussianBlur(3),
                    torchvision.transforms.RandomRotation((-10, 10)),
                    torchvision.transforms.RandomAutocontrast(0.25),
                    # torchvision.transforms.RandomEqualize(1),
                    torchvision.transforms.RandomAdjustSharpness(1.5, 0.2),

    # Implementing the right transforms/augmentation methods is key to improving performance.
                    torchvision.transforms.ToTensor(),
                    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
                    ])
# Most torchvision transforms are done on PIL images. So you convert it into a tensor at the end with ToTensor()
# But there are some transforms which are performed after ToTensor() : e.g - Normalization
# Normalization Tip - Do not blindly use normalization that is not suitable for this dataset

val_transforms = torchvision.transforms.Compose([
    # torchvision.transforms.GaussianBlur(3),
    torchvision.transforms.ToTensor(),
    # torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

val_transforms_normalized = torchvision.transforms.Compose([
    # torchvision.transforms.GaussianBlur(3),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])


train_dataset = torchvision.datasets.ImageFolder(TRAIN_DIR, transform = train_transforms)
val_dataset = torchvision.datasets.ImageFolder(VAL_DIR, transform = val_transforms)
val_dataset_normalized = torchvision.datasets.ImageFolder(VAL_DIR, transform = val_transforms_normalized)

# You should NOT have data augmentation on the validation set. Why?


# Create data loaders
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = config['batch_size'], 
                                           shuffle = True,num_workers = 4, pin_memory = True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size = config['batch_size'], 
                                         shuffle = False, num_workers = 2)
val_loader_normalized = torch.utils.data.DataLoader(val_dataset_normalized, batch_size = config['batch_size'], 
                                         shuffle = False, num_workers = 2)
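
To make the normalization comments concrete: `ToTensor()` scales 8-bit pixel values into [0, 1], and `Normalize` with mean 0.5 and standard deviation 0.5 then maps that range onto [-1, 1]. The sketch below reproduces only the per-value arithmetic in plain Python; `normalize` is a hypothetical helper, not the torchvision class.

```python
def normalize(value, mean=0.5, std=0.5):
    """Per-channel arithmetic of normalization: (value - mean) / std."""
    return (value - mean) / std

for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

A model trained on inputs in [-1, 1] sees a different input range if the validation transforms skip this step, which is worth keeping in mind when comparing the normalized and unnormalized validation pipelines.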

In [None]:
# You can do this with ImageFolder as well, but it requires some tweaking
class ClassificationTestDataset(torch.utils.data.Dataset):

    def __init__(self, data_dir, transforms):
        self.data_dir   = data_dir
        self.transforms = transforms

        # This one-liner basically generates a sorted list of full paths to each image in the test directory
        self.img_paths  = list(map(lambda fname: os.path.join(self.data_dir, fname), sorted(os.listdir(self.data_dir))))

    def __len__(self):
        return len(self.img_paths)
    
    def __getitem__(self, idx):
        return self.transforms(Image.open(self.img_paths[idx]))

In [None]:
test_dataset = ClassificationTestDataset(TEST_DIR, transforms = val_transforms) #Why are we using val_transforms for Test Data?
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = config['batch_size'], shuffle = False,
                         drop_last = False, num_workers = 2)
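
Because the test loader does not shuffle, predictions come out in the same order as the sorted path list, so it helps to know exactly what that order is. The sketch below uses hypothetical file names to show that `sorted` orders strings lexicographically, not numerically; `list_image_paths` is an illustrative helper, not part of the provided code.

```python
import os

def list_image_paths(data_dir, file_names):
    """Join data_dir with the file names in sorted (lexicographic) order."""
    return [os.path.join(data_dir, name) for name in sorted(file_names)]

# Hypothetical file names; note that "10.jpg" sorts before "2.jpg".
paths = list_image_paths("test", ["2.jpg", "10.jpg", "1.jpg"])
print([os.path.basename(p) for p in paths])
```

When writing a submission file, pairing each prediction with the basename from this same list avoids any mismatch between file names and labels.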

In [None]:
# ld = iter(train_loader)
# test_item = next(ld)
# print(test_item[1])

# test_item = next(ld)
# print(test_item[1])

In [None]:
print("Number of classes: ", len(train_dataset.classes))
print("No. of train images: ", train_dataset.__len__())
print("Shape of image: ", train_dataset[0][0].shape)
print("Batch size: ", config['batch_size'])
print("Train batches: ", train_loader.__len__())
print("Val batches: ", val_loader.__len__())

Number of classes:  7000
No. of train images:  140000
Shape of image:  torch.Size([3, 224, 224])
Batch size:  32
Train batches:  4375
Val batches:  1094


# Very Simple Network (for Mandatory Early Submission)

In [None]:
# , output
class Network(torch.nn.Module):
    """
    The Very Low early deadline architecture is a 4-layer CNN.

    The first Conv layer has 64 channels, kernel size 7, and stride 4.
    The next three have 128, 256, and 512 channels. Each has kernel size 3 and stride 2.
    
    Think of strided convolutions, as in the lecture, as convolution with stride=1 followed by downsampling.
    For a stride-1 convolution, what padding do you need to preserve the spatial resolution?
    (Hint: padding = kernel_size // 2. Why?)

    Each Conv layer is accompanied by a Batchnorm and ReLU layer.
    Finally, you want to average pool over the spatial dimensions to reduce them to 1 x 1. Use AdaptiveAvgPool2d.
    Then, remove (Flatten?) these trivial 1x1 dimensions away.
    Look through https://pytorch.org/docs/stable/nn.html 
    
    TODO: Fill out the model definition below! 

    Why does a very simple network have 4 convolutions?
    Input images are 224x224. Note that each of these convolutions downsample.
    Downsampling 2x effectively doubles the receptive field, increasing the spatial
    region each pixel extracts features from. Downsampling 32x is standard
    for most image models.

    Why does a very simple network have high channel sizes?
    Every time you downsample 2x, you do 4x less computation (at the same channel size).
    To maintain the same level of computation, you double the number of channels, which
    increases computation by 4x. So the two balance out to the same computation.
    Another intuition is - as you downsample, you lose spatial information. We want
    to preserve some of it in the channel dimension.
    """

    def __init__(self, num_classes=7000):
        super().__init__()

        self.backbone = torch.nn.Sequential(
          torch.nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=4),
          torch.nn.BatchNorm2d(64),
          torch.nn.ReLU(),

          torch.nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=2),
          torch.nn.BatchNorm2d(128),
          torch.nn.ReLU(),

          torch.nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=2),
          torch.nn.BatchNorm2d(256),
          torch.nn.ReLU(),
            
          torch.nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, stride=2),
          torch.nn.BatchNorm2d(512),
          torch.nn.ReLU(),

          torch.nn.AdaptiveAvgPool2d(1),

          torch.nn.Flatten()

        ) 
        
        self.cls_layer = torch.nn.Linear(512, num_classes)
    
    def forward(self, x, return_feats=False):
        """
        What is return_feats? It essentially returns the second-to-last-layer
        features of a given image. It's a "feature encoding" of the input image,
        and you can use it for the verification task. You would use the outputs
        of the final classification layer for the classification task.

        You might also find that the classification outputs are sometimes better
        for verification too - try both.
        """
        feats = self.backbone(x)
        out = self.cls_layer(feats)

        if return_feats:
            return feats
        else:
            return out
            
model = Network().to(device)
summary(model, (3, 224, 224))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 55, 55]           9,472
       BatchNorm2d-2           [-1, 64, 55, 55]             128
              ReLU-3           [-1, 64, 55, 55]               0
            Conv2d-4          [-1, 128, 27, 27]          73,856
       BatchNorm2d-5          [-1, 128, 27, 27]             256
              ReLU-6          [-1, 128, 27, 27]               0
            Conv2d-7          [-1, 256, 13, 13]         295,168
       BatchNorm2d-8          [-1, 256, 13, 13]             512
              ReLU-9          [-1, 256, 13, 13]               0
           Conv2d-10            [-1, 512, 6, 6]       1,180,160
      BatchNorm2d-11            [-1, 512, 6, 6]           1,024
             ReLU-12            [-1, 512, 6, 6]               0
AdaptiveAvgPool2d-13            [-1, 512, 1, 1]               0
          Flatten-14                  [
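
The output shapes and parameter counts in the summary can be checked by hand with the standard convolution formulas. The sketch below is a plain-Python check under the assumption of dilation 1; `conv_out_size`, `conv_params`, and `trace` are hypothetical helper names. It also shows what the spatial sizes would be with the docstring's hint of `padding = kernel_size // 2`, which the layers as written do not use.

```python
def conv_out_size(size, kernel_size, stride, padding=0):
    """Spatial output size of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

def conv_params(in_channels, out_channels, kernel_size, bias=True):
    """Weights plus optional biases of a square-kernel 2D convolution."""
    return in_channels * out_channels * kernel_size ** 2 + (out_channels if bias else 0)

# (in_channels, out_channels, kernel_size, stride) of the four Conv2d layers
layers = [(3, 64, 7, 4), (64, 128, 3, 2), (128, 256, 3, 2), (256, 512, 3, 2)]

def trace(size, use_padding):
    """Spatial size after each conv layer, starting from a size x size input."""
    sizes = []
    for _, _, k, s in layers:
        size = conv_out_size(size, k, s, padding=k // 2 if use_padding else 0)
        sizes.append(size)
    return sizes

print(trace(224, use_padding=False))  # padding=0, as in the layers as written
print(trace(224, use_padding=True))   # padding = kernel_size // 2
print([conv_params(i, o, k) for i, o, k, _ in layers])
```

With padding the feature map shrinks by exactly the stride at every layer (224 to 56, 28, 14, 7), for a total downsampling of 32x.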

In [None]:
gc.collect() # These commands help you when you face CUDA OOM error
torch.cuda.empty_cache()

In [None]:
class ResNetBlock(torch.nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):

        super().__init__()
        
        downsampler = None  
        if stride != 1 or in_channels != out_channels:
            downsampler = torch.nn.Sequential(            
                torch.nn.Conv2d(in_channels, out_channels, 1, stride, bias=False),
                torch.nn.BatchNorm2d(out_channels),
            )

        self.downsampler = downsampler

        v1_5 = False

        if v1_5:
          self.ResNetChunk = torch.nn.Sequential(
          # bias = False. Why? "Biases are in the BatchNorm layers that follow" (-Kaiming He)
              torch.nn.Conv2d(in_channels, in_channels, 3, stride=1, padding=1, bias=False),
              torch.nn.BatchNorm2d(in_channels),
              torch.nn.ReLU(),
          # ResNet V1.5 uses stride in second convolution instead of first for improved performance:
              torch.nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False),
              torch.nn.BatchNorm2d(out_channels),
          )
        else:
          self.ResNetChunk = torch.nn.Sequential(
              torch.nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False),
              torch.nn.BatchNorm2d(out_channels),
              torch.nn.ReLU(),
              torch.nn.Conv2d(out_channels, out_channels, 3, stride=1, padding=1, bias=False),
              torch.nn.BatchNorm2d(out_channels),
          )
  
    def forward(self, input):

        identity_fn = input

        if self.downsampler is not None:
            identity_fn = self.downsampler.forward(identity_fn)

        delta = self.ResNetChunk.forward(input)

        output = identity_fn + delta
        
        relu_layer = torch.nn.ReLU()
        output = relu_layer.forward(output)

        return output



class ResNet(torch.nn.Module):
    """

    """
    def __init__(self, num_classes=7000, blocks_per_layer=None):
        super().__init__()

        self.block_layers = []

        if blocks_per_layer is None:
            self.blocks_per_layer = [3, 4, 6, 3]
        else:
            self.blocks_per_layer = blocks_per_layer

        self.entry_layers = torch.nn.Sequential(
            torch.nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),
            torch.nn.BatchNorm2d(64),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(3, stride=2, padding=1)
        )
        
        in_channels = 64
        out_channels = 128

        self.block_layers.append(self.block_constructor(in_channels, out_channels, self.blocks_per_layer[0], stride=1))
        
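        # Note: blocks_per_layer[i] below reads entries 0..2, so the default list gives stages of
        # 3, 3, 4, 6 blocks; blocks_per_layer[i + 1] would give 3, 4, 6, 3 (and change the state_dict).
        for i in range(len(self.blocks_per_layer[1:])):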
            in_channels *= 2
            out_channels *= 2
            self.block_layers.append(self.block_constructor(in_channels, out_channels, self.blocks_per_layer[i], stride=2))
            

        self.block_layers_sequential = torch.nn.Sequential(*self.block_layers)

        self.final_pool = torch.nn.AdaptiveAvgPool2d((1, 1))

        self.cls_layers = torch.nn.Sequential(
            # torch.nn.Linear(1024, 1024),
            # torch.nn.ReLU(),
            # torch.nn.Linear(1024, 1024),
            # torch.nn.ReLU(),
            torch.nn.Linear(1024, num_classes)
        )
        
    def block_constructor(self, in_channels, out_channels, num_blocks, stride=1):
        
        layers = []
        layers.append(ResNetBlock(in_channels, out_channels, stride=stride))
        in_channels = out_channels

        for i in range(1, num_blocks):
            layers.append(ResNetBlock(in_channels, out_channels))
        
        return torch.nn.Sequential(*layers)
        
    def forward(self, x, return_feats=False):
        """
        What is return_feats? It essentially returns the second-to-last-layer
        features of a given image. It's a "feature encoding" of the input image,
        and you can use it for the verification task. You would use the outputs
        of the final classification layer for the classification task.

        You might also find that the classification outputs are sometimes better
        for verification too - try both.
        """
        
        x = self.entry_layers.forward(x)

        x = self.block_layers_sequential(x)
        x = self.final_pool(x)
        flatten_layer = torch.nn.Flatten()
        x = flatten_layer.forward(x)

        out = self.cls_layers.forward(x)

        if return_feats:
            return x
        else:
            return out
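
To see which stages the constructor actually builds, its loop can be mirrored in plain Python. The sketch below is an illustration only: `stage_plan` and its `index_offset` parameter are hypothetical, and the tuples list `(in_channels, out_channels, num_blocks, stride)` for each stage.

```python
def stage_plan(blocks_per_layer, index_offset=0):
    """Mirror the constructor loop: (in_channels, out_channels, num_blocks, stride) per stage."""
    in_channels, out_channels = 64, 128
    plan = [(in_channels, out_channels, blocks_per_layer[0], 1)]
    for i in range(len(blocks_per_layer[1:])):
        in_channels *= 2
        out_channels *= 2
        plan.append((in_channels, out_channels, blocks_per_layer[i + index_offset], 2))
    return plan

as_written = stage_plan([3, 4, 6, 3])               # indexes blocks_per_layer[i]
shifted = stage_plan([3, 4, 6, 3], index_offset=1)  # hypothetical blocks_per_layer[i + 1]

for stage in as_written:
    print(stage)
print([num_blocks for _, _, num_blocks, _ in shifted])
```

Either way the last stage outputs 1024 channels, which is why the classifier is `Linear(1024, num_classes)`. The entry layers downsample 4x and the three stride-2 stages another 8x, so a 224x224 input reaches the final pooling layer at 7x7.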

In [None]:
class ResNet50Block(torch.nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):

        super().__init__()

        downsampler = None  
        if stride != 1 or in_channels != out_channels:
            downsampler = torch.nn.Sequential(            
                torch.nn.Conv2d(in_channels, out_channels, 1, stride, padding=0, bias=False),
                torch.nn.BatchNorm2d(out_channels),
            )

        self.downsampler = downsampler

        self.ResNetChunk = torch.nn.Sequential(
        # bias = False. Why? "Biases are in the BatchNorm layers that follow" (-Kaiming He)
            torch.nn.Conv2d(in_channels, in_channels, 1, stride=1, padding=0, bias=False),
            torch.nn.BatchNorm2d(in_channels),
            torch.nn.ReLU(),

            torch.nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False),
            torch.nn.BatchNorm2d(out_channels),
            torch.nn.ReLU(),

            torch.nn.Conv2d(out_channels, out_channels, 1, stride=1, padding=0, bias=False),
            torch.nn.BatchNorm2d(out_channels),
        )
  
    def forward(self, input):

        identity_fn = input

        if self.downsampler is not None:
            identity_fn = self.downsampler.forward(identity_fn)

        delta = self.ResNetChunk.forward(input)

        output = identity_fn + delta
        
        relu_layer = torch.nn.ReLU()
        output = relu_layer.forward(output)

        return output



class ResNet50(torch.nn.Module):
    def __init__(self, num_classes=7000, blocks_per_layer=None):
        super().__init__()

        self.block_layers = []

        if blocks_per_layer is None:
            self.blocks_per_layer = [3, 4, 6, 3]
        else:
            self.blocks_per_layer = blocks_per_layer

        self.entry_layers = torch.nn.Sequential(
            torch.nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),
            torch.nn.BatchNorm2d(64),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(3, stride=2, padding=1)
        )
        
        in_channels = 64
        out_channels = 128

        self.block_layers.append(self.block_constructor(in_channels, out_channels, self.blocks_per_layer[0], stride=1))
        
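        # Note: blocks_per_layer[i] below reads entries 0..2, so the default list gives stages of
        # 3, 3, 4, 6 blocks; blocks_per_layer[i + 1] would give 3, 4, 6, 3 (and change the state_dict).
        for i in range(len(self.blocks_per_layer[1:])):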
            in_channels *= 2
            out_channels *= 2
            self.block_layers.append(self.block_constructor(in_channels, out_channels, self.blocks_per_layer[i], stride=2))
            

        self.block_layers_sequential = torch.nn.Sequential(*self.block_layers)

        self.final_pool = torch.nn.AdaptiveAvgPool2d((1, 1))

        self.cls_layers = torch.nn.Sequential(
            # torch.nn.Linear(50176, 16384),
            # torch.nn.ReLU(),
            # torch.nn.Linear(16384, num_classes)
            torch.nn.Linear(1024, num_classes),
        )
        
    def block_constructor(self, in_channels, out_channels, num_blocks, stride=1):
        
        layers = []
        layers.append(ResNet50Block(in_channels, out_channels, stride=stride))
        in_channels = out_channels

        for i in range(1, num_blocks):
            layers.append(ResNet50Block(in_channels, out_channels))
        
        return torch.nn.Sequential(*layers)
        
    def forward(self, x, return_feats=False):
        """
        What is return_feats? It essentially returns the second-to-last-layer
        features of a given image. It's a "feature encoding" of the input image,
        and you can use it for the verification task. You would use the outputs
        of the final classification layer for the classification task.

        You might also find that the classification outputs are sometimes better
        for verification too - try both.
        """
        
        x = self.entry_layers.forward(x)

        x = self.block_layers_sequential(x)
        x = self.final_pool(x)
        flatten_layer = torch.nn.Flatten()
        x = flatten_layer.forward(x)

        out = self.cls_layers.forward(x)

        if return_feats:
            return x
        else:
            return out

# Setup everything for training

In [None]:
# model = ResNet().to(device)
# # model.forward(x=np.zeros((3, 224, 224)))
# model_save_t = torch.load('/content/drive/MyDrive/Colab Notebooks/HW2P2/ResNet34_002')
# model.load_state_dict(model_save_t)
# # optimizer.load_state_dict(model_save_t['optimizer_state_dict'])
# summary(model, (3, 224, 224), device='cuda')

In [None]:
criterion = torch.nn.CrossEntropyLoss()# label_smoothing=0.03)
# TODO: What loss do you need for a multi class classification problem?
optimizer = torch.optim.SGD(model.parameters(), lr=config['lr'], momentum=0.9, weight_decay=1e-4)
# TODO: Implement a scheduler (Optional but Highly Recommended)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, verbose=True)
# You can try ReduceLROnPlateau, StepLR, MultiStepLR, CosineAnnealingLR, etc.
scaler = torch.cuda.amp.GradScaler() # Good news. We have FP16 (Mixed precision training) implemented for you
# It is useful only in the case of compatible GPUs such as T4/V100

In [None]:
model2 = ResNet50().to(device)
model2.load_state_dict(torch.load('/content/data/ResNet50_newaug.pth'))
model2.eval()

ResNet50(
  (entry_layers): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  )
  (block_layers_sequential): Sequential(
    (0): Sequential(
      (0): ResNet50Block(
        (downsampler): Sequential(
          (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (ResNetChunk): Sequential(
          (0): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
          (3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (4): BatchNorm2d(128, eps=1e-05, momentum

In [None]:
model3 = ResNet50().to(device)
model3.load_state_dict(torch.load('/content/data/ResNet50_newaug2.pth'))
model3.eval()

ResNet50(
  (entry_layers): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  )
  (block_layers_sequential): Sequential(
    (0): Sequential(
      (0): ResNet50Block(
        (downsampler): Sequential(
          (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (ResNetChunk): Sequential(
          (0): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
          (3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (4): BatchNorm2d(128, eps=1e-05, momentum

# Let's train!

In [None]:
def train(model, dataloader, optimizer, criterion):
    
    model.train()

    # Progress Bar 
    batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, leave=False, position=0, desc='Train', ncols=5) 
    
    num_correct = 0
    total_loss = 0

    for i, (images, labels) in enumerate(dataloader):

        optimizer.zero_grad() # Zero gradients

        images, labels = images.to(device), labels.to(device)
        
        with torch.cuda.amp.autocast(): # This implements mixed precision. That's it! 
            outputs = model(images)
            loss = criterion(outputs, labels)

        # Update no. of correct predictions & loss as we iterate
        num_correct += int((torch.argmax(outputs, axis=1) == labels).sum())
        total_loss += float(loss.item())

        # tqdm lets you add some details so you can monitor training as you train.
        batch_bar.set_postfix(
            acc="{:.04f}%".format(100 * num_correct / (config['batch_size']*(i + 1))),
            loss="{:.04f}".format(float(total_loss / (i + 1))),
            num_correct=num_correct,
            lr="{:.04f}".format(float(optimizer.param_groups[0]['lr'])))
        
        scaler.scale(loss).backward() # This is a replacement for loss.backward()
        scaler.step(optimizer) # This is a replacement for optimizer.step()
        scaler.update() 

        # TODO? Depending on your choice of scheduler,
        # you may want to call some schedulers inside the train function. What are these?
      
        batch_bar.update() # Update tqdm bar

    batch_bar.close() # You need this to close the tqdm bar

    acc = 100 * num_correct / len(dataloader.dataset) # Divide by the true sample count; the last batch may be smaller
    total_loss = float(total_loss / len(dataloader))

    return acc, total_loss
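
As one possible answer to the scheduler TODO: schedules defined over training steps, such as cosine annealing over the whole run, are usually advanced once per batch inside the training loop, whereas a plateau-based scheduler is advanced once per epoch with a validation metric. The sketch below evaluates the closed-form cosine schedule in plain Python. `cosine_lr` is a hypothetical helper, and stepping per batch with `total_steps = epochs * batches_per_epoch` is an assumption about how such a schedule would be configured here.

```python
import math

def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Closed-form cosine annealing from base_lr (step 0) to min_lr (step total_steps)."""
    cos_term = (1 + math.cos(math.pi * step / total_steps)) / 2
    return min_lr + (base_lr - min_lr) * cos_term

# Values taken from the config and the loader printout
epochs, batches_per_epoch, base_lr = 20, 4375, 0.01
total_steps = epochs * batches_per_epoch

for step in (0, total_steps // 2, total_steps):
    print(step, round(cosine_lr(step, total_steps, base_lr), 6))
```

The learning rate starts at `base_lr`, reaches half of it at the midpoint, and decays to `min_lr` at the final step.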

In [None]:
def validate(model, dataloader, criterion):
  
    model.eval()
    batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, position=0, leave=False, desc='Val', ncols=5)

    num_correct = 0.0
    total_loss = 0.0

    for i, (images, labels) in enumerate(dataloader):
        
        # Move images to device
        images, labels = images.to(device), labels.to(device)
        
        # Get model outputs
        with torch.inference_mode():
            outputs = model(images)
            loss = criterion(outputs, labels)

        num_correct += int((torch.argmax(outputs, axis=1) == labels).sum())
        total_loss += float(loss.item())

        batch_bar.set_postfix(
            acc="{:.04f}%".format(100 * num_correct / (config['batch_size']*(i + 1))),
            loss="{:.04f}".format(float(total_loss / (i + 1))),
            num_correct=num_correct)

        batch_bar.update()
        
    batch_bar.close()
    acc = 100 * num_correct / len(dataloader.dataset) # Divide by the true sample count; the last batch may be smaller
    total_loss = float(total_loss / len(dataloader))
    return acc, total_loss
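
A note on accuracy denominators: when the dataset size is not a multiple of the batch size, the last batch is smaller, and dividing by `batch_size * num_batches` reports a slightly lower accuracy than dividing by the number of samples. The sketch below compares the two on a small hypothetical dataset; both function names are illustrative.

```python
import math

def accuracy_by_batches(num_correct, batch_size, num_batches):
    """Accuracy in percent, assuming every batch is full."""
    return 100 * num_correct / (batch_size * num_batches)

def accuracy_by_samples(num_correct, num_samples):
    """Accuracy in percent over the true number of samples."""
    return 100 * num_correct / num_samples

num_samples, batch_size = 100, 32   # hypothetical dataset: the last batch holds 4 images
num_batches = math.ceil(num_samples / batch_size)

print(num_batches)
print(accuracy_by_batches(100, batch_size, num_batches))
print(accuracy_by_samples(100, num_samples))
```

The two agree whenever the dataset divides evenly, as with the 140000 training images and a batch size of 32 (4375 full batches).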

In [None]:
gc.collect() # These commands help you when you face CUDA OOM error
torch.cuda.empty_cache()

# Wandb

In [None]:
wandb.login(key="<YOUR_WANDB_API_KEY>") # Your API key is in your wandb account, under settings (wandb.ai/settings). Do not share it.

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Currently logged in as: [33makh[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [None]:
# Create your wandb run
run = wandb.init(
    name = "ResNet34_AUG1_CONT", ## Wandb creates random run names if you skip this field
    reinit = True, ### Allows reinitializing runs when you re-run this cell
    # id = "bc12s6oh", ### Insert specific run id here if you want to resume a previous run
    # resume = "must", ### You need this to resume previous runs, but comment out reinit = True when using this
    project = "hw2p2", ### Project should be created in your wandb account 
    config = config ### Wandb Config for your run
)

# Experiments

In [None]:
!nvidia-smi

Sat Oct 22 19:31:50 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   57C    P0    29W /  70W |   1364MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
# model_save = wandb.restore('checkpoint.pth')

In [None]:
# model_save = wandb.restore(name='checkpoint.pth', run_path='akh/hw2p2/bc12s6oh')

In [None]:
# from google.colab import drive # Link your drive if you are a colab user
# drive.mount('/content/drive') # Models in this HW take a long time to get trained and make sure to save it her

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
from torchvision.transforms.transforms import RandomEqualize
DATA_DIR = '/content/data/11-785-f22-hw2p2-classification/'# TODO: Path where you have downloaded the data
TRAIN_DIR = os.path.join(DATA_DIR, "classification/train") 
VAL_DIR = os.path.join(DATA_DIR, "classification/dev")
TEST_DIR = os.path.join(DATA_DIR, "classification/test")

# Transforms using torchvision - Refer https://pytorch.org/vision/stable/transforms.html

train_transforms = torchvision.transforms.Compose([
                    torchvision.transforms.RandomHorizontalFlip(0.3),
                    torchvision.transforms.ColorJitter(brightness=(0.8, 1.2), contrast=(0.8,1.2), saturation=(0.8,1.2)),
                    torchvision.transforms.GaussianBlur(3),
                    torchvision.transforms.RandomRotation((-10, 10)),
                    torchvision.transforms.RandomAutocontrast(0.25),
                    # torchvision.transforms.RandomEqualize(1),
                    torchvision.transforms.RandomAdjustSharpness(1.5, 0.2),

    # Implementing the right transforms/augmentation methods is key to improving performance.
                    torchvision.transforms.ToTensor(),
                    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
                    ])
# Most torchvision transforms are done on PIL images. So you convert it into a tensor at the end with ToTensor()
# But there are some transforms which are performed after ToTensor() : e.g - Normalization
# Normalization Tip - Do not blindly use normalization that is not suitable for this dataset

val_transforms = torchvision.transforms.Compose([
    # torchvision.transforms.GaussianBlur(3),
    torchvision.transforms.ToTensor(),  # Normalize operates on tensors, so ToTensor() must come first
    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

val_transforms_normalized = torchvision.transforms.Compose([
    # torchvision.transforms.GaussianBlur(3),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])


train_dataset = torchvision.datasets.ImageFolder(TRAIN_DIR, transform = train_transforms)
val_dataset = torchvision.datasets.ImageFolder(VAL_DIR, transform = val_transforms)
val_dataset_normalized = torchvision.datasets.ImageFolder(VAL_DIR, transform = val_transforms_normalized)

# You should NOT have data augmentation on the validation set. Why?


# Create data loaders
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = config['batch_size'], 
                                           shuffle = True,num_workers = 4, pin_memory = True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size = config['batch_size'], 
                                         shuffle = False, num_workers = 2)
val_loader_normalized = torch.utils.data.DataLoader(val_dataset_normalized, batch_size = config['batch_size'], 
                                         shuffle = False, num_workers = 2)
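
The normalization tip in the cell above suggests checking the dataset's statistics rather than assuming `(0.5, 0.5, 0.5)`. Here is a minimal sketch of estimating the per-channel mean and standard deviation. `channel_stats` is a hypothetical helper, not part of the starter code, and the random `TensorDataset` is a stand-in for something like `ImageFolder(TRAIN_DIR, transform=ToTensor())` with no augmentation applied.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def channel_stats(loader):
    """Estimate per-channel mean and std over every pixel yielded by a loader of (images, labels)."""
    n_pixels = 0
    channel_sum = torch.zeros(3, dtype=torch.float64)
    channel_sq_sum = torch.zeros(3, dtype=torch.float64)
    for images, _ in loader:
        images = images.double()  # (B, 3, H, W); float64 keeps the running sums accurate
        n_pixels += images.shape[0] * images.shape[2] * images.shape[3]
        channel_sum += images.sum(dim=(0, 2, 3))
        channel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))
    mean = channel_sum / n_pixels
    std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()  # population std: sqrt(E[x^2] - E[x]^2)
    return mean.float(), std.float()

# Stand-in data (assumption): random images in [0, 1], the range ToTensor() produces
torch.manual_seed(0)
fake_images = torch.rand(64, 3, 32, 32)
fake_labels = torch.zeros(64, dtype=torch.long)
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=16)

mean, std = channel_stats(loader)
print(mean, std)
```

The resulting tuples could then be passed to `torchvision.transforms.Normalize(mean, std)` in both the training and validation pipelines.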

In [None]:
# torch.save(model.state_dict(), '/content/drive/MyDrive/Colab Notebooks/HW2P2/ResNet34_002_RE')
# wandb.save('/content/drive/MyDrive/Colab Notebooks/HW2P2/ResNet34_002_RE')

In [None]:
best_valacc = 0.0

for epoch in range(config['epochs']):

    curr_lr = float(optimizer.param_groups[0]['lr'])

    # !nvidia-smi

    train_acc, train_loss = train(model, train_loader, optimizer, criterion)
    
    print("\nEpoch {}/{}: \nTrain Acc {:.04f}%\t Train Loss {:.04f}\t Learning Rate {:.04f}".format(
        epoch + 1,
        config['epochs'],
        train_acc,
        train_loss,
        curr_lr))
    
    val_acc, val_loss = validate(model, val_loader, criterion)
    
    print("Val Acc {:.04f}%\t Val Loss {:.04f}".format(val_acc, val_loss))
    scheduler.step(val_acc, epoch=epoch)

    wandb.log({"train_loss":train_loss, 'train_Acc': train_acc, 'validation_Acc':val_acc, 
               'validation_loss': val_loss, "learning_Rate": curr_lr})
    
    # If you are using a scheduler in your train function within your iteration loop, you may want to log
    # your learning rate differently 

    # #Save model in drive location if val_acc is better than best recorded val_acc
    if val_acc >= best_valacc:
      #path = os.path.join(root, model_directory, 'checkpoint' + '.pth')
      print("Saving model")
      torch.save({'model_state_dict':model.state_dict(),
                  'optimizer_state_dict':optimizer.state_dict(),
                  #'scheduler_state_dict':scheduler.state_dict(),
                  'val_acc': val_acc, 
                  'epoch': epoch}, './checkpoint.pth')
      best_valacc = val_acc
      wandb.save('checkpoint.pth')
      torch.save(model.state_dict(), '/content/drive/MyDrive/Colab Notebooks/HW2P2/ResNet34_002_RE')
      wandb.save('/content/drive/MyDrive/Colab Notebooks/HW2P2/ResNet34_002_RE')
      # You may find it interesting to explore Wandb Artifacts to version your models
run.finish()
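
The checkpoint written in the loop above is a dictionary with the keys `model_state_dict`, `optimizer_state_dict`, `val_acc` and `epoch`. Here is a minimal sketch of restoring such a file to resume training. The tiny `nn.Linear` model, the SGD optimizer, the placeholder metric value and the temporary path are all stand-ins for illustration, not the homework's network or results.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-ins (assumptions): a tiny model and optimizer in place of the real ones
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Save a checkpoint with the same keys as the training loop
ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save({'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'val_acc': 50.0,   # placeholder value, not a real result
            'epoch': 7}, ckpt_path)

# Later: rebuild the same architecture and optimizer, then load the saved state
restored_model = nn.Linear(4, 2)
restored_optimizer = torch.optim.SGD(restored_model.parameters(), lr=0.1, momentum=0.9)

checkpoint = torch.load(ckpt_path, map_location="cpu")
restored_model.load_state_dict(checkpoint['model_state_dict'])
restored_optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

start_epoch = checkpoint['epoch'] + 1   # continue from the epoch after the saved one
best_valacc = checkpoint['val_acc']     # so later epochs only overwrite on improvement
```

If a scheduler is in use, saving and restoring its `state_dict()` the same way keeps the learning-rate schedule consistent across restarts.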

# Classification Task: Testing

In [None]:
def test(model,dataloader):

  model.eval()
  batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, position=0, leave=False, desc='Test')
  test_results = []
  
  for i, (images) in enumerate(dataloader):
      # TODO: Finish predicting on the test set.
      images = images.to(device)

      with torch.inference_mode():
        outputs = model(images)

      outputs = torch.argmax(outputs, axis=1).detach().cpu().numpy().tolist()
      test_results.extend(outputs)
      
      batch_bar.update()
      
  batch_bar.close()
  return test_results

In [None]:
test_results = test(model, test_loader)



## Generate csv to submit to Kaggle

In [None]:
with open("classification_early_submission.csv", "w+") as f:
    f.write("id,label\n")
    for i in range(len(test_dataset)):
        f.write("{},{}\n".format(str(i).zfill(6) + ".jpg", test_results[i]))

# Verification Task: Validation

The verification task consists of the following generalized scenario:
- You are given X unknown identities
- You are given Y known identities
- Your goal is to match X unknown identities to Y known identities.

We have given you a verification dataset that consists of 1000 known identities and 1000 unknown identities. The 1000 unknown identities are split into dev (200) and test (800). Your goal is to compare the unknown identities to the 1000 known identities and assign an identity to each image from the set of unknown identities.

You will use/fine-tune your model trained for classification to compare images between known and unknown identities using a similarity metric and assign labels to the unknown identities.

This will judge your model's performance in terms of the quality of embeddings/features it generates on images/faces it has never seen during training for classification.
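
The matching step described above can be sketched on toy data: L2-normalise both sets of embeddings, take one matrix product to get every pairwise cosine similarity, then pick the most similar known identity for each unknown image. The embeddings and identity names below are invented for illustration. In the homework they would come from the model's feature layer and the `known` folder.

```python
import torch
import torch.nn.functional as F

# Hypothetical identity names and 3-dimensional embeddings (assumptions for illustration)
known_ids = ["id_a", "id_b", "id_c"]
known_feats = torch.tensor([[1.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0],
                            [0.0, 0.0, 1.0]])
unknown_feats = torch.tensor([[0.1, 0.9, 0.2],
                              [2.0, 0.1, 0.0]])

# After L2 normalisation, a dot product equals cosine similarity
known_norm = F.normalize(known_feats, dim=1)
unknown_norm = F.normalize(unknown_feats, dim=1)
similarity = unknown_norm @ known_norm.T   # shape (num_unknown, num_known)

# For each unknown image, take the known identity with the highest similarity
best = similarity.argmax(dim=1)
matches = [known_ids[i] for i in best.tolist()]
print(matches)  # ['id_b', 'id_a']
```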

In [None]:
known_regex = "/content/data/verification/known/*/*"
known_paths = [i.split('/')[-2] for i in sorted(glob.glob(known_regex))] 
# This obtains the list of known identities from the known folder

# unknown_regex = "/content/data/verification/unknown_dev/*" #Change the directory accordingly for the test set
unknown_regex = "/content/data/verification/unknown_test/*" #Change the directory accordingly for the test set


# We load the images from known and unknown folders
unknown_images = [Image.open(p) for p in tqdm(sorted(glob.glob(unknown_regex)))]
known_images = [Image.open(p) for p in tqdm(sorted(glob.glob(known_regex)))]

# Why do you need only ToTensor() here?
transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor()])

unknown_images = torch.stack([transforms(x) for x in unknown_images])
known_images  = torch.stack([transforms(y) for y in known_images ])
#Print your shapes here to understand what we have done

# You can use other similarity metrics like Euclidean Distance if you wish
similarity_metric = torch.nn.CosineSimilarity(dim= 1, eps= 1e-6) 
similarity_metric2 = torch.nn.PairwiseDistance(eps= 1e-6) 


100%|██████████| 800/800 [00:00<00:00, 2502.74it/s]

100%|██████████| 1000/1000 [00:00<00:00, 1560.73it/s]


In [None]:
# model.eval()
# known_batch = unknown_images[0:1]
# known_feat = model(known_batch.float().to(device), return_feats=False)
# print(known_feat.shape)
# iknown_feat = model(known_batch.float().to(device), return_feats=True)
# print(iknown_feat.shape)

torch.Size([1, 7000])
torch.Size([1, 1024])


In [None]:
def eval_verification(unknown_images, known_images, model, similarity, batch_size= config['batch_size'], mode='val'): 

    unknown_feats, known_feats = [], []

    batch_bar = tqdm(total=(len(unknown_images) + batch_size - 1)//batch_size, dynamic_ncols=True, position=0, leave=False, desc=mode) # Ceil division so a partial last batch is counted
    model.eval()

    # We load the images as batches for memory optimization and avoiding CUDA OOM errors
    for i in range(0, unknown_images.shape[0], batch_size):
        unknown_batch = unknown_images[i:i+batch_size] # Slice a given portion upto batch_size
        
        with torch.no_grad():
            unknown_feat = model(unknown_batch.float().to(device), return_feats=True) #Get features from model         
        unknown_feats.append(unknown_feat)
        batch_bar.update()
    
    batch_bar.close()
    
    batch_bar = tqdm(total=(len(known_images) + batch_size - 1)//batch_size, dynamic_ncols=True, position=0, leave=False, desc=mode) # Ceil division so a partial last batch is counted
    
    for i in range(0, known_images.shape[0], batch_size):
        known_batch = known_images[i:i+batch_size] 
        with torch.no_grad():
              known_feat = model(known_batch.float().to(device), return_feats=True)
          
        known_feats.append(known_feat)
        batch_bar.update()

    batch_bar.close()

    # Concatenate all the batches
    unknown_feats = torch.cat(unknown_feats, dim=0)
    known_feats = torch.cat(known_feats, dim=0)

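    # One row per known identity, one column per unknown image. The score is negated because this
    # function is called below with a distance metric (PairwiseDistance), where smaller means closer.
    # Drop the negation if you pass CosineSimilarity, where larger already means more similar.
    similarity_values = torch.stack([-similarity(unknown_feats, known_feature) for known_feature in known_feats])
    # Print the inner list comprehension in a separate cell - what is really happening?

    predictions = similarity_values.argmax(0).cpu().numpy() #Why are we doing an argmax here?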

    # Map argmax indices to identity strings
    pred_id_strings = [known_paths[i] for i in predictions]
    
    if mode == 'val':
      true_ids = pd.read_csv('/content/data/verification/dev_identities.csv')['label'].tolist()
      accuracy = accuracy_score(pred_id_strings, true_ids)
      print("Verification Accuracy = {}".format(accuracy))
    
    return pred_id_strings
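
When the metric is a Euclidean distance, as with `PairwiseDistance`, the per-identity loop above can likely be replaced by a single call to `torch.cdist`, with `argmin` taking the place of negating and then calling `argmax`. Here is a sketch on random stand-in features. The shapes are illustrative rather than the homework's 1024-dimensional embeddings, and the explicit `expand_as` is only there to make the broadcasting visible.

```python
import torch

# Stand-in features (assumption): 5 unknown and 7 known embeddings of dimension 8
torch.manual_seed(0)
unknown_feats = torch.randn(5, 8)
known_feats = torch.randn(7, 8)

# Loop form: one distance vector per known identity, stacked to (num_known, num_unknown)
pairwise = torch.nn.PairwiseDistance(eps=1e-6)
loop_dists = torch.stack([pairwise(unknown_feats, k.expand_as(unknown_feats))
                          for k in known_feats])
loop_pred = (-loop_dists).argmax(dim=0)   # negate so argmax selects the smallest distance

# Matrix form: all pairwise Euclidean distances at once, shape (num_unknown, num_known)
matrix_dists = torch.cdist(unknown_feats, known_feats, p=2.0,
                           compute_mode='donot_use_mm_for_euclid_dist')
matrix_pred = matrix_dists.argmin(dim=1)  # nearest known identity per unknown image

print(loop_pred.tolist(), matrix_pred.tolist())
```

Both forms should agree on the predicted indices. The distances themselves differ only by the small `eps` that `PairwiseDistance` adds to the difference before taking the norm.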

In [None]:
# unknown_images, known_images, model3, similarity_metric, 
similarity = similarity_metric
similarity2 = similarity_metric2
batch_size = config['batch_size']
mode = 'val'

unknown_feats, known_feats = [], []

batch_bar = tqdm(total=len(unknown_images)//batch_size, dynamic_ncols=True, position=0, leave=False, desc=mode)
model2.eval()

# We load the images as batches for memory optimization and avoiding CUDA OOM errors
for i in range(0, unknown_images.shape[0], batch_size):
    unknown_batch = unknown_images[i:i+batch_size] # Slice a given portion upto batch_size
    
    with torch.no_grad():
        unknown_feat = model2(unknown_batch.float().to(device), return_feats=False) # return_feats=False yields the 7000-way logits, not the feature embedding
    unknown_feats.append(unknown_feat)
    batch_bar.update()

batch_bar.close()

batch_bar = tqdm(total=len(known_images)//batch_size, dynamic_ncols=True, position=0, leave=False, desc=mode)

for i in range(0, known_images.shape[0], batch_size):
    known_batch = known_images[i:i+batch_size] 
    with torch.no_grad():
          known_feat = model2(known_batch.float().to(device), return_feats=False)
      
    known_feats.append(known_feat)
    batch_bar.update()

batch_bar.close()

# Concatenate all the batches
unknown_feats = torch.cat(unknown_feats, dim=0)
known_feats = torch.cat(known_feats, dim=0)

similarity_values = torch.stack([similarity(unknown_feats, known_feature) for known_feature in known_feats])
similarity_values2 = torch.stack([similarity2(unknown_feats, known_feature)/similarity(unknown_feats, known_feature) for known_feature in known_feats])
similarity_values3 = torch.stack([similarity2(unknown_feats, known_feature)*similarity(unknown_feats, known_feature) for known_feature in known_feats])
try:
    # pairwise_linear_similarity takes two 2-D tensors and returns the full (known, unknown) matrix
    similarity_values4 = torchmetrics.functional.pairwise_linear_similarity(known_feats, unknown_feats)
except Exception as e:
    print("Failed:", e)

# Print the inner list comprehension in a separate cell - what is really happening?

predictions = similarity_values.argmax(0).cpu().numpy() #Why are we doing an argmax here?

# Map argmax indices to identity strings
pred_id_strings = [known_paths[i] for i in predictions]

if mode == 'val':
  true_ids = pd.read_csv('/content/data/verification/dev_identities.csv')['label'].tolist()
  accuracy = accuracy_score(pred_id_strings, true_ids)
  print("Verification Accuracy = {}".format(accuracy))

val:  33%|███▎      | 2/6 [00:26<00:52, 13.24s/it]

KeyboardInterrupt: ignored

In [None]:
unknown_feats = unknown_feats1
known_feats = known_feats1

In [None]:
unknown_feats.shape

torch.Size([200, 1024])

In [None]:
known_feats.shape

torch.Size([1000, 1024])

In [None]:
# !pip install torchmetrics
import torchmetrics
similarity_values = torch.stack([-similarity2(unknown_feats, known_feature) for known_feature in known_feats])
similarity_values2 = torch.stack([similarity(unknown_feats, known_feature)/(similarity2(unknown_feats, known_feature)) for known_feature in known_feats])
# similarity_values0 = torch.stack([torchmetrics.functional.pairwise_linear_similarity(unknown_feats, np.array(known_feature)) for known_feature in known_feats])
predictions = similarity_values.argmax(0).cpu().numpy() #Why are we doing an argmax here?

# Map argmax indices to identity strings
pred_id_strings = [known_paths[i] for i in predictions]

if mode == 'val':
  true_ids = pd.read_csv('/content/data/verification/dev_identities.csv')['label'].tolist()
  accuracy = accuracy_score(pred_id_strings, true_ids)
  print("Verification Accuracy = {}".format(accuracy))

Verification Accuracy = 0.635


In [None]:
pred_id_strings = eval_verification(unknown_images, known_images, model3, similarity_metric2, config['batch_size'], 
                                    mode='test')



In [None]:
with open("verification_early_submission.csv", "w+") as f:
    f.write("id,label\n")
    for i in range(len(pred_id_strings)):
        f.write("{},{}\n".format(i, pred_id_strings[i]))

In [None]:
!kaggle competitions submit -c 11-785-f22-hw2p2-classification -f classification_early_submission.csv -m "Early Submission"

100% 541k/541k [00:00<00:00, 2.74MB/s]
Successfully submitted to Face Recognition

In [None]:
!kaggle competitions submit -c 11-785-f22-hw2p2-verification -f verification_early_submission.csv -m "Early Submission"

100% 9.28k/9.28k [00:00<00:00, 40.4kB/s]
Successfully submitted to Face Verification