# HW2P2: Face Classification and Verification


Congrats on coming to the second homework in 11785: Introduction to Deep Learning. This homework is significantly longer and tougher than the previous homework. It has two sub-parts, outlined below. Please start early!


*   Face Recognition: You will write your own CNN model to tackle a classification problem over 7000 identities
*   Face Verification: You will use the model trained for classification to evaluate the quality of its feature embeddings by comparing the similarity of known and unknown identities
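
To make the verification idea concrete, here is a minimal sketch of comparing one unknown embedding against a set of known ones. It assumes cosine similarity as the metric (the notebook does not fix a metric at this point), and the names `known`, `unknown`, and the 4-dimensional vectors are made up for illustration; real embeddings would come from the model's backbone.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings of known identities (illustrative values only).
known = {
    "person_a": [0.9, 0.1, 0.0, 0.2],
    "person_b": [0.0, 0.8, 0.6, 0.1],
}
# Hypothetical embedding of an unknown face.
unknown = [0.8, 0.2, 0.1, 0.3]

# Score the unknown face against every known identity and keep the closest one.
scores = {name: cosine_similarity(vec, unknown) for name, vec in known.items()}
best_match = max(scores, key=scores.get)
print(best_match, round(scores[best_match], 3))
```

Cosine similarity ignores vector length and compares direction only, which is one common reason it is tried first for embedding comparison; other distance measures are equally valid experiments.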

For this HW, you only have to write code to implement your model architecture. Everything else has been provided for you, on the premise that most of your time will go into developing a model architecture that achieves satisfactory performance.

Common errors you may face in this homework (because of the size of the model):


*   CUDA Out of Memory (OOM): You can tackle this problem by (1) reducing the batch size, (2) calling `torch.cuda.empty_cache()` and `gc.collect()`, and (3) as a last resort, restarting the runtime
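
A rough back-of-the-envelope calculation shows why reducing the batch size is the first thing to try. The sketch below estimates the memory held by the output of a single layer, using the batch size of 256 from the config and the `[64, 112, 112]` output shape that the model summary reports for the first convolution later in this notebook. The helper `activation_mib` is a hypothetical name, and the estimate assumes 4-byte FP32 values; actual GPU usage also includes every other layer, gradients, optimizer state, and allocator overhead.

```python
def activation_mib(batch_size, channels, height, width, bytes_per_value=4):
    """Memory in MiB for one activation tensor of the given shape."""
    return batch_size * channels * height * width * bytes_per_value / 2**20

# Output of the first convolution: 64 channels at 112 x 112 resolution.
for batch_size in (256, 128, 64):
    print(batch_size, activation_mib(batch_size, 64, 112, 112), "MiB")
```

Activation memory scales linearly with the batch size, so halving the batch roughly halves this part of the footprint.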



# Preliminaries

In [None]:
!nvidia-smi # to see what GPU you have

Wed Oct 26 04:12:52 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   73C    P0    31W /  70W |   8112MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
!pip install wandb --quiet

In [None]:
import torch
from torchsummary import summary
import torchvision #This library is used for image-based operations (Augmentations)
import os
import gc
from tqdm import tqdm
from PIL import Image
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
import glob
import wandb
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print("Device: ", device)

Device:  cuda


In [None]:
from google.colab import drive # Link your drive if you are a colab user
drive.mount('/content/drive') # Models in this HW take a long time to get trained and make sure to save it her

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# TODOs
As you go, please read the code and keep an eye out for TODOs!

# Download Data from Kaggle

In [None]:
# TODO: Use the same Kaggle code from HW1P2
!pip install --upgrade --force-reinstall --no-deps kaggle==1.5.8
!mkdir /root/.kaggle

with open("/root/.kaggle/kaggle.json", "w+") as f:
    f.write('{"username":"<your-kaggle-username>","key":"<your-kaggle-api-key>"}')
    # Put your kaggle username & key here; never leave real credentials in a shared notebook

!chmod 600 /root/.kaggle/kaggle.json

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting kaggle==1.5.8
  Using cached kaggle-1.5.8-py3-none-any.whl
Installing collected packages: kaggle
  Attempting uninstall: kaggle
    Found existing installation: kaggle 1.5.8
    Uninstalling kaggle-1.5.8:
      Successfully uninstalled kaggle-1.5.8
Successfully installed kaggle-1.5.8
mkdir: cannot create directory ‘/root/.kaggle’: File exists


In [None]:
!mkdir '/content/data'

!kaggle competitions download -c 11-785-f22-hw2p2-classification
!unzip -qo '11-785-f22-hw2p2-classification.zip' -d '/content/data'

!kaggle competitions download -c 11-785-f22-hw2p2-verification
!unzip -qo '11-785-f22-hw2p2-verification.zip' -d '/content/data'

mkdir: cannot create directory ‘/content/data’: File exists
11-785-f22-hw2p2-classification.zip: Skipping, found more recently modified local copy (use --force to force download)
11-785-f22-hw2p2-verification.zip: Skipping, found more recently modified local copy (use --force to force download)



# Configs

In [None]:
config = {
    'batch_size': 256, # Increase this if your GPU can handle it
    'lr': 0.1,
    'epochs': 100, # 10 epochs is recommended ONLY for the early submission - you will have to train for much longer typically.
    # Include other parameters as needed.
}

# Classification Dataset

In [None]:
DATA_DIR = '/content/data/11-785-f22-hw2p2-classification/'# TODO: Path where you have downloaded the data
TRAIN_DIR = os.path.join(DATA_DIR, "classification/train") 
VAL_DIR = os.path.join(DATA_DIR, "classification/dev")
TEST_DIR = os.path.join(DATA_DIR, "classification/test")

# Transforms using torchvision - Refer https://pytorch.org/vision/stable/transforms.html

train_transforms = torchvision.transforms.Compose([ 
    # Implementing the right transforms/augmentation methods is key to improving performance.
                    torchvision.transforms.ToTensor() 
                    ])
# Most torchvision transforms are done on PIL images. So you convert it into a tensor at the end with ToTensor()
# But there are some transforms which are performed after ToTensor() : e.g - Normalization
# Normalization Tip - Do not blindly use normalization that is not suitable for this dataset

val_transforms = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])

# train_transforms = torchvision.transforms.Compose([torchvision.transforms.RandomHorizontalFlip(p=0.5),
#                     torchvision.transforms.RandomRotation((-40, 40)),
#                     torchvision.transforms.ToTensor(),
#                     torchvision.transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])
# val_transforms = torchvision.transforms.Compose([torchvision.transforms.ToTensor(), torchvision.transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])


train_dataset = torchvision.datasets.ImageFolder(TRAIN_DIR, transform = train_transforms)
val_dataset = torchvision.datasets.ImageFolder(VAL_DIR, transform = val_transforms)
# You should NOT have data augmentation on the validation set. Why?


# Create data loaders
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = config['batch_size'], 
                                           shuffle = True,num_workers = 4, pin_memory = True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size = config['batch_size'], 
                                         shuffle = False, num_workers = 2)

In [None]:
# You can do this with ImageFolder as well, but it requires some tweaking
class ClassificationTestDataset(torch.utils.data.Dataset):

    def __init__(self, data_dir, transforms):
        self.data_dir   = data_dir
        self.transforms = transforms

        # This one-liner basically generates a sorted list of full paths to each image in the test directory
        self.img_paths  = list(map(lambda fname: os.path.join(self.data_dir, fname), sorted(os.listdir(self.data_dir))))

    def __len__(self):
        return len(self.img_paths)
    
    def __getitem__(self, idx):
        return self.transforms(Image.open(self.img_paths[idx]))
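
One detail worth keeping in mind with this dataset: `sorted` on file names is lexicographic, and predictions come back in exactly the order of `img_paths`. The sketch below uses made-up file names (the notebook does not say whether the real test files are zero-padded) to show how lexicographic and numeric order can differ.

```python
# Hypothetical file names; the real test directory may use a different scheme.
names = ["10.jpg", "2.jpg", "1.jpg", "0.jpg"]

# Plain sorted() compares strings character by character.
lexicographic = sorted(names)

# Sorting by the integer stem gives numeric order instead.
numeric = sorted(names, key=lambda name: int(name.split(".")[0]))

print(lexicographic)
print(numeric)
```

Either order is fine as long as the code that writes the submission file pairs each prediction with the file name at the same index of the same sorted list.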

In [None]:
test_dataset = ClassificationTestDataset(TEST_DIR, transforms = val_transforms) #Why are we using val_transforms for Test Data?
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = config['batch_size'], shuffle = False,
                         drop_last = False, num_workers = 2)

In [None]:
print("Number of classes: ", len(train_dataset.classes))
print("No. of train images: ", train_dataset.__len__())
print("Shape of image: ", train_dataset[0][0].shape)
print("Batch size: ", config['batch_size'])
print("Train batches: ", train_loader.__len__())
print("Val batches: ", val_loader.__len__())

Number of classes:  7000
No. of train images:  140000
Shape of image:  torch.Size([3, 224, 224])
Batch size:  256
Train batches:  547
Val batches:  137


# Very Simple Network (for Mandatory Early Submission)

In [None]:
class Network(torch.nn.Module):
    """
    The Very Low early deadline architecture is a 4-layer CNN.

    The first Conv layer has 64 channels, kernel size 7, and stride 4.
    The next three have 128, 256, and 512 channels. Each has kernel size 3 and stride 2.
    
    Think about strided convolutions from the lecture as convolution with stride = 1 followed by downsampling.
    For stride 1 convolution, what padding do you need to preserve the spatial resolution?
    (Hint: padding = kernel_size // 2. Why?)

    Each Conv layer is accompanied by a Batchnorm and ReLU layer.
    Finally, you want to average pool over the spatial dimensions to reduce them to 1 x 1. Use AdaptiveAvgPool2d.
    Then, remove these trivial 1x1 dimensions (Flatten?).
    Look through https://pytorch.org/docs/stable/nn.html 
    
    TODO: Fill out the model definition below! 

    Why does a very simple network have 4 convolutions?
    Input images are 224x224. Note that each of these convolutions downsamples.
    Downsampling 2x effectively doubles the receptive field, increasing the spatial
    region each pixel extracts features from. Downsampling 32x is standard
    for most image models.

    Why does a very simple network have high channel sizes?
    Every time you downsample 2x, you do 4x less computation (at the same channel size).
    To maintain the same level of computation, you double the number of channels, which
    increases computation by 4x, so the two effects balance out.
    Another intuition is - as you downsample, you lose spatial information. We want
    to preserve some of it in the channel dimension.
    """

    def __init__(self, num_classes=7000):
        super().__init__()


        self.backbone = torch.nn.Sequential(
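            # Note: this implementation goes beyond the 4-layer baseline described in the docstring:
            # it stacks seven Conv2d layers and uses GELU instead of ReLU.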
            torch.nn.Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)),
            torch.nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            torch.nn.GELU(approximate="none"),
            torch.nn.Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)),
            torch.nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            torch.nn.GELU(approximate="none"),
            torch.nn.Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)),
            torch.nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            torch.nn.GELU(approximate="none"),
            torch.nn.Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2)),
            torch.nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            torch.nn.GELU(approximate="none"),
            torch.nn.Conv2d(512, 1024, kernel_size=(3, 3), stride=(2, 2)),
            torch.nn.BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            torch.nn.GELU(approximate="none"),
            torch.nn.Conv2d(1024, 2048, kernel_size=(3, 3), stride=(2, 2)),
            torch.nn.BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            torch.nn.GELU(approximate="none"),
            torch.nn.Conv2d(2048, 4096, kernel_size=(2, 2), stride=(2, 2)),
            torch.nn.BatchNorm2d(4096, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            torch.nn.GELU(approximate="none"),
            torch.nn.AdaptiveAvgPool2d(output_size=(1, 1)),
            torch.nn.Flatten(start_dim=1, end_dim=-1)

            ) 
        
        self.cls_layer = torch.nn.Linear(4096, num_classes, bias=True)#TODO
    
    def forward(self, x, return_feats=False):
        """
        What is return_feats? It essentially returns the second-to-last-layer
        features of a given image. It's a "feature encoding" of the input image,
        and you can use it for the verification task. You would use the outputs
        of the final classification layer for the classification task.

        You might also find that the classification outputs are sometimes better
        for verification too - try both.
        """
        feats = self.backbone(x)
        out = self.cls_layer(feats)

        if return_feats:
            return feats
        else:
            return out
            
model = Network().to(device)
summary(model, (3, 224, 224))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           1,792
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              GELU-3         [-1, 64, 112, 112]               0
            Conv2d-4          [-1, 128, 56, 56]          73,856
       BatchNorm2d-5          [-1, 128, 56, 56]             256
              GELU-6          [-1, 128, 56, 56]               0
            Conv2d-7          [-1, 256, 28, 28]         295,168
       BatchNorm2d-8          [-1, 256, 28, 28]             512
              GELU-9          [-1, 256, 28, 28]               0
           Conv2d-10          [-1, 512, 13, 13]       1,180,160
      BatchNorm2d-11          [-1, 512, 13, 13]           1,024
             GELU-12          [-1, 512, 13, 13]               0
           Conv2d-13           [-1, 1024, 6, 6]       4,719,616
      BatchNorm2d-14           [-1, 102
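
The spatial sizes in the summary above can be reproduced by hand with the standard convolution output-size formula, `floor((size + 2 * padding - kernel) / stride) + 1` (dilation 1 assumed). The sketch below applies it to the kernel, stride, and padding values of the backbone defined above, and to the 4-layer baseline described in the docstring; `conv_out` and `trace` are hypothetical helper names.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output spatial size of a convolution along one dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def trace(size, layers):
    """Apply a list of (kernel, stride, padding) layers and record each size."""
    sizes = []
    for kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes

# (kernel, stride, padding) for the seven Conv2d layers in the backbone above.
backbone = [(3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 2, 0), (3, 2, 0), (3, 2, 0), (2, 2, 0)]
# The 4-layer baseline from the docstring, with padding = kernel_size // 2.
baseline = [(7, 4, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1)]

print(trace(224, backbone))
print(trace(224, baseline))

# With stride 1 and padding = kernel_size // 2, an odd kernel preserves the size.
print(conv_out(224, 3, 1, 1), conv_out(224, 7, 1, 3))
```

The first five backbone sizes match the summary rows (112, 56, 28, 13, 6). The baseline ends at 7x7, which is the 32x total downsampling the docstring calls standard.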

# Setup everything for training

In [None]:
criterion = torch.nn.CrossEntropyLoss()# TODO: What loss do you need for a multi class classification problem?
optimizer = torch.optim.SGD(model.parameters(), lr=config['lr'], momentum=0.9, weight_decay=1e-4)
# TODO: Implement a scheduler (Optional but Highly Recommended)
# You can try ReduceLROnPlateau, StepLR, MultiStepLR, CosineAnnealingLR, etc.
scaler = torch.cuda.amp.GradScaler() # Good news. We have FP16 (Mixed precision training) implemented for you
# It is useful only in the case of compatible GPUs such as T4/V100
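
To get a feel for what a scheduler does before wiring one in, the sketch below evaluates the closed-form cosine annealing curve in plain Python, with the base learning rate of 0.1 and the 100 epochs from the config. This is the schedule that `torch.optim.lr_scheduler.CosineAnnealingLR` documents for a single cycle without restarts; the function name `cosine_annealing_lr` and the `min_lr` default of 0 are assumptions made for this illustration.

```python
import math

def cosine_annealing_lr(epoch, total_epochs, base_lr, min_lr=0.0):
    """Learning rate at a given epoch on a single cosine decay cycle."""
    cosine = (1 + math.cos(math.pi * epoch / total_epochs)) / 2
    return min_lr + (base_lr - min_lr) * cosine

base_lr, total_epochs = 0.1, 100
for epoch in (0, 25, 50, 75, 100):
    print(epoch, round(cosine_annealing_lr(epoch, total_epochs, base_lr), 4))
```

The rate starts at the base value, passes through half of it at the midpoint, and decays smoothly toward `min_lr`. Whether this suits the task better than a step or plateau-based schedule is something to test experimentally.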

# Let's train!

In [None]:
def train(model, dataloader, optimizer, criterion):
    
    model.train()

    # Progress Bar 
    batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, leave=False, position=0, desc='Train', ncols=5) 
    
    num_correct = 0
    total_loss = 0

    for i, (images, labels) in enumerate(dataloader):
        
        optimizer.zero_grad() # Zero gradients

        images, labels = images.to(device), labels.to(device)
        
        with torch.cuda.amp.autocast(): # This implements mixed precision. That's it! 
            outputs = model(images)
            loss = criterion(outputs, labels)

        # Update no. of correct predictions & loss as we iterate
        num_correct += int((torch.argmax(outputs, axis=1) == labels).sum())
        total_loss += float(loss.item())

        # tqdm lets you add some details so you can monitor training as you train.
        batch_bar.set_postfix(
            acc="{:.04f}%".format(100 * num_correct / (config['batch_size']*(i + 1))),
            loss="{:.04f}".format(float(total_loss / (i + 1))),
            num_correct=num_correct,
            lr="{:.04f}".format(float(optimizer.param_groups[0]['lr'])))
        
        scaler.scale(loss).backward() # This is a replacement for loss.backward()
        scaler.step(optimizer) # This is a replacement for optimizer.step()
        scaler.update() 

        # TODO? Depending on your choice of scheduler,
        # you may want to call some schedulers inside the train function. Which ones are these?
      
        batch_bar.update() # Update tqdm bar

    batch_bar.close() # You need this to close the tqdm bar

    acc = 100 * num_correct / len(dataloader.dataset) # Divide by the true sample count; the last batch may be smaller than batch_size
    total_loss = float(total_loss / len(dataloader))

    return acc, total_loss
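
Why does the training step scale the loss before calling `backward()`? The sketch below illustrates the underlying idea without a GPU: a very small gradient value underflows to zero in half precision, but survives if it is multiplied by a large factor first and divided back afterwards. It uses the standard library's `struct` half-precision format to emulate FP16 rounding. The scale of `2 ** 16` is `GradScaler`'s default initial scale; the gradient value `1e-8` is made up for illustration, and the real `GradScaler` additionally adjusts the scale dynamically and skips steps that produce infs or NaNs.

```python
import struct

def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]

tiny_grad = 1e-8      # hypothetical gradient magnitude
scale = 2.0 ** 16     # GradScaler's default initial scale

# Stored directly in FP16, the value is below the smallest representable number.
unscaled = to_fp16(tiny_grad)

# Scaled first, stored in FP16, then unscaled in higher precision.
scaled = to_fp16(tiny_grad * scale) / scale

print(unscaled, scaled)
```

The first value is lost entirely, while the second is recovered to within FP16 rounding error, which is the effect `scaler.scale(loss).backward()` relies on.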

In [None]:
def validate(model, dataloader, criterion):
  
    model.eval()
    batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, position=0, leave=False, desc='Val', ncols=5)

    num_correct = 0.0
    total_loss = 0.0

    for i, (images, labels) in enumerate(dataloader):
        
        # Move images to device
        images, labels = images.to(device), labels.to(device)
        
        # Get model outputs
        with torch.inference_mode():
            outputs = model(images)
            loss = criterion(outputs, labels)

        num_correct += int((torch.argmax(outputs, axis=1) == labels).sum())
        total_loss += float(loss.item())

        batch_bar.set_postfix(
            acc="{:.04f}%".format(100 * num_correct / (config['batch_size']*(i + 1))),
            loss="{:.04f}".format(float(total_loss / (i + 1))),
            num_correct=num_correct)

        batch_bar.update()
        
    batch_bar.close()
    acc = 100 * num_correct / len(dataloader.dataset) # Divide by the true sample count; the last batch may be smaller than batch_size
    total_loss = float(total_loss / len(dataloader))
    return acc, total_loss
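
A note on accuracy denominators: multiplying the batch size by the number of batches assumes every batch is full. With 140000 training images and a batch size of 256 the last batch is partial, so that product slightly overstates the sample count. The arithmetic below, a small sketch with the hypothetical helper `accuracy_percent`, shows the effect; the resulting ceiling is consistent with the `train_Acc` of 99.97715 that appears in the run summary.

```python
import math

def accuracy_percent(num_correct, denominator):
    """Accuracy as a percentage of the given denominator."""
    return 100 * num_correct / denominator

num_images, batch_size = 140000, 256
num_batches = math.ceil(num_images / batch_size)   # the last batch is partial
padded_count = batch_size * num_batches            # what batch_size * len(loader) yields

print(num_batches, padded_count)
# Even with every image classified correctly, the padded denominator stays below 100%.
print(accuracy_percent(num_images, padded_count))
print(accuracy_percent(num_images, num_images))
```

Counting samples directly, for example by summing the size of each batch's label tensor or using the dataset length, avoids the discrepancy.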

In [None]:
gc.collect() # These commands help you when you face CUDA OOM error
torch.cuda.empty_cache()



# Wandb

In [None]:
wandb.login(key="<your-wandb-api-key>") # API key is in your wandb account, under settings (wandb.ai/settings)



True

In [None]:
# Create your wandb run
run = wandb.init(
    name = "early-submission", ## Wandb creates random run names if you skip this field
    reinit = True, ### Allows reinitializing runs when you re-run this cell
    # id = ### Insert specific run id here if you want to resume a previous run
    # resume = "must" ### You need this to resume previous runs, but comment out reinit = True when using this
    project = "hw2p2-ablations", ### Project should be created in your wandb account 
    config = config ### Wandb Config for your run
)

learning_Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train_Acc,▁▂▅██████████████████████████▇██████████
train_loss,█▅▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
validation_Acc,▁▃▅▇▇███▇█▇▆▇██████▇█████████▇███████▇██
validation_loss,█▄▃▂▁▁▁▁▂▁▂▂▂▁▁▁▁▁▁▂▁▁▁▁▁▁▂▂▂▂▁▁▁▁▁▁▁▂▁▂

learning_Rate,0.1
train_Acc,99.97501
train_loss,0.01337
validation_Acc,71.71533
validation_loss,2.26241


In [None]:
# ### Save your model architecture as a string with str(model) 
# model_arch = str(model)

# ### Save it in a txt file 
# arch_file = open("model_arch.txt", "w")
# file_write = arch_file.write(model_arch)
# arch_file.close()

# ### log it in your wandb run with wandb.save()
# wandb.save('model_arch.txt')

# Experiments

In [None]:
best_valacc = 0.0

for epoch in range(config['epochs']):

    curr_lr = float(optimizer.param_groups[0]['lr'])

    train_acc, train_loss = train(model, train_loader, optimizer, criterion)
    
    print("\nEpoch {}/{}: \nTrain Acc {:.04f}%\t Train Loss {:.04f}\t Learning Rate {:.04f}".format(
        epoch + 1,
        config['epochs'],
        train_acc,
        train_loss,
        curr_lr))
    
    val_acc, val_loss = validate(model, val_loader, criterion)
    
    print("Val Acc {:.04f}%\t Val Loss {:.04f}".format(val_acc, val_loss))

    wandb.log({"train_loss":train_loss, 'train_Acc': train_acc, 'validation_Acc':val_acc, 
               'validation_loss': val_loss, "learning_Rate": curr_lr})
    
    # If you are using a scheduler in your train function within your iteration loop, you may want to log
    # your learning rate differently 

    #Save model in drive location if val_acc is better than best recorded val_acc
    if val_acc >= best_valacc:
      #path = os.path.join(root, model_directory, 'checkpoint' + '.pth')
      print("Saving model")
      torch.save({'model_state_dict':model.state_dict(),
                  'optimizer_state_dict':optimizer.state_dict(),
                  #'scheduler_state_dict':scheduler.state_dict(),
                  'val_acc': val_acc, 
                  'epoch': epoch}, './checkpoint.pth')
      best_valacc = val_acc
      wandb.save('checkpoint.pth')
      # You may find it interesting to explore Wandb Artifacts to version your models
run.finish()




Epoch 1/100: 
Train Acc 1.5618%	 Train Loss 7.8337	 Learning Rate 0.1000




Val Acc 4.1543%	 Val Loss 6.7204
Saving model





Epoch 2/100: 
Train Acc 25.5763%	 Train Loss 4.3772	 Learning Rate 0.1000




Val Acc 21.7524%	 Val Loss 4.7002
Saving model





Epoch 3/100: 
Train Acc 65.7243%	 Train Loss 1.8744	 Learning Rate 0.1000




Val Acc 51.0920%	 Val Loss 2.7711
Saving model





Epoch 4/100: 
Train Acc 88.8147%	 Train Loss 0.6696	 Learning Rate 0.1000




Val Acc 55.2578%	 Val Loss 2.6302
Saving model





Epoch 5/100: 
Train Acc 98.2754%	 Train Loss 0.1514	 Learning Rate 0.1000




Val Acc 61.2255%	 Val Loss 2.3849
Saving model





Epoch 6/100: 
Train Acc 99.8150%	 Train Loss 0.0336	 Learning Rate 0.1000




Val Acc 67.1077%	 Val Loss 2.0673
Saving model





Epoch 7/100: 
Train Acc 99.9536%	 Train Loss 0.0225	 Learning Rate 0.1000




Val Acc 68.1911%	 Val Loss 2.0242
Saving model





Epoch 8/100: 
Train Acc 99.9672%	 Train Loss 0.0249	 Learning Rate 0.1000




Val Acc 69.1207%	 Val Loss 1.9906
Saving model





Epoch 9/100: 
Train Acc 99.9729%	 Train Loss 0.0274	 Learning Rate 0.1000




Val Acc 69.5997%	 Val Loss 1.9819
Saving model





Epoch 10/100: 
Train Acc 99.9743%	 Train Loss 0.0285	 Learning Rate 0.1000




Val Acc 70.3638%	 Val Loss 1.9636
Saving model





Epoch 11/100: 
Train Acc 99.9729%	 Train Loss 0.0291	 Learning Rate 0.1000




Val Acc 70.8714%	 Val Loss 1.9620
Saving model





Epoch 12/100: 
Train Acc 99.9707%	 Train Loss 0.0288	 Learning Rate 0.1000




Val Acc 71.0424%	 Val Loss 1.9559
Saving model





Epoch 13/100: 
Train Acc 99.9500%	 Train Loss 0.0326	 Learning Rate 0.1000




Val Acc 69.7651%	 Val Loss 2.0673





Epoch 14/100: 
Train Acc 99.9093%	 Train Loss 0.0479	 Learning Rate 0.1000




Val Acc 67.6751%	 Val Loss 2.1673





Epoch 15/100: 
Train Acc 99.9307%	 Train Loss 0.0395	 Learning Rate 0.1000




Val Acc 70.4237%	 Val Loss 2.0214





Epoch 16/100: 
Train Acc 99.9714%	 Train Loss 0.0178	 Learning Rate 0.1000




Val Acc 72.9100%	 Val Loss 1.8838
Saving model





Epoch 17/100: 
Train Acc 99.9729%	 Train Loss 0.0176	 Learning Rate 0.1000




Val Acc 73.1809%	 Val Loss 1.9316
Saving model





Epoch 18/100: 
Train Acc 99.9507%	 Train Loss 0.0392	 Learning Rate 0.1000




Val Acc 62.3204%	 Val Loss 2.4199





Epoch 19/100: 
Train Acc 89.5003%	 Train Loss 0.6716	 Learning Rate 0.1000




Val Acc 56.5579%	 Val Loss 2.6507





Epoch 20/100: 
Train Acc 99.7722%	 Train Loss 0.0382	 Learning Rate 0.1000




Val Acc 71.2449%	 Val Loss 1.8618





Epoch 21/100: 
Train Acc 99.9721%	 Train Loss 0.0092	 Learning Rate 0.1000




Val Acc 73.2351%	 Val Loss 1.7601
Saving model





Epoch 22/100: 
Train Acc 99.9721%	 Train Loss 0.0115	 Learning Rate 0.1000




Val Acc 74.0733%	 Val Loss 1.7852
Saving model





Epoch 23/100: 
Train Acc 99.9736%	 Train Loss 0.0150	 Learning Rate 0.1000




Val Acc 74.6693%	 Val Loss 1.7878
Saving model





Epoch 24/100: 
Train Acc 99.9750%	 Train Loss 0.0177	 Learning Rate 0.1000




Val Acc 75.0855%	 Val Loss 1.8132
Saving model





Epoch 25/100: 
Train Acc 99.9764%	 Train Loss 0.0191	 Learning Rate 0.1000




Val Acc 75.5275%	 Val Loss 1.8288
Saving model





Epoch 26/100: 
Train Acc 99.9757%	 Train Loss 0.0200	 Learning Rate 0.1000




Val Acc 75.7328%	 Val Loss 1.8583
Saving model





Epoch 27/100: 
Train Acc 99.9771%	 Train Loss 0.0202	 Learning Rate 0.1000




Val Acc 75.9922%	 Val Loss 1.8768
Saving model





Epoch 28/100: 
Train Acc 99.9721%	 Train Loss 0.0215	 Learning Rate 0.1000




Val Acc 74.2986%	 Val Loss 1.9916





Epoch 29/100: 
Train Acc 99.5037%	 Train Loss 0.1212	 Learning Rate 0.1000




Val Acc 57.0255%	 Val Loss 2.6643





Epoch 30/100: 
Train Acc 98.3582%	 Train Loss 0.2174	 Learning Rate 0.1000




Val Acc 65.1574%	 Val Loss 2.2424





Epoch 31/100: 
Train Acc 99.9507%	 Train Loss 0.0156	 Learning Rate 0.1000




Val Acc 74.3185%	 Val Loss 1.7160





Epoch 32/100: 
Train Acc 99.9743%	 Train Loss 0.0079	 Learning Rate 0.1000




Val Acc 75.7356%	 Val Loss 1.7078





Epoch 33/100: 
Train Acc 99.9743%	 Train Loss 0.0110	 Learning Rate 0.1000




Val Acc 76.2032%	 Val Loss 1.7589
Saving model





Epoch 34/100: 
Train Acc 99.9743%	 Train Loss 0.0143	 Learning Rate 0.1000




Val Acc 76.7108%	 Val Loss 1.7825
Saving model





Epoch 35/100: 
Train Acc 99.9757%	 Train Loss 0.0162	 Learning Rate 0.1000




Val Acc 76.9075%	 Val Loss 1.8377
Saving model





Epoch 36/100: 
Train Acc 99.9764%	 Train Loss 0.0173	 Learning Rate 0.1000




Val Acc 77.1670%	 Val Loss 1.8704
Saving model





Epoch 37/100: 
Train Acc 99.9771%	 Train Loss 0.0181	 Learning Rate 0.1000




Val Acc 77.1014%	 Val Loss 1.9151





Epoch 38/100: 
Train Acc 99.9771%	 Train Loss 0.0187	 Learning Rate 0.1000




Val Acc 77.0102%	 Val Loss 1.9555





Epoch 39/100: 
Train Acc 99.9300%	 Train Loss 0.0444	 Learning Rate 0.1000




Val Acc 52.5719%	 Val Loss 3.1403





Epoch 40/100: 
Train Acc 91.5219%	 Train Loss 0.6065	 Learning Rate 0.1000




Val Acc 62.2092%	 Val Loss 2.3262





Epoch 41/100: 
Train Acc 99.8779%	 Train Loss 0.0221	 Learning Rate 0.1000




Val Acc 74.1731%	 Val Loss 1.7261





Epoch 42/100: 
Train Acc 99.9714%	 Train Loss 0.0071	 Learning Rate 0.1000




Val Acc 75.9751%	 Val Loss 1.6551





Epoch 43/100: 
Train Acc 99.9743%	 Train Loss 0.0095	 Learning Rate 0.1000




Val Acc 76.5226%	 Val Loss 1.6892





Epoch 44/100: 
Train Acc 99.9750%	 Train Loss 0.0127	 Learning Rate 0.1000




Val Acc 77.0757%	 Val Loss 1.7315





Epoch 45/100: 
Train Acc 99.9757%	 Train Loss 0.0151	 Learning Rate 0.1000




Val Acc 77.4435%	 Val Loss 1.7584
Saving model





Epoch 46/100: 
Train Acc 99.9757%	 Train Loss 0.0165	 Learning Rate 0.1000




Val Acc 77.5747%	 Val Loss 1.8080
Saving model





Epoch 47/100: 
Train Acc 99.9771%	 Train Loss 0.0170	 Learning Rate 0.1000




Val Acc 77.9197%	 Val Loss 1.8318
Saving model





Epoch 48/100: 
Train Acc 99.9771%	 Train Loss 0.0175	 Learning Rate 0.1000




Val Acc 77.9340%	 Val Loss 1.8825
Saving model





Epoch 49/100: 
Train Acc 99.9771%	 Train Loss 0.0178	 Learning Rate 0.1000




Val Acc 77.7971%	 Val Loss 1.9075





Epoch 50/100: 
Train Acc 99.9436%	 Train Loss 0.0269	 Learning Rate 0.1000




Val Acc 73.0041%	 Val Loss 2.1095





Epoch 51/100: 
Train Acc 99.8936%	 Train Loss 0.0405	 Learning Rate 0.1000




Val Acc 70.6689%	 Val Loss 2.2728





Epoch 52/100: 
Train Acc 99.9593%	 Train Loss 0.0200	 Learning Rate 0.1000




Val Acc 76.2945%	 Val Loss 1.9489





Epoch 53/100: 
Train Acc 99.9657%	 Train Loss 0.0136	 Learning Rate 0.1000




Val Acc 77.1641%	 Val Loss 1.9605





Epoch 54/100: 
Train Acc 99.9750%	 Train Loss 0.0148	 Learning Rate 0.1000




Val Acc 77.7857%	 Val Loss 2.0215





Epoch 55/100: 
Train Acc 99.9771%	 Train Loss 0.0159	 Learning Rate 0.1000




Val Acc 77.2953%	 Val Loss 2.0575





Epoch 56/100: 
Train Acc 99.9771%	 Train Loss 0.0187	 Learning Rate 0.1000




Val Acc 76.5768%	 Val Loss 2.1388





Epoch 57/100: 
Train Acc 99.7565%	 Train Loss 0.0535	 Learning Rate 0.1000




Val Acc 4.0289%	 Val Loss 8.2025





Epoch 58/100: 
Train Acc 79.8496%	 Train Loss 1.1416	 Learning Rate 0.1000




Val Acc 64.0596%	 Val Loss 2.1661





Epoch 59/100: 
Train Acc 99.5758%	 Train Loss 0.0460	 Learning Rate 0.1000




Val Acc 73.5687%	 Val Loss 1.7463





Epoch 60/100: 
Train Acc 99.9743%	 Train Loss 0.0075	 Learning Rate 0.1000




Val Acc 76.5654%	 Val Loss 1.6007





Epoch 61/100: 
Train Acc 99.9743%	 Train Loss 0.0089	 Learning Rate 0.1000




Val Acc 77.1812%	 Val Loss 1.6192





Epoch 62/100: 
Train Acc 99.9757%	 Train Loss 0.0122	 Learning Rate 0.1000




Val Acc 77.7572%	 Val Loss 1.6509





Epoch 63/100: 
Train Acc 99.9771%	 Train Loss 0.0146	 Learning Rate 0.1000




Val Acc 78.2476%	 Val Loss 1.6891
Saving model





Epoch 64/100: 
Train Acc 99.9771%	 Train Loss 0.0160	 Learning Rate 0.1000




Val Acc 78.5356%	 Val Loss 1.7200
Saving model





Epoch 65/100: 
Train Acc 99.9771%	 Train Loss 0.0166	 Learning Rate 0.1000




Val Acc 78.7551%	 Val Loss 1.7530
Saving model





Epoch 66/100: 
Train Acc 99.9771%	 Train Loss 0.0168	 Learning Rate 0.1000




Val Acc 78.7466%	 Val Loss 1.7923





Epoch 67/100: 
Train Acc 99.9771%	 Train Loss 0.0170	 Learning Rate 0.1000




Val Acc 78.8578%	 Val Loss 1.8209
Saving model





Epoch 68/100: 
Train Acc 99.9450%	 Train Loss 0.0243	 Learning Rate 0.1000




Val Acc 76.8533%	 Val Loss 1.9615





Epoch 69/100: 
Train Acc 99.9700%	 Train Loss 0.0183	 Learning Rate 0.1000




Val Acc 75.7984%	 Val Loss 2.0646





Epoch 70/100: 
Train Acc 99.9714%	 Train Loss 0.0164	 Learning Rate 0.1000




Val Acc 77.9938%	 Val Loss 1.9656





Epoch 71/100: 
Train Acc 99.9771%	 Train Loss 0.0159	 Learning Rate 0.1000




Val Acc 78.2305%	 Val Loss 1.9749





Epoch 72/100: 
Train Acc 99.9771%	 Train Loss 0.0170	 Learning Rate 0.1000




Val Acc 77.6574%	 Val Loss 2.0270





Epoch 73/100: 
Train Acc 99.9771%	 Train Loss 0.0188	 Learning Rate 0.1000




Val Acc 76.8362%	 Val Loss 2.1070





Epoch 74/100: 
Train Acc 99.9686%	 Train Loss 0.0249	 Learning Rate 0.1000




Val Acc 59.8540%	 Val Loss 2.9251





Epoch 75/100: 
Train Acc 84.0287%	 Train Loss 0.9623	 Learning Rate 0.1000




Val Acc 63.0417%	 Val Loss 2.2537





Epoch 76/100: 
Train Acc 99.6401%	 Train Loss 0.0412	 Learning Rate 0.1000




Val Acc 74.3128%	 Val Loss 1.7089





Epoch 77/100: 
Train Acc 99.9693%	 Train Loss 0.0073	 Learning Rate 0.1000




Val Acc 77.2468%	 Val Loss 1.5723





Epoch 78/100: 
Train Acc 99.9757%	 Train Loss 0.0084	 Learning Rate 0.1000




Val Acc 77.7943%	 Val Loss 1.5990





Epoch 79/100: 
Train Acc 99.9771%	 Train Loss 0.0116	 Learning Rate 0.1000




Val Acc 78.3987%	 Val Loss 1.6292





Epoch 80/100: 
Train Acc 99.9771%	 Train Loss 0.0141	 Learning Rate 0.1000




Val Acc 78.7694%	 Val Loss 1.6731





Epoch 81/100: 
Train Acc 99.9771%	 Train Loss 0.0154	 Learning Rate 0.1000




Val Acc 78.9120%	 Val Loss 1.7106
Saving model





Epoch 82/100: 
Train Acc 99.9771%	 Train Loss 0.0160	 Learning Rate 0.1000




Val Acc 79.0460%	 Val Loss 1.7563
Saving model





Epoch 83/100: 
Train Acc 99.9771%	 Train Loss 0.0163	 Learning Rate 0.1000




Val Acc 79.0916%	 Val Loss 1.8109
Saving model





Epoch 84/100: 
Train Acc 99.9771%	 Train Loss 0.0164	 Learning Rate 0.1000




Val Acc 79.1344%	 Val Loss 1.8347
Saving model





Epoch 85/100: 
Train Acc 99.9564%	 Train Loss 0.0204	 Learning Rate 0.1000




Val Acc 75.2110%	 Val Loss 2.0971





Epoch 86/100: 
Train Acc 99.9436%	 Train Loss 0.0246	 Learning Rate 0.1000




Val Acc 76.3886%	 Val Loss 2.0899





Epoch 87/100: 
Train Acc 99.9721%	 Train Loss 0.0157	 Learning Rate 0.1000




Val Acc 78.4643%	 Val Loss 1.9186





Epoch 88/100: 
Train Acc 99.9743%	 Train Loss 0.0150	 Learning Rate 0.1000




Val Acc 78.6810%	 Val Loss 1.9840





Epoch 89/100: 
Train Acc 99.9771%	 Train Loss 0.0157	 Learning Rate 0.1000




Val Acc 78.3103%	 Val Loss 2.0356





Epoch 90/100: 
Train Acc 99.9771%	 Train Loss 0.0173	 Learning Rate 0.1000




Val Acc 77.9482%	 Val Loss 2.0450





Epoch 91/100: 
Train Acc 99.9764%	 Train Loss 0.0205	 Learning Rate 0.1000




Val Acc 75.0941%	 Val Loss 2.1454





Epoch 92/100: 
Train Acc 86.9358%	 Train Loss 0.8097	 Learning Rate 0.1000




Val Acc 63.7061%	 Val Loss 2.1942





Epoch 93/100: 
Train Acc 97.8198%	 Train Loss 0.1523	 Learning Rate 0.1000




Val Acc 68.6616%	 Val Loss 1.9719





Epoch 94/100: 
Train Acc 99.9536%	 Train Loss 0.0108	 Learning Rate 0.1000




Val Acc 77.1185%	 Val Loss 1.5646





Epoch 95/100: 
Train Acc 99.9743%	 Train Loss 0.0069	 Learning Rate 0.1000




Val Acc 78.1478%	 Val Loss 1.5617





Epoch 96/100: 
Train Acc 99.9764%	 Train Loss 0.0098	 Learning Rate 0.1000




Val Acc 78.7266%	 Val Loss 1.5961





Epoch 97/100: 
Train Acc 99.9771%	 Train Loss 0.0126	 Learning Rate 0.1000




Val Acc 79.1486%	 Val Loss 1.6360
Saving model





Epoch 98/100: 
Train Acc 99.9771%	 Train Loss 0.0144	 Learning Rate 0.1000




Val Acc 79.4109%	 Val Loss 1.6818
Saving model





Epoch 99/100: 
Train Acc 99.9771%	 Train Loss 0.0153	 Learning Rate 0.1000




Val Acc 79.5706%	 Val Loss 1.7226
Saving model





Epoch 100/100: 
Train Acc 99.9771%	 Train Loss 0.0158	 Learning Rate 0.1000




Val Acc 79.6105%	 Val Loss 1.7701
Saving model



Run history:

| Metric | History |
|---|---|
| learning_Rate | ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |
| train_Acc | ▁▆██████████████████████████████████▇███ |
| train_loss | █▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁ |
| validation_Acc | ▁▅▇▇▇▇▇▆▇██████▅▇███▇██▇█████▆██████▇███ |
| validation_loss | █▃▂▂▂▂▁▂▁▁▁▂▁▁▁▃▁▁▁▁▂▂▂▁▁▁▁▂▂▃▁▁▁▁▁▂▂▁▁▁ |

Run summary:

| Metric | Final value |
|---|---|
| learning_Rate | 0.1 |
| train_Acc | 99.97715 |
| train_loss | 0.01577 |
| validation_Acc | 79.61052 |
| validation_loss | 1.77015 |


# Classification Task: Testing

In [None]:
def test(model,dataloader):

  model.eval()
  batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, position=0, leave=False, desc='Test')
  test_results = []
  
  for i, images in enumerate(dataloader):
      # The test set has no labels, so each batch contains only images
      images = images.to(device)

      with torch.inference_mode():
        outputs = model(images)

      # Predicted class = index of the largest logit for each image
      outputs = torch.argmax(outputs, dim=1).cpu().numpy().tolist()
      test_results.extend(outputs)
      
      batch_bar.update()
      
  batch_bar.close()
  return test_results

In [None]:
test_results = test(model, test_loader)



## Generate csv to submit to Kaggle

In [None]:
with open("classification_early_submission.csv", "w") as f:
    f.write("id,label\n")
    # Row i pairs the zero-padded file name (e.g. 000000.jpg) with its predicted label
    for i, label in enumerate(test_results):
        f.write("{},{}\n".format(str(i).zfill(6) + ".jpg", label))

In [None]:
### Submit to kaggle competition using kaggle API
!kaggle competitions submit -c 11-785-f22-hw2p2-classification -f ./classification_early_submission.csv -m "Test Submission"

100% 541k/541k [00:04<00:00, 133kB/s]
Successfully submitted to Face Recognition

# Verification Task: Validation

The verification task consists of the following generalized scenario:
- You are given X unknown identities
- You are given Y known identities
- Your goal is to match the X unknown identities to the Y known identities.

We have given you a verification dataset that consists of 1000 known identities and 1000 unknown identities. The 1000 unknown identities are split into dev (200) and test (800). Your goal is to compare each unknown image against the 1000 known identities and assign it one of those identities.

You will use (or fine-tune) the model you trained for classification to compare known and unknown images using a similarity metric, and assign a label to each unknown identity.

This measures the quality of the embeddings/features your model generates for faces it never saw while training for classification.
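The matching step described above can be sketched on toy data. This is a minimal illustration, not the homework solution: the 4-dimensional embeddings and their values are made up, and in the notebook the features would come from your trained model instead.

```python
import torch

# Toy embeddings (made-up numbers): 3 known identities, 2 unknown images, 4-dim features
known_feats = torch.tensor([[1.0, 0.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0, 0.0],
                            [0.0, 0.0, 1.0, 1.0]])
unknown_feats = torch.tensor([[0.1, 0.9, 0.0, 0.0],   # closest to known identity 1
                              [0.0, 0.1, 0.8, 0.7]])  # closest to known identity 2

similarity = torch.nn.CosineSimilarity(dim=1, eps=1e-6)

# One row per known identity: its similarity to every unknown image.
# Each (4,) known feature broadcasts against the (2, 4) unknown matrix.
similarity_values = torch.stack([similarity(unknown_feats, k) for k in known_feats])
print(similarity_values.shape)  # torch.Size([3, 2]) -> (num_known, num_unknown)

# argmax over dim 0 picks, for each unknown image, the most similar known identity
predictions = similarity_values.argmax(0).tolist()
print(predictions)  # [1, 2]
```

Each column of `similarity_values` holds one unknown image's scores against every known identity, which is why the argmax runs over dimension 0.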

In [None]:
known_regex = "/content/data/verification/known/*/*"
# Identity label (the parent folder name) of every known image, in sorted path order
known_paths = [i.split('/')[-2] for i in sorted(glob.glob(known_regex))] 

unknown_regex = "/content/data/verification/unknown_test/*" # Point this at the dev or test folder as needed

# We load the images from known and unknown folders
unknown_images = [Image.open(p) for p in tqdm(sorted(glob.glob(unknown_regex)))]
known_images = [Image.open(p) for p in tqdm(sorted(glob.glob(known_regex)))]

# Why do you need only ToTensor() here?
transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor()])

unknown_images = torch.stack([transforms(x) for x in unknown_images])
known_images  = torch.stack([transforms(y) for y in known_images ])
#Print your shapes here to understand what we have done

# You can use other similarity metrics like Euclidean Distance if you wish
similarity_metric = torch.nn.CosineSimilarity(dim= 1, eps= 1e-6) 
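If you try Euclidean distance instead, remember that a smaller distance means a better match, so the argmax becomes an argmin. A minimal sketch on made-up toy features, assuming `torch.cdist` for the pairwise distances (the L2 normalization step is optional and is an addition here, not part of the starter code):

```python
import torch
import torch.nn.functional as F

# Toy embeddings (made-up numbers): 3 known identities, 2 unknown images
known_feats = torch.tensor([[1.0, 0.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0, 0.0],
                            [0.0, 0.0, 1.0, 1.0]])
unknown_feats = torch.tensor([[0.1, 0.9, 0.0, 0.0],
                              [0.0, 0.1, 0.8, 0.7]])

# Pairwise L2 distances, shape (num_unknown, num_known)
distances = torch.cdist(unknown_feats, known_feats, p=2)

# Smaller distance = more similar, so take argmin over the known identities
euclid_predictions = distances.argmin(dim=1).tolist()
print(euclid_predictions)

# Optional: after L2-normalizing the features, the Euclidean ranking matches
# the cosine ranking, because ||a - b||^2 = 2 - 2 * cos(a, b) for unit vectors
normed = torch.cdist(F.normalize(unknown_feats, dim=1), F.normalize(known_feats, dim=1))
normed_predictions = normed.argmin(dim=1).tolist()
print(normed_predictions)
```

Note that `distances` here is laid out as (num_unknown, num_known), the transpose of the stacked cosine similarities used in `eval_verification`, so the reduction runs over `dim=1`.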

In [None]:
def eval_verification(unknown_images, known_images, model, similarity, batch_size= config['batch_size'], mode='test'): 

    unknown_feats, known_feats = [], []

    # Ceiling division so the final partial batch is counted in the progress bar
    batch_bar = tqdm(total=(len(unknown_images) + batch_size - 1)//batch_size, dynamic_ncols=True, position=0, leave=False, desc=mode)
    model.eval()

    # We load the images as batches for memory optimization and avoiding CUDA OOM errors
    for i in range(0, unknown_images.shape[0], batch_size):
        unknown_batch = unknown_images[i:i+batch_size] # Slice a given portion upto batch_size
        
        with torch.no_grad():
            unknown_feat = model(unknown_batch.float().to(device), return_feats=True) #Get features from model         
        unknown_feats.append(unknown_feat)
        batch_bar.update()
    
    batch_bar.close()
    
    batch_bar = tqdm(total=(len(known_images) + batch_size - 1)//batch_size, dynamic_ncols=True, position=0, leave=False, desc=mode)
    
    for i in range(0, known_images.shape[0], batch_size):
        known_batch = known_images[i:i+batch_size] # Slice a given portion up to batch_size
        with torch.no_grad():
            known_feat = model(known_batch.float().to(device), return_feats=True) # Get features from model

        known_feats.append(known_feat)
        batch_bar.update()

    batch_bar.close()

    # Concatenate all the batches
    unknown_feats = torch.cat(unknown_feats, dim=0)
    known_feats = torch.cat(known_feats, dim=0)

    similarity_values = torch.stack([similarity(unknown_feats, known_feature) for known_feature in known_feats])
    # Print the inner list comprehension in a separate cell - what is really happening?

    predictions = similarity_values.argmax(0).cpu().numpy() #Why are we doing an argmax here?

    # Map argmax indices to identity strings
    pred_id_strings = [known_paths[i] for i in predictions]
    
    # Uncomment when evaluating on the dev set (mode='val'), where ground-truth labels are available
    # if mode == 'val':
    #   true_ids = pd.read_csv('/content/data/verification/dev_identities.csv')['label'].tolist()
    #   accuracy = accuracy_score(true_ids, pred_id_strings)
    #   print("Verification Accuracy = {}".format(accuracy))
    
    return pred_id_strings
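
When ground-truth labels are available (the dev split), the predicted identity strings can be scored without any extra dependency. The helper below is a hypothetical sketch: `verification_accuracy` is not part of the starter code, and the identity strings are invented for illustration.

```python
def verification_accuracy(pred_ids, true_ids):
    """Fraction of positions where the predicted identity equals the true one."""
    if len(pred_ids) != len(true_ids):
        raise ValueError("pred_ids and true_ids must have the same length")
    if not true_ids:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(p == t for p, t in zip(pred_ids, true_ids))
    return correct / len(true_ids)

# Made-up identity strings, for illustration only
pred = ["id_a", "id_b", "id_c", "id_a"]
true = ["id_a", "id_b", "id_d", "id_a"]
print(verification_accuracy(pred, true))  # 0.75
```

The length check matters in practice: comparing dev labels against predictions made on the test folder would otherwise produce a meaningless score without raising an error.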

In [None]:
pred_id_strings = eval_verification(unknown_images, known_images, model, similarity_metric, config['batch_size'], mode='test')

In [None]:
with open("verification_early_submission.csv", "w+") as f:
    f.write("id,label\n")
    for i in range(len(pred_id_strings)):
        f.write("{},{}\n".format(i, pred_id_strings[i]))

In [None]:
### Submit to kaggle competition using kaggle API
!kaggle competitions submit -c 11-785-f22-hw2p2-verification -f ./verification_early_submission.csv -m "Test Submission"