# HW2P2: Face Classification and Verification


Congrats on coming to the second homework in 11785: Introduction to Deep Learning. This homework is significantly longer and tougher than the previous one. It has two sub-parts, outlined below. Please start early!


*   Face Recognition: You will write your own CNN model to classify face images into 7000 identities
*   Face Verification: You will use the model trained for classification and evaluate the quality of its feature embeddings by comparing the similarity of known and unknown identities

For this HW, you only have to write code to implement your model architecture. Everything else has been provided for you, on the assumption that most of your time will go into developing a model architecture that achieves satisfactory performance.

Common errors you may face in this homework (because of the size of the model):


*   CUDA Out of Memory (OOM): You can tackle this problem by (1) reducing the batch size, (2) calling `torch.cuda.empty_cache()` and `gc.collect()`, and (3) as a last resort, restarting the runtime



# Preliminaries

In [1]:
# !nvidia-smi # to see what GPU you have

In [2]:
# !pip install wandb --quiet

In [3]:
import torch
from torchsummary import summary
import torchvision #This library is used for image-based operations (Augmentations)
import os
import gc
from tqdm import tqdm
from PIL import Image
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
import glob
import wandb
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print("Device: ", device)

Device:  cuda


In [4]:
# from google.colab import drive # Link your drive if you are a colab user
# drive.mount('/content/drive') # Models in this HW take a long time to train, so make sure to save them here

# TODOs
As you go, please read the code and keep an eye out for TODOs!

# Download Data from Kaggle

In [5]:
# TODO: Use the same Kaggle code from HW1P2

In [6]:
# !mkdir 'data'

# !kaggle competitions download -c 11-785-f22-hw2p2-classification
# !unzip -qo '11-785-f22-hw2p2-classification.zip' -d 'data'



In [7]:
# !kaggle competitions download -c 11-785-f22-hw2p2-verification
# !unzip -qo '11-785-f22-hw2p2-verification.zip' -d 'data'

# Configs

In [8]:
config = {
    'batch_size': 64, # Increase this if your GPU can handle it
    'lr': 0.2,
    'epochs': 100, # 10 epochs is recommended ONLY for the early submission - you will have to train for much longer typically.
    'optimizer':'SGD',
    'weight_decay': 1e-5,
    'scheduler':'CosineLR',
    'LR_stepsize': 'batch*epochs',
    'smoothing': 0.1,
    'dropout': 0.15
    # Include other parameters as needed.
}

# Classification Dataset

In [9]:
import pickle

with open('normalization_parameters.pkl', 'rb') as f:
    [mean, std] = pickle.load(f)
tuple(mean), tuple(std)

((0.51302713, 0.4033568, 0.35215932), (0.3069095, 0.26970628, 0.25837687))
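
To make the effect of `Normalize` concrete, here is a minimal plain-Python sketch of the per-channel operation `(x - mean) / std`, using the statistics printed above rounded to four decimals. The helper `normalize_pixel` is hypothetical and only mirrors the arithmetic on a single pixel; torchvision applies the same formula to whole tensors.

```python
# Per-channel statistics printed by the cell above (rounded to 4 decimals).
MEAN = (0.5130, 0.4034, 0.3522)
STD = (0.3069, 0.2697, 0.2584)

def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Apply (x - mean) / std to one RGB pixel with values in [0, 1]."""
    return tuple((x - m) / s for x, m, s in zip(rgb, mean, std))

# A pixel equal to the dataset mean maps to zero in every channel.
print(normalize_pixel(MEAN))

# A pure white pixel lands roughly 1.6 to 2.5 standard deviations above the mean,
# depending on the channel.
print(normalize_pixel((1.0, 1.0, 1.0)))
```

Because the statistics come from this dataset, the same `mean` and `std` should be applied wherever images are fed to the model (training, validation, test, and verification).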

In [10]:
DATA_DIR = 'data/11-785-f22-hw2p2-classification/'# TODO: Path where you have downloaded the data
TRAIN_DIR = os.path.join(DATA_DIR, "classification/train") 
VAL_DIR = os.path.join(DATA_DIR, "classification/dev")
TEST_DIR = os.path.join(DATA_DIR, "classification/test")

# Transforms using torchvision - Refer to https://pytorch.org/vision/stable/transforms.html

train_transforms = torchvision.transforms.Compose([ 
    # Implementing the right transforms/augmentation methods is key to improving performance.
                    torchvision.transforms.RandomHorizontalFlip(p=0.5),
                    torchvision.transforms.RandomVerticalFlip(p=0.5),
                    torchvision.transforms.RandomAdjustSharpness(0.2),
#                     torchvision.transforms.RandomResizedCrop(224),
                    torchvision.transforms.GaussianBlur(kernel_size=7),
                    torchvision.transforms.ToTensor(),
                    torchvision.transforms.Normalize(tuple(mean), tuple(std))
                    ])
# Most torchvision transforms are done on PIL images. So you convert it into a tensor at the end with ToTensor()
# But there are some transforms which are performed after ToTensor() : e.g - Normalization
# Normalization Tip - Do not blindly use normalization that is not suitable for this dataset

val_transforms = torchvision.transforms.Compose([torchvision.transforms.ToTensor(),
                                                torchvision.transforms.Normalize(mean, std)
                                                ])


train_dataset = torchvision.datasets.ImageFolder(TRAIN_DIR, transform = train_transforms)
val_dataset = torchvision.datasets.ImageFolder(VAL_DIR, transform = val_transforms)
# You should NOT have data augmentation on the validation set. Why?


# Create data loaders
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = config['batch_size'], 
                                           shuffle = True,num_workers = 4, pin_memory = True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size = config['batch_size'], 
                                         shuffle = False, num_workers = 2)

In [11]:
# You can do this with ImageFolder as well, but it requires some tweaking
class ClassificationTestDataset(torch.utils.data.Dataset):

    def __init__(self, data_dir, transforms):
        self.data_dir   = data_dir
        self.transforms = transforms

        # This one-liner basically generates a sorted list of full paths to each image in the test directory
        self.img_paths  = list(map(lambda fname: os.path.join(self.data_dir, fname), sorted(os.listdir(self.data_dir))))

    def __len__(self):
        return len(self.img_paths)
    
    def __getitem__(self, idx):
        return self.transforms(Image.open(self.img_paths[idx]))

In [12]:
test_dataset = ClassificationTestDataset(TEST_DIR, transforms = val_transforms) #Why are we using val_transforms for Test Data?
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = config['batch_size'], shuffle = False,
                         drop_last = False, num_workers = 2)

In [13]:
print("Number of classes: ", len(train_dataset.classes))
print("No. of train images: ", train_dataset.__len__())
print("Shape of image: ", train_dataset[0][0].shape)
print("Batch size: ", config['batch_size'])
print("Train batches: ", train_loader.__len__())
print("Val batches: ", val_loader.__len__())

Number of classes:  7000
No. of train images:  140000
Shape of image:  torch.Size([3, 224, 224])
Batch size:  64
Train batches:  2188
Val batches:  547
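
The batch counts above follow directly from the dataset size and the batch size. The short sketch below (plain Python, using only the numbers printed above) reproduces the train batch count and shows that the last training batch is only partially filled. This matters because the training and validation loops in this notebook compute accuracy by dividing by `batch_size * len(dataloader)`, which slightly overcounts the samples when the last batch is partial.

```python
import math

num_train_images = 140000  # "No. of train images" printed above
batch_size = 64            # config['batch_size']

num_batches = math.ceil(num_train_images / batch_size)
last_batch_size = num_train_images - (num_batches - 1) * batch_size

print(num_batches)      # 2188, matching "Train batches"
print(last_batch_size)  # 32: the final batch is only half full

# Ratio between the true sample count and batch_size * num_batches.
# Accuracy computed with the larger denominator is scaled down by this factor.
scale = num_train_images / (batch_size * num_batches)
print(f"{scale:.5f}")
```

The effect is small here, but dividing by the dataset length instead would give the exact accuracy.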


# Efficient Network

In [14]:
def get_efficientnet_v2_hyperparam(model_name):
    from box import Box
    # train_size, eval_size, dropout, randaug, mixup
    if 'efficientnet_v2_s' in model_name:
        end = 300, 384, 0.2, 10, 0
    elif 'efficientnet_v2_m' in model_name:
        end = 384, 480, 0.3, 15, 0.2
    elif 'efficientnet_v2_l' in model_name:
        end = 384, 480, 0.4, 20, 0.5
    elif 'efficientnet_v2_xl' in model_name:
        end = 384, 512, 0.4, 20, 0.5
    return Box({"init_train_size": 128, "init_dropout": 0.1, "init_randaug": 5, "init_mixup": 0,
             "end_train_size": end[0], "end_dropout": end[2], "end_randaug": end[3], "end_mixup": end[4], "eval_size": end[1]})


def get_efficientnet_v2_structure(model_name):
    if 'efficientnet_v2_s' in model_name:
        return [
            # e k  s  in  out xN  se   fused
            (1, 3, 1, 24, 24, 2, False, True),
            (4, 3, 2, 24, 48, 4, False, True),
            (4, 3, 2, 48, 64, 4, False, True),
            (4, 3, 2, 64, 128, 6, True, False),
            (6, 3, 1, 128, 160, 9, True, False),
            (6, 3, 2, 160, 256, 15, True, False),
        ]
    elif 'efficientnet_v2_m' in model_name:
        return [
            # e k  s  in  out xN  se   fused
            (1, 3, 1, 24, 24, 3, False, True),
            (4, 3, 2, 24, 48, 5, False, True),
            (4, 3, 2, 48, 80, 5, False, True),
            (4, 3, 2, 80, 160, 7, True, False),
            (6, 3, 1, 160, 176, 14, True, False),
            (6, 3, 2, 176, 304, 18, True, False),
            (6, 3, 1, 304, 512, 5, True, False),
        ]
    elif 'efficientnet_v2_l' in model_name:
        return [
            # e k  s  in  out xN  se   fused
            (1, 3, 1, 32, 32, 4, False, True),
            (4, 3, 2, 32, 64, 7, False, True),
            (4, 3, 2, 64, 96, 7, False, True),
            (4, 3, 2, 96, 192, 10, True, False),
            (6, 3, 1, 192, 224, 19, True, False),
            (6, 3, 2, 224, 384, 25, True, False),
            (6, 3, 1, 384, 640, 7, True, False),
        ]
    elif 'efficientnet_v2_xl' in model_name:
        return [
            # e k  s  in  out xN  se   fused
            (1, 3, 1, 32, 32, 4, False, True),
            (4, 3, 2, 32, 64, 8, False, True),
            (4, 3, 2, 64, 96, 8, False, True),
            (4, 3, 2, 96, 192, 16, True, False),
            (6, 3, 1, 192, 256, 24, True, False),
            (6, 3, 2, 256, 512, 32, True, False),
            (6, 3, 1, 512, 640, 8, True, False),
        ]
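
As a quick sanity check on these tables, the sketch below summarizes the `efficientnet_v2_m` rows in plain Python. It relies on some assumptions that the code shown here does not confirm: the column meanings are taken from the header comment (`e k s in out xN se fused`), the stem stride of 2 is inferred from the model summary (224 to 112 in the first convolution), and it assumes only the first block of each stage applies the stage stride. The helper `summarize` is hypothetical.

```python
# (expand_ratio, kernel, stride, in_ch, out_ch, num_layers, use_se, fused)
# Rows copied from the 'efficientnet_v2_m' branch above.
EFFNET_V2_M = [
    (1, 3, 1, 24, 24, 3, False, True),
    (4, 3, 2, 24, 48, 5, False, True),
    (4, 3, 2, 48, 80, 5, False, True),
    (4, 3, 2, 80, 160, 7, True, False),
    (6, 3, 1, 160, 176, 14, True, False),
    (6, 3, 2, 176, 304, 18, True, False),
    (6, 3, 1, 304, 512, 5, True, False),
]

def summarize(structure, stem_stride=2):
    """Return (total number of blocks, overall spatial downsampling factor)."""
    total_blocks = sum(row[5] for row in structure)
    downsample = stem_stride
    for row in structure:
        downsample *= row[2]  # assumption: one strided block per stage
    return total_blocks, downsample

def channels_chain(structure):
    """Check that each stage's input channels equal the previous stage's output."""
    return all(prev[4] == nxt[3] for prev, nxt in zip(structure, structure[1:]))

blocks, factor = summarize(EFFNET_V2_M)
print(blocks, factor, 224 // factor)  # 57 blocks, 32x downsampling, 7x7 final map
print(channels_chain(EFFNET_V2_M))    # True
```

Under these assumptions, a 224x224 input reaches the head as a 7x7 feature map, which is a useful figure to keep in mind when changing input resolution or strides.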

In [15]:
from effnet_v2 import *
from torch import nn  # Explicit import for nn.SiLU below, instead of relying on the star import

# model = get_efficientnet_v2(model_name, pretrained, num_classes, dropout=dropout)
model_name = 'efficientnet_v2_m'
nclass = 7000
dropout = config['dropout']
stochastic_depth = 0.2


residual_config = [MBConvConfig(*layer_config) for layer_config in get_efficientnet_v2_structure(model_name)]
model = EfficientNetV2(residual_config, 1280, nclass, dropout=dropout, stochastic_depth=stochastic_depth, block=MBConv, act_layer=nn.SiLU)
efficientnet_v2_init(model)

In [16]:
model.to(device)
summary(model, (3,224,224))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 24, 112, 112]             648
       BatchNorm2d-2         [-1, 24, 112, 112]              48
              SiLU-3         [-1, 24, 112, 112]               0
            Conv2d-4         [-1, 24, 112, 112]           5,184
       BatchNorm2d-5         [-1, 24, 112, 112]              48
              SiLU-6         [-1, 24, 112, 112]               0
   StochasticDepth-7         [-1, 24, 112, 112]               0
            MBConv-8         [-1, 24, 112, 112]               0
            Conv2d-9         [-1, 24, 112, 112]           5,184
      BatchNorm2d-10         [-1, 24, 112, 112]              48
             SiLU-11         [-1, 24, 112, 112]               0
  StochasticDepth-12         [-1, 24, 112, 112]               0
           MBConv-13         [-1, 24, 112, 112]               0
           Conv2d-14         [-1, 24, 1

In [17]:
for images, labels in train_loader:
    images = images.to(device)
    print(model(images).shape)
    break

torch.Size([64, 7000])


In [18]:
for images, labels in train_loader:
    images = images.to(device)
    print(model(images, return_feats=True).shape)
    break

torch.Size([64, 1280])


In [19]:
# from resnet50 import Bottleneck, Block, ResNet


# def ResNet50():
#     return ResNet(Bottleneck, [3,4,6,3], num_classes=7000, num_channels=3)
    
# def ResNet101():
#     return ResNet(Bottleneck, [3,4,23,3], num_classes=7000, num_channels=3)

# def ResNet152():
#     return ResNet(Bottleneck, [3,8,36,3], num_classes=7000, num_channels=3)

# model = ResNet50().to(device)
# summary(model, (3, 224, 224))

# Setup everything for training

In [20]:
config

{'batch_size': 64,
 'lr': 0.2,
 'epochs': 100,
 'optimizer': 'SGD',
 'weight_decay': 1e-05,
 'scheduler': 'CosineLR',
 'LR_stepsize': 'batch*epochs',
 'smoothing': 0.1,
 'dropout': 0.15}

In [21]:
criterion = torch.nn.CrossEntropyLoss(label_smoothing=config['smoothing'])
optimizer = torch.optim.SGD(model.parameters(), lr=config['lr'], weight_decay=config['weight_decay'])
# TODO: Implement a scheduler (Optional but Highly Recommended)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=config['epochs']*len(train_loader), eta_min=1e-09)
# scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2, eta_min=0.00001, last_epoch=-1)
# You can try ReduceLRonPlateau, StepLR, MultistepLR, CosineAnnealing, etc.
scaler = torch.cuda.amp.GradScaler() # Good news. We have FP16 (Mixed precision training) implemented for you
# It is useful only in the case of compatible GPUs such as T4/V100
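
Because `T_max` is set to `epochs * len(train_loader)`, the scheduler is meant to be stepped once per batch, not once per epoch. The sketch below evaluates the closed-form cosine annealing formula from the PyTorch documentation in plain Python, assuming the values used in this notebook (`lr=0.2`, `eta_min=1e-9`, 100 epochs, 2188 batches per epoch). The function `cosine_lr` is a hypothetical helper, not part of PyTorch.

```python
import math

def cosine_lr(step, base_lr=0.2, eta_min=1e-9, t_max=100 * 2188):
    """Closed form: eta_min + (base_lr - eta_min) * (1 + cos(pi * step / t_max)) / 2."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2

steps_per_epoch = 2188  # len(train_loader) at batch size 64

# Learning rate after a given number of completed epochs.
for completed_epochs in (0, 2, 9, 50, 100):
    lr = cosine_lr(completed_epochs * steps_per_epoch)
    print(completed_epochs, f"{lr:.4f}")
```

After 2 and 9 completed epochs this gives 0.1998 and 0.1960, the same values the training log reports at the start of epochs 3 and 10, which is a handy check that the scheduler is being stepped at the intended granularity.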

# Let's train!

In [22]:
def train(model, dataloader, optimizer, criterion):
    
    model.train()

    # Progress Bar 
    batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, leave=False, position=0, desc='Train', ncols=5) 
    
    num_correct = 0
    total_loss = 0

    for i, (images, labels) in enumerate(dataloader):
        
        optimizer.zero_grad() # Zero gradients

        images, labels = images.to(device), labels.to(device)
        
        with torch.cuda.amp.autocast(): # This implements mixed precision. That's it!
            outputs = model(images)
            loss = criterion(outputs, labels)

        # Update no. of correct predictions & loss as we iterate
        num_correct += int((torch.argmax(outputs, axis=1) == labels).sum())
        total_loss += float(loss.item())

        # tqdm lets you add some details so you can monitor training as you train.
        batch_bar.set_postfix(
            acc="{:.04f}%".format(100 * num_correct / (config['batch_size']*(i + 1))),
            loss="{:.04f}".format(float(total_loss / (i + 1))),
            num_correct=num_correct,
            lr="{:.04f}".format(float(optimizer.param_groups[0]['lr'])))
        
        scaler.scale(loss).backward() # This is a replacement for loss.backward()
        scaler.step(optimizer) # This is a replacement for optimizer.step()
        scaler.update() 

        # TODO? Depending on your choice of scheduler,
        # you may want to call some schedulers inside the train function. What are these?
        
      
        batch_bar.update() # Update tqdm bar
    
        scheduler.step()

    batch_bar.close() # You need this to close the tqdm bar

    acc = 100 * num_correct / (config['batch_size']* len(dataloader))
    total_loss = float(total_loss / len(dataloader))

    return acc, total_loss

In [23]:
def validate(model, dataloader, criterion):
  
    model.eval()
    batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, position=0, leave=False, desc='Val', ncols=5)

    num_correct = 0.0
    total_loss = 0.0

    for i, (images, labels) in enumerate(dataloader):
        
        # Move images to device
        images, labels = images.to(device), labels.to(device)
        
        # Get model outputs
        with torch.inference_mode():
            outputs = model(images)
            loss = criterion(outputs, labels)

        num_correct += int((torch.argmax(outputs, axis=1) == labels).sum())
        total_loss += float(loss.item())

        batch_bar.set_postfix(
            acc="{:.04f}%".format(100 * num_correct / (config['batch_size']*(i + 1))),
            loss="{:.04f}".format(float(total_loss / (i + 1))),
            num_correct=num_correct)

        batch_bar.update()
        
    batch_bar.close()
    acc = 100 * num_correct / (config['batch_size']* len(dataloader))
    total_loss = float(total_loss / len(dataloader))
    return acc, total_loss

In [24]:
gc.collect() # These commands help you when you face CUDA OOM error
torch.cuda.empty_cache()

# Wandb

In [25]:
wandb.login(key="<your-wandb-api-key>") # API key is in your wandb account, under settings (wandb.ai/settings). Keep it out of shared notebooks

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: jiin. Use `wandb login --relogin` to force relogin
wandb: Appending key for api.wandb.ai to your netrc file: /home/jiin/.netrc


True

In [26]:
# Create your wandb run
run = wandb.init(
    name = "effinet", ## Wandb creates random run names if you skip this field
    reinit = True, ### Allows reinitializing runs when you re-run this cell
    # run_id = ### Insert specific run id here if you want to resume a previous run
    # resume = "must" ### You need this to resume previous runs, but comment out reinit = True when using this
    project = "hw2p2-recognition", ### Project should be created in your wandb account 
    config = config ### Wandb Config for your run
)

# Experiments

In [27]:
root = './cls_result'
model_directory = 'effinet'
exp_name = '1_original.pth'
# makedirs also creates the parent directory (root) if it does not exist yet
os.makedirs(os.path.join(root, model_directory), exist_ok=True)

In [None]:
best_valacc = 0.0

for epoch in range(config['epochs']):

    curr_lr = float(optimizer.param_groups[0]['lr'])

    train_acc, train_loss = train(model, train_loader, optimizer, criterion)
    
    print("\nEpoch {}/{}: \nTrain Acc {:.04f}%\t Train Loss {:.04f}\t Learning Rate {:.04f}".format(
        epoch + 1,
        config['epochs'],
        train_acc,
        train_loss,
        curr_lr))
    
    val_acc, val_loss = validate(model, val_loader, criterion)
    
    print("Val Acc {:.04f}%\t Val Loss {:.04f}".format(val_acc, val_loss))

    wandb.log({"train_loss":train_loss, 'train_Acc': train_acc, 'validation_Acc':val_acc, 
               'validation_loss': val_loss, "learning_Rate": curr_lr})
    
    # If you are using a scheduler in your train function within your iteration loop, you may want to log
    # your learning rate differently 
    

    # #Save model in drive location if val_acc is better than best recorded val_acc
    if val_acc >= best_valacc:
#         path = os.path.join(root, model_directory, 'checkpoint' + '.pth')
        path = os.path.join(root, model_directory, exp_name)
        print("Saving model")
        torch.save({'model_state_dict':model.state_dict(),
              'optimizer_state_dict':optimizer.state_dict(),
              'scheduler_state_dict':scheduler.state_dict(),
              'val_acc': val_acc, 
              'epoch': epoch}, path)
        best_valacc = val_acc
        
        wandb.save(path) # Upload the checkpoint file that was just written
        # You may find it interesting to explore Wandb Artifacts to version your models
run.finish()

Epoch 1/100:
Train Acc 0.0129%	 Train Loss 8.8545	 Learning Rate 0.2000
Val Acc 0.0428%	 Val Loss 8.8168
Saving model

Epoch 2/100:
Train Acc 0.0514%	 Train Loss 8.6561	 Learning Rate 0.2000
Val Acc 0.1400%	 Val Loss 8.5352
Saving model

Epoch 3/100:
Train Acc 0.2192%	 Train Loss 8.1042	 Learning Rate 0.1998
Val Acc 0.5770%	 Val Loss 7.8629
Saving model

Epoch 4/100:
Train Acc 0.9769%	 Train Loss 7.4341	 Learning Rate 0.1996
Val Acc 1.9481%	 Val Loss 7.1345
Saving model

Epoch 5/100:
Train Acc 3.1907%	 Train Loss 6.7705	 Learning Rate 0.1992
Val Acc 4.8875%	 Val Loss 6.5606
Saving model

Epoch 6/100:
Train Acc 7.7118%	 Train Loss 6.1485	 Learning Rate 0.1988
Val Acc 9.7492%	 Val Loss 6.2729
Saving model

Epoch 7/100:
Train Acc 14.0104%	 Train Loss 5.5871	 Learning Rate 0.1982
Val Acc 14.5195%	 Val Loss 5.6478
Saving model

Epoch 8/100:
Train Acc 21.7750%	 Train Loss 5.0704	 Learning Rate 0.1976
Val Acc 20.3468%	 Val Loss 5.2949
Saving model

Epoch 9/100:
Train Acc 29.6525%	 Train Loss 4.6187	 Learning Rate 0.1969
Val Acc 29.4333%	 Val Loss 4.7738
Saving model

Epoch 10/100:
Train Acc 37.3707%	 Train Loss 4.2228	 Learning Rate 0.1960
Val Acc 33.4695%	 Val Loss 4.6131
Saving model

Epoch 11/100:
Train Acc 44.9554%	 Train Loss 3.8723	 Learning Rate 0.1951
Val Acc 40.2108%	 Val Loss 4.2738
Saving model

Epoch 12/100:
Train Acc 51.6496%	 Train Loss 3.5616	 Learning Rate 0.1941
Val Acc 43.3987%	 Val Loss 4.1116
Saving model

Epoch 13/100:
Train Acc 57.9168%	 Train Loss 3.2936	 Learning Rate 0.1930
Val Acc 48.2918%	 Val Loss 3.9064
Saving model

Epoch 14/100:
Train Acc 63.4176%	 Train Loss 3.0567	 Learning Rate 0.1918
Val Acc 51.2197%	 Val Loss 3.7888
Saving model

Epoch 15/100:
Train Acc 68.3522%	 Train Loss 2.8456	 Learning Rate 0.1905
Val Acc 51.9396%	 Val Loss 3.7422
Saving model

Epoch 16/100:
Train Acc 72.7605%	 Train Loss 2.6596	 Learning Rate 0.1891
Val Acc 56.2871%	 Val Loss 3.5625
Saving model

Epoch 17/100:
Train Acc 77.0595%	 Train Loss 2.4927	 Learning Rate 0.1876
Val Acc 59.9863%	 Val Loss 3.4255
Saving model

Epoch 18/100:
Train Acc 80.5480%	 Train Loss 2.3451	 Learning Rate 0.1861
Val Acc 61.4602%	 Val Loss 3.3859
Saving model

Epoch 19/100:
Train Acc 83.9301%	 Train Loss 2.2123	 Learning Rate 0.1844
Val Acc 62.3429%	 Val Loss 3.3490
Saving model


Train:   5%|███▊                                                                     | 115/2188 [00:34<10:21,  3.34it/s, acc=89.2663%, loss=2.0180, lr=0.1826, num_correct=6570]
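
The first-epoch loss of about 8.85 in the log above is what a model scores when it predicts roughly uniformly over all 7000 classes, since ln(7000) ≈ 8.85. The plain-Python sketch below illustrates this with a hand-written label-smoothed cross-entropy. It assumes the usual smoothed target, `(1 - smoothing) * one_hot + smoothing / K`; the function `smoothed_cross_entropy` is a hypothetical helper for a single example, not the PyTorch implementation.

```python
import math

def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against (1 - smoothing) * one_hot + smoothing / K for one example."""
    k = len(logits)
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    loss = 0.0
    for i, lp in enumerate(log_probs):
        q = smoothing / k + ((1.0 - smoothing) if i == target else 0.0)
        loss -= q * lp
    return loss

num_classes = 7000
uniform_logits = [0.0] * num_classes  # equal score for every class

# With uniform predictions the loss equals ln(K), whatever the smoothing value.
print(round(smoothed_cross_entropy(uniform_logits, target=0), 4))
print(round(math.log(num_classes), 4))
```

A loss that stays near this value for many epochs usually means the model has not started learning yet, which is a cue to revisit the learning rate or the data pipeline.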

# Classification Task: Testing

In [None]:
os.path.join(root, model_directory, exp_name)

In [None]:
checkpoint = torch.load(os.path.join(root, model_directory, exp_name)) # Load the checkpoint once and reuse it
model.load_state_dict(checkpoint['model_state_dict'])
print(checkpoint['val_acc'])

In [None]:
def test(model,dataloader):

  model.eval()
  batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, position=0, leave=False, desc='Test')
  test_results = []
  
  for i, (images) in enumerate(dataloader):
      # TODO: Finish predicting on the test set.
      images = images.to(device)

      with torch.inference_mode():
        outputs = model(images)

      outputs = torch.argmax(outputs, axis=1).detach().cpu().numpy().tolist()
      test_results.extend(outputs)
      
      batch_bar.update()
      
  batch_bar.close()
  return test_results

In [None]:
test_results = test(model, test_loader)

## Generate csv to submit to Kaggle

In [None]:
with open(os.path.join(root, model_directory, "effinet_{}.csv".format(exp_name[:-4])), "w+") as f:
    f.write("id,label\n")
    for i in range(len(test_dataset)):
        f.write("{},{}\n".format(str(i).zfill(6) + ".jpg", test_results[i]))
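
The submission format produced by the loop above can be checked without touching the disk. The sketch below writes the same `id,label` rows into an in-memory buffer; `write_submission` is a hypothetical helper, and the zero-padded `NNNNNN.jpg` file names are an assumption carried over from the loop above.

```python
import io

def write_submission(predictions, stream):
    """Write one 'id,label' row per prediction, with ids like 000000.jpg."""
    stream.write("id,label\n")
    for i, label in enumerate(predictions):
        stream.write("{},{}\n".format(str(i).zfill(6) + ".jpg", label))

buf = io.StringIO()
write_submission([17, 4, 6999], buf)
print(buf.getvalue())
```

Printing the first few rows this way before submitting is a cheap way to catch an off-by-one id or a missing header.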

In [None]:
# !kaggle competitions submit -c 11-785-f22-hw2p2-classification -f cls_result/effinet/effinet_1_original.csv -m "Message"

# Verification Task: Validation

The verification task consists of the following generalized scenario:
- You are given X unknown identities
- You are given Y known identities
- Your goal is to match X unknown identities to Y known identities.

We have given you a verification dataset that consists of 1000 known identities and 1000 unknown identities. The 1000 unknown identities are split into dev (200) and test (800). Your goal is to compare the unknown identities to the 1000 known identities and assign an identity to each image from the set of unknown identities.

You will use (or fine-tune) your model trained for classification to compare images of known and unknown identities using a similarity metric, and assign labels to the unknown identities.

This will judge your model's performance in terms of the quality of embeddings/features it generates on images/faces it has never seen during training for classification.
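
The matching step described above can be sketched in a few lines of plain Python: compute the cosine similarity between each unknown embedding and every known embedding, then pick the known identity with the highest similarity. The toy 3-dimensional vectors and the helper names `cosine_similarity` and `match_identities` are illustrative assumptions; in this notebook the features are 1280-dimensional and come from the model.

```python
import math

def cosine_similarity(a, b, eps=1e-6):
    """Cosine of the angle between two vectors, guarded against zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)

def match_identities(unknown_feats, known_feats, known_ids):
    """For each unknown embedding, return the id of the most similar known embedding."""
    preds = []
    for u in unknown_feats:
        sims = [cosine_similarity(u, k) for k in known_feats]
        best = max(range(len(sims)), key=sims.__getitem__)  # argmax over known identities
        preds.append(known_ids[best])
    return preds

known_ids = ["id_a", "id_b", "id_c"]
known_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
unknown_feats = [[0.1, 0.9, 0.2], [2.0, 0.1, 0.0]]

print(match_identities(unknown_feats, known_feats, known_ids))  # ['id_b', 'id_a']
```

Cosine similarity ignores vector length, so the second unknown vector still matches `id_a` even though its norm is about twice as large.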

In [None]:
known_regex = "data/verification/known/*/*"
known_paths = [i.split('/')[-2] for i in sorted(glob.glob(known_regex))] 
# This obtains the list of known identities from the known folder

unknown_regex = "data/verification/unknown_dev/*" #Change the directory accordingly for the test set

# We load the images from known and unknown folders
unknown_images = [Image.open(p) for p in tqdm(sorted(glob.glob(unknown_regex)))]
known_images = [Image.open(p) for p in tqdm(sorted(glob.glob(known_regex)))]

# No augmentation here: only ToTensor() plus the same normalization used in val_transforms. Why?
transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean, std)])

unknown_images = torch.stack([transforms(x) for x in unknown_images])
known_images  = torch.stack([transforms(y) for y in known_images ])
#Print your shapes here to understand what we have done

# You can use other similarity metrics like Euclidean Distance if you wish
similarity_metric = torch.nn.CosineSimilarity(dim= 1, eps= 1e-6) 

In [None]:
def eval_verification(unknown_images, known_images, model, similarity, batch_size= config['batch_size'], mode='val'): 

    unknown_feats, known_feats = [], []

    batch_bar = tqdm(total=len(unknown_images)//batch_size, dynamic_ncols=True, position=0, leave=False, desc=mode)
    model.eval()

    # We load the images as batches for memory optimization and avoiding CUDA OOM errors
    for i in range(0, unknown_images.shape[0], batch_size):
        unknown_batch = unknown_images[i:i+batch_size] # Slice a given portion upto batch_size
        
        with torch.no_grad():
            unknown_feat = model(unknown_batch.float().to(device), return_feats=True) #Get features from model         
        unknown_feats.append(unknown_feat)
        batch_bar.update()
    
    batch_bar.close()
    
    batch_bar = tqdm(total=len(known_images)//batch_size, dynamic_ncols=True, position=0, leave=False, desc=mode)
    
    for i in range(0, known_images.shape[0], batch_size):
        known_batch = known_images[i:i+batch_size] 
        with torch.no_grad():
            known_feat = model(known_batch.float().to(device), return_feats=True) # Get features from model
          
        known_feats.append(known_feat)
        batch_bar.update()

    batch_bar.close()

    # Concatenate all the batches
    unknown_feats = torch.cat(unknown_feats, dim=0)
    known_feats = torch.cat(known_feats, dim=0)

    similarity_values = torch.stack([similarity(unknown_feats, known_feature) for known_feature in known_feats])
    # Print the inner list comprehension in a separate cell - what is really happening?

    predictions = similarity_values.argmax(0).cpu().numpy() #Why are we doing an argmax here?

    # Map argmax indices to identity strings
    pred_id_strings = [known_paths[i] for i in predictions]
    
    if mode == 'val':
      true_ids = pd.read_csv('data/verification/dev_identities.csv')['label'].tolist()
      accuracy = accuracy_score(pred_id_strings, true_ids)
      print("Verification Accuracy = {}".format(accuracy))
    
    return pred_id_strings

In [None]:
pred_id_strings = eval_verification(unknown_images, known_images, model, similarity_metric, config['batch_size'], mode='val')

In [None]:
unknown_regex = "data/verification/unknown_test/*" #Change the directory accordingly for the test set

# We load the images from known and unknown folders
unknown_images = [Image.open(p) for p in tqdm(sorted(glob.glob(unknown_regex)))]

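# Use the same transforms as for the known images (ToTensor + Normalize) so the features are comparable
transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean, std)])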

unknown_images = torch.stack([transforms(x) for x in unknown_images])

In [None]:
pred_id_strings = eval_verification(unknown_images, known_images, model, similarity_metric, config['batch_size'], mode='test')

In [None]:
len(pred_id_strings)

In [None]:
root = './vrf_result'
model_directory = 'effinet'
# exp_name = '0_original.pth'
# makedirs also creates the parent directory (root) if it does not exist yet
os.makedirs(os.path.join(root, model_directory), exist_ok=True)

In [None]:
with open(os.path.join(root, model_directory, "effinet_{}.csv".format(exp_name[:-4])), "w+") as f:
    f.write("id,label\n")
    for i in range(len(pred_id_strings)):
        f.write("{},{}\n".format(i, pred_id_strings[i]))

In [None]:
# !kaggle competitions submit -c 11-785-f22-hw2p2-verification -f vrf_result/effinet/effinet_1_original.csv -m "Message"