# HW2P2: Face Classification and Verification


Congrats on coming to the second homework in 11785: Introduction to Deep Learning. This homework is significantly longer and tougher than the previous homework. You have 2 sub-parts as outlined below. Please start early! 


*   Face Recognition: You will write your own CNN model to tackle a classification problem with 7000 identities
*   Face Verification: You will use the model trained for classification to evaluate the quality of its feature embeddings by comparing the similarity of known and unknown identities
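
As a rough illustration of the verification step, two embeddings could be compared with cosine similarity. This is a minimal sketch, not the homework's provided code: the choice of cosine similarity, the helper name `cosine_similarity`, and the toy vectors are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real ones would come from the model's feature layer.
known = {"id_a": [1.0, 0.0, 0.0], "id_b": [0.0, 1.0, 0.0]}
unknown = [0.9, 0.1, 0.0]

# Pick the known identity whose embedding is most similar to the unknown one.
best_match = max(known, key=lambda name: cosine_similarity(known[name], unknown))
print(best_match)
```

In practice the same comparison would be run on the feature vectors returned by the trained network rather than on hand-written lists.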

For this HW, you only have to write code to implement your model architecture. Everything else has been provided for you, on the premise that most of your time will be spent developing a suitable model architecture that achieves satisfactory performance.

Common errors you may face in this homework (because of the size of the model):


*   CUDA Out of Memory (OOM): You can tackle this problem by (1) Reducing the batch size (2) Calling `torch.cuda.empty_cache()` and `gc.collect()` (3) Finally restarting the runtime



# README

## Instructions to run code: All cells need to be run!

Ablation Strategies: 

1) Architectures considered:
- ResNet 50  (Easy implementation - Low cutoff reached)
- ResNet 101 (Allowed me to reach the Medium cutoff successfully; however, improvements thereafter were quite tedious and required reconsidering the architecture)
- ConvNeXt   (Best Performance - highly improved training accuracy achieved within 15 epochs)

2) Epochs: 
- Training for more epochs improved performance; however, beyond a certain number of epochs (75-85), the gain in performance did not justify the additional resource consumption.

3) Hyperparameters: 
* Learning rate tuning was not required; an LR of 0.1 achieved the High cutoff requirement
* Batch sizes from 16 to 128 were tried. A batch_size of 64 was finally selected due to its acceptable performance for the compute units consumed, although lower batch sizes notably improved performance.

4) Data loading scheme:
* RandomHorizontalFlip(0.5) - increasing the flip probability marginally increased performance
* ColorJitter(brightness = 0.65, contrast = 0.45, saturation = 0.55)
* RandomGrayscale(p=0.2)
* RandomRotation(degrees=(-30, 30))
* RandomPerspective(distortion_scale=0.25, p=0.25) - minimal impact observed
* RandAugment
* GaussianBlur(kernel_size=(3, 3), sigma=(0.1,.2))
  - A small Gaussian blur helped in the initial stages; however, in the final code iteration it did not improve performance and was therefore removed
* Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
  - Normalization was not fruitful for this dataset

# Preliminaries

In [None]:
!nvidia-smi # to see what GPU you have

Wed Nov  2 14:57:41 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P0    38W / 300W |   2907MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
!pip install wandb --quiet

In [None]:
# !pip install poutyne

In [2]:
import torch
from torchsummary import summary
import torchvision #This library is used for image-based operations (Augmentations)
import os
import gc
from tqdm import tqdm
from PIL import Image
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
import glob
#import wandb
from torch.utils.data import Subset
from sklearn.model_selection import train_test_split
# from poutyne import set_seeds
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print("Device: ", device)

Device:  cuda


In [3]:
from google.colab import drive # Link your drive if you are a colab user
drive.mount('/content/drive') # Models in this HW take a long time to train, so make sure to save them here

# import os.path as path 
# if not path.exists("/content/drive"):
#     !sudo add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
#     !sudo apt-get update -qq 2>&1 > /dev/null
#     !sudo apt -y install -qq google-drive-ocamlfuse 2>&1 > /dev/null
#     !google-drive-ocamlfuse

#     !sudo apt-get install -qq w3m # to act as web browser 
#     !xdg-settings set default-web-browser w3m.desktop # to set default browser
#     %cd /content
#     !mkdir drive
#     %cd drive
#     !mkdir MyDrive
#     %cd ..
#     %cd ..
#     !google-drive-ocamlfuse /content/drive/MyDrive

Mounted at /content/drive


# TODOs
As you go, please read the code and keep an eye out for TODOs!

# Download Data from Kaggle

In [4]:
!pip install --upgrade --force-reinstall --no-deps kaggle==1.5.8
!mkdir /root/.kaggle

with open("/root/.kaggle/kaggle.json", "w+") as f:
    f.write('{"username":"<your-kaggle-username>","key":"<your-kaggle-api-key>"}') 
    # Put your kaggle username & key here; never commit real credentials to a shared notebook

!chmod 600 /root/.kaggle/kaggle.json

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting kaggle==1.5.8
  Downloading kaggle-1.5.8.tar.gz (59 kB)
     |████████████████████████████████| 59 kB 4.0 MB/s 
Building wheels for collected packages: kaggle
  Building wheel for kaggle (setup.py) ... done
  Created wheel for kaggle: filename=kaggle-1.5.8-py3-none-any.whl size=73276 sha256=a0a3ee86d1c3bb2c45c35be89044f60bb88e2497293b9f7f9d7b9f5625bb2dd3
  Stored in directory: /root/.cache/pip/wheels/de/f7/d8/c3902cacb7e62cb611b1ad343d7cc07f42f7eb76ae3a52f3d1
Successfully built kaggle
Installing collected packages: kaggle
  Attempting uninstall: kaggle
    Found existing installation: kaggle 1.5.12
    Uninstalling kaggle-1.5.12:
      Successfully uninstalled kaggle-1.5.12
Successfully installed kaggle-1.5.8


In [5]:
!mkdir '/content/data'

!kaggle competitions download -c 11-785-f22-hw2p2-classification
!unzip -qo '11-785-f22-hw2p2-classification.zip' -d '/content/data'

!kaggle competitions download -c 11-785-f22-hw2p2-verification
!unzip -qo '11-785-f22-hw2p2-verification.zip' -d '/content/data'

Downloading 11-785-f22-hw2p2-classification.zip to /content
100% 2.36G/2.37G [01:21<00:00, 41.2MB/s]
100% 2.37G/2.37G [01:21<00:00, 31.1MB/s]
Downloading 11-785-f22-hw2p2-verification.zip to /content
100% 16.8M/16.8M [00:01<00:00, 27.4MB/s]
100% 16.8M/16.8M [00:01<00:00, 17.4MB/s]


# Configs

In [7]:
config = {
    'batch_size': 64, # Increase this if your GPU can handle it
    'lr': 0.1,
    'epochs': 85, # 10 epochs is recommended ONLY for the early submission - you will have to train for much longer typically.
    
    ##### Include other parameters as needed.
}

# Classification Dataset

In [8]:
# np.random.seed(42)
# torch.manual_seed(42)

DATA_DIR = '/content/data/11-785-f22-hw2p2-classification/' # TODO: Path where you have downloaded the data
TRAIN_DIR = os.path.join(DATA_DIR, "classification/train") 
VAL_DIR = os.path.join(DATA_DIR, "classification/dev")
TEST_DIR = os.path.join(DATA_DIR, "classification/test")

# Transforms using torchvision - Refer https://pytorch.org/vision/stable/transforms.html

train_transforms = torchvision.transforms.Compose([
                torchvision.transforms.RandomHorizontalFlip(0.5),
                torchvision.transforms.ColorJitter(brightness = 0.2, contrast = 0.2, saturation = 0.2),
                torchvision.transforms.RandomGrayscale(p=0.2),
                torchvision.transforms.RandAugment(),
                #torchvision.transforms.RandomPerspective(distortion_scale=0.2, p=0.2),
                torchvision.transforms.ToTensor(),
                #torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
                ])
# Most torchvision transforms are done on PIL images. So you convert it into a tensor at the end with ToTensor()

# But there are some transforms which are performed after ToTensor() : e.g - Normalization
# Normalization Tip - Do not blindly use normalization that is not suitable for this dataset

val_transforms = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])


train_dataset = torchvision.datasets.ImageFolder(TRAIN_DIR, transform = train_transforms)
val_dataset = torchvision.datasets.ImageFolder(VAL_DIR, transform = val_transforms)
# You should NOT have data augmentation on the validation set. Why?

# train_indices, valid_test_indices = train_test_split(np.arange(len(dataset)),
#                                                     train_size=0.4,
#                                                     stratify=dataset.targets,
#                                                     random_state=42)
# # We take 20% for the validation dataset and 20% for the test dataset
# # (i.e. 50% of the remaining 40%).
# valid_indices, test_indices = train_test_split(valid_test_indices,
#                                             train_size=0.5,
#                                             stratify=np.asarray(dataset.targets)[valid_test_indices],
#                                             random_state=42)

# train_dataset = Subset(dataset, train_indices)


# Create data loaders
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = config['batch_size'], 
                                           shuffle = True,num_workers = 4, pin_memory = True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size = config['batch_size'], 
                                         shuffle = False, num_workers = 2)



In [9]:
# You can do this with ImageFolder as well, but it requires some tweaking
class ClassificationTestDataset(torch.utils.data.Dataset):

    def __init__(self, data_dir, transforms):
        self.data_dir   = data_dir
        self.transforms = transforms

        # This one-liner basically generates a sorted list of full paths to each image in the test directory
        self.img_paths  = list(map(lambda fname: os.path.join(self.data_dir, fname), sorted(os.listdir(self.data_dir))))

    def __len__(self):
        return len(self.img_paths)
    
    def __getitem__(self, idx):
        return self.transforms(Image.open(self.img_paths[idx]))

In [10]:
test_dataset = ClassificationTestDataset(TEST_DIR, transforms = val_transforms) #Why are we using val_transforms for Test Data?
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = config['batch_size'], shuffle = False,
                         drop_last = False, num_workers = 2)

In [None]:
print("Number of classes: ", len(train_dataset.classes))
print("No. of train images: ", train_dataset.__len__())
print("Shape of image: ", train_dataset[0][0].shape)
print("Batch size: ", config['batch_size'])
print("Train batches: ", train_loader.__len__())
print("Val batches: ", val_loader.__len__())

Number of classes:  7000
No. of train images:  140000
Shape of image:  torch.Size([3, 224, 224])
Batch size:  64
Train batches:  2188
Val batches:  547
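
The batch counts printed above follow from the dataset size and the batch size. A small sketch of the arithmetic, assuming the `DataLoader` keeps the final partial batch (the default `drop_last=False`, which the training loader above does not override):

```python
import math

num_train_images = 140000  # from the output above
batch_size = 64            # config['batch_size']

# With drop_last=False, the last (smaller) batch is kept, so we round up.
train_batches = math.ceil(num_train_images / batch_size)
last_batch_size = num_train_images - (train_batches - 1) * batch_size

print(train_batches, last_batch_size)
```

The final training batch is therefore only half full, which matters for any statistic that assumes every batch has exactly `batch_size` samples.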


# Very Simple Network (for Mandatory Early Submission)

In [None]:
# class Network(torch.nn.Module):
#     """
#     The Very Low early deadline architecture is a 4-layer CNN.

#     The first Conv layer has 64 channels, kernel size 7, and stride 4.
#     The next three have 128, 256, and 512 channels. Each has kernel size 3 and stride 2.
    
#     Think about strided convolutions from the lecture as convolution with stride = 1 followed by downsampling.
#     For stride 1 convolution, what padding do you need for preserving the spatial resolution? 
#     (Hint => padding = kernel_size // 2 - Why?)

#     Each Conv layer is accompanied by a Batchnorm and ReLU layer.
#     Finally, you want to average pool over the spatial dimensions to reduce them to 1 x 1. Use AdaptiveAvgPool2d.
#     Then, remove (Flatten?) these trivial 1x1 dimensions away.
#     Look through https://pytorch.org/docs/stable/nn.html 
    
#     TODO: Fill out the model definition below! 

#     Why does a very simple network have 4 convolutions?
#     Input images are 224x224. Note that each of these convolutions downsample.
#     Downsampling 2x effectively doubles the receptive field, increasing the spatial
#     region each pixel extracts features from. Downsampling 32x is standard
#     for most image models.

#     Why does a very simple network have high channel sizes?
#     Every time you downsample 2x, you do 4x less computation (at same channel size).
#     To maintain the same level of computation, you 2x increase # of channels, which 
#     increases computation by 4x. So, balances out to same computation.
#     Another intuition is - as you downsample, you lose spatial information. We want
#     to preserve some of it in the channel dimension.
#     """

#     def __init__(self, num_classes=7000):
#         super().__init__()

#         self.backbone = torch.nn.Sequential(
#             # TODO
#             torch.nn.Conv2d(3, 64, kernel_size=7, stride=4, padding=3, bias=False),
#             torch.nn.BatchNorm2d(64),
#             torch.nn.ReLU(),

#             torch.nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False),
#             torch.nn.BatchNorm2d(128),
#             torch.nn.ReLU(),

#             torch.nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1, bias=False),
#             torch.nn.BatchNorm2d(256),
#             torch.nn.ReLU(),

#             torch.nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1, bias=False),
#             torch.nn.BatchNorm2d(512),
#             torch.nn.ReLU(),

#             torch.nn.AdaptiveAvgPool2d((1, 1)), # For each channel, collapses (averages) the entire feature map (height & width) to 1x1
#             torch.nn.Flatten(), # the above ends up with batch_size x 512 x 1 x 1, flatten to batch_size x 512
#             ) 
        
#         self.cls_layer = torch.nn.Sequential(
#             torch.nn.Linear(512, num_classes)
#         )

#     def weight_init(m):
#       if isinstance(m, torch.nn.Linear):
#         torch.nn.init.kaiming_uniform_(m.weight)
#       elif isinstance(m, torch.nn.Conv2d):
#         torch.nn.init.kaiming_normal_(m.weight, mode='fan_out')
    
#     def forward(self, x, return_feats=False):
#         """
#         What is return_feats? It essentially returns the second-to-last-layer
#         features of a given image. It's a "feature encoding" of the input image,
#         and you can use it for the verification task. You would use the outputs
#         of the final classification layer for the classification task.

#         You might also find that the classification outputs are sometimes better
#         for verification too - try both.
#         """
#         feats = self.backbone(x)
#         out = self.cls_layer(feats)

#         if return_feats:
#             return feats
#         else:
#             return out
            
# #model = Network().to(device)
# #summary(model, (3, 224, 224))
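
The padding hint and the 32x downsampling mentioned in the docstring can be checked with the standard convolution output-size formula, `floor((n + 2p - k) / s) + 1`. The helper name `conv_out_size` is made up for this sketch; the layer settings are the ones used in the commented-out network above.

```python
def conv_out_size(n, kernel_size, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel_size) // stride + 1

# (kernel_size, stride, padding) for the four conv layers of the simple network
layers = [(7, 4, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1)]

size = 224
sizes = [size]
for k, s, p in layers:
    size = conv_out_size(size, k, s, p)
    sizes.append(size)

print(sizes)  # [224, 56, 28, 14, 7]

# With stride 1 and padding = kernel_size // 2 (odd kernels), the size is preserved.
same_size = conv_out_size(224, 7, 1, 7 // 2)
print(same_size)  # 224
```

Going from 224 to 7 is a total downsampling factor of 32, which is what the four strides (4, 2, 2, 2) multiply out to.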

# ConvNeXt Network (High Cutoff Submission)

ConvNeXt

In [14]:
class BasicConvNeXtBlock(torch.nn.Module):
  def __init__(self, in_channels, out_channels, exp_ratio, dropout=0.1):
    super().__init__()
    hidden_dim = in_channels * exp_ratio

    # Referring to an article on ConvNeXt from https://towardsdatascience.com/implementing-convnext-in-pytorch-7e37a67abba6
    # specific reference to bottleneck block in convnext-14.py 

    # depth-wise convolution with a bigger (7x7) kernel; the channel count is unchanged
    self.conv_1 = torch.nn.Sequential(
        torch.nn.Conv2d(in_channels, in_channels, kernel_size=7, padding=3, groups=in_channels),
        # torch.nn.GroupNorm(num_groups=1, num_channels=in_channels),
        torch.nn.BatchNorm2d(in_channels)
    )

    # narrow -> wide (1x1 convolution expands the channels by exp_ratio)
    self.conv_2 = torch.nn.Sequential(
        torch.nn.Conv2d(in_channels, hidden_dim, kernel_size=1, stride=1, padding=0),
        #torch.nn.ReLU(),
        #torch.nn.LeakyReLU(),
        #torch.nn.PReLU(),
        torch.nn.GELU(),
    )

    # wide -> narrow
    self.conv_3 = torch.nn.Sequential(
        torch.nn.Conv2d(hidden_dim, out_channels, kernel_size=1, stride=1, padding=0)
    )

    self.drop_path = torchvision.ops.StochasticDepth(p=dropout, mode="batch")

  def forward(self, x):
    out = self.conv_1(x)
    out = self.conv_2(out)
    out = self.conv_3(out)
    x = x + self.drop_path(out)
    return x
    
class Network(torch.nn.Module):

  def __init__(self, num_classes=7000):
    super().__init__()
    self.num_classes = num_classes
    self.layers = []

    # Defined ConvNeXt Stage Configurations from Facebook Research (Tiny ConvNeXt implementation)
    # def convnext_tiny(pretrained=False,in_22k=False, **kwargs):
    #   model = ConvNeXt(depths=[3, 3, 9, 3], dims=[96, 192, 384, 768], **kwargs)
    self.stage_config = [
        # Expansion Ratio = 4, No. of Channels, Depth
        [4, 96, 3],
        [4, 192, 3],
        [4, 384, 9],
        [4, 768, 3],
    ]

    print(self.stage_config)
    print(len(self.stage_config))

    # Stem: the first layer in the model that does the heavy downsampling of the input image.
    stem = torch.nn.Sequential(
        torch.nn.Conv2d(3, self.stage_config[0][1], kernel_size=4, stride=4),
        torch.nn.BatchNorm2d(96)) 

    self.layers.append(stem)
  
    for i in range(0, len(self.stage_config)):
      exp_ratio, in_channels, num_blocks = self.stage_config[i]

      for j in range(0, num_blocks):
        self.layers.append(BasicConvNeXtBlock(self.stage_config[i][1], self.stage_config[i][1], self.stage_config[i][0], dropout=0.1))
      
      if i < len(self.stage_config)-1:
        # Intermediate downsampling conv layers
        down_smpl = torch.nn.Sequential(
          torch.nn.BatchNorm2d(self.stage_config[i][1]),
          torch.nn.Conv2d(self.stage_config[i][1], self.stage_config[i+1][1], kernel_size=2, stride=2),
        )
        self.layers.append(down_smpl)
    
    self.layers = torch.nn.Sequential(*self.layers)

    self.embeddings = torch.nn.Sequential(
        torch.nn.AdaptiveAvgPool2d(1),
        torch.nn.BatchNorm2d(768),
        torch.nn.Flatten(),
    )

    # Classification Layer
    self.cls = torch.nn.Linear(768, self.num_classes)

  def forward(self, x, return_feats=False):
    """
    What is return_feats? It essentially returns the second-to-last-layer
    features of a given image. It's a "feature encoding" of the input image,
    and you can use it for the verification task. You would use the outputs
    of the final classification layer for the classification task.

    You might also find that the classification outputs are sometimes better
    for verification too - try both.
    """
    out = self.layers(x)
    feats = self.embeddings(out)
    out = self.cls(feats)
    
    if return_feats:
      return feats
    else:
      return out 

model = Network()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model= torch.nn.DataParallel(model)
model.to(device)

[[4, 96, 3], [4, 192, 3], [4, 384, 9], [4, 768, 3]]
4


DataParallel(
  (module): Network(
    (layers): Sequential(
      (0): Sequential(
        (0): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
        (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicConvNeXtBlock(
        (conv_1): Sequential(
          (0): Conv2d(96, 96, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=96)
          (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv_2): Sequential(
          (0): Conv2d(96, 384, kernel_size=(1, 1), stride=(1, 1))
          (1): GELU(approximate=none)
        )
        (conv_3): Sequential(
          (0): Conv2d(384, 96, kernel_size=(1, 1), stride=(1, 1))
        )
        (drop_path): StochasticDepth(p=0.1, mode=batch)
      )
      (2): BasicConvNeXtBlock(
        (conv_1): Sequential(
          (0): Conv2d(96, 96, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=96)
          (1): BatchNorm2d(9
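
To get a feel for where the parameters of one block go, the count can be worked out by hand from the layer shapes printed above. This is a back-of-the-envelope sketch: the function name `convnext_block_params` is hypothetical, and it counts only learnable weights and biases (BatchNorm running statistics are buffers, not parameters).

```python
def convnext_block_params(channels, exp_ratio):
    """Learnable parameters in one BasicConvNeXtBlock with in == out channels."""
    hidden = channels * exp_ratio
    depthwise = channels * 7 * 7 + channels   # 7x7 depthwise conv: one filter per channel + bias
    batchnorm = 2 * channels                  # BatchNorm2d scale and shift
    expand = channels * hidden + hidden       # 1x1 conv, channels -> hidden
    project = hidden * channels + channels    # 1x1 conv, hidden -> channels
    return depthwise + batchnorm + expand + project

# Stage configuration copied from the Network class: [exp_ratio, channels, depth]
stage_config = [[4, 96, 3], [4, 192, 3], [4, 384, 9], [4, 768, 3]]

for exp_ratio, channels, depth in stage_config:
    per_block = convnext_block_params(channels, exp_ratio)
    print(channels, per_block, depth * per_block)
```

The two 1x1 convolutions dominate the count; the 7x7 depthwise convolution is comparatively cheap because each filter sees only one channel.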

# Setup everything for training

In [None]:
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.2) 
optimizer = torch.optim.SGD(model.parameters(), lr=config['lr'], momentum=0.9, weight_decay=1e-4)
# scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.9, patience=1,verbose=True)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=(len(train_loader) * config['epochs']))
# You can try ReduceLRonPlateau, StepLR, MultistepLR, CosineAnnealing, etc.
scaler = torch.cuda.amp.GradScaler() # Good news. We have FP16 (Mixed precision training) implemented for you
# It is useful only in the case of compatible GPUs such as T4/V100
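
One consequence of `label_smoothing=0.2` is that the training loss cannot approach zero, even for a perfect classifier. The sketch below computes two reference values for 7000 classes using the usual definition of label smoothing (the target puts `1 - eps + eps/K` on the true class and `eps/K` on every other class); the variable names are illustrative only.

```python
import math

num_classes = 7000
eps = 0.2  # label_smoothing

# Smoothed target distribution: mass on the true class and on each other class.
p_true = 1 - eps + eps / num_classes
p_other = eps / num_classes

# Loss of a model that predicts a uniform distribution (roughly the untrained state).
uniform_loss = math.log(num_classes)

# Lowest achievable loss: the entropy of the smoothed target itself.
loss_floor = -(p_true * math.log(p_true)
               + (num_classes - 1) * p_other * math.log(p_other))

print(round(uniform_loss, 4), round(loss_floor, 4))
```

This is consistent with the training log in this notebook, where the loss starts near 8.8 and is still above 2.6 at over 99% training accuracy.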

# Let's train!

In [None]:
def train(model, dataloader, optimizer, criterion):
    
    model.train()

    # Progress Bar 
    batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, leave=False, position=0, desc='Train', ncols=5) 
    
    num_correct = 0
    total_loss = 0

    for i, (images, labels) in enumerate(dataloader):
        
        optimizer.zero_grad() # Zero gradients

        images, labels = images.to(device), labels.to(device)
        
        with torch.cuda.amp.autocast(): # This implements mixed precision. That's it! 
            outputs = model(images)
            loss = criterion(outputs, labels)

        # Update no. of correct predictions & loss as we iterate
        num_correct += int((torch.argmax(outputs, axis=1) == labels).sum())
        total_loss += float(loss.item())

        # tqdm lets you add some details so you can monitor training as you train.
        batch_bar.set_postfix(
            acc="{:.04f}%".format(100 * num_correct / (config['batch_size']*(i + 1))),
            loss="{:.04f}".format(float(total_loss / (i + 1))),
            num_correct=num_correct,
            lr="{:.04f}".format(float(optimizer.param_groups[0]['lr'])))
        
        scaler.scale(loss).backward() # This is a replacement for loss.backward()
        scaler.step(optimizer) # This is a replacement for optimizer.step()
        scaler.update() 
        scheduler.step() # CosineAnnealingLR is stepped once per batch here, since T_max is given in iterations
        # TODO? Depending on your choice of scheduler,
        # you may want to call some schedulers inside the train function. What are these?
        # (An epoch-level scheduler such as ReduceLROnPlateau would instead be stepped once per epoch, e.g. scheduler.step(val_loss).)
      
        batch_bar.update() # Update tqdm bar
    batch_bar.close() # You need this to close the tqdm bar

    acc = 100 * num_correct / (config['batch_size']* len(dataloader))
    total_loss = float(total_loss / len(dataloader))

    return acc, total_loss
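
Because `scheduler.step()` is called once per batch and `T_max` is `len(train_loader) * config['epochs']`, the learning rate follows a single half cosine over the whole run. Below is a sketch of the closed-form schedule, assuming `eta_min = 0` (the `CosineAnnealingLR` default) and 2188 batches per epoch as printed earlier; `cosine_lr` is a hypothetical helper, not a PyTorch function.

```python
import math

base_lr = 0.1            # config['lr']
steps_per_epoch = 2188   # len(train_loader)
epochs = 85              # config['epochs']
t_max = steps_per_epoch * epochs

def cosine_lr(step, eta_min=0.0):
    """Learning rate after `step` scheduler steps of cosine annealing."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))

# Learning rate at the start of selected epochs (1-indexed, as in the training log).
for epoch in (1, 3, 54):
    step = (epoch - 1) * steps_per_epoch
    print(epoch, round(cosine_lr(step), 4))
```

These values agree with the learning rates printed in the training log for epochs 1, 3, and 54 (0.1000, 0.0999, and 0.0311).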

In [None]:
def validate(model, dataloader, criterion):
  
    model.eval()
    batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, position=0, leave=False, desc='Val', ncols=5)

    num_correct = 0.0
    total_loss = 0.0

    for i, (images, labels) in enumerate(dataloader):
        
        # Move images to device
        images, labels = images.to(device), labels.to(device)
        
        # Get model outputs
        with torch.inference_mode():
            outputs = model(images)
            loss = criterion(outputs, labels)

        num_correct += int((torch.argmax(outputs, axis=1) == labels).sum())
        total_loss += float(loss.item())

        batch_bar.set_postfix(
            acc="{:.04f}%".format(100 * num_correct / (config['batch_size']*(i + 1))),
            loss="{:.04f}".format(float(total_loss / (i + 1))),
            num_correct=num_correct)

        batch_bar.update()
        
    batch_bar.close()
    acc = 100 * num_correct / (config['batch_size']* len(dataloader))
    total_loss = float(total_loss / len(dataloader))
    #scheduler.step(total_loss)
    return acc, total_loss
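
Note that both `train` and `validate` divide by `config['batch_size'] * len(dataloader)`, which slightly overcounts samples when the last batch is smaller than the rest. Here is a small sketch of the size of that effect for the training set, using the numbers printed earlier (the 100000 correct predictions is an arbitrary illustrative value, not a result from this notebook):

```python
import math

num_images = 140000
batch_size = 64
num_batches = math.ceil(num_images / batch_size)  # 2188, as len(train_loader)

assumed_total = batch_size * num_batches  # denominator used in the functions above
num_correct = 100000                      # arbitrary example value

reported_acc = 100 * num_correct / assumed_total
exact_acc = 100 * num_correct / num_images

print(assumed_total, round(exact_acc - reported_acc, 4))
```

Dividing by `len(dataloader.dataset)` instead would give the exact figure; the difference here is well under 0.1 percentage points.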

In [None]:
gc.collect() # These commands help you when you face CUDA OOM error
torch.cuda.empty_cache()



# Wandb

In [None]:
wandb.login(key="<your-wandb-api-key>") # API key is in your wandb account, under settings (wandb.ai/settings); do not share it

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [None]:
# # Create your wandb run
# run = wandb.init(
#     name = "early-submission", ## Wandb creates random run names if you skip this field
#     reinit = True, ### Allows reinitalizing runs when you re-run this cell
#     # run_id = ### Insert specific run id here if you want to resume a previous run
#     # resume = "must" ### You need this to resume previous runs, but comment out reinit = True when using this
#     project = "hw2p2-ablations-mid", ### Project should be created in your wandb account 
#     config = config ### Wandb Config for your run
# )

wandb: ERROR Error communicating with wandb process
wandb: ERROR For more info see: https://docs.wandb.ai/library/init#init-start-error


Problem at: <ipython-input-88-083a376a9176> 8 <module>


UsageError: ignored

# Experiments

In [None]:
best_valacc = 0.0

for epoch in range(config['epochs']):

    curr_lr = float(optimizer.param_groups[0]['lr'])

    train_acc, train_loss = train(model, train_loader, optimizer, criterion)
    
    print("\nEpoch {}/{}: \nTrain Acc {:.04f}%\t Train Loss {:.04f}\t Learning Rate {:.04f}".format(
        epoch + 1,
        config['epochs'],
        #10,
        train_acc,
        train_loss,
        curr_lr))
    
    val_acc, val_loss = validate(model, val_loader, criterion)
    #scheduler.step(val_loss)
    print("Val Acc {:.04f}%\t Val Loss {:.04f}".format(val_acc, val_loss))

    # wandb.log({"train_loss":train_loss, 'train_Acc': train_acc, 'validation_Acc':val_acc, 
    #           'validation_loss': val_loss, "learning_Rate": curr_lr})
    
    # If you are using a scheduler in your train function within your iteration loop, you may want to log
    # your learning rate differently 

    # #Save model in drive location if val_acc is better than best recorded val_acc
    if (val_acc >= best_valacc):
      #path = os.path.join(root, model_directory, 'checkpoint' + '.pth')
      print("Saving model")
      torch.save({'model_state_dict':model.state_dict(),
                  'optimizer_state_dict':optimizer.state_dict(),
                  'scheduler_state_dict':scheduler.state_dict(),
                  'val_acc': val_acc, 
                  'epoch': epoch}, './checkpoint.pth')
      best_valacc = val_acc
      # wandb.save('checkpoint.pth')
      # You may find it interesting to explore Wandb Artifacts to version your models
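# run.finish() # Uncomment only if a wandb run was created with wandb.init() above; otherwise `run` is undefined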




Epoch 1/85: 
Train Acc 0.0557%	 Train Loss 8.7686	 Learning Rate 0.1000




Val Acc 0.1600%	 Val Loss 8.4159
Saving model





Epoch 2/85: 
Train Acc 0.5227%	 Train Loss 8.1969	 Learning Rate 0.1000




Val Acc 1.6453%	 Val Loss 7.7482
Saving model





Epoch 3/85: 
Train Acc 3.1093%	 Train Loss 7.4953	 Learning Rate 0.0999




Val Acc 6.0672%	 Val Loss 7.1168
Saving model





Epoch 4/85: 
Train Acc 10.8211%	 Train Loss 6.7336	 Learning Rate 0.0997




Val Acc 18.7214%	 Val Loss 6.2719
Saving model





Epoch 5/85: 
Train Acc 23.5289%	 Train Loss 6.0096	 Learning Rate 0.0995




Val Acc 33.1753%	 Val Loss 5.5861
Saving model





Epoch 6/85: 
Train Acc 37.8699%	 Train Loss 5.3775	 Learning Rate 0.0991




Val Acc 43.2016%	 Val Loss 5.2370
Saving model





Epoch 7/85: 
Train Acc 49.3173%	 Train Loss 4.8969	 Learning Rate 0.0988




Val Acc 51.0255%	 Val Loss 4.8582
Saving model





Epoch 8/85: 
Train Acc 57.6418%	 Train Loss 4.5529	 Learning Rate 0.0983




Val Acc 57.1955%	 Val Loss 4.6079
Saving model





Epoch 9/85: 
Train Acc 63.9075%	 Train Loss 4.3025	 Learning Rate 0.0978




Val Acc 62.0915%	 Val Loss 4.4071
Saving model





Epoch 10/85: 
Train Acc 68.4501%	 Train Loss 4.1131	 Learning Rate 0.0973




Val Acc 65.5250%	 Val Loss 4.3223
Saving model





Epoch 11/85: 
Train Acc 71.8864%	 Train Loss 3.9625	 Learning Rate 0.0966




Val Acc 66.7447%	 Val Loss 4.2059
Saving model





Epoch 12/85: 
Train Acc 74.6708%	 Train Loss 3.8423	 Learning Rate 0.0959




Val Acc 68.7414%	 Val Loss 4.1693
Saving model





Epoch 13/85: 
Train Acc 77.3445%	 Train Loss 3.7257	 Learning Rate 0.0952




Val Acc 71.3351%	 Val Loss 4.0631
Saving model





Epoch 14/85: 
Train Acc 79.4461%	 Train Loss 3.6343	 Learning Rate 0.0943




Val Acc 71.3437%	 Val Loss 4.0523
Saving model





Epoch 15/85: 
Train Acc 80.7137%	 Train Loss 3.5690	 Learning Rate 0.0935




Val Acc 71.2866%	 Val Loss 4.0389





Epoch 16/85: 
Train Acc 82.2483%	 Train Loss 3.5011	 Learning Rate 0.0925




Val Acc 72.5377%	 Val Loss 3.9682
Saving model





Epoch 17/85: 
Train Acc 83.3581%	 Train Loss 3.4484	 Learning Rate 0.0915




Val Acc 72.8291%	 Val Loss 3.9638
Saving model





Epoch 18/85: 
Train Acc 84.5742%	 Train Loss 3.3956	 Learning Rate 0.0905




Val Acc 74.1802%	 Val Loss 3.9304
Saving model





Epoch 19/85: 
Train Acc 85.4226%	 Train Loss 3.3494	 Learning Rate 0.0893




Val Acc 74.6258%	 Val Loss 3.9332
Saving model





Epoch 20/85: 
Train Acc 86.9744%	 Train Loss 3.2909	 Learning Rate 0.0882




Val Acc 73.4146%	 Val Loss 3.9396





Epoch 21/85: 
Train Acc 87.4236%	 Train Loss 3.2629	 Learning Rate 0.0870




Val Acc 73.8431%	 Val Loss 3.9325





Epoch 22/85: 
Train Acc 88.1034%	 Train Loss 3.2354	 Learning Rate 0.0857




Val Acc 75.2628%	 Val Loss 3.8686
Saving model





Epoch 23/85: 
Train Acc 89.5910%	 Train Loss 3.1740	 Learning Rate 0.0844




Val Acc 75.5285%	 Val Loss 3.8848
Saving model





Epoch 24/85: 
Train Acc 89.9944%	 Train Loss 3.1567	 Learning Rate 0.0830




Val Acc 75.7684%	 Val Loss 3.8785
Saving model





Epoch 25/85: 
Train Acc 90.8414%	 Train Loss 3.1223	 Learning Rate 0.0816




Val Acc 76.4768%	 Val Loss 3.8836
Saving model





Epoch 26/85: 
Train Acc 91.6169%	 Train Loss 3.0907	 Learning Rate 0.0801




Val Acc 76.0283%	 Val Loss 3.8630





Epoch 27/85: 
Train Acc 91.8733%	 Train Loss 3.0763	 Learning Rate 0.0786




Val Acc 76.3854%	 Val Loss 3.8245





Epoch 28/85: 
Train Acc 92.5181%	 Train Loss 3.0440	 Learning Rate 0.0771




Val Acc 76.3397%	 Val Loss 3.8375





Epoch 29/85: 
Train Acc 93.1801%	 Train Loss 3.0176	 Learning Rate 0.0755




Val Acc 77.9793%	 Val Loss 3.7798
Saving model





Epoch 30/85: 
Train Acc 93.3672%	 Train Loss 3.0091	 Learning Rate 0.0739




Val Acc 77.2252%	 Val Loss 3.8026





Epoch 31/85: 
Train Acc 93.8907%	 Train Loss 2.9878	 Learning Rate 0.0723




Val Acc 77.1452%	 Val Loss 3.7873





Epoch 32/85: 
Train Acc 94.5605%	 Train Loss 2.9543	 Learning Rate 0.0706




Val Acc 79.2019%	 Val Loss 3.7299
Saving model





Epoch 33/85: 
Train Acc 95.0704%	 Train Loss 2.9304	 Learning Rate 0.0689




Val Acc 76.9367%	 Val Loss 3.8138





Epoch 34/85: 
Train Acc 95.1754%	 Train Loss 2.9217	 Learning Rate 0.0672




Val Acc 77.9479%	 Val Loss 3.7886





Epoch 35/85: 
Train Acc 95.2318%	 Train Loss 2.9168	 Learning Rate 0.0655




Val Acc 77.2481%	 Val Loss 3.8190





Epoch 36/85: 
Train Acc 95.7260%	 Train Loss 2.8975	 Learning Rate 0.0637




Val Acc 79.1562%	 Val Loss 3.7325





Epoch 37/85: 
Train Acc 96.4158%	 Train Loss 2.8619	 Learning Rate 0.0619




Val Acc 78.4221%	 Val Loss 3.7465





Epoch 38/85: 
Train Acc 96.8007%	 Train Loss 2.8415	 Learning Rate 0.0601




Val Acc 80.3445%	 Val Loss 3.6584
Saving model





Epoch 39/85: 
Train Acc 96.7922%	 Train Loss 2.8351	 Learning Rate 0.0583




Val Acc 80.2959%	 Val Loss 3.6791





Epoch 40/85: 
Train Acc 97.1014%	 Train Loss 2.8260	 Learning Rate 0.0564




Val Acc 80.8844%	 Val Loss 3.6632
Saving model





Epoch 41/85: 
Train Acc 97.4113%	 Train Loss 2.8003	 Learning Rate 0.0546




Val Acc 81.8441%	 Val Loss 3.6145
Saving model





Epoch 42/85: 
Train Acc 97.6770%	 Train Loss 2.7831	 Learning Rate 0.0528




Val Acc 81.5642%	 Val Loss 3.6615





Epoch 43/85: 
Train Acc 97.9055%	 Train Loss 2.7672	 Learning Rate 0.0509




Val Acc 82.4297%	 Val Loss 3.5882
Saving model





Epoch 44/85: 
Train Acc 98.0876%	 Train Loss 2.7576	 Learning Rate 0.0491




Val Acc 82.0755%	 Val Loss 3.6370





Epoch 45/85: 
Train Acc 98.2325%	 Train Loss 2.7432	 Learning Rate 0.0472




Val Acc 81.0500%	 Val Loss 3.7203





Epoch 46/85: 
Train Acc 98.4096%	 Train Loss 2.7284	 Learning Rate 0.0454




Val Acc 83.4295%	 Val Loss 3.5857
Saving model





Epoch 47/85: 
Train Acc 98.7331%	 Train Loss 2.7035	 Learning Rate 0.0436




Val Acc 83.0724%	 Val Loss 3.6638





Epoch 48/85: 
Train Acc 98.8260%	 Train Loss 2.6886	 Learning Rate 0.0417




Val Acc 82.2298%	 Val Loss 3.6020





Epoch 49/85: 
Train Acc 99.0245%	 Train Loss 2.6711	 Learning Rate 0.0399




Val Acc 83.6894%	 Val Loss 3.5249
Saving model





Epoch 50/85: 
Train Acc 99.1795%	 Train Loss 2.6571	 Learning Rate 0.0381




Val Acc 84.4121%	 Val Loss 3.5890
Saving model





Epoch 51/85: 
Train Acc 99.1723%	 Train Loss 2.6491	 Learning Rate 0.0363




Val Acc 83.3181%	 Val Loss 3.5594





Epoch 52/85: 
Train Acc 99.3009%	 Train Loss 2.6347	 Learning Rate 0.0345




Val Acc 83.8980%	 Val Loss 3.4950





Epoch 53/85: 
Train Acc 99.3844%	 Train Loss 2.6208	 Learning Rate 0.0328




Val Acc 85.6033%	 Val Loss 3.4668
Saving model





Epoch 54/85: 
Train Acc 99.4930%	 Train Loss 2.6060	 Learning Rate 0.0311




Val Acc 85.6947%	 Val Loss 3.4575
Saving model





Epoch 55/85: 
Train Acc 99.5130%	 Train Loss 2.5964	 Learning Rate 0.0294




Val Acc 85.8889%	 Val Loss 3.4836
Saving model





Epoch 56/85: 
Train Acc 99.4844%	 Train Loss 2.5900	 Learning Rate 0.0277




Val Acc 86.9116%	 Val Loss 3.4182
Saving model





Epoch 57/85: 
Train Acc 99.6908%	 Train Loss 2.5624	 Learning Rate 0.0261




Val Acc 86.4174%	 Val Loss 3.5889





Epoch 58/85: 
Train Acc 99.7458%	 Train Loss 2.5521	 Learning Rate 0.0245




Val Acc 86.2403%	 Val Loss 3.5790





Epoch 59/85: 
Train Acc 99.7151%	 Train Loss 2.5479	 Learning Rate 0.0229




Val Acc 86.8087%	 Val Loss 3.4424





Epoch 60/85: 
Train Acc 99.7579%	 Train Loss 2.5340	 Learning Rate 0.0214




Val Acc 87.6228%	 Val Loss 3.3833
Saving model





Epoch 61/85: 
Train Acc 99.8372%	 Train Loss 2.5162	 Learning Rate 0.0199




Val Acc 87.7657%	 Val Loss 3.3776
Saving model





Epoch 62/85: 
Train Acc 99.8329%	 Train Loss 2.5078	 Learning Rate 0.0184




Val Acc 87.9399%	 Val Loss 3.3050
Saving model





Epoch 63/85: 
Train Acc 99.8572%	 Train Loss 2.4983	 Learning Rate 0.0170




Val Acc 87.9256%	 Val Loss 3.3555





Epoch 64/85: 
Train Acc 99.8736%	 Train Loss 2.4852	 Learning Rate 0.0156




Val Acc 88.3798%	 Val Loss 3.4016
Saving model





Epoch 65/85: 
Train Acc 99.8807%	 Train Loss 2.4791	 Learning Rate 0.0143




Val Acc 88.3655%	 Val Loss 3.4162





Epoch 66/85: 
Train Acc 99.8986%	 Train Loss 2.4722	 Learning Rate 0.0130




Val Acc 89.1853%	 Val Loss 3.3963
Saving model





Epoch 67/85: 
Train Acc 99.8915%	 Train Loss 2.4645	 Learning Rate 0.0118




Val Acc 88.5426%	 Val Loss 3.3730





Epoch 68/85: 
Train Acc 99.8915%	 Train Loss 2.4596	 Learning Rate 0.0107




Val Acc 89.1968%	 Val Loss 3.2657
Saving model





Epoch 69/85: 
Train Acc 99.9150%	 Train Loss 2.4512	 Learning Rate 0.0095




Val Acc 89.6081%	 Val Loss 3.2771
Saving model





Epoch 70/85: 
Train Acc 99.9179%	 Train Loss 2.4443	 Learning Rate 0.0085




Val Acc 89.2339%	 Val Loss 3.4945





Epoch 71/85: 
Train Acc 99.9207%	 Train Loss 2.4406	 Learning Rate 0.0075




Val Acc 89.7052%	 Val Loss 3.3680
Saving model





Epoch 72/85: 
Train Acc 99.9022%	 Train Loss 2.4330	 Learning Rate 0.0065




Val Acc 89.6681%	 Val Loss 3.2729





Epoch 73/85: 
Train Acc 99.9443%	 Train Loss 2.4263	 Learning Rate 0.0057




Val Acc 89.9166%	 Val Loss 3.2779
Saving model





Epoch 74/85: 
Train Acc 99.9293%	 Train Loss 2.4226	 Learning Rate 0.0048




Val Acc 90.2165%	 Val Loss 3.2301
Saving model





Epoch 75/85: 
Train Acc 99.9429%	 Train Loss 2.4202	 Learning Rate 0.0041




Val Acc 90.1423%	 Val Loss 3.2809





Epoch 76/85: 
Train Acc 99.9364%	 Train Loss 2.4225	 Learning Rate 0.0034




Val Acc 90.5450%	 Val Loss 3.3281
Saving model





Epoch 77/85: 
Train Acc 99.9593%	 Train Loss 2.4125	 Learning Rate 0.0027




Val Acc 90.5336%	 Val Loss 3.2894





Epoch 78/85: 
Train Acc 99.9579%	 Train Loss 2.4081	 Learning Rate 0.0022




Val Acc 90.3708%	 Val Loss 3.2845





Epoch 79/85: 
Train Acc 99.9264%	 Train Loss 2.4147	 Learning Rate 0.0017




Val Acc 90.2251%	 Val Loss 3.2315





Epoch 80/85: 
Train Acc 99.8822%	 Train Loss 2.4123	 Learning Rate 0.0012




Val Acc 90.3993%	 Val Loss 3.2295





Epoch 81/85: 
Train Acc 99.9629%	 Train Loss 2.4064	 Learning Rate 0.0009




Val Acc 90.4593%	 Val Loss 3.2342





Epoch 82/85: 
Train Acc 99.9393%	 Train Loss 2.4075	 Learning Rate 0.0005




Val Acc 90.6564%	 Val Loss 3.2267
Saving model





Epoch 83/85: 
Train Acc 99.9407%	 Train Loss 2.4061	 Learning Rate 0.0003




Val Acc 90.5764%	 Val Loss 3.3279





Epoch 84/85: 
Train Acc 99.9229%	 Train Loss 2.4059	 Learning Rate 0.0001




Val Acc 90.5536%	 Val Loss 3.2220





Epoch 85/85: 
Train Acc 99.9493%	 Train Loss 2.4068	 Learning Rate 0.0000


                                                                                                      

Val Acc 90.6021%	 Val Loss 3.2390
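
The learning-rate column in the log above falls from 0.1 toward 0 in a pattern that is numerically consistent with cosine annealing over 85 epochs (the kind of schedule `torch.optim.lr_scheduler.CosineAnnealingLR` produces). The scheduler itself is configured elsewhere in the notebook, so the following is only a sketch under that assumption. The function name `cosine_annealed_lr` and its parameters are hypothetical, chosen for illustration.

```python
import math

def cosine_annealed_lr(epoch_index, base_lr=0.1, total_epochs=85, min_lr=0.0):
    """Closed-form cosine annealing; epoch_index counts completed epochs from 0."""
    cosine = math.cos(math.pi * epoch_index / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)

# Compare against a few learning rates printed in the log.
# The value logged for epoch N lines up with epoch_index = N - 1.
for epoch, logged in [(22, 0.0857), (43, 0.0509), (60, 0.0214), (74, 0.0048)]:
    print(epoch, logged, round(cosine_annealed_lr(epoch - 1), 4))
```

If the assumption holds, this also explains why validation accuracy keeps improving late in training: the step size shrinks smoothly instead of dropping at fixed milestones.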




# Classification Task: Testing

In [None]:
def test(model,dataloader):

  model.eval()
  batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, position=0, leave=False, desc='Test')
  test_results = []
  
  for images in dataloader:
      images = images.to(device)

      # No gradients are needed when predicting on the test set
      with torch.inference_mode():
        outputs = model(images)

      # The predicted class is the index of the largest logit
      outputs = torch.argmax(outputs, dim=1).detach().cpu().numpy().tolist()
      test_results.extend(outputs)
      
      batch_bar.update()
      
  batch_bar.close()
  return test_results

In [None]:
test_results = test(model, test_loader)



## Generate csv to submit to Kaggle

In [None]:
with open("classification_submission.csv", "w+") as f:
    f.write("id,label\n")
    for i in range(len(test_dataset)):
        f.write("{},{}\n".format(str(i).zfill(6) + ".jpg", test_results[i]))

In [None]:
!kaggle competitions submit -c 11-785-f22-hw2p2-classification-slack -f ./classification_submission.csv -m "Low-Cutoff Submission"

100% 541k/541k [00:02<00:00, 241kB/s]
Successfully submitted to Face Recognition (Slack)

# Verification Task: Validation

The verification task consists of the following generalized scenario:
- You are given X unknown identities
- You are given Y known identities
- Your goal is to match the X unknown identities to the Y known identities.

We have given you a verification dataset that consists of 1000 known identities and 1000 unknown identities. The 1000 unknown identities are split into dev (200) and test (800). Your goal is to compare the unknown identities to the 1000 known identities and assign an identity to each image from the set of unknown identities.

You will use or fine-tune your model trained for classification to compare images between known and unknown identities using a similarity metric and assign labels to the unknown identities.

This will judge your model's performance in terms of the quality of embeddings/features it generates on images/faces it has never seen during training for classification.
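
As a minimal sketch of the matching step described above, the following toy example assigns each unknown embedding to the known identity with the highest cosine similarity. It uses plain Python and made-up 2-D vectors so it runs without a model. The identity names, the vectors, and the helper names `cosine_similarity` and `assign_identity` are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 2-D embeddings; real feature vectors are much longer.
known = {"id_A": [1.0, 0.0], "id_B": [0.0, 1.0], "id_C": [1.0, 1.0]}
unknown = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.6]]

def assign_identity(query, gallery):
    """Return the gallery identity whose embedding is most similar to query."""
    return max(gallery, key=lambda name: cosine_similarity(query, gallery[name]))

assigned = [assign_identity(u, known) for u in unknown]
print(assigned)  # ['id_A', 'id_B', 'id_C']
```

The cells below do the same thing at scale: the model produces the embeddings, and the similarity and argmax are computed on tensors.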

In [None]:
known_regex = "/content/data/verification/known/*/*"
known_paths = [i.split('/')[-2] for i in sorted(glob.glob(known_regex))] 
# This obtains the list of known identities from the known folder

### Validation Directory
unknown_regex = "/content/data/verification/unknown_dev/*"

### Test Directory
#unknown_regex = "/content/data/verification/unknown_test/*"

# We load the images from known and unknown folders
unknown_images = [Image.open(p) for p in tqdm(sorted(glob.glob(unknown_regex)))]
known_images = [Image.open(p) for p in tqdm(sorted(glob.glob(known_regex)))]

print (len(unknown_images))
print (len(known_images))
# Why do you need only ToTensor() here?
transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor()])

unknown_images = torch.stack([transforms(x) for x in unknown_images])
known_images  = torch.stack([transforms(y) for y in known_images ])
#Print your shapes here to understand what we have done
print (unknown_images.size())
print (known_images.size())
# You can use other similarity metrics like Euclidean Distance if you wish
similarity_metric = torch.nn.CosineSimilarity(dim= 1, eps= 1e-6) 

100%|██████████| 200/200 [00:00<00:00, 5208.02it/s]
100%|██████████| 1000/1000 [00:00<00:00, 2551.61it/s]


200
1000
torch.Size([200, 3, 224, 224])
torch.Size([1000, 3, 224, 224])
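
The comment in the cell above mentions Euclidean distance as an alternative metric. A minimal sketch of that option, assuming features arrive as `(N, D)` tensors, could use `torch.cdist`. Because a smaller distance means a closer match, `argmin` takes the place of `argmax`. The function name `euclidean_predictions` and the toy tensors are hypothetical.

```python
import torch

def euclidean_predictions(unknown_feats, known_feats):
    """Index of the nearest known embedding for each unknown embedding."""
    # torch.cdist returns a (num_unknown, num_known) matrix of L2 distances.
    distances = torch.cdist(unknown_feats, known_feats, p=2)
    # Smaller distance means more similar, so take argmin over the known axis.
    return distances.argmin(dim=1)

# Toy features standing in for model outputs (assumed shapes: N x D).
known_feats = torch.tensor([[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
unknown_feats = torch.tensor([[0.9, 0.2], [2.5, 2.9]])
print(euclidean_predictions(unknown_feats, known_feats))  # tensor([0, 2])
```

Unlike cosine similarity, Euclidean distance is sensitive to the norm of the features, so the two metrics can rank candidates differently unless the features are L2-normalized first.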


In [None]:
def eval_verification(unknown_images, known_images, model, similarity, batch_size= config['batch_size'], mode='val'): 

    unknown_feats, known_feats = [], []

    # Round up so a final partial batch is counted in the progress bar
    batch_bar = tqdm(total=(len(unknown_images) + batch_size - 1)//batch_size, dynamic_ncols=True, position=0, leave=False, desc=mode)
    model.eval()

    # Process the images in batches to limit memory use and avoid CUDA OOM errors
    for i in range(0, unknown_images.shape[0], batch_size):
        unknown_batch = unknown_images[i:i+batch_size] # Slice a portion of up to batch_size images
        
        with torch.no_grad():
            unknown_feat = model(unknown_batch.float().to(device), return_feats=True) #Get features from model         
        unknown_feats.append(unknown_feat)
        batch_bar.update()
    
    batch_bar.close()
    
    # Round up so a final partial batch is counted in the progress bar
    batch_bar = tqdm(total=(len(known_images) + batch_size - 1)//batch_size, dynamic_ncols=True, position=0, leave=False, desc=mode)
    
    for i in range(0, known_images.shape[0], batch_size):
        known_batch = known_images[i:i+batch_size] 
        with torch.no_grad():
            known_feat = model(known_batch.float().to(device), return_feats=True) #Get features from model

        known_feats.append(known_feat)
        batch_bar.update()

    batch_bar.close()

    # Concatenate all the batches
    unknown_feats = torch.cat(unknown_feats, dim=0)
    known_feats = torch.cat(known_feats, dim=0)

    similarity_values = torch.stack([similarity(unknown_feats, known_feature) for known_feature in known_feats])
    # Print the inner list comprehension in a separate cell - what is really happening?
    print(f"similarity_values: {len(similarity_values)}")

    predictions = similarity_values.argmax(0).cpu().numpy() #Why are we doing an argmax here?
    print(f"predictions number: {len(predictions)}")
    # Map argmax indices to identity strings
    pred_id_strings = [known_paths[i] for i in predictions]
    
    if mode == 'val':
        true_ids = pd.read_csv('/content/data/verification/dev_identities.csv')['label'].tolist()
        print(f"true_ids: {len(true_ids)}")
        print(f"pred_id_strings: {len(pred_id_strings)}")
        # Ground-truth labels go first, predictions second
        accuracy = accuracy_score(true_ids, pred_id_strings)
        print("Verification Accuracy = {}".format(accuracy))
    
    return pred_id_strings
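
To answer the question in the comments of `eval_verification` about what the list comprehension does, the sketch below runs it on small random tensors. The shapes (5 unknown, 3 known, 8 feature dimensions) are arbitrary stand-ins for real model features. It also checks the stacked result against a single matrix product of L2-normalized features, which is one common way to compute the same similarity matrix without a Python loop.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
unknown_feats = torch.randn(5, 8)   # stand-in for 5 unknown embeddings
known_feats = torch.randn(3, 8)     # stand-in for 3 known embeddings

similarity = torch.nn.CosineSimilarity(dim=1, eps=1e-6)

# One iteration: a single known vector of shape (8,) broadcasts against (5, 8),
# giving one similarity score per unknown image.
one_row = similarity(unknown_feats, known_feats[0])
print(one_row.shape)  # torch.Size([5])

# Stacking the rows gives a (num_known, num_unknown) matrix.
looped = torch.stack([similarity(unknown_feats, k) for k in known_feats])

# Same matrix from L2-normalized features and one matrix product.
vectorized = F.normalize(known_feats, dim=1) @ F.normalize(unknown_feats, dim=1).T

print(looped.shape)  # torch.Size([3, 5])
print(torch.allclose(looped, vectorized, atol=1e-5))

# argmax over dim 0 picks, for each unknown image, the best-matching known index.
predictions = looped.argmax(0)
```

Since row `j` of the matrix holds the similarity of every unknown image to known image `j`, taking `argmax(0)` selects the most similar known image for each column, which is why the function ends up with one prediction per unknown image.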

In [None]:
# If we are validating
pred_id_strings = eval_verification(unknown_images, known_images, model, similarity_metric, config['batch_size'], mode='val')



similarity_values: 1000
predictions number: 200
true_ids: 200
pred_id_strings: 200
Verification Accuracy = 0.67


In [None]:
known_regex = "/content/data/verification/known/*/*"
known_paths = [i.split('/')[-2] for i in sorted(glob.glob(known_regex))] 
# This obtains the list of known identities from the known folder

### Validation Directory
#unknown_regex = "/content/data/verification/unknown_dev/*"

### Test Directory
unknown_regex = "/content/data/verification/unknown_test/*"

# We load the images from known and unknown folders
unknown_images = [Image.open(p) for p in tqdm(sorted(glob.glob(unknown_regex)))]
known_images = [Image.open(p) for p in tqdm(sorted(glob.glob(known_regex)))]

print (len(unknown_images))
print (len(known_images))
# Why do you need only ToTensor() here?
transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor()])

unknown_images = torch.stack([transforms(x) for x in unknown_images])
known_images  = torch.stack([transforms(y) for y in known_images ])
#Print your shapes here to understand what we have done
print (unknown_images.size())
print (known_images.size())
# You can use other similarity metrics like Euclidean Distance if you wish
similarity_metric = torch.nn.CosineSimilarity(dim= 1, eps= 1e-6) 

100%|██████████| 800/800 [00:00<00:00, 2657.22it/s]
100%|██████████| 1000/1000 [00:00<00:00, 5128.48it/s]


800
1000
torch.Size([800, 3, 224, 224])
torch.Size([1000, 3, 224, 224])


In [None]:
# If we are testing
pred_id_strings = eval_verification(unknown_images, known_images, model, similarity_metric, config['batch_size'], mode='test')



similarity_values: 1000
predictions number: 800


In [None]:
with open("verification_submission.csv", "w+") as f:
    f.write("id,label\n")
    for i in range(len(pred_id_strings)):
        f.write("{},{}\n".format(i, pred_id_strings[i]))

In [None]:
!kaggle competitions submit -c 11-785-f22-hw2p2-verification-slack -f ./verification_submission.csv -m "Low-Cutoff Submission"

100% 9.28k/9.28k [00:02<00:00, 4.55kB/s]
Successfully submitted to Face Verification (Slack)