## Student Name: Oguzhan Aksoy

# Objective

In the scope of this assignment, your task is to train a neural network model for face recognition.

At test time, we will provide your model with pairs of images. The two images in a pair may show the same person (label 0) or different people (label 1). We will compute the cosine distance between the descriptors your model predicts for each pair and use this distance as the score in the label classification task. The metric is TPR@FPR=0.05.
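As a rough illustration of the evaluation described above, the sketch below computes the cosine distance for one pair of descriptors and a TPR at a fixed FPR from a set of pair scores. The function names `cosine_distance` and `tpr_at_fpr` are hypothetical, and the use of `sklearn.metrics.roc_curve` is an assumption; the actual grader may compute the metric differently (for example, with interpolation between ROC points).

```python
import numpy as np
from sklearn.metrics import roc_curve


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two descriptors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def tpr_at_fpr(labels, scores, target_fpr=0.05):
    """Highest TPR among ROC points whose FPR does not exceed target_fpr.

    Assumption: label 1 ("different people") is the positive class, so a
    larger distance should indicate label 1.
    """
    fpr, tpr, _ = roc_curve(labels, scores)
    # roc_curve always includes the point (0, 0), so the mask is never empty.
    return float(tpr[fpr <= target_fpr].max())
```

Because the score is a distance, a useful model should produce small values for same-person pairs and large values for different-person pairs.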

# Scoring Criteria
(max 15 points, 20% of the final grade)


You will be awarded up to 15 points based on the performance of your model on the test dataset.

- If your model outperforms the baseline (gradescope submission named "BASELINE"), you will be scored 15.
- If your model's score X satisfies 0.5 * Y <= X <= Y, where Y is the baseline score, you will be scored (X / (0.5 * Y) - 1) * 15.
- If your model's score is below 50% of the baseline score, you will be scored 0.
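The three rules above can be written as a small function. This is only a sketch of the stated formula; the name `homework_points` is hypothetical and the grader's own implementation is not shown in this document.

```python
def homework_points(x, y, max_points=15.0):
    """Points for model score x given baseline score y (both TPR@FPR=0.05)."""
    if x > y:
        # Outperforming the baseline gives full points.
        return max_points
    if x >= 0.5 * y:
        # Linear ramp: 0 points at x = 0.5 * y, full points at x = y.
        return (x / (0.5 * y) - 1.0) * max_points
    # Below 50% of the baseline score.
    return 0.0
```

For example, with a baseline of 0.5, a score of 0.375 lies halfway along the ramp and earns half of the available points.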

Notes:
- The primary metric for scoring your models is TPR@FPR=0.05. We also print AUROC.
- Be aware that the platform automatically runs a fairly sophisticated code similarity comparison. Share your solutions at your own risk of losing homework points. We will reduce the scores of parties with high code similarity by a factor of N, where N is the number of party members.
- Any external sources or code used should be properly cited.
- You can use generic pretrained (e.g., ImageNet-pretrained) neural network models, but not models pretrained on LFW or other face datasets.
- If you find any apparent security leaks, exploit them at your own risk of losing homework points. We will set your Homework 2 score to 0 if we detect any sign of security leak abuse.
- Do not name your submission "BASELINE".

# Submission Guidelines
Submit the classifier.py file along with any other necessary files for model loading.

- Name your script file classifier.py containing a class named Classifier. Within the __init__ method of this class, initialize the neural network; no arguments will be passed (we will call it as clf = Classifier()).
- The class should have a method named load_model, which also takes no arguments and loads your model weights from the file(s) you uploaded. For example, if you uploaded model.pth along with classifier.py, your code can access it at "/autograder/submission/model.pth". (Next, we will call clf.load_model().) If you're using PyTorch, do not forget to call self.model.eval() after loading the model.
- The class should have a method named calculate_descriptors. This method should take raw images (numpy arrays of shape (112, 112, 3)) as input and return the corresponding descriptors (numpy arrays of shape (N, )). Here, N is the size of your descriptors and can be arbitrary, but we recommend keeping it small (e.g., 256 or 512) so that your submission does not run out of RAM (6 GB).

Upload (submit) your files to Homework 2 of our [MA030348 course](https://www.gradescope.com/courses/646549).

You can find an example of the correct classifier.py structure (PyTorch implementation) in the course files: link.

# Data


To train the algorithm, we provide a train set [here](https://drive.google.com/file/d/1lHYhtP-84qrwHM7f3dDXFyYQPuIfRmqt/view) (approx. 860 MB). The train folder contains *.jpg images and a labels.json file holding a Python dict that maps "image ID: person ID".

To test the algorithm, we'll use a different test set of images (classes are the same).

Note: we read all images via `skimage.io.imread`.

# The platform specifications

You are provided with the following resources on Gradescope: 40 minutes of processing time, 4 CPU cores, 6 GB RAM.

The platform provides a Python 3.11 environment (activated by default) with the pip packages needed to solve the problem preinstalled: `numpy`, `scipy`, `scikit-image`, `scikit-learn`, `opencv`, `torch`, `tensorflow`, `torchvision`, `keras`, `imutils`, `timm`, `gdown`.

All additional files (e.g., model weights) that you submit along with your scripts can be found at "/autograder/submission/".

If your model weights are larger than 100 MB, you won't be able to upload them directly. In this case, you can upload your model to Google Drive and then download it inside the `load_model` function via `gdown`.
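One way to handle the large-weights case is sketched below. The helper name `ensure_weights` and the placeholder file ID are hypothetical. The sketch assumes the Google Drive file is shared publicly and that the installed `gdown` version accepts the `id` keyword of `gdown.download`.

```python
import os


def ensure_weights(path, file_id):
    """Return `path`, downloading the file from Google Drive first if it is missing."""
    if not os.path.exists(path):
        # Imported lazily so the helper can be used without gdown
        # when the weights are already present on disk.
        import gdown

        gdown.download(id=file_id, output=path, quiet=True)
    return path


# Hypothetical usage inside Classifier.load_model:
#   weights = ensure_weights("model.pth", "<your-google-drive-file-id>")
#   self.model.load_state_dict(torch.load(weights, map_location="cpu"))
#   self.model.eval()
```

Keep in mind that the download counts toward the 40 minutes of processing time.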

# Solution

In [1]:
%pip install pytorch-metric-learning
%pip install faiss-gpu

Collecting pytorch-metric-learning
  Downloading pytorch_metric_learning-2.3.0-py3-none-any.whl (115 kB)
Installing collected packages: pytorch-metric-learning
Successfully installed pytorch-metric-learning-2.3.0
Collecting faiss-gpu
  Downloading faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (85.5 MB)
Installing collected packages: faiss-gpu
Successfully installed faiss-gpu-1.7.2


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
import os
print(os.listdir())
if "train" not in os.listdir():
  os.system("unzip /content/drive/MyDrive/train.zip -d /content")
  print("Unzipped 'train.zip'")
else:
  print("'train' is already unzipped and in the directory")

['.config', 'drive', 'sample_data']
Unzipped 'train.zip'


In [4]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

### MNIST code originally from https://github.com/pytorch/examples/blob/master/mnist/main.py ###
from torchvision import datasets, transforms

from pytorch_metric_learning import distances, losses, miners, reducers, testers
from pytorch_metric_learning.utils.accuracy_calculator import AccuracyCalculator


In [5]:
# MODEL FROM SEMINAR
### MNIST code originally from https://github.com/pytorch/examples/blob/master/mnist/main.py ###
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(9216, 128)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        return x

In [6]:
### MNIST code originally from https://github.com/pytorch/examples/blob/master/mnist/main.py ###
def train(model, loss_func, mining_func, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, labels) in enumerate(train_loader):
        data, labels = data.to(device), labels.to(device)
        optimizer.zero_grad()
        embeddings = model(data)
        indices_tuple = mining_func(embeddings, labels)
        loss = loss_func(embeddings, labels, indices_tuple)
        loss.backward()
        optimizer.step()
        if batch_idx % 20 == 0:
            print(
                "Epoch {} Iteration {}: Loss = {}, Number of mined triplets = {}".format(
                    epoch, batch_idx, loss, mining_func.num_triplets
                )
            )


### convenient function from pytorch-metric-learning ###
def get_all_embeddings(dataset, model):
    tester = testers.BaseTester()
    return tester.get_all_embeddings(dataset, model)


### compute accuracy using AccuracyCalculator from pytorch-metric-learning ###
def test(train_set, test_set, model, accuracy_calculator):
    train_embeddings, train_labels = get_all_embeddings(train_set, model)
    test_embeddings, test_labels = get_all_embeddings(test_set, model)
    train_labels = train_labels.squeeze(1)
    test_labels = test_labels.squeeze(1)
    print("Computing accuracy")
    accuracies = accuracy_calculator.get_accuracy(
        test_embeddings, test_labels, train_embeddings, train_labels, False
    )
    print("Test set accuracy (Precision@1) = {}".format(accuracies["precision_at_1"]))


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

batch_size = 128

In [7]:
import json
import numpy as np
from sklearn.model_selection import train_test_split
from torchvision import datasets, transforms, models
from torch.utils.data import Dataset, DataLoader
import torch
from skimage import io

with open('/content/train/labels.json', 'r') as file:
    labels = json.load(file)

train_paths = [f'/content/train/{path}.jpg' for path in labels.keys()]
train_labels = list(labels.values())

class CustomImageDataset(Dataset):
    def __init__(self, image_paths, labels, transform=None):
        self.image_paths = image_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image_path = self.image_paths[idx]
        image = io.imread(image_path)
        label = self.labels[idx]
        if self.transform:
            image = self.transform(image)
        return image, label

train_dataset = CustomImageDataset(train_paths, train_labels, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)

from torchvision.models import ResNet18_Weights

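# ImageNet-pretrained ResNet-18 backbone (generic pretraining, allowed by the rules)
model = models.resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
# Replace the classification head with a linear layer that outputs a 10752-dimensional embedding
model.fc = nn.Linear(model.fc.in_features, 10752)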
optimizer = optim.Adam(model.parameters(), lr=0.01)
num_epochs = 3
print(f"There are {torch.cuda.device_count()} available GPUs.")

if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)

model = model.to(device)

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 118MB/s]


There are 1 available GPUs.


In [8]:
from tqdm.autonotebook import tqdm
import shutil

### pytorch-metric-learning stuff ###
distance = distances.CosineSimilarity()
reducer = reducers.ThresholdReducer(low=0)
loss_func = losses.TripletMarginLoss(margin=0.2, distance=distance, reducer=reducer)
mining_func = miners.TripletMarginMiner(
    margin=0.2, distance=distance, type_of_triplets="semihard"
)
accuracy_calculator = AccuracyCalculator(include=("precision_at_1",), k=1)
### pytorch-metric-learning stuff ###


for epoch in tqdm(range(1, num_epochs + 1)):
    train(model, loss_func, mining_func, device, train_loader, optimizer, epoch)
    model_filename = f'model{epoch}.pth'
    torch.save(model.state_dict(), model_filename)
    shutil.copy(f'/content/model{epoch}.pth', f'/content/drive/MyDrive/model{epoch}.pth')

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch 1 Iteration 0: Loss = 0.14824602007865906, Number of mined triplets = 318
Epoch 1 Iteration 20: Loss = 0.0, Number of mined triplets = 0
Epoch 1 Iteration 40: Loss = 0.08616504818201065, Number of mined triplets = 25
Epoch 1 Iteration 60: Loss = 0.0, Number of mined triplets = 0
Epoch 1 Iteration 80: Loss = 0.0, Number of mined triplets = 0
Epoch 1 Iteration 100: Loss = 0.13665682077407837, Number of mined triplets = 97
Epoch 1 Iteration 120: Loss = 0.13404269516468048, Number of mined triplets = 92
Epoch 1 Iteration 140: Loss = 0.12343066930770874, Number of mined triplets = 13
Epoch 1 Iteration 160: Loss = 0.12422560155391693, Number of mined triplets = 24
Epoch 1 Iteration 180: Loss = 0.10485595464706421, Number of mined triplets = 130
Epoch 1 Iteration 200: Loss = 0.0875491201877594, Number of mined triplets = 19
Epoch 1 Iteration 220: Loss = 0.12601107358932495, Number of mined triplets = 141
Epoch 1 Iteration 240: Loss = 0.17533119022846222, Number of mined triplets = 159
E