# Pairwise Binary classification with Deep Learning

In this notebook we will train Multilayer Perceptron Neural Network to detect whether two images are from same class (car brand) or not. To train Perceptron, we will use image embeddings got from trained CNN model (SOTA models such as MobileNet).

# Set up

## Packages and requirements

In [5]:
# Major builtin libraries
import os
import gc
import time
import random
import typing as t
from copy import deepcopy
from collections import defaultdict

In [6]:
import warnings  # If you want to disable warnings
warnings.filterwarnings("ignore")

# For descriptive error messages
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

In [7]:
# Classic packages for data manipulation and visualization
import numpy as np
import pandas as pd
import polars as pl
import matplotlib.pyplot as plt

In [8]:
# Basic PyTorch
import torch
import torch.nn as nn
import torch.optim as optim  # Optimization algorithms and dynamic learning rate adjusting
import torch.nn.functional as F
# from torch.nn.modules.loss import _Loss  # For writing a custom Loss function
from torch.utils.data import DataLoader, Dataset  # For custom data presentation

In [9]:
# Utils
import joblib  # Pipelining, pickling (dump/load), parallel processing
from tqdm import tqdm  # Progress bar for training process
from tempfile import TemporaryDirectory

# Classic ML tools
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold  # Cross-Validation

In [10]:
# ML Metrics
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from torchmetrics.classification import MulticlassF1Score, F1Score # F1 metric for multiclass

In [11]:
# Torch Computer Vision tools for images processing
from torchvision.io import read_image
from torchvision.transforms.functional import to_pil_image, to_grayscale, to_tensor
from torchvision import models  # Pretrained models

In [12]:
# Albumentations is an OS library for augmentations
import albumentations as A
from albumentations.pytorch import ToTensorV2
# import torchvision.transforms as T  # We can use torch augmentations instead

In [13]:
# Output text colorizing
from colorama import Back, Style

def print_highlighted(text: str, bgcolor=Back.YELLOW) -> None:
    """
    Function to print a text with colored background.
    """
    print(bgcolor + text + Style.RESET_ALL)

In [14]:
import wandb # MLOps platform to simplify and speed up the process of building ML models

In [15]:
wandb.login() # We log in via pop-up,
# wandb.login(key=api_key)  # but you can also log in manually with function args

[34m[1mwandb[0m: Currently logged in as: [33mremainedmind[0m. Use [1m`wandb login --relogin`[0m to force relogin


True

## Configuration

In [16]:
CONFIG = {
    "seed": 2306,
    # "epochs": 20,
    "image_dimension": 256,  # Depends on pretrained model used
    "model_name": "SiamesePerceptron",  # Pretrained model we will use
    "embedding_size": 512,  # Embedding output size
    "train_batch_size": 200,
    "val_batch_size": 400,
    "learning_rate": 1e-3,
    "min_lr": 1e-8,
    "min_loss_delta": 1e-7, # To stop training on plateau
    "weight_decay": 1e-7,

}

In [17]:
wandb_run = wandb.init(project="cars-classification-project", config=CONFIG)

In [18]:
config = wandb.config
del CONFIG

Set Seed for Reproducibility

In [19]:
def set_seed(seed=42):
    """
    Sets the seed of the entire notebook so results are the same every time we run.
    This is for REPRODUCIBILITY.
    """
    np.random.seed(seed)
    torch.manual_seed(seed)

    # When running on the CuDNN backend, two further options must be set
    # torch.backends.cudnn.deterministic = True
    # torch.backends.cudnn.benchmark = False  # When False, this option makes CUDA reproducible, BUT the performance might suffer

    # Set a fixed value for the hash seed
    os.environ['PYTHONHASHSEED'] = str(seed)

set_seed(seed=config.seed)

# Data

For our MLP we will use image embeddings - output from CNN models. We already got this data before, so we just load it.

## Set data location

In [20]:
# config.repo = 'car-brands/'  # dataset name on Kaggle
config.repo = 'data/'  # dataset name on local device
# config.repo = 'car_brand_detection/'  # Google Collab

# config.root = '/kaggle/input/' + config.repo
# config.root = 'drive/MyDrive/' + config.repo
config.root = '../'  + config.repo

config.data_path = config.root + 'embeddings_and_labels.csv'
config.test_images_path = config.root + 'images/test'
config.test_labels = config.root + 'test_labels.csv'

config.mlp_model_path = 'saved_instances/SiamesePerceptron.pth'
config.save_model_to = f'{config.model_name}.pth'

In [21]:
df = pd.read_csv(config.data_path)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,503,504,505,506,507,508,509,510,511,label
0,-0.001561,0.159091,-0.326709,-0.226603,-0.097365,0.204614,-0.142634,0.208764,0.469837,0.069866,...,0.410457,-0.024682,0.178265,-0.35734,-0.073279,0.190458,-0.060786,0.364546,0.824516,Acura_MDX
1,0.229466,-0.186505,-0.329016,-0.650666,0.115301,0.149208,0.065496,0.191487,0.42892,-0.016147,...,0.195697,0.171068,0.037157,0.049534,-0.224402,0.225593,0.180314,0.073213,0.500426,Acura_MDX
2,-0.203165,0.330612,-0.413488,-0.128357,0.013811,0.244605,-0.137738,0.365148,0.385303,0.318809,...,0.413026,0.040102,0.054195,-0.251548,-0.020557,0.241178,0.0629,0.354683,0.59187,Acura_MDX
3,-0.052456,0.385235,-0.423291,-0.007358,0.031012,0.242327,0.118072,0.23562,0.464901,0.320997,...,0.412529,0.0152,0.237078,-0.429586,-0.110773,0.166453,0.079581,0.243743,0.662615,Acura_MDX
4,0.092909,0.075529,-0.04049,0.137453,0.53128,0.219033,-0.100117,0.161784,-0.07792,-0.265161,...,-0.02452,0.143381,-0.099898,0.137324,0.137523,0.217806,0.313156,0.400654,0.217053,Acura_MDX


In [22]:
df.columns[-20:]

Index(['493', '494', '495', '496', '497', '498', '499', '500', '501', '502',
       '503', '504', '505', '506', '507', '508', '509', '510', '511', 'label'],
      dtype='object')

**label** is our last column. Rest are embeddings

In [23]:
embeddings_bag = df[(df.columns[:-1])]
embeddings_bag.head(2)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,502,503,504,505,506,507,508,509,510,511
0,-0.001561,0.159091,-0.326709,-0.226603,-0.097365,0.204614,-0.142634,0.208764,0.469837,0.069866,...,0.111356,0.410457,-0.024682,0.178265,-0.35734,-0.073279,0.190458,-0.060786,0.364546,0.824516
1,0.229466,-0.186505,-0.329016,-0.650666,0.115301,0.149208,0.065496,0.191487,0.42892,-0.016147,...,0.13534,0.195697,0.171068,0.037157,0.049534,-0.224402,0.225593,0.180314,0.073213,0.500426


In [24]:
labels = df[df.columns[-1]]
labels.head(2)

0    Acura_MDX
1    Acura_MDX
Name: label, dtype: object

In [25]:
labels.unique()[:10]

array(['Acura_MDX', 'Alfa Romeo_Giulietta', 'Audi_100', 'Audi_80',
       'Audi_A1', 'Audi_A3', 'Audi_A4', 'Audi_A5', 'Audi_A6', 'Audi_A7'],
      dtype=object)

In [26]:
config.num_of_classes = labels.nunique()

Now we apply One Hot Encoding

In [27]:
def apply_label_encoding(labels: t.Union[pd.Series, np.array],
                         encoder_name: os.path,
                         action='encode',
     ):
    """
    One Hot encoding. We apply encoding by replacing the label column in dataframe.
    As for decoding data back, we work with vector-array (as it's most likely to
    be a prediction result)
    """
    encoder = LabelEncoder()
    if action == 'encode':
        # We transform dataframe here. Nothing returns
        # data = data.with_columns(pl.DataFrame(encoder.fit_transform(data[column]), schema=['label']))
        encoder_name = f"{encoder_name}_LEncoder.pkl"
        if encoder_name in os.listdir():
            with open(encoder_name, "rb") as fp:
                encoder: LabelEncoder = joblib.load(fp)
            labels = encoder.transform(labels)
            print_highlighted("Encoded with existing encoder.")
        else:
            labels = encoder.fit_transform(labels)
            with open(encoder_name, "wb") as fp:
                joblib.dump(encoder, fp)
        return labels
    elif action == 'decode':
        # We pass vector here. Result is a vector
        with open(f"{encoder_name}_LEncoder.pkl", "rb") as fp:
            encoder: LabelEncoder = joblib.load(fp)
        return encoder.inverse_transform(labels)

In [28]:
labels = pd.DataFrame(apply_label_encoding(labels, action='encode', encoder_name="embeddings_labels"), columns=['label'])
labels.head()

[43mEncoded with existing encoder.[0m


Unnamed: 0,label
0,0
1,0
2,0
3,0
4,0


In [29]:
apply_label_encoding(labels, action='decode', encoder_name="embeddings_labels")

array(['Acura_MDX', 'Acura_MDX', 'Acura_MDX', ..., 'ZIL_5301_Bychok',
       'ZIL_5301_Bychok', 'ZIL_5301_Bychok'], dtype=object)

## Pytorch Dataset to run model on

In [30]:
from sklearn.model_selection import train_test_split

In [31]:
train_embeddings_bag, test_embeddings_bag, train_labels, test_labels = train_test_split(embeddings_bag, labels, random_state=config.seed, train_size=21790/24564, shuffle=False)

In [32]:
# If we shuffle the data, we need to reset indexes
train_labels.reset_index(drop=True, inplace=True)
test_labels.reset_index(drop=True, inplace=True)
test_embeddings_bag.reset_index(drop=True, inplace=True)
train_embeddings_bag.reset_index(drop=True, inplace=True)

In [33]:
print(test_labels)

      label
0         0
1         0
2         0
3         1
4         1
...     ...
2769    758
2770    758
2771    759
2772    759
2773    759

[2774 rows x 1 columns]


## Feedforward model to process embeddings pairs

That's the schema of our network

In [185]:
class SiameseNetwork(nn.Module):
    def __init__(self, embedding_size):
        super().__init__()

        self.fc = nn.Sequential(
            nn.Linear(in_features=embedding_size, out_features=1024),
            nn.ReLU(),
            nn.Linear(in_features=1024, out_features=1024),
            nn.ReLU(),
            nn.Dropout(p=0.3, inplace=False),
            nn.Linear(in_features=1024, out_features=1),
            nn.Sigmoid()
        )

    def forward(self, x1, x2):
        square = (x1 - x2)**2
        square = square.to(torch.float32)
        # Pass the inputs through fully connected layers
        output = self.fc(square)
        return output

In [179]:
# But we will use trained and saved model
# perceptron_model = SiameseNetwork(config.embedding_size)

In [186]:
try:
    # Load weights from previously trained
    perceptron_model = torch.load(config.mlp_model_path, map_location=torch.device('cpu'))
except FileNotFoundError:
    print("No trained model found.")

## Quality metrics

For training tracking we will use accuracy and F1 score

In [37]:
f1_score = MulticlassF1Score(num_classes=config.num_of_classes)

For model evaluation we will also use other metrics

### Device

In [181]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
perceptron_model = perceptron_model.to(device)

f1_score = f1_score.to(device)
torch.cuda.empty_cache()

## Label prediction based of K Nearest

## How values match among nearest neighbors

In [39]:
from sklearn.neighbors import NearestNeighbors

In [40]:
neigh = NearestNeighbors(n_neighbors=5, metric='cosine')

We train Neighbors model on `train_embeddings_bag`, so the model will predict the indexes related with **that** array.

In [66]:
neigh.fit(train_embeddings_bag.to_numpy())

In [51]:
# found_nearest = neigh.kneighbors(torch.unsqueeze(r_embedding, dim=0).cpu().numpy())
found_nearest = neigh.kneighbors(test_embeddings_bag, return_distance=False)

In [53]:
print(pd.DataFrame(found_nearest))

          0      1      2      3      4
0        11  11805   8577   8596  15828
1      2295     11   2175   6554   2316
2         2      3      0      1   4780
3        25     23     17     19     15
4        27     26     21   8508   8510
...     ...    ...    ...    ...    ...
2769  21749  21736   2979  21738  21737
2770  21742  21738  21745  21741  21748
2771  21780  21770  21765  21767  21771
2772  21785   3554  21758   3259   3540
2773  21788  21768  12483  18723  10177

[2774 rows x 5 columns]


In [55]:
def get_dataset_with_nearest(array_of_nearest: np.array):
    neighbors = []
    with tqdm(array_of_nearest, desc="Processing",unit="row") as process:
        for row_of_nearest in process:
            labels_of_closest = []
            for nearest_label in row_of_nearest:
                labels_of_closest.append(labels.iat[nearest_label, 0])  # Get scalar from current row and first column
            neighbors.append(labels_of_closest)
    label_array = pd.DataFrame(neighbors)
    return label_array

neighbors_labels = get_dataset_with_nearest(found_nearest)

Processing: 100%|██████████| 2774/2774 [00:00<00:00, 9425.33row/s]


In [56]:
neighbors_labels

Unnamed: 0,0,1,2,3,4
0,0,408,279,280,582
1,138,0,131,229,138
2,0,0,0,0,176
3,1,1,1,1,1
4,1,1,1,287,287
...,...,...,...,...,...
2769,758,758,151,758,758
2770,758,758,758,758,758
2771,759,759,759,759,759
2772,759,160,759,154,160


In [57]:
neighbors_labels.nunique()

0    756
1    752
2    750
3    747
4    749
dtype: int64

All labels are covered.

Now we will check how do the neighbors labels match to each other.

In [58]:
def all_values_same(row):
    return all(row == row[0])

In [59]:
def some_values_same(row, number_of_same=3):
    unique_numbers = set(row)
    all_numbers = list(row)
    [all_numbers.remove(u) for u in unique_numbers]
    # Now we can select any number - list contains only numbers that represent the majority.
    try:
        value_to_compare = all_numbers[0]
    except IndexError:
        value_to_compare = row[0]

    return ((row == value_to_compare).sum() >= number_of_same)

In [60]:
neighbors_labels['are_labels_same'] = neighbors_labels.apply(all_values_same, axis=1)

In [61]:
neighbors_labels['three_are_same'] = neighbors_labels.loc[:, neighbors_labels.columns[:-1]].apply(some_values_same, axis=1)

Let's check how good our Neighbors model works

In [62]:
neighbors_labels

Unnamed: 0,0,1,2,3,4,are_labels_same,three_are_same
0,0,408,279,280,582,False,False
1,138,0,131,229,138,False,False
2,0,0,0,0,176,False,True
3,1,1,1,1,1,True,True
4,1,1,1,287,287,False,True
...,...,...,...,...,...,...,...
2769,758,758,151,758,758,False,True
2770,758,758,758,758,758,True,True
2771,759,759,759,759,759,True,True
2772,759,160,759,154,160,False,False


In [63]:
print(f"In {len(neighbors_labels[neighbors_labels['are_labels_same']]) / len(neighbors_labels) * 100}% of data classes have all same class in its' nearest vectors")

In 53.28046142754146% of data classes have all same class in its' nearest vectors


In [64]:
print(f"In {len(neighbors_labels[neighbors_labels['three_are_same']]) / len(neighbors_labels) * 100}% of data classes have at least 3 of same class in its' nearest vectors")

In 86.58976207642394% of data classes have at least 3 of same class in its' nearest vectors


So, when we have target object to predict label for, we can take some odd number of neighbors (e.g. $N=5$), and then do a Vote using majority label! It is enough to have at least $\frac{N-1}{2}$ of same class to make a strong prediction. Let's see does it work at all.

In [49]:
# embeddings_bag = embeddings_bag.to_numpy()
# labels = labels.to_numpy()
# labels = apply_label_encoding(labels, action='decode', encoder_name="embeddings_labels")

In [65]:
from collections import Counter

In [51]:
def predict_with_nearest(all_nearest_indexes, labels):

    correct = 0
    total = 0

    for i, row in enumerate(tqdm(all_nearest_indexes)):
        nearest_indexes = row
        nearest_classes = [labels[n].item() for n in nearest_indexes]
        counter = Counter(nearest_classes)
        predicted_class = counter.most_common(1)[0][0]

        target_label = labels[i]  # Actual class
        correct += int(target_label == predicted_class)
        total += 1
    print_highlighted(f"Accuracy is: {correct/total}")


In [52]:
# Here we get an array of size B x N, where B is a batch size we want to test at once; N is a number of neighbors (we will take all dataset)
bag_of_nearest = neigh.kneighbors(test_embeddings_bag, return_distance=False)

In [203]:
predict_with_nearest(bag_of_nearest, train_labels.to_numpy(),)

NameError: name 'bag_of_nearest' is not defined

That's a result on all embeddings, either familiar for our Backbone CNN model (MobileNet trained with Arcface) or not.

In [55]:
del bag_of_nearest
del neighbors_labels

Let's test our Model by prediction the similarity between objects that are nearest in the vector area. For that case we will use our NearestNeighbors trained algorithm.

In [56]:
# predicted_proba, target_labels = get_probabilities(model=perceptron_model, dataloader=test_dataloader, device=device)

NameError: name 'test_dataloader' is not defined

## Testing on real photos

We will upload our backbone model to get embeddings of test photos.

In [67]:
config.embedding_model_path = 'saved_instances/ArcFace_mobilenet_v2.pth'

In [68]:
def get_input_feature_size(classifier: nn.Sequential) -> int:
    for module in classifier.modules():
        if isinstance(module, nn.Linear):
            return module.in_features

In [69]:
def get_model(model_name='resnet18', from_path=None, pretrained=True,) -> torch.nn.Module:
    """
        Multipurpose function to load the model. For our task we will use fully trained model. If you don't have such, you may
    download any pretrained model cut last layer - you both `pretrained` and `get_embeddings` set to True.
    :param model_name:
    :param get_embeddings: whether to cut the classifier layer
    :return:
    """
    if from_path:
        try:
            model = torch.load(from_path, map_location=torch.device("cuda" if torch.cuda.is_available() else "cpu"))
            return model
        except FileNotFoundError:
            raise
    elif model_name:
        model = getattr(models, model_name) # We use builtin function
        model = model(
            weights=('DEFAULT' if pretrained else None)
        )

    model.classifier = nn.Sequential(
        # nn.Dropout(p=0.3, inplace=True),
        nn.Linear(in_features=get_input_feature_size(model.classifier),
                  out_features=config.embedding_size, bias=True
                  ),

    )
    return model

In [70]:
embedding_model = get_model(
    from_path=config.embedding_model_path,
)

## Get embeddings online

Let's build a test dataset of images. Then, we will pass them through the backbone and try to predict a label. So, we will use everything: backbone CNN model, KNeighbors model, Binary Classifier model.

In [363]:
config.test_images_path = '../data/val_dataset_segmented'
config.test_labels = "../data/val_labels.csv"

In [71]:
def get_file_path_by_id(file_id, dir=config.root):
    return os.path.join(dir, str(file_id) + ".jpg")

In [74]:
data_transforms = {
    # Only validation is needed.

    "val": A.Compose([
        #         A.ToRGB(),
        A.Resize(config.image_dimension, config.image_dimension),
        A.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225],
            max_pixel_value=255.0,
            p=1.0
        ),
        ToTensorV2()], p=1.)
}

In [75]:
class CustomImagesDataset(Dataset):
    """
    """
    def __init__(self, data: pd.DataFrame=None, images_path: os.path=None, labels_path:os.path=None, transform_images: A.Compose=None, encoder_name=None):
        """

        """
        super().__init__()
        assert (data is not None) or (labels_path is not None and images_path is not None)

        if data is None:
            data = pd.read_csv(labels_path)
            data['file_path'] = data['id'].apply(get_file_path_by_id, dir=images_path)

        self.images_paths = data['file_path']


        self.encoder_name = encoder_name if encoder_name else self.__hash__()  # We use hash as a unique name
        print_highlighted(f"Label Encoder saved with id `{self.encoder_name}`")
        self.labels = apply_label_encoding(labels=data['label'], action='encode', encoder_name=self.encoder_name)

        # self.labels = data['label']
        #         self.indexes = data['id'].values
        self.transform_images = transform_images
        self.__set_dataset_len()

    def __set_dataset_len(self):
        self.length = self.labels.shape[0] # Number of rows

    def __len__(self):
        """
        We calculate the len in another function, so that we are able to slice.
        """
        return self.length

    def __getitem__(self, index) -> tuple[torch.Tensor, int]:
        """ Function to return item by indexing the dataset """

        if not isinstance(index, int) and isinstance(index, slice):
            # It's not an index, but slice.
            # We will return the part of data by making a copy of the dataset
            index: slice
            self = deepcopy(self)
            self.length = index.stop  # Cut the length of dataset.
            self.labels = self.labels[:self.length]
            return self
        assert self.__len__() >= index

        image = to_pil_image(read_image(self.images_paths[index]))
        if self.transform_images:
            # Albumentations requires us to convert image to Numpy Array
            image = self.transform_images(image=np.array(image))['image']

        label = self.labels[index]
        return image, label

In [364]:
test_images_dataset = CustomImagesDataset(images_path=config.test_images_path, labels_path=config.test_labels, transform_images=data_transforms['val'], encoder_name='test_labels')

[43mLabel Encoder saved with id `test_labels`[0m
[43mEncoded with existing encoder.[0m


In [368]:
len(test_images_dataset)

11

In [369]:
config.batch_size = 320

In [370]:
test_dataloader = DataLoader(
    test_images_dataset,
    batch_size=config.batch_size,
    shuffle=False,
    num_workers=os.cpu_count() % 4,
)

In [164]:
@torch.inference_mode()
def predict_online(dataloader, all_labels, feature_extractor_model, neighbors_model, device):
    feature_extractor_model = feature_extractor_model.to(device)
    feature_extractor_model.eval()
    correct = 0
    total = 0
    all_predictions = []

    with tqdm(dataloader, desc="Processing...",unit="batch") as process:
        for images, target_labels in process:
            images = images.to(device)
            target_labels = target_labels.to(device)
            # print(labels)

            embeddings = feature_extractor_model(images).cpu()

            bag_of_nearest_indexes = neighbors_model.kneighbors(embeddings, return_distance=False)
            # print(bag_of_nearest_indexes)
            del images
            del embeddings


            # for i, row in enumerate(tqdm(bag_of_nearest_indexes, desc="|__Predicting...")):
            for i, row in enumerate(bag_of_nearest_indexes):

                nearest_indexes = row
                nearest_classes = [all_labels[n].item() for n in nearest_indexes]
                counter = Counter(nearest_classes)
                predicted_class = counter.most_common(1)[0][0]
                #
                target_label = target_labels[i]  # Actual class
                correct += int(target_label == predicted_class)
                total += 1
                all_predictions.append(predicted_class)

        # print(np.take(all_labels,bag_of_nearest), labels)
        print_highlighted(f"Accuracy is: {correct/total}")

    return np.array(all_predictions)

In [165]:
test_Y_pred = predict_online(test_dataloader, labels.to_numpy(), embedding_model, neigh, device)

Processing...: 100%|██████████| 2/2 [00:06<00:00,  3.19s/batch]

[43mAccuracy is: 0.83[0m





In [190]:
perceptron_model(torch.rand(10, 512), torch.rand(1, 512))

tensor([[4.0977e-06],
        [4.5946e-07],
        [7.9427e-06],
        [2.9541e-06],
        [1.2626e-06],
        [1.0746e-06],
        [1.1124e-06],
        [9.1576e-07],
        [7.4756e-06],
        [1.3007e-05]], grad_fn=<SigmoidBackward0>)

Note: while using Siamese Network, we are able to pass one vector as `X1` and array of vectors as `X2`. So, we can pass one target vector and batch of nearest vectors - thus we get batch of similarity ratio

In [239]:
# Example
k = 5
with torch.no_grad():
    print(perceptron_model(torch.rand(4, 1, 512), torch.rand(4, k, 512)).shape)
del k

tensor([[[2.6600e-06],
         [2.7041e-06],
         [8.2067e-06],
         [4.0064e-07],
         [4.3008e-07]],

        [[9.9791e-06],
         [2.6468e-04],
         [3.4462e-07],
         [1.3662e-04],
         [5.1698e-06]],

        [[5.5825e-09],
         [3.9484e-09],
         [1.7595e-06],
         [1.3264e-07],
         [1.1600e-06]],

        [[1.2489e-07],
         [1.5809e-07],
         [5.6568e-04],
         [3.2742e-07],
         [9.7062e-05]]])


In [352]:
def get_probable_label(weights, labels) -> int:
    """
    This function is supposed to process two vectors:
    `labels` represents sequence of labels of neighbors, and values MAY REPEAT
    `weights` show how probable each value is.
    The main problem to solve here is that it might be two same labels in array (so it's more probable).
    :param weights:
    :param labels:
    :return:
    """
    probs = defaultdict(float)
    for i, label in enumerate(labels):
        probs[label.item()] += weights[i].item()

    label_with_max_proba, _ = max(probs.items(), key=lambda x: x[1])  # Iterate over values, but get the key.
    return label_with_max_proba


In [353]:
@torch.inference_mode()
def predict_with_weighted_nearest(X: torch.tensor, backbone_model, binary_clf_model, device):
    """
        Function to enhance the prediction of KNeighbors model. We still use nearest neighbors to get probable labels,
    and then, we use Binary Classificator MLP to compare neighbors embeddings with our target image embedding.
    :param X: image or sequence of images - normalized 3x256x256 vectors;
    :param backbone_model: CNN network without classifier layer - to get image embedding;
    :param binary_clf_model: embedding classifier that detects whether two vectors are of same class (label);
    :param device:
    :return: probabilities of predicted labels
    """
    backbone_model = backbone_model.to(device)
    binary_clf_model = binary_clf_model.to(device)

    if X.dim() == 3:
        # Means it's one image, not a batch
        X = torch.unsqueeze(X, dim=0)  # Turn it into batch
        print('unsqueezed.')
    elif X.dim() == 4:
        pass

    embedding = backbone_model(X)  # This vector is two-dimensional as it is a batch

    bag_of_nearest_indexes = neigh.kneighbors(embedding, return_distance=False)  # Shape is `B x k`, k are neighbors
    # bag_of_nearest_indexes = torch.tensor(bag_of_nearest_indexes).to(device)
    # print(bag_of_nearest_indexes)

    # Now we select embeddings by their indexes. For the case of indexing array by another array, numpy.take (https://numpy.org/doc/stable/reference/generated/numpy.take.html) works fine.
    # We're indexing the `N x 512` array by the `B x K` array, and the result is the `B x K x 512` array of embeddings.
    # Try following to see:
    # print(train_embeddings_bag.to_numpy().shape, bag_of_nearest_indexes.shape, np.take(train_embeddings_bag.to_numpy(), bag_of_nearest_indexes, axis=0).shape)

    batch_of_nearest_vectors = np.take(train_embeddings_bag.to_numpy(), bag_of_nearest_indexes, axis=0)
    batch_of_nearest_labels = torch.tensor(np.squeeze(np.take(train_labels.to_numpy(), bag_of_nearest_indexes, axis=0))) # Shape of (B, K)

    # Note: `batch_of_nearest` is 3-dimensional. If we are to compare nearest with the target embedding,
    # we have to adjust this vector to same dimension.
    embedding = torch.unsqueeze(embedding, dim=1) # From shape (B, 512) to (B, 1, 512)
    predicted_similarity = binary_clf_model(embedding, batch_of_nearest_vectors)
    predicted_similarity = torch.squeeze(predicted_similarity)  # From (B, K, 1) to (B, K)

    # Now we can get the highest probability, and apply it as our prediction. But there is also
    # more stable way: to sum probabilities of same class firstly.

    # We normalize the proba among the nearest (as they are too close originally), but it's not really necessary
    predicted_similarity = F.normalize(predicted_similarity, dim=1)


    # Now we are going to iterate over nearest to get most probable label per each item
    predictions = []
    for row_of_similarities, row_of_labels in zip(predicted_similarity, batch_of_nearest_labels):
        predictions.append(get_probable_label(weights=row_of_similarities, labels=row_of_labels))
    return torch.tensor(predictions)



In [377]:
total = 0
correct = 0
preds = []
all_labels = []
with tqdm(test_dataloader, desc="Processing...",unit="batch") as process:
    for x, labels in process:
        y_pred = predict_with_weighted_nearest(X=x, backbone_model=embedding_model, binary_clf_model=perceptron_model, device=device)
        correct += (labels == y_pred).int().sum().item()
        total += labels.size(0)
        preds.extend(list(y_pred.cpu().numpy()))
        all_labels.extend(list(labels.cpu().numpy()))

print_highlighted(f"Accuracy is: {correct/total}")
print(list(zip(all_labels, preds)))

Processing...: 100%|██████████| 1/1 [00:02<00:00,  2.18s/batch]

[43mAccuracy is: 0.6363636363636364[0m
[(269, 269), (279, 596), (303, 303), (306, 306), (308, 308), (316, 316), (354, 354), (388, 390), (457, 458), (487, 601), (516, 516)]





In [335]:
predict_with_weighted_nearest

<function __main__.predict_with_weighted_nearest(X: <built-in method tensor of type object at 0x00007FFCC690C560>, backbone_model, binary_clf_model, device)>

In [154]:
torch.rand(1, 3, 2, 2).dim()

4

In [148]:
temporary_variables()

Target label:  411
Index is 11855; got vector of shape (512,); similarity is:[0.99997866].Label is 411
Index is 11858; got vector of shape (512,); similarity is:[0.99949944].Label is 411
Index is 11859; got vector of shape (512,); similarity is:[0.99895465].Label is 411
Index is 21139; got vector of shape (512,); similarity is:[0.992599].Label is 740
Index is 11853; got vector of shape (512,); similarity is:[0.9963295].Label is 411


In [143]:
test_images_dataset[1]

(tensor([[[-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
          [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
          [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
          ...,
          [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
          [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
          [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179]],
 
         [[-2.0357, -2.0357, -2.0357,  ..., -2.0357, -2.0357, -2.0357],
          [-2.0357, -2.0357, -2.0357,  ..., -2.0357, -2.0357, -2.0357],
          [-2.0357, -2.0357, -2.0357,  ..., -2.0357, -2.0357, -2.0357],
          ...,
          [-2.0357, -2.0357, -2.0357,  ..., -2.0357, -2.0357, -2.0357],
          [-2.0357, -2.0357, -2.0357,  ..., -2.0357, -2.0357, -2.0357],
          [-2.0357, -2.0357, -2.0357,  ..., -2.0357, -2.0357, -2.0357]],
 
         [[-1.8044, -1.8044, -1.8044,  ..., -1.8044, -1.8044, -1.8044],
          [-1.8044, -1.8044,

In [126]:
i = 0
while i < len(test_images_dataset) -1:
    print(test_images_dataset[i][1], test_images_dataset[i+1][1])
    i+=1

269 279
279 303
303 306
306 308
308 316
316 354
354 388
388 457
457 487
487 516


In [94]:
print(test_images_dataset.images_paths)

0      ../data/images/test\1.jpg
1      ../data/images/test\2.jpg
2      ../data/images/test\3.jpg
3      ../data/images/test\4.jpg
4      ../data/images/test\5.jpg
5      ../data/images/test\6.jpg
6      ../data/images/test\7.jpg
7      ../data/images/test\8.jpg
8      ../data/images/test\9.jpg
9     ../data/images/test\10.jpg
10    ../data/images/test\11.jpg
Name: file_path, dtype: object


## Real testing on the photos from Internet

Now we will repeat this experiment with real photos from Internet. No models were trained on them