## **Computing `Mean Average Precision (mAP)` and `Top-k Accuracy` for our Retrieval System**
We sample 1000 random images from the `validation dataset` as queries and search the training set. The evaluation proceeds as follows:
1. For each query, retrieve all images and rank them by similarity to the query.
2. Compute the average precision (AP) of each query's ranking.
3. Average the AP over all queries to obtain the mAP. Unlike plain accuracy, this metric takes the ranking of the results into account.
4. Evaluate top-1 and top-k accuracy.
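
The ranking step can be sketched as follows. This is only an illustration under stated assumptions: the dataset is a `{path: vector}` dictionary, similarity is cosine similarity, and `rank_by_cosine_similarity` is a hypothetical helper, not the `retrieve_top_k_similar` function imported below.

```python
import numpy as np

def rank_by_cosine_similarity(query_vec, dataset_vecs):
    """Return dataset keys sorted from most to least similar to the query.

    Hypothetical helper for illustration; `dataset_vecs` maps path -> vector.
    """
    keys = list(dataset_vecs)
    matrix = np.stack([np.asarray(dataset_vecs[k], dtype=float) for k in keys])
    query = np.asarray(query_vec, dtype=float)
    # Cosine similarity is the dot product divided by the product of the norms.
    denom = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    sims = (matrix @ query) / np.maximum(denom, 1e-12)
    # Sort by descending similarity; a stable sort keeps ties in input order.
    order = np.argsort(-sims, kind="stable")
    return [keys[i] for i in order]

toy_dataset = {
    "a.jpg": [1.0, 0.0],
    "b.jpg": [0.0, 1.0],
    "c.jpg": [1.0, 1.0],
}
ranking = rank_by_cosine_similarity([1.0, 0.1], toy_dataset)
print(ranking)  # ['a.jpg', 'c.jpg', 'b.jpg']
```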

### **1. Import Necessary Libraries**

In [1]:
import cv2
import numpy as np
import random
from itertools import islice
from torchvision import transforms
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from torchvision.models import vgg16, VGG16_Weights

from scripts.evaluate import retrieve_top_k_similar, top_k_accuracy, compute_map_top_k
from src.datasets import OxfordFlowerDataset
from src.features import DeepConvFeature
from src.encoders import VLADEncoder, FisherVectorEncoder
from src.utils import cosine_similarity, load_model
from src.config import ROOT, DEVICE

Device used: cuda




### **2. Declare Datasets**

In [2]:
train_dataset = OxfordFlowerDataset(purpose="train")
val_dataset = OxfordFlowerDataset(purpose="validation")

### **3. Deep Conv Feature Extractor**


In [3]:
extractor = DeepConvFeature(
    model=vgg16(weights=VGG16_Weights.DEFAULT),
    layer_index=-1,  # Last conv layer
    spatial_encoding=True,
    device=DEVICE
)

2025-01-06 12:24:45,927 - Feature_Extractor - INFO - Selected layer: features.28, Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))


### **4. Load the PCA and KMeans models**

In [4]:
kmeans_model = load_model(rf'{ROOT}/models/pickle_model_files/k_means_k256_deep_features_vgg16_no_pca.pkl')
kmeans_model_pca = load_model(rf'{ROOT}/models/pickle_model_files/k_means_k256_deep_features_vgg16_pca.pkl')
pca_model_vlad = load_model(rf'{ROOT}/models/pickle_model_files/pca_vlad_k256_deep_features_vgg16_feature_dim257.pkl')

If you have not yet trained or saved these models, you can train them with the following code. This may take a while; reducing `NUM_CLUSTERS` speeds it up, at the cost of retrieval performance.

In [5]:
NUM_CLUSTERS = 256
IMAGE_SIZE = (224, 224)
DIM_REDUCTION_FACTOR = 2

# Extract local descriptors for every training image.
labels, paths, features = [], [], []
for img, lbl, path in train_dataset:
    labels.append(lbl)
    paths.append(path)
    features.append(extractor(img))

labels = np.array(labels)
features = np.vstack(features)  # stack all descriptors into one 2-D array

# Codebook learned on the raw (non-reduced) descriptors.
kmeans_model = KMeans(n_clusters=NUM_CLUSTERS, random_state=42)
kmeans_model.fit(features)

# PCA reduces the descriptor dimensionality before clustering.
pca_model_vlad = PCA(n_components=NUM_CLUSTERS // DIM_REDUCTION_FACTOR)
pca_model_vlad.fit(features)
reduced_features = pca_model_vlad.transform(features)

# Codebook learned on the PCA-reduced descriptors (used by the VLAD encoder below).
kmeans_model_pca = KMeans(n_clusters=NUM_CLUSTERS, random_state=42)
kmeans_model_pca.fit(reduced_features)

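
The training cell keeps the fitted models in memory only. A minimal sketch of persisting them follows, assuming the `.pkl` files are plain pickles. Both helpers here are hypothetical, and `src.utils.load_model` may work differently.

```python
import pickle
from pathlib import Path

def save_model(model, path):
    """Pickle a fitted model to `path`, creating parent folders if needed.

    Hypothetical helper; not part of the project's `src.utils`.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)

def load_pickled_model(path):
    """Load a model written by `save_model`. Only unpickle files you trust."""
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

For example, `save_model(kmeans_model_pca, "models/k_means_pca.pkl")` would write the PCA codebook to a path of your choice (the path shown is a placeholder).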

### **5. Load the Encoders**

In [6]:
vlad_encoder = VLADEncoder(
    feature_extractor=extractor,
    kmeans_model=kmeans_model_pca,
    pca=pca_model_vlad,
    power_norm_weight=1.0,
)

# TODO: declare other encoders also and loop
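
For readers unfamiliar with VLAD, the aggregation step can be sketched as below. This is a generic textbook-style sketch, not the `VLADEncoder` from `src.encoders`. The function name and the `power` exponent are assumptions made for illustration and may not correspond to `power_norm_weight`.

```python
import numpy as np

def vlad_encode(descriptors, centroids, power=0.5):
    """Aggregate local descriptors (N, D) into one VLAD vector of length K * D.

    Hypothetical sketch: hard assignment, residual sums, signed power
    normalisation, then L2 normalisation.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    k, d = centroids.shape
    # Hard-assign every descriptor to its nearest centroid.
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    # Sum the residuals (descriptor - centroid) within each cluster.
    vlad = np.zeros((k, d))
    for cluster in range(k):
        members = descriptors[nearest == cluster]
        if len(members):
            vlad[cluster] = (members - centroids[cluster]).sum(axis=0)
    vlad = vlad.ravel()
    # Signed power normalisation dampens bursty components.
    vlad = np.sign(vlad) * np.abs(vlad) ** power
    # Global L2 normalisation makes encodings comparable across images.
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

toy_centroids = [[0.0, 0.0], [10.0, 10.0]]
toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
toy_vlad = vlad_encode(toy_descriptors, toy_centroids)
print(toy_vlad.shape)  # (4,)
```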

### **6. Sample random images as queries**

In [7]:
sample_size = 1000
idxs = random.sample(range(len(val_dataset)), sample_size)

queries = []
query_labels = []
for idx in idxs:
    # Index the dataset once per sample so each image is loaded only once.
    img, lbl, _ = val_dataset[idx]
    queries.append(img)
    query_labels.append(lbl)
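
Because the queries are drawn at random, the metrics change from run to run. If reproducible numbers are wanted, one option is a seeded sampler such as the sketch below. The helper name and the seed value are assumptions for illustration. The sketch also caps the sample size, since `random.sample` raises a `ValueError` when asked for more items than the dataset holds.

```python
import random

def sample_query_indices(dataset_size, sample_size, seed=42):
    """Draw unique indices reproducibly, never more than the dataset contains.

    Hypothetical helper; the seed is arbitrary.
    """
    rng = random.Random(seed)  # local generator, leaves the global state untouched
    return rng.sample(range(dataset_size), min(sample_size, dataset_size))

first_draw = sample_query_indices(dataset_size=50, sample_size=10)
second_draw = sample_query_indices(dataset_size=50, sample_size=10)
```

Calling it as `idxs = sample_query_indices(len(val_dataset), 1000)` would replace the `random.sample` line above.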

### **7. Compute the mAP**

First, we prepare the data.

In [9]:
train_paths, train_labels = zip(*[(path, label) for _, label, path in train_dataset])
train_dataset_vectors = vlad_encoder.generate_encoding_map(train_paths)
dataset_labels_dict = dict(zip(train_paths, train_labels))

How it works:
- If `k` is given, only the `top-k` ranked results per query are considered.
- If `k=None` or `k` is omitted, all results (the entire dataset) are considered.
- For each query, we compute the average precision (AP), then average across all queries to obtain the mean average precision (mAP).
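
To make the metric concrete, here is a minimal sketch of AP and mAP over ranked relevance flags. It is not the code of `compute_map_top_k`. The function names are hypothetical, and the sketch assumes AP is normalised by the number of relevant items found within the cutoff, which is one of several conventions in use.

```python
def average_precision(relevance, k=None):
    """AP for one query; `relevance` holds booleans in rank order.

    Assumption: normalise by the number of relevant items found in the
    (optionally truncated) ranking.
    """
    ranked = relevance if k is None else relevance[:k]
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(relevance_lists, k=None):
    """Mean of the per-query AP values."""
    if not relevance_lists:
        return 0.0
    return sum(average_precision(r, k) for r in relevance_lists) / len(relevance_lists)

query_a = [True, False, True, False]  # relevant results at ranks 1 and 3
query_b = [False, True]               # relevant result at rank 2
ap_a = average_precision(query_a)                   # (1/1 + 2/3) / 2
map_all = mean_average_precision([query_a, query_b])
map_top2 = mean_average_precision([query_a, query_b], k=2)
```

With the toy flags above, restricting to `k=2` raises the AP of `query_a` to 1.0 because the second relevant item at rank 3 is no longer counted. This shows how the cutoff and the normalisation choice affect the score.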

First, we do it for the whole dataset:

In [15]:
mAP_value = compute_map_top_k(
    queries=queries,
    query_labels=query_labels,
    dataset=train_dataset_vectors,       # {path: vector}
    dataset_labels=dataset_labels_dict,  # {path: label}
    encoder=vlad_encoder                 # a Fisher vector encoder could be passed instead
)

print("Mean Average Precision (mAP):", mAP_value)


In practice, we often care only about the top-ranked results. Let's compute the mAP over the top 5 results:

In [None]:
mAP_value_top5 = compute_map_top_k(
    queries=queries,
    query_labels=query_labels,
    dataset=train_dataset_vectors,
    dataset_labels=dataset_labels_dict,
    encoder=vlad_encoder,
    k=5
)
print("Mean Average Precision (mAP) for Top-5:", mAP_value_top5)

### **8. Top-k accuracy**

How it works:
- For each query, retrieve **top-k** most similar images.
- If any of them share the same label as the query, that counts as correct.
- The final accuracy is `num_correct_queries / num_queries`.
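
The rule above can be sketched in a few lines. This is an illustration only, not the `top_k_accuracy` function from `scripts.evaluate`. The helper name and the input shape (one ranked list of retrieved labels per query) are assumptions.

```python
def top_k_hit_rate(ranked_labels_per_query, query_labels, k=1):
    """Fraction of queries with at least one matching label in the top-k.

    Hypothetical helper; each entry of `ranked_labels_per_query` lists the
    labels of the retrieved images, best match first.
    """
    if not query_labels:
        return 0.0
    correct = sum(
        1
        for ranked, label in zip(ranked_labels_per_query, query_labels)
        if label in ranked[:k]
    )
    return correct / len(query_labels)

toy_rankings = [
    ["rose", "tulip", "daisy"],   # correct at rank 1
    ["tulip", "tulip", "rose"],   # correct only at rank 3
    ["daisy", "rose", "rose"],    # never correct
]
toy_labels = ["rose", "rose", "tulip"]
top1 = top_k_hit_rate(toy_rankings, toy_labels, k=1)
top3 = top_k_hit_rate(toy_rankings, toy_labels, k=3)
```

Top-k accuracy can only stay the same or grow as `k` increases, since a hit within the top-1 is also a hit within the top-3.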

Let's compute the top-1 accuracy (the most relevant match has to be the correct one):

In [16]:
acc_k1 = top_k_accuracy(
    queries=queries,
    query_labels=query_labels,
    dataset=train_dataset_vectors,
    dataset_labels=dataset_labels_dict,
    encoder=vlad_encoder,
    k=1
)
print("Top-1 Accuracy:", acc_k1)

Top-1 Accuracy: 0.18


Often we also want to count the second, third, and further most relevant results. In that case, we can set `k > 1`. Let's try `k=5`:

In [None]:
acc_k5 = top_k_accuracy(
    queries=queries,
    query_labels=query_labels,
    dataset=train_dataset_vectors,
    dataset_labels=dataset_labels_dict,
    encoder=vlad_encoder,
    k=5
)
print("Top-5 Accuracy:", acc_k5)