## **Computing `Mean Average Precision (mAP)` and `Top-k Accuracy` for our Retrieval System**
We use every image in the validation and test splits as a query and retrieve from the encoded training images:
1. For each query, rank all database images by similarity to the query.
2. Compute the average precision (AP) of that ranking.
3. Average the AP over all queries to obtain the mAP, which takes the ranking of the results into account.
4. Also evaluate top-1 and top-k accuracy.

### **1. Import Necessary Libraries**

In [1]:
import numpy as np
from torchvision.models import vgg16, VGG16_Weights

from pyvisim.eval import top_k_accuracy, top_k_map
from scripts.train import train_k_means, train_pca, train_gmm
from pyvisim.datasets import OxfordFlowerDataset
from pyvisim.features import DeepConvFeature
from pyvisim.encoders import VLADEncoder, FisherVectorEncoder, Pipeline
from pyvisim._utils import load_model, plot_and_save_barplot
from pyvisim._config import ROOT

Device used: cuda




### **2. Declare Datasets**

In [2]:
train_dataset = OxfordFlowerDataset(purpose="train")
val_dataset = OxfordFlowerDataset(purpose=["validation", "test"])

In [3]:
train_imgs, train_labels = zip(*[(img, label) for img, label, _ in train_dataset])
val_imgs, val_labels = zip(*[(img, label) for img, label, _ in val_dataset])

### **3. Deep Conv Feature Extractor**


In [4]:
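import torch

# DEVICE is not defined by the imports shown above. The definition below is an
# assumption that matches the "Device used: cuda" message printed earlier.
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

extractor = DeepConvFeature(
    model=vgg16(weights=VGG16_Weights.DEFAULT),
    layer_index=-1,  # Last conv layer
    device=DEVICE
)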

2025-01-09 22:42:07,942 - Feature_Extractor - INFO - Selected layer: features.28, Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))


### **4. Load the PCA, KMeans and GMM models**

In my experience, VLAD works better without PCA, so the VLAD vectors are computed on the unreduced features. Fisher Vectors, however, benefit significantly from PCA.

In [5]:
kmeans_model_k256 = load_model(rf'{ROOT}/models/pickle_model_files/k_means_k256_deep_features_vgg16_no_pca.pkl')

gmm_model_pca = load_model(rf'{ROOT}/models/pickle_model_files/gmm_k256_deep_features_vgg16_pca.pkl')
pca_model = load_model(rf'{ROOT}/models/pickle_model_files/pca_fisher_k256_deep_features_vgg16_feature_dim257.pkl')

If you have not yet trained and saved the models, you can train them with the following code. Training might take a while; reducing `NUM_CLUSTERS` makes it faster, at the cost of retrieval performance.

In [6]:
NUM_CLUSTERS = 256
IMAGE_SIZE = (224, 224)
DIM_REDUCTION_FACTOR = 2

labels, paths, features = [], [], []
for img, lbl, path in train_dataset:
    labels.append(lbl)
    paths.append(path)
    features.append(extractor(img))

labels = np.array(labels)
features = np.vstack(features)

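# PCA is fitted for the Fisher Vector branch only
pca_model = train_pca(reduction_factor=DIM_REDUCTION_FACTOR, features=features)
reduced_features = pca_model.transform(features)

# K-means (used by VLAD) is trained on the original, unreduced features
kmeans_model_k256 = train_k_means(n_clusters=NUM_CLUSTERS, features=features)

# The GMM (used by Fisher Vectors) is trained on the PCA-reduced features
gmm_model_pca = train_gmm(n_components=NUM_CLUSTERS, features=reduced_features)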

### **5. Load the Encoders**

In [7]:
vlad_encoder = VLADEncoder(
    feature_extractor=extractor,
    kmeans_model=kmeans_model_k256,
    power_norm_weight=1.0,
)

fisher_vector_encoder = FisherVectorEncoder(
    feature_extractor=extractor,
    gmm_model=gmm_model_pca,
    pca=pca_model,
    power_norm_weight=0.5,
)


pipeline_with_pca = Pipeline(
    [vlad_encoder, fisher_vector_encoder]
)
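
The two encoders above differ in their `power_norm_weight`. As a rough illustration of what such a parameter commonly controls, here is a minimal sketch of signed power normalization followed by L2 normalization. It assumes that `power_norm_weight` plays the role of the exponent `alpha`; that is an assumption about the library, not something confirmed here, and `power_l2_normalize` is a hypothetical helper, not part of `pyvisim`.

```python
import math


def power_l2_normalize(vec, alpha):
    """Apply sign(x) * |x|**alpha to every entry, then scale to unit L2 norm.

    Hypothetical helper for illustration; `alpha` stands in for a
    power-normalization weight.
    """
    powered = [math.copysign(abs(x) ** alpha, x) for x in vec]
    norm = math.sqrt(sum(x * x for x in powered))
    if norm == 0.0:
        return powered  # an all-zero vector cannot be rescaled
    return [x / norm for x in powered]


v = [4.0, -1.0, 0.0]
unchanged = power_l2_normalize(v, alpha=1.0)  # exponent 1 keeps the ratios
dampened = power_l2_normalize(v, alpha=0.5)   # square root shrinks large entries
```

Under this assumption, an exponent of 1.0 leaves the relative magnitudes untouched, while 0.5 reduces the influence of dominant components.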

### **6. Performance metrics**

First, we prepare the data.

In [8]:
train_paths, train_labels = zip(*[(path, label) for _, label, path in train_dataset])
encodings_vlad = vlad_encoder.generate_encoding_map(train_paths)
encodings_fisher = fisher_vector_encoder.generate_encoding_map(train_paths)
encodings_pipeline = pipeline_with_pca.generate_encoding_map(train_paths)
dataset_labels_dict = dict(zip(train_paths, train_labels))
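
Each encoding map associates an image path with its encoded vector. To make the retrieval step concrete, the sketch below ranks the entries of such a map against a query vector. Cosine similarity is an assumption here, since the similarity measure used inside `pyvisim` is not shown, and `cosine_similarity`, `rank_paths` and the toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_paths(query_vec, encoding_map):
    """Return the database paths sorted from most to least similar."""
    return sorted(
        encoding_map,
        key=lambda path: cosine_similarity(query_vec, encoding_map[path]),
        reverse=True,
    )


# Toy {path: vector} map standing in for a real encoding map
toy_map = {
    "a.jpg": [1.0, 0.0],
    "b.jpg": [0.0, 1.0],
    "c.jpg": [0.7, 0.7],
}
ranking = rank_paths([0.9, 0.1], toy_map)
```

Looking up each ranked path in a `{path: label}` dictionary such as `dataset_labels_dict` then gives the ranked labels that the metrics below operate on.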

### **6.1. Top-k accuracy**

How it works:
- For each query, retrieve **top-k** most similar images.
- If any of them share the same label as the query, that counts as correct.
- The final accuracy is `num_correct_queries / num_queries`.
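
The rule above can be sketched in a few lines of plain Python. This is an illustration of the definition only, not the implementation of `pyvisim.eval.top_k_accuracy`; the function name and the toy labels are hypothetical.

```python
def top_k_accuracy_sketch(ranked_labels_per_query, query_labels, k):
    """Fraction of queries with at least one matching label in the top k."""
    if not query_labels:
        raise ValueError("at least one query is required")
    correct = 0
    for ranked_labels, query_label in zip(ranked_labels_per_query, query_labels):
        # A query counts as correct if any of its top-k results shares its label
        if query_label in ranked_labels[:k]:
            correct += 1
    return correct / len(query_labels)


# Labels of the retrieved images, best match first, one list per query
ranked = [
    ["rose", "tulip", "rose"],
    ["daisy", "tulip", "rose"],
    ["rose", "daisy", "daisy"],
]
queries = ["rose", "tulip", "daisy"]

top1 = top_k_accuracy_sketch(ranked, queries, k=1)
top2 = top_k_accuracy_sketch(ranked, queries, k=2)
```

In this toy data only the first query has a correct best match, while all three queries have a correct match within the first two results, so the accuracy can only grow as `k` increases.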

Let's compute the top-1 accuracy (the most relevant match has to be the correct one):

In [9]:
# Top-1 Accuracy for VLAD (no PCA)
acc_k1_vlad = top_k_accuracy(
    images=val_imgs,
    image_labels=val_labels,
    encoding_map=encodings_vlad,
    path_labels_dict=dataset_labels_dict,
    encoder=vlad_encoder,
    k=1
)
print("Top-1 Accuracy, VLAD:", acc_k1_vlad)

Top-1 Accuracy, VLAD: 0.6931372549019608


In [10]:
# Top-1 Accuracy for Fisher with PCA
acc_k1_fisher = top_k_accuracy(
    images=val_imgs,
    image_labels=val_labels,
    encoding_map=encodings_fisher,
    path_labels_dict=dataset_labels_dict,
    encoder=fisher_vector_encoder,
    k=1
)
print("Top-1 Accuracy, Fisher Vector:", acc_k1_fisher)

Top-1 Accuracy, Fisher Vector: 0.667156862745098


In [11]:
# Top-1 Accuracy for Pipeline with PCA
acc_k1_pipeline = top_k_accuracy(
    images=val_imgs,
    image_labels=val_labels,
    encoding_map=encodings_pipeline,
    path_labels_dict=dataset_labels_dict,
    encoder=pipeline_with_pca,
    k=1
)
print("Top-1 Accuracy, Pipeline:", acc_k1_pipeline)

Top-1 Accuracy, Pipeline: 0.6936274509803921


Often the second, third, and further results matter as well. In that case, we can set `k > 1`. Let's try `k=5`:

In [12]:
# Top-5 Accuracy for VLAD (no PCA)
acc_k5_vlad = top_k_accuracy(
    images=val_imgs,
    image_labels=val_labels,
    encoding_map=encodings_vlad,
    path_labels_dict=dataset_labels_dict,
    encoder=vlad_encoder,
    k=5
)
print("Top-5 Accuracy, VLAD:", acc_k5_vlad)

Top-5 Accuracy, VLAD: 0.8671568627450981


In [13]:
# Top-5 Accuracy for Fisher with PCA
acc_k5_fisher = top_k_accuracy(
    images=val_imgs,
    image_labels=val_labels,
    encoding_map=encodings_fisher,
    path_labels_dict=dataset_labels_dict,
    encoder=fisher_vector_encoder,
    k=5
)
print("Top-5 Accuracy, Fisher Vector:", acc_k5_fisher)

Top-5 Accuracy, Fisher Vector: 0.8387254901960784


In [None]:
# Top-5 Accuracy for Pipeline with PCA
acc_k5_pipeline = top_k_accuracy(
    images=val_imgs,
    image_labels=val_labels,
    encoding_map=encodings_pipeline,
    path_labels_dict=dataset_labels_dict,
    encoder=pipeline_with_pca,
    k=5
)
print("Top-5 Accuracy, Pipeline:", acc_k5_pipeline)

### **6.2. Compute the mAP**

How it works:
- If `k` is given, we only consider the `top-k` ranked results per query.
- If `k=None` or `k` is omitted, we consider all results (the entire dataset).
- For each query, we compute average precision (AP). Then we average across all queries, yielding mean average precision (mAP).

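Example:
Image `a` has label `1`, and the top-6 retrieved images have the following labels, so every `1` is a match:
- Truth Labels: [0, 1, 1, 0, 0, 1]

The precision is evaluated at each rank that holds a match, and the sum is divided by the number of results considered.

**a) k=None**: we consider all 6 results.
- Rank 2: precision = 1/2
- Rank 3: precision = 2/3
- Rank 6: precision = 3/6
- AP = (1/2 + 2/3 + 3/6) / 6 ≈ 0.278

**b) k=3**: we consider only the first 3 results.
- Rank 2: precision = 1/2
- Rank 3: precision = 2/3
- AP = (1/2 + 2/3) / 3 ≈ 0.389

With a single query, the mAP equals this AP; with several queries, it is the mean of their APs.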
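
The arithmetic of this example can be reproduced with a short sketch. It follows the convention of the worked example, dividing the summed precisions by the number of results considered; a common alternative definition divides by the number of relevant results instead. The function names are hypothetical, and this is not the implementation of `pyvisim.eval.top_k_map`.

```python
def average_precision_sketch(relevance, k=None):
    """AP of one ranked result list, normalized as in the worked example.

    `relevance` holds 1 for a result with the query's label and 0 otherwise.
    """
    considered = relevance if k is None else relevance[:k]
    if not considered:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(considered, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(considered)


def mean_average_precision_sketch(relevance_per_query, k=None):
    """Mean of the per-query AP values."""
    aps = [average_precision_sketch(rel, k) for rel in relevance_per_query]
    return sum(aps) / len(aps)


relevance = [0, 1, 1, 0, 0, 1]
ap_all = average_precision_sketch(relevance)        # case a), all 6 results
ap_top3 = average_precision_sketch(relevance, k=3)  # case b), first 3 results
print(round(ap_all, 3), round(ap_top3, 3))
```

Because a match at rank 1 contributes more than a match at rank 6, two rankings with the same set of matches can still receive different scores.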

First, we do it for the whole dataset:

In [None]:
mAP_value_vlad = top_k_map(
    images=val_imgs,
    image_labels=val_labels,
    encoding_map=encodings_vlad,  # {path: vector}
    path_labels_dict=dataset_labels_dict,    # {path: label}
    encoder=vlad_encoder  # or fisher_vector_encoder / pipeline_with_pca, with the matching encoding_map
)
print("Mean Average Precision (mAP), VLAD:", mAP_value_vlad)

In [None]:
mAP_value_fisher = top_k_map(
    images=val_imgs,
    image_labels=val_labels,
    encoding_map=encodings_fisher,
    path_labels_dict=dataset_labels_dict,
    encoder=fisher_vector_encoder
)
print("Mean Average Precision (mAP), Fisher Vector:", mAP_value_fisher)

In [None]:
mAP_value_pipeline = top_k_map(
    images=val_imgs,
    image_labels=val_labels,
    encoding_map=encodings_pipeline,
    path_labels_dict=dataset_labels_dict,
    encoder=pipeline_with_pca
)
print("Mean Average Precision (mAP), Pipeline:", mAP_value_pipeline)

In practice, we often care only about the top results. Let's compute the mAP for the top 5 results:

In [None]:
# VLAD is computed without PCA, so the variable name does not mention it
mAP_value_top5_vlad = top_k_map(
    images=val_imgs,
    image_labels=val_labels,
    encoding_map=encodings_vlad,
    path_labels_dict=dataset_labels_dict,
    encoder=vlad_encoder,
    k=5
)
print("Mean Average Precision (mAP) for Top-5, VLAD:", mAP_value_top5_vlad)

In [None]:
mAP_value_top5_fisher_pca = top_k_map(
    images=val_imgs,
    image_labels=val_labels,
    encoding_map=encodings_fisher,
    path_labels_dict=dataset_labels_dict,
    encoder=fisher_vector_encoder,
    k=5
)
print("Mean Average Precision (mAP) for Top-5, Fisher Vector:", mAP_value_top5_fisher_pca)

In [None]:
mAP_value_top5_pipeline = top_k_map(
    images=val_imgs,
    image_labels=val_labels,
    encoding_map=encodings_pipeline,
    path_labels_dict=dataset_labels_dict,
    encoder=pipeline_with_pca,
    k=5
)
print("Mean Average Precision (mAP) for Top-5, Pipeline with PCA:", mAP_value_top5_pipeline)

In [None]:
# Bar chart comparing the mAP and top-k accuracy of the encoders.
plot_and_save_barplot(
    {
        "VLAD": [mAP_value_vlad, acc_k1_vlad, acc_k5_vlad],
        "Fisher Vector": [mAP_value_fisher, acc_k1_fisher, acc_k5_fisher],
        "Pipeline": [mAP_value_pipeline, acc_k1_pipeline, acc_k5_pipeline],
    },
    bar_labels=["mAP", "Top-1 Accuracy", "Top-5 Accuracy"],
    title="Performance Metrics for VLAD, Fisher Vector, and Pipeline with PCA",
    ylabel="Value",
    xlabel="Metrics",
)