## In this notebook we will see how to evaluate on the following benchmarks:

- Pittsburgh (pitts30k-val, pitts30k-test and pitts250k-test) [1]
- MapillarySLS [2]
- Cross Season [3]
- ESSEX [3]
- Inria Holidays [3]
- Nordland [3]
- SPED [3]

[1] NetVLAD: CNN architecture for weakly supervised place recognition (https://github.com/Relja/netvlad)

[2] Mapillary Street-Level Sequences: A Dataset for Lifelong Place Recognition (https://github.com/FrederikWarburg/mapillary_sls)

[3] VPR-Bench: An Open-Source Visual Place Recognition Evaluation Framework with Quantifiable Viewpoint and Appearance Change (https://github.com/MubarizZaffar/VPR-Bench)

You'll need to download the Pittsburgh dataset from [1] (you need to email Relja to request it) and the MapillarySLS validation set from [2]. For the other datasets, see [3] for details on the VPR-Bench benchmark; its authors also host those datasets at https://surfdrive.surf.nl/files/index.php/s/sbZRXzYe3l0v67W (huge thanks to them).

---

**Note:** I rewrote the code for loading these datasets to ensure consistent evaluation across all datasets and to make it faster. The original code was slow for valid reasons. For instance, VPR-Bench calculates multiple metrics, including latency, which requires processing images one at a time in the forward pass. MSLS offers various evaluation modes, such as Image_to_Image, Sequence_to_Sequence and Sequence_to_Image, among others. In this project, we focus solely on measuring recall@K, which lets us significantly speed up the validation process. Therefore, you'll need to use the precomputed ground_truth that we provide in this repo (in the directory datasets).

That being said, all you need to do is download the dataset and place it in a specific directory (we will need the dataset images). After that, you can hard-code the directory path into a global variable, as we will show in the following steps.
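To make the recall@K metric mentioned in the note concrete, here is a minimal brute-force sketch in NumPy. It is not the repository's `get_validation_recalls` (which may be implemented differently); the function name `recall_at_k` and the list-of-lists ground-truth format are assumptions made for illustration. A query counts as correctly localized at K if at least one of its positives appears among its K nearest references.

```python
import numpy as np

def recall_at_k(ref_desc, query_desc, positives, k_values):
    """Brute-force recall@K in percent (illustrative helper, not the repo's API).

    ref_desc:   (n, d) array of reference descriptors
    query_desc: (m, d) array of query descriptors
    positives:  list of m lists; positives[i] holds the reference indices
                that count as correct matches for query i
    """
    # Squared L2 distance between every query and every reference: shape (m, n)
    dists = ((query_desc[:, None, :] - ref_desc[None, :, :]) ** 2).sum(axis=-1)
    # Reference indices sorted from nearest to farthest, truncated to the largest K
    ranked = np.argsort(dists, axis=1)[:, :max(k_values)]

    recalls = {}
    for k in k_values:
        # A query is a hit if any of its top-k references is a positive
        hits = sum(
            bool(np.isin(ranked[i, :k], positives[i]).any())
            for i in range(len(query_desc))
        )
        recalls[k] = 100.0 * hits / len(query_desc)
    return recalls


# Toy example: 4 references on a line, 2 queries
refs = np.array([[0.0], [1.0], [2.0], [3.0]])
queries = np.array([[0.1], [2.6]])
gt = [[0], [2]]  # query 0 matches reference 0, query 1 matches reference 2
toy_recalls = recall_at_k(refs, queries, gt, k_values=[1, 2])
print(toy_recalls)
```

In the toy example the second query's nearest reference is index 3, which is not a positive, so it only becomes a hit at K=2.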


In [2]:
%reload_ext autoreload
%autoreload 2

import sys
sys.path.append('..') # append parent directory, we need it

import torch
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as T
import matplotlib.pyplot as plt
import numpy as np
from tqdm.notebook import tqdm
from utils.validation import get_validation_recalls

In [3]:
# ImageNet channel statistics used to normalize the input images
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

IM_SIZE = (320, 320)

def input_transform(image_size=IM_SIZE):
    return T.Compose([
        # T.Resize(image_size, interpolation=T.InterpolationMode.BICUBIC),
        T.Resize(image_size, interpolation=T.InterpolationMode.BILINEAR),
        T.ToTensor(),
        T.Normalize(mean=MEAN, std=STD)
    ])

In this project, we provide for each benchmark (or test dataset) a Dataset Class that encapsulates images sequentially as follows: 

$[R_1, R_2, ..., R_n, Q_1, Q_2, ..., Q_m]$ where $R_i$ are the reference images and $Q_j$ are the queries. We keep the number of references and the number of queries as attributes of the object so that we can split the descriptors into references and queries later when evaluating. We also store a ground_truth matrix that indicates which references are positives for each query.
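As a rough illustration of this layout, the sketch below uses a hypothetical `ToyBenchmark` class with file names instead of images. It is not one of the repository's dataset classes, and the ground truth is a plain list of lists here, which is an assumption made for brevity.

```python
class ToyBenchmark:
    """Hypothetical stand-in for a benchmark dataset class."""

    def __init__(self, reference_paths, query_paths, ground_truth):
        # References come first, then queries: [R_1..R_n, Q_1..Q_m]
        self.images = list(reference_paths) + list(query_paths)
        self.num_references = len(reference_paths)
        self.num_queries = len(query_paths)
        # ground_truth[j] lists the reference indices that are positives for query j
        self.ground_truth = ground_truth

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        # A real dataset would load and transform the image here
        return self.images[index], index


toy = ToyBenchmark(
    reference_paths=["ref_0.jpg", "ref_1.jpg", "ref_2.jpg"],
    query_paths=["query_0.jpg", "query_1.jpg"],
    ground_truth=[[0], [1, 2]],
)

# Iterating in order and slicing at num_references recovers the two groups
items = [toy[i][0] for i in range(len(toy))]
toy_refs = items[:toy.num_references]
toy_queries = items[toy.num_references:]
print(toy_refs, toy_queries)
```

Because the order is fixed, descriptors computed by iterating over the dataset (without shuffling) can be split with the same two slices.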

**Note:** make sure that in every [BenchmarkClass].py the global variable DATASET_ROOT points to the directory where that dataset's images are located; otherwise you won't be able to run the following steps. GT_ROOT is the location of the precomputed ground_truth and filenames that we provide (by default in ../datasets/).
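If you want an early, readable error when a path is wrong, a small check like the following could be run before building a dataset. `check_dataset_root` is a hypothetical helper, not something the repository provides; the demonstration uses a temporary directory in place of a real dataset folder.

```python
import tempfile
from pathlib import Path

def check_dataset_root(root, name="dataset"):
    """Return root as a Path, or raise if it is not an existing directory."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(
            f"{name}: DATASET_ROOT '{root}' is not an existing directory; "
            "edit the global variable in the corresponding dataset file."
        )
    return root


# Demonstration: a temporary directory stands in for a real dataset folder
with tempfile.TemporaryDirectory() as tmp:
    checked = check_dataset_root(tmp, name="CrossSeason")
    root_ok = checked.is_dir()

# A path that does not exist triggers the explicit error
try:
    check_dataset_root("/path/that/does/not/exist", name="CrossSeason")
    missing_detected = False
except FileNotFoundError:
    missing_detected = True
```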

In [4]:
from dataloaders.val.CrossSeasonDataset import CrossSeasonDataset
from dataloaders.val.EssexDataset import EssexDataset
from dataloaders.val.InriaDataset import InriaDataset
from dataloaders.val.NordlandDataset import NordlandDataset
from dataloaders.val.SPEDDataset import SPEDDataset
from dataloaders.val.MapillaryDataset import MSLS
from dataloaders.val.PittsburghDataset import PittsburghDataset



def get_val_dataset(dataset_name, input_transform=input_transform()):
    dataset_name = dataset_name.lower()
    
    if 'cross' in dataset_name:
        ds = CrossSeasonDataset(input_transform = input_transform)
    
    elif 'essex' in dataset_name:
        ds = EssexDataset(input_transform = input_transform)
    
    elif 'inria' in dataset_name:    
        ds = InriaDataset(input_transform = input_transform)
    
    elif 'nordland' in dataset_name:    
        ds = NordlandDataset(input_transform = input_transform)
    
    elif 'sped' in dataset_name:
        ds = SPEDDataset(input_transform = input_transform)
    
    elif 'msls' in dataset_name:
        ds = MSLS(input_transform = input_transform)

    elif 'pitts' in dataset_name:
        ds = PittsburghDataset(which_ds=dataset_name, input_transform = input_transform)
    else:
        raise ValueError(f'Unknown validation dataset: {dataset_name}')
    
    num_references = ds.num_references
    num_queries = ds.num_queries
    ground_truth = ds.ground_truth
    return ds, num_references, num_queries, ground_truth
    

We define a function that takes a model and a dataloader and returns the descriptors (representations) computed for every image.

In [5]:
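def get_descriptors(model, dataloader, device):
    """Run the model over the dataloader and return all descriptors as one CPU tensor."""
    descriptors = []
    with torch.no_grad():  # inference only, no gradients needed
        for batch in tqdm(dataloader, 'Calculating descritptors...'):
            imgs, _ = batch  # the second element of the batch is not used here
            output = model(imgs.to(device)).cpu()
            descriptors.append(output)

    # concatenate the per-batch outputs along the batch dimension
    return torch.cat(descriptors)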

Let's now load a pre-trained model

In [9]:
from main import VPRModel

# define which device you'd like to run experiments on (cuda:0 if you only have one gpu)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = VPRModel(backbone_arch='resnet50', 
                 layers_to_crop=[],
                 agg_arch='ConvAP',
                 agg_config={'in_channels': 2048,
                            'out_channels': 512,
                            's1' : 2,
                            's2' : 2},
        )


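# path to the trained weights; map_location lets the checkpoint load on CPU-only machines too
state_dict = torch.load('../LOGS/best_models/resnet50_ConvAP_512_2x2.ckpt', map_location=device)
model.load_state_dict(state_dict)
# use this instead if the checkpoint stores the weights under a 'state_dict' key
# model.load_state_dict(state_dict['state_dict'])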
model.eval()
model = model.to(device)
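The two `load_state_dict` variants above correspond to two checkpoint layouts: a bare mapping of parameter names to tensors, or a larger dictionary that wraps the weights under a `'state_dict'` key (PyTorch Lightning checkpoints, for example, are saved this way). A small hypothetical helper, `extract_state_dict`, could handle both; plain dictionaries stand in here for what `torch.load` would return.

```python
def extract_state_dict(checkpoint):
    """Return the weights mapping from either checkpoint layout (illustrative helper)."""
    if isinstance(checkpoint, dict) and "state_dict" in checkpoint:
        # Wrapped layout: weights live under the 'state_dict' key
        return checkpoint["state_dict"]
    # Bare layout: the checkpoint already is the weights mapping
    return checkpoint


# Plain dicts stand in for loaded checkpoints; the values would be tensors in practice
bare = {"backbone.weight": 1.0}
wrapped = {"epoch": 3, "state_dict": {"backbone.weight": 1.0}}

bare_weights = extract_state_dict(bare)
wrapped_weights = extract_state_dict(wrapped)
print(bare_weights == wrapped_weights)
```

With such a helper, the loading step would reduce to `model.load_state_dict(extract_state_dict(state_dict))` regardless of how the checkpoint was saved.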

## Running validation on one of the benchmarks

In [10]:
# all_datasets = ['CrossSeason' ,'Essex' ,'Inria' ,'Nordland' ,'SPED' ,'MSLS']
val_dataset_name = 'CrossSeason'
batch_size = 40

val_dataset, num_references, num_queries, ground_truth = get_val_dataset(val_dataset_name)
val_loader = DataLoader(val_dataset, num_workers=4, batch_size=batch_size)

descriptors = get_descriptors(model, val_loader, device)
print(f'Descriptor dimension {descriptors.shape[1]}')

# now we split into references and queries
r_list = descriptors[ : num_references].cpu()
q_list = descriptors[num_references : ].cpu()
recalls_dict, preds = get_validation_recalls(r_list=r_list,
                                    q_list=q_list,
                                    k_values=[1, 5, 10, 15, 20, 25],
                                    gt=ground_truth,
                                    print_results=True,
                                    dataset_name=val_dataset_name,
                                    )

Calculating descritptors...:   0%|          | 0/10 [00:00<?, ?it/s]

Descriptor dimension 2048


+----------------------------------------------------------------+
|                   Performance on CrossSeason                   |
+----------+--------+--------+--------+--------+--------+--------+
|    K     |   1    |   5    |   10   |   15   |   20   |   25   |
+----------+--------+--------+--------+--------+--------+--------+
| Recall@K | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
+----------+--------+--------+--------+--------+--------+--------+


## Evaluating on all benchmarks

In [11]:
val_dataset_names = ['CrossSeason' ,'Essex' ,'Inria', 'MSLS', 'SPED', 'Nordland', 'pitts30k_test', 'pitts250k_test']
batch_size = 40

for val_name in val_dataset_names:
    val_dataset, num_references, num_queries, ground_truth = get_val_dataset(val_name)
    val_loader = DataLoader(val_dataset, num_workers=4, batch_size=batch_size)
    print(f'Evaluating on {val_name}')
    descriptors = get_descriptors(model, val_loader, device)
    
    print(f'Descriptor dimension {descriptors.shape[1]}')
    r_list = descriptors[ : num_references]
    q_list = descriptors[num_references : ]

    recalls_dict, preds = get_validation_recalls(r_list=r_list,
                                                q_list=q_list,
                                                k_values=[1, 5, 10, 15, 20, 25],
                                                gt=ground_truth,
                                                print_results=True,
                                                dataset_name=val_name,
                                                faiss_gpu=False
                                                )
    del descriptors
    print('========> DONE!\n\n')

Evaluating on CrossSeason


Calculating descritptors...:   0%|          | 0/10 [00:00<?, ?it/s]

Descriptor dimension 2048


+----------------------------------------------------------------+
|                   Performance on CrossSeason                   |
+----------+--------+--------+--------+--------+--------+--------+
|    K     |   1    |   5    |   10   |   15   |   20   |   25   |
+----------+--------+--------+--------+--------+--------+--------+
| Recall@K | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
+----------+--------+--------+--------+--------+--------+--------+


Evaluating on Essex


Calculating descritptors...:   0%|          | 0/11 [00:00<?, ?it/s]

Descriptor dimension 2048


+----------------------------------------------------------+
|                   Performance on Essex                   |
+----------+-------+-------+-------+-------+-------+-------+
|    K     |   1   |   5   |   10  |   15  |   20  |   25  |
+----------+-------+-------+-------+-------+-------+-------+
| Recall@K | 74.29 | 93.33 | 97.14 | 98.57 | 99.05 | 99.52 |
+----------+-------+-------+-------+-------+-------+-------+


Evaluating on Inria


Calculating descritptors...:   0%|          | 0/21 [00:00<?, ?it/s]

Descriptor dimension 2048


+----------------------------------------------------------+
|                   Performance on Inria                   |
+----------+-------+-------+-------+-------+-------+-------+
|    K     |   1   |   5   |   10  |   15  |   20  |   25  |
+----------+-------+-------+-------+-------+-------+-------+
| Recall@K | 81.67 | 91.00 | 93.00 | 93.00 | 93.67 | 94.33 |
+----------+-------+-------+-------+-------+-------+-------+


Evaluating on MSLS


Calculating descritptors...:   0%|          | 0/491 [00:00<?, ?it/s]

Descriptor dimension 2048


+----------------------------------------------------------+
|                   Performance on MSLS                    |
+----------+-------+-------+-------+-------+-------+-------+
|    K     |   1   |   5   |   10  |   15  |   20  |   25  |
+----------+-------+-------+-------+-------+-------+-------+
| Recall@K | 83.38 | 90.27 | 92.16 | 93.65 | 93.78 | 94.05 |
+----------+-------+-------+-------+-------+-------+-------+


Evaluating on SPED


Calculating descritptors...:   0%|          | 0/31 [00:00<?, ?it/s]

Descriptor dimension 2048


+----------------------------------------------------------+
|                   Performance on SPED                    |
+----------+-------+-------+-------+-------+-------+-------+
|    K     |   1   |   5   |   10  |   15  |   20  |   25  |
+----------+-------+-------+-------+-------+-------+-------+
| Recall@K | 81.55 | 90.77 | 93.41 | 94.73 | 96.05 | 96.38 |
+----------+-------+-------+-------+-------+-------+-------+


Evaluating on Nordland


Calculating descritptors...:   0%|          | 0/759 [00:00<?, ?it/s]

Descriptor dimension 2048


+----------------------------------------------------------+
|                 Performance on Nordland                  |
+----------+-------+-------+-------+-------+-------+-------+
|    K     |   1   |   5   |   10  |   15  |   20  |   25  |
+----------+-------+-------+-------+-------+-------+-------+
| Recall@K | 40.25 | 56.63 | 63.19 | 66.78 | 70.36 | 72.14 |
+----------+-------+-------+-------+-------+-------+-------+


Evaluating on pitts30k_test


Calculating descritptors...:   0%|          | 0/421 [00:00<?, ?it/s]

Descriptor dimension 2048


+----------------------------------------------------------+
|               Performance on pitts30k_test               |
+----------+-------+-------+-------+-------+-------+-------+
|    K     |   1   |   5   |   10  |   15  |   20  |   25  |
+----------+-------+-------+-------+-------+-------+-------+
| Recall@K | 90.57 | 95.13 | 96.21 | 96.83 | 97.20 | 97.52 |
+----------+-------+-------+-------+-------+-------+-------+


Evaluating on pitts250k_test


Calculating descritptors...:   0%|          | 0/2306 [00:00<?, ?it/s]

Descriptor dimension 2048


+----------------------------------------------------------+
|              Performance on pitts250k_test               |
+----------+-------+-------+-------+-------+-------+-------+
|    K     |   1   |   5   |   10  |   15  |   20  |   25  |
+----------+-------+-------+-------+-------+-------+-------+
| Recall@K | 92.34 | 97.49 | 98.43 | 98.79 | 98.95 | 99.13 |
+----------+-------+-------+-------+-------+-------+-------+
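The loop above overwrites `recalls_dict` on every iteration. If a side-by-side comparison is useful, the per-dataset results could be stored in a dictionary and printed as one compact table. The sketch below assumes that each `recalls_dict` maps K to a recall percentage; `summarize_recalls` is a hypothetical helper, and the numbers are copied from the Essex and Nordland tables above.

```python
def summarize_recalls(results, k_values=(1, 5, 10)):
    """Format {dataset_name: {k: recall}} as a fixed-width text table."""
    header = f"{'Dataset':<16}" + "".join(f"{'R@' + str(k):>8}" for k in k_values)
    lines = [header]
    for name, recalls in results.items():
        row = f"{name:<16}" + "".join(f"{recalls[k]:>8.2f}" for k in k_values)
        lines.append(row)
    return "\n".join(lines)


# In the evaluation loop this would be filled with: results[val_name] = recalls_dict
results = {
    "Essex": {1: 74.29, 5: 93.33, 10: 97.14},
    "Nordland": {1: 40.25, 5: 56.63, 10: 63.19},
}
summary = summarize_recalls(results)
print(summary)
```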


