Ahmad Khaliq, Ming Xu, Stephen Hausler, Michael Milford, Sourav Garg
This repository provides the implementation of VLAD-BuFF, a novel approach to Visual Place Recognition (VPR). VPR plays a crucial role in many visual localization tasks and is often framed as an image retrieval problem. While state-of-the-art methods rely on VLAD aggregation to weigh feature contributions, they face limitations such as the 'burstiness' problem (over-representation of repetitive structures) and the high computational cost of feature-to-cluster comparisons.
VLAD-BuFF addresses these challenges through two key innovations:
- A self-similarity-based feature discounting mechanism that mitigates burstiness by learning burst-aware features during VPR training.
- A fast feature aggregation technique using PCA-initialized, learnable pre-projection to reduce local feature dimensions without sacrificing performance.
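To make the first idea concrete, here is a minimal pure-Python sketch of self-similarity-based discounting. It is not the repository's implementation: the function name `burst_weights`, the sigmoid soft-count form, and the fixed `scale` and `bias` values (which would be learned during training in a real model) are illustrative assumptions. Each local feature is down-weighted by a soft count of how many features in the same image resemble it, so repeated structures contribute less to the aggregated descriptor.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def burst_weights(features, scale=10.0, bias=-5.0):
    """Return one discount weight per local feature.

    Assumed form (for illustration only):
        weight_i = 1 / sum_j sigmoid(scale * cos(f_i, f_j) + bias)
    """
    feats = [l2_normalize(f) for f in features]
    weights = []
    for fi in feats:
        soft_count = 0.0
        for fj in feats:
            cos = sum(a * b for a, b in zip(fi, fj))
            # Soft indicator of "fj looks like fi"; includes the self-match.
            soft_count += 1.0 / (1.0 + math.exp(-(scale * cos + bias)))
        weights.append(1.0 / soft_count)
    return weights


# Three near-identical features (a "burst") and one distinct feature.
features = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = burst_weights(features)
print(weights)
```

In this toy input the three repeated features each receive a much smaller weight than the distinct one, which is the behaviour the discounting mechanism is meant to produce.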
Benchmark results on nine public datasets demonstrate that VLAD-BuFF achieves state-of-the-art performance while maintaining high recall, even with significantly reduced feature dimensions. This enables faster aggregation and improved computational efficiency.
For more details, refer to the paper on arXiv.
The code has been tested on PyTorch 2.1.0 with CUDA 12.1 and xFormers. To create a ready-to-run environment, use the following command:
conda env create -f environment.yml
You can easily load and test our VLAD-BuFF model via Torch Hub with just a few lines of code:
import torch
model = torch.hub.load("Ahmedest61/VLAD-BuFF", "vlad_buff", antiburst=True, nv_pca=192, wpca=True, num_pcs=4096)
model.eval()
model.cuda()
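Since VPR is framed as image retrieval, descriptors produced by the model are typically compared by nearest-neighbour search. The sketch below assumes each image has already been reduced to a single descriptor vector (an assumption about the model output that this README does not confirm) and ranks database entries by cosine similarity. The helper names `cosine` and `rank_database` and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def rank_database(query, database, top_k=5):
    """Return indices of the top_k database descriptors most similar to query."""
    scores = [(cosine(query, d), idx) for idx, d in enumerate(database)]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [idx for _, idx in scores[:top_k]]


# Toy descriptors standing in for real model outputs.
database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [0.9, 0.1, 0.0]
top = rank_database(query, database, top_k=2)
print(top)
```

For large databases, an exhaustive loop like this would normally be replaced by a vectorized or approximate nearest-neighbour search.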
For training, download the GSV-Cities dataset. For evaluation, download the desired datasets (MSLS, NordLand, SPED, Pittsburgh, SFSM, Tokyo247, StLucia, Baidu, and AmsterTime).
Training is done on the GSV-Cities dataset for 4 complete epochs. To train VLAD-BuFF or 192PrePool VLAD-BuFF, run the following commands:
python train.py --aggregation NETVLAD --expName dnv2_NV_AB --antiburst --no_wandb
python train.py --aggregation NETVLAD --expName dnv2_NV_192PCA_AB --antiburst --nv_pca 192 --no_wandb
Logs and checkpoints will be saved in the logs directory after training.
To add the PCA whitening layer, use the following commands:
python add_pca.py --aggregation NETVLAD --expName dnv2_NV_AB --ckpt_state_dict --num_pcs 8192 --resume_train ./logs/lightning_logs/version_0/checkpoints/last.ckpt --antiburst
python add_pca.py --aggregation NETVLAD --expName dnv2_NV_192PCA_AB --ckpt_state_dict --num_pcs 4096 --nv_pca 192 --resume_train ./logs/lightning_logs/version_1/checkpoints/last.ckpt --antiburst
To evaluate the models, run:
python eval.py --aggregation NETVLAD --wpca --num_pcs 8192 --antiburst --ckpt_state_dict --val_datasets MSLS --expName dnv2_NV_AB --resume_train ./logs/lightning_logs/version_0/checkpoints/dnv2_NV_AB_wpca8192_last.ckpt --store_eval_output --save_dir ./logs/lightning_logs/version_0/ --no_wandb
python eval.py --aggregation NETVLAD --nv_pca 192 --wpca --num_pcs 4096 --antiburst --ckpt_state_dict --val_datasets MSLS --expName dnv2_NV_192PCA_AB --resume_train ./logs/lightning_logs/version_1/checkpoints/dnv2_NV_192PCA_AB_wpca4096_last.ckpt --store_eval_output --save_dir ./logs/lightning_logs/version_1/ --no_wandb
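The two evaluation commands differ only in experiment name, PCA settings, and log version, so they can be generated programmatically. The following is a convenience sketch, not part of the repository: `build_eval_command` is a hypothetical helper, and it assumes that the checkpoint naming pattern `<expName>_wpca<num_pcs>_last.ckpt` and the `version_N` log layout seen in the commands above also hold for your runs.

```python
import shlex


def build_eval_command(exp_name, num_pcs, version, dataset="MSLS", nv_pca=None):
    """Assemble an eval.py command using only the flags shown in this README."""
    log_dir = f"./logs/lightning_logs/version_{version}"
    # Assumed naming pattern, inferred from the example commands.
    ckpt = f"{log_dir}/checkpoints/{exp_name}_wpca{num_pcs}_last.ckpt"

    cmd = ["python", "eval.py", "--aggregation", "NETVLAD"]
    if nv_pca is not None:
        # Only the pre-projection (PrePool) variant passes --nv_pca.
        cmd += ["--nv_pca", str(nv_pca)]
    cmd += [
        "--wpca", "--num_pcs", str(num_pcs),
        "--antiburst", "--ckpt_state_dict",
        "--val_datasets", dataset,
        "--expName", exp_name,
        "--resume_train", ckpt,
        "--store_eval_output",
        "--save_dir", f"{log_dir}/",
        "--no_wandb",
    ]
    return cmd


cmd_full = build_eval_command("dnv2_NV_AB", 8192, version=0)
cmd_prepool = build_eval_command("dnv2_NV_192PCA_AB", 4096, version=1, nv_pca=192)
print(shlex.join(cmd_full))
print(shlex.join(cmd_prepool))
```

The resulting lists can be passed to `subprocess.run` directly. Dataset identifiers other than `MSLS` are not shown in the commands above, so check the repository for the exact names accepted by `--val_datasets`.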
You can also download the pretrained VLAD-BuFF models from here.
MSLS Val R@1 | MSLS Val R@5 | NordLand R@1 | NordLand R@5 | Pitts250k-t R@1 | Pitts250k-t R@5 | SPED R@1 | SPED R@5 | SFSM R@1 | SFSM R@5 | Tokyo247 R@1 | Tokyo247 R@5 | StLucia R@1 | StLucia R@5 | AmsterTime R@1 | AmsterTime R@5 | Baidu R@1 | Baidu R@5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
92.4 | 95.8 | 78.0 | 90.4 | 95.6 | 98.7 | 92.8 | 96.2 | 88.3 | 91.0 | 96.5 | 98.1 | 100 | 100 | 61.7 | 81.9 | 77.5 | 87.9 |
MSLS Val R@1 | MSLS Val R@5 | NordLand R@1 | NordLand R@5 | Pitts250k-t R@1 | Pitts250k-t R@5 | SPED R@1 | SPED R@5 | SFSM R@1 | SFSM R@5 | Tokyo247 R@1 | Tokyo247 R@5 | StLucia R@1 | StLucia R@5 | AmsterTime R@1 | AmsterTime R@5 | Baidu R@1 | Baidu R@5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
91.9 | 95.9 | 71.4 | 86.3 | 95.0 | 98.2 | 90.9 | 96.0 | 87.3 | 90.1 | 97.5 | 98.4 | 99.9 | 100 | 59.2 | 78.7 | 74.3 | 86.6 |
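For reference, R@K in the tables is Recall@K: the percentage of queries for which at least one of the top K retrieved images is a correct match. Below is a minimal sketch with a hypothetical function name and toy data. The repository's own evaluation code may differ in details such as how ground-truth matches are defined for each dataset.

```python
def recall_at_k(predictions, ground_truth, k):
    """Percentage of queries with at least one correct match in the top k.

    predictions:  per-query list of database indices, best match first.
    ground_truth: per-query set of database indices counted as correct.
    """
    hits = 0
    for ranked, positives in zip(predictions, ground_truth):
        if any(idx in positives for idx in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(predictions)


# Toy example: 4 queries, each with 3 ranked retrievals.
predictions = [[3, 1, 7], [2, 5, 0], [9, 8, 6], [0, 4, 2]]
ground_truth = [{3}, {5, 6}, {1}, {0}]

r1 = recall_at_k(predictions, ground_truth, k=1)
r2 = recall_at_k(predictions, ground_truth, k=2)
print(r1, r2)
```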
To perform analysis, use the following scripts:
python predictions.py --dataset_name MSLS --your_method_path ./logs/lightning_logs/dnv2_NV_AB/wpca8192_last.ckpt_MSLS_predictions.npz --baseline_paths ./logs/lightning_logs/dnv2_NV_192PCA_AB/wpca8192_last.ckpt_MSLS_predictions.npz
python cluster_analysis.py --dataset_name MSLS --method_our dnv2_NV_AB --baseline_name dnv2_NV_192PCA_AB --your_method_path ./logs/lightning_logs/dnv2_NV_AB/wpca8192_last.ckpt_MSLS_predictions.npz --baseline_path ./logs/lightning_logs/dnv2_NV_192PCA_AB/wpca8192_last.ckpt_MSLS_predictions.npz
If you find our work valuable for your research, please consider citing our paper:
@inproceedings{khaliq2024vlad,
title={{VLAD-BuFF}: Burst-aware Fast Feature Aggregation for Visual Place Recognition},
author={Khaliq, Ahmad and Xu, Ming and Hausler, Stephen and Milford, Michael and Garg, Sourav},
booktitle={European Conference on Computer Vision},
publisher={Springer},
volume={3},
number={4},
pages={8},
year={2024}
}
This code is built upon the following work: