
vpr bench error #8

Closed
YznMur opened this issue Sep 6, 2023 · 3 comments
Labels
documentation Improvements or additions to documentation

Comments

@YznMur

YznMur commented Sep 6, 2023

Hi, thanks for your great work.
I am trying to run dino_v2_global_vpr.py from scripts with my own dataset (database + queries), but I ran into an error with the generate_positives_and_utms function from datasets_ws.py. It asks for ground_truth_new.npy:

```python
join(self.dataset_folder,'ground_truth_new.npy'))
```

Would you please tell me how to get/generate this file from my dataset and what it should contain?

@TheProjectsGuy
Collaborator

Hey @YznMur

Thanks for taking an interest in our work. I think you're trying to use our codebase on your (custom) dataset. There are two ways to do this, and I'll cover both. I (hopefully) answer your question more directly towards the end; I'll start with what I recommend.

Method 1: Global Descriptors (Preferred)

Here, you use our method to extract global descriptors. This is preferred since it can be a drop-in replacement for many existing pipelines (that do image retrieval based on global descriptors).
You can use our Colab Demo to do this without any resources from your end (see the Python code in the demo folder). You can change the folders (from ./data/CityCenter) to use your own dataset (or download the demo code and play with it).

Once you have global descriptors, you can use any method to do retrievals for a particular query. It's common to use faiss for such things.
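For instance, here's a minimal retrieval sketch with faiss. This isn't from our codebase; the descriptor file names are stand-ins for arrays you'd get from the demo or your own extraction:

```python
# Minimal top-k retrieval with faiss; db_desc / q_desc are assumed to be
# (N, D) float32 arrays of global descriptors extracted beforehand.
import faiss
import numpy as np

db_desc = np.load("db_descriptors.npy").astype("float32")  # assumption
q_desc = np.load("q_descriptors.npy").astype("float32")    # assumption

# Cosine similarity = inner product on L2-normalized descriptors
faiss.normalize_L2(db_desc)
faiss.normalize_L2(q_desc)

index = faiss.IndexFlatIP(db_desc.shape[1])
index.add(db_desc)
scores, retrieved = index.search(q_desc, 5)  # top-5 database indices per query
```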

Method 2: Full NN Search Pipeline using our codebase

This is using the same nearest-neighbor search method that we use. There are two ways to do this.

Way 1: Use function (suggested)

I'd suggest using the get_top_k_recall function from utilities. The function is described here. It takes global descriptors and ground truth (for calculating recall).
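If it helps to see the idea rather than the exact signature (check the linked description for that), the recall@N computation it performs is roughly the following sketch; `retrieved` and `positives` here are my placeholder names:

```python
# Sketch of recall@N: a query counts as correct at N if any of its top-N
# retrieved database indices appears in its ground-truth positives list.
import numpy as np

def recall_at_n(retrieved, positives, n_values=(1, 5, 10)):
    # retrieved: (num_q, max_k) ranked database indices per query
    # positives: per-query lists of ground-truth database indices
    num_q = retrieved.shape[0]
    return {
        n: sum(len(set(retrieved[i, :n]) & set(positives[i])) > 0
               for i in range(num_q)) / num_q
        for n in n_values
    }
```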

Way 2: Use full codebase

You could use our full codebase (files like dino_v2_global_vpr.py); I guess this is what you're trying to do. The ground_truth_new.npy file is created by us, specifically for each dataset, and it is also specific to each setting.

Example 1: Say you have a dataset (GPS geotagged) containing database and query images. You want a localization radius (the database images to be retrieved for a particular query) of 2m. You must generate a .npy file for this specific setting.
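A hypothetical sketch for this setting (not in the repo; `db_utms.npy` / `q_utms.npy` are stand-ins for coordinates you'd extract from your GPS tags):

```python
# Build ground_truth_new.npy from UTM coordinates with a 2 m radius.
import numpy as np
from sklearn.neighbors import NearestNeighbors

db_utms = np.load("db_utms.npy")  # (num_db, 2) easting/northing - assumption
q_utms = np.load("q_utms.npy")    # (num_q, 2)                   - assumption

nn = NearestNeighbors(radius=2.0).fit(db_utms)
positives = nn.radius_neighbors(q_utms, return_distance=False)

gt = np.empty((len(q_utms), 2), dtype=object)
for i, pos in enumerate(positives):
    gt[i, 0], gt[i, 1] = i, list(pos)
np.save("ground_truth_new.npy", gt)
```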

Example 2: Say you have a dataset with images captured in two sequences (query and database traversal) where you know that the indices approximately correspond. You decide that for query i, database images in i-4 to i+4 are a good retrieval (localization radius is 4 images). You must generate a .npy file for this specific setting.
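And a hypothetical sketch for this index-based setting (again not from the repo; the dataset sizes are assumptions):

```python
# Positives for query i are database indices in [i-4, i+4], clipped to limits.
import numpy as np

num_q, num_db, radius = 406, 406, 4  # assumption: 17places-style sizes
gt = np.empty((num_q, 2), dtype=object)
for i in range(num_q):
    lo, hi = max(0, i - radius), min(num_db - 1, i + radius)
    gt[i, 0], gt[i, 1] = i, list(range(lo, hi + 1))
np.save("ground_truth_new.npy", gt)
```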

Using this way is not suggested for this method, since it leads to many .npy files (one for each set of parameters you use for creating a ground truth) for each type of dataset.

Also, datasets originally from dvgl_benchmark (like Pitts30k and St. Lucia) don't need this file; their ground truth is calculated on the fly using kNN retrieval from the file names. We added a modification that reads the ground truth from a .npy file to add compatibility with VPR-Bench datasets (like 17places).

Numpy Array for VPR-Bench datasets

For a dataset from VPR-Bench, like 17places, the file can be read as follows:

```python
# Assuming you're in the dataset directory
import numpy as np
gt = np.load("ground_truth_new.npy", allow_pickle=True)
# Need allow_pickle=True since the array contains Python objects (lists)
print(gt.shape)   # (406, 2)
# Here's what a sample row (say 15) should look like
print(gt[15])     # [15 list([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])]
print(gt[1])      # [1 list([0, 1, 2, 3, 4, 5, 6])]
```

Note that each row (signifying a query) has two columns. The first column is the row number itself (we don't use this 🙈). The second column is the list of indices of database images that correspond to this query; we use it for getting positives (anything outside the list is a negative). This is like example 2 above, with a localization radius of 5 images, clipped to the limits of the database images (that's why gt[1] has [0, 1, 2, 3, 4, 5, 6]).

Note that 17places has the following directory structure:

```
./17places
├── [ 14K]  ground_truth_new.npy
├── [ 13K]  my_ground_truth_new.npy
├── [ 12K]  query [406 entries exceeds filelimit, not opening dir]
├── [ 514]  ReadMe.txt
└── [ 12K]  ref [406 entries exceeds filelimit, not opening dir]
```

The image names in the ref (database) and query folders should be like 0.jpg, 1.jpg, etc. for this to work. You can download the 17places dataset from our public material (Datasets-All folder) to fully understand this.

Numpy array for other datasets

If you want to use a custom dataset and define a numpy array for it (I suggest not doing this if you only want quick testing), you can look at how datasets are structured in custom_datasets. For example, the Eiffel (Sub-Atlantic Ridge) dataset is handled in eiffel_dataloader.py. A similar loading call can be found in that file as well:

```python
self.gt_positives = np.load(
    os.path.join(self.datasets_folder, self.dataset_name, "eiffel_gt.npy"),
    allow_pickle=True,
)[101:]  # returns dictionary of gardens dataset
```

You'll have to create a Python file along similar lines if you want to use this method and way (method 2, way 2).
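For reference, here's a bare-bones sketch of what such a file could contain. The class and folder names are hypothetical; mirror eiffel_dataloader.py in custom_datasets for the real structure:

```python
import os
import numpy as np

class MyCustomDataset:  # hypothetical name
    def __init__(self, datasets_folder, dataset_name="my_dataset"):
        self.datasets_folder = datasets_folder
        self.dataset_name = dataset_name
        # Per-query lists of positive database indices, in the .npy format above
        self.gt_positives = np.load(
            os.path.join(datasets_folder, dataset_name, "ground_truth_new.npy"),
            allow_pickle=True,
        )

    def get_positives(self):
        # The second column holds the positives list for each query
        return self.gt_positives[:, 1]
```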

@TheProjectsGuy
Collaborator

@YznMur let me know if this addresses your concerns.

@TheProjectsGuy added and removed the good first issue label Sep 9, 2023
@Nik-V9
Contributor

Nik-V9 commented Sep 16, 2023

Closing due to inactivity. Avneesh has addressed the concern.
