
vpr bench error #8

Closed
YznMur opened this issue Sep 6, 2023 · 3 comments
Labels
documentation Improvements or additions to documentation

Comments

@YznMur

YznMur commented Sep 6, 2023

Hi, thanks for your great work.
I am trying to run dino_v2_global_vpr.py from scripts with my own dataset (database + queries), but I ran into an error with the generate_positives_and_utms function from datasets_ws.py. It asks for ground_truth_new.npy:

```python
join(self.dataset_folder,'ground_truth_new.npy'))
```

Would you please tell me how to get/generate this file from my dataset and what it should contain?

@TheProjectsGuy
Collaborator

Hey @YznMur

Thanks for taking an interest in our work. I think you're trying to use our codebase on your (custom) dataset. There are two ways to do this, and I'll cover both. I (hopefully) answer your question more directly towards the end; I'll start with what I recommend.

Method 1: Global Descriptors (Preferred)

Here, you use our method to extract global descriptors. This is preferred since it can be a drop-in replacement for many existing pipelines (that do image retrieval based on global descriptors).
You can use our Colab Demo to do this without any resources from your end (see the Python code in the demo folder). You can change the folders (from ./data/CityCenter) to use your own dataset (or download the demo code and play with it).

Once you have global descriptors, you can use any method to do retrievals for a particular query. It's common to use faiss for such things.
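For instance, here's a minimal retrieval sketch with faiss. This isn't from our codebase; the descriptor file names are stand-ins for arrays you'd get from the demo or your own extraction:

```python
# Minimal top-k retrieval with faiss; db_desc / q_desc are assumed to be
# (N, D) float32 arrays of global descriptors extracted beforehand.
import faiss
import numpy as np

db_desc = np.load("db_descriptors.npy").astype("float32")  # assumption
q_desc = np.load("q_descriptors.npy").astype("float32")    # assumption

# Cosine similarity = inner product on L2-normalized descriptors
faiss.normalize_L2(db_desc)
faiss.normalize_L2(q_desc)

index = faiss.IndexFlatIP(db_desc.shape[1])
index.add(db_desc)
scores, retrieved = index.search(q_desc, 5)  # top-5 database indices per query
```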

Method 2: Full NN Search Pipeline using our codebase

This is using the same nearest-neighbor search method that we use. There are two ways to do this.

Way 1: Use function (suggested)

I'd suggest using the get_top_k_recall function from utilities. The function is described here. It takes global descriptors and ground truth (for calculating recall).
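If it helps to see the idea rather than the exact signature (check the linked description for that), the recall@N computation it performs is roughly the following sketch; `retrieved` and `positives` here are my placeholder names:

```python
# Sketch of recall@N: a query counts as correct at N if any of its top-N
# retrieved database indices appears in its ground-truth positives list.
import numpy as np

def recall_at_n(retrieved, positives, n_values=(1, 5, 10)):
    # retrieved: (num_q, max_k) ranked database indices per query
    # positives: per-query lists of ground-truth database indices
    num_q = retrieved.shape[0]
    return {
        n: sum(len(set(retrieved[i, :n]) & set(positives[i])) > 0
               for i in range(num_q)) / num_q
        for n in n_values
    }
```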

Way 2: Use full codebase

You could use our full codebase (files like dino_v2_global_vpr.py); I guess this is what you're trying to do. The ground_truth_new.npy file is created by us, specifically for each dataset, and it is also specific to each setting.

Example 1: Say you have a dataset (GPS geotagged) containing database and query images. You want a localization radius (the database images to be retrieved for a particular query) of 2m. You must generate a .npy file for this specific setting.
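A hypothetical sketch for this setting (not in the repo; `db_utms.npy` / `q_utms.npy` are stand-ins for coordinates you'd extract from your GPS tags):

```python
# Build ground_truth_new.npy from UTM coordinates with a 2 m radius.
import numpy as np
from sklearn.neighbors import NearestNeighbors

db_utms = np.load("db_utms.npy")  # (num_db, 2) easting/northing - assumption
q_utms = np.load("q_utms.npy")    # (num_q, 2)                   - assumption

nn = NearestNeighbors(radius=2.0).fit(db_utms)
positives = nn.radius_neighbors(q_utms, return_distance=False)

gt = np.empty((len(q_utms), 2), dtype=object)
for i, pos in enumerate(positives):
    gt[i, 0], gt[i, 1] = i, list(pos)
np.save("ground_truth_new.npy", gt)
```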

Example 2: Say you have a dataset with images captured in two sequences (query and database traversal) where you know that the indices approximately correspond. You decide that for query i, database images in i-4 to i+4 are a good retrieval (localization radius is 4 images). You must generate a .npy file for this specific setting.
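And a hypothetical sketch for this index-based setting (again not from the repo; the dataset sizes are assumptions):

```python
# Positives for query i are database indices in [i-4, i+4], clipped to limits.
import numpy as np

num_q, num_db, radius = 406, 406, 4  # assumption: 17places-style sizes
gt = np.empty((num_q, 2), dtype=object)
for i in range(num_q):
    lo, hi = max(0, i - radius), min(num_db - 1, i + radius)
    gt[i, 0], gt[i, 1] = i, list(range(lo, hi + 1))
np.save("ground_truth_new.npy", gt)
```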

Using this way is not suggested for this method, since it leads to many .npy files (one for each set of parameters you use for creating a ground truth) for each type of dataset.

Also, datasets originally from dvgl_benchmark (like Pitts30k and St. Lucia) don't need this file; their ground truth is calculated on the fly using kNN retrieval from the file names. We added a modification that reads the ground truth from a .npy file to add compatibility with VPR-Bench datasets (like 17places).

Numpy Array for VPR-Bench datasets

For a dataset from VPR-Bench, like 17places, the file can be read as follows:

```python
# Assuming you're in the dataset directory
import numpy as np
gt = np.load("ground_truth_new.npy", allow_pickle=True)
# Need allow_pickle=True since the array contains Python objects (lists)
print(gt.shape)   # (406, 2)
# Here's what a sample row (say 15) should look like
print(gt[15])     # [15 list([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])]
print(gt[1])      # [1 list([0, 1, 2, 3, 4, 5, 6])]
```

Note that each row (signifying a query) has two columns. The first column is the row number itself (we don't use this 🙈). The second column is the list of indices of database images that correspond to this query; we use it for getting positives (anything outside the list is a negative). This is like example 2 above, with a localization radius of 5 images, clipped to the limits of the database images (that's why gt[1] has [0, 1, 2, 3, 4, 5, 6]).

Note that 17places has the following directory structure:

```
./17places
├── [ 14K]  ground_truth_new.npy
├── [ 13K]  my_ground_truth_new.npy
├── [ 12K]  query [406 entries exceeds filelimit, not opening dir]
├── [ 514]  ReadMe.txt
└── [ 12K]  ref [406 entries exceeds filelimit, not opening dir]
```

The image names in the ref (database) and query folders should be like 0.jpg, 1.jpg, etc. for this to work. You can download the 17places dataset from our public material (Datasets-All folder) to fully understand this.

Numpy array for other datasets

If you want to use a custom dataset and define a numpy array for it (I suggest not doing this if you only want quick testing), you can look at how datasets are structured in custom_datasets. For example, the Eiffel (Sub-Atlantic Ridge) dataset is handled in eiffel_dataloader.py. A similar loading call can be found in that file as well:

```python
self.gt_positives = np.load(
    os.path.join(self.datasets_folder, self.dataset_name, "eiffel_gt.npy"),
    allow_pickle=True,
)[101:]  # returns dictionary of gardens dataset
```

You'll have to create a Python file along similar lines if you want to use this method and way (method 2, way 2).
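For reference, here's a bare-bones sketch of what such a file could contain. The class and folder names are hypothetical; mirror eiffel_dataloader.py in custom_datasets for the real structure:

```python
import os
import numpy as np

class MyCustomDataset:  # hypothetical name
    def __init__(self, datasets_folder, dataset_name="my_dataset"):
        self.datasets_folder = datasets_folder
        self.dataset_name = dataset_name
        # Per-query lists of positive database indices, in the .npy format above
        self.gt_positives = np.load(
            os.path.join(datasets_folder, dataset_name, "ground_truth_new.npy"),
            allow_pickle=True,
        )

    def get_positives(self):
        # The second column holds the positives list for each query
        return self.gt_positives[:, 1]
```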

@TheProjectsGuy
Collaborator

@YznMur let me know if this addresses your concerns.

@TheProjectsGuy added and removed the good first issue label Sep 9, 2023
@Nik-V9
Contributor

Nik-V9 commented Sep 16, 2023

Closing due to inactivity. Avneesh has addressed the concern.
