
Multi-similarity mining on Pittsburgh30k training set #22

Open
Anuradha-Uggi opened this issue May 26, 2023 · 6 comments
Anuradha-Uggi commented May 26, 2023

Hello Amar.

MixVPR is amazing work. We are trying to train it on the Pittsburgh 30k dataset to compare it with our approaches. How does multi-similarity mining work on the Pittsburgh 30k training set, where data loading gives a list of database + query images? Could you please clarify how to load the Pittsburgh training samples so that the triplets are mined error-free? I ran the Pittsburgh data code you provided and made a few modifications. main.py runs bug-free, but it gives loss = 0 and acc = 1 for all epochs.

Many thanks
Anu

@amaralibey
Owner

Hello @Anuradha-Uggi,

We discussed the loss function and online mining strategies in another paper; see https://github.com/amaralibey/gsv-cities. The motivation behind collecting the GSV-Cities dataset was the lack of precise ground truth in existing datasets.

The MS-mining strategy involves dynamically mining hard positive pairs and hard negative pairs during the training process, specifically at the loss level. This strategy requires precise labels, where each image in the batch must have an ID. However, this label requirement is not applicable to the Pittsburgh dataset.

In the Pittsburgh dataset, there are queries and their POTENTIAL positives, and many of these POTENTIAL images do not actually correspond to the same location as the query. This is why the authors use weak supervision, which relies on the easiest positive, to guarantee that the positive image represents the same location as the query. Consequently, the presence of these potential images does not allow for effective mining of hard positives.
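To make the weak-supervision idea concrete, here is a minimal sketch (not from this repo) of NetVLAD-style easiest-positive selection: among a query's geographic "potential positives", pick the one whose current descriptor is closest to the query. The function name and tensor shapes are assumptions for illustration.

```python
import torch

def easiest_positive(query_desc, potential_descs):
    """Return the index of the potential positive whose descriptor is
    closest to the query's -- the 'easiest' positive used in weak
    supervision (illustrative sketch, not the repo's code)."""
    # query_desc: (d,), potential_descs: (k, d)
    dists = torch.cdist(query_desc.unsqueeze(0), potential_descs).squeeze(0)
    return torch.argmin(dists).item()

# Toy 2-D descriptors: candidate 1 is clearly the closest match.
q = torch.tensor([1.0, 0.0])
cands = torch.tensor([[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
print(easiest_positive(q, cands))  # 1
```

This avoids ever trusting a geographically close but visually unrelated image as a training positive, at the cost of never exploiting hard positives.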

The provided code is not designed to train on the Pittsburgh dataset due to specific requirements for the batch size and labels. The expected batch size format is (P, K, C, H, W), where P represents the number of places and K represents the number of images per place. Additionally, each image needs to have a corresponding label or ID.
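The expected (P, K, C, H, W) batch layout can be sketched as follows; the shapes and the 320x320 resolution are assumptions for illustration, with labels simply repeating each place ID K times:

```python
import torch

# Assumed batch format: P places, K images per place, each image C x H x W.
P, K, C, H, W = 4, 4, 3, 320, 320

batch = torch.randn(P, K, C, H, W)             # images grouped by place
labels = torch.arange(P).repeat_interleave(K)  # one place ID per image

# Flatten to (P*K, C, H, W) before the forward pass; labels stay aligned.
images = batch.view(P * K, C, H, W)

print(images.shape)     # torch.Size([16, 3, 320, 320])
print(labels.tolist())  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
```

With this layout, every image shares its label with exactly K-1 others, which is what gives the online miner positive pairs to work with.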

The results you are obtaining are a direct consequence of the modifications you have made to the code. Based on my understanding, it seems that you might be assigning different IDs to each image within the batch. As a result, the online miner is unable to find any positive pairs, leading to an empty list. This indicates that there are no informative pairs present in the batch, resulting in a 1.0 accuracy (indicating the absence of hard pairs). Consequently, the loss function receives zero pairs and generates a loss value of zero.
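The failure mode described above can be reproduced with a few lines: the positive-pair candidates an online miner sees are exactly the same-label, off-diagonal index pairs, so unique per-image IDs leave the miner with nothing to mine. This is a simplified illustration, not the miner's actual implementation.

```python
import torch

def ms_positive_pairs(labels):
    """Return index pairs (i, j) with the same label and i != j --
    the candidate positive pairs an online miner works with."""
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    same.fill_diagonal_(False)  # an image is not its own positive
    return same.nonzero(as_tuple=False)

# One distinct ID per image -> no positive pairs -> loss = 0, acc = 1:
unique_labels = torch.arange(8)
print(ms_positive_pairs(unique_labels).numel())  # 0

# K = 2 images per place -> positives exist:
grouped_labels = torch.arange(4).repeat_interleave(2)
print(ms_positive_pairs(grouped_labels).shape[0])  # 8 ordered pairs
```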

To summarize, the Multi Similarity miner cannot be directly utilized without significant modifications to its core functionality when working with the Pittsburgh dataset.

@Anuradha-Uggi
Author

Hi. Thanks for the explanation. If I understood correctly, in GSV-Cities we know that samples belonging to the same place are positives, and they form negatives with samples from other places. GSV-Cities already knows which images are positives and which are negatives through the IDs assigned to them. I think you use MS-mining to further refine the pairs/triplets based on their degree of positiveness and negativeness through thresholding.

If I want to adapt it for Pitts30k training, how should the code be modified?

One option I can think of is to replace the NetVLAD layer in https://github.com/Nanne/pytorch-NetVlad with the MixVPR model, along with the corresponding loss functions. Do you think that would work?

@amaralibey
Owner

It is challenging to establish a definitive way for determining the positive images for each query in Pitts30k. While the potential positive images are those within a 10-meter radius, there is no information available regarding their azimuth (heading or orientation). As a result, a significant portion of these potential positives may correspond to entirely different locations than the query itself. In light of this limitation, it becomes difficult to mine hard positives in the presence of all the false positives.

An alternative approach you can consider is modifying the MSLoss code to mine for the easiest positives while retaining the hard negatives (https://github.com/msight-tech/research-ms-loss/blob/master/ret_benchmark/losses/multi_similarity_loss.py). The positive pair mining is done at line 39: `pos_pair = pos_pair_[pos_pair_ - self.margin < max(neg_pair_)]`. Notice that mining the positives depends on the value of the hardest negative (`max(neg_pair_)`).

You'll need to mine for the easiest positives (instead of the hardest), all while taking into account the similarity of the hardest negatives. I suggest you thoroughly read the MSLoss paper at this [LINK](https://github.com/MalongTech/research-ms-loss) before doing so.
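One possible shape for that modification, sketched as a standalone function over per-anchor similarity vectors (the `easiest_positive` switch and the function itself are hypothetical, not part of the MSLoss reference code):

```python
import torch

margin = 0.1  # same role as self.margin in the MSLoss reference code

def mine_pairs(pos_pair_, neg_pair_, easiest_positive=True):
    """pos_pair_/neg_pair_: similarities of one anchor's candidate
    positives/negatives. Sketch of an easiest-positive variant."""
    # Standard MS negative mining: keep negatives harder than the
    # least-similar positive, up to the margin.
    neg_pair = neg_pair_[neg_pair_ + margin > torch.min(pos_pair_)]
    if easiest_positive:
        # Assumed tweak: keep only the single most similar positive,
        # which is safer when many "potential positives" are false.
        pos_pair = pos_pair_.max().unsqueeze(0)
    else:
        # Original hard-positive rule from the reference code.
        pos_pair = pos_pair_[pos_pair_ - margin < torch.max(neg_pair_)]
    return pos_pair, neg_pair

pos = torch.tensor([0.9, 0.4, 0.2])  # candidate-positive similarities
neg = torch.tensor([0.5, 0.1])       # negative similarities
p, n = mine_pairs(pos, neg)
print(p)  # tensor([0.9000]) -- only the easiest positive survives
```

Whether the negative-mining threshold should then also use the kept positive rather than `min(pos_pair_)` is a design choice worth experimenting with.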

For the NetVLAD layer, you can use the implementation you just mentioned; that's the one most researchers use when training on Pitts30k with PyTorch.

Good luck,

@Anuradha-Uggi
Author

The main problem, as you said, is that there is no obvious way to derive labels for the Pitts30k dataset. We may have to rely on the UTM coordinates provided. Please correct me if I'm wrong.

We ran the baseline MixVPR 4096 model with a ResNet-50 backbone on Nordland. It gives 76% R@1, whereas you report 58% in your paper. The test dataset I used has 27,592 database samples and the same number of queries, with seasonal (winter/summer) changes between the database and the queries. Could you please confirm?

We trained MixVPR on Pitts30k with the hard mining strategy from Nanne's PyTorch NetVLAD code. R@1 on the Pitts30k test set came down a little, making our proposed approach slightly better. But MixVPR trained on Pitts30k is still leading on Nordland.

@amaralibey
Owner

@Anuradha-Uggi,

Relying solely on UTMs is not sufficient as they do not account for bearing (or orientation with respect to the North Pole). For instance, if two images are 5 meters apart based on UTMs, it does not guarantee that they depict the same location. One image could be facing north while the other might be facing south.
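A bearing-aware positive check along these lines can be sketched in a few lines of pure Python; the function, thresholds, and field names are illustrative assumptions, not the repo's code:

```python
import math

def is_positive(easting_a, northing_a, heading_a,
                easting_b, northing_b, heading_b,
                max_dist=10.0, max_heading_diff=45.0):
    """Treat two images as a positive pair only if they are close in
    UTM space AND roughly share a heading (degrees), since UTMs alone
    say nothing about orientation. Thresholds are assumed values."""
    dist = math.hypot(easting_a - easting_b, northing_a - northing_b)
    diff = abs(heading_a - heading_b) % 360.0
    diff = min(diff, 360.0 - diff)  # wrap the angle to [0, 180]
    return dist <= max_dist and diff <= max_heading_diff

# 5 m apart but facing opposite directions -> not a positive:
print(is_positive(0, 0, 0, 5, 0, 180))  # False
# 5 m apart with a 20-degree heading difference -> positive:
print(is_positive(0, 0, 10, 5, 0, 30))  # True
```

This only helps, of course, if heading metadata is available at all, which it is not for Pitts30k.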

Regarding Nordland, we used the dataset provided by VPR-Bench (https://github.com/MubarizZaffar/VPR-Bench). I have developed a script and a notebook that explain and demonstrate how to test on it.

As for the Pitts30k dataset, I have not personally attempted training on it. It is possible that your technique yields better results than MixVPR when the data is weakly labeled (e.g., Pitts30k-train or Pitts250k-train).

@Anuradha-Uggi
Author

That's true. Relying solely on UTMs might force the model to learn similarity between visually dissimilar images. I will think about other ways.

Thanks. I will check it out.

Sure. Thank you, Amar. That was a nice discussion with you. Good luck with your research!
