x8 vs x4fsr #124

Open
OmniscienceAcademy opened this issue May 8, 2022 · 2 comments
Comments

@OmniscienceAcademy

INFO:autofaiss: Computing best hyperparameters for index faiss_titles.faiss 05/05/2022, 07:16:53                                                            
WARNING:autofaiss:The maximum nearest neighbors coverage is 10.65% for this index. It means that when requesting 20 nearest neighbors, the average number of retrieved neighbors will be 2. The program will try to find the best hyperparameters to reach 95% of this max coverage at least, and then will optimize the search time for this target. The index search speed could be higher than the requested max search speed.

What can we do to prevent this?

This happened with "OPQ768_768,IVF262144_HNSW32,PQ768x8" -> bad max coverage
With the index_key "OPQ768_768,IVF262144_HNSW32,PQ768x4fsr", everything was ok. The vectors were just a bit too compressed.

My d is 768.

Thank you

@victor-paltz
Contributor

Hello!

Autofaiss will do a binary search to find the best set of hyperparameters and will set the lower exploration bound given the targeted minimum number of neighbors to retrieve (20 according to the logs).
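
As a rough illustration of the idea (not autofaiss's actual code), a binary search over a single integer hyperparameter could look like the sketch below. It assumes coverage never decreases as the parameter grows. `find_min_param`, `coverage_at`, and `toy_coverage` are hypothetical names invented for this example, and the toy coverage numbers are made up.

```python
from typing import Callable


def find_min_param(
    coverage_at: Callable[[int], float],
    low: int,
    high: int,
    target: float,
) -> int:
    """Return the smallest value in [low, high] whose coverage reaches target.

    Assumes coverage_at is non-decreasing. If even the upper bound misses
    the target, the upper bound is returned as the best available choice.
    """
    if coverage_at(high) < target:
        return high
    while low < high:
        mid = (low + high) // 2
        if coverage_at(mid) >= target:
            high = mid  # mid is good enough, try smaller values
        else:
            low = mid + 1  # mid is too small, search above it
    return low


# Toy stand-in for a real coverage measurement (hypothetical numbers).
def toy_coverage(param: int) -> float:
    return min(1.0, param / 64)


best = find_min_param(toy_coverage, low=1, high=6144, target=0.95)
print(best)
```

If the coverage measured at the upper bound is already low, as with the 10.65% in the warning above, no value inside the window can reach a higher coverage under this assumption.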

In your case, it seems that the function estimating the output coverage returned 10.65% at the upper bound of the exploration window (see code: get_nearest_neighbors_coverage). I see several possible explanations:

  • Your index is nearly empty, and you only used the tune_index function on an existing index.
  • Your index is not empty, but nearly all the vectors you put inside contain null values, making them not retrievable by the search function.
  • You might have created your own custom clusters, and the first 6144 closest clusters (the max bound for the exploration) are empty most of the time for the first 100s of vectors of the index.
  • The vectors used to compute the coverage (the first 100s in the index) are outliers; some may not be searchable because they contain null values.
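
One quick way to check the "null values" explanations is to scan the embeddings for unusable rows before indexing. The sketch below is plain Python and assumes that "null values" means NaN or infinite components. `find_bad_rows` is a hypothetical helper, not part of autofaiss, and the sample data is made up.

```python
import math
from typing import List, Sequence


def find_bad_rows(vectors: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of vectors with at least one non-finite component."""
    return [
        i
        for i, vec in enumerate(vectors)
        if any(not math.isfinite(x) for x in vec)
    ]


# Small made-up sample: rows 1 and 2 contain NaN and inf respectively.
sample = [
    [0.1, 0.2, 0.3],
    [float("nan"), 0.0, 1.0],
    [0.5, float("inf"), 0.2],
]
bad = find_bad_rows(sample)
print(f"{len(bad)} of {len(sample)} vectors contain non-finite values: {bad}")
```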

To help you further, I would need more context on the commands you used to get these results.
For now, you should try to estimate the coverage of your index with the get_nearest_neighbors_coverage function and analyze the output :)
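
For intuition about what the reported figure means, here is a sketch of a coverage computation on search results. It assumes the faiss convention that missing neighbors come back with label -1. `coverage_from_labels` is a hypothetical helper written for this example; it is not the implementation of get_nearest_neighbors_coverage.

```python
from typing import Sequence


def coverage_from_labels(labels: Sequence[Sequence[int]], k: int) -> float:
    """Fraction of the k requested neighbors that were actually returned."""
    if not labels or k <= 0:
        return 0.0
    # A label of -1 marks a slot for which no neighbor was found.
    found = sum(1 for row in labels for label in row[:k] if label != -1)
    return found / (len(labels) * k)


# Two made-up queries with k = 4: the first finds 1 neighbor, the second 3.
labels = [
    [12, -1, -1, -1],
    [7, 3, 42, -1],
]
cov = coverage_from_labels(labels, k=4)
print(f"coverage = {cov:.2%}")

# With the numbers from the warning: 10.65% of 20 requested neighbors.
avg_retrieved = 0.1065 * 20
print(f"average neighbors retrieved = {avg_retrieved:.2f}")
```

With the values from the warning, the average comes out a little above 2 neighbors per query, in line with the "2" reported in the log.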

PS: Autofaiss doesn't support the x4fsr variant of these indices.
As you can see here in the code, tuning an "OPQ768_768,IVF262144_HNSW32,PQ768x4fsr" index would raise a NotImplementedError.

@OmniscienceAcademy
Author

Thank you for your kind help and this extensive answer.

My index is not empty, and my nvectors is correct. What is weird is that the only thing I changed in my script was the encoding, putting x8 instead of x4fsr. I will continue to investigate on my side.

Yeah, I know that you do not support it. I forked your repo and modified build_index so that it can build any valid index_key, even if it means that there is no hyperparameter tuning.
