Stop exploring HNSW graph if scores are not getting better. #12770
Oh, good catch! I guess equality is unusual, but this should help in some cases. I wonder if it helps with the degenerate case where all scores are equal, e.g. all-zero vectors (see #11626)?
It would help by ending exploration, for sure. In this degenerate case the problem is even worse than slowness: with millions of vectors we could cause an OOME, since nothing prevents the candidate list from expanding (we only pop one candidate, but could then add 15 more for each neighbor explored, for example).
@benwtrent Thanks Ben, nice optimization
I noticed while testing lower dimensionality and quantization that we would explore the HNSW graph way too much. I was stuck figuring out why until I noticed the searcher checks for distance equality (not just whether the distance is better) when exploring neighbors-of-neighbors. This seems like a bad heuristic, but to double check I looked at what nmslib does. This pointed me back to this commit: nmslib/nmslib#106. Seems like this performance hitch was discovered a while ago :). This commit adjusts HNSW to only explore the graph layer if the distance is actually better.
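To make the effect of the comparison concrete, here is a minimal sketch of a single-layer best-first search with a switch between `>=` and `>`. It is not Lucene's searcher: the class `StrictExplorationSketch`, the methods `ring`, `search`, and `minAccepted`, the ring topology, and the precomputed `scores` array are all assumptions made for illustration. It only counts how many nodes get scored when every score ties.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.PriorityQueue;

// Toy single-layer search, NOT Lucene's HnswGraphSearcher. All names are hypothetical.
class StrictExplorationSketch {

  /** Builds a ring graph where node i links to i-1 and i+1 (assumed topology). */
  static int[][] ring(int n) {
    int[][] neighbors = new int[n][];
    for (int i = 0; i < n; i++) {
      neighbors[i] = new int[] {(i + n - 1) % n, (i + 1) % n};
    }
    return neighbors;
  }

  /** Best-first search over one layer; returns how many nodes were scored. */
  static int search(int[][] neighbors, float[] scores, int entry, int k, boolean strict) {
    // Candidates: best score first. Results: worst score first, capped at k.
    PriorityQueue<Integer> candidates =
        new PriorityQueue<>((a, b) -> Float.compare(scores[b], scores[a]));
    PriorityQueue<Integer> results =
        new PriorityQueue<>((a, b) -> Float.compare(scores[a], scores[b]));
    BitSet visited = new BitSet(scores.length);
    visited.set(entry);
    candidates.add(entry);
    results.add(entry);
    int scored = 1;

    while (!candidates.isEmpty()) {
      int node = candidates.poll();
      if (scores[node] < minAccepted(results, scores, k)) {
        break; // best remaining candidate is worse than the worst kept result
      }
      for (int friend : neighbors[node]) {
        if (visited.get(friend)) continue;
        visited.set(friend);
        scored++;
        float min = minAccepted(results, scores, k);
        // The comparison under discussion: accept ties (>=) or only improvements (>).
        boolean accept = strict ? scores[friend] > min : scores[friend] >= min;
        if (accept) {
          candidates.add(friend);
          results.add(friend);
          if (results.size() > k) results.poll();
        }
      }
    }
    return scored;
  }

  /** Worst score currently kept, or -infinity while fewer than k results are held. */
  static float minAccepted(PriorityQueue<Integer> results, float[] scores, int k) {
    return results.size() < k ? Float.NEGATIVE_INFINITY : scores[results.peek()];
  }

  public static void main(String[] args) {
    int n = 1000;
    float[] scores = new float[n];
    Arrays.fill(scores, 1.0f); // degenerate case: every score ties
    int[][] graph = ring(n);
    System.out.println("non-strict (>=) scored: " + search(graph, scores, 0, 10, false));
    System.out.println("strict (>) scored: " + search(graph, scores, 0, 10, true));
  }
}
```

In this toy setup the `>=` variant keeps accepting tied neighbors and ends up scoring every node in the ring, while the `>` variant stops adding candidates once `k` results are held, so the scored count stays close to `k`. A real graph has higher fan-out, which is where the candidate-list growth mentioned above would come from.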