Fix HNSW graph visitation limit bug #12413
Conversation
Intuitively, it sounds like a good approach to me not to take live docs into account when finding good entry points, since a node could be a good entry point even if it is marked as deleted or doesn't match the filter. Should we consider never exiting before hitting the zeroth level instead?
🤔 The idea is that if we cannot even reach the zeroth level before hitting the visitation limit, we shouldn't bother traversing the graph structure any further, as it will likely be slower than an exact match. Exiting with nothing and indicating the search is incomplete makes the most sense to me, as it gives a clear signal that searching the graph structure shouldn't be done. If we ended early on the zeroth layer, we could return a "kNN" that is nowhere near the actual kNN because we exited early on that layer. I don't know why that would be any better than exiting before reaching it.
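The behavior discussed above can be sketched in miniature: descend the upper layers greedily under a visitation budget, and if the budget runs out before reaching layer 0, return nothing and flag the search as incomplete so the caller knows to fall back to an exact scan. All names and the scoring stand-in here are illustrative, not Lucene's actual API.

```java
import java.util.List;

public class EntryPointDescent {

  /** Minimal stand-in for Lucene's result holder (names are illustrative). */
  static final class SearchResult {
    final int entryPoint;      // -1 when no valid entry point was found
    final boolean incomplete;  // true when the visitation budget ran out
    SearchResult(int entryPoint, boolean incomplete) {
      this.entryPoint = entryPoint;
      this.incomplete = incomplete;
    }
  }

  /**
   * layers.get(level) holds the node ords present on that level
   * (index 0 is the zeroth layer). A real implementation would pick the
   * highest-scoring neighbor; here Math.max stands in for that choice.
   */
  static SearchResult findEntryPoint(List<int[]> layers, int visitedLimit) {
    int visited = 0;
    int ep = 0; // assume ord 0 is the top-level entry point
    for (int level = layers.size() - 1; level >= 1; level--) {
      for (int ord : layers.get(level)) {
        if (++visited > visitedLimit) {
          // Budget exhausted above layer 0: report nothing, mark incomplete.
          return new SearchResult(-1, true);
        }
        ep = Math.max(ep, ord); // stand-in for "keep the better-scoring node"
      }
    }
    return new SearchResult(ep, false);
  }

  public static void main(String[] args) {
    List<int[]> layers =
        List.of(new int[] {0, 1, 2, 3}, new int[] {0, 2}, new int[] {0});
    System.out.println(findEntryPoint(layers, 10).incomplete); // false
    System.out.println(findEntryPoint(layers, 1).incomplete);  // true
  }
}
```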
Sorry, I commented too quickly, before understanding what your change was doing; I thought it was ignoring filtered-out ords on levels > 0 at first. Your change makes sense to me now.
…inding entry point
OK, I reverted my minor optimizations and moved the method to be more in line with what Lucene did before. Now I am getting exactly the same recall, and the weird bug is fixed where we returned partial results that might contain documents not within the acceptableOrds set. I still kept it as a separate method to make perfectly clear what it is doing.
visitedCount++;
if (friendSimilarity >= minAcceptedSimilarity) {
  candidates.add(friendOrd, friendSimilarity);
  if (results.insertWithOverflow(friendOrd, friendSimilarity) && results.size() >= 1) {
It seems a little odd to preserve the ceremony of adding to a priority queue that will always be of length 1, although I suppose this preserves the idea of length > 1? Maybe we would want to do that?
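The point of the comment above: a bounded priority queue capped at one element degenerates to tracking a single best score. This hypothetical sketch shows the two forms side by side; `insertWithOverflow` here mimics the usual bounded-heap contract (returns true if the new entry was kept) and is not Lucene's actual implementation.

```java
import java.util.PriorityQueue;

public class SizeOneQueue {

  /** Bounded min-heap form: keep at most maxSize highest scores. */
  static boolean insertWithOverflow(PriorityQueue<Float> heap, int maxSize, float score) {
    if (heap.size() < maxSize) {
      heap.add(score);
      return true;
    }
    if (score > heap.peek()) { // evict the current worst if the new score beats it
      heap.poll();
      heap.add(score);
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    float[] scores = {0.3f, 0.9f, 0.5f};

    // Queue-of-one form: same result, more ceremony.
    PriorityQueue<Float> heap = new PriorityQueue<>();
    for (float s : scores) insertWithOverflow(heap, 1, s);

    // Equivalent scalar form: just track the best.
    float best = Float.NEGATIVE_INFINITY;
    for (float s : scores) best = Math.max(best, s);

    System.out.println(heap.peek()); // 0.9
    System.out.println(best);        // 0.9
  }
}
```

The scalar form is what the PR eventually moved to, since only the single best entry point matters at this stage.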
visitedLimit -= results.visitedCount();

if (results.incomplete()) {
  results.setVisitedCount(numVisited);
Wouldn't it be simpler to discard results here? There will never be more than one, right?
@msokolov found my bug 🤦 in the simplified version. I updated it and removed the need for tracking candidates & results, since we only care about the best entry point found.
I don't know what bug I found, but this LGTM
foundBetter = true;
visited.set(currentEp);
// Keep searching the given level until we stop finding a better candidate entry point
while (foundBetter) {
nit: with do/while you could avoid setting foundBetter = true above
do/while is indeed valid. But daggum, it confuses the crap out of me; wish Java didn't have it. If you don't mind, I will keep the while loop. I don't particularly mind if somebody changes it later.
Commas are important! I just meant to tell you I found my own bug, not that YOU found my bug.
We have some weird behavior in the HNSW searcher when finding the candidate entry point for the zeroth layer. While trying to find the best entry point to gather the full candidate set, we don't filter based on the acceptableOrds bitset. Consequently, if we exit the search early (before hitting the zeroth layer), the results that are returned may contain documents NOT within that bitset. Luckily, since the results are marked as incomplete, the *VectorQuery logic switches back to an exact scan and throws away the results. However, if any user called the leaf searcher directly, bypassing the query, they could run into this bug.
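The safety net described above can be sketched roughly as follows: the query layer discards incomplete approximate results and falls back to an exact scan that honors the filter, which is why the bug only bites callers that hit the leaf searcher directly. All names here are illustrative stand-ins, not Lucene's actual API.

```java
import java.util.Arrays;
import java.util.BitSet;

public class FallbackSketch {

  static final class TopDocsLike {
    final int[] ords;
    final boolean incomplete;
    TopDocsLike(int[] ords, boolean incomplete) {
      this.ords = ords;
      this.incomplete = incomplete;
    }
  }

  /** Pretend approximate search: may return ords outside the filter when incomplete. */
  static TopDocsLike approximateSearch(boolean hitVisitLimit) {
    return hitVisitLimit
        ? new TopDocsLike(new int[] {7}, true)   // ord 7 may not be in the filter!
        : new TopDocsLike(new int[] {2}, false);
  }

  /** Exact scan always respects acceptableOrds. */
  static int[] exactSearch(BitSet acceptableOrds) {
    return acceptableOrds.stream().toArray();
  }

  /** Query-level logic: incomplete approximate results are thrown away. */
  static int[] search(BitSet acceptableOrds, boolean hitVisitLimit) {
    TopDocsLike approx = approximateSearch(hitVisitLimit);
    if (approx.incomplete) {
      return exactSearch(acceptableOrds); // discard unfiltered approximate results
    }
    return approx.ords;
  }

  public static void main(String[] args) {
    BitSet accept = new BitSet();
    accept.set(2);
    accept.set(3);
    System.out.println(Arrays.toString(search(accept, true)));  // [2, 3]
    System.out.println(Arrays.toString(search(accept, false))); // [2]
  }
}
```

Calling `approximateSearch` directly (the analogue of using the leaf searcher) would surface the unfiltered ord 7, which is the exposure the PR closes.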