Make the HitQueue size more appropriate for KNN exact search #13184

Merged: 2 commits, Mar 19, 2024

bugmakerrrrrr
Contributor

Description

Currently, when performing KNN exact search, we consistently set the HitQueue size to k. However, there may be instances where the number of candidates is actually lower than k.
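The sizing rule described above can be sketched in isolation as follows. This is a minimal illustration, not the code from the PR: the class `QueueSizing` and the helper `queueSize` are hypothetical names, and the candidate count is assumed to be an upper bound on the number of documents the iterator can return. Unlike the diff, which calls `Math.toIntExact` on the cost, this sketch takes the minimum in `long` space and narrows afterwards, so a very large cost cannot cause an exception.

```java
class QueueSizing {
    // Hypothetical helper: cap the queue size at the number of candidates.
    // k is the requested number of nearest neighbours; candidateCost is
    // assumed to be an upper bound on how many documents can be collected.
    static int queueSize(int k, long candidateCost) {
        // The minimum is at most k, so narrowing to int is always safe here.
        return (int) Math.min((long) k, candidateCost);
    }

    public static void main(String[] args) {
        System.out.println(queueSize(100, 7L));             // 7: fewer candidates than k
        System.out.println(queueSize(10, 5000L));           // 10: k is the limit
        System.out.println(queueSize(10, Long.MAX_VALUE));  // 10: no overflow
    }
}
```

The saving matters most when the queue is pre-populated with sentinel entries, as `new HitQueue(k, true)` does: a large k with only a handful of candidates would otherwise allocate k placeholder objects that are never replaced.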

Member

@benwtrent benwtrent left a comment

I think this is a sane optimization. Mind adding a CHANGES entry under optimizations?

@@ -98,7 +98,8 @@ protected TopDocs exactSearch(LeafReaderContext context, DocIdSetIterator accept
             parentBitSet,
             query,
             fi.getVectorSimilarityFunction());
-    HitQueue queue = new HitQueue(k, true);
+    final int queueSize = Math.min(k, Math.toIntExact(acceptIterator.cost()));
Member

I don't know a better way, but since this diversifies over parent doc IDs, it's possible that the HitQueue still needs far fewer entries than acceptIterator.cost(), as acceptIterator.cost() is the cost of the iterator over CHILD docs (e.g. passage vector docs). I think any further optimization (e.g. counting the number of relevant parents) would add undue overhead.
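To make the point about child versus parent counts concrete, here is a small sketch. Everything in it is hypothetical and for illustration only: `ParentBound`, `distinctParents`, and the `childToParent` array (the parent doc ID of each accepted child doc) are not part of Lucene. It compares the queue size obtained from the child count with the number of entries a parent-diversified search could actually fill.

```java
import java.util.HashSet;
import java.util.Set;

class ParentBound {
    // Hypothetical helper: count the distinct parents among the accepted children.
    // This is the extra pass the review comment considers too expensive.
    static int distinctParents(int[] childToParent) {
        Set<Integer> parents = new HashSet<>();
        for (int parent : childToParent) {
            parents.add(parent);
        }
        return parents.size();
    }

    // Same sizing rule as in the PR, with the child count standing in for the cost.
    static int queueSize(int k, long childCost) {
        return (int) Math.min((long) k, childCost);
    }

    public static void main(String[] args) {
        int[] childToParent = {3, 3, 3, 7, 7, 12}; // six passages under three parents
        int k = 10;
        int allocated = queueSize(k, childToParent.length);          // 6
        int fillable = Math.min(k, distinctParents(childToParent));  // 3
        System.out.println(allocated + " " + fillable);              // prints "6 3"
    }
}
```

The child count is a cheap upper bound that is already available, whereas the tighter bound needs a pass over the candidates, which is the trade-off the comment describes.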

Contributor Author

+1

@benwtrent benwtrent merged commit 7a08eea into apache:main Mar 19, 2024
3 checks passed
benwtrent pushed a commit that referenced this pull request Mar 19, 2024