Skip to content

Conversation

jtibshirani
Copy link
Member

@jtibshirani jtibshirani commented Feb 13, 2023

The helper class DocAndScoreQuery implements advanceShallow to help skip
non-competitive documents. This method doesn't actually keep track of where it
has advanced, so it might do extra work on each call.

Overall the complexity here didn't seem worth it, given the low cost of
collecting matching kNN docs. This PR switches to a simple approach, which uses
a fixed upper bound on the max score. This is low overhead, while still
allowing for skipping in some cases.

The helper class DocAndScoreQuery implements advanceShallow to help skip
non-competitive documents. This method doesn't actually keep track of where it
has advanced, which means it can do extra work.

Overall the complexity here didn't seem worth it, given the low cost of
collecting matching kNN docs. This PR switches to a simpler approach, which uses
a fixed upper bound on the max score.
@jtibshirani
Copy link
Member Author

The context: I've been testing out an AI code assistant, and I asked it if there were any bugs in AbstractKnnVectorQuery. It pointed out that "'advanceShallow' should move the 'upTo' field like 'nextDoc' does" ... which didn't seem right, but made me realize this logic looked funny.

I'm not an expert in the doc skipping code, and had trouble fully understanding the method contracts. So let me know if this is off!

@jtibshirani jtibshirani marked this pull request as ready for review February 13, 2023 23:32
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your logic looks correct.

I wondered if this change could deoptimize some important cases, but I think that your intuition is true that since this query would generally matche few documents, then advance() would move far ahead and the maximum scores produced by this query would get ignored anyway because they fall outside of the window of doc IDs that is being considered (upTo in BlockMaxMaxscoreScorer).

@jtibshirani jtibshirani merged commit 8340b01 into main Feb 16, 2023
@jtibshirani jtibshirani deleted the jtibshirani/max-score branch February 16, 2023 20:04
@jtibshirani
Copy link
Member Author

Thanks for the review!

benwtrent added a commit to benwtrent/lucene that referenced this pull request Jun 30, 2025
benwtrent added a commit that referenced this pull request Jun 30, 2025
* Ensure vector queries handle advanceShallow correctly

* adding changes

* Adjusting to just be a backport of #12146
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants