-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add timeout support to AbstractVectorSimilarityQuery #13285
Conversation
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
This seems sane to me. @vigyasharma what do you think? |
@kaivalnp could you update CHANGES as well? |
Thanks @benwtrent! Added an entry now.. |
Saw some merge conflicts after a recent commit and resolved those.. |
Hi @benwtrent @vigyasharma could you help review this? Thanks! |
// Return a lazy-loading iterator | ||
return VectorSimilarityScorer.fromAcceptDocs( | ||
this, | ||
boost, | ||
createVectorScorer(context), | ||
new BitSetIterator(acceptDocs, cardinality), | ||
resultSimilarity); | ||
} else if (results.scoreDocs.length == 0) { | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't return null
any more whenm there are 0 results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh never mind I see this got moved to VectorSimilarityScorer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was common in a couple of places so I moved it there to reduce repetition
@@ -105,13 +116,16 @@ public Scorer scorer(LeafReaderContext context) throws IOException { | |||
LeafReader leafReader = context.reader(); | |||
Bits liveDocs = leafReader.getLiveDocs(); | |||
|
|||
QueryTimeout queryTimeout = searcher.getTimeout(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm what if there is no timeout? will queryTimeout
be null? In that case do we still want to create a TimeLimitingKnnCollectorManager
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will
queryTimeout
be null?
Yes, this is null
when a timeout isn't set
In this case, the TimeLimitingKnnCollectorManager
returns an unwrapped KnnCollector
which does not add overhead of time checking (even null
checks) during graph search (also visible in benchmarks)
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
Summary of latest changes:
|
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this PR. I just left some minor comments/questions
final Scorer vectorSimilarityScorer; | ||
|
||
QueryTimeout queryTimeout = searcher.getTimeout(); | ||
TimeLimitingKnnCollectorManager timeLimitingKnnCollectorManager = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we share this variable for all segments? Such as creating it at top-level variable in createWeight
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, we can reduce unnecessary object creation. I'll update in the next commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return VectorSimilarityScorerSupplier.fromScoreDocs(boost, results.scoreDocs); | ||
} else { | ||
// Return a lazy-loading iterator | ||
return VectorSimilarityScorerSupplier.fromAcceptDocs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be a waste that we can't reuse the results from the approximate search (I also saw similar behavior in top-k KnnVectorQuery).
Maybe we can pass the partial results to this method, and we don't need to compute score for those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We tried to explore this in #12820, but the cost seemed to outweigh the benefit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! It's interesting that we have tried that already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vectorSimilarityScorer = | ||
VectorSimilarityScorer.fromScoreDocs(this, boost, results.scoreDocs); | ||
return VectorSimilarityScorerSupplier.fromScoreDocs(boost, results.scoreDocs); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also have a test for this case, where we exhaust the filter before hitting timeout? I guess testFilterWithNoMatches() tests it but only for null
QueryTimeout values? Do we need one for non-null timeouts as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I've added tests to check for a filter + non-null timeout
# Conflicts: # lucene/CHANGES.txt
There was a conflict in |
searcher.setTimeout( | ||
new CountingQueryTimeout(numFiltered - 1)); // Timeout before scoring all filtered docs | ||
int filteredCount = searcher.count(filteredQuery); | ||
assertTrue( | ||
"0 < filteredCount=" + filteredCount + " < numFiltered=" + numFiltered, | ||
filteredCount > 0 && filteredCount < numFiltered); // Expect partial results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this tests for cases where we timeout before exhausting the filter, nice!
Thank you @vigyasharma! |
Description
Along similar lines of #13202, adding timeout support for
AbstractVectorSimilarityQuery
which performs similarity-based vector searchesWhile the graph search happens inside
#scorer
, it may go over the configuredQueryTimeout
and we can early terminate it to return whatever partial results are found..One inherent benefit we have for exact search is that we return a lazy-loading iterator over all vectors, so this is inherently covered by the
TimeLimitingBulkScorer
(as opposed to exact search ofAbstractKnnVectorQuery
which manually goes over all vectors to retain the topK during#rewrite
)