Add timeout support to AbstractVectorSimilarityQuery #13285

Merged (2 commits) on Aug 6, 2024

Conversation

Contributor

@kaivalnp kaivalnp commented Apr 9, 2024

Description

Along the same lines as #13202, this adds timeout support to AbstractVectorSimilarityQuery, which performs similarity-based vector searches.

The graph search happens inside #scorer and may run past the configured QueryTimeout. In that case we can terminate it early and return whatever partial results have been found.

Exact search has a built-in advantage here: it returns a lazy-loading iterator over all vectors, so it is already covered by the TimeLimitingBulkScorer. This differs from the exact search in AbstractKnnVectorQuery, which manually iterates over all vectors during #rewrite to retain the topK.
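
To illustrate the early-termination idea, here is a minimal standalone sketch, not the code from this PR. `Timeout`, `Collector`, `ListCollector`, `TimeLimitingCollector` and `GraphSearchSketch` are hypothetical stand-ins for Lucene's QueryTimeout and KnnCollector, reduced to the few methods needed here. The wrapper consults the timeout whenever the search asks whether to stop, so a traversal that polls it keeps whatever it has collected so far.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query timeout: true once the search should stop.
interface Timeout {
  boolean shouldExit();
}

// Hypothetical, heavily reduced stand-in for a KNN collector.
interface Collector {
  boolean earlyTerminated();

  void collect(int docId, float score);

  List<Integer> collected();
}

// Plain collector that never terminates on its own.
class ListCollector implements Collector {
  private final List<Integer> docs = new ArrayList<>();

  public boolean earlyTerminated() {
    return false;
  }

  public void collect(int docId, float score) {
    docs.add(docId);
  }

  public List<Integer> collected() {
    return docs;
  }
}

// Wrapper that additionally stops the search once the timeout fires.
class TimeLimitingCollector implements Collector {
  private final Collector delegate;
  private final Timeout timeout;

  TimeLimitingCollector(Collector delegate, Timeout timeout) {
    this.delegate = delegate;
    this.timeout = timeout;
  }

  public boolean earlyTerminated() {
    return timeout.shouldExit() || delegate.earlyTerminated();
  }

  public void collect(int docId, float score) {
    delegate.collect(docId, score);
  }

  public List<Integer> collected() {
    return delegate.collected();
  }
}

class GraphSearchSketch {
  // Visits candidates in order, polling the collector before each one.
  static List<Integer> search(int[] candidates, Collector collector) {
    for (int doc : candidates) {
      if (collector.earlyTerminated()) {
        break; // keep the partial results gathered so far
      }
      collector.collect(doc, 1.0f);
    }
    return collector.collected();
  }
}
```

With a timeout that fires on its fourth check, a search over five candidates returns only the first three; a timeout that never fires returns all five.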


This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Apr 24, 2024
@benwtrent
Member

This seems sane to me.

@vigyasharma what do you think?

@benwtrent
Member

@kaivalnp could you update CHANGES as well?

@kaivalnp
Contributor Author

Thanks @benwtrent! Added an entry now.

@github-actions github-actions bot removed the Stale label Apr 25, 2024
@kaivalnp
Contributor Author

kaivalnp commented May 9, 2024

Saw some merge conflicts after a recent commit and resolved them.

@kaivalnp
Contributor Author

Hi @benwtrent @vigyasharma could you help review this? Thanks!

  // Return a lazy-loading iterator
  return VectorSimilarityScorer.fromAcceptDocs(
      this,
      boost,
      createVectorScorer(context),
      new BitSetIterator(acceptDocs, cardinality),
      resultSimilarity);
} else if (results.scoreDocs.length == 0) {
  return null;
Contributor

We don't return null any more when there are 0 results?

Contributor

oh never mind I see this got moved to VectorSimilarityScorer

Contributor Author

Yes, it was common in a couple of places so I moved it there to reduce repetition

@@ -105,13 +116,16 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
LeafReader leafReader = context.reader();
Bits liveDocs = leafReader.getLiveDocs();

QueryTimeout queryTimeout = searcher.getTimeout();
Contributor

hmm what if there is no timeout? will queryTimeout be null? In that case do we still want to create a TimeLimitingKnnCollectorManager?

Contributor Author

@kaivalnp kaivalnp May 13, 2024

will queryTimeout be null?

Yes, this is null when a timeout isn't set

In this case, the TimeLimitingKnnCollectorManager returns an unwrapped KnnCollector, so graph search incurs no time-checking overhead, not even null checks (this is also visible in benchmarks).
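
A minimal sketch of that pass-through behaviour, under stated assumptions: `StopCheck` and `TimeLimitingManagerSketch` are hypothetical names, and `BooleanSupplier` stands in for QueryTimeout. It is not the actual TimeLimitingKnnCollectorManager. The null check happens once, when the collector is created, so the per-node check during graph search stays untouched when no timeout is configured.

```java
import java.util.function.BooleanSupplier;

// Hypothetical, reduced collector: only the termination check matters here.
interface StopCheck {
  boolean earlyTerminated();
}

class TimeLimitingManagerSketch {
  // null means "no timeout configured" (an assumption mirroring the comment above)
  private final BooleanSupplier timeout;

  TimeLimitingManagerSketch(BooleanSupplier timeout) {
    this.timeout = timeout;
  }

  StopCheck wrap(StopCheck delegate) {
    if (timeout == null) {
      // Return the very same instance: no wrapper, no extra work per check.
      return delegate;
    }
    // Otherwise stop when either the timeout or the delegate says so.
    return () -> timeout.getAsBoolean() || delegate.earlyTerminated();
  }
}
```

Deciding once at creation time, rather than testing for null inside every `earlyTerminated()` call, is what keeps the no-timeout path free of added cost in this sketch.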


This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@kaivalnp
Contributor Author

Summary of latest changes:

  1. Resolved merge conflicts
  2. Moved the CHANGES.txt entry from 9.11 -> 9.12, since the former is now released
  3. #scorer is now final and cannot be overridden, so changed VectorSimilarityScorer -> VectorSimilarityScorerSupplier

@github-actions github-actions bot removed the Stale label Jun 11, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jun 26, 2024
Contributor

@dungba88 dungba88 left a comment

Thank you for this PR. I just left some minor comments/questions

final Scorer vectorSimilarityScorer;

QueryTimeout queryTimeout = searcher.getTimeout();
TimeLimitingKnnCollectorManager timeLimitingKnnCollectorManager =
Contributor

Can we share this variable for all segments? Such as creating it at top-level variable in createWeight?

Contributor Author

Nice catch, we can reduce unnecessary object creation. I'll update in the next commit

Contributor Author

Done

  return VectorSimilarityScorerSupplier.fromScoreDocs(boost, results.scoreDocs);
} else {
  // Return a lazy-loading iterator
  return VectorSimilarityScorerSupplier.fromAcceptDocs(
Contributor

It seems to be a waste that we can't reuse the results from the approximate search (I also saw similar behavior in top-k KnnVectorQuery).

Maybe we can pass the partial results to this method, and we don't need to compute score for those?

Contributor Author

We tried to explore this in #12820, but the cost seemed to outweigh the benefit

Contributor

Thanks! It's interesting that we have tried that already.

@github-actions github-actions bot removed the Stale label Jul 25, 2024
Contributor

@vigyasharma vigyasharma left a comment

Sorry, I somehow missed this PR. Changes look good @kaivalnp, thanks for extending timeout functionality to *VectorSimilarityQuery.

Looks like you're planning another iteration addressing this comment. We can merge after your changes.

- vectorSimilarityScorer =
-     VectorSimilarityScorer.fromScoreDocs(this, boost, results.scoreDocs);
+ return VectorSimilarityScorerSupplier.fromScoreDocs(boost, results.scoreDocs);
  } else {
Contributor

Do we also have a test for this case, where we exhaust the filter before hitting timeout? I guess testFilterWithNoMatches() tests it but only for null QueryTimeout values? Do we need one for non-null timeouts as well?

Contributor Author

Makes sense, I've added tests to check for a filter + non-null timeout

@kaivalnp
Contributor Author

kaivalnp commented Aug 5, 2024

There was a conflict in CHANGES.txt after a recent commit; I merged from main and resolved it.
@vigyasharma I've tried to address all open comments, please let me know if something is missing.

Comment on lines +529 to +534
searcher.setTimeout(
    new CountingQueryTimeout(numFiltered - 1)); // Timeout before scoring all filtered docs
int filteredCount = searcher.count(filteredQuery);
assertTrue(
    "0 < filteredCount=" + filteredCount + " < numFiltered=" + numFiltered,
    filteredCount > 0 && filteredCount < numFiltered); // Expect partial results
Contributor

So this tests the case where we time out before exhausting the filter, nice!
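
For readers unfamiliar with the pattern, here is a sketch of how a counting timeout can force partial results deterministically. `CountingTimeoutSketch` and `PartialCountSketch` are hypothetical; the PR's actual CountingQueryTimeout helper is not shown in this conversation and may count differently, and the real search does not necessarily poll the timeout once per document.

```java
import java.util.function.BooleanSupplier;

// Hypothetical: allows a fixed number of checks, then reports a timeout.
class CountingTimeoutSketch implements BooleanSupplier {
  private int remaining;

  CountingTimeoutSketch(int remaining) {
    this.remaining = remaining;
  }

  @Override
  public boolean getAsBoolean() {
    if (remaining > 0) {
      remaining--;
      return false; // still within budget
    }
    return true; // budget exhausted: behave as if the timeout fired
  }
}

class PartialCountSketch {
  // Counts filtered docs, polling the timeout before scoring each one
  // (the per-document polling is an assumption of this sketch).
  static int count(int numFiltered, BooleanSupplier timeout) {
    int count = 0;
    for (int i = 0; i < numFiltered; i++) {
      if (timeout.getAsBoolean()) {
        break; // return the partial count
      }
      count++;
    }
    return count;
  }
}
```

Because the budget is expressed in checks rather than wall-clock time, a budget of `numFiltered - 1` always yields a count strictly between 0 and `numFiltered` in this sketch (for `numFiltered` greater than 1), which is the shape of the assertion in the test above.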

@vigyasharma vigyasharma merged commit e0e5d81 into apache:main Aug 6, 2024
3 checks passed
@kaivalnp kaivalnp deleted the timeout branch August 6, 2024 05:56
@kaivalnp
Contributor Author

kaivalnp commented Aug 6, 2024

Thank you @vigyasharma!
