Add timeout support to AbstractVectorSimilarityQuery #13285

kaivalnp · 2024-04-09T11:41:09Z

Description

Along the same lines as #13202, this adds timeout support to AbstractVectorSimilarityQuery, which performs similarity-based vector searches.

The graph search happens inside #scorer and may run past the configured QueryTimeout; when it does, we can terminate it early and return whatever partial results were found.

One inherent benefit for exact search is that we return a lazy-loading iterator over all vectors, so it is already covered by the TimeLimitingBulkScorer (as opposed to the exact search of AbstractKnnVectorQuery, which manually goes over all vectors to retain the topK during #rewrite).
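To illustrate why a lazy iterator gets timeout handling for free, here is a minimal standalone sketch. It does not use Lucene; `LazyExactSearchSketch`, `Timeout`, `LazyMatches`, and `countWithTimeout` are hypothetical names, and the windowed timeout check is only an assumption about how an outer time-limiting loop might behave, not a description of TimeLimitingBulkScorer's actual code.

```java
// Standalone model, NOT Lucene code: all types here are hypothetical stand-ins.
final class LazyExactSearchSketch {
  /** Stand-in for a timeout check, modeled loosely on a shouldExit()-style API. */
  interface Timeout {
    boolean shouldExit();
  }

  /** Lazily scans vectors; a similarity is computed only when the next match is requested. */
  static final class LazyMatches {
    private final float[][] vectors;
    private final float[] query;
    private final float threshold;
    private int next = 0;
    int scored = 0; // number of vectors actually scored so far

    LazyMatches(float[][] vectors, float[] query, float threshold) {
      this.vectors = vectors;
      this.query = query;
      this.threshold = threshold;
    }

    /** Returns the next doc whose dot product is >= threshold, or -1 when exhausted. */
    int nextMatch() {
      while (next < vectors.length) {
        int doc = next++;
        scored++;
        if (dot(vectors[doc], query) >= threshold) {
          return doc;
        }
      }
      return -1;
    }

    private static float dot(float[] a, float[] b) {
      float sum = 0f;
      for (int i = 0; i < a.length; i++) {
        sum += a[i] * b[i];
      }
      return sum;
    }
  }

  /** Pulls matches in fixed-size windows and checks the timeout between windows. */
  static int countWithTimeout(LazyMatches matches, Timeout timeout, int window) {
    int count = 0;
    while (!timeout.shouldExit()) {
      for (int i = 0; i < window; i++) {
        if (matches.nextMatch() == -1) {
          return count; // iterator exhausted: complete result
        }
        count++;
      }
    }
    return count; // timeout fired: partial result, remaining vectors never scored
  }
}
```

Because scoring happens inside `nextMatch()`, stopping the outer loop also stops the similarity computations; an eager implementation that scores everything up front would need its own timeout checks.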

github-actions · 2024-04-24T00:17:41Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

benwtrent · 2024-04-24T19:14:07Z

This seems sane to me.

@vigyasharma what do you think?

benwtrent · 2024-04-24T19:14:31Z

@kaivalnp could you update CHANGES as well?

kaivalnp · 2024-04-24T21:08:35Z

Thanks @benwtrent! Added an entry now.

kaivalnp · 2024-05-09T21:42:49Z

Saw some merge conflicts after a recent commit and resolved them.

kaivalnp · 2024-05-13T08:53:49Z

Hi @benwtrent @vigyasharma could you help review this? Thanks!

msokolov · 2024-05-13T21:58:39Z

lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java

          // Return a lazy-loading iterator
          return VectorSimilarityScorer.fromAcceptDocs(
              this,
              boost,
              createVectorScorer(context),
              new BitSetIterator(acceptDocs, cardinality),
              resultSimilarity);
-        } else if (results.scoreDocs.length == 0) {
-          return null;


We don't return null any more when there are 0 results?

Oh, never mind, I see this got moved to VectorSimilarityScorer.

Yes, it was common in a couple of places so I moved it there to reduce repetition

msokolov · 2024-05-13T21:59:54Z

lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java

@@ -105,13 +116,16 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
        LeafReader leafReader = context.reader();
        Bits liveDocs = leafReader.getLiveDocs();

+        QueryTimeout queryTimeout = searcher.getTimeout();


hmm what if there is no timeout? will queryTimeout be null? In that case do we still want to create a TimeLimitingKnnCollectorManager?

will queryTimeout be null?

Yes, this is null when a timeout isn't set

In this case, the TimeLimitingKnnCollectorManager returns an unwrapped KnnCollector, which adds no time-checking overhead (not even null checks) during graph search (this is also visible in benchmarks).
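The "wrap only when a timeout is set" idea can be sketched as follows. This is a standalone model with hypothetical names (`NullTimeoutSketch`, `Collector`, `TimeLimitedCollector`, `maybeWrap`); it only mirrors the behavior described above and is not the code of TimeLimitingKnnCollectorManager.

```java
// Standalone model, NOT Lucene code: all types here are hypothetical stand-ins.
final class NullTimeoutSketch {
  /** Stand-in for a timeout check. */
  interface Timeout {
    boolean shouldExit();
  }

  /** Minimal collector contract: the search loop polls this to know when to stop. */
  interface Collector {
    boolean earlyTerminated();
  }

  /** Plain collector with no notion of time. */
  static final class PlainCollector implements Collector {
    @Override
    public boolean earlyTerminated() {
      return false;
    }
  }

  /** Decorator that adds a timeout check on top of the delegate's own condition. */
  static final class TimeLimitedCollector implements Collector {
    private final Collector delegate;
    private final Timeout timeout;

    TimeLimitedCollector(Collector delegate, Timeout timeout) {
      this.delegate = delegate;
      this.timeout = timeout;
    }

    @Override
    public boolean earlyTerminated() {
      return timeout.shouldExit() || delegate.earlyTerminated();
    }
  }

  /** Decides once, up front: no timeout means the delegate is returned as-is. */
  static Collector maybeWrap(Collector delegate, Timeout timeout) {
    if (timeout == null) {
      return delegate; // hot loop never sees a timeout or a null check
    }
    return new TimeLimitedCollector(delegate, timeout);
  }
}
```

The null check happens once per collector rather than once per visited node, which is why the no-timeout path should cost nothing extra in the search loop.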

github-actions · 2024-05-28T00:18:34Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

kaivalnp · 2024-06-10T13:00:47Z

Summary of latest changes:

- Resolved merge conflicts
- Moved the CHANGES.txt entry from 9.11 -> 9.12 since the former is now released
- #scorer is now final and cannot be overridden, so changed VectorSimilarityScorer -> VectorSimilarityScorerSupplier

github-actions · 2024-06-26T00:19:01Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

dungba88

Thank you for this PR. I just left some minor comments/questions

dungba88 · 2024-07-24T07:39:34Z

lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java

-        final Scorer vectorSimilarityScorer;
+
+        QueryTimeout queryTimeout = searcher.getTimeout();
+        TimeLimitingKnnCollectorManager timeLimitingKnnCollectorManager =


Can we share this variable for all segments? Such as creating it at top-level variable in createWeight?

Nice catch, we can reduce unnecessary object creation. I'll update in the next commit
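A rough sketch of the suggested restructuring, using hypothetical stand-in types (`SharedManagerSketch`, `CollectorManager`, `QueryWeight`, `SegmentCollector`) rather than Lucene's classes: the manager is built once when the weight is created, and each segment only asks it for a fresh collector. The `collectorsCreated` counter exists purely for demonstration and is not thread-safe.

```java
// Standalone model, NOT Lucene code: all types here are hypothetical stand-ins.
final class SharedManagerSketch {
  /** Stand-in for a timeout check; may be null when no timeout is configured. */
  interface Timeout {
    boolean shouldExit();
  }

  /** Per-segment state: one of these is created for every segment searched. */
  static final class SegmentCollector {
    final int segment;

    SegmentCollector(int segment) {
      this.segment = segment;
    }
  }

  /** Per-query object: holds the timeout and hands out per-segment collectors. */
  static final class CollectorManager {
    private final Timeout timeout;
    int collectorsCreated = 0; // demonstration only, not thread-safe

    CollectorManager(Timeout timeout) {
      this.timeout = timeout;
    }

    SegmentCollector newCollector(int segment) {
      collectorsCreated++;
      return new SegmentCollector(segment);
    }
  }

  /** Stand-in for a weight: created once per query, consulted once per segment. */
  static final class QueryWeight {
    private final CollectorManager manager; // built once, shared by all segments

    QueryWeight(Timeout timeout) {
      this.manager = new CollectorManager(timeout);
    }

    SegmentCollector collectorFor(int segment) {
      return manager.newCollector(segment);
    }

    CollectorManager manager() {
      return manager;
    }
  }
}
```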

dungba88 · 2024-07-24T07:44:39Z

lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java

+            return VectorSimilarityScorerSupplier.fromScoreDocs(boost, results.scoreDocs);
+          } else {
+            // Return a lazy-loading iterator
+            return VectorSimilarityScorerSupplier.fromAcceptDocs(


It seems to be a waste that we can't reuse the results from the approximate search (I also saw similar behavior in top-k KnnVectorQuery).

Maybe we can pass the partial results to this method, and we don't need to compute score for those?

We tried to explore this in #12820, but the cost seemed to outweigh the benefit

Thanks! It's interesting that we have tried that already.

vigyasharma

Sorry, I somehow missed this PR. Changes look good @kaivalnp, thanks for extending timeout functionality to *VectorSimilarityQuery.

Looks like you're planning another iteration addressing this comment. We can merge after your changes.

vigyasharma · 2024-07-25T18:33:41Z

lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java

-            vectorSimilarityScorer =
-                VectorSimilarityScorer.fromScoreDocs(this, boost, results.scoreDocs);
+            return VectorSimilarityScorerSupplier.fromScoreDocs(boost, results.scoreDocs);
+          } else {


Do we also have a test for this case, where we exhaust the filter before hitting timeout? I guess testFilterWithNoMatches() tests it but only for null QueryTimeout values? Do we need one for non-null timeouts as well?

Makes sense, I've added tests to check for a filter + non-null timeout

kaivalnp · 2024-08-05T09:42:00Z

There was a conflict in CHANGES.txt after a recent commit, merged from main and resolved that
@vigyasharma I've tried to address all open comments, please let me know if something is missing

vigyasharma · 2024-08-06T00:11:59Z

lucene/core/src/test/org/apache/lucene/search/BaseVectorSimilarityQueryTestCase.java

+      searcher.setTimeout(
+          new CountingQueryTimeout(numFiltered - 1)); // Timeout before scoring all filtered docs
+      int filteredCount = searcher.count(filteredQuery);
+      assertTrue(
+          "0 < filteredCount=" + filteredCount + " < numFiltered=" + numFiltered,
+          filteredCount > 0 && filteredCount < numFiltered); // Expect partial results


So this tests for cases where we time out before exhausting the filter, nice!
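For readers without the test source at hand, here is a guess at how a counting timeout like the one used above could behave. This is a standalone sketch: the `CountingTimeout` class and `count` helper below are assumptions modeled on the name `CountingQueryTimeout`, not the test's actual implementation, and the real search path handles timeouts differently from this simple loop.

```java
// Standalone model, NOT Lucene code: all types here are hypothetical stand-ins.
final class CountingTimeoutSketch {
  /** Stand-in for a timeout check. */
  interface Timeout {
    boolean shouldExit();
  }

  /** Allows a fixed number of checks to pass, then reports a timeout forever after. */
  static final class CountingTimeout implements Timeout {
    private int remaining;

    CountingTimeout(int count) {
      this.remaining = count;
    }

    @Override
    public boolean shouldExit() {
      if (remaining > 0) {
        remaining--;
        return false;
      }
      return true;
    }
  }

  /** Counts filtered docs one by one, checking the timeout before each. */
  static int count(int numFiltered, Timeout timeout) {
    int counted = 0;
    for (int doc = 0; doc < numFiltered; doc++) {
      if (timeout.shouldExit()) {
        break; // stop early and report the partial count
      }
      counted++;
    }
    return counted;
  }
}
```

A counting timeout makes the test deterministic: with a budget of `numFiltered - 1` checks, the count is guaranteed to fall strictly between 0 and `numFiltered`, with no dependence on wall-clock time.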

kaivalnp · 2024-08-06T05:56:44Z

Thank you @vigyasharma!

kaivalnp mentioned this pull request Apr 15, 2024

Add timeout support to AbstractKnnVectorQuery #13202

Merged

github-actions bot added the Stale label Apr 24, 2024

github-actions bot removed the Stale label Apr 25, 2024

msokolov reviewed May 13, 2024

github-actions bot added the Stale label May 28, 2024

kaivalnp force-pushed the timeout branch from f786f2b to aefc54d Compare June 10, 2024 12:57

github-actions bot removed the Stale label Jun 11, 2024

github-actions bot added the Stale label Jun 26, 2024

dungba88 reviewed Jul 24, 2024

github-actions bot removed the Stale label Jul 25, 2024

vigyasharma approved these changes Jul 25, 2024

Add timeout support to AbstractVectorSimilarityQuery

2f54db6

kaivalnp force-pushed the timeout branch from aefc54d to 2f54db6 Compare July 30, 2024 18:39

Merge remote-tracking branch 'origin/main' into timeout

606fc89

# Conflicts: # lucene/CHANGES.txt

vigyasharma reviewed Aug 6, 2024

vigyasharma approved these changes Aug 6, 2024

vigyasharma merged commit e0e5d81 into apache:main Aug 6, 2024
3 checks passed

kaivalnp deleted the timeout branch August 6, 2024 05:56
