Make TermStates#build concurrent #12183

shubhamvishu · 2023-03-03T15:31:18Z

Description

This PR now deals with making TermStates#build run concurrently using the IndexSearcher's executor .

Old idea/description :

This change tries to make some queries with heavy rewrites make use of concurrency using the api added in #11840. In #12160 it turns out have good gains in KnnVectorQuery. This is initial rough implementation to get some early thoughts/feedback on or ideas on if this could be useful and worth pursuing for some other queries as well which are doing some heavy lifting in rewrite. This change initially tries to achieve some benefit of concurrency for FieldExistsQuery , BlendedTermQuery and CompletionQuery.There are also others also I could find which are not covered in this but maybe could benefit with concurrency.

Queries changed to have || rewrite in this PR:

FieldExistsQuery
BlendedTermQuery
CompletionQuery

Some possible candidate(s) with non-trivial rewrites ?

CommonTermsQuery
SpanNearQuery
SpanOrQuery
CoveringQuery
DisjunctionMaxQuery

Benchmarks : yet to be run

PS : As some query rewrites calls rewrites on the child/sub queries. Can we achieve this in those queries as well? So I tried to make DisjunctionMaxQuery#rewrite concurrent but that seems to get stuck on my machine when running TestDisjunctionMaxQuery unit test. Maybe thats too much parallelism as the test is randomly generating a very large disjunctive query and making that rewrite concurrent is not correct or helpful ?

rmuir · 2023-03-03T20:16:31Z

the change to BlendedTermQuery makes the code nearly unreadable. It is important to be able to understand how it is literally "blending the stats". Can we please find a way to make this feature less invasive?

lucene/core/src/java/org/apache/lucene/search/BlendedTermQuery.java

rmuir · 2023-03-03T20:21:31Z

Personally I don't think we should do this. Maybe the API needs to be rethought. I see race conditions being added in this change, now this adds potential for all kinds of crazy bugs in lucene.

I don't think any performance gain is worth that.

lucene/core/src/java/org/apache/lucene/search/FieldExistsQuery.java

shubhamvishu · 2023-03-04T16:21:02Z

Thanks for the quick feedback @rmuir @uschindler !

the change to BlendedTermQuery makes the code nearly unreadable. It is important to be able to understand how it is literally "blending the stats". Can we please find a way to make this feature less invasive?

Sure, this was to get some initial feedback on whether this would make sense so haven't focussed much on those aspects. I'll try to make it more cleaner in the next revision and see how that turns out.

Personally I don't think we should do this. Maybe the API needs to be rethought. I see race conditions being added in this change, now this adds potential for all kinds of crazy bugs in lucene.
I don't think any performance gain is worth that.

Agreed. We should definitely not have any race conditions here. I'll try to address these and other suggestions. Maybe we could then evaluate if this would be beneficial and worth the possible benefits of concurrency or not.

jpountz · 2023-03-06T15:36:39Z

When we changed rewrite to take an IndexSearcher rather than an IndexReader in order to allow concurrent rewrites, my intuition was that it would eventually get used on a minority of queries that do expensive stuff in rewrite such as KnnFloatVectorQuery which needs to run a vector search. On the other hand, queries discussed here have super cheap rewrites, e.g. FieldExistsQuery runs in O(num_segments) and doesn't perform any I/O so I don't think that its rewrite should be made concurrent?

As some query rewrites calls rewrites on the child/sub queries. Can we achieve this in those queries as well?

Indeed compound queries should not rewrite their sub queries via the executor, otherwise we could have threads in the executor joining other threads from the executor, which can lead to deadlocks.

shubhamvishu · 2023-03-06T17:12:25Z

On the other hand, queries discussed here have super cheap rewrites, e.g. FieldExistsQuery runs in O(num_segments) and doesn't perform any I/O so I don't think that its rewrite should be made concurrent?

Yes, I'd drop FieldExistsQuery as it looks to be quite cheap. Among the other 2 queries, BlendedTermQuery seem to be good candidate to me to have a concurrent rewrite(as it does some expensive/non-trivial work in TermStates#build) whereas CompletionQuery maybe not. Do you think we should go with making BlendedTermQuery ,CompletionQuery concurrent (or) maybe not ? Let me know what do you think?

Indeed compound queries should not rewrite their sub queries via the executor, otherwise we could have threads in the executor joining other threads from the executor, which can lead to deadlocks.

Yeah, thats answers why it was not working for compound ones. Thanks @jpountz !

jpountz · 2023-03-09T21:46:51Z

If there is something that might benefit from being made concurrent, it may be TermStates.build because it does I/O via terms dictionary lookups. A benefit of making the change there is that it would automatically benefit TermQuery, BlendedTermQuery, etc. But the rest of the rewrite logic of these queries looks so cheap to me that I don't think we'd benefit from making it concurrent. I'm curious to get others' opinions as to whether this TermStates.build method is one we should consider making concurrent. For clarity, this could have been done before adding concurrent rewrites, as TermQuery calls this method in createWeight which has taken an IndexSearcher for a long time if not forever.

shubhamvishu · 2023-03-10T03:55:07Z

If there is something that might benefit from being made concurrent, it may be TermStates.build because it does I/O via terms dictionary lookups. A benefit of making the change there is that it would automatically benefit TermQuery, BlendedTermQuery, etc. But the rest of the rewrite logic of these queries looks so cheap to me that I don't think we'd benefit from making it concurrent. I'm curious to get others' opinions as to whether this TermStates.build method is one we should consider making concurrent. For clarity, this could have been done before adding concurrent rewrites, as TermQuery calls this method in createWeight which has taken an IndexSearcher for a long time if not forever.

@jpountz That sounds like a really good idea to me. Thanks for pointing!

shubhamvishu · 2023-03-19T17:28:10Z

So in the new revision I have changed the TermStates#build to run concurrently if an executor is passed. Let me know if this makes sense. Thanks!

jpountz

Sorry for the lag! I had another look and the general direction looks good to me, I left some suggestions. I'd be interested in getting feedback from others as this would be quite new usage of the IndexSearcher executor in the sense that the code that it called is not especially CPU-intensive but could do quite some random I/O, so we'd be mostly using the executor to improve I/O concurrency.

lucene/core/src/java/org/apache/lucene/document/FeatureField.java

lucene/core/src/java/org/apache/lucene/index/TermStates.java

mikemccand · 2023-06-20T12:55:31Z

I also like the direction this is going! This might be a nice latency reduction for primary key lookups on idle-ish hosts.

It is certainly a new use of IndexSearcher's executor -- can we improve the javadocs of that parameter to explain that many more tasks than simply the number of segments/slices might be executed? We don't want callers thinking they can pass in an overly bounded queue size on their executors.

Unfortunately, Lucene's nightly benchmarks (luceneutil) don't test concurrency within a single query at all -- I opened this issue to improve the situation. @jpountz also previously opened this issue to ensure we are exercising that code path too.

…nts (apache#12325)" This reverts commit 10bebde. Based on a recent discussion in apache#12183 (comment) we agreed it makes more sense to parallelize knn query vector rewrite across leaves rather than leaf slices.

…nts (#12325)" (#12385) This reverts commit 10bebde. Based on a recent discussion in #12183 (comment) we agreed it makes more sense to parallelize knn query vector rewrite across leaves rather than leaf slices.

jpountz

I left some comments on exception handling but otherwise it looks good to me. Can you add a CHANGES entry under 9.8 as well?

lucene/core/src/java/org/apache/lucene/index/TermStates.java

shubhamvishu · 2023-07-11T07:13:54Z

@jpountz As I was running the tests(./gradlew test) with this PR I observed it getting stuck on TestPayloadCheckQuery or TestBasics a couple of times intermittently. Looking into it, it seemed some threads were parked for indefinite time and were never unparked causing it to stuck when the searcher has been passed an executor. I was able to get the tests working by increasing the threads below to atleast 13(i.e. setting pool size and max pool size to minimum 13) in LuceneTestCase for the searcher used in these tests. Should we override the passed executor for these tests?. But its not right thing to do and I think and we should not enforce this?. We should not set it to >8 which is the maximum currently(and I think it depends on the no. of cores available?). Is there some other way to fix such issues? Let me know what do you think?

https://github.com/apache/lucene/blob/main/lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java#L1946-L1954

threads = TestUtil.nextInt(random, 1, 8);
        ex =
            new ThreadPoolExecutor(
                threads,
                threads,
                0L,
                TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(),
                new NamedThreadFactory("LuceneTestCase"));

To reproduce

gradlew test --tests TestPayloadCheckQuery.testInequalityPayloadChecks -Dtests.seed=8C15E6548CE0251F
gradlew test --tests TestBasics.testSpanNearExact -Dtests.seed=736DA8E94FF33AD0

lucene/core/src/java/org/apache/lucene/index/TermStates.java

lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java

javanna · 2023-09-20T10:20:25Z

Hey @shubhamvishu heads up: I merged #12659 to address the deadlock issue and opened #12574 to adjust TaskExecutor visibility outside of this PR. Hopefully you are next going to be able to merge your PR!

shubhamvishu · 2023-09-20T12:36:48Z

Hey @shubhamvishu heads up: I merged #12659 to address the deadlock issue and opened #12574 to adjust TaskExecutor visibility outside of this PR. Hopefully you are next going to be able to merge your PR!

Thanks a lot @javanna! I'll push the new revision post #12574 is merged.

shubhamvishu · 2023-09-20T13:54:00Z

I have rebased the PR based on the changes in #12574. Could someone please review the changes? Thanks!

javanna

I have reviewed the concurrent execution part, interaction with TaskExecutor and it looks good to me. I'd ask others to double check the parallelism introduced in TermStates because I am not super familiar with it.

jpountz

The change looks good to me, thanks for your tenacity @shubhamvishu and @javanna for the help!

jpountz · 2023-09-21T11:21:54Z

lucene/core/src/java/org/apache/lucene/index/TermStates.java

+                              return null;
+                            }))
+                .toList();
+        List<TaskExecutor.Task<TermStateInfo>> taskList = new ArrayList<>(tasks);


Why do we need to clone the list?

To me it makes no sense to clone the list, as it is a new list already (created by stream.toList()). In addition the original list is not used anymore.

Yes we don't, it was required in earlier revisions but became obsolute with the new rebased changes. I have removed this in the new revision.

uschindler · 2023-09-21T11:53:10Z

lucene/core/src/java/org/apache/lucene/index/TermStates.java

+                              return null;
+                            }))
+                .toList();
+        List<TaskExecutor.Task<TermStateInfo>> taskList = new ArrayList<>(tasks);


To me it makes no sense to clone the list, as it is a new list already (created by stream.toList()). In addition the original list is not used anymore.

uschindler · 2023-09-21T11:55:57Z

lucene/core/src/java/org/apache/lucene/index/TermStates.java

@@ -211,4 +244,40 @@ public String toString() {

    return sb.toString();
  }
+
+  /** Wrapper over TermState, ordinal value, term doc frequency and total term frequency */


In main branch we could make this a record (unfortunately not in Java 11).

Indeed, I didn't use record because I couldn't find any other usages of record in the whole codebase(just 1-2) which convinced me to have a static class rather. Is there a minimum JDK that have to be supported for lucene(jdk 11 or 17?)? In build.gradle I see minJavaVersion = JavaVersion.VERSION_17 so maybe its fine/safe to use record in the above change(I'm not much aware about it)? Let me know if I should instead change this to a record or maybe keep it as is. Thanks!

Lucene main (Future Lucene 10.x) will use minimum Java 17. branch_9x (For Lucene 9.x) uses Java 11.

So record is fine in Lucene Main branch but requires additional work when backporting changes. But at some point we should use records for this type of classes. This would be a good start.

Thanks for explaining and reviewing @uschindler!

Lucene main (Future Lucene 10.x) will use minimum Java 17. branch_9x ....

Got it, yes that would be great! I would be happy to open a separate PR dealing with changing all such current classes in lucene into a record class which get us started atleast(I don't know if that should be rather incremental effort though).

javanna · 2023-09-22T07:21:57Z

Great to see this merged, thanks @shubhamvishu for all the work as well as patience as we were figuring out a way forward!

shubhamvishu force-pushed the concurrent-rewrites branch from 55bafc1 to 2020435 Compare March 3, 2023 16:39

rmuir reviewed Mar 3, 2023

View reviewed changes

lucene/core/src/java/org/apache/lucene/search/BlendedTermQuery.java Outdated Show resolved Hide resolved

rmuir reviewed Mar 3, 2023

View reviewed changes

lucene/core/src/java/org/apache/lucene/search/FieldExistsQuery.java Outdated Show resolved Hide resolved

uschindler requested changes Mar 3, 2023

View reviewed changes

lucene/core/src/java/org/apache/lucene/search/FieldExistsQuery.java Outdated Show resolved Hide resolved

shubhamvishu force-pushed the concurrent-rewrites branch from 2020435 to ac7fcca Compare March 19, 2023 16:54

shubhamvishu requested review from uschindler and rmuir and removed request for uschindler and rmuir March 22, 2023 15:51

shubhamvishu requested a review from rmuir May 11, 2023 07:54

jpountz reviewed Jun 8, 2023

View reviewed changes

javanna reviewed Jun 20, 2023

View reviewed changes

lucene/core/src/java/org/apache/lucene/index/TermStates.java Show resolved Hide resolved

lucene/core/src/java/org/apache/lucene/index/TermStates.java Outdated Show resolved Hide resolved

javanna mentioned this pull request Jun 23, 2023

Revert "Parallelize knn query rewrite across slices rather than segments #12385

Merged

shubhamvishu force-pushed the concurrent-rewrites branch from ac7fcca to 68dca5a Compare July 9, 2023 16:32

shubhamvishu requested a review from jpountz July 9, 2023 16:34

jpountz requested changes Jul 10, 2023

View reviewed changes

lucene/core/src/java/org/apache/lucene/index/TermStates.java Outdated Show resolved Hide resolved

lucene/core/src/java/org/apache/lucene/index/TermStates.java Outdated Show resolved Hide resolved

shubhamvishu force-pushed the concurrent-rewrites branch from c01d85f to 954016b Compare September 13, 2023 21:45

shubhamvishu requested a review from uschindler September 18, 2023 05:59

javanna reviewed Sep 18, 2023

View reviewed changes

This was referenced Sep 19, 2023

Prevent concurrent tasks from parallelizing further #12569

Merged

Make TaskExecutor public #12574

Merged

shubhamvishu added a commit to shubhamvishu/lucene that referenced this pull request Sep 20, 2023

Make TermStates#build concurrent (apache#12183)

408d0fb

shubhamvishu force-pushed the concurrent-rewrites branch from ef8efe1 to 408d0fb Compare September 20, 2023 13:52

shubhamvishu requested review from jpountz and javanna September 20, 2023 13:54

javanna approved these changes Sep 21, 2023

View reviewed changes

jpountz approved these changes Sep 21, 2023

View reviewed changes

uschindler requested changes Sep 21, 2023

View reviewed changes

Make TermStates#build concurrent (apache#12183)

4a23df8

shubhamvishu force-pushed the concurrent-rewrites branch from 408d0fb to 4a23df8 Compare September 21, 2023 14:20

shubhamvishu requested review from uschindler and jpountz September 21, 2023 14:33

uschindler approved these changes Sep 21, 2023

View reviewed changes

jpountz approved these changes Sep 21, 2023

View reviewed changes

jpountz merged commit fb1f4dd into apache:main Sep 21, 2023
4 checks passed

jpountz pushed a commit that referenced this pull request Sep 21, 2023

Make TermStates#build concurrent (#12183)

be57460

javanna added this to the 9.9.0 milestone Nov 30, 2023

tarzanek mentioned this pull request Dec 6, 2023

lucene 9.9.0 oracle/opengrok#4505

Merged

sarthakn7 mentioned this pull request Mar 4, 2024

Lucene 9.9 upgrade Yelp/nrtsearch#625

Merged

jimczi mentioned this pull request May 15, 2024

Named query performance degradation after upgrading to 8.13.2 (from 8.7.1) elastic/elasticsearch#108556

Open

trakos mentioned this pull request Jun 8, 2024

Upgrade to Elasticsearch 8.14.0 o19s/elasticsearch-learning-to-rank#488

Merged

shubhamvishu mentioned this pull request Jun 19, 2024

TaskExecutor should not fork unnecessarily #13472

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make TermStates#build concurrent #12183

Make TermStates#build concurrent #12183

shubhamvishu commented Mar 3, 2023 •

edited

Loading

rmuir commented Mar 3, 2023

rmuir commented Mar 3, 2023

shubhamvishu commented Mar 4, 2023

jpountz commented Mar 6, 2023

shubhamvishu commented Mar 6, 2023 •

edited

Loading

jpountz commented Mar 9, 2023

shubhamvishu commented Mar 10, 2023

shubhamvishu commented Mar 19, 2023

jpountz left a comment

mikemccand commented Jun 20, 2023

jpountz left a comment

shubhamvishu commented Jul 11, 2023 •

edited

Loading

javanna commented Sep 20, 2023

shubhamvishu commented Sep 20, 2023

shubhamvishu commented Sep 20, 2023 •

edited

Loading

javanna left a comment

jpountz left a comment

jpountz Sep 21, 2023

uschindler Sep 21, 2023

shubhamvishu Sep 21, 2023

uschindler Sep 21, 2023

uschindler Sep 21, 2023

shubhamvishu Sep 21, 2023 •

edited

Loading

uschindler Sep 21, 2023

shubhamvishu Sep 21, 2023

javanna commented Sep 22, 2023

Make TermStates#build concurrent #12183

Make TermStates#build concurrent #12183

Conversation

shubhamvishu commented Mar 3, 2023 • edited Loading

Description

rmuir commented Mar 3, 2023

rmuir commented Mar 3, 2023

shubhamvishu commented Mar 4, 2023

jpountz commented Mar 6, 2023

shubhamvishu commented Mar 6, 2023 • edited Loading

jpountz commented Mar 9, 2023

shubhamvishu commented Mar 10, 2023

shubhamvishu commented Mar 19, 2023

jpountz left a comment

Choose a reason for hiding this comment

mikemccand commented Jun 20, 2023

jpountz left a comment

Choose a reason for hiding this comment

shubhamvishu commented Jul 11, 2023 • edited Loading

To reproduce

javanna commented Sep 20, 2023

shubhamvishu commented Sep 20, 2023

shubhamvishu commented Sep 20, 2023 • edited Loading

javanna left a comment

Choose a reason for hiding this comment

jpountz left a comment

Choose a reason for hiding this comment

jpountz Sep 21, 2023

Choose a reason for hiding this comment

uschindler Sep 21, 2023

Choose a reason for hiding this comment

shubhamvishu Sep 21, 2023

Choose a reason for hiding this comment

uschindler Sep 21, 2023

Choose a reason for hiding this comment

uschindler Sep 21, 2023

Choose a reason for hiding this comment

shubhamvishu Sep 21, 2023 • edited Loading

Choose a reason for hiding this comment

uschindler Sep 21, 2023

Choose a reason for hiding this comment

shubhamvishu Sep 21, 2023

Choose a reason for hiding this comment

javanna commented Sep 22, 2023

shubhamvishu commented Mar 3, 2023 •

edited

Loading

shubhamvishu commented Mar 6, 2023 •

edited

Loading

shubhamvishu commented Jul 11, 2023 •

edited

Loading

shubhamvishu commented Sep 20, 2023 •

edited

Loading

shubhamvishu Sep 21, 2023 •

edited

Loading