Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make TermStates#build concurrent #12183

Merged
merged 1 commit into from
Sep 21, 2023
Merged

Conversation

shubhamvishu
Copy link
Contributor

@shubhamvishu shubhamvishu commented Mar 3, 2023

Description

This PR now deals with making TermStates#build run concurrently using the IndexSearcher's executor .


Old idea/description :

This change tries to make some queries with heavy rewrites make use of concurrency using the api added in #11840. In #12160 it turns out have good gains in KnnVectorQuery. This is initial rough implementation to get some early thoughts/feedback on or ideas on if this could be useful and worth pursuing for some other queries as well which are doing some heavy lifting in rewrite. This change initially tries to achieve some benefit of concurrency for FieldExistsQuery , BlendedTermQuery and CompletionQuery.There are also others also I could find which are not covered in this but maybe could benefit with concurrency.

 
Queries changed to have || rewrite in this PR:

  1. FieldExistsQuery
  2. BlendedTermQuery
  3. CompletionQuery

Some possible candidate(s) with non-trivial rewrites ?

  • CommonTermsQuery
  • SpanNearQuery
  • SpanOrQuery
  • CoveringQuery
  • DisjunctionMaxQuery

 
Benchmarks : yet to be run

PS : As some query rewrites calls rewrites on the child/sub queries. Can we achieve this in those queries as well? So I tried to make DisjunctionMaxQuery#rewrite concurrent but that seems to get stuck on my machine when running TestDisjunctionMaxQuery unit test. Maybe thats too much parallelism as the test is randomly generating a very large disjunctive query and making that rewrite concurrent is not correct or helpful ?

@rmuir
Copy link
Member

rmuir commented Mar 3, 2023

the change to BlendedTermQuery makes the code nearly unreadable. It is important to be able to understand how it is literally "blending the stats". Can we please find a way to make this feature less invasive?

@rmuir
Copy link
Member

rmuir commented Mar 3, 2023

Personally I don't think we should do this. Maybe the API needs to be rethought. I see race conditions being added in this change, now this adds potential for all kinds of crazy bugs in lucene.

I don't think any performance gain is worth that.

@shubhamvishu
Copy link
Contributor Author

Thanks for the quick feedback @rmuir @uschindler !

the change to BlendedTermQuery makes the code nearly unreadable. It is important to be able to understand how it is literally "blending the stats". Can we please find a way to make this feature less invasive?

Sure, this was to get some initial feedback on whether this would make sense so haven't focussed much on those aspects. I'll try to make it more cleaner in the next revision and see how that turns out.

Personally I don't think we should do this. Maybe the API needs to be rethought. I see race conditions being added in this change, now this adds potential for all kinds of crazy bugs in lucene.
I don't think any performance gain is worth that.

Agreed. We should definitely not have any race conditions here. I'll try to address these and other suggestions. Maybe we could then evaluate if this would be beneficial and worth the possible benefits of concurrency or not.

@jpountz
Copy link
Contributor

jpountz commented Mar 6, 2023

When we changed rewrite to take an IndexSearcher rather than an IndexReader in order to allow concurrent rewrites, my intuition was that it would eventually get used on a minority of queries that do expensive stuff in rewrite such as KnnFloatVectorQuery which needs to run a vector search. On the other hand, queries discussed here have super cheap rewrites, e.g. FieldExistsQuery runs in O(num_segments) and doesn't perform any I/O so I don't think that its rewrite should be made concurrent?

As some query rewrites calls rewrites on the child/sub queries. Can we achieve this in those queries as well?

Indeed compound queries should not rewrite their sub queries via the executor, otherwise we could have threads in the executor joining other threads from the executor, which can lead to deadlocks.

@shubhamvishu
Copy link
Contributor Author

shubhamvishu commented Mar 6, 2023

On the other hand, queries discussed here have super cheap rewrites, e.g. FieldExistsQuery runs in O(num_segments) and doesn't perform any I/O so I don't think that its rewrite should be made concurrent?

Yes, I'd drop FieldExistsQuery as it looks to be quite cheap. Among the other 2 queries, BlendedTermQuery seem to be good candidate to me to have a concurrent rewrite(as it does some expensive/non-trivial work in TermStates#build) whereas CompletionQuery maybe not. Do you think we should go with making BlendedTermQuery ,CompletionQuery concurrent (or) maybe not ? Let me know what do you think?

Indeed compound queries should not rewrite their sub queries via the executor, otherwise we could have threads in the executor joining other threads from the executor, which can lead to deadlocks.

Yeah, thats answers why it was not working for compound ones. Thanks @jpountz !

@jpountz
Copy link
Contributor

jpountz commented Mar 9, 2023

If there is something that might benefit from being made concurrent, it may be TermStates.build because it does I/O via terms dictionary lookups. A benefit of making the change there is that it would automatically benefit TermQuery, BlendedTermQuery, etc. But the rest of the rewrite logic of these queries looks so cheap to me that I don't think we'd benefit from making it concurrent. I'm curious to get others' opinions as to whether this TermStates.build method is one we should consider making concurrent. For clarity, this could have been done before adding concurrent rewrites, as TermQuery calls this method in createWeight which has taken an IndexSearcher for a long time if not forever.

@shubhamvishu
Copy link
Contributor Author

If there is something that might benefit from being made concurrent, it may be TermStates.build because it does I/O via terms dictionary lookups. A benefit of making the change there is that it would automatically benefit TermQuery, BlendedTermQuery, etc. But the rest of the rewrite logic of these queries looks so cheap to me that I don't think we'd benefit from making it concurrent. I'm curious to get others' opinions as to whether this TermStates.build method is one we should consider making concurrent. For clarity, this could have been done before adding concurrent rewrites, as TermQuery calls this method in createWeight which has taken an IndexSearcher for a long time if not forever.

@jpountz That sounds like a really good idea to me. Thanks for pointing!

@shubhamvishu
Copy link
Contributor Author

So in the new revision I have changed the TermStates#build to run concurrently if an executor is passed. Let me know if this makes sense. Thanks!

@shubhamvishu shubhamvishu requested review from uschindler and rmuir and removed request for uschindler and rmuir March 22, 2023 15:51
@shubhamvishu shubhamvishu requested a review from rmuir May 11, 2023 07:54
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the lag! I had another look and the general direction looks good to me, I left some suggestions. I'd be interested in getting feedback from others as this would be quite new usage of the IndexSearcher executor in the sense that the code that it called is not especially CPU-intensive but could do quite some random I/O, so we'd be mostly using the executor to improve I/O concurrency.

@mikemccand
Copy link
Member

I also like the direction this is going! This might be a nice latency reduction for primary key lookups on idle-ish hosts.

It is certainly a new use of IndexSearcher's executor -- can we improve the javadocs of that parameter to explain that many more tasks than simply the number of segments/slices might be executed? We don't want callers thinking they can pass in an overly bounded queue size on their executors.

Unfortunately, Lucene's nightly benchmarks (luceneutil) don't test concurrency within a single query at all -- I opened this issue to improve the situation. @jpountz also previously opened this issue to ensure we are exercising that code path too.

javanna added a commit to javanna/lucene that referenced this pull request Jun 23, 2023
…nts (apache#12325)"

This reverts commit 10bebde.

Based on a recent discussion in
apache#12183 (comment) we
agreed it makes more sense to parallelize knn query vector rewrite
across leaves rather than leaf slices.
javanna added a commit that referenced this pull request Jun 26, 2023
…nts (#12325)" (#12385)

This reverts commit 10bebde.

Based on a recent discussion in
#12183 (comment) we
agreed it makes more sense to parallelize knn query vector rewrite
across leaves rather than leaf slices.
javanna added a commit that referenced this pull request Jun 26, 2023
…nts (#12325)" (#12385)

This reverts commit 10bebde.

Based on a recent discussion in
#12183 (comment) we
agreed it makes more sense to parallelize knn query vector rewrite
across leaves rather than leaf slices.
@shubhamvishu shubhamvishu requested a review from jpountz July 9, 2023 16:34
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments on exception handling but otherwise it looks good to me. Can you add a CHANGES entry under 9.8 as well?

@shubhamvishu
Copy link
Contributor Author

shubhamvishu commented Jul 11, 2023

@jpountz As I was running the tests(./gradlew test) with this PR I observed it getting stuck on TestPayloadCheckQuery or TestBasics a couple of times intermittently. Looking into it, it seemed some threads were parked for indefinite time and were never unparked causing it to stuck when the searcher has been passed an executor. I was able to get the tests working by increasing the threads below to atleast 13(i.e. setting pool size and max pool size to minimum 13) in LuceneTestCase for the searcher used in these tests. Should we override the passed executor for these tests?. But its not right thing to do and I think and we should not enforce this?. We should not set it to >8 which is the maximum currently(and I think it depends on the no. of cores available?). Is there some other way to fix such issues? Let me know what do you think?

https://github.com/apache/lucene/blob/main/lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java#L1946-L1954

threads = TestUtil.nextInt(random, 1, 8);
        ex =
            new ThreadPoolExecutor(
                threads,
                threads,
                0L,
                TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(),
                new NamedThreadFactory("LuceneTestCase"));           

To reproduce

gradlew test --tests TestPayloadCheckQuery.testInequalityPayloadChecks -Dtests.seed=8C15E6548CE0251F
gradlew test --tests TestBasics.testSpanNearExact -Dtests.seed=736DA8E94FF33AD0

@javanna
Copy link
Contributor

javanna commented Sep 20, 2023

Hey @shubhamvishu heads up: I merged #12659 to address the deadlock issue and opened #12574 to adjust TaskExecutor visibility outside of this PR. Hopefully you are next going to be able to merge your PR!

@shubhamvishu
Copy link
Contributor Author

Hey @shubhamvishu heads up: I merged #12659 to address the deadlock issue and opened #12574 to adjust TaskExecutor visibility outside of this PR. Hopefully you are next going to be able to merge your PR!

Thanks a lot @javanna! I'll push the new revision post #12574 is merged.

shubhamvishu added a commit to shubhamvishu/lucene that referenced this pull request Sep 20, 2023
@shubhamvishu
Copy link
Contributor Author

shubhamvishu commented Sep 20, 2023

I have rebased the PR based on the changes in #12574. Could someone please review the changes? Thanks!

Copy link
Contributor

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed the concurrent execution part, interaction with TaskExecutor and it looks good to me. I'd ask others to double check the parallelism introduced in TermStates because I am not super familiar with it.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me, thanks for your tenacity @shubhamvishu and @javanna for the help!

return null;
}))
.toList();
List<TaskExecutor.Task<TermStateInfo>> taskList = new ArrayList<>(tasks);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to clone the list?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it makes no sense to clone the list, as it is a new list already (created by stream.toList()). In addition the original list is not used anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we don't, it was required in earlier revisions but became obsolute with the new rebased changes. I have removed this in the new revision.

return null;
}))
.toList();
List<TaskExecutor.Task<TermStateInfo>> taskList = new ArrayList<>(tasks);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it makes no sense to clone the list, as it is a new list already (created by stream.toList()). In addition the original list is not used anymore.

@@ -211,4 +244,40 @@ public String toString() {

return sb.toString();
}

/** Wrapper over TermState, ordinal value, term doc frequency and total term frequency */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In main branch we could make this a record (unfortunately not in Java 11).

Copy link
Contributor Author

@shubhamvishu shubhamvishu Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I didn't use record because I couldn't find any other usages of record in the whole codebase(just 1-2) which convinced me to have a static class rather. Is there a minimum JDK that have to be supported for lucene(jdk 11 or 17?)? In build.gradle I see minJavaVersion = JavaVersion.VERSION_17 so maybe its fine/safe to use record in the above change(I'm not much aware about it)? Let me know if I should instead change this to a record or maybe keep it as is. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lucene main (Future Lucene 10.x) will use minimum Java 17. branch_9x (For Lucene 9.x) uses Java 11.

So record is fine in Lucene Main branch but requires additional work when backporting changes. But at some point we should use records for this type of classes. This would be a good start.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining and reviewing @uschindler!

Lucene main (Future Lucene 10.x) will use minimum Java 17. branch_9x ....

Got it, yes that would be great! I would be happy to open a separate PR dealing with changing all such current classes in lucene into a record class which get us started atleast(I don't know if that should be rather incremental effort though).

@jpountz jpountz merged commit fb1f4dd into apache:main Sep 21, 2023
4 checks passed
jpountz pushed a commit that referenced this pull request Sep 21, 2023
@javanna
Copy link
Contributor

javanna commented Sep 22, 2023

Great to see this merged, thanks @shubhamvishu for all the work as well as patience as we were figuring out a way forward!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants