Create a task executor when executor is not provided #12606

javanna · 2023-09-28T19:51:58Z

As we introduce more places that use concurrency (there are currently three), a common pattern has emerged: check whether an executor was provided, then either run sequentially on the caller thread or run in parallel using the executor.

That can be improved by internally creating a TaskExecutor backed by an executor that runs tasks on the caller thread. This ensures that the task executor is never null, so the common conditional is no longer needed: the concurrent path that uses the task executor becomes the default, and only, path for operations that can be parallelized.
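The idea of replacing a missing executor with a caller-thread executor can be sketched in plain Java. This is a minimal, hypothetical model: `SimpleTaskExecutor` and `threadNamesForTasks` are illustrative names, not Lucene's `TaskExecutor` API, and the caller-thread fallback is assumed to be the method reference `Runnable::run`. Without the fallback, every call site would branch on `executor == null` to pick a sequential loop or a parallel path; here the constructor removes the null case, so callers keep a single path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

// Hypothetical sketch: SimpleTaskExecutor is a simplified stand-in, not Lucene's TaskExecutor API.
public class NonNullTaskExecutorSketch {

  public static final class SimpleTaskExecutor {
    private final Executor executor;

    // When no executor is provided, fall back to one that runs each task
    // directly on the calling thread, so this.executor is never null.
    public SimpleTaskExecutor(Executor executor) {
      this.executor = executor == null ? Runnable::run : executor;
    }

    // Submits every callable, then waits for all of them; results keep submission order.
    public <T> List<T> invokeAll(List<Callable<T>> callables)
        throws InterruptedException, ExecutionException {
      List<FutureTask<T>> tasks = new ArrayList<>(callables.size());
      for (Callable<T> callable : callables) {
        FutureTask<T> task = new FutureTask<>(callable);
        tasks.add(task);
        executor.execute(task); // with Runnable::run, the task completes before execute() returns
      }
      List<T> results = new ArrayList<>(tasks.size());
      for (FutureTask<T> task : tasks) {
        results.add(task.get());
      }
      return results;
    }
  }

  // Callers no longer branch on "executor == null": there is a single code path.
  public static List<String> threadNamesForTasks(Executor executor, int taskCount) {
    SimpleTaskExecutor taskExecutor = new SimpleTaskExecutor(executor);
    List<Callable<String>> tasks = new ArrayList<>();
    for (int i = 0; i < taskCount; i++) {
      tasks.add(() -> Thread.currentThread().getName());
    }
    try {
      return taskExecutor.invokeAll(tasks);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String caller = Thread.currentThread().getName();
    // No executor: every task reports the caller's thread name.
    System.out.println(threadNamesForTasks(null, 3).stream().allMatch(caller::equals)); // prints true
    // Real executor: the same code path runs the tasks on pool threads.
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      System.out.println(threadNamesForTasks(pool, 3));
    } finally {
      pool.shutdown();
    }
  }
}
```

The design choice is to pay for task objects even in the sequential case in exchange for one code path at every call site.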

javanna · 2023-09-28T19:52:57Z

lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java

   *
   * @lucene.experimental
   */
  public LeafSlice[] getSlices() {
-    return (executor == null) ? null : leafSlicesSupplier.get();
+    return leafSlicesSupplier.get();


I wonder whether this method is still needed. Perhaps it's fine, but we could make it final as a follow-up? Otherwise it could be confusing for users to figure out which of the two methods, slices or getSlices, needs to be overridden.

+1. I think we should keep this, but make it final.

Out of curiosity, do you have use cases for calling this method as a consumer?

Since the method #slices(List<LeafReaderContext> leaves) is protected in IndexSearcher, I thought #getSlices could be used by other classes (not subclasses of IndexSearcher, and in a different package) to get all the slices for concurrent execution without having to pass maxDocsPerSlice and maxSegmentsPerSlice. But thinking about it again, maybe we should remove or deprecate #getSlices and make #slices(List<LeafReaderContext> leaves) public, since #getSlices does no extra work beyond returning the leaf slices obtained from #slices. What do you think?

Also, I see the other #slices method is currently static, so if we pursue this, should we make this one static too?

@sohami I think you could provide the context here

@javanna getSlices() was kept as a convenience method to handle the null executor case, so that callers in IndexSearcher don't have to explicitly check both whether the executor is null and the return value of leafSlices.
This method is useful if a consumer wants to know the slices created by the provider, or the number of slices. For example, in OpenSearch we use this method to emit a per-searcher slice-count metric, which indicates whether a search request is getting too many slices and whether we need to adjust that by providing a different slice provider.

Also, the slices() method was kept protected so that extensions can provide their own implementation of it to control how slices are generated.

OK, thanks. Are you OK with making getSlices final? I'd like to clear up the confusion around the two methods and how they should be used.

@javanna yes that should be fine. Thanks for checking!

I opened #12718
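The split agreed on in this thread (slices as the protected extension point, getSlices as a final accessor for consumers such as a slice-count metric) can be sketched as follows. This is a hypothetical, simplified model: `Searcher`, `TwoLeavesPerSliceSearcher` and the `String` leaves are illustrative stand-ins rather than Lucene's `IndexSearcher` and `LeafReaderContext`, and caching through a synchronized getter is an assumption, not how `leafSlicesSupplier` is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: Searcher and its String "leaves" are simplified stand-ins for
// IndexSearcher and LeafReaderContext; this is not Lucene's actual code.
public class SlicesSketch {

  public static class Searcher {
    private final List<String> leaves;
    private List<List<String>> cachedSlices; // computed lazily, guarded by "this"

    public Searcher(List<String> leaves) {
      this.leaves = List.copyOf(leaves);
    }

    // Extension point: subclasses override this to control how leaves are grouped.
    // Hypothetical default: one slice per leaf.
    protected List<List<String>> slices(List<String> leaves) {
      List<List<String>> result = new ArrayList<>();
      for (String leaf : leaves) {
        result.add(List.of(leaf));
      }
      return result;
    }

    // Consumer-facing accessor (e.g. for a slice-count metric). Being final, it
    // cannot be overridden, so slices(...) is the only method extensions customize.
    public final synchronized List<List<String>> getSlices() {
      if (cachedSlices == null) {
        cachedSlices = slices(leaves);
      }
      return cachedSlices;
    }
  }

  // Example extension: groups up to two leaves per slice.
  public static class TwoLeavesPerSliceSearcher extends Searcher {
    public TwoLeavesPerSliceSearcher(List<String> leaves) {
      super(leaves);
    }

    @Override
    protected List<List<String>> slices(List<String> leaves) {
      List<List<String>> result = new ArrayList<>();
      for (int i = 0; i < leaves.size(); i += 2) {
        result.add(List.copyOf(leaves.subList(i, Math.min(i + 2, leaves.size()))));
      }
      return result;
    }
  }

  public static void main(String[] args) {
    List<String> leaves = List.of("seg0", "seg1", "seg2");
    System.out.println(new Searcher(leaves).getSlices().size()); // prints 3
    System.out.println(new TwoLeavesPerSliceSearcher(leaves).getSlices().size()); // prints 2
  }
}
```

With getSlices final, a consumer reading the slice count always sees whatever the overridden slices method produced, and there is only one method an extension can override.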

lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java

shubhamvishu · 2023-09-29T09:59:27Z

I really like this! This looks much cleaner.

javanna · 2023-09-29T12:30:07Z

lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java

+    for (LeafReaderContext context : reader.leaves()) {
+      tasks.add(taskExecutor.createTask(() -> searchLeaf(context, filterWeight)));
+    }
+    TopDocs[] perLeafResults = taskExecutor.invokeAll(tasks).toArray(TopDocs[]::new);


There is a trade-off here: we create multiple tasks even when we are not parallelizing. That keeps things simple, but it assumes that concurrency is not the corner case; rather, not having an executor is. We could make the task executor API more complex to account for whether a real executor is provided, but I am not sure that's a good trade-off. I'd like to assume that we are evolving Lucene to make more and more use of concurrency, and that at some point having an executor to parallelize will be the default.
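The cost side of this trade-off can be illustrated with plain JDK types. This is a hypothetical sketch: it uses `CompletableFuture.supplyAsync` in place of Lucene's `TaskExecutor`, and `searchLeaf` is a stand-in for the per-leaf search, not `AbstractKnnVectorQuery#searchLeaf`. With a caller-thread executor (`Runnable::run`), each task runs to completion inside `supplyAsync`, so the per-leaf fan-out degenerates into the original sequential loop and the only extra cost is one future object per leaf.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical sketch of the per-leaf fan-out trade-off; searchLeaf here is a stand-in,
// not Lucene's AbstractKnnVectorQuery#searchLeaf.
public class PerLeafFanOutSketch {

  // Stand-in for per-leaf work; records which leaf ran, in execution order.
  static String searchLeaf(String leaf, List<String> executionOrder) {
    executionOrder.add(leaf);
    return "topDocs(" + leaf + ")";
  }

  // One task per leaf, always, whether or not the executor actually parallelizes.
  public static List<String> searchAll(
      List<String> leaves, Executor executor, List<String> executionOrder) {
    List<CompletableFuture<String>> futures = new ArrayList<>(leaves.size());
    for (String leaf : leaves) {
      futures.add(CompletableFuture.supplyAsync(() -> searchLeaf(leaf, executionOrder), executor));
    }
    List<String> results = new ArrayList<>(futures.size());
    for (CompletableFuture<String> future : futures) {
      results.add(future.join()); // results come back in leaf order
    }
    return results;
  }

  public static void main(String[] args) {
    // Synchronized so the same code is safe when a real thread pool is passed instead.
    List<String> order = Collections.synchronizedList(new ArrayList<>());
    List<String> results = searchAll(List.of("leaf0", "leaf1", "leaf2"), Runnable::run, order);
    System.out.println(order);   // prints [leaf0, leaf1, leaf2]
    System.out.println(results); // prints [topDocs(leaf0), topDocs(leaf1), topDocs(leaf2)]
  }
}
```

Passing a real thread pool instead of `Runnable::run` changes only the execution order, not the calling code.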

javanna · 2023-09-29T13:21:50Z

Thanks for looking @shubhamvishu !

jpountz

I'd like to assume that we are evolving Lucene to make use of concurrency more and more, and at some point having an executor to parallelize is the default.

++

javanna added 2 commits September 28, 2023 21:38

iter

6cdcb44

javanna added this to the 9.9.0 milestone Sep 28, 2023

javanna requested a review from jpountz September 28, 2023 19:51

javanna commented Sep 28, 2023

javanna added 2 commits September 28, 2023 22:05

iter

98bd2dc

iter

25c80a6

shubhamvishu reviewed Sep 29, 2023

lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java

javanna added 2 commits September 29, 2023 14:26

address tests

19df037

tidy

8167de1

javanna commented Sep 29, 2023

javanna added 3 commits October 2, 2023 17:07

Merge branch 'main' into enhancement/task_executor_non_null

62167ac

changes entry

94505ff

tidy

92f9460

jpountz approved these changes Oct 3, 2023

javanna merged commit 2106bf5 into apache:main Oct 3, 2023
4 checks passed

javanna mentioned this pull request Oct 24, 2023

Make IndexSearcher#getSlices final and clarify docs #12718

Merged

javanna commented Sep 28, 2023

javanna Sep 28, 2023

shubhamvishu Sep 29, 2023

javanna Sep 29, 2023

shubhamvishu Sep 29, 2023

shubhamvishu Sep 29, 2023

reta Oct 2, 2023

sohami Oct 13, 2023

javanna Oct 13, 2023

sohami Oct 16, 2023

javanna Oct 24, 2023

shubhamvishu commented Sep 29, 2023

javanna Sep 29, 2023

javanna commented Sep 29, 2023

jpountz left a comment
