-
Notifications
You must be signed in to change notification settings - Fork 975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a task executor when executor is not provided #12606
Create a task executor when executor is not provided #12606
Conversation
As we introduce more places where we add concurrency (there are currently three) there is a common pattern around checking whether there is an executor provided, and then going sequential on the caller thread or parallel relying on the executor. That can be improved by internally creating a TaskExecutor that relies on an executor that executes tasks on the caller thread, which ensures that the task executor is never null, hence the common conditional is no longer needed, as the concurrent path that uses the task executor would be the default and only choice for operations that can be parallelized.
* | ||
* @lucene.experimental | ||
*/ | ||
public LeafSlice[] getSlices() { | ||
return (executor == null) ? null : leafSlicesSupplier.get(); | ||
return leafSlicesSupplier.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether this method is still needed, perhaps it's fine but we could make it final as a follow-up? It could be confusing otherwise for users to figure which of the two methods needs to be overridden between slices and getSlices ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1....I think we should keep this but make final here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, do you have usecases to call this method as a consumer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the method #slices(List<LeafReaderContext> leaves)
is protected in IndexSearcher
, I thought #getSlices
could be used by other classes(not subclass of IndexSearcher
and in different package) to get all the slices without having to pass maxDocsPerSlice
and maxSegmentsPerSlice
to make use in concurrency?. But giving it a thought again, I feel maybe we should remove/deprecate #getSlices
and make slices(List<LeafReaderContext> leaves)
public because #getSlices
is not doing any extra work other than returning those leaf slices(which is obtained from #slices
)? Let me know what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I see the other #slices
method is currently static
so I think we should make the other one static too if incase we are pursuing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sohami I think you could provide the context here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@javanna getSlices()
was kept as a convenience method to handle the null executor case such that usage in IndexSearcher
doesn't have to explicitly perform check for both executor
being null
and return value of leafSlices
.
This method will be useful if consumer wants to know the slices created by the provider or the count of slices. For example: In OpenSearch, we are using this method to emit a metric around slice count per searcher which can give indication around if a search request is getting too many slices and if we need to adjust that by providing some other slice provider.
Also slices()
method was kept protected such that extensions can provide their own implementation of it to control how slices
are generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok thanks are you ok if we make getSlices final? I'd like to clarify confusion around the two methods and how they should be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@javanna yes that should be fine. Thanks for checking!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened #12718
lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java
Outdated
Show resolved
Hide resolved
I really like this! This looks much more cleaner. |
for (LeafReaderContext context : reader.leaves()) { | ||
tasks.add(taskExecutor.createTask(() -> searchLeaf(context, filterWeight))); | ||
} | ||
TopDocs[] perLeafResults = taskExecutor.invokeAll(tasks).toArray(TopDocs[]::new); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a trade-off here: we create multiple tasks even if we are not parallelizing. That is good for simplicity, yet it makes the assumption that concurrency is not a corner case, but rather not having an executor is. We could make the task executor API more complex to factor in whether a real executor is provided or not, but I am not sure that's a good trade-off. I'd like to assume that we are evolving Lucene to make use of concurrency more and more, and at some point having an executor to parallelize is the default.
Thanks for looking @shubhamvishu ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to assume that we are evolving Lucene to make use of concurrency more and more, and at some point having an executor to parallelize is the default.
++
As we introduce more places where we add concurrency (there are currently three) there is a common pattern around checking whether there is an executor provided, and then going sequential on the caller thread or parallel relying on the executor. That can be improved by internally creating a TaskExecutor that relies on an executor that executes tasks on the caller thread, which ensures that the task executor is never null, hence the common conditional is no longer needed, as the concurrent path that uses the task executor would be the default and only choice for operations that can be parallelized.
As we introduce more places where we add concurrency (there are currently three) there is a common pattern around checking whether there is an executor provided, and then going sequential on the caller thread or parallel relying on the executor.
That can be improved by internally creating a TaskExecutor that relies on an executor that executes tasks on the caller thread, which ensures that the task executor is never null, hence the common conditional is no longer needed, as the concurrent path that uses the task executor would be the default and only choice for operations that can be parallelized.