
Simplify task executor for concurrent operations #12498

Closed
javanna opened this issue Aug 10, 2023 · 6 comments
Labels
discussion Discussion
@javanna
Contributor

javanna commented Aug 10, 2023

IndexSearcher supports parallelizing search across slices when an executor is available. Knn queries can also parallelize their rewrite across segments using the same executor. Potentially, other operations will be parallelized in the future using the same pattern.

Lucene currently has the notion of a TaskExecutor (previously named SliceExecutor, and recently renamed because knn rewrite is parallelized across segments rather than slices), which is responsible for offloading tasks to the executor, waiting for them all to complete, and returning the corresponding results.
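To make the pattern concrete, here is a minimal, illustrative sketch of what a TaskExecutor-style helper does (offload each task, wait for all, collect results in order). The class and method names are hypothetical; this is not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.FutureTask;

/** Illustrative sketch of a TaskExecutor-style helper, not Lucene's code. */
public class SimpleTaskExecutor {
  private final Executor executor;

  public SimpleTaskExecutor(Executor executor) {
    this.executor = executor;
  }

  public <T> List<T> invokeAll(List<Callable<T>> callables)
      throws ExecutionException, InterruptedException {
    List<FutureTask<T>> futures = new ArrayList<>();
    for (Callable<T> callable : callables) {
      FutureTask<T> future = new FutureTask<>(callable);
      futures.add(future);
      // every task is offloaded to the provided executor
      executor.execute(future);
    }
    List<T> results = new ArrayList<>();
    for (FutureTask<T> future : futures) {
      // the caller thread only coordinates: it waits and gathers results
      results.add(future.get());
    }
    return results;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      List<Integer> results = new SimpleTaskExecutor(pool)
          .invokeAll(List.<Callable<Integer>>of(() -> 1, () -> 2, () -> 3));
      System.out.println(results); // results are collected in submission order
    } finally {
      pool.shutdown();
    }
  }
}
```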

IndexSearcher currently performs an instanceof check on the provided executor: if it is a ThreadPoolExecutor, which is likely, a QueueSizeBasedExecutor is used, which tries to be smart about when to offload tasks to the executor versus when to execute them on the caller thread. This behaviour is not configurable in any way.

As part of exposing concurrent search in Elasticsearch, we have found that the current task executor is too opinionated, and either it needs to be customizable, or simplified to be less opinionated.

In a server-side search engine, the search itself is likely executed by a thread pool that has its own sizing, queueing mechanism, and rejection policy. When offloading requests to an executor for additional parallelism, it is important to be able to determine where the load is going to land and what type of workload it maps to. Ideally, the caller thread merely coordinates and does no I/O nor CPU-intensive work; all of that is delegated to the separate worker threads. Having built-in rules for when to execute certain operations on the caller thread may cause more problems than it solves: it is unpredictable and makes sizing of thread pools more difficult, because you suddenly end up with two thread pools that may both execute I/O as well as CPU-intensive operations.

My conclusion is that if flexibility is needed in terms of possibly executing on the caller thread, such behaviour can be included in the executor that is provided to the searcher (for example with an adaptive mechanism that conditionally executes directly instead of offloading based on queue size, like QueueSizeBasedExecutor does), together with its saturation policy (as opposed to catching RejectedExecutionException and executing on the caller thread, which is potentially dangerous). Also, executing the last slice / task on the caller thread, as it's the one waiting for all the tasks to complete, does not necessarily address a real problem of under-utilization: that wait is cheap, and it's possibly more important to divide the workload cleanly between the thread pools. Such behaviour, too, can be included in the Executor itself and does not require additional extension points.
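As a hedged sketch of this idea: a caller who wants QueueSizeBasedExecutor-like behaviour can wrap their own pool before handing it to IndexSearcher, keeping the policy entirely on their side. The class name and threshold below are illustrative, not part of Lucene.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative wrapper: the adaptive policy lives in the caller-provided
 * executor rather than inside Lucene. Runs a task on the calling thread
 * once the delegate's queue is too deep, otherwise offloads it.
 */
public class CallerRunsWhenBusyExecutor implements Executor {
  private final ThreadPoolExecutor delegate;
  private final int maxQueueSize;

  public CallerRunsWhenBusyExecutor(ThreadPoolExecutor delegate, int maxQueueSize) {
    this.delegate = delegate;
    this.maxQueueSize = maxQueueSize;
  }

  @Override
  public void execute(Runnable command) {
    if (delegate.getQueue().size() >= maxQueueSize) {
      command.run(); // queue is saturated: run on the caller thread
    } else {
      delegate.execute(command); // otherwise offload to the pool
    }
  }

  public static void main(String[] args) {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    // threshold 0 forces caller-thread execution, for demonstration
    new CallerRunsWhenBusyExecutor(pool, 0)
        .execute(() -> System.out.println("ran on " + Thread.currentThread().getName()));
    pool.shutdown();
  }
}
```

A searcher would then be created with something like `new IndexSearcher(reader, new CallerRunsWhenBusyExecutor(pool, 10))`, with no Lucene-side rules involved.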

My proposal is that we remove the QueueSizeBasedExecutor entirely and we simply offload every task to the executor, unconditionally. It's up to the provided executor to determine what to do when execute is called. That is the pluggable behaviour. Lucene should not have strong opinions nor provide building blocks for how to execute concurrent tasks.

Additionally, I think that we should unconditionally offload execution to the executor when available, even when we have a single slice. It may seem counterintuitive, but again this makes it possible to determine what type of workload each thread pool performs.

Relates to #12438

@javanna javanna added the discussion Discussion label Aug 10, 2023
@javanna
Contributor Author

javanna commented Aug 10, 2023

@jpountz @mikemccand pinging you two because you are likely to have thoughts on this.

javanna added a commit to javanna/lucene that referenced this issue Aug 10, 2023
@jpountz
Contributor

jpountz commented Aug 11, 2023

It makes sense to me to push the responsibility of figuring out how to execute tasks to the executor. Also pinging @reta.

@reta
Member

reta commented Aug 11, 2023

It makes sense to me to push the responsibility of figuring out how to execute tasks to the executor. Also pinging @reta.

Thanks @jpountz , I second that

Additionally, I think that we should unconditionally offload execution to the executor when available, even when we have a single slice. It may seem counter intuitive but it's again to be able to determine what type of workload each thread pool performs.

That is one of the difficulties we are dealing with as well; specifically, the exception branching logic has to account for wrapped / unwrapped exceptions.

@javanna
Contributor Author

javanna commented Aug 17, 2023

Thanks for the feedback everyone, there's a PR up as a first step (does not include single slice offloading yet): #12499 . Reviews are welcome.

javanna added a commit that referenced this issue Aug 21, 2023
This commit removes the QueueSizeBasedExecutor (package private) in favour of simply offloading concurrent execution to the provided executor. If specific behaviour is needed, it can be included in the executor itself.

This removes an instanceof check that determines which type of executor wrapper is used, under which some tasks may be executed on the caller thread depending on queue size, whenever a rejection happens, or always for the last slice. This behaviour is not configurable in any way, and is too rigid. Rather than making this pluggable, I propose to make Lucene less opinionated about concurrent task execution and require that users include their own execution strategy directly in the executor that they provide to the index searcher.

Relates to #12498
javanna added a commit to javanna/lucene that referenced this issue Aug 21, 2023
When an executor is set on the IndexSearcher, we should try and offload most of the computation to that executor. Ideally, the caller thread would only do light coordination work, and the executor is responsible for the heavier workload. If we don't offload sequential execution to the executor, it becomes very difficult to make any distinction about the type of workload performed on the two sides.

Closes apache#12498
@Jeevananthan-23

Hi @javanna, I opened #12531 for async task processing. It could be great to use virtual threads (JDK 21) for IndexSearcher, so that concurrent task execution, similar to what an Executor runs on a thread pool's carrier threads, runs on lightweight virtual threads instead.

@uschindler
Contributor

uschindler commented Sep 4, 2023

You can use virtual threads out of the box for IndexSearcher. Just pass a suitable executor: Executors.newVirtualThreadPerTaskExecutor()
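A minimal sketch of that suggestion, assuming JDK 21+. The IndexSearcher construction line is commented out so the snippet compiles without Lucene on the classpath; the `reader` variable there is hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Requires JDK 21+: each submitted task runs on its own virtual thread. */
public class VirtualThreadSearcherExample {
  public static void main(String[] args) throws Exception {
    try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
      // Hypothetical usage, with Lucene on the classpath:
      // IndexSearcher searcher = new IndexSearcher(reader, executor);
      Future<Boolean> isVirtual =
          executor.submit(() -> Thread.currentThread().isVirtual());
      System.out.println(isVirtual.get()); // tasks run on virtual threads
    }
  }
}
```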

@javanna javanna closed this as completed in da89415 Sep 5, 2023
javanna added a commit that referenced this issue Sep 5, 2023
javanna added a commit that referenced this issue Sep 5, 2023
javanna added a commit that referenced this issue Sep 5, 2023
@zhaih zhaih added this to the 9.8.0 milestone Sep 20, 2023