
Prevent concurrent tasks from parallelizing further #12569

Merged · 6 commits · Sep 20, 2023

Conversation

javanna
Contributor

@javanna javanna commented Sep 18, 2023

Concurrent search is currently applied once per search call, either when search is called, or when concurrent query rewrite happens. They generally don't happen within one another. There are situations in which we are going to introduce parallelism in places where there could be multiple inner levels of parallelism requested as each task could try to parallelize further. In these cases, with certain executor implementations, like ThreadPoolExecutor, we may deadlock as we are waiting for all tasks to complete but they are waiting for threads to free up to complete their execution.

This commit introduces a simple safeguard that makes sure that we only parallelize via the executor at the top-level invokeAll call. When each task tries to parallelize further, we just execute them directly instead of submitting them to the executor.
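The safeguard described above can be illustrated with a minimal sketch (class and member names here are illustrative, not Lucene's actual TaskExecutor): a per-thread counter tracks whether the current thread is already running a task, and if so, invokeAll runs its tasks directly on the calling thread instead of handing them to the executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;

// Minimal sketch of the safeguard: only the top-level invokeAll hands tasks to
// the executor; any nested invokeAll issued from within one of those tasks runs
// its tasks on the calling thread, so no thread ever blocks waiting for work
// that is queued behind it.
class SketchTaskExecutor {
  private static final ThreadLocal<Integer> numberOfRunningTasksInCurrentThread =
      ThreadLocal.withInitial(() -> 0);

  private final Executor executor;

  SketchTaskExecutor(Executor executor) {
    this.executor = executor;
  }

  <T> List<T> invokeAll(List<Callable<T>> callables) throws Exception {
    List<FutureTask<T>> tasks = new ArrayList<>();
    for (Callable<T> callable : callables) {
      tasks.add(new FutureTask<>(() -> {
        // record that this thread is currently executing a task
        numberOfRunningTasksInCurrentThread.set(numberOfRunningTasksInCurrentThread.get() + 1);
        try {
          return callable.call();
        } finally {
          numberOfRunningTasksInCurrentThread.set(numberOfRunningTasksInCurrentThread.get() - 1);
        }
      }));
    }
    for (FutureTask<T> task : tasks) {
      if (numberOfRunningTasksInCurrentThread.get() > 0) {
        task.run(); // nested call: execute directly, no executor hand-off
      } else {
        executor.execute(task); // top-level call: parallelize via the executor
      }
    }
    List<T> results = new ArrayList<>();
    for (FutureTask<T> task : tasks) {
      results.add(task.get()); // block until each task completes
    }
    return results;
  }
}
```

With a fixed pool of size 1, a task that calls invokeAll again completes on its own thread rather than deadlocking while waiting for a free worker.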

@javanna javanna marked this pull request as ready for review September 18, 2023 18:36
@javanna javanna added this to the 9.8.0 milestone Sep 18, 2023
Comment on lines 41 to 42
* This is to prevent deadlock with certain types of executors, as well as to limit the level of
* parallelism.
Contributor

I have a question here: since we are limiting the level of parallelism to 1, won't we be affecting executor implementations that would not deadlock and that want to make use of multi-level parallelism? Won't we be restricting those opportunities, given that in the future we may also want multi-level parallelism?

Contributor Author

That's a good question. I assumed that once we are in a concurrent task, parallelizing further may cause more harm than good, especially given the "block and wait" for all tasks to be completed. In practice, I wonder what other executor implementations are desirable to provide to the index searcher that are not thread pool based: say you have an executor that creates a new thread for each execute request (which does not seem like a good idea anyway), would we be OK allowing more than one level of parallelism? Is that a valid use case?

Contributor

I would remove the bit about limiting the level of parallelism. I don't think it's a goal, mostly a side effect of the logic to avoid deadlocks.

It's true that this might hurt executors that are not subject to deadlocks, but I would be very surprised if there were many users relying on it today since it can only happen when running a rewrite or a search from a rewrite or a search, which is not typical.

Contributor

So it seems that it's not super important based on current use cases, but maybe in the future there are cases where we would like to allow a higher level of parallelism? I agree with @jpountz that, based on the current scenario, it might be a good choice to remove this bit and run directly on the calling thread if it is an executor thread.

final <T> List<T> invokeAll(Collection<RunnableFuture<T>> tasks) throws IOException {
    for (Runnable task : tasks) {
        executor.execute(task);

final <T> List<T> invokeAll(Collection<Task<T>> tasks) throws IOException {
Contributor

Just a thought: to make use of this for concurrent search, should we somehow ensure the caller always uses TaskExecutor's invokeAll instead of directly submitting tasks to indexSearcher.getExecutor().execute()? Do we wish to enforce that somehow, or leave it up to the user to decide how they would like to implement it? I don't see any other usages of IndexSearcher#getExecutor in the codebase, though since it's a public API I'm not sure it would be right to deprecate it in this case. Appreciate any views on this.

Contributor Author

Good point: I'd consider deprecating getExecutor in favour of always going through the TaskExecutor.

*/
class TaskExecutor {
private static final ThreadLocal<Boolean> isConcurrentTask = ThreadLocal.withInitial(() -> false);
Contributor

Instead of a boolean, should we allow the user to also pass the level of parallelism (L) they would like to use, defaulting to 1 as in this case? i.e. if an executor implementation (safeguarded from the deadlock situation) wants to make use of further concurrency, it should be able to set that via an API in TaskExecutor?

Contributor Author

I would keep things simple and not allow this for now, but I'd like to hear what others think.

Contributor

I have a few problems:

  • it is a static ThreadLocal, which is fine as it is unlikely that several TaskExecutors use the same thread pool. But there could be problems if you create two different TaskExecutors with the same thread pool: in that case the TaskExecutors are no longer decoupled (one affects the other). It might not be a problem at all, but keep that in mind.
  • in the case of different TaskExecutors, one task would set the thread local to false in its finally block; this may cause deadlock in the other TaskExecutor using the same thread pool.
  • in addition, the name isConcurrentTask is misleading, as the idea is to prevent more concurrent tasks from being executed in the thread pool. It should maybe be called "runSameThread".

To also support higher parallelism than 1, I'd change this to ThreadLocal<Integer> and increment on starting a task and decrement in the finally. Then you could have logic like "run in the same thread if the current value >= parallelism". This would also prevent the issues above, because when entering the run method it is incremented and when exiting it is decremented, so different executors can't confuse each other.

In general I am not fully happy with using a ThreadLocal here at all. Would it not be better to pass around the Task instance and the task instance has a method to spawn a subtask? This would be similar to the fork/join framework, where RecursiveTask is used for exactly that.

IMHO, we should really switch to fork/join, as we need work stealing algorithms to prevent deadlocks.
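For reference, the fork/join alternative mentioned above looks roughly like this (a generic sketch, not Lucene code): a RecursiveTask spawns subtasks via fork(), and join() lets pool threads work-steal pending tasks instead of blocking, so nested parallelism cannot exhaust the pool.

```java
import java.util.concurrent.RecursiveTask;

// Sketch of the fork/join style: subtasks are forked recursively, and join()
// participates in work stealing, so even a pool with a single worker thread
// completes arbitrarily nested task trees without deadlocking.
class SumTask extends RecursiveTask<Long> {
  private final long[] values;
  private final int from, to; // half-open range [from, to)

  SumTask(long[] values, int from, int to) {
    this.values = values;
    this.from = from;
    this.to = to;
  }

  @Override
  protected Long compute() {
    if (to - from <= 2) { // small slice: sum directly
      long sum = 0;
      for (int i = from; i < to; i++) {
        sum += values[i];
      }
      return sum;
    }
    int mid = (from + to) >>> 1;
    SumTask left = new SumTask(values, from, mid);
    SumTask right = new SumTask(values, mid, to);
    left.fork();                          // spawn the left half as a subtask
    return right.compute() + left.join(); // compute right inline, join steals if needed
  }
}
```

Running this on a ForkJoinPool with parallelism 1 still completes, which is exactly the deadlock-freedom property discussed here.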

Contributor Author

@javanna javanna Sep 19, 2023

I am aware of some of the subtleties of thread locals. I started with a solution that did not use them, but all in all it would do the same thing that thread locals do :) (e.g. keeping track of which thread runs what in a map).

I think that if we make sure that a single task executor is created, which is currently the case, we should be ok?

I am happy to address the naming as suggested, I was also not super happy with it.

I will play with the counter idea, though I don't think that allowing multiple levels of parallelism is required? Is that necessary in your opinion?

Would it not be better to pass around the Task instance and the task instance has a method to spawn a subtask?

I spent quite some time debating this with myself as well. The problem is that invokeAll does not have the current task. It may or may not be executed as part of a task. It looks like making the task available (without using thread locals!) would mean carrying the task around in many places, unless I am missing another way.

Thanks for all the feedback Uwe!

Contributor

Hi, the latest commit looks fine, as we at least no longer have the binary thread local.

Small suggestion: do we have a MutableInt class available in Lucene? It would make it easier to decrement/increment. An alternative is to use ThreadLocal<int[]> with a length-one array. This would also prevent autoboxing. Just initialize with:

private static final ThreadLocal<int[]> runSameThread = ThreadLocal.withInitial(() -> new int[1]);

and use like this:

final int[] counter = runSameThread.get();
counter[0]++;
try {
    ...
} finally {
    counter[0]--;
}

My problem was mainly if external code like Elasticsearch passes a shared thread pool to multiple IndexSearchers (like different indexes on same node using same "searcher" thread pool).
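The int[]-based counter suggested above can be sketched as a self-contained helper (class and method names are illustrative): the array is read once per invocation and mutated in place, so no Integer boxing happens on the hot path.

```java
// Sketch of the ThreadLocal<int[]> idea: a length-one array acts as a mutable
// per-thread nesting counter, fetched with a single ThreadLocal lookup.
class NestingCounter {
  private static final ThreadLocal<int[]> depth = ThreadLocal.withInitial(() -> new int[1]);

  /** Current nesting depth on the calling thread (0 when no tracked task is running). */
  static int currentDepth() {
    return depth.get()[0];
  }

  /** Runs the task while tracking nesting depth; returns the depth observed during the task. */
  static int runTracked(Runnable task) {
    final int[] counter = depth.get(); // one ThreadLocal read, no boxing
    counter[0]++;
    try {
      task.run();
      return counter[0]; // evaluated before the finally block decrements
    } finally {
      counter[0]--;
    }
  }
}
```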

Contributor

Would it help? I would expect this number to always be 0 or 1, maybe 2 in rare cases, so we'd use the JVM's cached Integer instances?
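For background on the boxing point above: Integer.valueOf, which autoboxing goes through, is required by the JLS to cache values in the range [-128, 127], so small counts like 0, 1 and 2 box to shared instances. A quick illustration (hypothetical helper, assuming the default autobox cache size):

```java
// Demonstrates the JVM's Integer cache: boxing the same small int twice yields
// the same object, while values above 127 allocate fresh instances by default.
class IntegerCacheDemo {
  static boolean boxesToSameInstance(int value) {
    Integer a = value; // autoboxing calls Integer.valueOf(value)
    Integer b = value;
    return a == b;     // reference equality: true only for cached values
  }
}
```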

Contributor

Of course. The idea was that we'd only read the thread local once to get the mutable instance. Also, to me the code looks better. It was only a suggestion...

Contributor

Does it make sense to move TaskExecutor under the org.apache.lucene.util package? That seems like a more suitable place to me for this sort of class, and there could be more potential usages of this class to achieve concurrency in various parts of the codebase.

Contributor Author

I am not entirely sure: one aspect is that I'd like to make sure that there is a single instance of TaskExecutor, which IndexSearcher creates. All other usages should go through the existing instance retrieved from the IndexSearcher. Does that seem like a reasonable expectation?

Contributor

IndexSearcher should own the TaskExecutor and all queries/collectors can use it.

As this would change public methods, why not move to work-stealing fork/join here?

Contributor Author

Which public methods require changing? As far as I understand, the visibility of TaskExecutor needs to go from package-private to public, but that's not a breaking change?

Contributor Author

why not move to work-stealing fork/join here?

There were concerns raised by others that it is not a good fit for I/O-intensive operations. Also, it's a bigger change, and this change looked acceptable to move forward with #12183.

Contributor

@uschindler uschindler left a comment

I am not happy with how the thread local is set up. It should be incrementing/decrementing instead of binary true/false.


Contributor

@jpountz jpountz left a comment

I left some comments. In general I like this approach better than walking the stack.


@javanna
Contributor Author

javanna commented Sep 20, 2023

I pushed new commits to address the latest review comments, thanks for all the input. This should be ready now.

Contributor

@jpountz jpountz left a comment

LGTM. It might be worth using CallerRunsPolicy with a small queue in tests sometimes, as this is an interesting case that will make tasks run in the current thread.
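A sketch of such a test setup (illustrative names, not an actual Lucene test): a ThreadPoolExecutor with a single worker, a capacity-1 queue, and CallerRunsPolicy. Once the worker is busy and the queue is full, a further execute() call runs the task on the submitting thread instead of rejecting it.

```java
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// With one worker and a capacity-1 queue: the first task occupies the worker,
// the second fills the queue, and the third is rejected and therefore executed
// by CallerRunsPolicy on the submitting (caller) thread.
class CallerRunsDemo {
  static Set<String> runTasks(int numTasks) throws InterruptedException {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        1, 1,                              // exactly one worker thread
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(1),       // tiny bounded queue
        new ThreadPoolExecutor.CallerRunsPolicy());
    Set<String> threadNames = ConcurrentHashMap.newKeySet();
    CountDownLatch done = new CountDownLatch(numTasks);
    for (int i = 0; i < numTasks; i++) {
      pool.execute(() -> {
        try {
          Thread.sleep(50); // keep the worker busy so the queue fills up
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        threadNames.add(Thread.currentThread().getName());
        done.countDown();
      });
    }
    done.await();
    pool.shutdown();
    return threadNames; // names of every thread that executed a task
  }
}
```

Submitting three tasks should show two distinct executing threads: the pool worker and the caller itself.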

@javanna
Contributor Author

javanna commented Sep 20, 2023

It might be worth using CallerRunsPolicy with a small queue in tests sometimes, as this is an interesting case that will make tasks run in the current thread.

Given that TaskExecutor runs tasks in the caller thread when the counter is above zero, I was thinking that we already cover the situation where we increment the counter for the caller thread. And that has no effect other than executing any further tasks in the caller thread.

@javanna javanna merged commit 937ebd4 into apache:main Sep 20, 2023
4 checks passed
javanna added a commit that referenced this pull request Sep 20, 2023

Co-authored-by: Adrien Grand <jpountz@gmail.com>
@javanna javanna removed this from the 9.8.0 milestone Sep 20, 2023
javanna added a commit that referenced this pull request Sep 20, 2023
javanna added a commit that referenced this pull request Sep 20, 2023
@javanna javanna added this to the 9.9.0 milestone Sep 20, 2023