Fix race condition in KubernetesTaskRunner when task is added to the map #14643
kfaraz merged 6 commits into apache:master
Conversation
I feel that re-introducing locks is a step backwards. If the changes in #14435 seem to be causing trouble, we should just revert that commit rather than introduce a new kind of locking. This would also make sense given the upcoming release of Druid 27.

Alternatively, IIUC, the problem here is that while one thread is in the middle of adding the work item, the worker thread may already be trying to read it. There are two easy ways to avoid that (unless I am missing something):

Option 1: Set
The problem with option 1 is the constructor of

In this case I prefer to revert the last commit. Though synchronizing on tasks in all the operations seems like overkill since it's already a ConcurrentHashMap, it does make the class more thread-safe (in theory).
I agree, we should not update the super class. Since we have a specific use case, it is okay to override the default behaviour. You would need to maintain the

If
Yeah, it is certainly thread-safe (even in practice) to have everything synchronized under the same lock, but it also limits performance. A single bad call can potentially block everything. We are trying to revisit all usages of locks in the Druid code base, and the original PR #14435 was in this same vein of improving performance.
@kfaraz @YongGang i think it makes sense to revert removing the locks since this is a valid synchronization use case (the main thread inserting into the map has to happen before the worker thread reads from the map). i think we can keep the non-synchronized getRunningTasks/getKnownTasks/getPendingTasks from the other PR though.

edit: hmm, just saw your other comment, let me look at that solution first. i think solution 1 makes sense. there is still a concurrent access where the main thread is trying to set result while the worker thread is trying to get/set kubernetesPeonLifecycle, but afaik i don't think that should cause any issues. i am wondering if there will be an issue with having a work item with no result future attached to it though? that might be the reason the constructor is the way it is. this could be tested out though i think.
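The ordering problem described above (the worker is submitted from inside the mapping function and can look itself up before the entry is published) can be reproduced with a plain `ConcurrentHashMap`. This is an illustrative sketch, not the Druid code; `RaceDemo`, `demo`, and the latch are all invented here, and the latch only makes the race deterministic instead of load-dependent:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RaceDemo {
    static final ConcurrentHashMap<String, String> tasks = new ConcurrentHashMap<>();

    // Returns what the worker thread saw in the map: null demonstrates the race.
    static String demo() {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        CountDownLatch workerDone = new CountDownLatch(1);
        String[] seen = new String[1];
        tasks.computeIfAbsent("task-1", k -> {
            // The worker is submitted from inside the mapping function, as in
            // the original runTask(); the entry is not yet linked into the map.
            exec.submit(() -> {
                seen[0] = tasks.get("task-1"); // lock-free read, sees no entry
                workerDone.countDown();
            });
            try {
                workerDone.await(); // force the lookup to happen before the insert
            } catch (InterruptedException ignored) {
            }
            return "work-item";
        });
        exec.shutdown();
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("worker saw: " + demo());
    }
}
```

In the real runner the worker loses this race only occasionally under high load, which matches the "under high load" observation in the PR description.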
it seems to me like there are some methods in TaskQueue that assume that statusFuture will not be null. if we want to go with option 1, i think we need to filter getKnownTasks and getPendingTasks to check that result is not null.
after thinking about this some more, i feel like it's simpler to just leave the synchronized block in for run/joinAsync (run only happens once when a task is run and joinAsync only happens once when the overlord is restarted), and in doTask (which only happens once per worker thread). we can leave getPendingTasks, getKnownTasks, and getRunningTasks unsynchronized, and i think that will prevent most of the lock contention. i think this is a safer solution for druid 27 and we can maybe investigate removing synchronization entirely after doing some more scale testing to see how much it helps. @kfaraz what do you think?
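The shape of this proposal — a shared lock around the insert in run/joinAsync and around the lookup in doTask, while the read-only listing methods stay lock-free — can be sketched generically. All names here are illustrative stand-ins, not the actual KubernetesTaskRunner code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SyncDemo {
    static final ConcurrentHashMap<String, String> tasks = new ConcurrentHashMap<>();
    static final Object lock = new Object();

    // run()/joinAsync(): insert under the lock, then submit the worker.
    static void submit(String taskId, ExecutorService exec, CompletableFuture<String> out) {
        synchronized (lock) {
            tasks.put(taskId, "work-item");
            exec.submit(() -> out.complete(doTask(taskId)));
        }
    }

    // doTask(): the lookup takes the same lock, so it cannot run until the
    // insert above has completed, regardless of thread scheduling.
    static String doTask(String taskId) {
        synchronized (lock) {
            return tasks.containsKey(taskId) ? "SUCCESS" : "MISSING";
        }
    }

    static String demo() {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        CompletableFuture<String> out = new CompletableFuture<>();
        submit("task-1", exec, out);
        String result = out.join();
        exec.shutdown();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A getKnownTasks-style method would just read the ConcurrentHashMap directly without taking `lock`, which is safe for iteration and is where the contention savings come from.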
@georgew5656 , @YongGang, if you prefer having the synchronized for now, then we can proceed with it. 👍🏻
This is not a blocker for Druid 27 as the bug is in a contrib extension. But yes, it would be better to have it addressed.
Removing this synchronization might not help a lot with performance, as the critical section is very small and threads would not be blocked for long. The primary reason I did not prefer it was to have homogeneity in the code. It gets confusing to have synchronization in some places and not in others. And if we have synchronization in all places, then it does start affecting performance.
If we agree on this, then I will discard the changes in this PR (the Option 1 solution) and work on partially reverting this change: #14435
Done. Please have a look @georgew5656 @kfaraz
```java
tasks.computeIfAbsent(task.getId(), k -> new KubernetesWorkItem(task, exec.submit(() -> doTask(task, run))));
return tasks.get(task.getId()).getResult();
```
this can be a single chained statement. Also, it would look more readable if the arguments to computeIfAbsent were on separate lines, as in the original method.
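A self-contained illustration of the suggested chaining, with a stand-in `WorkItem` in place of Druid's `KubernetesWorkItem` (names and types here are invented for the sketch):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class ChainedDemo {
    // Stand-in for KubernetesWorkItem: holds a result future.
    static class WorkItem {
        final CompletableFuture<String> result;
        WorkItem(CompletableFuture<String> result) { this.result = result; }
        CompletableFuture<String> getResult() { return result; }
    }

    static final ConcurrentHashMap<String, WorkItem> tasks = new ConcurrentHashMap<>();

    // The review suggestion: chain getResult() onto computeIfAbsent instead of
    // a separate tasks.get(), with the arguments broken onto their own lines.
    static CompletableFuture<String> run(String taskId) {
        return tasks.computeIfAbsent(
            taskId,
            k -> new WorkItem(CompletableFuture.completedFuture("status-of-" + k))
        ).getResult();
    }

    public static void main(String[] args) {
        System.out.println(run("task-1").join());
    }
}
```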
```java
  throw new ISE("Task [%s] disappeared", task.getId());
}
if (workItem == null) {
  throw new ISE("Task [%s] disappeared", task.getId());
```
Should we maybe just return a failed TaskStatus here instead of throwing an exception? The exception thrown by this method may or may not be handled by the calling code, but there is no point depending on that if we already know the reason for the task failure.
We should do the same thing in the catch block too.
But this doesn't need to be done as a part of this PR, just wanted to call it out.
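A sketch of what returning a failure instead of throwing could look like. `TaskStatus` here is a minimal stand-in record, not Druid's class, and `doTask` is a toy version of the method under review:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FailFastDemo {
    // Minimal stand-in for Druid's TaskStatus.
    record TaskStatus(String taskId, boolean success, String reason) {
        static TaskStatus failure(String taskId, String reason) {
            return new TaskStatus(taskId, false, reason);
        }
        static TaskStatus success(String taskId) {
            return new TaskStatus(taskId, true, null);
        }
    }

    static final Map<String, Object> tasks = new ConcurrentHashMap<>();

    static TaskStatus doTask(String taskId) {
        Object workItem = tasks.get(taskId);
        if (workItem == null) {
            // Return a failure naming the known reason instead of throwing,
            // so callers do not need their own exception handling for it.
            return TaskStatus.failure(taskId, "Task [" + taskId + "] disappeared");
        }
        return TaskStatus.success(taskId);
    }

    public static void main(String[] args) {
        System.out.println(doTask("missing-task"));
    }
}
```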
I looked at what the other TaskRunners do so we can have consistent behavior (throw an error or return a task failure).
In ThreadingTaskRunner, it seems the code is similar to what we have here.
Makes sense, we can revisit this later.
```diff
 try {
   Task task = adapter.toTask(job);
-  tasks.add(Pair.of(task, joinAsync(task)));
+  restoredTasks.add(Pair.of(task, runOrJoinTask(task, false)));
```
I preferred the original separation between joinTask and runTask. Passing a boolean is cryptic and makes the code less readable.
Updated. My thinking was that since we added a synchronized block (or a lock in the previous commit), it's better to have a single place/method for this complexity. But I agree this may make the code less readable.
Yeah, that is why I didn't give this feedback originally. But upon reading the code again, it felt cleaner to have them separate.
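One way to keep both properties discussed here — separate, readable entry points and a single home for the locking complexity — is to have the two public methods delegate to one private synchronized helper. A generic sketch with invented names, not the Druid code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class SplitDemo {
    static final ConcurrentHashMap<String, CompletableFuture<String>> tasks =
        new ConcurrentHashMap<>();

    // Two readable entry points, as the review preferred...
    static CompletableFuture<String> runTask(String id) { return runOrJoin(id, true); }
    static CompletableFuture<String> joinTask(String id) { return runOrJoin(id, false); }

    // ...while the synchronization lives in a single private helper, which was
    // the motivation for merging the two methods in the first place.
    private static synchronized CompletableFuture<String> runOrJoin(String id, boolean run) {
        return tasks.computeIfAbsent(
            id,
            k -> CompletableFuture.completedFuture(run ? "RAN " + k : "JOINED " + k)
        );
    }

    public static void main(String[] args) {
        System.out.println(runTask("t1").join());
        System.out.println(joinTask("t2").join());
    }
}
```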
```diff
 }

-protected ListenableFuture<TaskStatus> joinAsync(Task task)
+protected ListenableFuture<TaskStatus> runOrJoinTask(Task task, boolean run)
```
It was better to have this as two separate methods; that seemed more readable and easier to understand.
yeah agree, we can just put the synchronized block back in (revert parts of that previous PR i made to remove them) without changing any function names
Done. My comment is the same as above:
My thinking was that since we added a synchronized block (or a lock in the previous commit), it's better to have a single place/method for this complexity. But I agree this may make the code less readable.
Some checks were skipped, re-triggering them.
…map (apache#14643)

Changes:
- Fix race condition in KubernetesTaskRunner introduced by apache#14435
- Perform addition and removal from map inside a synchronized block
- Update tests
Description
Seems there is a multi-threading issue introduced by this change to KubernetesTaskRunner: #14435
The following exception was thrown under high load:
It's due to the task being added to the `tasks` map from the main thread, while in `doTask` (called by `runTask`) the task's existence is checked from a pool thread, which causes a race condition, as shown in the following code:
In this PR we changed the KubernetesWorkItem constructor to allow the TaskStatusFuture to be set by a method after the instance has been initialized.
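The Option-1 shape this description refers to — publish the work item in the map first, then attach the result future via a setter — can be sketched as follows. The classes here are stand-ins for illustration, not the actual KubernetesWorkItem:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TwoPhaseDemo {
    // Stand-in for KubernetesWorkItem: the result future is settable after
    // construction instead of being required by the constructor.
    static class WorkItem {
        private volatile CompletableFuture<String> result;
        void setResult(CompletableFuture<String> result) { this.result = result; }
        CompletableFuture<String> getResult() { return result; }
    }

    static final ConcurrentHashMap<String, WorkItem> tasks = new ConcurrentHashMap<>();

    static String demo() {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        // Phase 1: publish the work item so worker threads can find it.
        WorkItem item = tasks.computeIfAbsent("task-1", k -> new WorkItem());
        // Phase 2: only now submit the work; the lookup can no longer miss the entry.
        item.setResult(CompletableFuture.supplyAsync(
            () -> tasks.containsKey("task-1") ? "SUCCESS" : "MISSING", exec));
        String status = item.getResult().join();
        exec.shutdown();
        return status;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

As noted in the review discussion, this still leaves a window in which getResult() is null, which is why methods like getKnownTasks and getPendingTasks would need to filter out work items whose result is not yet set.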
Release note
This PR has: