[SPARK-9552] Add force control for killExecutors to avoid false killing for those busy executors #7888
Conversation
Test build #39524 has finished for PR 7888 at commit
- initializing = false
- removeExecutor(executorId)
+ expired = removeExecutor(executorId)
+ if (expired) initializing = false
Without commenting on the validity of the change, you have some style problems, like this needing to be in a block in braces
I ran the style check before committing. Sorry for missing the if block here; I will fix that.
@GraceH, from the patch, I didn't see how the user is supposed to pass
@CodingCat What I mean is to add the force control in the
Regarding the public API for the users, we'd better have a discussion about whether to add a new public API (it is a little out of this PR's scope). From my perspective, modifying the existing public API is not a good idea; it may cause compatibility issues. What do you think?
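One way to reconcile the two concerns above is an overload: keep the existing two-argument signature untouched (so binary compatibility is preserved) and put the new force flag only on an internal method. This is a minimal standalone sketch, not the merged code; the class name, the `private` overload, and the `force = true` default for the old entry point are all illustrative assumptions.

```scala
// Sketch only: preserving the old public signature while adding a force flag.
class SchedulerBackendSketch {
  private var killed = List.empty[String]

  // Existing public entry point: signature unchanged, so callers compiled
  // against the old API keep working. (Defaulting to force = true here is
  // an assumption for illustration.)
  final def killExecutors(executorIds: Seq[String], replace: Boolean): Boolean =
    killExecutors(executorIds, replace, force = true)

  // New internal overload carrying the force flag.
  private final def killExecutors(
      executorIds: Seq[String],
      replace: Boolean,
      force: Boolean): Boolean = synchronized {
    killed = killed ++ executorIds
    true
  }

  def killedSoFar: List[String] = killed
}
```

The design point is that adding an overload (rather than a parameter to the existing method) avoids the compatibility issue discussed above.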
- final def killExecutors(executorIds: Seq[String], replace: Boolean): Boolean = synchronized {
+ final def killExecutors(executorIds: Seq[String],
+     replace: Boolean,
+     force: Boolean): Boolean = synchronized {
nit: style for multiline method defs is each arg on its own line, double indented, so:
final def killExecutors(
    executorIds: Seq[String],
    replace: Boolean,
    force: Boolean): Boolean = synchronized {
Thanks. I will fix that. :)
I see... I was just confused by
I thought
@CodingCat Sorry for the ambiguous wording in the description. In general, the patch aims to fix the false-killing bug in dynamic allocation, and at the same time leaves room for more options in
Test build #39625 has finished for PR 7888 at commit
It seems the test failure is not related to this PR.
- doKillExecutors(knownExecutors)
+ executorsPendingToRemove ++= idleExecutors
+ // return false: there are some busy executors, or killing certain executors failed
+ doKillExecutors(idleExecutors) && idleExecutors.size == knownExecutors.size
Note the semantics of the return value. All it says is whether the request is ack'ed by the cluster manager, not whether the kill will actually happen. We should keep the original return value.
OK. will do.
@andrewor14 The problems here are:
- If there is no idle executor to be killed, shall we still return with acknowledged?
- It is quite tricky to have the force control for killExecutors. For example, we have 3 executors to kill, but only one of them is idle. Shall we return true to the end user?
Hi @GraceH, thanks for fixing this problem. I agree with the problem statement and the root cause. However, there are two outstanding issues with the solution:
(1)
(2) Currently we never set
What do you think?
@andrewor14 Thanks for the feedback. I will take a look at your comments and revise the code accordingly. If there is any concern, I will let you know.
@andrewor14 Thanks for the comments. Regarding #1, very good point. That's why I try to return false if force-killing failed; this is the simplest way. Regarding #2, nice suggestion; that's my thought too. The end user can force-kill any executor by setting force=true. I will make a private function for
@andrewor14 I have pushed another proposal. Please let me know your comments.
Test build #42135 has finished for PR 7888 at commit
@andrewor14 I have tried to rebase the original proposal onto the latest master branch. Please let me know if you have further questions or concerns. Thanks a lot.
Test build #44534 has finished for PR 7888 at commit
@vanzin can you have a look?
Thanks @andrewor14. Hi @vanzin, let me give you a quick brief on the patch and its goal. There is a bug in dynamic allocation: some busy executors may be killed by "mistake", and we have met this kind of situation frequently in real-world deployments.
To solve this problem, one option is to make task assignment and its notification synchronized, but that approach does not suit the current listener-based implementation. Here I propose another way: add force control to killExecutor(). For dynamic allocation, we check whether the executor is busy before actually taking the kill action. By doing so, even if the listener event does not arrive in time, we can actively rescue busy executors (scheduled to be killed but already assigned new tasks). Through dynamic allocation we should not kill those busy executors (force control disabled). Meanwhile, we expose the force control to the end user (SparkContext public API), so the end user can decide whether to force-kill certain executors. Please let me know your thoughts. Thanks a lot.
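The busy-check described above can be sketched in isolation like this. This is a minimal illustrative model, not the merged code: `ForceKillSketch`, the `taskCounts` map standing in for the scheduler's per-executor load bookkeeping, and the pair return value are all assumptions for exposition.

```scala
// Sketch of the force check: without force, only idle executors are killed,
// and the call reports failure if any requested executor was busy.
object ForceKillSketch {
  def killExecutors(
      ids: Seq[String],
      force: Boolean,
      taskCounts: Map[String, Int]): (Seq[String], Boolean) = {
    // Only consider executors the scheduler actually knows about.
    val known = ids.filter(taskCounts.contains)
    // Without force, skip executors that still have running tasks.
    val toKill =
      if (force) known
      else known.filter(id => taskCounts(id) == 0)
    // Second element is false if some busy executor was spared, so a caller
    // such as dynamic allocation can keep its idle timer alive.
    (toKill, toKill.size == known.size)
  }
}
```

Under this sketch, a non-forced kill of a busy executor fails, which is exactly the signal dynamic allocation needs to avoid false killing.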
@@ -88,7 +88,8 @@ private[spark] class TaskSchedulerImpl(
   val nextTaskId = new AtomicLong(0)

-  // Which executor IDs we have executors on
-  val activeExecutorIds = new HashSet[String]
+  // each executor will record running or launched task number
+  val activeExecutorIdsWithLoads = new HashMap[String, Int]
nit: instead of WithLoads, WithTasks or WithTaskCount?
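The bookkeeping behind such a per-executor task-count map could look roughly like this. A standalone sketch only: `TaskCountTracker` and its method names are hypothetical stand-ins for the PR's actual `TaskSchedulerImpl` changes.

```scala
import scala.collection.mutable

// Sketch: count running tasks per executor; count == 0 means "idle".
class TaskCountTracker {
  private val taskCounts = mutable.HashMap.empty[String, Int]

  // Increment on task launch.
  def taskLaunched(executorId: String): Unit =
    taskCounts(executorId) = taskCounts.getOrElse(executorId, 0) + 1

  // Decrement on task end, never going below zero.
  def taskFinished(executorId: String): Unit =
    taskCounts(executorId) = math.max(0, taskCounts.getOrElse(executorId, 0) - 1)

  // An executor with at least one running task must not be force-free killed.
  def isBusy(executorId: String): Boolean =
    taskCounts.getOrElse(executorId, 0) > 0
}
```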
I'm not a huge fan of changing the public API, especially since it breaks binary compatibility (even if annotated with
Thanks @vanzin for the comments. I will make the changes accordingly.
Test build #45816 has finished for PR 7888 at commit
@vanzin and @andrewor14, please let me know your further inputs. Sorry for the several rounds of amendments.
Test build #45822 has finished for PR 7888 at commit
Test build #45829 has finished for PR 7888 at commit
@@ -87,8 +87,8 @@ private[spark] class TaskSchedulerImpl(
   // Incrementing task IDs
   val nextTaskId = new AtomicLong(0)

-  // Which executor IDs we have executors on
-  val activeExecutorIds = new HashSet[String]
+  // Number of tasks runing on each executor
running
oops. sorry for that.
Suggestions for PR 7888
@andrewor14 That is really a good way to mock the busy status. Thanks a lot, I really learned a lot from that.
@vanzin Also thanks for helping me clarify the thoughts on the acknowledgement part.
retest this please
Test build #46039 has finished for PR 7888 at commit
@andrewor14 My bad. Since the
Now it should work. |
Test build #46073 has finished for PR 7888 at commit
retest this please
Test build #46101 has finished for PR 7888 at commit
Ok, LGTM. I'm merging this into master and 1.6. We can fix the thing @vanzin pointed out (about not adding the executor to
…ng for those busy executors With dynamic allocation, busy executors are sometimes killed by mistake: executors with task assignments get killed for being idle long enough (say 60 seconds). The root cause is that the task-launch listener event is asynchronous. For example, some executors are being assigned tasks but have not yet sent out the listener notification; meanwhile, the dynamic allocation executor idle timeout (e.g., 60 seconds) expires and triggers a killExecutor event at the same time. 1. The timer expiration starts before the listener event arrives. 2. The task then runs on top of that killed/killing executor, which finally leads to task failure. Here is the proposal to fix it: add force control to killExecutor. If force is not set (i.e., false), we check whether the executor under killing is idle or busy; if it still has assignments, we do not kill it and return false (to indicate killing failure). Dynamic allocation turns force killing off (force = false), so killing a busy executor fails and the executor's idle timer stays valid; later, when the task assignment event arrives, the idle timer is removed accordingly. This way we avoid falsely killing busy executors under dynamic allocation. For other usages, end users can decide whether to use force killing by themselves; with that option on, killExecutor takes the action without any status checking. Author: Grace <jie.huang@intel.com> Author: Andrew Or <andrew@databricks.com> Author: Jie Huang <jie.huang@intel.com> Closes #7888 from GraceH/forcekill. (cherry picked from commit 965245d) Signed-off-by: Andrew Or <andrew@databricks.com>
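The caller-side behavior described in the merged change can be sketched as follows. This is an illustrative standalone model, not the actual ExecutorAllocationManager: `onIdleTimeout` and both function parameters are hypothetical stand-ins for the real backend call and timer bookkeeping.

```scala
// Sketch: dynamic allocation requests a non-forced kill; if the kill is
// refused (the executor turned out to be busy), the idle timer is left in
// place so the executor can be retried later.
object DynamicAllocationSketch {
  def onIdleTimeout(
      executorId: String,
      killExecutor: (String, Boolean) => Boolean, // (id, force) => success
      removeIdleTimer: String => Unit): Boolean = {
    // force = false: the backend refuses to kill a busy executor.
    val killed = killExecutor(executorId, false)
    // Only clear the idle timer once the kill actually succeeded.
    if (killed) removeIdleTimer(executorId)
    killed
  }
}
```

The key point mirrors the proposal: a failed non-forced kill is not an error, it is the signal that the "idle" executor was actually busy.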
@andrewor14 @vanzin Thanks, all. I will follow up by creating a new patch under SPARK-9552.