[SPARK-9552] Add force control for killExecutors to avoid false killing for those busy executors #7888
Conversation
Test build #39524 has finished for PR 7888 at commit
- initializing = false
- removeExecutor(executorId)
+ expired = removeExecutor(executorId)
+ if (expired) initializing = false
Without commenting on the validity of the change, you have some style problems, like this needing to be in a block in braces
I ran the style check before committing. Sorry for missing the if block here; I will fix that.
@GraceH, from the patch, I didn't see how the user is supposed to pass
@CodingCat What I mean is to add the force control in the
Regarding the public API for the users, we'd better have a discussion about whether to add a new public API (it is a little out of this PR's scope). From my perspective, modifying the existing public API is not a good idea; it may cause compatibility issues. What do you think?
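One way to reconcile the two concerns above is an overload: keep the existing two-argument signature untouched (so binary compatibility is preserved) and put the new force flag only on an internal method. This is a minimal standalone sketch, not the merged code; the class name, the `private` overload, and the `force = true` default for the old entry point are all illustrative assumptions.

```scala
// Sketch only: preserving the old public signature while adding a force flag.
class SchedulerBackendSketch {
  private var killed = List.empty[String]

  // Existing public entry point: signature unchanged, so callers compiled
  // against the old API keep working. (Defaulting to force = true here is
  // an assumption for illustration.)
  final def killExecutors(executorIds: Seq[String], replace: Boolean): Boolean =
    killExecutors(executorIds, replace, force = true)

  // New internal overload carrying the force flag.
  private final def killExecutors(
      executorIds: Seq[String],
      replace: Boolean,
      force: Boolean): Boolean = synchronized {
    killed = killed ++ executorIds
    true
  }

  def killedSoFar: List[String] = killed
}
```

The design point is that adding an overload (rather than a parameter to the existing method) avoids the compatibility issue discussed above.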
- final def killExecutors(executorIds: Seq[String], replace: Boolean): Boolean = synchronized {
+ final def killExecutors(executorIds: Seq[String],
+     replace: Boolean,
+     force: Boolean): Boolean = synchronized {
nit: style for multiline method defs is each arg on its own line, double indented, so:
final def killExecutors(
    executorIds: Seq[String],
    replace: Boolean,
    force: Boolean): Boolean = synchronized {
Thanks. I will fix that. :)
I see... I was just confused by
I thought
@CodingCat Sorry for the ambiguous wording in the description. In general, the patch aims to fix the false-killing bug in dynamic allocation, and at the same time leaves room for more options in
Test build #39625 has finished for PR 7888 at commit
It seems the test failure is not related to this PR.
- doKillExecutors(knownExecutors)
+ executorsPendingToRemove ++= idleExecutors
+ // return false: there are some busy executors, or killing certain executors failed
+ doKillExecutors(idleExecutors) && idleExecutors.size == knownExecutors.size
Note the semantics of the return value. All it says is whether the request is ack'ed by the cluster manager, not whether the kill will actually happen. We should keep the original return value.
OK. will do.
@andrewor14 The problems here are:
- If there is no idle executor to be killed, shall we still return with acknowledged?
- It is quite tricky to have the force control for killExecutors. For example, we have 3 executors to kill, but only one of them is idle. Shall we return true to the end user?
Hi @GraceH, thanks for fixing this problem. I agree with the problem statement and the root cause. However, there are two outstanding issues with the solution:
(1)
(2) Currently we never set
What do you think?
@andrewor14 Thanks for the feedback. I will take a look at your comments and revise the code accordingly. If there is any concern, I will let you know.
@andrewor14 Thanks for the comments. Regarding #1, very good point. That's why I try to return false if force-killing failed; this is the simplest way. Regarding #2, nice suggestion; that's my thought too. The end user can force-kill any executor by setting force=true. I will make a private function for
@andrewor14 I have pushed another proposal. Please let me know your comments.
Test build #42135 has finished for PR 7888 at commit
@andrewor14 I have tried to rebase the original proposal onto the latest master branch. Please let me know if you have further questions or concerns. Thanks a lot.
Test build #44534 has finished for PR 7888 at commit
@vanzin can you have a look?
Thanks @andrewor14. Hi @vanzin, let me give you a quick brief on the patch and its goal. There is a bug in dynamic allocation: some busy executors may be killed by "mistake", and we have met this kind of situation frequently in real-world deployments.
To solve this problem, one option is to make task assignment and its notification synchronized, but that approach does not suit the current listener-based implementation. Here I propose another way: add force control to killExecutor(). For dynamic allocation, we check whether the executor is busy before actually taking the kill action. By doing so, even if the listener event does not arrive in time, we can actively rescue busy executors (scheduled to be killed but already assigned new tasks). Through dynamic allocation we should not kill those busy executors (force control disabled). Meanwhile, we expose the force control to the end user (SparkContext public API), so the end user can decide whether to force-kill certain executors. Please let me know your thoughts. Thanks a lot.
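The busy-check described above can be sketched in isolation like this. This is a minimal illustrative model, not the merged code: `ForceKillSketch`, the `taskCounts` map standing in for the scheduler's per-executor load bookkeeping, and the pair return value are all assumptions for exposition.

```scala
// Sketch of the force check: without force, only idle executors are killed,
// and the call reports failure if any requested executor was busy.
object ForceKillSketch {
  def killExecutors(
      ids: Seq[String],
      force: Boolean,
      taskCounts: Map[String, Int]): (Seq[String], Boolean) = {
    // Only consider executors the scheduler actually knows about.
    val known = ids.filter(taskCounts.contains)
    // Without force, skip executors that still have running tasks.
    val toKill =
      if (force) known
      else known.filter(id => taskCounts(id) == 0)
    // Second element is false if some busy executor was spared, so a caller
    // such as dynamic allocation can keep its idle timer alive.
    (toKill, toKill.size == known.size)
  }
}
```

Under this sketch, a non-forced kill of a busy executor fails, which is exactly the signal dynamic allocation needs to avoid false killing.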
@@ -88,7 +88,8 @@ private[spark] class TaskSchedulerImpl(
   val nextTaskId = new AtomicLong(0)

-  // Which executor IDs we have executors on
-  val activeExecutorIds = new HashSet[String]
+  // each executor will record running or launched task number
+  val activeExecutorIdsWithLoads = new HashMap[String, Int]
nit: instead of WithLoads, WithTasks or WithTaskCount?
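The bookkeeping behind such a per-executor task-count map could look roughly like this. A standalone sketch only: `TaskCountTracker` and its method names are hypothetical stand-ins for the PR's actual `TaskSchedulerImpl` changes.

```scala
import scala.collection.mutable

// Sketch: count running tasks per executor; count == 0 means "idle".
class TaskCountTracker {
  private val taskCounts = mutable.HashMap.empty[String, Int]

  // Increment on task launch.
  def taskLaunched(executorId: String): Unit =
    taskCounts(executorId) = taskCounts.getOrElse(executorId, 0) + 1

  // Decrement on task end, never going below zero.
  def taskFinished(executorId: String): Unit =
    taskCounts(executorId) = math.max(0, taskCounts.getOrElse(executorId, 0) - 1)

  // An executor with at least one running task must not be force-free killed.
  def isBusy(executorId: String): Boolean =
    taskCounts.getOrElse(executorId, 0) > 0
}
```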
I'm not a huge fan of changing the public API, especially since it breaks binary compatibility (even if annotated with
Thanks @vanzin for the comments. I will make the changes accordingly.
Test build #45816 has finished for PR 7888 at commit
@vanzin and @andrewor14, please let me know your further inputs. Sorry for the several rounds of amendments.
Test build #45822 has finished for PR 7888 at commit
Test build #45829 has finished for PR 7888 at commit
@@ -87,8 +87,8 @@ private[spark] class TaskSchedulerImpl(
   // Incrementing task IDs
   val nextTaskId = new AtomicLong(0)

-  // Which executor IDs we have executors on
-  val activeExecutorIds = new HashSet[String]
+  // Number of tasks runing on each executor
running
oops. sorry for that.
Suggestions for PR 7888
@andrewor14 That is really a good way to mock the busy status. Thanks a lot, I really learned a lot from that.
@vanzin Also thanks for helping me clarify the thoughts on the acknowledgement part.
retest this please
Test build #46039 has finished for PR 7888 at commit
@andrewor14 My bad. Since the
Now it should work. |
Test build #46073 has finished for PR 7888 at commit
retest this please
Test build #46101 has finished for PR 7888 at commit
Ok, LGTM. I'm merging this into master and 1.6. We can fix the thing @vanzin pointed out (about not adding the executor to
…ng for those busy executors With dynamic allocation, busy executors are sometimes killed by mistake: executors with task assignments get killed for being idle long enough (say 60 seconds). The root cause is that the task-launch listener event is asynchronous. For example, some executors are being assigned tasks but have not yet sent out the listener notification; meanwhile, the dynamic allocation executor idle timeout (e.g., 60 seconds) expires and triggers a killExecutor event at the same time. 1. The timer expiration starts before the listener event arrives. 2. The task then runs on top of that killed/killing executor, which finally leads to task failure. Here is the proposal to fix it: add force control to killExecutor. If force is not set (i.e., false), we check whether the executor under killing is idle or busy; if it still has assignments, we do not kill it and return false (to indicate killing failure). Dynamic allocation turns force killing off (force = false), so killing a busy executor fails and the executor's idle timer stays valid; later, when the task assignment event arrives, the idle timer is removed accordingly. This way we avoid falsely killing busy executors under dynamic allocation. For other usages, end users can decide whether to use force killing by themselves; with that option on, killExecutor takes the action without any status checking. Author: Grace <jie.huang@intel.com> Author: Andrew Or <andrew@databricks.com> Author: Jie Huang <jie.huang@intel.com> Closes #7888 from GraceH/forcekill. (cherry picked from commit 965245d) Signed-off-by: Andrew Or <andrew@databricks.com>
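The caller-side behavior described in the merged change can be sketched as follows. This is an illustrative standalone model, not the actual ExecutorAllocationManager: `onIdleTimeout` and both function parameters are hypothetical stand-ins for the real backend call and timer bookkeeping.

```scala
// Sketch: dynamic allocation requests a non-forced kill; if the kill is
// refused (the executor turned out to be busy), the idle timer is left in
// place so the executor can be retried later.
object DynamicAllocationSketch {
  def onIdleTimeout(
      executorId: String,
      killExecutor: (String, Boolean) => Boolean, // (id, force) => success
      removeIdleTimer: String => Unit): Boolean = {
    // force = false: the backend refuses to kill a busy executor.
    val killed = killExecutor(executorId, false)
    // Only clear the idle timer once the kill actually succeeded.
    if (killed) removeIdleTimer(executorId)
    killed
  }
}
```

The key point mirrors the proposal: a failed non-forced kill is not an error, it is the signal that the "idle" executor was actually busy.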
@andrewor14 @vanzin Thanks, all. I will follow up by creating a new patch under SPARK-9552.