[SPARK-31478][CORE]Call `StopExecutor` before killing executors #28254

iRakson · 2020-04-18T17:37:45Z

What changes were proposed in this pull request?

Add a StopExecutor call to executors before killing them.
This reverts SPARK-29152 ([SPARK-29152][CORE]Executor Plugin shutdown when dynamic allocation is enabled #26810).

A similar patch was tested in #26901.

Why are the changes needed?

When executors do not go down gracefully, their stop() method is never called. To solve this problem, SPARK-29152 added a shutdown hook that executes the stop() method.
Instead of relying on a shutdown hook, we should just send a StopExecutor message to executors before they are killed.
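
To make the two approaches concrete, here is a minimal sketch of an executor whose stop() can be reached either through an explicit stop message or through a JVM shutdown hook. It is not Spark code: `ToyExecutor`, the toy `StopExecutor` object, `cleanupRuns`, and `installShutdownHook` are hypothetical names used only for illustration. The guard makes stop() idempotent, so the two paths can coexist without running the cleanup twice.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

// Toy stand-in for the driver's stop message (hypothetical, not Spark's class).
case object StopExecutor

// Hypothetical executor used only to illustrate the two shutdown paths.
class ToyExecutor(name: String) {
  private val stopped = new AtomicBoolean(false)
  // Counts how many times the cleanup body actually ran.
  val cleanupRuns = new AtomicInteger(0)

  // Idempotent: only the first caller performs the cleanup.
  def stop(): Unit = {
    if (stopped.compareAndSet(false, true)) {
      cleanupRuns.incrementAndGet() // stands in for plugin shutdown work
    }
  }

  // Path 1: explicit message sent by the driver before the kill.
  def receive(msg: Any): Unit = msg match {
    case StopExecutor => stop()
    case _ => ()
  }

  // Path 2: JVM shutdown hook, which also covers exits the driver did not request.
  def installShutdownHook(): Thread = {
    val hook = new Thread(new Runnable {
      override def run(): Unit = stop()
    }, s"$name-shutdown-hook")
    Runtime.getRuntime.addShutdownHook(hook)
    hook
  }
}
```

With this shape, a stop message followed by a later hook invocation still runs the cleanup once.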

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually

iRakson · 2020-04-18T17:38:53Z

cc @dongjoon-hyun @vanzin
Kindly take a look.

dongjoon-hyun · 2020-04-18T20:40:11Z

Since this PR aims to revert the original patch of SPARK-29152 and add a new approach, I believe @vanzin's opinion is important here.

dongjoon-hyun · 2020-04-18T20:40:26Z

ok to test

SparkQA · 2020-04-18T23:42:17Z

Test build #121459 has finished for PR 28254 at commit c4d3711.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

Ngone51 · 2020-04-20T03:30:03Z

core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala

@@ -769,6 +769,8 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp

      val killExecutors: Boolean => Future[Boolean] =
        if (executorsToKill.nonEmpty) {
+          executorsToKill.foreach(id =>
+            executorDataMap.get(id).foreach(_.executorEndpoint.send(StopExecutor)))


Can we guarantee that stop is called before kill in this way?

Because of network delay, StopExecutor may arrive at the executor after the kill request arrives at the worker/container. Isn't that possible?
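
One way to rule out that ordering, sketched under assumptions: wait for an acknowledgement of the stop request (with a timeout) before issuing the kill, instead of a fire-and-forget send. `ToyEndpoint`, `askStop`, and `stopThenKill` are hypothetical names, not part of this patch or of Spark's API, and whether the real executor endpoint should be used this way is an open question.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical endpoint: the future completes with true once stop() has run.
trait ToyEndpoint {
  def askStop(): Future[Boolean]
}

object StopThenKill {
  // Sends all stop requests first, waits up to `timeout` per executor for an
  // ack, then kills every executor regardless. Returns the ids that acked.
  def stopThenKill(
      endpoints: Map[String, ToyEndpoint],
      kill: String => Unit,
      timeout: FiniteDuration): Set[String] = {
    // Issue every request before waiting so the executors stop in parallel.
    val pending = endpoints.toSeq.map { case (id, ep) => id -> ep.askStop() }
    val acked = pending.filter { case (_, f) =>
      // A timeout or a failed future counts as "no ack".
      Try(Await.result(f, timeout)).getOrElse(false)
    }.map(_._1).toSet
    // The kill is issued only after the waits above have finished.
    endpoints.keys.foreach(kill)
    acked
  }
}
```

The trade-off is latency: an unresponsive executor delays its kill by up to `timeout`, and the waits here are sequential, so the worst case grows with the number of executors.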

vanzin · 2020-04-22T01:24:18Z

Sorry, don't really have the cycles for a detailed review. But if you remove the shutdown hook, you'll be missing certain executor exit scenarios, like a cluster manager killing the executor.

iRakson · 2020-06-14T20:14:51Z

For the time being we are continuing with the current approach only. Closing this PR.

call stop() before killing executors

c4d3711

probot-autolabeler bot added the CORE label Apr 18, 2020

iRakson mentioned this pull request Apr 18, 2020

[SPARK-29152][CORE][2.4] Executor Plugin shutdown when dynamic allocation is enabled #26901

Closed

Ngone51 reviewed Apr 20, 2020

iRakson closed this Jun 14, 2020
