New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-31478][CORE]Call StopExecutor
before killing executors
#28254
Conversation
cc @dongjoon-hyun @vanzin |
Since this PR aims reverts the original patch of SPARK-29152 and add a new way, I believe @vanzin 's option is important to this. |
ok to test |
Test build #121459 has finished for PR 28254 at commit
|
@@ -769,6 +769,8 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp | |||
|
|||
val killExecutors: Boolean => Future[Boolean] = | |||
if (executorsToKill.nonEmpty) { | |||
executorsToKill.foreach(id => | |||
executorDataMap.get(id).foreach(_.executorEndpoint.send(StopExecutor))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we guarantee that stop
is called before kill
in this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StopExecutor
may arrive at executor after kill
arrive at worker/container due to network delay, isn't it possible?
Sorry, don't really have the cycles for a detailed review. But if you remove the shutdown hook, you'll be missing certain executor exit scenarios, like a cluster manager killing the executor. |
For the time being we are continuing with the current approach only. Closing this PR. |
What changes were proposed in this pull request?
StopExecutor
call to executors before killing them.A similar patch is tested here #26901
Why are the changes needed?
When executors do not goes down gracefully, their
stop()
method is never called. To solve this problem, a shutdown hook was used to execute thestop()
method.Instead of forcing a shutdown hook, we should just add a
StopExecutor
call to executors before they are killed.Does this PR introduce any user-facing change?
No.
How was this patch tested?
Manually