[WIP][SPARK-32913][CORE][K8S] Improve ExecutorDecommissionInfo and ExecutorDecommissionState for different use cases #29788
Conversation
@holdenk @cloud-fan @agrawaldevesh Please take a look, thanks!
Please tag this PR as WIP until it is tested, thanks for working on improving the code though @Ngone51 :)
Kubernetes integration test starting
So I'm personally against this refactoring, but if folks working on making this work for other cluster backends say this refactoring would make it easier for them, I'm fine to proceed (although I'd appreciate a chance to do a more detailed review beforehand).
 package org.apache.spark

-import org.apache.spark.scheduler.ExecutorDecommissionInfo
+import org.apache.spark.scheduler.ExecutorDecommissionReason
Why are we renaming this?
+1, I would argue against unnecessary renaming even if it seems a bit "unnatural". It creates unnecessary diff noise.
To me "Info" and "Reason" are both similar: they both portend "additional information".
I guess @Ngone51 is trying to follow the style of TaskEndReason.
Thanks @dongjoon-hyun for the clarification. In addition to following the style of TaskEndReason, I actually also want to handle the decommission info/reason in a similar way to TaskEndReason.
package org.apache.spark.scheduler

private[spark] sealed trait ExecutorDecommissionReason {
I get why this is a sealed trait, namely we're pattern matching against it. But this seems to remove flexibility for anyone working on scheduler backends.
@holdenk, can you please provide an example of how having this as a sealed trait would limit the flexibility?
It is marked as private[spark], so the resource-manager-specific scheduler backends should be able to extend it ... no?
Duh. Sorry for my n00bness. I can totally see why this shouldn't be a sealed trait: for example, it is forcing the TestExecutorDecommissionInfo to be in this file.
@Ngone51, is there a strong reason for making this a sealed trait? Is that required by the RPC framework, for example? If not, I don't think it's worth it.
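For readers less familiar with Scala, here is a minimal sketch of the trade-off being discussed; the class names are taken from this PR's description, and the exact shapes are assumptions rather than the PR's code:

```scala
package org.apache.spark.scheduler

// Sketch only: a sealed trait can only be extended in the file where it is
// defined, which is why even the test-only reason has to live in that file.
private[spark] sealed trait ExecutorDecommissionReason {
  val reason: String = "decommissioned"
}

case class StandaloneDecommission(workerHost: Option[String] = None)
  extends ExecutorDecommissionReason

case class TestExecutorDecommission(host: Option[String] = None)
  extends ExecutorDecommissionReason

// In exchange, matches over a sealed trait are checked for exhaustiveness:
object DecommissionReasons {
  def describe(r: ExecutorDecommissionReason): String = r match {
    case StandaloneDecommission(host) => s"standalone decommission, worker host = $host"
    case TestExecutorDecommission(host) => s"test-only decommission, host = $host"
  }
}
```

The flip side, as noted above, is that a scheduler backend living in another file cannot add its own reason without touching this one.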
/**
 * For the Kubernetes workloads
 */
case class K8SDecommission() extends ExecutorTriggeredDecommission
I think maybe we could have a better level here; there isn't really anything K8s-specific about this kind of message. Rather, all external cluster manager decommissions could perhaps be the same?
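One possible reading of that suggestion, sketched out; `ClusterManagerDecommission` is a hypothetical name, and the parent is assumed to be an extensible (non-case) class, which may differ from the PR's code:

```scala
package org.apache.spark.scheduler

// Assumed shapes for the sketch, not the PR's exact code.
private[spark] trait ExecutorDecommissionReason
private[spark] class ExecutorTriggeredDecommission extends ExecutorDecommissionReason

// Hypothetical: one shared reason for any external cluster manager (K8s, YARN, ...),
// instead of a Kubernetes-specific K8SDecommission subclass.
private[spark] case class ClusterManagerDecommission()
  extends ExecutorTriggeredDecommission {
  override def toString: String = "Executor decommissioned by the cluster manager"
}
```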
Kubernetes integration test status failure
Test build #128829 has finished for PR 29788 at commit
@tgravescs, @mridulm, @squito FYI
      adjustTargetNumExecutors: Boolean,
      triggeredByExecutor: Boolean): Seq[String] = {
    killExecutors(executorsAndDecomInfo.map(_._1),
      executorsAndDecomReason: Array[(String, ExecutorDecommissionReason)],
Is it possible that different executors have different ExecutorDecommissionReason? If it's not possible, I think we are over-engineering here.
This is how it was earlier -- so we aren't changing the semantics save the renaming :-) And plus, yes, this can happen: different executors on different hosts would have different ExecutorDecommissionReason/Info with different hosts potentially in them.
This is simply a bulk API: instead of making n calls, we are folding them into one.
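To make the bulk shape concrete, a small sketch; the method name `decommissionExecutors`, the `backend` value, and the reason class used here are assumptions based on the signature shown in the diff above, not necessarily the exact API:

```scala
// Hypothetical call: different executors can carry different reasons/hosts,
// folded into one bulk request instead of n separate calls.
val executorsAndReasons: Array[(String, ExecutorDecommissionReason)] = Array(
  ("exec-1", StandaloneDecommission(workerHost = Some("host-a"))),
  ("exec-2", StandaloneDecommission(workerHost = Some("host-b"))))

backend.decommissionExecutors(
  executorsAndReasons,
  adjustTargetNumExecutors = false,
  triggeredByExecutor = false)
```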
I think this is a great refactoring and it does help to separate out the different use cases. It's close, but I think there are some rough edges worth fixing, and we can make the changes even tighter.
 */
-private [spark] case class ExecutorDecommission(workerHost: Option[String] = None)
-  extends ExecutorLossReason("Executor decommission.")
+private [spark] case class ExecutorDecommission(reason: String, host: Option[String] = None)
My scala knowledge is really really poor, but I would rather we make this a non-case class if you are planning to do this. Currently, I think the field "reason" is going to be duplicated in the base class ExecutorLossReason and in ExecutorDecommission.
That's also the reason why you are pattern matching it above with an additional _ (for the reason) argument, when you really don't care about the reason.
The case class is needed because we'd apply pattern matching on it.
The "reason" is necessary because of class inheritance, no? Please see ExecutorProcessLost for instance: ExecutorProcessLost also has the field _message, which is needed to assign to ExecutorProcessLost.message.
You can write unapply methods if you need to do pattern matching with something other than a case class.
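A minimal sketch of that idea; the field names and the ExecutorLossReason(message) parent are taken from the diff above, but this is illustrative rather than the PR's code:

```scala
package org.apache.spark.scheduler

// Assumed parent, matching the constructor usage shown in the diff.
private[spark] class ExecutorLossReason(val message: String)

// A plain (non-case) class: no duplicated "reason" field alongside the parent's
// message, but pattern matching still works via the hand-written unapply below.
private[spark] class ExecutorDecommission(
    reason: String,
    val host: Option[String] = None)
  extends ExecutorLossReason(reason)

private[spark] object ExecutorDecommission {
  def unapply(d: ExecutorDecommission): Option[(String, Option[String])] =
    Some((d.message, d.host))
}

// Usage in a match arm:
//   case ExecutorDecommission(_, host) => host
```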
 val workerHost = reason match {
   case ExecutorProcessLost(_, workerHost, _) => workerHost
-  case ExecutorDecommission(workerHost) => workerHost
+  case ExecutorDecommission(_, host) => host
See comment below for ExecutorDecommission ... Should this be changed to:
case decom: ExecutorDecommission => decom.workerHost // or decom.host
You don't need to add an extra '_' then.
Yeah, this is actually a good point! This is actually a rule of Databricks' Scala style guide. But I just followed the style of ExecutorProcessLost above here. I think it's acceptable when there aren't many arguments being expanded.
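For reference, the two match styles being discussed, assuming the two-field ExecutorDecommission case class from the diff above (a sketch, not the PR's code):

```scala
// Assumes: case class ExecutorDecommission(reason: String, host: Option[String] = None)
//          extends ExecutorLossReason(reason)

// Extractor pattern: every constructor field must be listed, so adding a field
// later means adding another '_' to each match arm.
def hostOf(reason: ExecutorLossReason): Option[String] = reason match {
  case ExecutorDecommission(_, host) => host
  case _ => None
}

// Typed pattern bound to a name: unaffected by changes to the constructor arity.
def hostOfTyped(reason: ExecutorLossReason): Option[String] = reason match {
  case decom: ExecutorDecommission => decom.host
  case _ => None
}
```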
package org.apache.spark.scheduler

private[spark] sealed trait ExecutorDecommissionReason {
  val reason: String = "decommissioned"
I don't think the reason field is really needed anywhere, besides it being used for toString? Should we just require overriding toString by marking toString abstract? I don't think that child classes need to override both toString and reason: I would prefer we just override methods instead of fields.
We also indirectly use the reason for logExecutorLoss. And yeah, we can just override toString indeed.
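A minimal sketch of that direction, assuming the trait shown in the diff above; the subclass names and messages are illustrative, not the PR's final code:

```scala
package org.apache.spark.scheduler

// No shared `reason` field: each subclass describes itself via toString,
// and call sites (e.g. loss-reason log messages) use that value directly.
private[spark] sealed trait ExecutorDecommissionReason

private[spark] case class StandaloneDecommission(workerHost: Option[String] = None)
  extends ExecutorDecommissionReason {
  override def toString: String = s"Worker decommission (workerHost=$workerHost)"
}

private[spark] case class DynamicAllocationDecommission()
  extends ExecutorDecommissionReason {
  override def toString: String = "Decommissioned by dynamic allocation"
}
```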
      logInfo(s"Executor $executorId on $hostPort killed by driver.")
    case ExecutorDecommission(reason, _) =>
      // use logInfo instead of logError as the loss of decommissioned executor is what we expect
      logInfo(s"Decommissioned executor $executorId on $hostPort shutdown: $reason")
Instead of 'shutdown', should we say 'is finally lost'? To be more accurate in this setting.
+1 on this change to avoid log spam.
 val exitCausedByApp: Boolean = reason match {
   case exited: ExecutorExited => exited.exitCausedByApp
-  case ExecutorKilled | ExecutorDecommission(_) => false
+  case ExecutorKilled | ExecutorDecommission(_, _) => false
I am wondering if we should instead pattern match in a separate arm like:
case _: ExecutorDecommission => false
to avoid having to change the case arms when we make changes to the structure definitions.
  }
}

/**
Can you move this test-only class somewhere in the test-only package?
See TestResourceIDs as an example.
private class KubernetesDriverEndpoint extends DriverEndpoint {

  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case ExecutorDecommissioning(executorId) =>
I didn't fully follow the need for a distinction b/w the K8s case and the simple executor-triggered case.
I thought K8s only needs the SIGPWR-based thing, and indeed ExecutorDecommissioning is only sent in response to a SIGPWR.
So I am missing why we override ExecutorDecommissioning here and the motivation for K8SDecommission.
Since there was another PR in the same area committed that broke the existing integration tests in this area, I don't feel confident keeping my soft reservations and am switching to a veto for this change (e.g. -1).
(Willing to switch back to -0 once the original issue is addressed; I just don't want us in a state where broken tests are normal.)
Thanks for everyone's review. Agree with @holdenk that we should resolve the issue (#29722 (comment)) first. We can continue the discussion after the issue is resolved. Thanks again!
This PR adds 5 classes to represent 5 different decommission reasons, but we don't really have 5 different branches to handle these reasons. I think the abstraction should be based on the real requirements; can we simplify them?
So it turns out that the PR (#29722) doesn't break the existing integration tests. (It's something else, but we don't know what yet.) Therefore, I think we are safe to continue the discussion. So as for the main concern, I think I can actually change the
Then, we'd still keep one decommission info instance. Does this sound good to you? (BTW, the updates could be late since the PR (#29722) is already reverted and we have conflicts here. And the re-submitted PR is being blocked by the broken integration tests.)
I think it would be good to see your proposal in code @Ngone51 because I'm not 100% sure what you mean.
Sounds good
If @holdenk thinks so, then I agree as well!
Sorry for the bother @holden 😢
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
Split ExecutorDecommissionInfo into 5 classes for different use cases (a rough sketch of the proposed hierarchy appears at the end of this description):
- DynamicAllocationDecommission: for the case where decommission is triggered by executor dynamic allocation
- StandaloneDecommission(workerHost): for the Standalone case
- K8SDecommission: for the Kubernetes case; it extends ExecutorTriggeredDecommission.
- ExecutorTriggeredDecommission: for the case where decommission is triggered at the executor
- TestExecutorDecommission(host): test only.

And all of them extend ExecutorDecommissionReason with a specific decommission reason. On the other hand, ExecutorDecommissionState would accept the ExecutorDecommissionReason as an attribute and expose common information through functions, e.g., isHostDecommissioned().

Why are the changes needed?
We have various decommission use cases now. And ExecutorDecommissionInfo is no longer enough to distinguish the case where decommission is triggered at the executor after #29722. That's also the reason why we added triggeredByExecutor. But things like triggeredByExecutor can be annoying and not easy to extend. So we need to improve the current approach to work better with the different cases.
There are a few benefits with this PR:
- Get rid of the parameter triggeredByExecutor
- No longer need to save the redundant workerHost info
- The decommission handling logic is clearer than before

Does this PR introduce any user-facing change?
No.

How was this patch tested?
WIP: I need to check whether k8s has an existing test covered by my change.
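To make the proposed hierarchy in the summary above concrete, here is a rough sketch; the field names, constructors, and the isHostDecommissioned() logic are assumptions based on this description, not the final code:

```scala
package org.apache.spark.scheduler

private[spark] sealed trait ExecutorDecommissionReason {
  val reason: String = "decommissioned"
}

// Decommission triggered by dynamic allocation on the driver side.
case class DynamicAllocationDecommission() extends ExecutorDecommissionReason

// Standalone mode: may carry the decommissioned worker's host.
case class StandaloneDecommission(workerHost: Option[String] = None)
  extends ExecutorDecommissionReason

// Decommission first noticed by the executor itself (e.g. via SIGPWR).
class ExecutorTriggeredDecommission extends ExecutorDecommissionReason

// Kubernetes: a special case of an executor-triggered decommission.
case class K8SDecommission() extends ExecutorTriggeredDecommission

// Test only.
case class TestExecutorDecommission(host: Option[String] = None)
  extends ExecutorDecommissionReason

// Driver-side state that wraps the reason and exposes common information.
private[spark] case class ExecutorDecommissionState(reason: ExecutorDecommissionReason) {
  def isHostDecommissioned(): Boolean = reason match {
    case StandaloneDecommission(workerHost) => workerHost.isDefined
    case TestExecutorDecommission(host) => host.isDefined
    case _ => false
  }
}
```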