[SPARK-22839] [K8s] Refactor to unify driver and executor pod builder APIs #20910
Conversation
It's now subsumed by the BasicDriverFeatureStep.
For all reviewers - this change is very large. Github's interpretation of the diff also doesn't present the changes in the most easily consumed manner. To account for this, the pull request is best reviewed and understood commit by commit. Each commit roughly translates one component from the old architecture to the new architecture. The changes are incrementally built as follows:
We could alternatively merge this change incrementally across multiple pull requests, but each intermediate pull request would likely leave the K8s functionality broken. To ensure that master is never in a broken, unusable state for K8s, we unfortunately need to merge the entire change at once.
Requesting review from @vanzin, @foxish, @ifilonenko, @liyinan926, @Eje. Any other feedback is welcome!
Kubernetes integration test starting
Test build #88607 has finished for PR 20910 at commit
clientArguments.mainAppResource,
clientArguments.mainClass,
clientArguments.driverArgs)
val orchestrator = new KubernetesDriverBuilder |
Should be builder
driverContainer = container
)
}
private[k8s] object KubernetesSpec { |
Probably should just be private[spark]
Kubernetes integration test status failure
Did a brief first pass. Overall the new APIs and abstraction look good to me. One thing I would suggest in general is to try to shorten the names of arguments/parameters.
.map(str => str.split(",").toSeq)
.getOrElse(Seq.empty[String])
def driverCustomEnvs(): Seq[(String, String)] = |
This is driver specific and probably should not be here. What about making custom envs an argument of the class, similar to labels and annotations? Then createDriverConf below gets the driver custom envs and passes them in. This also works for executor environment variables specified by spark.executorEnv.
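The prefix-based parsing suggested here (labels, annotations, and spark.executorEnv.* all follow the same pattern) can be sketched as follows. PrefixedConf is a hypothetical stand-in for illustration, not the actual KubernetesUtils helper:

```scala
// Hypothetical stand-in for KubernetesUtils.parsePrefixedKeyValuePairs:
// collect all conf entries under a prefix (e.g. "spark.executorEnv.")
// and strip the prefix, yielding the map to pass into the conf object.
object PrefixedConf {
  def parsePrefixedKeyValuePairs(
      conf: Map[String, String],
      prefix: String): Map[String, String] = {
    conf.collect {
      case (key, value) if key.startsWith(prefix) =>
        key.stripPrefix(prefix) -> value
    }
  }
}
```

With this shape, the caller that builds the driver conf can parse the custom envs once and pass the resulting map in as a constructor argument, the same way labels and annotations are handled.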
def get(conf: String, defaultValue: String): String = sparkConf.get(conf, defaultValue)

def getOption(key: String): Option[String] = sparkConf.getOption(key)
Extra newline.
Sorry, do you mean we should remove this newline or that one should be added here?
Oh, I meant removing the extra newline.
Ok, will address in the next patch after others review.
KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_DRIVER_ANNOTATION_PREFIX)
val driverSecretNamesToMountPaths =
KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_DRIVER_SECRETS_PREFIX)
new KubernetesConf( |
Can you add a new line before new KubernetesConf?
sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
}
}
val driverCustomLabels = KubernetesUtils.parsePrefixedKeyValuePairs( |
Can you add a new line before this line?
KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_EXECUTOR_ANNOTATION_PREFIX)
val executorSecrets =
KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_EXECUTOR_SECRETS_PREFIX)
new KubernetesConf( |
Ditto, it improves readability with new lines separating the code a bit.
private[k8s] case class KubernetesSpec(
pod: SparkPod,
additionalDriverKubernetesResources: Seq[HasMetadata],
podJavaSystemProperties: Map[String, String]) |
Can we shorten the name to just systemProperties? One of the most frequent types of comments I got while working on the upstreaming was to use short names.
private[spark] object SparkPod {
def initialPod(): SparkPod = {
SparkPod(
new PodBuilder().withNewMetadata().endMetadata().withNewSpec().endSpec().build(), |
Do you need .withNewMetadata().endMetadata().withNewSpec().endSpec() here?
Sort of. It allows everything that consumes one of these to use .editMetadata() or editOrNewMetadata() when creating features. If you don't initialize the metadata and spec, a downstream caller that tries to invoke editMetadata() will throw an NPE.
Got it.
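The NPE hazard discussed above can be illustrated with a simplified stand-in for the fabric8 builder types. Metadata, Pod, and InitialPod here are hypothetical illustration classes, not the real client API:

```scala
// Simplified stand-in for the fabric8 builder pattern: if the initial pod's
// metadata is left null, a downstream feature step that edits the metadata
// hits a NullPointerException. Pre-initializing empty sections (the analogue
// of withNewMetadata().endMetadata().withNewSpec().endSpec()) avoids that.
final case class Metadata(labels: Map[String, String] = Map.empty)

final case class Pod(metadata: Metadata) {
  // Mirrors editMetadata(): applies an edit to the existing metadata.
  // If metadata is null, the edit function dereferences null and throws.
  def editMetadata(edit: Metadata => Metadata): Pod =
    copy(metadata = edit(metadata))
}

object InitialPod {
  // Empty but non-null metadata, like the pre-initialized builder sections.
  def initialPod(): Pod = Pod(Metadata())
}
```

A feature step can then safely call editMetadata on the initial pod, while a pod constructed with null metadata fails at edit time rather than build time.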
import org.apache.spark.launcher.SparkLauncher

private[spark] class BasicDriverFeatureStep(
kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) |
Should we rename this to driverConf?
val allFeatures = if (kubernetesConf.roleSecretNamesToMountPaths.nonEmpty) {
baseFeatures ++ Seq(provideSecretsStep(kubernetesConf))
} else baseFeatures
var spec = KubernetesSpec.initialSpec(kubernetesConf.sparkConf.getAll.toMap) |
Can you add a new line before this line?
new ConfigMapBuilder()
.withNewMetadata()
.withName(configMapName)
.withNamespace(namespace) |
Why was this removed?
It's not necessary to set namespaces on these objects because the Kubernetes client itself is namespaced.
This reverts commit 4c944c4.
Test build #88608 has finished for PR 20910 at commit
Kubernetes integration test starting
Test build #88609 has finished for PR 20910 at commit
Kubernetes integration test status failure
executorId: String, driverPod: Pod)
extends KubernetesRoleSpecificConf
private[spark] class KubernetesConf[T <: KubernetesRoleSpecificConf]( |
Maybe this should be a case class? It seems like a struct-like object, which inclines me to think a case class would be more idiomatic here.
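A rough sketch of the case-class shape being suggested. The field names loosely follow the PR but are simplified for illustration, not the actual Spark classes:

```scala
// Modeling the struct-like config as a case class gives structural
// equality, copy(), and pattern matching for free, which is what makes
// it more idiomatic for a pure data holder like this.
sealed trait KubernetesRoleSpecificConf
final case class KubernetesDriverSpecificConf(mainClass: String)
  extends KubernetesRoleSpecificConf
final case class KubernetesExecutorSpecificConf(executorId: String)
  extends KubernetesRoleSpecificConf

final case class KubernetesConf[T <: KubernetesRoleSpecificConf](
    roleSpecificConf: T,
    roleLabels: Map[String, String],
    roleAnnotations: Map[String, String])
```

Tests in particular benefit: expected and actual conf objects compare by value rather than by reference.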
retest this please
Kubernetes integration test starting
Test build #88912 has finished for PR 20910 at commit
} else {
executorCores.toString
}
private val executorLimitCores = kubernetesConf.sparkConf.get(KUBERNETES_EXECUTOR_LIMIT_CORES) |
Looks like this can also be simplified as kubernetesConf.get.
LGTM with only one comment.
Test build #88913 has finished for PR 20910 at commit
Kubernetes integration test status success
Kubernetes integration test starting
LGTM. Thanks!
Kubernetes integration test status success
Kubernetes integration test starting
Test build #88915 has finished for PR 20910 at commit
Kubernetes integration test status success
@mccheah just one comment.
* Represents a step in configuring the Spark driver pod.
*/
private[spark] trait DriverConfigurationStep {
private[spark] case class KubernetesSpec( |
This class is named as though it applies to both driver and executor construction. Maybe KubernetesDriverSpec? It's also a bit unclear to me what purpose this abstraction serves as opposed to the way KubernetesExecutorBuilder goes about building the pod.
If you check KubernetesClientApplication, it needs the extra fields here (the additional driver resources and the driver system properties) to construct the pod. In other words, the driver builder has to return a structure with more than just the pod.
I think we could move some of the logic from KubernetesClientApplication into KubernetesDriverBuilder. Do you have any suggestions on whether we should move the abstraction boundaries around a bit?
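The point above about the builder returning more than just a pod can be sketched like this. All types here are simplified stand-ins for the actual Spark and fabric8 classes, with the names chosen only for illustration:

```scala
// The submission client needs three things from the driver builder:
// the pod itself, the extra Kubernetes resources created alongside it,
// and the system properties to inject into the driver. Bundling them in
// one result type is what the spec abstraction buys over returning a pod.
final case class SparkPod(name: String)

final case class KubernetesDriverSpec(
    pod: SparkPod,
    driverKubernetesResources: Seq[String], // stand-in for Seq[HasMetadata]
    systemProperties: Map[String, String])

object DriverBuilderSketch {
  def buildFromFeatures(appName: String): KubernetesDriverSpec =
    KubernetesDriverSpec(
      pod = SparkPod(s"$appName-driver"),
      driverKubernetesResources = Seq(s"$appName-configmap"),
      systemProperties = Map("spark.app.name" -> appName))
}
```

The executor side has no submission client, so its builder can return only a pod, which is why the two builders end up with different result shapes.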
Test build #89202 has finished for PR 20910 at commit
@mccheah I think it's fine as is now. We can take care of moving abstractions between the submission client and the driver in a future PR if necessary. Just the Scala style issue needs taking care of, and then this LGTM.
Kubernetes integration test starting
Test build #89300 has finished for PR 20910 at commit
Kubernetes integration test status success
Merging to master
…pod builder APIs
What changes were proposed in this pull request?
Breaks down the construction of driver pods and executor pods using a common abstraction for both spark-submit creating the driver and KubernetesClusterSchedulerBackend creating the executors. This encourages more code reuse and is more legible than the older approach.
The high-level design is discussed in more detail on the JIRA ticket. This pull request is the implementation of that design with some minor changes in the implementation details.
No user-facing behavior should break as a result of this change.
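The common abstraction described above can be sketched as follows. This is a simplified illustration of the feature-step pattern, not the actual Spark API; the type and method names are assumptions for the sake of the example:

```scala
// Sketch of the unified pod-builder design: each feature step is a small
// transformation on a pod, and both the driver and executor builders fold
// their sequence of steps over an initial pod. Driver- and executor-specific
// behavior lives in which steps each builder chooses, not in separate APIs.
final case class SparkPod(labels: Map[String, String])

trait KubernetesFeatureConfigStep {
  def configurePod(pod: SparkPod): SparkPod
}

final class LabelStep(key: String, value: String) extends KubernetesFeatureConfigStep {
  def configurePod(pod: SparkPod): SparkPod =
    pod.copy(labels = pod.labels + (key -> value))
}

object PodBuilderSketch {
  // Fold the feature steps over the initial pod, each step seeing the
  // result of the previous one.
  def build(initial: SparkPod, steps: Seq[KubernetesFeatureConfigStep]): SparkPod =
    steps.foldLeft(initial)((pod, step) => step.configurePod(pod))
}
```

Because each step is independent and pure, a step can be unit tested in isolation by asserting on the pod it returns, which is what enabled migrating the old submission-steps tests one component at a time.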
How was this patch tested?
Migrated all unit tests from the old submission-steps architecture to the new architecture. Integration tests should not have to change, and they pass, given that this shouldn't change any outward behavior.