This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

Submission client redesign to use a step-based builder pattern #365

Merged
merged 20 commits into branch-2.1-kubernetes from submission-steps-refactor on Jul 14, 2017

Conversation

@mccheah commented Jun 30, 2017

Applies changes assuming PySpark is present from #364.

This change overhauls the underlying architecture of the submission client, but it is intended to entirely preserve existing behavior of Spark applications. Therefore users will find this to be an invisible change.

The philosophy behind this design is to reconsider the breakdown of the submission process. It operates off the abstraction of "submission steps", which are transformation functions that take the previous state of the driver and return the new state of the driver. The driver's state includes its Spark configurations and the Kubernetes resources that will be used to deploy it.

Such a refactor moves away from a features-first API design, which considers different containers to serve a set of features. The previous design, for example, had a container files resolver API object that returned different resolutions of the dependencies added by the user. However, it was up to the main Client to know how to intelligently invoke all of those APIs. Therefore the API surface area of the file resolver became untenably large, and it was not intuitive how it was to be used or extended.

This design changes the encapsulation layout; every module is now responsible for changing the driver specification directly. An orchestrator builds the correct chain of steps and hands it to the client, which then calls it verbatim. The main client then makes any final modifications that put the different pieces of the driver together, particularly to attach the driver container itself to the pod and to apply the Spark configuration as command-line arguments.

The current steps are:

  1. BaseSubmissionStep: Baseline configurations such as the Docker image and resource requests.
  2. DriverKubernetesCredentialsStep: Resolves Kubernetes credentials configuration in the driver pod. Mounts a secret if necessary.
  3. InitContainerBootstrapStep: Attaches the init-container, if necessary, to the driver pod. This is optional and won't be loaded if all URIs are "local" or there are no URIs at all.
  4. DependencyResolutionStep: Sets the classpath, spark.jars, and spark.files properties. This step is partially not isolated as it assumes that files that are remote or locally submitted will be downloaded to a given location. Unit tests should verify that this contract holds.
  5. PythonStep: Configures Python environment variables if using PySpark.
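
To make the step abstraction concrete, here is a minimal sketch of how the pieces fit together. The prepareSubmission signature comes from the diff excerpts below; the exact field names of KubernetesDriverSpec, the Client constructor, and the fold are simplified assumptions for illustration.

import io.fabric8.kubernetes.api.model.{Container, HasMetadata, Pod}
import org.apache.spark.SparkConf

// The driver's state is an immutable spec that each step transforms and returns.
private[spark] case class KubernetesDriverSpec(
    driverPod: Pod,
    driverContainer: Container,
    driverSparkConf: SparkConf,
    otherKubernetesResources: Seq[HasMetadata])

// Every submission step takes the previous state of the driver and returns the new state.
private[spark] trait KubernetesSubmissionStep {
  def prepareSubmission(driverSpec: KubernetesDriverSpec): KubernetesDriverSpec
}

// The orchestrator picks the steps; the client applies them verbatim, in order.
private[spark] class Client(steps: Seq[KubernetesSubmissionStep]) {
  def buildDriverSpec(initialSpec: KubernetesDriverSpec): KubernetesDriverSpec =
    steps.foldLeft(initialSpec)((spec, step) => step.prepareSubmission(spec))
}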

@mccheah (Author) commented Jun 30, 2017

Rebuilt from #363.

@@ -27,15 +27,19 @@ private[spark] class PythonStep(
filesDownloadPath: String) extends KubernetesSubmissionStep {

override def prepareSubmission(driverSpec: KubernetesDriverSpec): KubernetesDriverSpec = {
val resolvedOtherPyFilesString = if (otherPyFiles.isEmpty) {
mccheah (Author):

@ifilonenko is this sufficient to cover the arguments handling? SPARK_DRIVER_ARGS is loaded in BaseSubmissionStep already. Or is there more we need to do here?

Member:

Not exactly; it would need to be null unless you have Docker parse it, but Docker doesn't do if/else blocks, so that's why this wouldn't be helpful per se.

mccheah (Author):

OK - but if we changed this to be the string "null", would that suffice?

mccheah (Author):

We can have the Dockerfile use an if-else block also - we do this in a number of places in the existing ones. It's just bash if-else syntax in the command.

Member:

I am testing that right now in a separate PR, but yes, it should. Passing in just null, I believe, caused withValue() to raise an error. But I think that CMD PythonRunner PY_FILE "null" DRIVER_ARGS parses correctly.

Member:

if-else syntax doesn't exist? There is only if-then.

mccheah (Author):

Bash supports if-then-else. It might be tricky to get it exactly right in the Dockerfile so maybe it's not worth it.
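
For reference, a minimal sketch of the resolution being discussed in this thread, assuming (as concluded above) that an empty py-files list resolves to the literal string "null" so the Dockerfile's CMD still receives a positional value to hand to PythonRunner. The helper name is illustrative, not the actual PythonStep code.

object PyFilesResolutionSketch {
  // Empty py-files become the literal string "null" so the CMD can still pass a
  // positional value through; otherwise join the files with commas.
  def resolveOtherPyFiles(otherPyFiles: Seq[String]): String =
    if (otherPyFiles.isEmpty) "null" else otherPyFiles.mkString(",")
}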

childArgs += "org.apache.spark.deploy.PythonRunner"
childArgs += "--other-py-files"
childArgs += args.pyFiles
childArgs ++= Array("--primary-py-file", args.primaryResource)
Member:

:) thanks for the --primary-py-file

@@ -27,7 +27,7 @@ private[spark] class MinikubeTestBackend extends IntegrationTestBackend {

override def initialize(): Unit = {
Minikube.startMinikube()
new SparkDockerImageBuilder(Minikube.getDockerEnv).buildSparkDockerImages()
// new SparkDockerImageBuilder(Minikube.getDockerEnv).buildSparkDockerImages()
Member:

Intentionally committed?

mccheah (Author):

Nope, good catch - that was from local testing.

@mccheah (Author) commented Jul 1, 2017

Conflicts are likely from the style changes in the base PR. I expect that unless we have significant deviations functionality-wise in the Python implementation, we can resolve most if not all of the conflicts by just taking this branch.

@Mock
private var podWithDetachedInitContainer : PodWithDetachedInitContainer = _
@Mock
private var initContainerSpec : InitContainerSpec = _
mccheah (Author):

Don't use a mock here - since it's a case class, the equivalent of a struct, it's fine to just use the implementation.

Member:

How would you simulate the .copy method, though?

mccheah (Author):

We don't need to - since it's a Scala primitive we should operate under what its actual behavior is. We don't mock classes like scala.collection.List or java.util.Optional for similar reasons.

Member:

Agreed. I will just pass in an empty case class
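
To illustrate the point: because PodWithDetachedInitContainer is a plain case class, the test can build a real instance and rely on the built-in .copy behavior instead of stubbing it. The field names below are inferred from the test excerpts later in this review and may not be exhaustive; the container names are illustrative.

import io.fabric8.kubernetes.api.model.{Container, ContainerBuilder, Pod}

object CaseClassInsteadOfMockSketch {
  // Assumed shape of the case class under discussion.
  case class PodWithDetachedInitContainer(
      pod: Pod,
      initContainer: Container,
      mainContainer: Container)

  // No mock needed: construct a real value and let .copy do what it always does.
  val original = PodWithDetachedInitContainer(
    pod = new Pod(),
    initContainer = new Container(),
    mainContainer = new ContainerBuilder().withName("driver").build())

  val withNewInitContainer = original.copy(
    initContainer = new ContainerBuilder().withName("spark-init").build())
}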

@mccheah (Author) commented Jul 5, 2017

I resolved merge conflicts with the "ours" strategy since the only differences in the latest push that caused the conflicts were style changes. I'll rebase this branch to also be pointing to branch-2.1-kubernetes shortly.

@mccheah force-pushed the submission-steps-refactor branch from 99cccdc to 9ff8c69 on July 5, 2017 21:57
@mccheah changed the base branch from pyspark-integration to branch-2.1-kubernetes on July 5, 2017 21:57
private val STAGING_SERVER_URI = "http://localhost:8000"

test ("Contact resource staging server w/o TLS") {
val SPARK_CONF = new SparkConf(true)
mccheah (Author):

Call this sparkConf


val initSteps : Seq[InitContainerStep] = initContainerStepsOrchestrator.getInitContainerSteps()
assert(initSteps.length == 2)
assert( initSteps.map({
mccheah (Author):

This is just more cleanly expressed as

assert(initSteps(0).isInstanceOf[BaseInitContainerStep])
assert(initSteps(1).isInstanceOf[SubmittedResourcesInitContainerStep])

instead of using pattern matching.

INIT_CONTAINER_CONFIG_MAP_NAME, INIT_CONTAINER_CONFIG_MAP_KEY, SPARK_CONF)
val initSteps : Seq[InitContainerStep] = initContainerStepsOrchestrator.getInitContainerSteps()
assert(initSteps.length == 1)
assert(initSteps.headOption.exists({
mccheah (Author):

Similarly here, just use isInstanceOf.

@Mock
private var podAndInitContainerBootstrap : SparkPodInitContainerBootstrap = _
@Mock
private var podWithDetachedInitContainer : PodWithDetachedInitContainer = _
mccheah (Author):

This is a case class so it shouldn't be a mock.

podAndInitContainerBootstrap)
val remoteJarsToDownload = KubernetesFileUtils.getOnlyRemoteFiles(SPARK_JARS)
val remoteFilesToDownload = KubernetesFileUtils.getOnlyRemoteFiles(SPARK_FILES)
assert(remoteJarsToDownload === List("hdfs://localhost:9000/app/jars/jar1.jar"))
mccheah (Author):

Why this check? We aren't testing the functionality of KubernetesFileUtils here.

new Container(), new Container(), new Pod, Seq.empty[HasMetadata]
)
val returnContainerSpec = baseInitStep.prepareInitContainer(initContainerSpec)
assert(expectedTest.toSet.subsetOf(returnContainerSpec.initContainerProperties.toSet))
mccheah (Author):

Check for an exact match. We don't want to be setting unexpected properties here.

assert(remoteJarsToDownload === List("hdfs://localhost:9000/app/jars/jar1.jar"))
assert(remoteFilesToDownload === List("hdfs://localhost:9000/app/files/file1.txt"))
val expectedTest = Map(
INIT_CONTAINER_JARS_DOWNLOAD_LOCATION.key -> JARS_DOWNLOAD_PATH,
mccheah (Author):

Could be a better name for this - expectedDriverSparkConf, perhaps.

new Container(), new Container(), new Pod, Seq.empty[HasMetadata]
)
val returnContainerSpec = baseInitStep.prepareInitContainer(initContainerSpec)
assert(expectedTest.toSet.subsetOf(returnContainerSpec.initContainerProperties.toSet))
mccheah (Author):

We also should be inspecting the properties of the pod and the containers. The mock of the SparkPodInitContainerBootstrap instance we pass in should modify the Kubernetes components in such a way that we can check for them afterwards. Basically we want to verify that the pod init container bootstrap was used to make changes to the pod and containers.
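
A sketch of the verification being asked for: stub the SparkPodInitContainerBootstrap mock so it visibly alters whatever it is given, then assert that the step's output carries that alteration, proving the bootstrap was applied. The trait's signature and the stand-in case class below are assumptions for illustration; the matcher import may be org.mockito.Matchers in older Mockito versions.

import io.fabric8.kubernetes.api.model.{Container, ContainerBuilder, Pod}
import org.mockito.ArgumentMatchers.any
import org.mockito.Mockito.{mock, when}
import org.mockito.invocation.InvocationOnMock
import org.mockito.stubbing.Answer

object BootstrapAppliedVerificationSketch {
  // Stand-ins with assumed shapes for the real classes.
  case class PodWithDetachedInitContainer(pod: Pod, initContainer: Container, mainContainer: Container)
  trait SparkPodInitContainerBootstrap {
    def bootstrapInitContainerAndVolumes(
        original: PodWithDetachedInitContainer): PodWithDetachedInitContainer
  }

  val bootstrap: SparkPodInitContainerBootstrap = mock(classOf[SparkPodInitContainerBootstrap])
  when(bootstrap.bootstrapInitContainerAndVolumes(any(classOf[PodWithDetachedInitContainer])))
    .thenAnswer(new Answer[PodWithDetachedInitContainer] {
      override def answer(invocation: InvocationOnMock): PodWithDetachedInitContainer = {
        val original = invocation.getArguments()(0).asInstanceOf[PodWithDetachedInitContainer]
        // Rename the init container so the test can later assert the bootstrap ran.
        original.copy(initContainer =
          new ContainerBuilder(original.initContainer).withName("bootstrapped-init").build())
      }
    })
  // A test exercising InitContainerBootstrapStep can then assert that the returned
  // spec's init container is named "bootstrapped-init".
}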

.set(RESOURCE_STAGING_SERVER_URI, STAGING_SERVER_URI)

val initContainerStepsOrchestrator = new InitContainerStepsOrchestrator(
NAMESPACE, APP_RESOURCE_PREFIX, SPARK_JARS, SPARK_FILES, JARS_DOWNLOAD_PATH,
mccheah (Author):

One argument per line.

private val INIT_CONTAINER_CONFIG_MAP_KEY = "spark-init-config-map-key"
private val STAGING_SERVER_URI = "http://localhost:8000"

test ("Contact resource staging server w/o TLS") {
mccheah (Author):

Probably no need to mention TLS in the test description since it's largely irrelevant for what the test is actually checking.

driverContainer = new ContainerBuilder().build(),
driverSparkConf = submissionSparkConf.clone(),
otherKubernetesResources = Seq.empty[HasMetadata])
// This orchestrator determines which steps are necessary to take to resolve varying
mccheah (Author):

This isn't really the orchestrator - the orchestrator has pre-determined these steps to run. Perhaps this comment can be moved?

@mccheah force-pushed the submission-steps-refactor branch 2 times, most recently from f124840 to c23bb4c on July 5, 2017 23:46
childArgs += args.primaryResource
childArgs += "org.apache.spark.deploy.PythonRunner"
childArgs += args.pyFiles
childArgs ++= Array("--primary-py-file", args.primaryResource)
Member:

Wondering if it makes sense for (childArgs, childClasspath, sysProps, childMainClass) to be modeled as a case class with a builder pattern, for similar reasons. Tangentially, mutable collections can be hazardous if not handled carefully - a case-class pattern using immutable collections might be worthwhile, given the complexity of the environment construction.

mccheah (Author):

We don't have many better options here because SparkSubmit creates the submission client implementation reflectively and only expects the submission client to have a main method with a list of arguments. This is to account for the fact that the core module of Spark doesn't have a compile time dependency on the specific submission client implementations for the different cluster managers.

mccheah (Author):

In Client.scala we parse the arguments array into a case class and report on errors when fields are missing.
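
For what the reviewer's suggestion might look like (purely hypothetical; as noted above, SparkSubmit's reflective entry point constrains the real code), an immutable case class collecting the four values:

// Hypothetical modeling of (childArgs, childClasspath, sysProps, childMainClass)
// as an immutable case class, per the suggestion above.
case class ChildSubmissionEnv(
    childArgs: Seq[String] = Seq.empty,
    childClasspath: Seq[String] = Seq.empty,
    sysProps: Map[String, String] = Map.empty,
    childMainClass: String = "") {
  def withArgs(args: String*): ChildSubmissionEnv = copy(childArgs = childArgs ++ args)
  def withSysProp(key: String, value: String): ChildSubmissionEnv =
    copy(sysProps = sysProps + (key -> value))
}

// Usage mirroring the PySpark excerpt above:
//   ChildSubmissionEnv()
//     .withArgs("--primary-py-file", args.primaryResource)
//     .withArgs("--other-py-files", args.pyFiles)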

@mccheah (Author) commented Jul 7, 2017

rerun integration tests please

INIT_CONTAINER_CONFIG_MAP_NAME,
INIT_CONTAINER_CONFIG_MAP_KEY
)
private val returnedPodWithCont = sparkPodInit.bootstrapInitContainerAndVolumes(
mccheah (Author) commented Jul 7, 2017:

Move these into the test method.

mccheah (Author):

Sharing fields is nice for the object under test, but for the functionality we specifically are testing, it's more idiomatic to put the method calls into the test.

initContainer = new Container(),
mainContainer = new ContainerBuilder().withName(MAIN_CONTAINER_NAME).build())
)
private val expectedSharedMap = Map(
mccheah (Author):

Clearer name, related to volumes

private val INIT_CONTAINER_SECRET_NAME = "init-secret"
private val INIT_CONTAINER_SECRET_MOUNT = "/tmp/secret"

private val initContainerRSSP = new InitContainerResourceStagingServerSecretPluginImpl(
mccheah (Author):

initContainerSecretPlugin - no need for an acronym.

test("Volume Mount into InitContainer") {
val returnedCont = initContainerRSSP.mountResourceStagingServerSecretIntoInitContainer(
new ContainerBuilder().withName("init-container").build()
)
mccheah (Author):

This closing bracket should be on the previous line. There's a few places where this is done, so please fix the others also.

assert(returnedCont.getVolumeMounts.asScala.map(
vm => (vm.getName, vm.getMountPath)) ===
List((INIT_CONTAINER_SECRET_VOLUME_NAME, INIT_CONTAINER_SECRET_MOUNT))
)
mccheah (Author):

Bracket on the previous line

DOWNLOAD_TIMEOUT_MINUTES,
INIT_CONTAINER_CONFIG_MAP_NAME,
INIT_CONTAINER_CONFIG_MAP_KEY
)
mccheah (Author):

Bracket on the previous line

private val expectedSharedMap = Map(
JARS_DOWNLOAD_PATH -> INIT_CONTAINER_DOWNLOAD_JARS_VOLUME_NAME,
FILES_DOWNLOAD_PATH -> INIT_CONTAINER_DOWNLOAD_FILES_VOLUME_NAME
)
mccheah (Author):

Bracket on the previous line

import org.scalatest.BeforeAndAfter

class SubmittedResourcesInitContainerStepSuite extends SparkFunSuite with BeforeAndAfter {
private def createTempFile(extension: String): String = {
mccheah (Author):

Move this method to the bottom of the test class

.addToLabels("mountedSecret", "true")
.endMetadata()
.withNewSpec().endSpec()
.build()}})
mccheah (Author):

Closing curly braces should be on their own lines.

@ifilonenko (Member) left a comment:

LGTM

@ifilonenko ifilonenko requested a review from ash211 July 10, 2017 18:33
@ash211 left a comment:

Wow, this is so much more readable than before! I think the concept of steps and the orchestrator assembling steps has worked out really well -- this is much easier to follow and understand than before.
Plus, the tests seem much more straightforward than before -- great work!

Besides the little nits, I think adding a DriverSpecDeployer or something similar would be the biggest improvement from here. It would further extract logic to a testable place, and it reads naturally as the follow-up to creating a spec: the next step should be to deploy the spec.

Again, love this change!

mainContainerName: String,
originalPodSpec: PodBuilder): PodBuilder = {
originalPodWithUnattachedInitContainer: PodWithDetachedInitContainer)
: PodWithDetachedInitContainer = {
Reviewer:

this syntax looks a bit weird -- should the : be on the prior line?

mccheah (Author):

It doesn't fit on one line in under 100 characters.

@@ -418,6 +418,14 @@ package object config extends Logging {
.stringConf
.createOptional

private[spark] val INIT_CONTAINER_REMOTE_PYSPARK_FILES =
ConfigBuilder("spark.kubernetes.initcontainer.remotePyFiles")
Reviewer:

why is this newly required from this refactor? I expected there to be no change in user-visible behavior

mccheah (Author):

Good catch - I don't think this is necessary, this is an artifact of something I was trying before.

var mainClass: Option[String] = None
val driverArgs = mutable.Buffer.empty[String]
args.sliding(2).toList.collect {
case Array("--primary-py-file", mainPyFile: String) =>
Reviewer:

is this a new flag?

mccheah (Author):

Yep - the contract for arguments sent to the child submission client class has changed.

Reviewer:

why? I thought we wanted no user-visible changes?

Reviewer:

Seems like a lot of these are to do the .sliding(2) thing

mccheah (Author):

It's not user-visible because this class is proxied into from SparkSubmit.scala.

mccheah (Author):

We shouldn't expect this class to be used directly.

throw new RuntimeException(s"Unknown arguments: $other")
}
require(mainAppResource.isDefined,
"Main app resource must be defined by either --py-file or --main-java-resource.")
Reviewer:
do you mean --primary-py-file and --primary-java-resource here?
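
For readers unfamiliar with the .sliding(2) idiom under discussion, a simplified sketch of this parsing style follows; the --primary-py-file flag comes from the excerpt above, while the other flag names, the case class fields, and the exact behavior of the real Client.scala are assumptions.

object ClientArgumentsSketch {
  case class ParsedArguments(mainAppResource: Option[String], driverArgs: Seq[String])

  def fromCommandLineArgs(args: Array[String]): ParsedArguments = {
    var mainAppResource: Option[String] = None
    val driverArgs = scala.collection.mutable.Buffer.empty[String]
    // sliding(2) yields every consecutive pair of arguments; collect picks out the
    // pairs that start with a recognized flag and ignores the rest.
    args.sliding(2).toList.collect {
      case Array("--primary-py-file", mainPyFile: String) => mainAppResource = Some(mainPyFile)
      case Array("--arg", arg: String) => driverArgs += arg  // flag name is illustrative
    }
    require(mainAppResource.isDefined,
      "Main app resource must be defined by either --primary-py-file or --primary-java-resource.")
    ParsedArguments(mainAppResource, driverArgs.toList)
  }
}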

/**
* Run command that initalizes a DriverSpec that will be updated
* after each KubernetesSubmissionStep in the sequence that is passed in.
* The final driver-spec will be used to build the Driver Container,
Reviewer:
s/driver-spec/DriverSpec/ -- this is the only place it's hyphenated

*/
def main(args: Array[String]): Unit = {
val parsedArguments = ClientArguments.fromCommandLineArgs(args)
val sparkConf = new SparkConf()
Reviewer:
comment that this reads from system properties?

Reviewer:

and reorder to match order of run method params

mccheah (Author):

The SparkConf constructor's Scaladoc states this clearly enough.

mccheah (Author):

The fact that it loads from system properties, I mean.

/**
* Entry point from SparkSubmit in spark-core
*
*
Reviewer:

nit: extra newline


/**
* For the collection of uris, resolves any files as follows:
* - Files with scheme file:// are resolved to the download path
Reviewer:

the given download path
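
A rough sketch of the resolution this Scaladoc describes, under the assumption (taken from the DependencyResolutionStep summary in the PR description) that submitted file:// and remote URIs are rewritten to the given download path, while local:// URIs already name paths on the image. The scheme handling here is illustrative, not the actual implementation.

object FileResolutionSketch {
  def resolveFilePaths(fileUris: Seq[String], downloadPath: String): Seq[String] =
    fileUris.map { uri =>
      val parsed = java.net.URI.create(uri)
      Option(parsed.getScheme).getOrElse("file") match {
        // Submitted and remote files are fetched by the init-container into downloadPath.
        case "file" | "http" | "https" | "hdfs" =>
          s"$downloadPath/${new java.io.File(parsed.getPath).getName}"
        // local:// URIs point at files already baked into the Docker image.
        case "local" =>
          parsed.getPath
        case _ =>
          uri
      }
    }
}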

submissionSparkConf: SparkConf) {

// The resource name prefix is derived from the application name, making it easy to connect the
// names of the Kubernetes resources from e.g. Kubectl or the Kubernetes dashboard to the
Reviewer:
lowercase kubectl

/**
* Represents a step in preparing the Kubernetes driver.
*/
private[spark] trait KubernetesSubmissionStep {
Reviewer:
Just DriverStep ? goes with the InitContainerStep

mccheah (Author):

DriverStep sounds pretty vague - we want to indicate that we're configuring the driver somehow. Maybe DriverConfigurationStep and InitContainerConfigurationStep.

Reviewer:

Oh I like DriverConfigurationStep and InitContainerConfigurationStep

Multi-line methods should have four-space indentation for arguments that
aren't on the same line as the method call itself... but this is
difficult to do consistently given how IDEs handle Scala multi-line indentation
in most cases.
@ash211 merged commit 0f4368f into branch-2.1-kubernetes on Jul 14, 2017
@ash211 deleted the submission-steps-refactor branch on July 14, 2017 22:43
foxish pushed a commit that referenced this pull request Jul 24, 2017
* Submission client redesign to use a step-based builder pattern.


* Add a unit test for BaseSubmissionStep.

* Add unit test for kubernetes credentials mounting.

* Add unit test for InitContainerBootstrapStep.

* unit tests for initContainer

* Add a unit test for DependencyResolutionStep.

* further modifications to InitContainer unit tests

* Use of resolver in PythonStep and unit tests for PythonStep

* refactoring of init unit tests and pythonstep resolver logic

* Add unit test for KubernetesSubmissionStepsOrchestrator.

* refactoring and addition of secret trustStore+Cert checks in a SubmissionStepSuite

* added SparkPodInitContainerBootstrapSuite

* Added InitContainerResourceStagingServerSecretPluginSuite

* style in Unit tests

* extremely minor style fix in variable naming

* Address comments.

* Rename class for consistency.

* Attempt to make spacing consistent.

Multi-line methods should have four-space indentation for arguments that
aren't on the same line as the method call itself... but this is
difficult to do consistently given how IDEs handle Scala multi-line indentation
in most cases.
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019
…e-spark-on-k8s#365)
