
[SPARK-22757][Kubernetes] Enable use of remote dependencies (http, s3, gcs, etc.) in Kubernetes mode #19954

Closed · wants to merge 14 commits · 9 participants
@liyinan926 (Contributor) commented Dec 12, 2017

What changes were proposed in this pull request?

This PR extends the Kubernetes mode to support remote dependencies on http/https endpoints, GCS, S3, etc. It adds steps that configure and attach a Kubernetes init-container to the driver and executor pods for downloading remote dependencies.
Init-containers, as the name suggests, are containers that run to completion before the main containers start, and are commonly used to perform initialization tasks. We use init-containers to localize remote application dependencies before the driver/executors start running. The code that the init-container runs is also included. This PR also adds a step that mounts user-specified secrets, which may store credentials for accessing data storage such as S3 and Google Cloud Storage (GCS), into the driver and executor pods.
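As an illustration of the mechanism (not the exact spec this PR generates; the image names, volume name, and path below are hypothetical), a driver pod with an init-container looks roughly like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark-driver
spec:
  initContainers:
    # Runs to completion before the main container starts; downloads
    # remote jars/files into the shared volume.
    - name: spark-init
      image: spark-init:latest
      volumeMounts:
        - name: download-dir
          mountPath: /var/spark-data/spark-files
  containers:
    - name: spark-driver
      image: spark-driver:latest
      volumeMounts:
        # The main container sees the localized dependencies here.
        - name: download-dir
          mountPath: /var/spark-data/spark-files
  volumes:
    - name: download-dir
      emptyDir: {}
```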

How was this patch tested?

  • The patch contains unit tests which are passing.
  • Manual testing: ./build/mvn -Pkubernetes clean package succeeded.
  • Manual testing of the following cases:
    • Running SparkPi using a container-local spark-example jar.
    • Running SparkPi using a container-local spark-example jar with a user-specified secret mounted.
    • Running SparkPi using a spark-example jar hosted remotely on an https endpoint.

cc @rxin @felixcheung @mateiz (shepherd)
k8s-big-data SIG members & contributors: @mccheah @foxish @ash211 @ssuchter @varunkatta @kimoonkim @erikerlandson @tnachen @ifilonenko @liyinan926
reviewers: @vanzin @felixcheung @jiangxb1987 @mridulm

@SparkQA

commented Dec 12, 2017

Test build #84789 has finished for PR 19954 at commit f9dc86d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

commented Dec 12, 2017

Test build #84793 has finished for PR 19954 at commit cd5e832.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

commented Dec 12, 2017

Test build #84795 has finished for PR 19954 at commit b9a0090.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

commented Dec 13, 2017

Test build #84798 has finished for PR 19954 at commit 5512d80.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

commented Dec 13, 2017

Test build #84838 has finished for PR 19954 at commit 1a74521.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.
@foxish

Contributor

commented Dec 13, 2017

@liyinan926, can you update the PR title please?
Maybe "Enable submission of GCS (or S3) and HTTP dependencies to Kubernetes Scheduler Backend"

@liyinan926 changed the title from "[SPARK-22757][Kubernetes] add init-container bootstrapping and secret mounting steps" to "[SPARK-22757][Kubernetes] Enable use of remote dependencies (http, s3, gcs, etc.) in Kubernetes mode" on Dec 13, 2017

@foxish

Contributor

commented Dec 14, 2017

@rxin @mateiz @vanzin @mridulm @jiangxb1987 @felixcheung, if you can help take a look at this, it would add a lot of value to Kubernetes mode in 2.3. Granted, we already have a way to supply dependencies within docker images, and that works as expected (with the minor fix in #19972); this change would additionally enable submitting dependencies from http and from object storage. That's a big value-add for people and enables them to have standard images rather than baking a new one for each Spark application.

If we can get this review going before everyone leaves for vacation (and even if it gets picked up after we all return), that's a big step for us in the community.

@erikerlandson

Contributor

commented Dec 14, 2017

+1 to @foxish's comment above. For accessing data from http, s3, etc., this will be a huge reduction in the barrier to entry: the difference between having to spin a custom docker image and just using Spark out of the box.

@liyinan926

Contributor Author

commented Dec 14, 2017

+1. This PR closes a big feature gap of the Kubernetes scheduler backend compared with other backends. This makes the Kubernetes backend much more usable.

@mridulm

Contributor

commented Dec 14, 2017

This is promising! I will take a look at it over the weekend; thanks for the great work :-)
My only concern is whether we can minimize fetches when multiple containers are running on the same host, sort of like what YARN does; but that is a minor nit compared to the basic feature.

@foxish

Contributor

commented Dec 14, 2017

@mridulm We did have some discussions on resource localization at some point. This is a powerful mechanism when coupled with a resource staging server (in the future). There is a per-pod cost when using localization in this manner, but there is an alternative: using a docker image baked with the dependencies. Kubernetes does node-level caching of downloaded images, so one would pay the cost only once per node.

...bernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala Outdated
@@ -209,9 +214,33 @@ private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf)
.build()
}.getOrElse(executorContainer)

new PodBuilder(executorPod)
val (withMaybeSecretsMountedPod, withMaybeSecretsMountedContainer) =
mountSecretsBootstrap.map {bootstrap =>

@ueshin (Member) commented Dec 15, 2017

nit: add a space before bootstrap.

@liyinan926 (Author) commented Dec 15, 2017

Done.

...gers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/InitContainerBootstrap.scala Outdated
initContainer,
mainContainerWithMountedFiles)
}

@ueshin (Member) commented Dec 15, 2017

nit: remove an extra line.

@liyinan926 (Author) commented Dec 15, 2017

Done.

...gers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/InitContainerBootstrap.scala Outdated
.build())

val initContainer = new ContainerBuilder(podWithDetachedInitContainer.initContainer)
.withName(s"spark-init")

@ueshin (Member) commented Dec 15, 2017

nit: remove s.

@liyinan926 (Author) commented Dec 15, 2017

Done.

.../scala/org/apache/spark/deploy/rest/k8s/KubernetesSparkDependencyDownloadInitContainer.scala Outdated
remoteJars,
jarsDownloadDir,
s"Remote jars download directory specified at $jarsDownloadDir does not exist " +
s"or is not a directory.")

@ueshin (Member) commented Dec 15, 2017

nit: remove s.

@liyinan926 (Author) commented Dec 15, 2017

Done.

...es/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala Outdated
// dependencies. The executor's main container and its init-container share the secrets
// because the init-container is sort of an implementation details and this sharing
// avoids introducing a dedicated configuration property just for the init-container.
val mayBeInitContainerMountSecretsBootstrap = if (maybeInitContainerBootstrap.nonEmpty &&

@ueshin (Member) commented Dec 15, 2017

What's the difference between this and mayBeMountSecretBootstrap above? Looks like mounting the same paths twice?

@liyinan926 (Author) commented Dec 15, 2017

mayBeMountSecretBootstrap is for the executor main container, whereas mayBeInitContainerMountSecretsBootstrap is for the executor init container, which is run before the main container starts.

@ueshin (Member) commented Dec 15, 2017

Oh, I see. Thanks!

@SparkQA

commented Dec 15, 2017

Test build #84966 has finished for PR 19954 at commit 38b850f.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@vanzin
Contributor

left a comment

This is a pretty large change, and I need more time to look at everything. I left some comments, mostly stylistic, which also should apply to the code I haven't commented on yet.

In general, reading the code becomes a little tiring because it's hard to mentally keep track of all the long names being used.

resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala Outdated
.stringConf
.createWithDefault("/var/spark-data/spark-files")

val INIT_CONTAINER_DOCKER_IMAGE =
ConfigBuilder("spark.kubernetes.initContainer.docker.image")

@vanzin (Contributor) commented Dec 16, 2017

Mirroring the discussion in the other PR, are these really restricted to docker? Is it a required config?

@foxish (Contributor) commented Dec 16, 2017

None of it is docker specific, can be container everywhere.

@foxish (Contributor) commented Dec 16, 2017

Is it a required config?

No; one may forgo the init-container by building the deps into the docker image itself and supplying them via local:/// paths.

@liyinan926 (Author) commented Dec 16, 2017

Renamed to spark.kubernetes.initContainer.image.

...gers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/InitContainerBootstrap.scala Outdated
}
val initContainerCustomEnvVars = sparkConf.getAllWithPrefix(initContainerCustomEnvVarKeyPrefix)
.toSeq
.map(env =>

@vanzin (Contributor) commented Dec 16, 2017

.map { env =>

...agers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/MountSecretsBootstrap.scala Outdated

override def mountSecrets(pod: Pod, container: Container): (Pod, Container) = {
var podBuilder = new PodBuilder(pod)
secretNamesToMountPaths.keys.foreach(name =>

@vanzin (Contributor) commented Dec 16, 2017

.foreach { name =>

...agers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/MountSecretsBootstrap.scala Outdated
.endSpec())

var containerBuilder = new ContainerBuilder(container)
secretNamesToMountPaths.foreach(namePath =>

@vanzin (Contributor) commented Dec 16, 2017

.foreach { namePath =>. You get the idea. Please fix all of these.
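For context, the convention being requested is Scala's brace syntax for multi-line function literals, which also allows pattern matching on tuples. A small self-contained sketch (the map contents are made up for illustration):

```scala
// Hypothetical secret-name-to-mount-path map, for illustration only.
val secretNamesToMountPaths = Map("aws-creds" -> "/mnt/secrets/aws")

// Parenthesized form, awkward once the closure body grows past one line:
//   secretNamesToMountPaths.foreach(namePath => ...)

// Brace form, preferred for multi-line closures; `case` destructures the entry.
val descriptions = secretNamesToMountPaths.map { case (name, path) =>
  s"secret $name mounted at $path"
}.toList
```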

...src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala Outdated
@@ -116,10 +122,53 @@ private[spark] class DriverConfigurationStepsOrchestrator(
None
}

val mayBeInitContainerBootstrapStep =
if (areAnyFilesNonContainerLocal(sparkJars ++ sparkFiles)) {
val initContainerConfigurationStepsOrchestrator =

@vanzin (Contributor) commented Dec 16, 2017

The variable names in this code are very long in general; verbosity can both help and harm readability, in this case I don't think it's helping much. For example, orchestrator is just as good a name for this variable, since there's no other orchestrator being used here.

...park/deploy/k8s/submit/steps/initcontainer/InitContainerConfigurationStepsOrchestrator.scala Outdated
* Returns the complete ordered list of steps required to configure the init-container. This is
* only used when there are remote application dependencies to localize.
*/
private[spark] class InitContainerConfigurationStepsOrchestrator(

@vanzin (Contributor) commented Dec 16, 2017

Similarly, some type names are also really long. InitContainerOrchestrator sounds just as good to me.

@liyinan926 (Author) commented Dec 16, 2017

Renamed to InitContainerConfigOrchestrator and similarly DriverConfigurationStepsOrchestrator to DriverConfigOrchestrator.

...e-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/rest/k8s/FileFetcher.scala Outdated
/**
* Utility for fetching remote file dependencies.
*/
private[spark] trait FileFetcher {

@vanzin (Contributor) commented Dec 16, 2017

Why do you need a trait for this? If it's for the tests, you can mock classes, you don't need an interface for that.

@liyinan926 (Author) commented Dec 16, 2017

Yeah, removed the trait.

.../scala/org/apache/spark/deploy/rest/k8s/KubernetesSparkDependencyDownloadInitContainer.scala Outdated
* with different configurations for different download sources, or using the same container to
* download everything at once.
*/
private[spark] class KubernetesSparkDependencyDownloadInitContainer(

@vanzin (Contributor) commented Dec 16, 2017

This name is so long it becomes confusing. How about KubernetesInitContainer? Or do you plan to have multiple different init containers for different things (ugh)?

@liyinan926 (Author) commented Dec 16, 2017

Renamed to SparkPodInitContainer.

.../scala/org/apache/spark/deploy/rest/k8s/KubernetesSparkDependencyDownloadInitContainer.scala Outdated
s"Remote files download directory specified at $filesDownloadDir does not exist " +
"or is not a directory.")
}
waitForFutures(

@vanzin (Contributor) commented Dec 16, 2017

You have a thread pool, but really you're just submitting two tasks. Why not one task for each file / jar?

@liyinan926 (Author) commented Dec 16, 2017

This class actually handles more tasks in our fork. For example, it is also responsible for downloading from the resource staging server that hosts submission client dependencies. The resource staging server will be in a future PR.

@vanzin (Contributor) commented Dec 16, 2017

Sure, but that's not my point. If you have 10 jars and 10 files to download, the current code will only download 2 at a time. If you submit each jar / file separately, you'll download as many as your thread pool allows, and you can make that configurable.

@liyinan926 (Author) commented Dec 16, 2017

Got it, will address this.

@liyinan926 (Author) commented Dec 16, 2017

Updated to create one task per file/jar to download. Regarding the type of thread pool, we are using a CachedThreadPool, which I think makes sense as it can be expected that the tasks are not long-lived.
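The scheme described above (one download task per file or jar, submitted to a cached thread pool) can be sketched as follows; the URIs and the downloadFile body are placeholders, not the PR's actual fetch logic:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Cached thread pool: grows on demand, reuses idle threads, and suits
// short-lived tasks like these downloads.
val threadPool = Executors.newCachedThreadPool()
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(threadPool)

// Placeholder for the real fetch logic (e.g. Utils.fetchFile in Spark).
def downloadFile(uri: String): String = s"downloaded $uri"

val remoteJars = Seq("https://host/a.jar", "https://host/b.jar", "s3a://bucket/c.jar")

// One Future per file, so downloads proceed concurrently instead of
// one task per whole category (jars vs. files).
val futures = remoteJars.map(uri => Future(downloadFile(uri)))
val results = Await.result(Future.sequence(futures), 1.minute)
threadPool.shutdown()
```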

.../scala/org/apache/spark/deploy/rest/k8s/KubernetesSparkDependencyDownloadInitContainer.scala Outdated
}
}

private def waitForFutures(futures: Future[_]*) {

@vanzin (Contributor) commented Dec 16, 2017

: Unit = {

But really, there's a single caller, so just inline.

@liyinan926

Contributor Author

commented Dec 16, 2017

@vanzin Addressed your comments so far. PTAL. Thanks!

@SparkQA

commented Dec 16, 2017

Test build #84985 has finished for PR 19954 at commit 46a8c99.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

commented Dec 17, 2017

Test build #85013 has finished for PR 19954 at commit 197882d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

commented Dec 17, 2017

Test build #85028 has finished for PR 19954 at commit e20e212.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

commented Dec 17, 2017

Test build #85029 has finished for PR 19954 at commit 340fa41.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@ueshin

Member

commented Dec 18, 2017

Jenkins, retest this please.

...s/core/src/test/scala/org/apache/spark/deploy/k8s/submit/DriverConfigOrchestratorSuite.scala Outdated
orchestrator: DriverConfigurationStepsOrchestrator,
types: Class[_ <: DriverConfigurationStep]*): Unit = {
orchestrator: DriverConfigOrchestrator,
types: Class[_ <: DriverConfigurationStep]*): Unit = {

@ueshin (Member) commented Dec 18, 2017

nit: indent

...t/scala/org/apache/spark/deploy/k8s/submit/steps/DriverInitContainerBootstrapStepSuite.scala Outdated
}
}

private object FirstTestInitContainerConfigurationStep$ extends InitContainerConfigurationStep {

@ueshin (Member) commented Dec 18, 2017

Do we need $ at the end of the object name?

@liyinan926 (Author) commented Dec 18, 2017

No, it shouldn't be there. Removed.

...t/scala/org/apache/spark/deploy/k8s/submit/steps/DriverInitContainerBootstrapStepSuite.scala Outdated
}
}

private object SecondTestInitContainerConfigurationStep$ extends InitContainerConfigurationStep {

@ueshin (Member) commented Dec 18, 2017

ditto.

...ernetes/core/src/main/scala/org/apache/spark/deploy/rest/k8s/SparkConfPropertiesParser.scala Outdated
val sparkConf = new SparkConf(true)

if (!propertiesFile.isFile) {
throw new IllegalArgumentException(s"Server properties file given at" +

@ueshin (Member) commented Dec 18, 2017

nit: remove s.
nit: move a space at the beginning of the string of the next line to the end of the string of this line.

@SparkQA

commented Dec 18, 2017

Test build #85056 has finished for PR 19954 at commit 340fa41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@liyinan926

Contributor Author

commented Dec 18, 2017

@ueshin addressed your comments. PTAL. Thanks!

@SparkQA

commented Dec 18, 2017

Test build #85066 has finished for PR 19954 at commit 9ebfc73.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

commented Dec 18, 2017

Test build #85069 has finished for PR 19954 at commit ddcb0f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@vanzin
Contributor

left a comment

I started looking at this again, but sorry, the long variable names make it really hard to read and, especially, understand this code. I'm always getting lost because all variable names look the same aside from some small difference at the end.

I also don't fully understand all the abstractions being created here. It seems there are three separate concepts (a "bootstrap", a "configuration step", and an "orchestrator") and they're not used consistently.

It seems for example that "orchestrators" are used for the drivers, but for executors there's different code that does some similar things but is instead baked into ExecutorPodFactory (which is another trait with a single implementation).

It would be nice to take a look at the abstraction here and make sure it makes sense and is being used consistently.

At the very least, a "README.md" file explaining how these things are tied together would help a lot those who did not write this code.

...gers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/InitContainerBootstrap.scala Outdated
* This is separated out from the init-container steps API because this component can be reused to
* set up the init-container for executors as well.
*/
private[spark] trait InitContainerBootstrap {

@vanzin (Contributor) commented Dec 19, 2017

What's the purpose of all these traits that have a single implementation? That seems unnecessary.

@mccheah (Contributor) commented Dec 19, 2017

It's more idiomatic to mock a trait than a class and our unit tests always create mocks for every component that isn't the class under test.

@vanzin (Contributor) commented Dec 19, 2017

It's more idiomatic to mock a trait than a class

Why? You can mock classes just fine.

@mccheah (Contributor) commented Dec 19, 2017

It's probably fine to just use the class here, but some classes can't be mocked, such as final classes or classes with final methods. Having traits everywhere ensures that even if we change the classes down the road to have such characteristics, our tests won't break.

This also is not entirely without precedent. TaskScheduler is only implemented by TaskSchedulerImpl in the main scheduler code, as is TaskContext being extended only by TaskContextImpl. Putting a trait in front of an implementation communicates that it's expected for tests that dependency-inject instances of this to create stub implementations.

But we might be splitting hairs at this point, so using only the class could suffice until we run into problems from having done so.

...gers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/InitContainerBootstrap.scala Outdated
extends InitContainerBootstrap {

override def bootstrapInitContainer(
podWithDetachedInitContainer: PodWithDetachedInitContainer): PodWithDetachedInitContainer = {

@vanzin (Contributor) commented Dec 19, 2017

I commented about this in my previous review, but could you try to use shorter variable names throughout the PR?

For example, here, just repeating the name of the already long type to name the variable doesn't really help with readability. Imagine if you have two of those, are you going to start adding counters to the already long name?

...c/main/scala/org/apache/spark/deploy/k8s/submit/steps/DriverInitContainerBootstrapStep.scala Outdated
* configuration properties for the init-container.
*/
private[spark] class DriverInitContainerBootstrapStep(
initContainerConfigurationSteps: Seq[InitContainerConfigurationStep],

@vanzin (Contributor) commented Dec 19, 2017

Another place where variable names are unnecessarily long and harm readability. initContainer is already implicit given the name of the class. I have to parse the variable name every time and ignore the prefix to know which one it is.

@mccheah

Contributor

commented Dec 19, 2017

I also don't fully understand all the abstractions being created here. It seems there are three separate concepts (a "bootstrap", a "configuration step", and an "orchestrator") and they're not used consistently.

A configuration step is a logical unit that applies an additive transformation to the input. A steps orchestrator selects which configuration steps to apply based on the configuration of the application. A bootstrap is a component that can be shared by one or more configuration steps and the driver, since a lot of times the submission client and the driver will share code for configuring the driver and the executor pods, respectively. We discuss this a bit more in here: https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/architecture-docs/submission-client.md - but we don't cover the bootstrap abstraction. We're open to different representations and architectures as well.
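A minimal sketch of how these pieces compose, with hypothetical names and a toy DriverSpec standing in for the real pod/container model: each step is an additive transformation, and the orchestrator selects which steps apply given the application's configuration.

```scala
// Toy model of the spec being built up; the real code operates on pods/containers.
case class DriverSpec(labels: Map[String, String] = Map.empty,
                      hasInitContainer: Boolean = false)

// A "configuration step" applies an additive transformation to the spec.
trait ConfigurationStep {
  def configure(spec: DriverSpec): DriverSpec
}

object BaseStep extends ConfigurationStep {
  def configure(spec: DriverSpec): DriverSpec =
    spec.copy(labels = spec.labels + ("spark-role" -> "driver"))
}

object InitContainerStep extends ConfigurationStep {
  def configure(spec: DriverSpec): DriverSpec = spec.copy(hasInitContainer = true)
}

// The "orchestrator" selects steps based on the application configuration,
// e.g. only adding the init-container step when there are remote dependencies.
class Orchestrator(hasRemoteDeps: Boolean) {
  def steps: Seq[ConfigurationStep] =
    Seq(BaseStep) ++ (if (hasRemoteDeps) Seq(InitContainerStep) else Nil)
}

def resolve(hasRemoteDeps: Boolean): DriverSpec =
  new Orchestrator(hasRemoteDeps).steps
    .foldLeft(DriverSpec())((spec, step) => step.configure(spec))
```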

@liyinan926

Contributor Author

commented Dec 19, 2017

@mccheah @foxish I gave you push access to the fork I used for this PR. Feel free to push commits if you want. Please do let me know if you plan to address comments from @vanzin and make changes. Otherwise, I will address them later tonight.

@vanzin

Contributor

commented Dec 19, 2017

We discuss this a bit more in here

It's nice that there is some documentation somewhere, but that documentation doesn't really seem to address my comments. For one example, it only explicitly talks about the driver - which sort of makes sense because the document is about submission. But why aren't orchestrators used when starting executors too? It seems there's similar code baked into another class instead.

What I'm asking is for this to be documented properly so that someone who didn't write the code has enough information to know that it's working as it should. Right now I don't see what some of these abstractions are for at all - for example, as far as I can see, the orchestrator can be replaced by a method call instead of being a completely separate type; it's not really abstracting anything. Look at where it's used:

    val configurationStepsOrchestrator = new DriverConfigOrchestrator(/* long list of arguments */)

    Utils.tryWithResource(SparkKubernetesClientFactory.createKubernetesClient(/* another long list of arguments */)) { kubernetesClient =>
      val client = new Client(
        configurationStepsOrchestrator.getAllConfigurationSteps(),
        /* another long list of arguments */

So aside from not really being able to infer the structure of how these things work, the current abstraction seems to be creating a lot of constructors and methods with long lists of arguments, which is another thing that hurts the readability of the code.

@SparkQA

commented Dec 23, 2017

Test build #85324 has finished for PR 19954 at commit 785b90e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

commented Dec 23, 2017

Test build #85327 has finished for PR 19954 at commit f82c568.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

commented Dec 24, 2017

Test build #85352 has finished for PR 19954 at commit 9d9c841.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@jiangxb1987
Contributor

left a comment

Mostly LGTM, only some nits. Also cc @ueshin to take another look.

.createOptional

val INIT_CONTAINER_MOUNT_TIMEOUT =
ConfigBuilder("spark.kubernetes.mountDependencies.timeout")

@jiangxb1987 (Contributor) commented Dec 25, 2017

nit: spark.kubernetes.initContainer.mountDependencies.timeout?

@liyinan926 (Author) commented Dec 25, 2017

Please see the response regarding spark.kubernetes.mountDependencies.maxSimultaneousDownloads.

.createWithDefault(5)

val INIT_CONTAINER_MAX_THREAD_POOL_SIZE =
ConfigBuilder("spark.kubernetes.mountDependencies.maxSimultaneousDownloads")

@jiangxb1987 (Contributor) commented Dec 25, 2017

nit: spark.kubernetes.initContainer.mountDependencies.maxSimultaneousDownloads?

@liyinan926 (Author) commented Dec 25, 2017

I think the current name is already pretty long. Adding initContainer makes it even longer without much added value.

ConfigBuilder("spark.kubernetes.mountDependencies.timeout")
.doc("Timeout before aborting the attempt to download and unpack dependencies from remote " +
"locations into the driver and executor pods.")
.timeConf(TimeUnit.MINUTES)

@jiangxb1987 (Contributor) commented Dec 25, 2017

Why not TimeUnit.SECONDS?

@liyinan926 (Author) commented Dec 25, 2017

Done.

mountSecretsStep
}

private def areAnyFilesNonContainerLocal(files: Seq[String]): Boolean = {

@jiangxb1987 (Contributor) commented Dec 25, 2017

nit: areAnyFilesNonContainerLocal -> existNonContainerLocalFiles

@liyinan926 (Author) commented Dec 25, 2017

Done.

private def downloadFiles(
filesCommaSeparated: Option[String],
downloadDir: File,
errMessageOnDestinationNotADirectory: String): Unit = {

@jiangxb1987 (Contributor) commented Dec 25, 2017

nit: errMessageOnDestinationNotADirectory -> errMessage?

@liyinan926 (Author) commented Dec 25, 2017

Done.

val initContainerConfigMapKey = sparkConf.get(INIT_CONTAINER_CONFIG_MAP_KEY_CONF)

if (initContainerConfigMap.isEmpty) {
logWarning("The executor's init-container config map was not specified. Executors will " +

@jiangxb1987 (Contributor) commented Dec 25, 2017

nit: was not -> is not

@liyinan926 (Author) commented Dec 25, 2017

Done.

}

if (initContainerConfigMapKey.isEmpty) {
logWarning("The executor's init-container config map key was not specified. Executors will " +

@jiangxb1987 (Contributor) commented Dec 25, 2017

nit: was not -> is not

@liyinan926 (Author) commented Dec 25, 2017

Done.

@SparkQA

commented Dec 25, 2017

Test build #85384 has finished for PR 19954 at commit c51bc56.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@ueshin
Member

left a comment

LGTM except for one comment.


# If this docker file is being used in the context of building your images from a Spark distribution, the docker build
# command should be invoked from the top level directory of the Spark distribution. E.g.:
# docker build -t spark-init:latest -f dockerfiles/init-container/Dockerfile .

@ueshin (Member) commented Dec 26, 2017

kubernetes/dockerfiles/.. instead of dockerfiles/..

Btw, only nits but seems like paths here in Dockerfiles for driver/executor are wrong: kubernetes/dockerfiles/driver/Dockerfile and kubernetes/dockerfiles/executor/Dockerfile respectively?

@liyinan926 (Author) commented Dec 26, 2017

Done.

@SparkQA

commented Dec 26, 2017

Test build #85398 has finished for PR 19954 at commit 28343fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@liyinan926

Contributor Author

commented Dec 27, 2017

@vanzin @jiangxb1987 @ueshin Can this PR be merged now?

@ueshin

Member

commented Dec 28, 2017

Thanks! merging to master.

@asfgit closed this in 171f6dd on Dec 28, 2017

ghost pushed a commit to dbtsai/spark that referenced this pull request Dec 28, 2017

[SPARK-22648][K8S] Add documentation covering init containers and secrets

## What changes were proposed in this pull request?

This PR updates the Kubernetes documentation corresponding to the following features/changes in apache#19954.
* Ability to use remote dependencies through the init-container.
* Ability to mount user-specified secrets into the driver and executor pods.

vanzin jiangxb1987 foxish

Author: Yinan Li <liyinan926@gmail.com>

Closes apache#20059 from liyinan926/doc-update.

asfgit pushed a commit that referenced this pull request Jan 5, 2018

[SPARK-22757][K8S] Enable spark.jars and spark.files in KUBERNETES mode
## What changes were proposed in this pull request?

We missed enabling `spark.files` and `spark.jars` in #19954. The result is that remote dependencies specified through `spark.files` or `spark.jars` are not included in the list of remote dependencies to be downloaded by the init-container. This PR fixes it.

## How was this patch tested?

Manual tests.

vanzin This replaces #20157.

foxish

Author: Yinan Li <liyinan926@gmail.com>

Closes #20160 from liyinan926/SPARK-22757.

(cherry picked from commit 6cff7d1)
Signed-off-by: Felix Cheung <felixcheung@apache.org>

zzcclp pushed a commit to zzcclp/spark that referenced this pull request Jan 5, 2018

[SPARK-22757][K8S] Enable spark.jars and spark.files in KUBERNETES mode
## What changes were proposed in this pull request?

We missed enabling `spark.files` and `spark.jars` in apache#19954. The result is that remote dependencies specified through `spark.files` or `spark.jars` are not included in the list of remote dependencies to be downloaded by the init-container. This PR fixes it.

## How was this patch tested?

Manual tests.

vanzin This replaces apache#20157.

foxish

Author: Yinan Li <liyinan926@gmail.com>

Closes apache#20160 from liyinan926/SPARK-22757.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.