Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22962][K8S] Fail fast if submission client local files are used #20320

Closed
wants to merge 3 commits into from
Closed

Conversation

liyinan926
Copy link
Contributor

@liyinan926 liyinan926 commented Jan 18, 2018

What changes were proposed in this pull request?

In the Kubernetes mode, fails fast in the submission process if any submission client local dependencies are used as the use case is not supported yet.

How was this patch tested?

Unit tests, integration tests, and manual tests.

@vanzin @foxish

Copy link
Contributor

@jiangxb1987 jiangxb1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM only some nits.

@@ -117,6 +117,12 @@ private[spark] class DriverConfigOrchestrator(
.map(_.split(","))
.getOrElse(Array.empty[String])

if (existSubmissionLocalFiles(sparkJars) || existSubmissionLocalFiles(sparkFiles)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a TODO here if this is planned to be supported in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -117,6 +117,12 @@ private[spark] class DriverConfigOrchestrator(
.map(_.split(","))
.getOrElse(Array.empty[String])

if (existSubmissionLocalFiles(sparkJars) || existSubmissionLocalFiles(sparkFiles)) {
throw new SparkException("The Kubernetes mode does not yet support application " +
"dependencies local to the submission client. It currently only allows application" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra space in the end of line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@SparkQA
Copy link

SparkQA commented Jan 18, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/12/

Copy link
Contributor

@foxish foxish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on docs change for 2.3 modulo minor comments.
Code changes ok if we also re-validate with manual tests against http (we have a caveat with our integration tests not testing this correctly yet), gcs and HDFS. Also ok with dropping the code change for 2.3, since it's a usability and not a functionality improvement.

`SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles.
`SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles. The `local://` scheme is also required when referring to
dependencies in custom-built Docker images in `spark-submit`. Note that using application dependencies local to the submission
client is currently not yet supported.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"application dependencies from the local file system"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -117,7 +117,10 @@ This URI is the location of the example jar that is already in the Docker image.
If your application's dependencies are all hosted in remote locations like HDFS or HTTP servers, they may be referred to
by their appropriate remote URIs. Also, application dependencies can be pre-mounted into custom-built Docker images.
Those dependencies can be added to the classpath by referencing them with `local://` URIs and/or setting the
`SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles.
`SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles. The `local://` scheme is also required when referring to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point above already covers that local:// is needed with custom-built images.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that's about adding to the classpath. I wanted to make it more specific and clearer.

@SparkQA
Copy link

SparkQA commented Jan 18, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/12/

@SparkQA
Copy link

SparkQA commented Jan 18, 2018

Test build #86354 has finished for PR 20320 at commit df9cdfb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liyinan926
Copy link
Contributor Author

Regarding manual tests, our integration tests cover local:// and http:// (at least for files). I will do some manual tests on http:// for jars and gs://.

@SparkQA
Copy link

SparkQA commented Jan 18, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/14/

@SparkQA
Copy link

SparkQA commented Jan 18, 2018

Test build #86357 has finished for PR 20320 at commit 8e1fffe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 18, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/14/

@liyinan926
Copy link
Contributor Author

liyinan926 commented Jan 18, 2018

@foxish Manual tests to verify that using http:///https://, gcs://, and local:// dependencies works fine. Also verified that using file:// dependencies results in submission failure due to the exception.

Copy link
Contributor

@foxish foxish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question - is referencing a local file as a main app resource (/home/../spark-examples.jar for example) need to be dealt with separately?

@@ -117,6 +117,13 @@ private[spark] class DriverConfigOrchestrator(
.map(_.split(","))
.getOrElse(Array.empty[String])

// TODO(SPARK-23153): remote once submission client local dependencies are supported.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/remote/remove/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -117,6 +117,13 @@ private[spark] class DriverConfigOrchestrator(
.map(_.split(","))
.getOrElse(Array.empty[String])

// TODO(SPARK-23153): remote once submission client local dependencies are supported.
if (existSubmissionLocalFiles(sparkJars) || existSubmissionLocalFiles(sparkFiles)) {
throw new SparkException("The Kubernetes mode does not yet support application " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd shorten this to just "Kubernetes mode does not support referencing application dependencies in the local file system".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@liyinan926
Copy link
Contributor Author

Actually main application jar gets added to spark.jars automatically by the submission process so are treated the same as other jars on spark.jars.

@foxish
Copy link
Contributor

foxish commented Jan 18, 2018

Suspected that it would use the same code path - might be worth a manual test of that as well.
LGTM from my end.

@liyinan926
Copy link
Contributor Author

liyinan926 commented Jan 18, 2018

The manual tests I did actually use a main app jar located on gcs and http. To be specific and for record, I did the following tests:

  1. Using a gs:// main application jar and a http:// dependency jar. Succeeded.
  2. Using a https:// main application jar and a http:// dependency jar. Succeeded.
  3. Using a local:// main application jar. Succeeded.
  4. Using a file:// main application jar. Failed.
  5. Using a file:// dependency jar. Failed.

@SparkQA
Copy link

SparkQA commented Jan 18, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/17/

@SparkQA
Copy link

SparkQA commented Jan 18, 2018

Test build #86359 has finished for PR 20320 at commit 649e34d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented Jan 18, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/17/

@vanzin
Copy link
Contributor

vanzin commented Jan 18, 2018

Merging to master / 2.3.

asfgit pushed a commit that referenced this pull request Jan 18, 2018
## What changes were proposed in this pull request?

In the Kubernetes mode, fails fast in the submission process if any submission client local dependencies are used as the use case is not supported yet.

## How was this patch tested?

Unit tests, integration tests, and manual tests.

vanzin foxish

Author: Yinan Li <liyinan926@gmail.com>

Closes #20320 from liyinan926/master.

(cherry picked from commit 5d7c4ba)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
@asfgit asfgit closed this in 5d7c4ba Jan 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants