
[SPARK-32119][CORE][3.0] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars #29621

Closed
wants to merge 2 commits into branch-3.0 from sarutak:fix-plugin-issue-3.0

Conversation

sarutak
Member

@sarutak sarutak commented Sep 1, 2020

What changes were proposed in this pull request?

This is a backport PR for branch-3.0.

This PR changes Executor to load the jars and files added via --jars and --files during Executor initialization.
To avoid downloading those jars/files twice, they are associated with startTime as their upload timestamp.
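As a rough illustration of that dedup idea, here is a minimal, self-contained Scala sketch (the object, map, and fetchFile names are made up for illustration and are not the actual Executor internals): the initial --jars/--files entries are tagged with startTime, and the usual "fetch only if the announced timestamp is newer than the local copy" check then skips a second download if the driver later re-announces the same files.

```scala
object InitialDependencySketch {
  // URI -> timestamp of the copy this executor has already fetched
  private val currentFiles = scala.collection.mutable.Map.empty[String, Long]

  // Stand-in for the real download logic
  private def fetchFile(uri: String): Unit =
    println(s"downloading $uri")

  // Fetch only entries whose announced timestamp is newer than what we already have
  def updateDependencies(newFiles: Map[String, Long]): Unit = {
    for ((uri, timestamp) <- newFiles
         if currentFiles.getOrElse(uri, -1L) < timestamp) {
      fetchFile(uri)
      currentFiles(uri) = timestamp
    }
  }

  def main(args: Array[String]): Unit = {
    val startTime = System.currentTimeMillis()
    // Jars/files passed with --jars/--files, tagged with startTime at executor start
    val initial = Map("hdfs:///plugins/example-plugin.jar" -> startTime)
    updateDependencies(initial) // downloads once
    updateDependencies(initial) // same timestamp, so nothing is downloaded again
  }
}
```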

Why are the changes needed?

ExecutorPlugin doesn't work with Standalone Cluster or Kubernetes
when a jar containing the plugins, and the files those plugins use, are added via the --jars and --files options of spark-submit.

This is because jars and files added by --jars and --files are not loaded during Executor initialization.
I confirmed it works with YARN because there the jars/files are distributed via the distributed cache.
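For context, such a plugin implements the org.apache.spark.api.plugin.SparkPlugin interface, and its executor side is initialized while the executor starts up. The sketch below (class name and log message are hypothetical; it assumes spark-core 3.0 on the classpath) shows the kind of class that would be shipped in a jar passed via --jars:

```scala
import java.util.{Map => JMap}

import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

// A minimal plugin of the kind that would be shipped in a jar passed via --jars.
class ExamplePlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = new DriverPlugin {}

  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
      // Runs during executor initialization. Before this fix, on Standalone and
      // Kubernetes a jar added with --jars was not yet on the executor classpath
      // at this point, so this class (and any files it needs) could not be loaded.
      println(s"ExamplePlugin initialized on executor ${ctx.executorID()}")
    }
  }
}
```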

Does this PR introduce any user-facing change?

Yes. Jars/files added by --jars and --files are now downloaded on each executor during initialization.

How was this patch tested?

Added a new test case.

sarutak and others added 2 commits September 2, 2020 05:54
…er and Kubernetes with --jars

### What changes were proposed in this pull request?

This PR changes Executor to load jars and files added by --jars and --files on Executor initialization.
To avoid downloading those jars/files twice, they are associated with `startTime` as their upload timestamp.

### Why are the changes needed?

ExecutorPlugin doesn't work with Standalone Cluster or Kubernetes
when a jar containing the plugins, and the files those plugins use, are added via the --jars and --files options of spark-submit.

This is because jars and files added by --jars and --files are not loaded during Executor initialization.
I confirmed it works with YARN because there the jars/files are distributed via the distributed cache.

### Does this PR introduce _any_ user-facing change?

Yes. Jars/files added by --jars and --files are now downloaded on each executor during initialization.

### How was this patch tested?

Added a new test case.

Closes apache#28939 from sarutak/fix-plugin-issue.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
…ovement in SPARK-32119

### What changes were proposed in this pull request?
Update monitoring doc following the improvement/fix in SPARK-32119.

### Why are the changes needed?
SPARK-32119 removes the limitation noted in the monitoring doc: "Distribution of the jar files containing the plugin code is currently not done by Spark."

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not relevant

Closes apache#29463 from LucaCanali/followupSPARK32119.

Authored-by: Luca Canali <luca.canali@cern.ch>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
@sarutak
Member Author

sarutak commented Sep 1, 2020

cc: @tgravescs @mridulm @LucaCanali

@SparkQA

SparkQA commented Sep 1, 2020

Test build #128166 has finished for PR 29621 at commit f76814c.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 1, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/32791/

@SparkQA

SparkQA commented Sep 1, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/32791/

@HyukjinKwon
Member

retest this please

@HyukjinKwon
Member

cc @zhengruifeng since you're the release manager of 3.0.1

@SparkQA

SparkQA commented Sep 2, 2020

Test build #128172 has finished for PR 29621 at commit f76814c.

  • This patch fails Java style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 2, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/32797/

@SparkQA

SparkQA commented Sep 2, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/32797/

@zhengruifeng
Contributor

@HyukjinKwon Thanks for letting me know. It seems this issue may also exist in 2.4? I think you can vote -1 on the active release vote.

@HyukjinKwon
Member

Judging from the JIRA, it seems like it's not a blocker. I just cc'ed you as an FYI :-), but @sarutak, can you confirm whether this blocks the release?

@sarutak
Member Author

sarutak commented Sep 2, 2020

I don't think this is a blocker. We can't deploy plugins and their required files with --jars/--files, but as @tgravescs mentioned here, there is a workaround: install the plugins on all worker nodes beforehand and specify them with extraClassPath.
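For reference, that workaround amounts to installing the plugin jar at the same path on every node and pointing Spark at it via configuration, roughly like the spark-defaults.conf sketch below (the class name and path are hypothetical, not from this PR):

```
spark.plugins                    com.example.ExamplePlugin
spark.executor.extraClassPath    /opt/spark/plugins/example-plugin.jar
spark.driver.extraClassPath      /opt/spark/plugins/example-plugin.jar
```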

@dongjoon-hyun
Member

Ya. I agree that this is not a blocker for the 3.0.1 release.

@dongjoon-hyun
Member

Retest this please

@SparkQA

SparkQA commented Oct 28, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34981/

@SparkQA

SparkQA commented Oct 28, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34981/

@dongjoon-hyun
Member

Retest this please

@SparkQA

SparkQA commented Oct 28, 2020

Test build #130380 has finished for PR 29621 at commit f76814c.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Oct 28, 2020

Test build #130381 has finished for PR 29621 at commit f76814c.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

The AmpLab Jenkins server seems to have a broken Maven local repository again.

========================================================================
Running build tests
========================================================================
exec: curl -s -L https://downloads.lightbend.com/zinc/0.3.15/zinc-0.3.15.tgz
exec: curl -s -L https://downloads.lightbend.com/scala/2.12.10/scala-2.12.10.tgz
Using `mvn` from path: /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.6.3/bin/mvn
Using `mvn` from path: /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.6.3/bin/mvn
Performing Maven install for hadoop-2.7-hive-1.2
Using `mvn` from path: /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.6.3/bin/mvn
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install (default-cli) on project spark-launcher_2.12: ArtifactInstallerException: Failed to install metadata org.apache.spark:spark-launcher_2.12/maven-metadata.xml: Could not parse metadata /home/jenkins/.m2/repository/org/apache/spark/spark-launcher_2.12/maven-metadata-local.xml: in epilog non whitespace content is not allowed but got > (position: END_TAG seen ...</metadata>\n>... @13:2) -> [Help 1]
[ERROR] 

@SparkQA

SparkQA commented Oct 28, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34983/

@SparkQA

SparkQA commented Oct 28, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34984/

@SparkQA

SparkQA commented Oct 28, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34984/

@SparkQA

SparkQA commented Oct 28, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34983/

@SparkQA

SparkQA commented Oct 28, 2020

Test build #130378 has finished for PR 29621 at commit f76814c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

This PR hasn't passed the K8s integration tests since September.

The failures are in the following test cases. Recently, the Jenkins K8s IT seems to be unstable.

  • Run SparkPi with no resources (fails always, but known to be flaky)
  • Run SparkPi with a very long application name. (failed once in the runs above)
  • Run SparkR on simple dataframe.R example (fails always, but known to be flaky)

So, I verified manually on my laptop.

KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- Run SparkR on simple dataframe.R example
Run completed in 10 minutes, 17 seconds.

For the K8s IT failure, I've been taking a look independently.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Merged to branch-3.0 for Apache Spark 3.0.2.

dongjoon-hyun pushed a commit that referenced this pull request Oct 28, 2020
…Cluster and Kubernetes with --jars

### What changes were proposed in this pull request?

This is a backport PR for branch-3.0.

This PR changes Executor to load jars and files added by --jars and --files on Executor initialization.
To avoid downloading those jars/files twice, they are associated with `startTime` as their upload timestamp.

### Why are the changes needed?

ExecutorPlugin doesn't work with Standalone Cluster or Kubernetes
when a jar containing the plugins, and the files those plugins use, are added via the --jars and --files options of spark-submit.

This is because jars and files added by --jars and --files are not loaded during Executor initialization.
I confirmed it works with YARN because there the jars/files are distributed via the distributed cache.

### Does this PR introduce _any_ user-facing change?

Yes. Jars/files added by --jars and --files are now downloaded on each executor during initialization.

### How was this patch tested?

Added a new test case.

Closes #29621 from sarutak/fix-plugin-issue-3.0.

Lead-authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Co-authored-by: Luca Canali <luca.canali@cern.ch>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@sarutak
Member Author

sarutak commented Oct 29, 2020

@dongjoon-hyun @tgravescs Thank you for taking a look at this PR again!

@sarutak sarutak deleted the fix-plugin-issue-3.0 branch June 4, 2021 20:43