-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-21714][CORE][BACKPORT-2.2] Avoiding re-uploading remote resources in yarn client mode #19074
Conversation
…arn client mode With SPARK-10643, Spark supports download resources from remote in client deploy mode. But the implementation overrides variables which representing added resources (like `args.jars`, `args.pyFiles`) to local path, And yarn client leverage this local path to re-upload resources to distributed cache. This is unnecessary to break the semantics of putting resources in a shared FS. So here proposed to fix it. This is manually verified with jars, pyFiles in local and remote storage, both in client and cluster mode. Author: jerryshao <sshao@hortonworks.com> Closes apache#18962 from jerryshao/SPARK-21714. (cherry picked from commit 1813c4a) Signed-off-by: jerryshao <sshao@hortonworks.com> Change-Id: Ib2e8cb056707b362bc1c496002bac1472dc78ea7
Test build #81201 has started for PR 19074 at commit |
CC @vanzin @tgravescs please review. Since 2.2 and master are quite different in this part of code, so backporting changes a lot. |
Jenkins, retest this please. |
Test build #81205 has finished for PR 19074 at commit
|
LGTM. Merging to 2.2. |
Merged. @jerryshao please close manually. |
…ces in yarn client mode ## What changes were proposed in this pull request? This is a backport PR to fix issue of re-uploading remote resource in yarn client mode. The original PR is #18962. ## How was this patch tested? Tested in local UT. Author: jerryshao <sshao@hortonworks.com> Closes #19074 from jerryshao/SPARK-21714-2.2-backport.
@jerryshao looks like this breaks the 2.10 build: I reverted the patch waiting for the fix. |
Ohh, sorry about it, I forgot to fix it in 2.10 repl code. I will push a fix soon. BTW how do we trigger scala 2.10 build on Jenkins? |
I don't think there's a way to do that (the build Sean mentions happens after PRs are committed). |
OK, so I will do the test locally. |
Change-Id: Ic7ae6152187ce7ee2e527340ecc7e178da970d67
871a005
to
6ba3367
Compare
Test build #81242 has finished for PR 19074 at commit
|
Test build #81243 has finished for PR 19074 at commit
|
Alright, let's try again. Merging to 2.2; make sure to close this PR manually. |
…ces in yarn client mode ## What changes were proposed in this pull request? This is a backport PR to fix issue of re-uploading remote resource in yarn client mode. The original PR is #18962. ## How was this patch tested? Tested in local UT. Author: jerryshao <sshao@hortonworks.com> Closes #19074 from jerryshao/SPARK-21714-2.2-backport.
Thanks @vanzin , it should be passed now 😄 . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jerryshao @gatorsmile This PR broke spark submit file downloading in standalone cluster mode. Basically, it doesn't use local file consistently after download. Can you take a look?
@@ -366,7 +376,7 @@ object SparkSubmit extends CommandLineUtils { | |||
// If a python file is provided, add it to the child arguments and list of files to deploy. | |||
// Usage: PythonAppRunner <main python file> <extra python files> [app arguments] | |||
args.mainClass = "org.apache.spark.deploy.PythonRunner" | |||
args.childArgs = ArrayBuffer(args.primaryResource, args.pyFiles) ++ args.childArgs | |||
args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ args.childArgs | |||
if (clusterManager != YARN) { | |||
// The YARN backend distributes the primary file differently, so don't merge it. | |||
args.files = mergeFileLists(args.files, args.primaryResource) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a behavior change here. I think we should use localFiles
and localPrimaryResource
instead of args.files
and args.primaryResource
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you think it should use local files/primary resource here? args.files
will be assigned to "spark.files" for non-yarn deploy, and Spark's fileserver will download them to local for all the executors, so it should be fine to keep as remote resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here the changes restricted to client deploy mode, I'm not sure why it is related to standalone cluster mode?
@@ -376,8 +386,8 @@ object SparkSubmit extends CommandLineUtils { | |||
// The YARN backend handles python files differently, so don't merge the lists. | |||
args.files = mergeFileLists(args.files, args.pyFiles) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same reason as above.
var localPrimaryResource: String = null | ||
var localJars: String = null | ||
var localPyFiles: String = null | ||
var localFiles: String = null | ||
if (deployMode == CLIENT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to avoid download for yarn, can we just check the cluster mode it here?
@loneknightpy can you file a new bug instead of comment on a closed PR? |
ping @jerryshao |
@loneknightpy did you open a new JIRA about this issue? AFAIK, downloading resources to local disk is not supported for cluster mode even from beginning, would you please elaborate the issue in JIRA, thanks! |
@loneknightpy can you please elaborate more about the issue? I believe you brought this remote resources support in #18078. It doesn't support cluster mode from beginning. Also your original works brings in regression in running on yarn, that's why I made a change here. Maybe there's some changes for standalone cluster mode which assumes local resources, but I don't think it's a expected behavior. |
…ces in yarn client mode ## What changes were proposed in this pull request? This is a backport PR to fix issue of re-uploading remote resource in yarn client mode. The original PR is apache#18962. ## How was this patch tested? Tested in local UT. Author: jerryshao <sshao@hortonworks.com> Closes apache#19074 from jerryshao/SPARK-21714-2.2-backport.
…ces in yarn client mode ## What changes were proposed in this pull request? This is a backport PR to fix issue of re-uploading remote resource in yarn client mode. The original PR is apache#18962. ## How was this patch tested? Tested in local UT. Author: jerryshao <sshao@hortonworks.com> Closes apache#19074 from jerryshao/SPARK-21714-2.2-backport.
What changes were proposed in this pull request?
This is a backport PR to fix issue of re-uploading remote resource in yarn client mode. The original PR is #18962.
How was this patch tested?
Tested in local UT.