[SPARK-14705][YARN] Support multiple FileSystem for YARN staging dir #12473
lianhuiwang wants to merge 5 commits into apache:master
// and add them as local resources to the application master.
val fs = FileSystem.get(hadoopConf)
val dst = getAppStagingDirPath(sparkConf, fs, appStagingDir)
val dst = new Path(appStagingBaseDir, appStagingDir)
You could pass appStagingDir as a Path to this method and save some duplication; same for setupLaunchEnv below.
This whole class could use some cleanup in that regard, but these two are pretty low-hanging fruit.
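A minimal sketch of that suggestion, assuming Hadoop's `org.apache.hadoop.fs.Path` is on the classpath. The object and method bodies here are hypothetical stand-ins, not the actual `Client` code: the point is only that the staging `Path` is built once and then handed to both methods instead of being rebuilt from a string in each.

```scala
import org.apache.hadoop.fs.Path

// Hypothetical stand-ins for the methods mentioned in the review comment.
object StagingPathSketch {
  // Build the application staging Path once from the base dir and the
  // per-application subdirectory.
  def appStagingPath(appStagingBaseDir: Path, appStagingDir: String): Path =
    new Path(appStagingBaseDir, appStagingDir)

  // Takes the already-built Path, so no path construction is duplicated here.
  def prepareLocalResources(destDir: Path): String =
    s"uploading resources to $destDir"

  // Same Path reused for the launch environment (the env key is an assumption).
  def setupLaunchEnv(stagingDirPath: Path): Map[String, String] =
    Map("SPARK_YARN_STAGING_DIR" -> stagingDirPath.toString)
}
```

With this shape, the caller computes `appStagingPath(...)` once and passes the result to both methods.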
Test build #56064 has finished for PR 12473 at commit
The test section says N/A; what testing have you done with this? At the very least I would like to see a manual regression test on HDFS, and I assume you are making this change to talk to some other filesystem, so which other filesystem was it tested with?
@vanzin Yes, I have updated the code per your comments. Thanks.
Test build #56182 has finished for PR 12473 at commit
Test build #56186 has finished for PR 12473 at commit
So my understanding is that this actually supports an HDFS other than the default one, not multiple HDFS instances; is that right?
@jerryshao Yes, what you said is right.
Test build #56188 has finished for PR 12473 at commit
(This is super minor, but I remember being told it is better to add those cc's in comments rather than in the description, because the PR description is the place to describe the PR.)
@HyukjinKwon the merge scripts clean up "@" references from the PR summary.
LGTM, merging to master.
@vanzin Thank you! But it might still look a bit weird that there is a cc in the PR description above.
@HyukjinKwon @vanzin Thanks. I have updated the PR description, but @vanzin had already merged to master, so I think it does not matter for this PR.
What changes were proposed in this pull request?
SPARK-13063 made the Spark YARN staging directory configurable, but it only supports the default FileSystem. When there are many clusters, different clusters can use different FileSystems, so the staging directory may live on a FileSystem other than the default one.
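The idea can be sketched as follows, assuming the Hadoop client libraries are on the classpath. `StagingDirFs` and `resolve` are hypothetical names for illustration, not the code in this patch; the sketch relies only on the Hadoop calls `Path.getFileSystem(Configuration)`, `FileSystem.get(Configuration)`, and `FileSystem.getHomeDirectory`. When a staging dir is configured, the FileSystem is taken from that path's own scheme and authority rather than from the default FileSystem.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, not the actual Spark implementation.
object StagingDirFs {
  // Returns the FileSystem that owns the staging dir together with the base Path.
  def resolve(stagingDir: Option[String], hadoopConf: Configuration): (FileSystem, Path) =
    stagingDir match {
      case Some(dir) =>
        // A qualified path (for example one with an hdfs scheme and a
        // non-default namenode) selects its own FileSystem; an unqualified
        // path falls back to the default FileSystem.
        val path = new Path(dir)
        (path.getFileSystem(hadoopConf), path)
      case None =>
        // Assumed fallback: the user's home directory on the default FileSystem.
        val fs = FileSystem.get(hadoopConf)
        (fs, fs.getHomeDirectory)
    }
}
```

The value passed as `stagingDir` would come from `spark.yarn.stagingDir`, the setting used in the test commands in this description.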
How was this patch tested?
I have tested it successfully with the following commands:
MASTER=yarn-client ./bin/spark-shell --conf spark.yarn.stagingDir=hdfs:namenode2/temp
$SPARK_HOME/bin/spark-submit --conf spark.yarn.stagingDir=hdfs:namenode2/temp