
[SPARK-14423][YARN] Avoid same name files added to distributed cache again #12203

Closed
wants to merge 2 commits into from

Conversation

jerryshao
Contributor

What changes were proposed in this pull request?

In the current assembly-free Spark deployment, the jars under assembly/target/scala-xxx/jars are uploaded to the distributed cache by default. The names of these jars can conflict with the names of jars specified via --jars, which causes an exception when the application starts:

client token: N/A
     diagnostics: Application application_1459907402325_0004 failed 2 times due to AM Container for appattempt_1459907402325_0004_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://hw12100.local:8088/proxy/application_1459907402325_0004/Then, click on links to logs of each attempt.
Diagnostics: Resource hdfs://localhost:8020/user/sshao/.sparkStaging/application_1459907402325_0004/avro-mapred-1.7.7-hadoop2.jar changed on src filesystem (expected 1459909780508, was 1459909782590
java.io.IOException: Resource hdfs://localhost:8020/user/sshao/.sparkStaging/application_1459907402325_0004/avro-mapred-1.7.7-hadoop2.jar changed on src filesystem (expected 1459909780508, was 1459909782590
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

This patch checks file names so that files with the same name are not uploaded again.
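The dedup idea can be sketched roughly as follows (a hypothetical illustration with made-up names, not the actual Client.scala code):

```scala
import scala.collection.mutable

// Hypothetical sketch of name-based dedup for the staging directory: a file
// is only staged if no previously staged file carries the same base name.
object DistributeSketch {
  private val stagedNames = mutable.HashSet[String]()

  /** Returns the staged name, or None when a file with the same base name
    * was already added to the distributed cache. */
  def distribute(uri: String): Option[String] = {
    val name = uri.substring(uri.lastIndexOf('/') + 1)
    if (stagedNames.add(name)) Some(name) else None
  }
}
```

With a check like this, a --jars entry whose base name matches a jar already contributed by the assembly directory is skipped instead of overwriting the staged copy and tripping YARN's "changed on src filesystem" timestamp check.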

How was this patch tested?

Unit tests and a manual integration test were run locally.

@SparkQA

SparkQA commented Apr 6, 2016

Test build #55105 has finished for PR 12203 at commit 7ff58be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -519,8 +528,7 @@ private[spark] class Client(
).foreach { case (flist, resType, addToClasspath) =>
flist.foreach { file =>
val (_, localizedPath) = distribute(file, resType = resType)
require(localizedPath != null)
if (addToClasspath) {
if (addToClasspath && localizedPath != null) {
Contributor

what is this change about?

Contributor Author

@andrewor14, the previous code assumed all files would be uploaded to the distributed cache, so localizedPath could never be null. With my change, duplicated files are skipped and localizedPath is returned as null instead, so I changed the check accordingly.

@andrewor14
Contributor

Looks good.

@vanzin
Contributor

vanzin commented Apr 7, 2016

Change looks ok (should mostly affect local builds where example jars have duplicates), but could you add a small unit test in ClientSuite.scala?

@jerryshao
Contributor Author

Sure, will do.

@SparkQA

SparkQA commented Apr 8, 2016

Test build #55302 has finished for PR 12203 at commit e1b09c4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor Author

@vanzin , please help to review again, thanks a lot.

val jar1 = TestUtils.createJarWithFiles(Map(), jarsDir)
val jar2 = TestUtils.createJarWithFiles(Map(), userLibs)
// Copy jar2 to jar3 with same name
val jar3 = {
Contributor

You could have used java.nio.file.Files.copy, but no need to change that now.
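The java.nio.file.Files.copy variant mentioned here might look like this (a sketch with placeholder names, not the actual ClientSuite.scala code):

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

object CopyJarSketch {
  /** Copies `srcJar` into `destDir`, keeping the same file name, so the
    * copy ends up with an identical base name in a different directory. */
  def copySameName(srcJar: Path, destDir: Path): Path = {
    val dest = destDir.resolve(srcJar.getFileName)
    Files.copy(srcJar, dest, StandardCopyOption.REPLACE_EXISTING)
  }
}
```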

@vanzin
Contributor

vanzin commented Apr 18, 2016

LGTM, merging to master.

@asfgit asfgit closed this in d6fb485 Apr 18, 2016
zzcclp added a commit to zzcclp/spark that referenced this pull request Apr 19, 2016
lw-lin pushed a commit to lw-lin/spark that referenced this pull request Apr 20, 2016
@RicoGit

RicoGit commented Jul 3, 2016

Hi guys, is it possible to apply this patch to version 1.6? What can I do for this?

@jerryshao
Contributor Author

@RicoGit This is a behavior change for uploading jars to the distributed cache, so I'm not sure it is suitable to back-port to branch 1.6. Also, this problem is less severe in 1.6, since we build an assembly for packaging.

@RicoGit

RicoGit commented Jul 4, 2016

Thanks for the reply. I had a problem running a Spark job with Oozie, and this patch solves it. I applied the patch to Spark 1.6, built spark-yarn_2.10-1.6.0-cdh5.7.0.jar, and put it into Oozie's sharedLibs.

@jerryshao
Contributor Author

Can you make sure the problem you hit is exactly the one this PR solved? The exception stack you pasted on StackOverflow is different from what I pasted here. From your stack trace, my guess is that the same jar (same path and same file name) was added twice, which is slightly different from the problem this PR addresses.

@RicoGit

RicoGit commented Jul 4, 2016

Thanks, I understand these are different problems. What would you advise? I don't think require(localizedPath != null) is a good solution: it just fails with the message "requirement failed". It would be better to skip adding the file to the distributed cache and log a warning. Do you think it is worth opening an issue?

@jerryshao
Contributor Author

Maybe, as you mentioned, skipping the add to the distributed cache and logging a warning is enough; throwing an exception fails the application, and this is not actually a fatal problem. I'm OK with changing the current behavior for this. What do you think, @vanzin?

@vanzin
Contributor

vanzin commented Jul 5, 2016

I think there was a version of Oozie that triggered that assert, so maybe upgrading Oozie fixes the problem. It's probably also fine to remove the assert, since we haven't seen many people hit it, which suggests this situation is rare.

And, btw, please avoid long discussions on closed PRs. That's why we have mailing lists and JIRA.

asfgit pushed a commit that referenced this pull request Nov 3, 2016
… --files and --archives

## What changes were proposed in this pull request?

During spark-submit, if the YARN distributed cache is instructed to add the same file under both --files and --archives, this code change retains the existing Spark YARN distributed cache behaviour: warn and fail when the same file is mentioned in both --files and --archives.
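The described check can be sketched as follows (a hypothetical helper, not the actual Client.scala code):

```scala
object FilesVsArchivesSketch {
  /** Fails submission when the same path appears under both --files and
    * --archives; disjoint lists pass through unchanged. */
  def checkNoOverlap(files: Seq[String], archives: Seq[String]): Unit = {
    val dup = files.toSet.intersect(archives.toSet)
    require(dup.isEmpty,
      s"Same path(s) found in both --files and --archives: ${dup.mkString(", ")}")
  }
}
```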
## How was this patch tested?

Manually tested:
1. If the same jar is mentioned in --jars and --files, submission continues (the [SPARK-14423] #12203 functionality is unchanged).
2. If the same file is mentioned in --files and --archives, submission fails.

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.

… under archives and files

Author: Kishor Patil <kpatil@yahoo-inc.com>

Closes #15627 from kishorvpatil/spark18099.

(cherry picked from commit 098e4ca)
Signed-off-by: Tom Graves <tgraves@yahoo-inc.com>
asfgit pushed a commit that referenced this pull request Nov 3, 2016
@kishorvpatil
Contributor

@vanzin, @jerryshao
Sorry for breaking this functionality. I have the patch available with more unit tests added to ensure positive test case ensuring submission continues if unique files/archives are mentioned.

#15810

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017