[SPARK-7213] [YARN] Check for read permissions before copying a Hadoop config file #5760

nishkamravi2 · 2015-04-28T22:07:39Z

No description provided.


AmplabJenkins · 2015-04-28T22:12:10Z

Can one of the admins verify this patch?

srowen · 2015-04-29T15:46:01Z

When would the permissions not be uniform? Copying just a subset of the files, instead of failing, seems surprising; this sounds like an environment problem. I don't think it's good practice to catch Exception here, even if you log it. Continuing is not necessarily the right thing to do.

nishkamravi2 · 2015-04-29T18:16:36Z

It's not an environment problem; permissions are generally non-uniform. This isn't a recent document, but it might help: https://hadoop.apache.org/docs/r2.5.0/hadoop-project-dist/hadoop-common/SecureMode.html. The catch block is debatable; I'm deleting it to avoid further argument.

srowen · 2015-04-29T22:02:44Z

In my installation, all files are owned by root and world-readable. What's the case where normal users of Hadoop can't see some of the config files?

nishkamravi2 · 2015-04-29T22:33:53Z

In my installation, container-executor.cfg is readable only by root (generated by CDH/CM). The link in my comment above should help answer your question; see its configuration section for the expected permissions.
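To see which files in a conf directory are in the state described above (for example, readable only by root), one option is to print each file's owner and POSIX mode. A minimal sketch, assuming a POSIX file system and Java 7+ NIO; `describeConfFile` is a hypothetical helper, not part of Spark or Hadoop:

```scala
import java.nio.file.{Files, Path}
import java.nio.file.attribute.PosixFilePermissions

// Hypothetical diagnostic helper: one line per file with its owner, its POSIX
// mode (e.g. "r--------"), and whether the current JVM user can read it.
// Files.getPosixFilePermissions throws UnsupportedOperationException on
// non-POSIX file systems, so this assumes Linux/Unix hosts.
def describeConfFile(p: Path): String = {
  val owner = Files.getOwner(p).getName
  val mode = PosixFilePermissions.toString(Files.getPosixFilePermissions(p))
  s"${p.getFileName} owner=$owner mode=$mode readable=${Files.isReadable(p)}"
}
```

Running it over a conf directory as the submitting user makes a root-only file such as container-executor.cfg stand out as `readable=false`.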

srowen · 2015-04-29T22:39:16Z

yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala

-          hadoopConfStream.putNextEntry(new ZipEntry(name))
-          Files.copy(file, hadoopConfStream)
-          hadoopConfStream.closeEntry()
+          if(file.canRead()) {


OK, yeah, I see that in the docs, though it's not set up that way in CDH (at least, maybe my installation never needed to configure that file a certain way, I don't know). It seems bad to silently ignore unreadable files, so maybe at least log it? Then... should it be a warning? It sounds like there's one file that could reasonably be expected to be unreadable. Do we special-case it and warn on anything else? Fail on anything else? I'd rather tighten this up in some way than do it silently.

nishkamravi2 · 2015-04-29

If a file that this user is not supposed to read is not copied, that's expected behavior. I don't think we should log or flag it as a problem.
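The review hunk above stops at the new `if`. A minimal sketch of the guarded copy it introduces, assuming a single conf directory is zipped; `zipReadableConfFiles` is a hypothetical name, and the sketch uses `java.nio.file.Files.copy(Path, OutputStream)` in place of the `Files.copy(file, hadoopConfStream)` call in the hunk (which, since it takes a `java.io.File`, appears to be Guava's):

```scala
import java.io.{File, FileOutputStream}
import java.nio.file.Files
import java.util.zip.{ZipEntry, ZipOutputStream}

// Hypothetical helper (not Spark's actual code): zip every readable regular file
// in confDir into zipFile, and return the names of files skipped because
// canRead() returned false for the current user.
def zipReadableConfFiles(confDir: File, zipFile: File): Seq[String] = {
  // listFiles() returns null if confDir is not a listable directory
  val files = Option(confDir.listFiles()).getOrElse(Array.empty[File]).filter(_.isFile)
  val out = new ZipOutputStream(new FileOutputStream(zipFile))
  val skipped = Seq.newBuilder[String]
  try {
    files.sortBy(_.getName).foreach { file =>
      if (file.canRead()) {
        out.putNextEntry(new ZipEntry(file.getName))
        Files.copy(file.toPath, out) // copies all bytes of the file into the current entry
        out.closeEntry()
      } else {
        skipped += file.getName
      }
    }
  } finally {
    out.close()
  }
  skipped.result()
}
```

Returning the skipped names instead of discarding them leaves the log-or-stay-silent question debated in this thread to the caller.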

srowen · 2015-04-29T22:49:55Z

I'm thinking of this case: the whole directory is not readable because it's not set up properly, so no files are copied. That could be hard to debug. At least an INFO log? This should be a rare case in any event.
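The failure mode described here (the directory itself is unreadable, so nothing is copied) can be told apart from per-file skips, because `java.io.File.listFiles()` returns `null` when a directory cannot be listed. A sketch with hypothetical names (`ConfDirStatus`, `inspectConfDir`), not taken from the PR:

```scala
import java.io.File

// Hypothetical classification separating "directory missing", "directory not
// listable" (the case raised above) and "listed, with some unreadable files".
sealed trait ConfDirStatus
case object DirMissing extends ConfDirStatus // absent or not a directory
case object DirUnlistable extends ConfDirStatus // listFiles() returned null
final case class Listed(readable: Seq[String], unreadable: Seq[String]) extends ConfDirStatus

def inspectConfDir(dir: File): ConfDirStatus =
  if (!dir.isDirectory) DirMissing
  else Option(dir.listFiles()) match {
    case None => DirUnlistable
    case Some(entries) =>
      // Split regular files by whether the current user can read them
      val (ok, bad) = entries.filter(_.isFile).partition(_.canRead())
      Listed(ok.map(_.getName).sorted.toSeq, bad.map(_.getName).sorted.toSeq)
  }
```

With this split, `DirUnlistable` could be reported more loudly than an individual unreadable file.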

nishkamravi2 · 2015-04-29T22:50:51Z

I think that's a whole different issue. You could have config files altogether missing and the same problem applies.

srowen · 2015-04-29T22:59:52Z

Sure, though in that case there's no real surprise: X didn't exist to begin with, so it never showed up on Y. If X exists, I might scratch my head for a long while wondering why only it wasn't copied. I think this needs something more than silence. @sryza @vanzin et al., in case you have an opinion.

nishkamravi2 · 2015-04-29T23:08:26Z

I think you are mixing two separate issues. This PR makes Spark work with the expected setup. Sanity-checking unexpected behavior (an anomalous Hadoop config file setup) is a separate problem, and I don't think we want to solve it here.

Also adding @andrewor and @pwendell in case they have something to add.

vanzin · 2015-04-29T23:11:58Z

Don't really have an opinion here; a generic warning ("Some config files could not be copied") might either help or scare users, so not sure it's worth it.

srowen · 2015-05-01T08:06:07Z

A warning is too much, but an INFO log at least? What's the harm?
In the absence of any other support for that, though, I'll go ahead in a day or two. I can fix up the missing space after `if`.

tgravescs · 2015-05-01T19:55:23Z

I can go either way on this too. I can see an INFO message being useful for debugging if things go wrong, but if a user has a lot of files in there that aren't readable, it could just be noise in the normal case.
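One way to reconcile the INFO-log suggestion with the concern about noise is to emit at most one summary line per directory and cap how many names it lists. A sketch under that assumption; `skippedFilesMessage` and `maxNames` are hypothetical, and the caller would pass the result to whatever logger is in use (for example Spark's `logInfo`):

```scala
// Hypothetical: build at most one summary line per conf directory, listing at
// most maxNames file names, so many unreadable files still yield a single line.
def skippedFilesMessage(dir: String, skipped: Seq[String], maxNames: Int = 5): Option[String] =
  if (skipped.isEmpty) None // nothing skipped: stay silent
  else {
    val names = skipped.sorted
    val shown = names.take(maxNames).mkString(", ")
    val more = if (names.size > maxNames) s" and ${names.size - maxNames} more" else ""
    Some(s"Not copying ${names.size} unreadable file(s) from $dir: $shown$more")
  }
```

Usage would look like `skippedFilesMessage(dir.getPath, skipped).foreach(msg => logInfo(msg))`; returning `Option` keeps the normal case (no skipped files) completely quiet.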

…p config file

Author: Nishkam Ravi <nravi@cloudera.com>
Author: nishkamravi2 <nishkamravi@gmail.com>
Author: nravi <nravi@c1704.halxg.cloudera.com>

Closes apache#5760 from nishkamravi2/master_nravi and squashes the following commits:

eaa13b5 [nishkamravi2] Update Client.scala
981afd2 [Nishkam Ravi] Check for read permission before initiating copy
1b81383 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
0f1abd0 [nishkamravi2] Update Utils.scala
474e3bf [nishkamravi2] Update DiskBlockManager.scala
97c383e [nishkamravi2] Update Utils.scala
8691e0c [Nishkam Ravi] Add a try/catch block around Utils.removeShutdownHook
2be1e76 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
1c13b79 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
bad4349 [nishkamravi2] Update Main.java
36a6f87 [Nishkam Ravi] Minor changes and bug fixes
b7f4ae7 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
4a45d6a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
458af39 [Nishkam Ravi] Locate the jar using getLocation, obviates the need to pass assembly path as an argument
d9658d6 [Nishkam Ravi] Changes for SPARK-6406
ccdc334 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
3faa7a4 [Nishkam Ravi] Launcher library changes (SPARK-6406)
345206a [Nishkam Ravi] spark-class merge Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
ac58975 [Nishkam Ravi] spark-class changes
06bfeb0 [nishkamravi2] Update spark-class
35af990 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
32c3ab3 [nishkamravi2] Update AbstractCommandBuilder.java
4bd4489 [nishkamravi2] Update AbstractCommandBuilder.java
746f35b [Nishkam Ravi] "hadoop" string in the assembly name should not be mandatory (everywhere else in spark we mandate spark-assembly*hadoop*.jar)
bfe96e0 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
ee902fa [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
d453197 [nishkamravi2] Update NewHadoopRDD.scala
6f41a1d [nishkamravi2] Update NewHadoopRDD.scala
0ce2c32 [nishkamravi2] Update HadoopRDD.scala
f7e33c2 [Nishkam Ravi] Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
ba1eb8b [Nishkam Ravi] Try-catch block around the two occurrences of removeShutDownHook. Deletion of semi-redundant occurrences of expensive operation inShutDown.
71d0e17 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
494d8c0 [nishkamravi2] Update DiskBlockManager.scala
3c5ddba [nishkamravi2] Update DiskBlockManager.scala
f0d12de [Nishkam Ravi] Workaround for IllegalStateException caused by recent changes to BlockManager.stop
79ea8b4 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
b446edc [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
5c9a4cb [nishkamravi2] Update TaskSetManagerSuite.scala
535295a [nishkamravi2] Update TaskSetManager.scala
3e1b616 [Nishkam Ravi] Modify test for maxResultSize
9f6583e [Nishkam Ravi] Changes to maxResultSize code (improve error message and add condition to check if maxResultSize > 0)
5f8f9ed [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
636a9ff [nishkamravi2] Update YarnAllocator.scala
8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead
35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead
5ac2ec1 [Nishkam Ravi] Remove out
dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead issue
42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue
362da5e [Nishkam Ravi] Additional changes for yarn memory overhead
c726bd9 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead
1cf2d1e [nishkamravi2] Update YarnAllocator.scala
ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an additive constant to a multiplier (redone to resolve merge conflicts)
2e69f11 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
efd688a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark
2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark
3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles

nishkamravi2 and others added 30 commits June 3, 2014 15:28

Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles
The prefix "file:" is missing in the string inserted as key in HashMap

681b36f

Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)

5108700

Undo the fix for SPARK-1758 (the problem is fixed)

6b840f0

Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)

df2aeb1

Merge branch 'master' of https://github.com/apache/spark

eb663ca

Merge branch 'master' of https://github.com/apache/spark

5423a03

Merge branch 'master' of https://github.com/apache/spark

3bf8fad

Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark

2b630f9

Merge branch 'master' of https://github.com/apache/spark

efd688a

Merge branch 'master' of https://github.com/apache/spark into master_nravi

2e69f11

Modify default YARN memory_overhead-- from an additive constant to a multiplier (redone to resolve merge conflicts)

ebcde10

Update YarnAllocator.scala

1cf2d1e

Improving logging for AM memoryOverhead

f00fa31

Merge branch 'master' of https://github.com/apache/spark into master_nravi
Conflicts: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala

c726bd9

Additional changes for yarn memory overhead

362da5e

Additional changes for yarn memory overhead issue

42c2c3d

Additional documentation for yarn memory overhead issue

dac1047

Remove out

5ac2ec1

Slight change in the doc for yarn memory overhead

35daa64

Doc change for yarn memory overhead

8f76c8b

Update YarnAllocator.scala

636a9ff

Changes to maxResultSize code (improve error message and add condition to check if maxResultSize > 0)

9f6583e

Modify test for maxResultSize

3e1b616

Update TaskSetManager.scala

535295a

Update TaskSetManagerSuite.scala

5c9a4cb

Merge branch 'master' of https://github.com/apache/spark into master_nravi

b446edc

Merge branch 'master' of https://github.com/apache/spark into master_nravi

79ea8b4

Workaround for IllegalStateException caused by recent changes to BlockManager.stop

f0d12de

Update DiskBlockManager.scala

3c5ddba

nishkamravi2 and others added 11 commits March 26, 2015 22:38

Merge branch 'master' of https://github.com/apache/spark into master_nravi

b7f4ae7

Minor changes and bug fixes

36a6f87

Update Main.java

bad4349

Merge branch 'master' of https://github.com/apache/spark into master_nravi
Conflicts: bin/spark-class bin/spark-class2.cmd launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java

1c13b79

Merge branch 'master' of https://github.com/apache/spark into master_nravi

2be1e76

Add a try/catch block around Utils.removeShutdownHook

8691e0c

Update Utils.scala

97c383e

Update DiskBlockManager.scala

474e3bf

Update Utils.scala

0f1abd0

Merge branch 'master' of https://github.com/apache/spark into master_nravi

1b81383

Check for read permission before initiating copy

981afd2

nishkamravi2 changed the title ~~Check for read permissions before copying a Hadoop config file~~ [SPARK-7213] [YARN] Check for read permissions before copying a Hadoop config file Apr 28, 2015

Update Client.scala

eaa13b5

srowen reviewed Apr 29, 2015

asfgit closed this in f53a488 May 1, 2015
