Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-13441] [YARN] Fix NPE in yarn Client.createConfArchive method #11337

Closed
wants to merge 1 commit into from

Conversation

chtyim
Copy link
Member

@chtyim chtyim commented Feb 24, 2016

What changes were proposed in this pull request?

Instead of using result of File.listFiles() directly, which may throw NPE, check for null first. If it is null, log a warning instead

How was the this patch tested?

Ran the ./dev/run-tests locally
Tested manually on a cluster

- Log a warning instead of throwing NPE if either
  $HADOOP_CONF_DIR or $YARN_CONF_DIR is not accessible.
hadoopConfFiles(file.getName()) = file
val files = dir.listFiles()
if (files == null) {
logWarning("Failed to list files under directory " + dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that means it (necessarily) failed; it can return null if it's empty. I think a simple != null check is all you want here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the Java API doc, it returns empty array if the directory is empty. It only returns null if it is not a directory or have IO error (permission issue). Without logging a warning, we might be silently ignoring misconfiguration.

Ref: https://docs.oracle.com/javase/7/docs/api/java/io/File.html#listFiles()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah fair point, it has already been checked with isDirectory. OK this looks good to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this could be s"Failed to list files under directory $dir" but I don't think it's worth changing

@chtyim
Copy link
Member Author

chtyim commented Feb 24, 2016

Thanks @srowen for the review. Do I need to do anything to get this PR merge?

@jerryshao
Copy link
Contributor

Is it better to throw an exception and fail fast? I'm wondering the behavior of application is still correct if we miss these Hadoop related configurations.

@chtyim
Copy link
Member Author

chtyim commented Feb 25, 2016

The reason why failing is bad in this case is because if either the HADOOP_CONF_DIR or the YARN_CONF_DIR is not readable, exception is raised, even if one of those directories already provides all the hadoop conf files that are needed by Spark.

E.g. in one of our use case, we actually launch Spark job from a YARN container, which the HADOOP_CONF_DIR is set to be a private directory that is owned by the NM (that's how Cloudera CM works anyway), which the container fails to read. Even we try to workaround it by setting the YARN_CONF_DIR to a readable directory, however, due to the exception being raised, we can't submit the job even those hadoop files are available.

@srowen
Copy link
Member

srowen commented Feb 25, 2016

Jenkins, test this please

@SparkQA
Copy link

SparkQA commented Feb 25, 2016

Test build #51966 has finished for PR 11337 at commit a8fb7cb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Feb 25, 2016

Merged to master/1.6

asfgit pushed a commit that referenced this pull request Feb 25, 2016
## What changes were proposed in this pull request?

Instead of using result of File.listFiles() directly, which may throw NPE, check for null first. If it is null, log a warning instead

## How was the this patch tested?

Ran the ./dev/run-tests locally
Tested manually on a cluster

Author: Terence Yim <terence@cask.co>

Closes #11337 from chtyim/fixes/SPARK-13441-null-check.

(cherry picked from commit fae88af)
Signed-off-by: Sean Owen <sowen@cloudera.com>
@asfgit asfgit closed this in fae88af Feb 25, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants