[BEAM-1491] Identify HADOOP_CONF_DIR (or YARN_CONF_DIR) environment variables #2819
wypb wants to merge 1 commit into apache:master from wypb:BEAM-1491
Hi @lukecwik, could you please review my PR? Thank you.
lukecwik left a comment:
Please also investigate the test failures that Jenkins has reported:
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/10512/testReport/
// We just need to load both core-site.xml and hdfs-site.xml to determine the
// default fs path and the hdfs configuration
if (new File(hadoopConfPath + "/core-site.xml").exists()) {
Please use the two-argument File/Path constructors here and below so we aren't assuming how path resolution works ("/" is a common separator, but not every file system uses it):
new File(hadoopConfPath, "core-site.xml");
new Path(hadoopConfPath, "core-site.xml");
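A minimal, stdlib-only sketch of the suggested resolution, assuming the two site files named in the code comment above are the only ones to look for. `SiteFiles` and `existingSiteFiles` are hypothetical names used for illustration; in the PR itself each found name would instead be handed to Hadoop as `conf.addResource(new Path(confDir, name))`.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the PR or of Beam.
class SiteFiles {
  // Site files the code comment says we need (assumed to be the full list).
  static final String[] SITE_FILES = {"core-site.xml", "hdfs-site.xml"};

  /** Returns the site files that exist as regular files under confDir. */
  static List<File> existingSiteFiles(String confDir) {
    List<File> found = new ArrayList<>();
    for (String name : SITE_FILES) {
      // File(String parent, String child) joins the parts with the platform
      // separator instead of hard-coding "/" via string concatenation.
      File candidate = new File(confDir, name);
      if (candidate.isFile()) {
        found.add(candidate);
      }
    }
    return found;
  }
}
```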
if (new File(hadoopConfPath + "/core-site.xml").exists()) {
conf.addResource(new Path(hadoopConfPath + "/core-site.xml"));
if (LOG.isDebugEnabled()) {
Please use parameterized messages:
if (LOG.isDebugEnabled()) {
LOG.debug("Log " + x);
}
becomes
LOG.debug("Log {}", x);
Configuration conf = new Configuration(false);
List<String> hadoopEnvList = Lists.newArrayList("HADOOP_CONF_DIR", "YARN_CONF_DIR");
for (String env : hadoopEnvList) {
String hadoopConfPath = System.getenv(env);
Add unit tests.
Note that System.getenv isn't mockable, so the best bet is to add a method on ConfigurationLocator like:
@VisibleForTesting
Map<String, String> getEnvironment() {
  return System.getenv();
}
and spy on it in your unit tests.
See GcpOptions and GcpOptionsTest for an example of how this kind of interaction can be tested.
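A dependency-free sketch of that testing seam, assuming only the two environment variable names used in this PR. `confDirsFromEnvironment` is a hypothetical helper, and this standalone `ConfigurationLocator` is an illustration rather than the PR's class; `@VisibleForTesting` (from Guava) is omitted to keep the example stdlib-only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for the PR's ConfigurationLocator.
class ConfigurationLocator {
  // Testing seam: production code reads the real process environment,
  // while tests override or spy on this method to supply a fake map.
  Map<String, String> getEnvironment() {
    return System.getenv();
  }

  /** Hypothetical helper: non-empty HADOOP_CONF_DIR / YARN_CONF_DIR values, in that order. */
  List<String> confDirsFromEnvironment() {
    Map<String, String> env = getEnvironment();
    List<String> dirs = new ArrayList<>();
    for (String key : new String[] {"HADOOP_CONF_DIR", "YARN_CONF_DIR"}) {
      String value = env.get(key);
      if (value != null && !value.isEmpty()) {
        dirs.add(value);
      }
    }
    return dirs;
  }
}
```

A test can override `getEnvironment()` in an anonymous subclass, as below in plain Java. With Mockito the equivalent is `ConfigurationLocator locator = spy(new ConfigurationLocator()); doReturn(env).when(locator).getEnvironment();`, where the `doReturn(...).when(spy)` form avoids calling the real `System.getenv()` while stubbing.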
}
}
}
return Lists.<Configuration>newArrayList(conf);
We shouldn't be returning a configuration if we didn't load one from one of the paths.
// Find default configuration when HADOOP_CONF_DIR or YARN_CONF_DIR is set.
Configuration conf = new Configuration(false);
List<String> hadoopEnvList = Lists.newArrayList("HADOOP_CONF_DIR", "YARN_CONF_DIR");
for (String env : hadoopEnvList) {
If we find configurations in both HADOOP_CONF_DIR and YARN_CONF_DIR, we should return them separately rather than letting YARN_CONF_DIR overwrite the properties found in HADOOP_CONF_DIR.
Also, ensure that we only load one configuration if HADOOP_CONF_DIR and YARN_CONF_DIR point to the same location.
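One way to satisfy this comment and the earlier one about not returning a configuration that wasn't loaded, sketched with the standard library only. Each "configuration" is modeled here as the list of site files found in one directory; directories are deduplicated by canonical path, so `/etc/hadoop` and `/etc/hadoop/.` (or a symlink to it) count as the same location; and an empty list is returned when nothing is found. `ConfDirResolver` and `resolve` are hypothetical names; in the PR each entry would become its own `Configuration` populated with `addResource`.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical resolver; not part of the PR or of Beam.
class ConfDirResolver {
  static final String[] ENV_VARS = {"HADOOP_CONF_DIR", "YARN_CONF_DIR"};
  static final String[] SITE_FILES = {"core-site.xml", "hdfs-site.xml"};

  /** One entry per distinct directory holding at least one site file; empty if none. */
  static List<List<File>> resolve(Map<String, String> env) throws IOException {
    Set<File> seenDirs = new HashSet<>();  // canonical directories already visited
    List<List<File>> configs = new ArrayList<>();
    for (String var : ENV_VARS) {
      String path = env.get(var);
      if (path == null || path.isEmpty()) {
        continue;
      }
      File dir = new File(path).getCanonicalFile();
      if (!seenDirs.add(dir)) {
        continue;  // same location as an earlier variable: load it only once
      }
      List<File> siteFiles = new ArrayList<>();
      for (String name : SITE_FILES) {
        File f = new File(dir, name);
        if (f.isFile()) {
          siteFiles.add(f);
        }
      }
      // Keep each directory separate so YARN_CONF_DIR never overwrites
      // HADOOP_CONF_DIR, and never emit a configuration with nothing loaded.
      if (!siteFiles.isEmpty()) {
        configs.add(siteFiles);
      }
    }
    return configs;
  }
}
```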
Sorry for the churn; the SDK is going through several changes on the way to its first stable release.
Be sure to do all of the following to help us incorporate your contribution quickly and easily:
- Make sure the PR title is formatted like: [BEAM-<Jira issue #>] Description of pull request
- Make sure tests pass via mvn clean verify. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes.)
- Replace <Jira issue #> in the title with the actual Jira issue number, if there is one.
- If this contribution is large, please file an Apache Individual Contributor License Agreement.