This repository was archived by the owner on Jun 29, 2021. It is now read-only.

Read hadoop configuration files (core-site.xml and hdfs-site.xml) in HdfsResource #78

Closed

kaspersorensen wants to merge 5 commits into apache:master from kaspersorensen:METAMODEL-219-hadoop-configuration-files

Conversation

@kaspersorensen
Contributor

Suggested fix for METAMODEL-219.

I made a few minor additional changes too:

  • Moved the HDFS OutputStream and InputStream classes to separate files (HdfsResource was getting too big, IMO). The classes are default (package-private) scoped, so they are not visible outside the package.
  • Changed FileHelper to support Java 7 AutoCloseable instead of distinguishing between Closeable, Connection, Statement, ResultSet, etc.
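
The AutoCloseable change can be sketched roughly like this (a minimal sketch, not the actual MetaModel code; the class and method names here are illustrative):

```java
// Minimal sketch of a close helper built on Java 7 AutoCloseable.
// Connection, Statement, ResultSet and Closeable all implement
// AutoCloseable, so a single method covers every resource type.
public class CloseableHelper {

    /** Closes each resource, skipping nulls and swallowing close errors. */
    public static void safeClose(AutoCloseable... resources) {
        for (AutoCloseable resource : resources) {
            if (resource == null) {
                continue;
            }
            try {
                resource.close();
            } catch (Exception e) {
                // best-effort cleanup: a real implementation would log here
            }
        }
    }
}
```

This collapses the previous per-type branching (Closeable vs. Connection vs. Statement vs. ResultSet) into one code path.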


Missing EOL

@LosD
Contributor

LosD commented Dec 11, 2015

Isn't it dangerous to reuse existing Hadoop/YARN environment variables? They may be set globally without being intended for use with whatever is using MetaModel, and they will override the configured fs.defaultFS, as far as I read the [JavaDocs](https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/conf/Configuration.html#addResource%28java.io.InputStream, java.lang.String%29).

@kaspersorensen
Contributor Author

Fixed the EOLs.

That's a very good question, actually... You're right that it would override fs.defaultFS, and initially that was also my intention. I see that HADOOP_CONF_DIR and YARN_CONF_DIR have become standards used by libraries such as Apache Spark and others. I was thinking it would be nice to reuse that standard so that configuration in MetaModel is as easy as possible... BUT of course you're right that the environment variable might be unintentionally set up to configure a different Hadoop installation. Hmm, then I guess we shouldn't check for environment variables or system properties, but rather only use the constructor argument as a way of configuring it. Then applications that are using MetaModel have to do their own environment variable checking, if it's applicable in their situation.

It is now only done if a newly added system property is set to "true".
@kaspersorensen
Contributor Author

OK, as you can now see from the latest commit (0e0a9fb), I have added a system property metamodel.hadoop.use_hadoop_conf_dir which can be set to "true" in order to enable the HADOOP_CONF_DIR and YARN_CONF_DIR checks. I guess that disables it for anyone who isn't explicitly enabling it.
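
The opt-in lookup described here can be sketched as follows (a hedged sketch, not the actual commit; only the property name and the two environment variable names come from the discussion, the class and method names are hypothetical):

```java
import java.io.File;

// Sketch of an opt-in lookup for the Hadoop configuration directory.
public class HadoopConfDirLocator {

    static final String SYSTEM_PROPERTY = "metamodel.hadoop.use_hadoop_conf_dir";

    /**
     * Returns the directory to read core-site.xml and hdfs-site.xml from,
     * or null when the opt-in property is not "true" or no usable
     * directory is found. Environment variables are only consulted when
     * explicitly enabled, so a globally set HADOOP_CONF_DIR cannot
     * silently override a configured fs.defaultFS.
     */
    public static String findConfDir() {
        if (!"true".equals(System.getProperty(SYSTEM_PROPERTY))) {
            return null; // disabled unless explicitly enabled
        }
        for (String envVar : new String[] { "HADOOP_CONF_DIR", "YARN_CONF_DIR" }) {
            String dir = System.getenv(envVar);
            if (dir != null && new File(dir).isDirectory()) {
                return dir;
            }
        }
        return null;
    }
}
```

With this shape, existing users see no behavior change, and deployments that already follow the HADOOP_CONF_DIR/YARN_CONF_DIR convention can enable it with a single JVM flag, e.g. `-Dmetamodel.hadoop.use_hadoop_conf_dir=true`.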

@LosD
Contributor

LosD commented Dec 12, 2015

Good solution. 👍, then :)

@asfgit asfgit closed this in 2a4b854 Dec 12, 2015
@kaspersorensen kaspersorensen deleted the METAMODEL-219-hadoop-configuration-files branch August 5, 2016 05:56
