
load hive-site.xml for flink catalog#1527

Closed
zhangjun0x01 wants to merge 30 commits into apache:master from zhangjun0x01:load_hive_conf

Conversation

@zhangjun0x01
Contributor

#1437

In order to be compatible with the various Flink submission modes, FlinkCatalogFactory adds a property, hive-site-path, that specifies the location of hive-site.xml. The path can be either a local path or an HDFS path.
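When hive-site-path points at a local file, loading it amounts to merging the file's `<property>` name/value pairs into the catalog's configuration. In the PR this is presumably left to Hadoop's `Configuration` (which provides `addResource(Path)`); the JDK-only sketch below, built around a hypothetical `HiveSiteXml` class, only illustrates what those pairs look like once parsed. It ignores Hadoop features such as `<final>` and variable substitution.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of the PR): reads the <property><name>/<value>
// pairs of a Hadoop-style hive-site.xml into an ordered map. It approximates
// what Hadoop's Configuration does, without <final>, ${var} expansion or XInclude.
final class HiveSiteXml {
  private HiveSiteXml() {}

  static Map<String, String> readProperties(File hiveSite) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(hiveSite);

    Map<String, String> props = new LinkedHashMap<>();
    NodeList properties = doc.getElementsByTagName("property");
    for (int i = 0; i < properties.getLength(); i++) {
      Element property = (Element) properties.item(i);
      String name = textOf(property, "name");
      if (name != null) {
        // A repeated name overwrites the earlier value in this sketch.
        props.put(name, textOf(property, "value"));
      }
    }
    return props;
  }

  // Trimmed text of the first descendant element with the given tag, or null if absent.
  private static String textOf(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
  }
}
```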

private void loadHiveConf(Configuration configuration, Map<String, String> properties) {
  String hiveConfPath = properties.get(HIVE_SITE_PATH);
  Path path = new Path(hiveConfPath);
  if (hiveConfPath.startsWith("hdfs")) {
Contributor

@holdenk holdenk Sep 29, 2020


Could we support more than just HDFS (like S3)?

Contributor Author


Yes, you are right. We should support more storage systems for the Hive configuration file.
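One way to support S3 and other stores is to branch on the URI scheme of hive-site-path rather than on a string prefix; a prefix check such as `startsWith("hdfs")` also misfires on a relative local path like `hdfs-conf/hive-site.xml`. The sketch below uses a hypothetical `HiveSiteLocation` helper and treats a scheme-less path as local. That is a policy choice for illustration, not Hadoop's behavior: Hadoop's FileSystem API would typically resolve such a path against `fs.defaultFS`.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper (not part of the PR): decides whether a hive-site-path
// value can be read from the local disk or has to go through a remote
// file system client. Any scheme other than "file" (hdfs, s3a, abfs, gs, ...)
// takes the remote branch, so new stores need no special-casing.
final class HiveSiteLocation {
  static final String FILE_SCHEME = "file";

  private HiveSiteLocation() {}

  static boolean isLocal(String hiveSitePath) {
    String scheme = URI.create(hiveSitePath).getScheme();
    // Assumption: a path without a scheme is treated as a local file.
    return scheme == null || FILE_SCHEME.equals(scheme.toLowerCase(Locale.ROOT));
  }
}
```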

Contributor


Could we also use a named constant for these string literals for the filesystem scheme? Maybe HDFS_SCHEME in this case would work.

return createCatalog(name, properties, clusterHadoopConf());
Configuration configuration = clusterHadoopConf();
String catalogType = properties.get(ICEBERG_CATALOG_TYPE);
if (catalogType.equals("hive")) {
Contributor


Same note here about a named constant for this string literal.
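Applied to the catalog-type check, the suggestion might look like the sketch below. The class name, the constant names and the `"catalog-type"` key value are assumptions for illustration. Putting the constant on the left of `equals` also avoids the `NullPointerException` that `catalogType.equals("hive")` throws when the property is absent.

```java
import java.util.Map;

// Hypothetical sketch (names are illustrative, not taken from the PR):
// named constants for the property key and the "hive" catalog type, with a
// null-safe comparison so a missing property simply yields false.
final class CatalogTypes {
  static final String ICEBERG_CATALOG_TYPE = "catalog-type";
  static final String ICEBERG_CATALOG_TYPE_HIVE = "hive";

  private CatalogTypes() {}

  static boolean isHiveCatalog(Map<String, String> properties) {
    // Constant first: "hive".equals(null) is false instead of throwing.
    return ICEBERG_CATALOG_TYPE_HIVE.equals(properties.get(ICEBERG_CATALOG_TYPE));
  }
}
```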


private void download(Configuration configuration, Path hdfsHiveSitePath) {
  try {
    File tmpFile = File.createTempFile("hive-site.xml-", "");
Contributor

@kbendick kbendick Sep 30, 2020


Is the temp file hive-site.xml- supposed to have that trailing dash? Is this possibly a typo?

If it's not a typo and it's used to distinguish the file from the actual hive-site.xml, perhaps we could use a name that makes the intent clearer?

Contributor Author


My initial idea was to add a prefix to distinguish it from the actual hive-site.xml. We can add a .tmp suffix so that users understand it is only a temporary file.

Contributor


That sounds much better to me; .tmp definitely conveys the intent better. Thanks for updating that!
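A minimal sketch of the download step with the agreed .tmp suffix, assuming the remote file has already been opened as an `InputStream` (for example through a Hadoop FileSystem). The `HiveSiteDownload` class and the `hive-site-` / `.xml.tmp` naming are illustrative choices, not the PR's code; `Files.createTempFile` inserts a random string between the prefix and the suffix.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: copy a remote hive-site.xml stream into a local
// temporary file named like hive-site-<random>.xml.tmp, so it is obviously
// not the real hive-site.xml. The caller remains responsible for closing `in`.
final class HiveSiteDownload {
  private HiveSiteDownload() {}

  static Path copyToTempFile(InputStream in) throws IOException {
    Path tmp = Files.createTempFile("hive-site-", ".xml.tmp");
    tmp.toFile().deleteOnExit(); // best-effort cleanup when the JVM exits
    // createTempFile already created the file, so overwriting must be allowed.
    Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
    return tmp;
  }
}
```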

Co-authored-by: Marton Bod <mbod@cloudera.com>
@rdblue
Contributor

rdblue commented Oct 1, 2020

@zhangjun0x01, looks like this is picking up changes from other PRs. Could you fix that, please?


public static final String HIVE_SITE_PATH = "hive-site-path";
public static final String HIVE_SITE_SCHEMA_FILE = "file";
public static final String HIVE_SITE_SCHEMA_HDFS = "hdfs";
Contributor


Possible nit / typo / my own confusion on the above two: I think the term is SCHEME and not SCHEMA, but this could be one of those instances where the term is overloaded. I've always heard that part of the URI referred to as the scheme.

Contributor


I see that in your getSchema function the call to the Hadoop fs API uses getScheme(), so I think the term is scheme.

String schema = path.toUri().getScheme();

@zhangjun0x01 zhangjun0x01 deleted the load_hive_conf branch October 4, 2020 07:12