
[FLINK-12663] Implement HiveTableSource to read Hive tables #8809

Closed · wants to merge 3 commits

Conversation

@zjuwangg (Contributor)

What is the purpose of the change

Implement HiveTableSource to read Hive tables

Brief change log

  • Add HiveTableSource to read Hive tables (a rough usage sketch follows at the end of this description)

Verifying this change

This change added tests and can be verified as follows:

  • Added HiveTableSourceTest, which verifies that reading a Hive table works as expected

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (JavaDocs)
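A rough, hypothetical usage sketch of the new source. The HiveTableSource constructor shape below is an assumption inferred from the code excerpts quoted later in this conversation, not the actual API; only the generic TableEnvironment calls are standard Flink (1.9-era) API.

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.batch.connectors.hive.HiveTableSource;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.hadoop.mapred.JobConf;

public class HiveTableSourceUsageSketch {  // hypothetical example class, not part of the PR
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);

        // Assumed constructor, loosely following the (JobConf, dbName, tableName, partitionColNames)
        // parameters visible in the review excerpts below; the real signature may differ.
        HiveTableSource hiveSource =
                new HiveTableSource(new JobConf(), "default", "src", new String[0]);

        // Register the source and query it like any other table.
        tEnv.registerTableSource("hive_src", hiveSource);
        Table result = tEnv.sqlQuery("SELECT * FROM hive_src");
        System.out.println(result);  // in a real job, convert to a DataSet and collect/print
    }
}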

@zjuwangg (Contributor, Author)

cc @xuefuz @bowenli86 @lirui-apache to review.

@flinkbot (Collaborator)

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@xuefuz (Contributor) left a comment

LGTM other than a minor comment.

@bowenli86 (Member) left a comment

@zjuwangg Thanks for the PR!

private final String dbName;
private final String tableName;
private final Boolean isPartitionTable;
private final String[] partitionColNames;
@bowenli86 (Member):
use List instead?

@zjuwangg (Contributor, Author):
Why is List better than String[]? Is there a benefit to doing so?

@bowenli86 (Member):
I thought of that just to be consistent with HiveTableSink, CatalogTable, and the Hive table API, which all use List to store partition column keys/names. Using a String array means you will need to convert the list to an array somewhere in the code path (very likely in HiveTableFactory), which is not necessary.

JobConf jobConf,
String dbName,
String tableName,
String[] partitionColNames) {
@bowenli86 (Member):
use List?

@zjuwangg (Contributor, Author):
as above

@bowenli86 (Member):
ditto
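To make the suggestion concrete, here is a minimal sketch of the fields and constructor using List, with hypothetical names mirroring the excerpts above (the real class also implements Flink's table source interfaces, omitted here):

import java.util.List;
import org.apache.hadoop.mapred.JobConf;

public class HiveTableSource {  // sketch only, not the actual class from this PR
    private final JobConf jobConf;
    private final String dbName;
    private final String tableName;
    private final boolean isPartitionTable;
    private final List<String> partitionColNames;  // List instead of String[], as suggested

    public HiveTableSource(JobConf jobConf, String dbName, String tableName, List<String> partitionColNames) {
        this.jobConf = jobConf;
        this.dbName = dbName;
        this.tableName = tableName;
        this.partitionColNames = partitionColNames;
        // No conversion to an array is needed anywhere in the code path.
        this.isPartitionTable = partitionColNames != null && !partitionColNames.isEmpty();
    }
}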

// For now we use the metastore client to create the Hive table instead of going through HiveCatalog,
// because HiveCatalog doesn't yet support setting the SerDe.
HiveMetastoreClientWrapper client = HiveMetastoreClientFactory.create(hiveConf, null);
org.apache.hadoop.hive.metastore.api.Table tbl = new org.apache.hadoop.hive.metastore.api.Table();
@bowenli86 (Member):
minor: why use the fully qualified name of Table? I didn't find any class name conflict.

@zjuwangg (Contributor, Author):
It conflicts with org.apache.flink.table.api.Table.
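For illustration, a minimal sketch of the conflict, assuming a hypothetical test class that also imports Flink's Table API:

import org.apache.flink.table.api.Table;  // Flink's Table, imported for use elsewhere in the (hypothetical) test

public class HiveTableSourceTestSketch {  // hypothetical class name, not the actual test
    void createHiveTable() {
        // Only one class named "Table" can be imported, so the Hive metastore Table
        // has to be referenced by its fully qualified name.
        org.apache.hadoop.hive.metastore.api.Table tbl =
                new org.apache.hadoop.hive.metastore.api.Table();
        tbl.setDbName("default");    // hypothetical values
        tbl.setTableName("src");
    }
}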

@bowenli86 (Member)

The other thing I noticed is that the Hive table source and sink implementations have quite a bit of duplicated logic. Maybe that's something we can unify. It doesn't have to happen immediately, though.

flink-connectors/flink-connector-hive/pom.xml (outdated review comment, resolved)
* limitations under the License.
*/

package org.apache.flink.batch.connectors.hive;
@KurtYoung (Contributor):
I would suggest not using the org.apache.flink.batch prefix for the package name.

@zjuwangg (Contributor, Author):
That's better done in another PR.

@zjuwangg (Contributor, Author)

cc @bowenli86 to review again

@bowenli86 (Member) commented Jun 24, 2019

@KurtYoung regarding your comment on the package name, what's your suggestion on a proper name?

It's been brought up by @zjffdu before, too. I think @zjuwangg named it this way because most connector packages are named org.apache.flink.streaming.connectors.xxx and he is just following that convention. However, as we are moving toward streaming-batch unification, we probably don't need "streaming/batch" in the package names any more, because, like the file source/sink, the Hive source/sink can (which doesn't mean we necessarily will) be made streaming in the future. I'm thinking of just org.apache.flink.connectors.hive. What do you think?
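In other words, a sketch of the proposed rename (to be settled in a follow-up ticket, not decided in this PR):

// currently: package org.apache.flink.batch.connectors.hive;
package org.apache.flink.connectors.hive;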

@zjuwangg can you please create a JIRA ticket to track this discussion? We probably need to finalize the package name before releasing 1.9 (not necessarily in this PR), otherwise it's hard to change.

cc @xuefuz @lirui-apache

@bowenli86 (Member) left a comment

@zjuwangg Thanks for the update! Only one issue left (the same List vs. String[] comment as above).


@xuefuz (Contributor) commented Jun 24, 2019

For the package names, I think we can do that as a follow-up. For the last change request, I think I can make the changes, as I will have to refactor it a little bit for the HiveTableFactory work that I'm doing.

If possible, let's get this in first, as it's kind of blocking me.

@bowenli86 (Member) commented Jun 24, 2019

> For the package names, I think we can do that as a follow-up. For the last change request, I think I can make the changes, as I will have to refactor it a little bit for the HiveTableFactory work that I'm doing.
>
> If possible, let's get this in first, as it's kind of blocking me.

Sounds good. @KurtYoung @zjffdu @xuefuz @lirui-apache @zjuwangg I've created FLINK-12966 to track the effort of finalizing the package name.

I will merge this PR to unblock @xuefuz, given that the build has passed.

@asfgit asfgit closed this in 2949166 Jun 24, 2019
@zjuwangg zjuwangg deleted the FLINK-12663 branch June 25, 2019 01:59