-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLIN-12663]Implement HiveTableSource to read Hive tables #8809
Conversation
cc @xuefuz @bowenli86 @lirui-apache to review. |
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM other than a minor comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zjuwangg Thanks for the PR!
...ink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSource.java
Outdated
Show resolved
Hide resolved
private final String dbName; | ||
private final String tableName; | ||
private final Boolean isPartitionTable; | ||
private final String[] partitionColNames; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use List instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use List is better than String[]? Is there a benefit by doing so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of that just to be consistent with HiveTableSink
,CatalogTable
, and Hive table
which all use List to store partition col keys/names. Using String array means you will need to convert the list to array somewhere in the code path (very likely in HiveTableFactory
), which is not necessary
JobConf jobConf, | ||
String dbName, | ||
String tableName, | ||
String[] partitionColNames) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use List?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
...ink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSource.java
Outdated
Show resolved
Hide resolved
...ink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSource.java
Outdated
Show resolved
Hide resolved
...ink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSource.java
Outdated
Show resolved
Hide resolved
...ink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSource.java
Show resolved
Hide resolved
...ink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSource.java
Outdated
Show resolved
Hide resolved
...connector-hive/src/test/java/org/apache/flink/batch/connectors/hive/HiveTableSourceTest.java
Outdated
Show resolved
Hide resolved
//Now we used metaStore client to create hive table instead of using hiveCatalog for it doesn't support set | ||
//serDe temporarily. | ||
HiveMetastoreClientWrapper client = HiveMetastoreClientFactory.create(hiveConf, null); | ||
org.apache.hadoop.hive.metastore.api.Table tbl = new org.apache.hadoop.hive.metastore.api.Table(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: why use full path of Table
? I didn't find there's any class name conflicts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's conflicts with org.apache.flink.table.api.Table.
The other thing I noticed is that Hive table source and sink impl have quite a few duplication in logic. Maybe that's something we can unify. They don't have to happen immediately though |
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.flink.batch.connectors.hive; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to not use org.apache.flink.batch
prefix for package name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to be done in another pr.
cc @bowenli86 to review again |
@KurtYoung regarding your comment on the package name, what's your suggestion on a proper name? It's been brought by @zjffdu too before. I think @zjuwangg named it this way because most connector packages are named as @zjuwangg can you please create a JIRA ticket to track this discussion? We probably need to finalize the package name before releasing 1.9 (not necessarily in this PR), otherwise it's hard to change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zjuwangg Thanks for the update! Only one issue left
private final String dbName; | ||
private final String tableName; | ||
private final Boolean isPartitionTable; | ||
private final String[] partitionColNames; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of that just to be consistent with HiveTableSink
,CatalogTable
, and Hive table
which all use List to store partition col keys/names. Using String array means you will need to convert the list to array somewhere in the code path (very likely in HiveTableFactory
), which is not necessary
JobConf jobConf, | ||
String dbName, | ||
String tableName, | ||
String[] partitionColNames) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
For package names, I think we can do as a followup. For the last change request, I think I can make the changes as I will have to refactor it a little bit for HiveTableFactory work that I'm doing. If possible, let's get this in first, as it's kind of blocking me. |
Sounds good. @KurtYoung @zjffdu @xuefuz @lirui-apache @zjuwangg I've created FLINK-12966 to track the effort of finalizing package name. I will merge this PR to unblock @xuefuz given the build has passed |
What is the purpose of the change
Implement HiveTableSource to read Hive tables
Brief change log
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (no)Documentation