[FLINK-13012][hive] Handle default partition name of Hive table #9088
The newly added test is disabled for now. I'll update this PR once #9039 is in.
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
Automated Checks
Last check on commit cbc2d2b (Wed Aug 07 08:16:29 UTC 2019)
Warnings:
Mention the bot in a comment to re-run the automated checks.
Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process.
Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands
The @flinkbot bot supports the following commands:
b22f836 to 64007bc
cc @xuefuz @bowenli86 @zjuwangg
@lirui-apache thanks for your contribution!
It would be good to split this into two PRs: one for the partition name, and one for the e2e tests with two commits, each with its own JIRA. The first commit would change the dependencies of flink-connector-hive/pom.xml and the second would add the e2e test module. We may need to ask Chesnay to help review how to set up e2e on Travis.
IIUC, changing from 'provided' to 'compile' should be fine.
BTW, do we have a plan to add e2e tests for Hive 1.2.1?
...nnector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableOutputFormat.java
...nnector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableOutputFormat.java
...ink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSource.java
if (defaultPartitionName.equals(partitionValue)) {
    LogicalTypeRoot typeRoot = type.getLogicalType().getTypeRoot();
    // while this is inline with Hive, seems it should be null for string columns as well?
    partitionObject = typeRoot == LogicalTypeRoot.CHAR || typeRoot == LogicalTypeRoot.VARCHAR ? defaultPartitionName : null;
Wondering why partitionObject will be null if the type root is not char/varchar?
If the partition value equals default partition name, it means the partition value should actually be null.
Object partitionObject;
if (defaultPartitionName.equals(partitionValue)) {
    LogicalTypeRoot typeRoot = type.getLogicalType().getTypeRoot();
    // while this is inline with Hive, seems it should be null for string columns as well?
Hive also has String type. What's Hive's behavior on this?
For a string partition column, it'll be assigned the default value, e.g. __HIVE_DEFAULT_PARTITION__. But I thought it should simply be null, just like other kinds of partition columns.
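The behavior discussed in this thread can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in the PR: `ColumnKind` is a hypothetical stand-in for the `LogicalTypeRoot` check, `toPartitionObject` is an invented helper name, and only string-like and integer columns are handled. `__HIVE_DEFAULT_PARTITION__` is assumed as the default name; Hive allows it to be overridden via `hive.exec.default.partition.name`.

```java
final class DefaultPartitionReadSketch {

    // Assumed default; Hive allows overriding it via hive.exec.default.partition.name.
    static final String DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__";

    // Hypothetical stand-in for checking LogicalTypeRoot.CHAR / VARCHAR versus other types.
    enum ColumnKind { STRING_LIKE, INT }

    // Converts a partition value from the partition spec into the object exposed to the reader.
    static Object toPartitionObject(String partitionValue, ColumnKind kind, String defaultPartitionName) {
        if (defaultPartitionName.equals(partitionValue)) {
            // In line with Hive: string-like columns keep the default name, other types become null.
            return kind == ColumnKind.STRING_LIKE ? defaultPartitionName : null;
        }
        if (kind == ColumnKind.STRING_LIKE) {
            return partitionValue;
        }
        // Sketch only: every non-string column is treated as an integer.
        return Integer.valueOf(partitionValue);
    }

    public static void main(String[] args) {
        // Default partition on an int column is read back as null.
        System.out.println(toPartitionObject(DEFAULT_PARTITION_NAME, ColumnKind.INT, DEFAULT_PARTITION_NAME));
        // Default partition on a string column keeps the default name.
        System.out.println(toPartitionObject(DEFAULT_PARTITION_NAME, ColumnKind.STRING_LIKE, DEFAULT_PARTITION_NAME));
        // A regular value is parsed according to the column type.
        System.out.println(toPartitionObject("2019", ColumnKind.INT, DEFAULT_PARTITION_NAME));
    }
}
```

Returning null for string columns too, as suggested in the code comment under review, would amount to dropping the `STRING_LIKE` special case in the first branch.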
...nnector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableOutputFormat.java
Re: dep scope change: I guess it's a bigger change at this point. If I understand correctly, you are trying to make the pom file for the test project simpler. The motivation seems fine, but it invalidates the purpose of us putting "provided" there in the first place, right? What's the consequence for the user with this change?
efd6e31 to d8b3088
Changed to test with table env. This PR needs to wait until we have #9181 merged.
d8b3088 to cbc2d2b
xuefuz
left a comment
LGTM.
For the Hive behavior, maybe we can create a JIRA and hear what the Hive folks say about it.
Thanks @xuefuz.
wuchong
left a comment
Thanks @lirui-apache, it looks good to me.
Will merge this.
Thanks @wuchong for help with the merge!
What is the purpose of the change
To handle default partition name when reading/writing Hive tables.
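For the write direction, a rough sketch of the idea is shown here. It assumes, as in Hive's dynamic partitioning, that a null or empty partition value is stored under the default partition name. `partitionDirName` is a hypothetical helper and not a Flink or Hive API, and the escaping Hive applies to special characters in partition values is left out.

```java
final class DefaultPartitionWriteSketch {

    // Assumed default; Hive allows overriding it via hive.exec.default.partition.name.
    static final String DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__";

    // Builds the "column=value" directory name, substituting the default name for null or empty values.
    static String partitionDirName(String column, Object value, String defaultPartitionName) {
        String text = value == null ? "" : value.toString();
        return column + "=" + (text.isEmpty() ? defaultPartitionName : text);
    }

    public static void main(String[] args) {
        // A null partition value lands in the default partition.
        System.out.println(partitionDirName("dt", null, DEFAULT_PARTITION_NAME));
        // A regular value is used as-is.
        System.out.println(partitionDirName("dt", "2019-08-07", DEFAULT_PARTITION_NAME));
    }
}
```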
Brief change log
Verifying this change
Comes with new test case.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): no
Documentation