Skip to content

Conversation

@lirui-apache
Copy link
Contributor

@lirui-apache lirui-apache commented Jul 11, 2019

What is the purpose of the change

To handle default partition name when reading/writing Hive tables.

Brief change log

  • Handle default partition name.
  • Added a new test case for default partition name.

Verifying this change

Comes with new test case.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): yes
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? NA

@lirui-apache
Copy link
Contributor Author

The newly added test is disabled for now. I'll update this PR once #9039 is in.

@flinkbot
Copy link
Collaborator

flinkbot commented Jul 11, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit cbc2d2b (Wed Aug 07 08:16:29 UTC 2019)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Copy link
Collaborator

flinkbot commented Jul 11, 2019

CI report:

@lirui-apache lirui-apache force-pushed the FLINK-13012 branch 2 times, most recently from b22f836 to 64007bc Compare July 16, 2019 08:05
@lirui-apache
Copy link
Contributor Author

cc @xuefuz @bowenli86 @zjuwangg
This PR handles default partition name as well as adding e2e test case. Let me know if you think we should split it into separate PRs.
And I changed hive-exec dependency to compile-scope in hive connector. It's because provided deps are not transitive, and therefore anything depending on hive connector needs to declare its own dependency on hive-exec and keep in sync with hive connector. I suppose we don't need to worry about licenses as long as we don't bundle hive in our jars, is that correct?

Copy link
Member

@bowenli86 bowenli86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lirui-apache thanks for your contribution!

It would be good to split it into two PRs, one for partition name, one for e2e tests with two commits with their own JIRAs as 1st change dependencies of flink-connector-hive/pom.xml and 2nd add e2e test module. We may need to ask Chesnay to help review how to setup e2e on Travis.

IIUIC, changing from 'provided' to 'compile' should be fine.

BTW, do we have plan to add e2e tests for Hive 1.2.1?

if (defaultPartitionName.equals(partitionValue)) {
LogicalTypeRoot typeRoot = type.getLogicalType().getTypeRoot();
// while this is inline with Hive, seems it should be null for string columns as well?
partitionObject = typeRoot == LogicalTypeRoot.CHAR || typeRoot == LogicalTypeRoot.VARCHAR ? defaultPartitionName : null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering why partitionObject will be null if the type root is not char/varchar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the partition value equals default partition name, it means the partition value should actually be null.

Object partitionObject;
if (defaultPartitionName.equals(partitionValue)) {
LogicalTypeRoot typeRoot = type.getLogicalType().getTypeRoot();
// while this is inline with Hive, seems it should be null for string columns as well?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hive also has String type. What's Hive's behavior on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a string partition column, it'll be assigned the default value, e.g. __HIVE_DEFAULT_PARTITION__. But I thought it should simply be null, just like other kinds of partition columns.

@xuefuz
Copy link
Contributor

xuefuz commented Jul 16, 2019

Re: dep scope change

I guess it's a bigger change at this point. If I understand correctly, you are trying to make the pom file for the test project simpler. The motivation seems fine, but it invalidates the purpose of us putting "provided" there in the place, right? What's the consequence to the user with this change?

@lirui-apache
Copy link
Contributor Author

Changed to test with table env. This PR needs to wait until we have #9181 merged.

@lirui-apache
Copy link
Contributor Author

PR rebased.
@xuefuz @zjuwangg please take another look. Thanks.

Copy link
Contributor

@xuefuz xuefuz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

For the Hive behavior, maybe we can create a JIRA, hoping to get what Hive folks say about it.

@lirui-apache
Copy link
Contributor Author

Thanks @xuefuz.
@wuchong could you please help review and merge this PR. Travis test has passed: https://travis-ci.com/flink-ci/flink/builds/120691298

Copy link
Member

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lirui-apache , it looks good to me.

Will merge this.

@asfgit asfgit closed this in 1d81374 Jul 29, 2019
@lirui-apache
Copy link
Contributor Author

Thanks @wuchong for help with the merge!

@lirui-apache lirui-apache deleted the FLINK-13012 branch July 29, 2019 07:17
asfgit pushed a commit that referenced this pull request Jul 29, 2019
wzhero1 pushed a commit to wzhero1/flink that referenced this pull request Aug 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants