Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-2909] Handle logical type in TimestampBasedKeyGenerator #4203

Merged
merged 5 commits into from
Jan 8, 2022

Conversation

codope
Copy link
Member

@codope codope commented Dec 3, 2021

What is the purpose of the pull request

#3944 handled logical timestamp type for complex keygen. This PR fixes HUDI-2909 by handing it for timestamp based keygen.

When the new config hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled is set to true, consistent value will be generated for a logical timestamp type column like timestamp-millis and timestamp-micros, irrespective of whether row-writer is enabled. Disabled by default so as not to break the pipeline that deploy either fully row-writer path or non row-writer path. For example, if it is kept disabled then record key of timestamp type with value 2016-12-29 09:54:00 will be written as timestamp 2016-12-29 09:54:00.0 in row-writer path, while it will be written as long value 1483023240000000 in non row-writer path. If enabled, then the timestamp value will be written in both the cases.

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

Added UT and manually verified with spark datasource.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@nsivabalan
Copy link
Contributor

@codope : is this good to review. If there are any pending work, I can take it up as I am focusing on all issues and jiras. let me know.

@nsivabalan nsivabalan self-assigned this Dec 15, 2021
@nsivabalan nsivabalan added the priority:critical production down; pipelines stalled; Need help asap. label Dec 15, 2021
@codope
Copy link
Member Author

codope commented Dec 31, 2021

@codope : is this good to review. If there are any pending work, I can take it up as I am focusing on all issues and jiras. let me know.

@nsivabalan It is now ready to review. Both row-writer and non-row writer path will emit same value for logical timestamp type column if the config is enabled.

Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see if we can avoid changing public apis in KeyGenerator.

@@ -889,6 +890,13 @@ public String getKeyGeneratorClass() {
return getString(KEYGENERATOR_CLASS_NAME);
}

public boolean isConsistentLogicalTimestampEnabled() {
if (getBoolean(KeyGeneratorOptions.KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED) == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't getBoolean return the default if not found?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@YannByron
Copy link
Contributor

@nsivabalan @codope
I have a discussion related to this implement. In this pr, most of work is just to pass isConsistentLogicalTimestampEnabled to the method HoodieAvroUtils.convertValueForAvroLogicalTypes. What if we have another config need to do this in the future?

@codope
Copy link
Member Author

codope commented Jan 6, 2022

@nsivabalan @codope I have a discussion related to this implement. In this pr, most of work is just to pass isConsistentLogicalTimestampEnabled to the method HoodieAvroUtils.convertValueForAvroLogicalTypes. What if we have another config need to do this in the future?

@YannByron You bring up a good point. Adding another config in future is tedious. However, the intention behind adding a new config was to avoid discrepancy in existing pipelines. @nsivabalan has explained this in more detail on the jira HUDI-2909. I do not expect such changes to be frequent. Nevertheless, i'll try to avoid making incompatible changes to public APIs.

@nsivabalan
Copy link
Contributor

@codope : TestGenericRecordAndRowConsistency test failed in CI. Can you please check.

@codope codope force-pushed the hudi-2909-timestamp-keygen branch 2 times, most recently from fde98bb to cb2d16c Compare January 7, 2022 07:19
@codope
Copy link
Member Author

codope commented Jan 7, 2022

@hudi-bot run azure

Add config to enable consistent logical timestamp values

Address review comments

Add an example in the config doc

Cleanup spark context in test
@hudi-bot
Copy link

hudi-bot commented Jan 8, 2022

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@nsivabalan nsivabalan merged commit 8275499 into apache:master Jan 8, 2022
nsivabalan pushed a commit that referenced this pull request Jan 10, 2022
* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

Timestampbased key generator was returning diff values for row writer and non row writer path. this patch fixes it and is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`)
@@ -444,15 +445,15 @@ public static Schema generateProjectionSchema(Schema originalSchema, List<String
/**
* Obtain value of the provided field as string, denoted by dot notation. e.g: a.b.c
*/
public static String getNestedFieldValAsString(GenericRecord record, String fieldName, boolean returnNullIfNotFound) {
Object obj = getNestedFieldVal(record, fieldName, returnNullIfNotFound);
public static String getNestedFieldValAsString(GenericRecord record, String fieldName, boolean returnNullIfNotFound, boolean consistentLogicalTimestampEnabled) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codope @nsivabalan folks, i don't believe this is the right fix to address the problem: bubbling down this config value we're now exposing every user of this API to be aware of it, which, very likely, will have very little to do with it.

@vinishjail97 vinishjail97 mentioned this pull request Jan 24, 2022
5 tasks
vingov pushed a commit to vingov/hudi that referenced this pull request Jan 26, 2022
…#4203)

* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

Timestampbased key generator was returning diff values for row writer and non row writer path. this patch fixes it and is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`)
liusenhua pushed a commit to liusenhua/hudi that referenced this pull request Mar 1, 2022
…#4203)

* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

Timestampbased key generator was returning diff values for row writer and non row writer path. this patch fixes it and is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`)
vingov pushed a commit to vingov/hudi that referenced this pull request Apr 3, 2022
…#4203)

* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

Timestampbased key generator was returning diff values for row writer and non row writer path. this patch fixes it and is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority:critical production down; pipelines stalled; Need help asap.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants