[HUDI-1302] Add support for timestamp field in HiveSync #2129

Merged: n3nash merged 1 commit into apache:master on Oct 14, 2020

satishkotha
Member

@satishkotha satishkotha commented Sep 29, 2020

What is the purpose of the pull request

Add support for timestamp field in HiveSync

Brief change log

  • Hudi HiveSyncTool converts int64 fields to 'bigint' hive type.
  • Some use cases need this converted to the 'timestamp' type when the corresponding Parquet original type is TIMESTAMP_MICROS. (We could also consider doing this for TIMESTAMP_MILLIS.)
  • This has to be done in a backward-compatible way, so tables that are already synced continue to get 'bigint' as the Hive type. The 'timestamp' conversion is enabled by overriding the sync config with a new flag.
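
A minimal sketch of the mapping described above, not the actual HiveSyncTool code. The enums are simplified stand-ins for the Parquet types, and the names `PrimitiveName`, `OriginalTypeSketch`, and `HiveTypeSketch` are hypothetical. It only illustrates that the flag defaults to the old 'bigint' behavior and that 'timestamp' is produced only when the flag is set and the original type is TIMESTAMP_MICROS.

```java
// Simplified stand-ins for the Parquet primitive and original types (assumption:
// these are NOT the real parquet-mr classes, just enough to show the branching).
enum PrimitiveName { INT32, INT64 }

enum OriginalTypeSketch { NONE, TIMESTAMP_MILLIS, TIMESTAMP_MICROS }

final class HiveTypeSketch {
    private HiveTypeSketch() {
    }

    // Returns the Hive type name for a field.
    // supportTimestamp = false reproduces the pre-existing mapping.
    static String toHiveType(PrimitiveName primitive,
                             OriginalTypeSketch original,
                             boolean supportTimestamp) {
        // New opt-in branch: int64 annotated as TIMESTAMP_MICROS becomes 'timestamp'.
        if (supportTimestamp
                && primitive == PrimitiveName.INT64
                && original == OriginalTypeSketch.TIMESTAMP_MICROS) {
            return "timestamp";
        }
        // Fallback: the mapping by primitive type only.
        switch (primitive) {
            case INT32:
                return "int";
            case INT64:
                return "bigint";
            default:
                throw new IllegalArgumentException("Unsupported type: " + primitive);
        }
    }
}
```

With the flag off, `toHiveType(PrimitiveName.INT64, OriginalTypeSketch.TIMESTAMP_MICROS, false)` still yields 'bigint', which is what keeps already-synced tables unchanged.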

Verify this pull request

This change adds tests. It was also verified with the steps in the docker demo. Testing in staging is in progress.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@cdmikechen
Contributor

cdmikechen commented Sep 29, 2020

Can you test on Hive to check if Hive can read data with hudi avro timestamp type?
I think there are more test cases and other things to let Hive support timestamp type.

@satishkotha
Member Author

> Can you test on Hive to check if Hive can read data with hudi avro timestamp type?
> I think there are more test cases and other things to let Hive support timestamp type.

Yes, additional changes are needed in Hive. For the short term, we applied a temporary patch to Hive to make it work (@s-sanjay plans to open a draft PR soon).
I believe Hive4 also supports this in a different way.

@satishkotha
Member Author

@pratyakshsharma will you be able to review this week?

@@ -68,6 +68,9 @@
@Parameter(names = {"--help", "-h"}, help = true)
public Boolean help = false;

@Parameter(names = {"--support-timestamp"}, description = "If true, converts int64(timestamp_micros) to timestamp type")
public Boolean supportTimestamp = false;
Contributor

I think we need to add this option into DataSourceOptions, DataSourceUtils, and HoodieSparkSqlWriter

something like?
"hoodie.datasource.hive_sync.support_timestamp"

Member Author

@bschell thank you for pointing this out. We only use standalone mode, so I missed DataSourceOptions. Fixed now. PTAL.
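
A rough sketch of how such an option could be resolved with a backward-compatible default. The key is the one proposed in the review comment above ("something like?"), so treat it as a suggestion rather than a confirmed name. The class `SupportTimestampOption` is hypothetical and uses plain `java.util.Properties` rather than Hudi's own option classes.

```java
import java.util.Properties;

// Hypothetical helper; not part of DataSourceOptions or DataSourceUtils.
final class SupportTimestampOption {
    // Key as proposed in the review comment (assumption, not a confirmed name).
    static final String KEY = "hoodie.datasource.hive_sync.support_timestamp";
    // Default keeps the existing behavior: int64 stays 'bigint'.
    static final String DEFAULT_VALUE = "false";

    private SupportTimestampOption() {
    }

    // Returns true only when the option is explicitly set to "true" (case-insensitive).
    static boolean resolve(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, DEFAULT_VALUE));
    }
}
```

Because the default is "false", a job that never sets the key behaves exactly as before.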

@pratyakshsharma
Contributor

> @pratyakshsharma will you be able to review this week?

Apologies for the delay. Will take a look today.

@vinothchandar
Member

cc @umehrot2 as well.

@@ -167,7 +173,10 @@ private static String convertField(final Type parquetType) {
.append(decimalMetadata.getScale()).append(")").toString();
} else if (originalType == OriginalType.DATE) {
return field.append("DATE").toString();
} else if (supportTimestamp && originalType == OriginalType.TIMESTAMP_MICROS) {
Contributor

Could you please explain what happens now if the originalType == OriginalType.TIMESTAMP_MICROS ?

Member Author

@n3nash Without this change, we get 'bigint' as the Hive type (the field goes through parquetPrimitiveTypeName.convert -> the convertINT64 method).
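
For context, a small sketch of what the two Hive types mean for the same stored value: as 'bigint' a reader sees only the raw int64, whereas TIMESTAMP_MICROS means that number is a count of microseconds since the Unix epoch. The class and method names are hypothetical, and only standard `java.time` calls are used.

```java
import java.time.Instant;

// Hypothetical helper for illustration; not part of the Hudi or Hive code base.
final class TimestampMicrosSketch {
    private TimestampMicrosSketch() {
    }

    // Interprets an int64 as microseconds since 1970-01-01T00:00:00Z.
    // floorDiv/floorMod keep the nanosecond part non-negative for pre-epoch values.
    static Instant fromEpochMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long nanos = Math.floorMod(micros, 1_000_000L) * 1_000L;
        return Instant.ofEpochSecond(seconds, nanos);
    }
}
```

For example, the raw value 1602633600000000 corresponds to 2020-10-14T00:00:00Z when read as microseconds.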

Contributor

@n3nash n3nash left a comment

@satishkotha Changes look fine. I left one comment that might open up some refactoring.

@satishkotha
Member Author

@pratyakshsharma @n3nash Please take a look.

@satishkotha
Member Author

@n3nash @pratyakshsharma @bschell can any of you review this?

@cdmikechen
Contributor

@satishkotha
This may overlap with HUDI-83, so I added some comments in JIRA.

@pratyakshsharma
Contributor

Changes look good to me. @n3nash Do you have any concerns here?

Contributor

@n3nash n3nash left a comment

LGTM

@n3nash n3nash merged commit 7fa641e into apache:master Oct 14, 2020
prashantwason pushed a commit to prashantwason/incubator-hudi that referenced this pull request Feb 22, 2021
6 participants