[HUDI-6776] Replace JSON with Avro bytes for commit metadata #9579

Merged: 7 commits merged into apache:master from HUDI-6776-commit-metadata-avro on Sep 15, 2023

@codope codope commented Aug 30, 2023

Change Logs

Today, the commit metadata for most actions is written as Avro byte arrays, but for commit, deltacommit, and replacecommit actions it is written as JSON. This PR replaces JSON with Avro bytes for the commit metadata of those actions as well.

Impact

This is a storage format change. We need to make sure that older timelines with JSON metadata are still readable.
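
To illustrate the compatibility concern, here is a minimal sketch of how a reader could tell the two encodings apart before picking a deserializer. It is not the code in this PR: the class `CommitMetadataFormat` and its `detect` method are hypothetical, and it assumes the Avro metadata is written as an Avro object container file (which starts with the magic bytes `Obj` followed by 1), while legacy metadata is a JSON object.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of Hudi): guesses how the bytes of a commit
 * metadata file were encoded so that a reader can pick a deserializer.
 */
public final class CommitMetadataFormat {

  public enum Format { EMPTY, AVRO, LEGACY_JSON, UNKNOWN }

  // Avro object container files start with the 4-byte magic 'O', 'b', 'j', 1.
  // Assumption: the new metadata is written in that container format.
  private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

  private CommitMetadataFormat() {
  }

  public static Format detect(byte[] bytes) {
    if (bytes == null || bytes.length == 0) {
      return Format.EMPTY;
    }
    if (startsWith(bytes, AVRO_MAGIC)) {
      return Format.AVRO;
    }
    // A legacy JSON object has '{' as its first non-whitespace byte.
    for (byte b : bytes) {
      if (b == ' ' || b == '\t' || b == '\n' || b == '\r') {
        continue;
      }
      return b == '{' ? Format.LEGACY_JSON : Format.UNKNOWN;
    }
    return Format.UNKNOWN;
  }

  private static boolean startsWith(byte[] bytes, byte[] prefix) {
    if (bytes.length < prefix.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if (bytes[i] != prefix[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] json = "{\"operationType\":\"UPSERT\"}".getBytes(StandardCharsets.UTF_8);
    System.out.println(detect(json));
    System.out.println(detect(new byte[] {'O', 'b', 'j', 1, 0}));
  }
}
```

A reader built this way would route `LEGACY_JSON` bytes to the old JSON parser and `AVRO` bytes to the Avro reader; sniffing the bytes is only one option, and keying the choice off the table version would be another.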

Risk level (write none, low medium or high below)

medium

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@codope codope force-pushed the HUDI-6776-commit-metadata-avro branch 2 times, most recently from fb84dcf to 8a550a0 Compare September 4, 2023 16:17
@codope codope marked this pull request as ready for review September 4, 2023 16:18
@codope (Member Author) commented Sep 4, 2023

Complete removal of JSON HoodieCommitMetadata to be tackled in HUDI-6816.

@codope codope force-pushed the HUDI-6776-commit-metadata-avro branch 2 times, most recently from 93953ca to 5171318 Compare September 8, 2023 12:48
if (hoodieCommitMetadata instanceof HoodieReplaceCommitMetadata) {
  return (T) convertReplaceCommitMetadata((HoodieReplaceCommitMetadata) hoodieCommitMetadata);
}
hoodieCommitMetadata.getPartitionToWriteStats().remove(null);

Contributor:

Can we add some doc explaining why there is a null key in the map?

Member Author:

I'm not sure why this was there previously. I just kept the same logic:

if (partitionToWriteStats.containsKey(null)) {
  LOG.info("partition path is null for " + partitionToWriteStats.get(null));
  partitionToWriteStats.remove(null);
}
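
For reference, a self-contained sketch of the same guard outside Hudi. The class `NullPartitionKeyExample` and the method `dropNullKey` are hypothetical names used only for illustration. One plausible reason to keep the guard (an assumption, not something confirmed in this thread) is that Avro map keys are always strings, so an entry stored under a null key has no Avro representation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration (not Hudi code) of dropping a null partition key. */
public final class NullPartitionKeyExample {

  private NullPartitionKeyExample() {
  }

  /**
   * Removes the entry stored under the null key, if any, and returns its
   * value so the caller can log it. Returns null when there was no such entry.
   */
  public static <V> V dropNullKey(Map<String, V> partitionToStats) {
    if (partitionToStats.containsKey(null)) {
      return partitionToStats.remove(null);
    }
    return null;
  }

  public static void main(String[] args) {
    // HashMap permits a single null key, which is how such an entry can exist.
    Map<String, List<String>> stats = new HashMap<>();
    stats.put("2023/09/15", List.of("file-1"));
    stats.put(null, List.of("file-without-partition"));

    List<String> dropped = dropNullKey(stats);
    System.out.println("dropped: " + dropped);
    System.out.println("remaining partitions: " + stats.keySet());
  }
}
```

Returning the removed value keeps the logging decision with the caller, which mirrors the `LOG.info` call in the snippet above.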

@danny0405 (Contributor) left a comment:

+1, overall looks good ~

@codope codope force-pushed the HUDI-6776-commit-metadata-avro branch from 3622194 to 731c9a5 Compare September 12, 2023 12:47
@apache apache deleted a comment from hudi-bot Sep 13, 2023
Fix avro to json conversion and some tests

Fix compilation issue

Cleanup timeline utils and fix replace commit metadata serialization

Fix import

Fix some test paths

Fix fsview tests

Address comments and fix commit metadata avro schema
This reverts commit 01a6dda.
@codope codope force-pushed the HUDI-6776-commit-metadata-avro branch from 01a6dda to 0a8df01 Compare September 14, 2023 15:10
 * <p>This encoder is particularly useful when the standard Avro JSON format's verbosity
 * for union types is not desired.
 */
public class JsonEncoder extends ParsingEncoder implements Parser.ActionHandler {

@codope (Member Author) commented Sep 14, 2023:

As mentioned in the Javadoc, we need this encoder to avoid wrapping union-typed field values in their type name when encoding Avro to JSON. Initially, I tried subclassing the original encoder under the org.apache.avro.io package in hudi-common, but bundle validation failed with that approach because of illegal access: the constructor in the original code is package-private. So I ported the code over with some minor modifications, as described in the Javadoc of this class. I have already attributed this code in the LICENSE file and updated the NOTICE file.

@apache apache deleted a comment from hudi-bot Sep 15, 2023
@codope codope merged commit 123a546 into apache:master Sep 15, 2023
28 checks passed