-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-6776] Replace JSON with Avro bytes for commit metadata #9579
Conversation
fb84dcf
to
8a550a0
Compare
Complete removal of JSON |
93953ca
to
5171318
Compare
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
Show resolved
Hide resolved
hudi-cli/src/test/java/org/apache/hudi/cli/testutils/HoodieTestCommitMetadataGenerator.java
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/avro/io/NoWrappingJsonEncoder.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/MetadataConversionUtils.java
Show resolved
Hide resolved
if (hoodieCommitMetadata instanceof HoodieReplaceCommitMetadata) { | ||
return (T) convertReplaceCommitMetadata((HoodieReplaceCommitMetadata) hoodieCommitMetadata); | ||
} | ||
hoodieCommitMetadata.getPartitionToWriteStats().remove(null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some doc why there is a null
key in the map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure why this was there even previously. I just kept the same logic.
hudi/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
Lines 226 to 229 in 1862f17
if (partitionToWriteStats.containsKey(null)) { | |
LOG.info("partition path is null for " + partitionToWriteStats.get(null)); | |
partitionToWriteStats.remove(null); | |
} |
hudi-common/src/test/java/org/apache/hudi/common/table/view/TestHoodieTableFileSystemView.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, overall looks good ~
3622194
to
731c9a5
Compare
Fix avro to json conversion and some tests Fix compilation issue Cleanup timeline utils and fix replace commit metadata serialization Fix import Fix some test paths Fix fsview tests Address comments and fix commit metadata avro schema
This reverts commit 01a6dda.
01a6dda
to
0a8df01
Compare
* <p>This encoder is particularly useful when the standard Avro JSON format's verbosity | ||
* for union types is not desired. | ||
*/ | ||
public class JsonEncoder extends ParsingEncoder implements Parser.ActionHandler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the javadoc, we need this to avoid wrapping field type when encoding avro to json. Initially, I had tried subclassing this under org.apache.avro.io
package in hudi-common. Bundle validation failed with that appraoch because of illegal access. The constructor in the original code is package-private. So, I had to port over the code with some minor modifications as mentioned in the javadoc of this class. I have already attributed this code in LICENSE
file and updated the NOTICE
file.
Change Logs
Today we write the commit metadata for most of the actions as avro byte arrays. But, for commit, deltacommit and replacecommits, it is in json. This PR replaces JSON with Avro bytes for all such commit metadata.
Impact
This is a storage format change. We need to make sure that older timeline with json metadata is still readable.
Risk level (write none, low medium or high below)
medium
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist