Skip to content

refactor: define avro model for parsing commit metadata#477

Merged
xushiyan merged 4 commits intoapache:mainfrom
xushiyan:refactor-commit-metadata
Oct 29, 2025
Merged

refactor: define avro model for parsing commit metadata#477
xushiyan merged 4 commits intoapache:mainfrom
xushiyan:refactor-commit-metadata

Conversation

@xushiyan
Copy link
Member

Description

Define commit metadata structs for commit and replacecommit. And refactor parsing logic using the structs.

How are the changes test-covered

  • N/A
  • Automated tests (unit and/or integration tests)
  • Manual tests
    • Details are described below

@xushiyan xushiyan added this to the release-0.5.0 milestone Oct 28, 2025
@xushiyan xushiyan added the refactor Code refactoring without any functionality or behavior change label Oct 28, 2025
@codecov
Copy link

codecov bot commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 85.52632% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.98%. Comparing base (4bfdee8) to head (e7d3dae).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
crates/core/src/schema/resolver.rs 60.00% 6 Missing ⚠️
crates/core/src/file_group/builder.rs 77.27% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #477      +/-   ##
==========================================
+ Coverage   86.72%   86.98%   +0.25%     
==========================================
  Files          49       51       +2     
  Lines        2637     2643       +6     
==========================================
+ Hits         2287     2299      +12     
+ Misses        350      344       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@xushiyan xushiyan requested a review from Copilot October 29, 2025 00:53
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors commit metadata parsing by introducing structured types (HoodieCommitMetadata and HoodieReplaceCommitMetadata) with automatic Avro schema derivation. This replaces manual JSON traversal with strongly-typed parsing using serde and apache-avro-derive.

  • Introduces HoodieCommitMetadata and HoodieReplaceCommitMetadata structs with Avro schema support
  • Refactors schema resolution and file group building to use structured types
  • Adds comprehensive test coverage for new metadata types

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
crates/core/src/metadata/commit.rs New module defining HoodieCommitMetadata and HoodieWriteStat with Avro schema derivation and helper methods
crates/core/src/metadata/replace_commit.rs New module defining HoodieReplaceCommitMetadata for replace commit operations
crates/core/src/metadata/mod.rs Exports new commit metadata modules
crates/core/src/schema/resolver.rs Refactored to use HoodieCommitMetadata instead of manual JSON parsing
crates/core/src/file_group/builder.rs Refactored to use structured metadata types with improved error messages
crates/core/Cargo.toml Added apache-avro-derive dependency
Cargo.toml Enabled "derive" feature for apache-avro and added apache-avro-derive

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@xushiyan xushiyan requested a review from Copilot October 29, 2025 01:02
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@xushiyan xushiyan requested a review from Copilot October 29, 2025 03:11
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +164 to +165
// With the new implementation, this returns Ok with an empty HashSet
// because iter_write_stats() returns an empty iterator when partition_to_write_stats is None
Copy link

Copilot AI Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The comment incorrectly states the behavior is new. This appears to be documenting changed behavior in an existing test. Consider updating the comment to explain why this behavior is correct rather than just describing the implementation detail.

Suggested change
// With the new implementation, this returns Ok with an empty HashSet
// because iter_write_stats() returns an empty iterator when partition_to_write_stats is None
// When partition_to_write_stats is missing from the metadata,
// the correct behavior is to return Ok with an empty HashSet, since there are no file groups to build.

Copilot uses AI. Check for mistakes.
.clone();

let result = build_file_groups(&metadata);
// Serde will fail to parse this and return a deserialization error
Copy link

Copilot AI Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The comment explains implementation details but doesn't clarify why this test case is important. Consider explaining what invalid data scenario this test is validating.

Suggested change
// Serde will fail to parse this and return a deserialization error
// This test validates that providing a non-string value for the 'fileId' field (here, a number) results in a deserialization error, as 'fileId' is expected to be a string.

Copilot uses AI. Check for mistakes.
.clone();

let result = build_file_groups(&metadata);
// Serde will fail to parse this and return a deserialization error
Copy link

Copilot AI Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The comment explains implementation details but doesn't clarify why this test case is important. Consider explaining what invalid data scenario this test is validating.

Suggested change
// Serde will fail to parse this and return a deserialization error
// This test verifies that deserialization fails when the "path" field is not a string,
// ensuring that invalid data types in commit metadata are properly rejected.

Copilot uses AI. Check for mistakes.
.clone();

let result = build_replaced_file_groups(&metadata);
// Serde will fail to parse this
Copy link

Copilot AI Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The comment is incomplete and doesn't explain the test's purpose. Consider explaining what invalid data scenario this test is validating.

Suggested change
// Serde will fail to parse this
// This test verifies that if the file IDs array contains a non-string value (here, a number instead of a string),
// Serde will fail to parse the commit metadata and return the appropriate error.

Copilot uses AI. Check for mistakes.
.clone();

let result = build_replaced_file_groups(&metadata);
// Serde will fail to parse this
Copy link

Copilot AI Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The comment is incomplete and doesn't explain the test's purpose. Consider explaining what invalid data scenario this test is validating.

Suggested change
// Serde will fail to parse this
// This tests the scenario where the array of file IDs contains a null value,
// which is invalid. Serde should fail to parse this as a valid file ID.

Copilot uses AI. Check for mistakes.
Copy link

@zhangyue19921010 zhangyue19921010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ~

@xushiyan xushiyan merged commit f27feb8 into apache:main Oct 29, 2025
19 checks passed
@xushiyan xushiyan deleted the refactor-commit-metadata branch October 29, 2025 04:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

refactor Code refactoring without any functionality or behavior change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants