feat(hudi-sync): Publish HUDI version to Hive metastore (allowing users to infer which HUDI client jar to use for a given dataset)#18307

Merged
nsivabalan merged 1 commit into apache:master from kbuci:writer-hms-publish
Mar 13, 2026

@kbuci kbuci commented Mar 11, 2026

Describe the issue this Pull Request addresses

During hive sync, the Hudi writer version is not published to the Hive Metastore (HMS) table properties. This makes it difficult for downstream consumers and platform tooling to determine which version of the Hudi writer library produced the data for a given table.

Publishing this version lets users infer which Hudi jar version to use when writing to the dataset. This helps when a user is performing a rolling Hudi version upgrade across all their datasets and runs a table service platform that must infer which Hudi jar to use before running a table service against a dataset.
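
For illustration only, a table service platform could consume the published `hudi_writer_version` table property roughly as follows. This is a minimal sketch, not part of this PR: the class `WriterVersionJarResolverSketch`, the method `resolveJarName`, and the jar naming pattern `hudi-bundle-<version>.jar` are all hypothetical, and the table parameters are passed in as a plain map rather than fetched from HMS.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: maps HMS table parameters to a jar name.
class WriterVersionJarResolverSketch {
    // Property name taken from this PR's description.
    static final String WRITER_VERSION_KEY = "hudi_writer_version";

    // Returns a jar name for the table, or empty if the property has not
    // been published yet (for example, the table has not been synced since
    // this change). The naming pattern is an assumption for illustration.
    static Optional<String> resolveJarName(Map<String, String> tableParams) {
        String version = tableParams.get(WRITER_VERSION_KEY);
        if (version == null || version.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of("hudi-bundle-" + version.trim() + ".jar");
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(WRITER_VERSION_KEY, "1.1.0");
        System.out.println(resolveJarName(params).orElse("<unknown>"));
    }
}
```

Returning an empty `Optional` when the property is missing leaves the fallback choice (default jar, skip the table, alert) to the platform.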

#17954

Summary and Changelog

Publish the Hudi writer version as a table property (hudi_writer_version) in HMS during hive sync.

  • Add updateHoodieWriterVersion(String tableName) default method to HoodieMetaSyncOperations interface
  • Implement updateHoodieWriterVersion in HoodieHiveSyncClient, which reads the current Hudi version via HoodieVersion.get() and sets it as a table parameter in HMS
  • Call updateHoodieWriterVersion in HiveSyncTool.syncHoodieTable after updating the last commit time synced
  • Update unit test assertions in TestHiveSyncTool to validate the new table property is present
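
The shape of the change listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the merged code: `MetaSyncOperationsSketch` and `InMemoryHiveSyncClientSketch` are hypothetical stand-ins for `HoodieMetaSyncOperations` and `HoodieHiveSyncClient`, the default method is assumed to be a no-op, an in-memory map replaces the HMS alter-table call, and the version string is passed to the constructor where the real client reads `HoodieVersion.get()`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the sync operations interface.
interface MetaSyncOperationsSketch {
    // Assumed no-op default, so sync clients that do not publish the
    // writer version need no changes.
    default void updateHoodieWriterVersion(String tableName) {
    }
}

// Hypothetical stand-in for the Hive sync client, backed by a map of
// table name -> table parameters instead of a real metastore.
class InMemoryHiveSyncClientSketch implements MetaSyncOperationsSketch {
    static final String WRITER_VERSION_KEY = "hudi_writer_version";

    private final Map<String, Map<String, String>> tableParams = new HashMap<>();
    private final String writerVersion;

    InMemoryHiveSyncClientSketch(String writerVersion) {
        this.writerVersion = writerVersion;
    }

    @Override
    public void updateHoodieWriterVersion(String tableName) {
        // Set (or overwrite) the property on every sync.
        tableParams
            .computeIfAbsent(tableName, t -> new HashMap<>())
            .put(WRITER_VERSION_KEY, writerVersion);
    }

    // Returns the parameter value, or null if the table or key is unknown.
    String getTableParam(String tableName, String key) {
        Map<String, String> params = tableParams.get(tableName);
        return params == null ? null : params.get(key);
    }

    public static void main(String[] args) {
        InMemoryHiveSyncClientSketch client = new InMemoryHiveSyncClientSketch("1.1.0");
        client.updateHoodieWriterVersion("db.trips");
        System.out.println(client.getTableParam("db.trips", WRITER_VERSION_KEY));
    }
}
```

Because the property is overwritten on each call, a table synced after a writer upgrade reports the newer version.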

Impact

A new table property hudi_writer_version will be set on HMS tables during every hive sync. This is a metadata-only change with no impact on the storage format or read/write path. Existing tables will get the property populated on their next sync.

Risk Level

low — The change only adds a single HMS alter_table call per sync to set a table-level property. No existing behavior is modified. If the call fails, it throws a clear exception consistent with existing error handling in the sync client.

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Mar 11, 2026
@codecov-commenter

Codecov Report

❌ Patch coverage is 0% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.27%. Comparing base (39f1f39) to head (53a1d24).
⚠️ Report is 1 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ava/org/apache/hudi/hive/HoodieHiveSyncClient.java | 0.00% | 8 Missing ⚠️ |
| ...c/main/java/org/apache/hudi/hive/HiveSyncTool.java | 0.00% | 1 Missing ⚠️ |
| ...che/hudi/sync/common/HoodieMetaSyncOperations.java | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff            @@
##             master   #18307   +/-   ##
=========================================
  Coverage     57.27%   57.27%           
- Complexity    18639    18651   +12     
=========================================
  Files          1956     1956           
  Lines        107069   107086   +17     
  Branches      13255    13255           
=========================================
+ Hits          61324    61336   +12     
- Misses        39939    39945    +6     
+ Partials       5806     5805    -1     
| Flag | Coverage Δ |
|---|---|
| hadoop-mr-java-client | 45.22% <ø> (+<0.01%) ⬆️ |
| spark-java-tests | 47.47% <0.00%> (+0.01%) ⬆️ |
| spark-scala-tests | 45.56% <0.00%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
|---|---|
| ...c/main/java/org/apache/hudi/hive/HiveSyncTool.java | 0.00% <0.00%> (ø) |
| ...che/hudi/sync/common/HoodieMetaSyncOperations.java | 0.00% <0.00%> (ø) |
| ...ava/org/apache/hudi/hive/HoodieHiveSyncClient.java | 0.00% <0.00%> (ø) |

... and 11 files with indirect coverage changes

@hudi-bot
Collaborator

CI report:


@nsivabalan nsivabalan merged commit a16d431 into apache:master Mar 13, 2026
74 checks passed