feat(common): per-partition event-time rollup; decouple watermark tracking from EVENT_TIME_ORDERING #18778
…tracking from EVENT_TIME_ORDERING

Expose the per-partition event-time rollup that is already latent on disk and stop gating watermark tracking on EVENT_TIME_ORDERING so freshness observability works for COW / COMMIT_TIME_ORDERING tables.

Changes:
- HoodieCommitMetadata.getMinAndMaxEventTimePerPartition(): pure aggregation over partitionToWriteStats returning Map<String, Pair<Option<Long>, Option<Long>>>. No persisted bytes, no avro schema change. Partitions whose stats carry no event time are omitted.
- HoodieWriteHandle: drop the recordMergeMode == EVENT_TIME_ORDERING check from isTrackingEventTimeWatermark. Tracking now activates whenever the event-time field is configured and hoodie.write.track.event.time.watermark is true, independent of merge mode.
- Tests: 5 new TestHoodieCommitMetadata cases for the rollup API; update TestHoodieWriteHandle to assert the new merge-mode-independent behavior and add a missing-event-time-field negative case.

Part of apache#17512 (Phase 1 of the reconcile plan). No behavior change for tables that have not opted into hoodie.write.track.event.time.watermark.
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #18778 +/- ##
============================================
+ Coverage 68.19% 68.20% +0.01%
- Complexity 29225 29252 +27
============================================
Files 2525 2525
Lines 141660 141677 +17
Branches 17591 17594 +3
============================================
+ Hits 96607 96635 +28
+ Misses 37097 37094 -3
+ Partials 7956 7948 -8
hudi-agent
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR adds a per-partition event-time rollup accessor on HoodieCommitMetadata and removes the EVENT_TIME_ORDERING gate from watermark tracking so COW / COMMIT_TIME_ORDERING tables can opt in. The aggregation mirrors the semantics of the existing global getter (including handling of partial min/max), is purely additive (no schema or persisted-byte change), and is well covered by the new unit tests. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. A couple of small suggestions below — mainly around extracting the duplicated fold logic into a shared helper, and tightening the Javadoc.
cc @yihua
   */
  public Map<String, Pair<Option<Long>, Option<Long>>> getMinAndMaxEventTimePerPartition() {
    Map<String, Pair<Option<Long>, Option<Long>>> result = new HashMap<>();
    for (Map.Entry<String, List<HoodieWriteStat>> entry : partitionToWriteStats.entrySet()) {
🤖 nit: the sentinel-value fold (null-check → Math.min/Math.max → ternary back to Option) is essentially the same inner loop as in getMinAndMaxEventTime(). Could you extract a private foldEventTimes(List<HoodieWriteStat>) helper that both methods share? That way any future change to null-handling or sentinel semantics only needs to land in one place.
- AI-generated; verify before applying. React 👍/👎 to flag quality.
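The shared-helper suggestion above could look roughly like the sketch below. It is only an illustration under stated assumptions: `WriteStat` and `MinMax` are hypothetical stand-ins for `HoodieWriteStat` and `Pair<Option<Long>, Option<Long>>`, the helper name `foldEventTimes` is taken from the review comment, and the fold uses nullable accumulators rather than the sentinel values the actual Hudi code is described as using.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins: WriteStat and MinMax are simplified illustrations,
// not Hudi's HoodieWriteStat or Pair<Option<Long>, Option<Long>>.
final class EventTimeFold {

  /** Simplified write stat; either bound may be null when no event time was seen. */
  static final class WriteStat {
    final Long minEventTime;
    final Long maxEventTime;

    WriteStat(Long minEventTime, Long maxEventTime) {
      this.minEventTime = minEventTime;
      this.maxEventTime = maxEventTime;
    }
  }

  /** Folded bounds; a side is empty when no stat carried that bound. */
  static final class MinMax {
    final Optional<Long> min;
    final Optional<Long> max;

    MinMax(Optional<Long> min, Optional<Long> max) {
      this.min = min;
      this.max = max;
    }

    boolean isEmpty() {
      return !min.isPresent() && !max.isPresent();
    }
  }

  /** The single place where null handling and min/max folding live. */
  static MinMax foldEventTimes(List<WriteStat> stats) {
    Long min = null;
    Long max = null;
    for (WriteStat stat : stats) {
      if (stat.minEventTime != null) {
        min = (min == null) ? stat.minEventTime : Long.valueOf(Math.min(min, stat.minEventTime));
      }
      if (stat.maxEventTime != null) {
        max = (max == null) ? stat.maxEventTime : Long.valueOf(Math.max(max, stat.maxEventTime));
      }
    }
    return new MinMax(Optional.ofNullable(min), Optional.ofNullable(max));
  }

  /** Global view: fold every stat from every partition into one pair. */
  static MinMax global(Map<String, List<WriteStat>> partitionToWriteStats) {
    List<WriteStat> all = new ArrayList<>();
    for (List<WriteStat> stats : partitionToWriteStats.values()) {
      all.addAll(stats);
    }
    return foldEventTimes(all);
  }

  /** Per-partition view: partitions with no event time at all are omitted. */
  static Map<String, MinMax> perPartition(Map<String, List<WriteStat>> partitionToWriteStats) {
    Map<String, MinMax> result = new HashMap<>();
    for (Map.Entry<String, List<WriteStat>> entry : partitionToWriteStats.entrySet()) {
      MinMax folded = foldEventTimes(entry.getValue());
      if (!folded.isEmpty()) {
        result.put(entry.getKey(), folded);
      }
    }
    return result;
  }
}
```

With both accessors delegating to one fold, a later change to partial min/max handling would only need to be made in `foldEventTimes`.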
   * over its write stats, mirroring the semantics of {@link #getMinAndMaxEventTime()}.
   *
   * <p>This is a pure aggregation over {@code partitionToWriteStats} — it adds no persisted
   * bytes and does not change the commit avro schema.
🤖 nit: "does not change the commit avro schema" reads more like a PR justification than an API contract — a future reader may wonder why a read-only aggregation method needs to promise schema stability, and the note could become misleading if per-partition event times are ever persisted. Have you considered dropping this sentence (or moving it to a code comment inside the method instead)?
- AI-generated; verify before applying. React 👍/👎 to flag quality.
…tracking from EVENT_TIME_ORDERING (apache#18778)

Expose the per-partition event-time rollup that is already latent on disk and stop gating watermark tracking on EVENT_TIME_ORDERING so freshness observability works for COW / COMMIT_TIME_ORDERING tables.

Changes:
- HoodieCommitMetadata.getMinAndMaxEventTimePerPartition(): pure aggregation over partitionToWriteStats returning Map<String, Pair<Option<Long>, Option<Long>>>. No persisted bytes, no avro schema change. Partitions whose stats carry no event time are omitted.
- HoodieWriteHandle: drop the recordMergeMode == EVENT_TIME_ORDERING check from isTrackingEventTimeWatermark. Tracking now activates whenever the event-time field is configured and hoodie.write.track.event.time.watermark is true, independent of merge mode.
- Tests: 5 new TestHoodieCommitMetadata cases for the rollup API; update TestHoodieWriteHandle to assert the new merge-mode-independent behavior and add a missing-event-time-field negative case.

Part of apache#17512 (Phase 1 of the reconcile plan). No behavior change for tables that have not opted into hoodie.write.track.event.time.watermark.

Co-authored-by: Xinli Shang <shangxinli@apache.org>
Describe the issue this Pull Request addresses
Part of the freshness-tracking work discussed in #17512. This PR implements Phase 1 of the reconcile plan: expose the per-partition event-time rollup that is already latent on disk, and stop gating watermark tracking on EVENT_TIME_ORDERING so freshness observability works for COW / COMMIT_TIME_ORDERING tables too.

This is purely additive: no commit-metadata key added, no avro schema change, no behavior change for tables that have not opted into hoodie.write.track.event.time.watermark.

Summary and Changelog
Today, WriteStatus.markSuccess() already folds min/max event time into each HoodieWriteStat (and the avro schema already serializes them per stat alongside partitionPath). But the only public accessor on HoodieCommitMetadata is getMinAndMaxEventTime(), which collapses every partition into a single pair, so consumers asking "how fresh is partition dt=2026-05-19?" have to walk partitionToWriteStats themselves.

Watermark tracking is also currently gated on recordMergeMode == EVENT_TIME_ORDERING, even though freshness observability is independent of merge semantics. The result is that COW tables with COMMIT_TIME_ORDERING silently get no watermark even when the user explicitly opts in.

This PR:
- Adds HoodieCommitMetadata.getMinAndMaxEventTimePerPartition(), a pure aggregation over partitionToWriteStats that returns Map<String, Pair<Option<Long>, Option<Long>>>. Partitions whose stats carry no event time at all are omitted (so the map size reflects partitions with freshness data, not total partitions written). Min/max within a partition are folded with Math.min / Math.max, mirroring the semantics of the existing global getter. No persisted bytes, no avro change.
- Drops the EVENT_TIME_ORDERING gate in HoodieWriteHandle. Tracking now activates when eventTimeFieldName != null && hoodie.write.track.event.time.watermark=true, regardless of merge mode. The unused EVENT_TIME_ORDERING static import is removed.
- Adds 5 tests to TestHoodieCommitMetadata (folding across stats within a partition, omitting partitions without event time, handling partial min/max, empty metadata, and a consistency check against the global getter); updates the existing testShouldTrackEventTimeWaterMarkerAvroRecordTypeWithCommitTimeOrdering to assert the new behavior (now tracks) and adds a negative test for the missing-event-time-field case.

Full hudi-common (1897 tests) and hudi-client-common (1026 tests) suites pass locally.
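The gating change described above can be illustrated with a small before/after sketch. This is not the actual HoodieWriteHandle code: the class, method names, and the enum below are hypothetical stand-ins, and the "before" condition is reconstructed from the description in this PR rather than copied from the source.

```java
// Hypothetical illustration of the tracking predicate before and after this PR.
// RecordMergeMode here is a local stand-in enum, not Hudi's own type.
final class WatermarkGate {

  enum RecordMergeMode { COMMIT_TIME_ORDERING, EVENT_TIME_ORDERING, CUSTOM }

  /** Assumed old behavior: tracking also required EVENT_TIME_ORDERING. */
  static boolean isTrackingBefore(String eventTimeFieldName, boolean trackWatermark, RecordMergeMode mode) {
    return eventTimeFieldName != null
        && trackWatermark
        && mode == RecordMergeMode.EVENT_TIME_ORDERING;
  }

  /** New behavior as described: merge mode no longer participates. */
  static boolean isTrackingAfter(String eventTimeFieldName, boolean trackWatermark) {
    return eventTimeFieldName != null && trackWatermark;
  }
}
```

Under these assumptions, a COMMIT_TIME_ORDERING table with an event-time field and the flag set to true goes from not tracking to tracking, while a table without the flag or without an event-time field is unaffected.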
Impact
Public-API addition on HoodieCommitMetadata: external tools (catalogs, freshness exporters, lineage UIs) can now read per-partition freshness directly without walking write stats.

Behavior change for opted-in tables: COW / COMMIT_TIME_ORDERING tables with hoodie.write.track.event.time.watermark=true and an event-time field will now populate min/max on write stats; previously the flag was silently a no-op for them. Tables that have not set the flag see no change.

No performance impact: the rollup is a pure in-memory aggregation that callers invoke on demand; watermark extraction at write time was already gated on the same per-record path.
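As a rough idea of how an external tool might consume per-partition freshness, the sketch below computes lag per partition from the maximum event time. It makes several assumptions that are not stated in this PR: the input is a plain `Map<String, Optional<Long>>` of max event times (a stand-in for the second element of each pair returned by the new accessor), event times are epoch milliseconds, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical consumer-side helper; not part of Hudi.
final class PartitionFreshness {

  /**
   * Returns lag in milliseconds per partition, assuming event times are epoch millis.
   * Partitions without a max event time are skipped rather than reported as stale.
   */
  static Map<String, Long> lagMillis(Map<String, Optional<Long>> maxEventTimeByPartition, long nowMillis) {
    Map<String, Long> lag = new TreeMap<>();
    for (Map.Entry<String, Optional<Long>> entry : maxEventTimeByPartition.entrySet()) {
      if (entry.getValue().isPresent()) {
        lag.put(entry.getKey(), nowMillis - entry.getValue().get());
      }
    }
    return lag;
  }
}
```

Skipping partitions with no max event time matches the accessor's choice to omit partitions that carry no freshness data, so an absent key means "unknown" rather than "stale".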
Risk Level
low
The new method is additive. The behavior change is conditional on a config that is false by default and gated on an event-time field name; tables not using the flag are unaffected. Verified by running the full hudi-common and hudi-client-common test suites locally with no regressions.

Documentation Update
The hoodie.write.track.event.time.watermark config description should be updated on the Hudi website to reflect that it no longer requires EVENT_TIME_ORDERING. The new getMinAndMaxEventTimePerPartition() API is internally documented via Javadoc; a website page covering per-partition freshness consumption can land alongside Phase 2 (upstream propagation) so users see the end-to-end story in one place.

Contributor's checklist