feat: Support bucket assgin operator fetching inflight instants from coordinator#17885

Merged
danny0405 merged 2 commits into apache:master from cshuo:support_bucket_assign_coordination
Jan 19, 2026

Conversation

@cshuo (Collaborator) commented Jan 15, 2026

Describe the issue this Pull Request addresses

fixes #17700

Summary and Changelog

  • Introduce BucketAssignOperator to support setting up the Correspondent.
  • Set the Correspondent on MiniBatchBucketAssignOperator.
  • Support cache cleaning for RecordLevelIndexBackend.

Impact

Supports coordination between the bucket assign operator and the coordinator.
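
As a rough illustration of the coordination described above, the following sketch models a coordinator that bookkeeps the mapping from checkpoint id to instant, and a correspondent through which an operator asks for the instants that are still inflight. Every class and method name here (CoordinatorSketch, CorrespondentSketch, startInstant, commit, requestInflightInstants) is hypothetical and chosen for illustration; the sketch does not reproduce the classes added in this PR, and a direct in-process call stands in for the actual operator-to-coordinator request.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model only: not the coordinator or Correspondent from this PR.
final class CoordinatorSketch {
  // checkpoint id -> instant started for that checkpoint and not yet committed
  private final NavigableMap<Long, String> inflight = new TreeMap<>();

  // Record that an instant was started for the given checkpoint.
  synchronized void startInstant(long checkpointId, String instant) {
    inflight.put(checkpointId, instant);
  }

  // Once the instant for a checkpoint is committed, it is no longer inflight.
  synchronized void commit(long checkpointId) {
    inflight.remove(checkpointId);
  }

  // Return a defensive copy so callers cannot mutate the coordinator state.
  synchronized Map<Long, String> inflightInstants() {
    return new TreeMap<>(inflight);
  }
}

// Hypothetical operator-side handle; a real one would send a request over RPC.
final class CorrespondentSketch {
  private final CoordinatorSketch coordinator;

  CorrespondentSketch(CoordinatorSketch coordinator) {
    this.coordinator = coordinator;
  }

  // Assumption: the reply is the full checkpoint id -> inflight instant mapping.
  Map<Long, String> requestInflightInstants() {
    return coordinator.inflightInstants();
  }
}
```

In this model the operator never holds coordinator state directly; it only sees a snapshot per request, which is one plausible reason to route the lookup through a dedicated correspondent object.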

Risk Level

low.

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Jan 15, 2026
@cshuo cshuo force-pushed the support_bucket_assign_coordination branch from f2e9206 to 6d331a5 Compare January 15, 2026 13:27
@github-actions github-actions bot added size:XL PR with lines of changes > 1000 and removed size:L PR with lines of changes in (300, 1000] labels Jan 15, 2026
@cshuo cshuo force-pushed the support_bucket_assign_coordination branch 2 times, most recently from bedb109 to 9ccec51 Compare January 16, 2026 03:12
@cshuo cshuo changed the title [WIP] feat: Support bucket assgin operator fetching inflight instants from coordinator feat: Support bucket assgin operator fetching inflight instants from coordinator Jan 16, 2026
@github-actions github-actions bot added size:L PR with lines of changes in (300, 1000] and removed size:XL PR with lines of changes > 1000 labels Jan 16, 2026
NavigableMap<Long, ExternalSpillableMap<String, HoodieRecordGlobalLocation>> subMap;
if (checkpointId == Long.MAX_VALUE) {
  // clean all the cache entries for old checkpoint ids, and only keeps the cache for the maximum checkpoint id,
  // which aims to clear memory while also ensuring a certain cache hit rate
Contributor

Do we have a memory threshold for the cache that we can check against?

Collaborator Author

Created an issue to track it: #17912.
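
For readers following the thread, here is a minimal sketch of the cleanup behavior that the reviewed code comment describes: caches are keyed by checkpoint id, and passing Long.MAX_VALUE drops every cache except the one for the maximum checkpoint id. The class CheckpointCacheSketch and its methods are hypothetical, a plain HashMap stands in for ExternalSpillableMap, a String stands in for HoodieRecordGlobalLocation, and the behavior of the non-MAX_VALUE branch is an assumption because the excerpt does not show it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a checkpoint-keyed cache; not RecordLevelIndexBackend itself.
final class CheckpointCacheSketch {
  // checkpoint id -> (record key -> location)
  private final NavigableMap<Long, Map<String, String>> caches = new TreeMap<>();

  void put(long checkpointId, String recordKey, String location) {
    caches.computeIfAbsent(checkpointId, id -> new HashMap<>()).put(recordKey, location);
  }

  // Search from the newest checkpoint backwards so the latest location wins.
  String get(String recordKey) {
    for (Map<String, String> cache : caches.descendingMap().values()) {
      String location = cache.get(recordKey);
      if (location != null) {
        return location;
      }
    }
    return null;
  }

  // Removes per-checkpoint caches and returns how many were dropped.
  int clean(long checkpointId) {
    if (caches.isEmpty()) {
      return 0;
    }
    NavigableMap<Long, Map<String, String>> subMap;
    if (checkpointId == Long.MAX_VALUE) {
      // Keep only the cache for the maximum checkpoint id, as the reviewed comment says.
      subMap = caches.headMap(caches.lastKey(), false);
    } else {
      // Assumption: drop every cache up to and including the given checkpoint id.
      subMap = caches.headMap(checkpointId, true);
    }
    int removed = subMap.size();
    // headMap returns a view, so clearing it removes the entries from caches.
    // A spillable map would also need to be closed here to release disk space.
    subMap.clear();
    return removed;
  }

  int cachedCheckpoints() {
    return caches.size();
  }
}
```

Note that this cleanup is driven purely by checkpoint ids; it has no memory threshold, which is the gap raised in the review question above.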

@cshuo cshuo force-pushed the support_bucket_assign_coordination branch 3 times, most recently from 6a6052c to 0f347cf Compare January 17, 2026 07:47
@cshuo cshuo force-pushed the support_bucket_assign_coordination branch from 0f347cf to aa3f99a Compare January 19, 2026 02:41
@hudi-bot (Collaborator)

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@danny0405 danny0405 merged commit 8f23fc2 into apache:master Jan 19, 2026
72 checks passed
alexr17 pushed a commit to alexr17/hudi that referenced this pull request Jan 30, 2026
alexr17 pushed a commit to alexr17/hudi that referenced this pull request Feb 5, 2026

Labels

size:L PR with lines of changes in (300, 1000]

Development

Successfully merging this pull request may close these issues.

Add basic infra to bookeep the mappings between checkpoint id to instant

3 participants