feat: Introduce inflight record index cache for bucket assigning#17802

Merged
danny0405 merged 2 commits intoapache:masterfrom
cshuo:add_rli_cache
Jan 9, 2026

@cshuo
Collaborator

@cshuo cshuo commented Jan 8, 2026

Describe the issue this Pull Request addresses

Add an access cache to support efficient lookup of the record level index (RLI) for the Flink writer. Fixes #17809.

Summary and Changelog

Add an access cache to support efficient lookup of the RLI for the Flink writer.
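
To illustrate the general idea of an access cache for index lookups, here is a minimal sketch. It is not the code from this PR: the class name, the method names, and the String-to-String mapping from record key to file group ID are all hypothetical, and the bound is a simple entry count, whereas the review below discusses a size configured in bytes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: a bounded, access-ordered cache from record key to file group ID. */
class InflightIndexCacheSketch {
    private final Map<String, String> locations;

    InflightIndexCacheSketch(final int maxEntries) {
        // accessOrder = true moves an entry to the tail on every get/put,
        // so the head of the map is always the least recently used entry.
        this.locations = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Called after each insertion; evict the head once the cap is exceeded.
                return size() > maxEntries;
            }
        };
    }

    /** Records (or updates) the location assigned to a record key. */
    void put(String recordKey, String fileGroupId) {
        locations.put(recordKey, fileGroupId);
    }

    /** Returns the cached location, or null if the key is absent or was evicted. */
    String get(String recordKey) {
        return locations.get(recordKey);
    }

    int size() {
        return locations.size();
    }

    public static void main(String[] args) {
        InflightIndexCacheSketch cache = new InflightIndexCacheSketch(2);
        cache.put("k1", "f1");
        cache.put("k2", "f2");
        cache.get("k1");          // touch k1 so that k2 becomes the eldest entry
        cache.put("k3", "f3");    // exceeds the cap and evicts k2
        System.out.println(cache.get("k2")); // null
        System.out.println(cache.get("k1")); // f1
    }
}
```

A production cache for this purpose would more likely bound memory by estimated bytes and might spill to disk rather than evict; the entry-count cap is used here only to keep the sketch short.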

Impact

A basic component for supporting RLI-based bucket assignment.

Risk Level

Low.

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Jan 8, 2026
@cshuo cshuo force-pushed the add_rli_cache branch 3 times, most recently from 3f2f24a to cd6e1bb Compare January 9, 2026 01:29
@cshuo cshuo changed the title from "[WIP] feat: Add access cache to support efficient lookup of RLI for flink w…" to "feat: Introduce inflight record index cache for bucket assigning" Jan 9, 2026
new DefaultSizeEstimator<>(),
new DefaultSizeEstimator<>(),
writeConfig.getCommonConfig().getSpillableDiskMapType(),
new DefaultSerializer<>(),
Contributor

We might need to file a sub-task to introduce a more efficient serializer in the future.

Collaborator Author

Tracked in #17815.


@AdvancedConfig
public static final ConfigOption<Long> BUCKET_ASSIGN_INFLIGHT_INDEX_CACHE_SIZE = ConfigOptions
.key("write.bucket_assign.inflight.cache.size")
Contributor

How about index.rli.cache.size? This cache size is the total number of bytes for the inflight cache and the hotspot cache (which may be introduced in the future).

Collaborator Author

Makes sense, fixed.

@hudi-bot
Collaborator

hudi-bot commented Jan 9, 2026

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@danny0405 danny0405 merged commit 6ac6c6f into apache:master Jan 9, 2026
72 checks passed

Labels

size:L PR with lines of changes in (300, 1000]

Development

Successfully merging this pull request may close these issues.

Introduce inflight record index cache for bucket assign operator

3 participants