feat: add flink continuous split enumerator#17562

Merged
danny0405 merged 4 commits into apache:master from HuangZhenQiu:flink-continuous-enumerator
Dec 18, 2025

@HuangZhenQiu
Collaborator

Describe the issue this Pull Request addresses

Add HoodieContinuousSplitEnumerator for reading Hudi tables incrementally.

Summary and Changelog

Add HoodieContinuousSplitEnumerator, DefaultHoodieContinuousSplitDiscover and HoodieContinuousSplitBatch classes

Impact

none

Risk Level

none

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Dec 11, 2025
@xushiyan xushiyan marked this pull request as draft December 11, 2025 19:51
@HuangZhenQiu HuangZhenQiu force-pushed the flink-continuous-enumerator branch from 138fdb4 to f890ce8 Compare December 12, 2025 03:30
@danny0405 danny0405 self-assigned this Dec 12, 2025
@HuangZhenQiu HuangZhenQiu force-pushed the flink-continuous-enumerator branch from f890ce8 to acaeeca Compare December 12, 2025 05:08
@github-actions github-actions bot added size:XL PR with lines of changes > 1000 and removed size:L PR with lines of changes in (300, 1000] labels Dec 12, 2025
@HuangZhenQiu HuangZhenQiu force-pushed the flink-continuous-enumerator branch from acaeeca to 6ea326b Compare December 12, 2025 06:54
@HuangZhenQiu HuangZhenQiu changed the title feat: add flink continuous split enumerator (WIP) feat: add flink continuous split enumerator Dec 12, 2025
@HuangZhenQiu HuangZhenQiu force-pushed the flink-continuous-enumerator branch 2 times, most recently from cbeb1d0 to 4c483e4 Compare December 12, 2025 17:12
Collaborator Author

@HuangZhenQiu HuangZhenQiu left a comment

@xushiyan Please review the PR at your convenience.

@HuangZhenQiu HuangZhenQiu force-pushed the flink-continuous-enumerator branch 2 times, most recently from 0d4abb9 to b338b58 Compare December 12, 2025 20:59
@xushiyan xushiyan marked this pull request as ready for review December 13, 2025 00:49
Copilot AI review requested due to automatic review settings December 13, 2025 00:49
Copilot AI left a comment

Pull request overview

This PR adds support for continuous/incremental reading of Hudi tables in Flink by introducing a new HoodieContinuousSplitEnumerator that can discover new splits from the Hudi timeline as commits occur.

Key Changes:

  • Introduces continuous split enumeration infrastructure for streaming incremental reads
  • Adds position tracking to maintain enumeration state across checkpoints
  • Provides a configurable scan context to encapsulate scanning parameters
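
The first two points above can be illustrated with a minimal sketch. It is not the PR's implementation: the class `ContinuousDiscoverySketch`, the `Batch` type, and the `discover` method are hypothetical stand-ins, the timeline is modeled as a plain sorted list of instant strings, and each new instant is assumed to yield exactly one split. The idea it shows is that every discovery round returns only instants newer than the last enumerated one, and that this position is the value a checkpoint would need to persist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: illustrative stand-ins, not the classes added by this PR.
class ContinuousDiscoverySketch {

  /** Result of one discovery round: the new splits and the instant to resume from. */
  static final class Batch {
    final List<String> splits;
    final String lastInstant; // null until at least one instant has been enumerated

    Batch(List<String> splits, String lastInstant) {
      this.splits = splits;
      this.lastInstant = lastInstant;
    }
  }

  /**
   * Returns one split per instant that is strictly newer than lastInstant.
   * Assumes the timeline is sorted ascending and instants are fixed-width
   * timestamps, so lexicographic order matches commit order.
   */
  static Batch discover(List<String> timeline, String lastInstant) {
    List<String> splits = new ArrayList<>();
    String newest = lastInstant;
    for (String instant : timeline) {
      if (lastInstant == null || instant.compareTo(lastInstant) > 0) {
        splits.add("split@" + instant);
        newest = instant;
      }
    }
    return new Batch(splits, newest);
  }

  public static void main(String[] args) {
    List<String> timeline =
        new ArrayList<>(Arrays.asList("20251211000000", "20251212000000"));

    // First round: no position yet, so every instant is enumerated.
    Batch first = discover(timeline, null);
    System.out.println(first.splits + " resume after " + first.lastInstant);

    // A new commit lands; carrying the position forward skips the older instants.
    timeline.add("20251213000000");
    Batch second = discover(timeline, first.lastInstant);
    System.out.println(second.splits + " resume after " + second.lastInstant);
  }
}
```

In a real enumerator the `lastInstant` value would be stored in the enumerator state at checkpoint time and passed back in after a restore, so that discovery does not re-emit splits that were already enumerated.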

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| HoodieContinuousSplitEnumerator.java | Main enumerator implementation that periodically discovers new splits from Hudi commits |
| HoodieContinuousSplitBatch.java | Data class representing a batch of splits with instant range information |
| HoodieContinuousSplitDiscover.java | Interface for split discovery operations |
| DefaultHoodieSplitDiscover.java | Default implementation that delegates to IncrementalInputSplits |
| HoodieEnumeratorPosition.java | Tracks the last enumerated instant for incremental discovery |
| ScanContext.java | Encapsulates scan configuration including paths, instants, and skip options |
| HoodieSourceSplit.java | Adds static SPLIT_COUNTER for generating unique split IDs |
| HoodieSplitEnumeratorState.java | Extended to include last enumerated instant tracking |
| IncrementalInputSplits.java | Adds inputHoodieSourceSplits method to convert splits to HoodieSourceSplit type |
| AbstractHoodieSplitEnumerator.java | Updated to pass null values for new state parameters |
| TestHoodieContinuousSplitBatch.java | Comprehensive tests for HoodieContinuousSplitBatch |
| TestDefaultHoodieSplitDiscover.java | Tests for split discovery implementation |
| TestHoodieContinuousSplitEnumerator.java | Tests for continuous enumerator with mocks |
| TestScanContext.java | Tests for ScanContext builder and accessors |
| TestIncrementalInputSplits.java | Tests for the new inputHoodieSourceSplits method |

Collaborator

@cshuo cshuo left a comment

Thanks for the contribution; I left some comments.

@HuangZhenQiu HuangZhenQiu force-pushed the flink-continuous-enumerator branch from b338b58 to 01c2a2e Compare December 15, 2025 05:06
@HuangZhenQiu HuangZhenQiu force-pushed the flink-continuous-enumerator branch 2 times, most recently from d17d9f0 to 29df33c Compare December 15, 2025 19:17
@HuangZhenQiu HuangZhenQiu force-pushed the flink-continuous-enumerator branch 3 times, most recently from f685a23 to 493abcb Compare December 15, 2025 21:25
@HuangZhenQiu HuangZhenQiu force-pushed the flink-continuous-enumerator branch from 493abcb to e474e99 Compare December 16, 2025 04:16
@HuangZhenQiu HuangZhenQiu force-pushed the flink-continuous-enumerator branch from 6bbc20f to 7a74e37 Compare December 17, 2025 04:33
@HuangZhenQiu HuangZhenQiu force-pushed the flink-continuous-enumerator branch from 7a74e37 to 658201a Compare December 17, 2025 04:53
@HuangZhenQiu HuangZhenQiu force-pushed the flink-continuous-enumerator branch from 658201a to 46992bc Compare December 17, 2025 07:15
@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • `@hudi-bot run azure`: re-run the last Azure build

public static HoodieContinuousSplitBatch fromResult(IncrementalInputSplits.Result result) {
  List<HoodieSourceSplit> splits = result.getInputSplits().stream().map(split ->
      new HoodieSourceSplit(
          HoodieSourceSplit.SPLIT_COUNTER.incrementAndGet(),
Member

HoodieSourceSplit should not hold SPLIT_COUNTER itself; the counter is better managed by the caller that creates the splits. It could be initialized here.
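
One way to read this suggestion, as a minimal sketch only: the split becomes a plain value object that is handed its number, and the component that creates splits owns the counter. The names `SplitCounterOwnershipSketch`, `Split`, and `SplitFactory` are hypothetical, and the counter is assumed to be an `AtomicLong`; the actual field type in the PR is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of caller-owned split numbering; not the PR's classes.
class SplitCounterOwnershipSketch {

  /** Plain value object: it receives its number instead of generating it. */
  static final class Split {
    final long splitNum;
    final String path;

    Split(long splitNum, String path) {
      this.splitNum = splitNum;
      this.path = path;
    }
  }

  /** The creator of splits owns the counter, so numbering is scoped to one instance. */
  static final class SplitFactory {
    private final AtomicLong counter = new AtomicLong();

    List<Split> create(List<String> paths) {
      List<Split> splits = new ArrayList<>();
      for (String path : paths) {
        splits.add(new Split(counter.incrementAndGet(), path));
      }
      return splits;
    }
  }

  public static void main(String[] args) {
    SplitFactory factory = new SplitFactory();
    for (Split s : factory.create(Arrays.asList("file-a", "file-b"))) {
      System.out.println(s.splitNum + " " + s.path);
    }
  }
}
```

Keeping the counter out of the split class avoids shared static state, so two factories (for example in two tests, or two sources in one JVM) number their splits independently.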

Member

The current toString() is a human-readable description of the object and is not suitable for use as an ID. Also, shouldn't the ID uniquely identify a split that reads the same set of records? It should at least contain the file ID, which is not present currently.
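
A minimal sketch of the distinction being drawn here, under stated assumptions: the fields `fileId`, `latestCommit`, and `basePath`, and the `splitId()` format, are hypothetical and chosen only to illustrate an ID built from the fields that determine which records a split reads, kept separate from a descriptive `toString()`.

```java
// Hypothetical sketch: field names and ID format are assumptions, not the PR's code.
class SplitIdSketch {

  static final class Split {
    final String fileId;       // assumed: identifies the file group being read
    final String latestCommit; // assumed: upper bound of the instants being read
    final String basePath;     // assumed: shown for humans, not part of the ID

    Split(String fileId, String latestCommit, String basePath) {
      this.fileId = fileId;
      this.latestCommit = latestCommit;
      this.basePath = basePath;
    }

    /** Deterministic ID: two splits that read the same records get the same ID. */
    String splitId() {
      return fileId + ":" + latestCommit;
    }

    /** Human-readable description; free to change without affecting identity. */
    @Override
    public String toString() {
      return "Split{fileId=" + fileId + ", latestCommit=" + latestCommit
          + ", basePath=" + basePath + "}";
    }
  }

  public static void main(String[] args) {
    Split a = new Split("fg-1", "20251213000000", "/tmp/table/part=1/base.parquet");
    Split b = new Split("fg-2", "20251213000000", "/tmp/table/part=1/other.parquet");
    System.out.println(a.splitId());
    System.out.println(b.splitId());
    System.out.println(a);
  }
}
```

Which fields actually need to go into the ID depends on what distinguishes one read from another in the real split class; the point of the sketch is only that the file ID is part of it and that the ID does not depend on `toString()`.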

@danny0405 danny0405 merged commit 7357fbb into apache:master Dec 18, 2025
70 checks passed
@xushiyan xushiyan linked an issue Dec 19, 2025 that may be closed by this pull request
@HuangZhenQiu
Collaborator Author

The PR also resolved #14417

@xushiyan xushiyan linked an issue Jan 15, 2026 that may be closed by this pull request
Labels

size:XL PR with lines of changes > 1000

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create Hudi Split Assigner

5 participants