
feat(flink): extend Flink quickstart example to use source v2#18518

Merged
danny0405 merged 4 commits into apache:master from HuangZhenQiu:hudi-source-v2-example
Apr 29, 2026

Member

@HuangZhenQiu HuangZhenQiu commented Apr 17, 2026

Describe the issue this Pull Request addresses

Extend Flink quick start example to explicitly use Flink Hudi Source V2

Closes #14428

Summary and Changelog

  1. Add support for a configurable Source V2 in HoodieFlinkQuickstart
  2. Extend TestHoodieFlinkQuickstart to cover source v2 scenarios

Impact

none

Risk Level

none

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

Contributor

@yihua yihua left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Style & Readability Review — a few small readability suggestions below.

@github-actions github-actions Bot added the size:M PR with lines of changes in (100, 300] label Apr 17, 2026
Contributor

@yihua yihua left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Style & Readability Review — a few small readability suggestions: a flipped assertEquals argument order and magic number in the test, a redundant inline comment that duplicates the Javadoc, and a bare // ignored on a caught exception that could use a brief rationale.

Contributor

@yihua yihua left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for contributing! The change is a straightforward extension of the quickstart to demonstrate Source V2. One small concern around silently swallowing ExecutionException from tableResult.await() — worth a quick look.

Member Author

@HuangZhenQiu HuangZhenQiu left a comment

Resolved AI comments.

Contributor

@yihua yihua left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the follow-up. The cosmetic nits (removed the duplicate inline comment, renamed table to resolvedTable, flipped the assertEquals args) all look good. One thing to note: my prior question about silently swallowing ExecutionException from tableResult.await() in execBatchSelectSql wasn't addressed in this round, and I didn't see a reply explaining why. It would still be worth at least logging the cause so job failures aren't masked.
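
The concern about a swallowed ExecutionException can be sketched with the JDK alone. In this sketch `CompletableFuture.get()` stands in for `tableResult.await()`, and the class name `AwaitWithLogging` and method `awaitAndLog` are hypothetical, not taken from the PR. It only illustrates logging the cause rather than ignoring it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the quickstart's batch query helper; not the PR's code.
class AwaitWithLogging {
  private static final Logger LOG = Logger.getLogger(AwaitWithLogging.class.getName());

  /**
   * Waits for the job future and reports whether it finished successfully.
   * On failure the underlying cause is logged instead of being dropped.
   */
  static boolean awaitAndLog(CompletableFuture<Void> job) {
    try {
      job.get(); // stands in for tableResult.await()
      return true;
    } catch (ExecutionException e) {
      // Surface the real failure so a failed job is not masked.
      LOG.log(Level.WARNING, "Job failed while awaiting the result", e.getCause());
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt flag
      return false;
    }
  }

  public static void main(String[] args) {
    CompletableFuture<Void> failed = new CompletableFuture<>();
    failed.completeExceptionally(new IllegalStateException("source failed"));
    System.out.println(awaitAndLog(failed));
    System.out.println(awaitAndLog(CompletableFuture.<Void>completedFuture(null)));
  }
}
```

Whether to log and continue or to rethrow is a design choice for the example code; the sketch only shows the logging variant the comment asks for.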

@HuangZhenQiu HuangZhenQiu force-pushed the hudi-source-v2-example branch from 3a65c89 to 3da4553 Compare April 18, 2026 14:04
Contributor

@hudi-agent hudi-agent left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR extends the Flink quickstart example to demonstrate the FLIP-27 Source V2 reader for both streaming (continuous) and batch (static) modes, and expands the IT test matrix to cover the new path. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.

Style & Readability — a couple of small readability nits below — one on the silent exception catch and one on the builder call pattern.

cc @yihua

Comment thread hudi-examples/hudi-examples-flink/pom.xml
@HuangZhenQiu HuangZhenQiu force-pushed the hudi-source-v2-example branch from 3da4553 to d716218 Compare April 27, 2026 00:11
Contributor

@hudi-agent hudi-agent left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for working on this! The PR extends the Flink quickstart example to demonstrate Source V2 usage via a new useSourceV2 arg and adds a bounded batch query path, with tests parameterized across COW/MOR × V1/V2. No critical correctness issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review. A few small naming and clarity suggestions below.

cc @yihua
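
The COW/MOR × V1/V2 matrix mentioned above can be enumerated as follows. This is a plain-JDK sketch: the enum `TableType`, the `Case` holder, and `cases()` are hypothetical names, and the actual test presumably relies on its test framework's parameterization rather than a hand-built list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the four test combinations; not the PR's test code.
class SourceMatrixSketch {
  enum TableType { COPY_ON_WRITE, MERGE_ON_READ }

  static final class Case {
    final TableType tableType;
    final boolean useSourceV2;

    Case(TableType tableType, boolean useSourceV2) {
      this.tableType = tableType;
      this.useSourceV2 = useSourceV2;
    }

    @Override
    public String toString() {
      return tableType + (useSourceV2 ? "/V2" : "/V1");
    }
  }

  /** Builds the cross product of table types and source versions. */
  static List<Case> cases() {
    List<Case> cases = new ArrayList<>();
    for (TableType type : TableType.values()) {
      for (boolean useSourceV2 : new boolean[] {false, true}) {
        cases.add(new Case(type, useSourceV2));
      }
    }
    return cases;
  }

  public static void main(String[] args) {
    for (Case c : cases()) {
      System.out.println(c);
    }
  }
}
```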

List<Row> batchRows = flinkQuickstart.queryBatchData();

// full table scan
assertEquals(8, batchRows.size());
Contributor

🤖 nit: the magic 8 here will silently pass even if the dataset changes size — could you use TestQuickstartData.DATA_SET_SOURCE_INSERT_LATEST_COMMIT.size() (or wherever the expected row count lives) so the assertion stays in sync with the rest of the test?

- AI-generated; verify before applying. React 👍/👎 to flag quality.
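
A small sketch of the suggestion above, assuming the expected rows live in a shared constant. `EXPECTED_ROWS`, `queryBatchData()`, and `check` are hypothetical stand-ins here, not the names used in the Hudi test classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the test data and the query result; names are illustrative.
class ExpectedRowCount {
  // Plays the role of a shared test-data constant.
  static final List<String> EXPECTED_ROWS = Arrays.asList("id1", "id2", "id3");

  // Plays the role of a full table scan; here it simply copies the data set.
  static List<String> queryBatchData() {
    return new ArrayList<>(EXPECTED_ROWS);
  }

  static void check(int expected, int actual) {
    if (expected != actual) {
      throw new AssertionError("expected " + expected + " rows but got " + actual);
    }
  }

  public static void main(String[] args) {
    List<String> batchRows = queryBatchData();
    // The expected count is derived from the data set instead of a literal.
    check(EXPECTED_ROWS.size(), batchRows.size());
    System.out.println("row count matches");
  }
}
```

If the data set gains or loses rows, the expectation follows it, so the literal never has to be updated by hand.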

@github-actions github-actions Bot added size:S PR with lines of changes in (10, 100] and removed size:M PR with lines of changes in (100, 300] labels Apr 27, 2026
@HuangZhenQiu HuangZhenQiu force-pushed the hudi-source-v2-example branch from d716218 to f8b72a7 Compare April 27, 2026 05:44
Contributor

@hudi-agent hudi-agent left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR extends the Flink quickstart example to demonstrate Source V2 and adds a null-safety fallback for the split enumerator's metric group on Flink 1.17. No critical issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review. A couple of small naming/readability nits below.

cc @yihua

.setUnassignedSplitsGauge(() -> Long.valueOf(splitProvider.pendingSplitCount()));
this.enumeratorMetrics = new FlinkStreamReadMetrics(enumeratorContext.metricGroup(), tableName);
enumeratorMetrics.registerMetrics();
if (this.enumeratorContext.metricGroup() != null) {
Contributor

🤖 nit: enumeratorContext.metricGroup() is called three times across lines 75–79 (null check, gauge registration, and metrics construction). Could you cache it in a local — e.g. SplitEnumeratorMetricGroup metricGroup = enumeratorContext.metricGroup(); — before the if, so all three uses reference the same local?

- AI-generated; verify before applying. React 👍/👎 to flag quality.
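
The local-variable suggestion can be illustrated with simplified types. Everything in this sketch (`MetricGroup`, `EnumeratorContext`, `ReadMetrics`, `createMetrics`, and the gauge name) is a hypothetical stand-in for the Flink and Hudi classes named in the comment; it only shows reading `metricGroup()` once and reusing the result.

```java
import java.util.function.Supplier;

// Simplified, hypothetical stand-ins for the Flink and Hudi types named in the review.
class EnumeratorMetricsSketch {
  interface MetricGroup {
    void gauge(String name, Supplier<Long> value);
  }

  interface EnumeratorContext {
    MetricGroup metricGroup(); // may return null, as reported here for Flink 1.17
  }

  static final class ReadMetrics {
    final MetricGroup group;
    final String tableName;

    ReadMetrics(MetricGroup group, String tableName) {
      this.group = group;
      this.tableName = tableName;
    }
  }

  /** Returns metrics bound to the context's group, or null when no group is available. */
  static ReadMetrics createMetrics(
      EnumeratorContext context, String tableName, Supplier<Long> pendingSplits) {
    // Read the metric group once so the null check and both uses share one value.
    MetricGroup metricGroup = context.metricGroup();
    if (metricGroup == null) {
      return null;
    }
    metricGroup.gauge("unassignedSplits", pendingSplits); // gauge name is illustrative
    return new ReadMetrics(metricGroup, tableName);
  }

  public static void main(String[] args) {
    int[] calls = {0};
    EnumeratorContext counting = () -> {
      calls[0]++;
      return (name, value) -> { };
    };
    ReadMetrics metrics = createMetrics(counting, "t1", () -> 0L);
    System.out.println("created: " + (metrics != null) + ", metricGroup() calls: " + calls[0]);
    System.out.println("null group handled: " + (createMetrics(() -> null, "t1", () -> 0L) == null));
  }
}
```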

execConf.setString("restart-strategy", "fixed-delay");
execConf.setString("restart-strategy.fixed-delay.attempts", "0");
this.streamTableEnv = streamTableEnv;
this.tableEnvironment = streamTableEnv;
Contributor

🤖 nit: the local variable streamTableEnv (declared a few lines above) wasn't renamed when the field was renamed to tableEnvironment, so the two names sit side-by-side here. Could you rename the local to tableEnvironment (or tableEnv) as well for consistency?

- AI-generated; verify before applying. React 👍/👎 to flag quality.

@github-actions github-actions Bot added size:M PR with lines of changes in (100, 300] and removed size:S PR with lines of changes in (10, 100] labels Apr 27, 2026
this.enumeratorMetrics = new FlinkStreamReadMetrics(this.enumeratorContext.metricGroup(), tableName);
} else {
// The metrics group returned from enumeratorContext is null in Flink 1.17.
this.enumeratorMetrics = new FlinkStreamReadMetrics(UnregisteredMetricsGroup.createSplitEnumeratorMetricGroup(), tableName);
Contributor

Is this a known bug?

Member Author

Contributor

Is UnregisteredMetricsGroup the right way to fix the issue?

Member Author

@HuangZhenQiu HuangZhenQiu Apr 28, 2026

Probably not; UnregisteredMetricsGroup is usually used in tests only. Shall we leave it as null?

Contributor

@hudi-agent hudi-agent left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR extends the Flink quickstart to demonstrate the FLIP-27 Source V2 reader, parameterizes the existing IT to cover all four COW/MOR × V1/V2 combinations, and adds null-metric-group handling required by Flink 1.17. No new issues flagged from this automated pass beyond what prior rounds and other reviewers already raised — a Hudi committer or PMC member can take it from here for a final review. A couple of small naming suggestions below — overall the changes read cleanly.

cc @yihua

private EnvironmentSettings settings = null;
@Getter
private TableEnvironment streamTableEnv = null;
private TableEnvironment tableEnvironment = null;
Contributor

🤖 nit: the rename from streamTableEnv to tableEnvironment loses the streaming-specific semantics — this field is the streaming counterpart to getBatchTableEnv(). Keeping it as streamTableEnv (or naming it streamingTableEnv) makes the distinction clearer at call sites.

- AI-generated; verify before applying. React 👍/👎 to flag quality.

execConf.setString("restart-strategy", "fixed-delay");
execConf.setString("restart-strategy.fixed-delay.attempts", "0");
this.streamTableEnv = streamTableEnv;
this.tableEnvironment = streamTableEnv;
Contributor

🤖 nit: the local variable here is still named streamTableEnv while the field was renamed to tableEnvironment — having two different names for the same value in a small method is confusing. Worth aligning them.

- AI-generated; verify before applying. React 👍/👎 to flag quality.

} else {
// The metrics group returned from enumeratorContext is null in Flink 1.17.
enumeratorMetrics = null;
}
Contributor

🤖 nit: inconsistent use of this. between the two branches (this.enumeratorMetrics = ... in the if-branch vs. enumeratorMetrics = null in the else-branch). Pick one for consistency in this constructor.

- AI-generated; verify before applying. React 👍/👎 to flag quality.

@danny0405
Contributor

There are test failures:

TestHoodieStaticSplitEnumerator.testConstructorWithNullMetricGroup:238

@HuangZhenQiu HuangZhenQiu force-pushed the hudi-source-v2-example branch from fb8a874 to 01d8944 Compare April 29, 2026 03:30
Contributor

@hudi-agent hudi-agent left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR extends the Flink quickstart to optionally route reads through the FLIP-27 Source V2 path and adds a defensive null-check around the enumerator metric group to handle the Flink 1.17 case where metricGroup() returns null. No new issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review. One small readability note in the enumerator constructor; the rest of the changes are clean.

cc @yihua

.setUnassignedSplitsGauge(() -> Long.valueOf(splitProvider.pendingSplitCount()));
this.enumeratorMetrics = new FlinkStreamReadMetrics(enumeratorContext.metricGroup(), tableName);
enumeratorMetrics.registerMetrics();
if (this.enumeratorContext.metricGroup() != null) {
Contributor

🤖 nit: enumeratorContext.metricGroup() is called three times (null-check, gauge registration, and metric-group constructor). Could you assign it to a local variable once — SplitEnumeratorMetricGroup metricGroup = enumeratorContext.metricGroup(); — to make the null guard and the two usages below read more clearly?

- AI-generated; verify before applying. React 👍/👎 to flag quality.

splitProvider.onDiscoveredSplits(result.getSplits());
position.get().getIssuedInstant().ifPresent(enumeratorMetrics::setIssuedInstant);
if (enumeratorMetrics != null) {
position.get().getIssuedInstant().ifPresent(enumeratorMetrics::setIssuedInstant);
Contributor

Does this mean that for Flink releases under 1.8, these metrics would be unsupported?

Member Author

Yes, I think so, as we can't get the metric group from the SplitEnumeratorContext. Any better suggestions?
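
One pattern sometimes used to avoid scattering null checks like the one above is a no-op metrics object. The sketch below is illustrative only and was not proposed in this thread: `StreamReadMetrics`, `RecordingMetrics`, `NO_OP`, `orNoOp`, and `onSplitsDiscovered` are hypothetical names, and the metrics are still effectively unsupported when no metric group exists.

```java
import java.util.Optional;

// Hypothetical sketch of a no-op fallback; these types do not come from Hudi or Flink.
class NoOpMetricsSketch {
  interface StreamReadMetrics {
    void setIssuedInstant(String instant);
  }

  // Records the last instant; stands in for metrics backed by a real metric group.
  static final class RecordingMetrics implements StreamReadMetrics {
    String lastInstant;

    @Override
    public void setIssuedInstant(String instant) {
      lastInstant = instant;
    }
  }

  // Does nothing; used when no metric group is available.
  static final StreamReadMetrics NO_OP = instant -> { };

  /** Falls back to the no-op instance so callers never hold a null reference. */
  static StreamReadMetrics orNoOp(StreamReadMetrics metrics) {
    return metrics != null ? metrics : NO_OP;
  }

  // Call site without a null guard: the no-op instance absorbs the update.
  static void onSplitsDiscovered(StreamReadMetrics metrics, Optional<String> issuedInstant) {
    issuedInstant.ifPresent(metrics::setIssuedInstant);
  }

  public static void main(String[] args) {
    RecordingMetrics recording = new RecordingMetrics();
    onSplitsDiscovered(orNoOp(recording), Optional.of("20260429"));
    System.out.println("recorded instant: " + recording.lastInstant);
    onSplitsDiscovered(orNoOp(null), Optional.of("20260429")); // no exception, nothing recorded
  }
}
```

The trade-off is one extra type in exchange for call sites that stay free of `if (enumeratorMetrics != null)` guards.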

@codecov-commenter

Codecov Report

❌ Patch coverage is 33.33333% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.06%. Comparing base (29f9c40) to head (01d8944).
⚠️ Report is 11 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...udi/examples/quickstart/HoodieFlinkQuickstart.java | 0.00% | 14 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18518      +/-   ##
============================================
- Coverage     68.90%   68.06%   -0.84%     
- Complexity    28581    28909     +328     
============================================
  Files          2482     2518      +36     
  Lines        137053   140572    +3519     
  Branches      16713    17422     +709     
============================================
+ Hits          94436    95681    +1245     
- Misses        35009    37035    +2026     
- Partials       7608     7856     +248     
| Flag | Coverage Δ |
| --- | --- |
| common-and-other-modules | 44.37% <33.33%> (-0.02%) ⬇️ |
| hadoop-mr-java-client | 44.83% <ø> (+<0.01%) ⬆️ |
| spark-client-hadoop-common | 48.43% <ø> (-0.03%) ⬇️ |
| spark-java-tests | 48.65% <ø> (-0.86%) ⬇️ |
| spark-scala-tests | 44.70% <ø> (-0.53%) ⬇️ |
| utilities | 37.70% <ø> (-0.25%) ⬇️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...urce/enumerator/AbstractHoodieSplitEnumerator.java | 76.62% <100.00%> (+6.49%) ⬆️ |
| ...ce/enumerator/HoodieContinuousSplitEnumerator.java | 100.00% <100.00%> (ø) |
| ...udi/examples/quickstart/HoodieFlinkQuickstart.java | 0.00% <0.00%> (ø) |

... and 90 files with indirect coverage changes


@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@danny0405 danny0405 merged commit fd63851 into apache:master Apr 29, 2026
63 checks passed

Labels

size:M PR with lines of changes in (100, 300]

Development

Successfully merging this pull request may close these issues.

Add Hudi Source Flink Example

7 participants