[PIP-30] Improve Paimon committer in Flink by fishfishfishfishaa · Pull Request #7963 · apache/paimon

fishfishfishfishaa · 2026-05-25T17:28:42Z

Purpose

Following PIP-30 (Improvement For Paimon Committer In Flink), this PR implements the Paimon Write Coordinator (PWC), which replaces the current CommitOperator with a JobManager-level OperatorCoordinator and eliminates the network shuffle bottleneck for commit messages.

Key Design Decision: Custom HDFS State

Instead of using Flink's native StateBackend, I chose custom HDFS state management because:

- Timing mismatch: Flink's operator state snapshot occurs after snapshotState() returns, but PWC needs to persist aggregated messages before acknowledging WriteOperators.
- Explicit recovery control: Enables clean resetToCheckpoint() logic with idempotent commit.
- Decoupled cleanup: HDFS state deletion is independent of Flink checkpoint retention policies.
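The persist-before-acknowledge ordering and the idempotent replay in resetToCheckpoint() can be sketched roughly as follows. This is a minimal in-memory model, not the PR's implementation: the class name, the method names other than resetToCheckpoint, and the use of a TreeMap in place of durable state files are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of the coordinator's commit bookkeeping; names are illustrative only. */
final class IdempotentCommitSketch {
    /** Stand-in for the durable state files: checkpoint id -> aggregated commit messages. */
    private final TreeMap<Long, List<String>> persisted = new TreeMap<>();
    /** Stand-in for the table contents produced by commits. */
    private final List<String> tableContents = new ArrayList<>();
    /** Highest checkpoint id whose messages have already been committed. */
    private long lastCommittedId = -1L;

    /** Persist the aggregated messages; writers would be acknowledged only after this returns. */
    void persistBeforeAck(long checkpointId, List<String> messages) {
        persisted.put(checkpointId, new ArrayList<>(messages));
    }

    /** Commit every persisted checkpoint up to checkpointId, skipping those already committed. */
    void commitUpTo(long checkpointId) {
        for (Map.Entry<Long, List<String>> e : persisted.headMap(checkpointId, true).entrySet()) {
            if (e.getKey() > lastCommittedId) {
                tableContents.addAll(e.getValue());
                lastCommittedId = e.getKey();
            }
        }
    }

    /** Recovery: drop state newer than the restored checkpoint, then replay the rest. */
    void resetToCheckpoint(long checkpointId) {
        persisted.tailMap(checkpointId, false).clear();
        commitUpTo(checkpointId);
    }

    List<String> tableContents() {
        return tableContents;
    }
}
```

Because commitUpTo skips any checkpoint id at or below lastCommittedId, calling resetToCheckpoint repeatedly for the same checkpoint leaves the table unchanged, which is the property the recovery path relies on.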

State path: <flink-checkpoint-dir>/pwc/<operatorId>/checkpoint-{ckId}.state
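For illustration, the path layout above could be built and parsed with a small helper like the one below. The class and method names are hypothetical and do not come from the PR; only the path template itself is taken from the description.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for the PWC state path layout; names are illustrative only. */
final class PwcStatePaths {
    // Matches file names of the form checkpoint-{ckId}.state (digit count capped to avoid overflow).
    private static final Pattern STATE_FILE = Pattern.compile("checkpoint-(\\d{1,18})\\.state");

    private PwcStatePaths() {}

    /** Builds <flink-checkpoint-dir>/pwc/<operatorId>/checkpoint-{ckId}.state. */
    static String statePath(String checkpointDir, String operatorId, long checkpointId) {
        String base = checkpointDir.endsWith("/")
                ? checkpointDir.substring(0, checkpointDir.length() - 1)
                : checkpointDir;
        return base + "/pwc/" + operatorId + "/checkpoint-" + checkpointId + ".state";
    }

    /** Extracts the checkpoint id from a state file name, or empty if the name does not match. */
    static OptionalLong checkpointIdOf(String fileName) {
        Matcher m = STATE_FILE.matcher(fileName);
        return m.matches() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }
}
```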

Current Scope

Supported: FixedBucketSink

Testing

Added unit tests for recovery logic

Related Issue

fix #2641

JingsongLi

This is a significant architectural change (PIP-30). A few high-level concerns:

Design Questions:

- Custom HDFS state vs Flink StateBackend: I understand the timing mismatch rationale, but custom state management introduces its own complexity:
  - Who cleans up stale state files when jobs are cancelled/fail without proper cleanup?
  - What happens if the HDFS path is not accessible (permissions, HA failover)?
  - How does this interact with Flink's checkpoint retention policies?
- State path collision: The path <flink-checkpoint-dir>/pwc/<operatorId>/checkpoint-{ckId}.state lives inside Flink's checkpoint directory but is managed independently. Could Flink's checkpoint cleanup accidentally delete these files? Or could stale PWC state files accumulate unboundedly?
- Scope limitation: Currently only supports FixedBucketSink. What's the plan for UnawareBucketSink and DynamicBucketSink? The design should be extensible to these without requiring a second large refactoring.
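To make the accumulation concern concrete, one possible pruning step is sketched below: once a checkpoint has been committed, state files at or below that id are deleted. This is an assumption about how cleanup could work, not what the PR does. It uses java.nio.file against a local directory as a stand-in for the real file system, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical cleanup helper; a local directory stands in for the state directory. */
final class StaleStateCleaner {
    private static final Pattern STATE_FILE = Pattern.compile("checkpoint-(\\d{1,18})\\.state");

    private StaleStateCleaner() {}

    /** Deletes state files whose checkpoint id is <= lastCommittedId; returns how many were deleted. */
    static int deleteCommitted(Path operatorDir, long lastCommittedId) {
        int deleted = 0;
        try (DirectoryStream<Path> files =
                Files.newDirectoryStream(operatorDir, "checkpoint-*.state")) {
            for (Path file : files) {
                Matcher m = STATE_FILE.matcher(file.getFileName().toString());
                if (m.matches() && Long.parseLong(m.group(1)) <= lastCommittedId) {
                    if (Files.deleteIfExists(file)) {
                        deleted++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return deleted;
    }
}
```

A step like this only runs while the job is alive, so it would not cover the cancelled-job case raised above; that would still need some out-of-band sweep.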

Code Comments:

- DiscardingSink for committed stream: In FlinkSink.doCommit(), when coordinator is enabled, the committed stream is discarded via written.sinkTo(new DiscardingSink<>()).name("end").setParallelism(1). Why parallelism 1? If the write operator has high parallelism, this creates a bottleneck for the stream before discarding. Can you just set it to the same parallelism as writers?
- Error handling in coordinator: What happens when the coordinator fails to commit (e.g., conflict with another writer)? How is this surfaced to the Flink job? Does it trigger a failover?
- Testing: The current tests cover recovery logic, but I'd like to see:
  - An end-to-end IT test that validates data correctness after checkpoint/restore
  - A test for concurrent writes (multiple subtasks committing in the same checkpoint)
  - A test for the cancelled job → stale state scenario
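On the error-handling question, one way a commit failure could be surfaced is sketched below. It is an assumption, not the PR's behaviour: JobFailer is a hypothetical stand-in modelled on the failJob(Throwable) hook that Flink's OperatorCoordinator.Context exposes, and Committer is a hypothetical stand-in for the table commit call.

```java
/** Hypothetical sketch of surfacing a commit failure to the job; names are illustrative only. */
final class CommitFailureSketch {
    /** Stand-in for the coordinator context's job-failure hook. */
    interface JobFailer {
        void failJob(Throwable cause);
    }

    /** Stand-in for the table commit call. */
    interface Committer {
        void commit(long checkpointId) throws Exception;
    }

    private final Committer committer;
    private final JobFailer failer;

    CommitFailureSketch(Committer committer, JobFailer failer) {
        this.committer = committer;
        this.failer = failer;
    }

    /** Returns true if the commit succeeded; otherwise reports the failure and returns false. */
    boolean commitOrFail(long checkpointId) {
        try {
            committer.commit(checkpointId);
            return true;
        } catch (Exception e) {
            // Wrap the cause so the failing checkpoint id is visible in the job's failure message.
            failer.failJob(new RuntimeException("Commit failed for checkpoint " + checkpointId, e));
            return false;
        }
    }
}
```

Failing the job on a commit conflict would trigger a failover and replay from the last checkpoint, which is safe only if the replayed commit is idempotent.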

This is promising work but given its complexity, it might benefit from a more detailed design doc on the wiki for community discussion before merging.

JingsongLi

If this feature is tightly bound to HDFS, that is a problem: we definitely need to support object storage as well.
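One way to avoid binding the coordinator to HDFS is to put state persistence behind a small key/value contract, sketched below. The interface and class names are hypothetical and are not APIs from Paimon or Flink. The in-memory implementation exists only to make the sketch runnable. A whole-object put is used rather than write-then-rename because rename is not atomic on typical object stores.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical storage contract for coordinator state; not an API from Paimon or Flink. */
interface CoordinatorStateStore {
    /** Writes the whole object under key, replacing any previous value. */
    void put(String key, byte[] value);

    /** Reads the object stored under key, if any. */
    Optional<byte[]> get(String key);

    /** Removes the object stored under key; a missing key is not an error. */
    void delete(String key);
}

/** In-memory implementation used only to exercise the contract. */
final class InMemoryStateStore implements CoordinatorStateStore {
    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();

    @Override
    public void put(String key, byte[] value) {
        objects.put(key, value.clone()); // defensive copy so callers cannot mutate stored state
    }

    @Override
    public Optional<byte[]> get(String key) {
        return Optional.ofNullable(objects.get(key)).map(v -> v.clone());
    }

    @Override
    public void delete(String key) {
        objects.remove(key);
    }
}
```

With a contract like this, an HDFS-backed and an object-store-backed implementation could sit side by side, and the coordinator logic would not need to know which one is in use.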

fishfishfishfishaa and others added 5 commits May 26, 2026 00:48

[PIP-30] Improve Paimon committer in Flink

c45de0b

Merge branch 'master' into yg-pip3-2026

fad7b28

fix conflict

158eeee

fix conflict

0ba4578

fix ut

3a80122

JingsongLi reviewed May 27, 2026

docs: regenerate config docs for new options

9b6d521

JingsongLi requested changes May 27, 2026


fishfishfishfishaa wants to merge 6 commits into apache:master from fishfishfishfishaa:yg-pip3-2026


2 participants
