Skip to content

Flink: Add equality delete conversion committer#16874

Merged
huaxingao merged 4 commits into
apache:mainfrom
mxm:resolve-equality-deletes.breakdown4
Jun 20, 2026
Merged

Flink: Add equality delete conversion committer#16874
huaxingao merged 4 commits into
apache:mainfrom
mxm:resolve-equality-deletes.breakdown4

Conversation

@mxm

@mxm mxm commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Adds EqualityConvertCommitter, a two input operator for the equality delete conversion task. It buffers the DVWriteResults from the parallel EqualityConvertDVWriter instances and the EqualityConvertPlan from the planner, then commits once both the plan and its done-timestamp watermark have arrived.

The committer adds both new staging data files and the writer's merged DVs, removes the superseded DVs, and carries over the remaining staging deletes. It validates against the planner's main snapshot, so external changes on the target branch fail the commit. The commit summary records the processed staging snapshot id, which the planner reads on restart to skip already-committed snapshots, to ensure idempotency.

The committer is responsible for sending Trigger records downstream to the TaskResultAggregator of the surrounding table maintenance framework. It emits one after every cycle, including no-op and error, so the task always completes. On an upstream abort or a commit failure it deletes the DVs written this cycle so the failure doesn't leak Puffin files.

This commit is factored out of #15996.

mxm added 2 commits June 19, 2026 07:20
Adds EqualityConvertCommitter, a two input operator for the equality delete
conversion task. It buffers the DVWriteResults from the parallel
EqualityConvertDVWriter instances and the EqualityConvertPlan from the planner,
then commits once both the plan and its done-timestamp watermark have
arrived.

The committer adds both new staging data files and the writer's merged DVs,
removes the superseded DVs, and carries over the remaining staging deletes. It
validates against the planner's main snapshot, so external changes on the target
branch fail the commit. The commit summary records the processed staging
snapshot id, which the planner reads on restart to skip already-committed
snapshots, to ensure idempotency.

The committer is responsible for sending Trigger records downstream to the
TaskResultAggregator of the surrounding table maintenance framework. It emits
one after every cycle, including no-op and error, so the task always completes.
On an upstream abort or a commit failure it deletes the DVs written this cycle
so the failure doesn't leak Puffin files.
@github-actions github-actions Bot added the flink label Jun 19, 2026

@huaxingao huaxingao left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@huaxingao huaxingao merged commit 8d1f973 into apache:main Jun 20, 2026
29 checks passed
@huaxingao

Copy link
Copy Markdown
Contributor

Thanks @mxm for the PR! Thanks @wombatu-kun for the review!

@nssalian nssalian added this to the Iceberg 1.12.0 milestone Jun 23, 2026
huaxingao pushed a commit to huaxingao/iceberg that referenced this pull request Jun 27, 2026
…OSS path)

Adds a runnable proof that the ConvertEqualityDeletes Flink maintenance
task converts equality deletes on a staging branch into deletion vectors
(DVs) on main, leaving no equality deletes behind, on stock OSS Apache
Iceberg.

- flink/v2.1 PoC test (TestEqualityDeleteToDVPoc): asserts before/after
  state explicitly (staging has the eq-delete; after conversion main has
  a DV, zero eq-deletes, and reads the delete-applied rows).
- poc-eq-delete-to-dv/: README, INVOCATION.md (jar build + task
  invocation), CUSTOMER_FLINK_SETUP.md (customer consumption).

Verified passing against apache/iceberg main with all six
ConvertEqualityDeletes PRs merged (apache#16831, apache#16844, apache#16858, apache#16874,
apache#16889, apache#16948).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
huaxingao pushed a commit to huaxingao/iceberg that referenced this pull request Jun 27, 2026
…OSS path)

Adds a runnable proof that the ConvertEqualityDeletes Flink maintenance
task converts equality deletes on a staging branch into deletion vectors
(DVs) on main, leaving no equality deletes behind, on stock OSS Apache
Iceberg.

- flink/v2.1 PoC test (TestEqualityDeleteToDVPoc): asserts before/after
  state explicitly (staging has the eq-delete; after conversion main has
  a DV, zero eq-deletes, and reads the delete-applied rows).
- poc-eq-delete-to-dv/: README, INVOCATION.md (jar build + task
  invocation), CUSTOMER_FLINK_SETUP.md (customer consumption).

Verified passing against apache/iceberg main with all six
ConvertEqualityDeletes PRs merged (apache#16831, apache#16844, apache#16858, apache#16874,
apache#16889, apache#16948).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
jakelong95 pushed a commit to jakelong95/iceberg that referenced this pull request Jun 29, 2026
* Flink: Add equality delete conversion committer

Adds EqualityConvertCommitter, a two input operator for the equality delete
conversion task. It buffers the DVWriteResults from the parallel
EqualityConvertDVWriter instances and the EqualityConvertPlan from the planner,
then commits once both the plan and its done-timestamp watermark have
arrived.

The committer adds both new staging data files and the writer's merged DVs,
removes the superseded DVs, and carries over the remaining staging deletes. It
validates against the planner's main snapshot, so external changes on the target
branch fail the commit. The commit summary records the processed staging
snapshot id, which the planner reads on restart to skip already-committed
snapshots, to ensure idempotency.

The committer is responsible for sending Trigger records downstream to the
TaskResultAggregator of the surrounding table maintenance framework. It emits
one after every cycle, including no-op and error, so the task always completes.
On an upstream abort or a commit failure it deletes the DVs written this cycle
so the failure doesn't leak Puffin files.

* fixup! Rename stale DV resolver/merger references to DV writer

* fixup! Reword inaccurate watermark-absorption javadoc on the committer

* fixup! Add shared-branch DV-merge regression test for the committer
jakelong95 pushed a commit to jakelong95/iceberg that referenced this pull request Jun 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants