[Bug][Iceberg] IcebergCommitCallback emits "overwrite" for compaction commits; should emit "replace" per Iceberg spec #7683

@asafsneh

Description

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

1.4

Compute Engine

Flink (Paimon Sink) as the writer. Any engine that commits through Paimon's IcebergCommitCallback produces the same metadata; StarRocks is the downstream reader in our case.

Minimal reproduce step

  1. Create an append-only Paimon table with the Iceberg metadata committer enabled (e.g. 'metadata.iceberg.storage' = 'rest-catalog' pointed at a REST catalog such as Polaris, or hadoop-catalog).
  2. Stream data in so that Paimon's LSM engine performs its normal level compaction (any long-running streaming ingest will do this within minutes).
  3. After a compaction happens, read the Iceberg metadata for the resulting snapshot:
    gcloud storage cat gs://<warehouse>/<db>/<table>/metadata/v<N>.metadata.json \
      | jq '.snapshots[-5:] | .[] | {id:."snapshot-id", op:.summary.operation, added:.summary["added-records"], deleted:.summary["deleted-records"]}'
  4. Observe that the compaction snapshot is labeled "operation": "overwrite" even though no logical rows were added or deleted (added-records == 0, deleted-records == 0; only files were reorganized).
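
The check in step 4 can also be automated. Below is a minimal sketch, assuming the snapshot summaries have already been parsed from the metadata JSON into plain string maps. The class and method names (SuspectSnapshotFinder, looksLikeMislabeledCompaction, findSuspects) are hypothetical and not part of Paimon or Iceberg; only the summary keys come from the jq command above. Treating a missing counter as zero is also an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Paimon or Iceberg.
final class SuspectSnapshotFinder {

    // Assumption: a counter that is absent from the summary counts as zero.
    static long counter(Map<String, String> summary, String key) {
        String value = summary.get(key);
        return value == null ? 0L : Long.parseLong(value);
    }

    // True when a snapshot is labeled "overwrite" but reports no added or deleted records,
    // which is the pattern described in step 4.
    static boolean looksLikeMislabeledCompaction(Map<String, String> summary) {
        return "overwrite".equals(summary.get("operation"))
                && counter(summary, "added-records") == 0
                && counter(summary, "deleted-records") == 0;
    }

    // Returns only the summaries that match the pattern, in their original order.
    static List<Map<String, String>> findSuspects(List<Map<String, String>> summaries) {
        return summaries.stream()
                .filter(SuspectSnapshotFinder::looksLikeMislabeledCompaction)
                .collect(Collectors.toList());
    }
}
```

Any snapshot returned by findSuspects is a candidate for a compaction commit that was labeled overwrite.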

What doesn't meet your expectations?

Per the Iceberg spec, the four snapshot operation values have distinct semantics:

operation   Meaning
append      Only new data files added.
replace     Files added and removed without changing table data (compaction, format change, relocation).
overwrite   Files added and removed and table data may have changed (INSERT OVERWRITE, MERGE, row-level deletes).
delete      Only files removed.

Paimon's own LSM compaction is by definition a pure file rewrite with no logical row change — this is exactly what Iceberg's replace operation is for. Native Iceberg writers (RewriteFiles, RewriteManifests) use DataOperations.REPLACE for this case, and all of Iceberg's incremental scan APIs (IncrementalAppendScan, IncrementalChangelogScan, Spark MicroBatchStream, Flink MonitorSource) treat replace as a no-op for incremental reads.
Paimon currently emits overwrite for these compaction snapshots, which is indistinguishable — from a downstream reader's point of view — from a genuine row-changing overwrite. This breaks any downstream consumer that relies on the spec's distinction.
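
To illustrate why the label matters, here is a rough sketch of how a downstream incremental reader might branch on the operation value. It is a simplified model under stated assumptions, not the code of any of the readers named above: IncrementalPlanner, Action, actionFor and isAppendOnly are hypothetical names, and mapping both overwrite and delete to a full refresh is an assumed policy.

```java
import java.util.List;

// Hypothetical consumer-side model; not taken from Iceberg, Spark, Flink, or StarRocks.
final class IncrementalPlanner {

    enum Action { READ_APPENDED_FILES, SKIP, REQUIRES_FULL_REFRESH }

    // Maps an Iceberg snapshot operation string to what an append-only incremental reader would do.
    static Action actionFor(String operation) {
        switch (operation) {
            case "append":
                return Action.READ_APPENDED_FILES;
            case "replace":
                // Files were rewritten but table data is unchanged, so there is nothing new to read.
                return Action.SKIP;
            case "overwrite":
            case "delete":
                // Table data may have changed; assumed here to force a full refresh.
                return Action.REQUIRES_FULL_REFRESH;
            default:
                throw new IllegalArgumentException("Unknown operation: " + operation);
        }
    }

    // A snapshot range is incrementally readable only if no snapshot forces a full refresh.
    static boolean isAppendOnly(List<String> operations) {
        return operations.stream()
                .map(IncrementalPlanner::actionFor)
                .noneMatch(action -> action == Action.REQUIRES_FULL_REFRESH);
    }
}
```

Under this model, a range of append, replace, append stays incrementally readable, while the same range with the compaction labeled overwrite does not.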

Anything else?

Root cause

IcebergSnapshotSummary only defines two constants, and there is no code path in Paimon that produces "replace":

// paimon-core/src/main/java/org/apache/paimon/iceberg/metadata/IcebergSnapshotSummary.java
public static final IcebergSnapshotSummary APPEND    = new IcebergSnapshotSummary("append");
public static final IcebergSnapshotSummary OVERWRITE = new IcebergSnapshotSummary("overwrite");

IcebergCommitCallback runs after every Paimon commit (both CommitKind.APPEND and CommitKind.COMPACT). It does not inspect the Paimon CommitKind; it just diffs files and falls back to OVERWRITE any time a previously-manifested file was removed:

// paimon-core/src/main/java/org/apache/paimon/iceberg/IcebergCommitCallback.java
// (createWithDeleteManifestFileMetas)
} else {
    // some file is removed, rewrite this file meta
    snapshotSummary = IcebergSnapshotSummary.OVERWRITE;
    ...
}

Compaction — which always removes the old L0/L1/... files and adds the merged result — therefore deterministically lands as overwrite rather than replace.
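
The effect can be shown with a toy model of the diff-based decision. This is a deliberately simplified sketch and not the actual IcebergCommitCallback code: FileDiffModel and operationFromDiff are hypothetical names, and data files are represented by plain strings.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of a purely diff-based summary decision; not Paimon's real implementation.
final class FileDiffModel {

    // Returns "overwrite" as soon as any previously listed file is missing afterwards,
    // otherwise "append". The kind of commit is never consulted.
    static String operationFromDiff(Set<String> filesBefore, Set<String> filesAfter) {
        Set<String> removed = new HashSet<>(filesBefore);
        removed.removeAll(filesAfter);
        return removed.isEmpty() ? "append" : "overwrite";
    }
}
```

With filesBefore = {l0-a, l0-b} and filesAfter = {l1-merged}, which is the shape of a compaction, the removed set is non-empty, so this model can only ever answer overwrite.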

Downstream impact [StarRocks example]

StarRocks IVM (Incremental Materialized View) refresh on a Paimon-produced Iceberg table fails on every compaction snapshot with:

com.starrocks.sql.analyzer.SemanticException: Getting analyzing error.
Detail message: TvrTableDeltaTrait is not append-only for base table: <db>.<table>,
delta:DeltaTrait{delta=Delta@[<snap>,<snap>], changeType=RETRACTABLE,
stats=Stats{addedRows=0, addedFileSize=0}}.

StarRocks recently fixed this for native-Iceberg tables in StarRocks#69825, which skips replace snapshots in IcebergMetadata.listTableDeltaTraits(). That fix does not apply to Paimon-written Iceberg tables because Paimon never emits replace. The StarRocks PR author explicitly scoped the fix to Iceberg and noted that Paimon would need a separate change, so the cleanest place for it is upstream in Paimon, where the Iceberg semantics can be made to match the spec.

Proposed fix

  1. Add a REPLACE constant to IcebergSnapshotSummary:
    public static final IcebergSnapshotSummary REPLACE = new IcebergSnapshotSummary("replace");
  2. In IcebergCommitCallback, thread the Paimon CommitKind (or the logical "rows unchanged" signal) through to the summary decision. When the underlying Paimon commit is CommitKind.COMPACT — or, equivalently, when the file-level diff adds/removes files but contributes zero net rows — emit REPLACE instead of OVERWRITE.
  3. Keep OVERWRITE for genuine row-changing operations (INSERT OVERWRITE, merge-on-read deletes that actually drop logical rows, etc.).

This aligns Paimon's Iceberg-compat metadata with the Iceberg spec and lets downstream incremental readers (StarRocks IVM, Spark structured streaming incremental scans, Flink Iceberg source, etc.) correctly treat Paimon compaction as a no-op for incremental refresh.

Happy to send a PR if a maintainer can confirm the proposed shape (new enum constant + CommitKind-based branch) is acceptable.
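
As a rough illustration of the CommitKind-based branch, the decision could look like the sketch below. It is a minimal model under assumptions, not a patch against Paimon: SnapshotOperationChooser and chooseOperation are hypothetical names, the nested CommitKind enum is a local stand-in that models only the two kinds named in this issue, and operations are returned as plain strings rather than IcebergSnapshotSummary constants.

```java
// Hypothetical sketch of the proposed decision; not actual Paimon code.
final class SnapshotOperationChooser {

    // Local stand-in for Paimon's CommitKind, limited to the two values mentioned in this issue.
    enum CommitKind { APPEND, COMPACT }

    // Keeps the existing "append" result when nothing was removed, and only changes the
    // branch that currently always yields "overwrite".
    static String chooseOperation(CommitKind kind, boolean previouslyManifestedFileRemoved) {
        if (!previouslyManifestedFileRemoved) {
            return "append";
        }
        // Compaction rewrites files without changing rows, so it maps to "replace".
        return kind == CommitKind.COMPACT ? "replace" : "overwrite";
    }
}
```

Branching on the commit kind rather than on record counts keeps the rule simple, but whether CommitKind is readily available at that point in IcebergCommitCallback is something a maintainer would need to confirm.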

Are you willing to submit a PR?

  • I'm willing to submit a PR!
