Delta: Updating Delta to Iceberg conversion - Inserts only#15407
vladislav-sidorovich wants to merge 11 commits into apache:main from
Conversation
anoopj left a comment
Thank you for the PR. Moving to the Delta kernel is a great improvement. Here is my initial feedback.
| "The table identifier cannot be null, please provide a valid table identifier for the new iceberg table"); | ||
| Preconditions.checkArgument( | ||
| deltaTableLocation != null, | ||
| "The delta lake table location cannot be null, please provide a valid location of the delta lake table to be snapshot"); |
Nit: Replace with Delta Lake and Iceberg in the error messages.
| totalDataFiles = | ||
| convertEachDeltaVersion(initialDeltaVersion, latestDeltaVersion, transaction); | ||
| } catch (IOException e) { | ||
| throw new RuntimeException(e); |
Replace with throw new UncheckedIOException(e); (I think there is a CI failure related to this)
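For illustration, a minimal self-contained sketch of the suggested change; the method names and the exception message here are made up for the example and only stand in for `convertEachDeltaVersion`:

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class Example {
    // Hypothetical stand-in for convertEachDeltaVersion(...)
    static long convert() {
        try {
            return doIo();
        } catch (IOException e) {
            // UncheckedIOException keeps the IOException as the cause,
            // unlike a bare RuntimeException wrap, and satisfies the
            // static-analysis check that the CI failure points at
            throw new UncheckedIOException(e);
        }
    }

    static long doIo() throws IOException {
        throw new IOException("simulated failure");
    }

    public static void main(String[] args) {
        try {
            convert();
        } catch (UncheckedIOException e) {
            System.out.println(e.getCause().getMessage());
        }
    }
}
```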
| @@ -0,0 +1,50 @@ | |||
| # Delta Lake Golden Tables | |||
There is a CI failure due to missing license header. May want to add this to .rat-excludes
| private long convertEachDeltaVersion( | ||
| long initialDeltaVersion, long latestDeltaVersion, Transaction transaction) | ||
| throws IOException { | ||
| LOG.info( |
Did you mean to remove this line?
| } | ||
| /** | ||
| * Convert each delta log {@code ColumnarBatch} to Iceberg action and commit to the given {@code
| appendFiles.commit(); | ||
| } | ||
| tagCurrentSnapshot(deltaVersion, commitTimestamp, transaction); |
If dataFilesToAdd is empty, i.e. line 279 evaluates to false, this line might cause an NPE.
Yes, you are right. There will be no snapshots for empty tables. I will handle this scenario and add a test for it.
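One possible shape for the guard, as a sketch against Iceberg's `Transaction` API; the variable names mirror the discussion above and are assumptions, not the PR's actual code:

```java
// Only tag a snapshot if this Delta version actually produced one.
// For an empty table (no AddFile actions committed), currentSnapshot()
// on the transaction's table can be null.
Snapshot current = transaction.table().currentSnapshot();
if (current != null) {
  tagCurrentSnapshot(deltaVersion, commitTimestamp, transaction);
}
```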
| import io.delta.kernel.exceptions.TableNotFoundException; | ||
| import io.delta.kernel.internal.DeltaHistoryManager; | ||
| import io.delta.kernel.internal.DeltaLogActionUtils; | ||
| import io.delta.kernel.internal.SnapshotImpl; |
We are using internal APIs of the kernel. This is fragile - can we refactor this to use the public APIs instead? Snapshot, Table etc. Or are we doing this because we are trying to preserve the table history during the conversion? I would try to avoid this as much as possible.
No, there are no public APIs available for the purposes we need.
Yes, I want to go through the table history step by step, so we will have exactly the same granularity in the history.
At the same time, it is fairly safe to use the internal API because it depends on the Delta protocol, which is stable.
The internal APIs can change or disappear without any notice. I would think hard about avoiding dependencies on internal APIs, even if that means changing semantics (e.g. not preserving all the history by default).
| while (rows.hasNext()) { | ||
| Row row = rows.next(); | ||
| if (DeltaLakeActionsTranslationUtil.isAdd(row)) { | ||
| AddFile addFile = DeltaLakeActionsTranslationUtil.toAdd(row); |
Can we avoid the use of the internal AddFile class and read fields directly from the Row using ordinals defined by the scan file schema?
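A sketch of what reading by ordinal might look like with the Delta kernel `Row` API; the field names (`add`, `path`, `size`, `modificationTime`) are assumptions based on the Delta add-file action layout, and the exact getter choices would need checking against the kernel's scan file schema:

```java
// Resolve ordinals from the schema carried by the action row
// (field names assumed from Delta's add-file action layout).
StructType schema = row.getSchema();
Row add = row.getStruct(schema.indexOf("add"));

StructType addSchema = add.getSchema();
String path = add.getString(addSchema.indexOf("path"));
long size = add.getLong(addSchema.indexOf("size"));
long modificationTime = add.getLong(addSchema.indexOf("modificationTime"));
```

Resolving the ordinals once per batch rather than per row would avoid repeated name lookups.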
| public SnapshotDeltaLakeTable deltaLakeConfiguration(Configuration conf) { | ||
| deltaEngine = DefaultEngine.create(conf); | ||
| deltaLakeFileIO = new HadoopFileIO(conf); | ||
| deltaTable = (TableImpl) Table.forPath(deltaEngine, deltaTableLocation); |
It's necessary because I use the internal API below. The getChanges API is available only in TableImpl, not in the Table interface.
| try (CloseableIterator<Row> rows = columnarBatch.getRows()) { | ||
| while (rows.hasNext()) { | ||
| Row row = rows.next(); | ||
| if (DeltaLakeActionsTranslationUtil.isAdd(row)) { |
Shouldn't we do a fail fast if we encounter a remove? Otherwise in tables with DML, the conversion will cause silent duplicates.
Not really. I do not expect this code to be merged or used before I add handling of all the other Delta actions.
So I will support the remove action quite soon, and a fail-fast check will not be needed here.
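If an interim guard were wanted anyway, it could be a few lines next to the existing isAdd check; this sketch assumes an `isRemove` helper analogous to the PR's `DeltaLakeActionsTranslationUtil.isAdd`, which may not exist yet:

```java
// Guard against silent duplicates in tables with DML history:
// fail fast until remove actions are actually translated.
if (DeltaLakeActionsTranslationUtil.isRemove(row)) {
  throw new UnsupportedOperationException(
      "Remove actions are not supported yet by the Delta to Iceberg conversion");
}
```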
| latestDeltaVersion, | ||
| newTableIdentifier); | ||
| Schema icebergSchema = convertToIcebergSchema(initialDeltaSnapshot.getSchema()); |
This can cause silent data corruption if the table has column mapping enabled. For context, Delta Lake supports three column mapping modes:
- none: Parquet files use the same column names as the logical schema.
- name: Parquet files use physical names (UUIDs or opaque strings) that differ from the logical names. The Delta log stores a mapping between them; this enables column renames without rewriting data files.
- id: Similar to name mode, but maps by field ID instead of name.
Here we are mapping the schema based on the logical names. This won't work when column mapping is enabled (name or id). Delta Lake's UniForm feature requires column mapping to be enabled (name or id mode) and carries the physical-to-logical name mapping through to Iceberg via Iceberg's own column mapping (field IDs).
For now, we should at least validate that column mapping is not enabled and fail fast:
```java
Map<String, String> configuration = deltaSnapshot.getMetadata().getConfiguration();
String columnMappingMode = configuration.getOrDefault("delta.columnMapping.mode", "none");
if (!"none".equals(columnMappingMode)) {
  throw new UnsupportedOperationException("...");
}
```
Thank you for the input, it's very valuable. I will add it.
This PR contains an initial version of the code updating the existing functionality (https://iceberg.apache.org/docs/1.4.3/delta-lake-migration/) to the recent Delta Lake version (read: 3, write: 7). The motivation of the PR is to receive the earliest feedback from the community.
Note: The PR doesn't remove the old logic but adds a new interface implementation, so it will be easier to compare/review. Also, based on the usage scenario of the module, this approach will not introduce any issues.
The PR scope: inserts only (Add action).
Future steps:
Tests:
Unit tests: cover all supported data types, including complex arrays and structures.
Integration tests: cover the inserts-only scenario with Spark 3.5. The tests must be updated for a newer Delta Lake version once the previous solution is deleted from the code.
In the following PRs, I will add all the tables from: Delta golden tables