SinkDelta(Append) fails with 'already exists' on existing Delta tables (Azure Blob non-HNS) #15

@omarmciver

Description

LazyFrame.SinkDelta() with DeltaSaveMode.Append fails on existing Delta tables with:

Create table error: A Delta Lake table already exists at that location

This occurs on Azure ADLS Gen2 (HNS-enabled) storage, but the root cause appears to be in the native_shim/src/delta/write.rs logic that applies regardless of storage backend.

Reproduction

var df = DataFrame.From(new[] { new { Id = 1, Value = "test" } });

// First write — succeeds (creates table)
df.Lazy().SinkDelta(tablePath, mode: DeltaSaveMode.Append);

// Second write — fails with "A Delta Lake table already exists"
df.Lazy().SinkDelta(tablePath, mode: DeltaSaveMode.Append);

Expected: Second Append should add rows to the existing table.
Actual: Throws PolarsError::ComputeError("Create table error: A Delta Lake table already exists at that location")

Root Cause Analysis

In native_shim/src/delta/write.rs (L104–L128):

let dt = DeltaTable::try_from_url_with_storage_options(
    table_url.clone(), delta_storage_options.clone()
).await...?;

if dt.version() >= Some(0) {
    match save_mode {
        SaveMode::ErrorIfExists => { return Err(...); },
        SaveMode::Ignore => { skip_write = true; },
        _ => {}   // Append falls through here — OK
    }
}

Then at L210:

if table.version() < Some(0) {
    // Auto-create path — calls table.create() which rejects existing tables
    table = table.create()...

The guard at L112 (dt.version() >= Some(0)) recognizes existing tables, so the auto-create path at L210 should not run for them. However, table.version() appears to return None or a value below Some(0) at L210, even though the table exists and try_from_url_with_storage_options calls .load() internally.
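To illustrate why a missing version would send an existing table down the auto-create path, here is a minimal sketch of the comparison used at L210. It assumes version() yields an Option<i64>, as the comparisons against Some(0) in the snippets suggest; the function name is hypothetical and not part of write.rs.

```rust
/// Mirrors the `table.version() < Some(0)` check quoted from L210.
/// Assumption: the version is an Option<i64>; this helper is illustrative only.
pub fn takes_auto_create_path(version: Option<i64>) -> bool {
    // Rust's derived ordering on Option places None below every Some(_),
    // so an unloaded table (None) satisfies this comparison.
    version < Some(0)
}
```

With this ordering, takes_auto_create_path(None) is true while takes_auto_create_path(Some(0)) is false, so a table whose state was never loaded, or was reset, is indistinguishable from a table that does not exist.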

One possible explanation is that the DeltaTable state is lost or not refreshed after the staging phase writes .tmp_write_* Parquet files to the table directory, so that the state is stale by the time table.version() is checked at L210.
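If stale state is indeed the cause, one possible direction is to decide between create and append once, from the version observed at the initial load, and to reuse that decision after staging instead of re-reading table.version(). The sketch below models only that decision. SaveMode here is a local stand-in rather than the delta-rs type, and WritePlan and plan_write are hypothetical names that do not exist in write.rs.

```rust
/// Local stand-in for the save modes named in this issue (not the delta-rs enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SaveMode {
    Append,
    Overwrite,
    ErrorIfExists,
    Ignore,
}

/// Hypothetical outcome of the existence check, computed once before staging.
#[derive(Debug, PartialEq, Eq)]
pub enum WritePlan {
    CreateThenWrite,
    WriteToExisting,
    Skip,
    Fail,
}

/// Decide the write plan from the version seen at the initial load.
/// Assumption: the version is an Option<i64>, with None meaning "no table loaded".
pub fn plan_write(version_at_load: Option<i64>, mode: SaveMode) -> WritePlan {
    // Same test as the guard quoted from L112.
    let table_exists = version_at_load >= Some(0);
    match (table_exists, mode) {
        // No table yet: every mode may create it.
        (false, _) => WritePlan::CreateThenWrite,
        (true, SaveMode::ErrorIfExists) => WritePlan::Fail,
        (true, SaveMode::Ignore) => WritePlan::Skip,
        // Existing table: Append and Overwrite must never reach create().
        (true, SaveMode::Append) | (true, SaveMode::Overwrite) => WritePlan::WriteToExisting,
    }
}
```

Because the plan is a plain value, it can be carried across the staging phase unchanged, so a later reset of the table handle cannot turn an Append on an existing table into a create call.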

Impact

Any SinkDelta(Append) call to an existing Delta table fails. SinkDelta(Overwrite) works correctly, suggesting the bug is specific to the Append → auto-create code path.

Workaround

Use DataFrame.MergeDelta() instead of SinkDelta(Append). MergeDelta correctly loads the table snapshot via delta-rs's MergeBuilder and doesn't hit the auto-create path:

// Instead of: df.Lazy().SinkDelta(path, mode: DeltaSaveMode.Append, ...);
// Use:
df.MergeDelta(
    path,
    mergeKeys: new[] { "your_key_column" },
    matchedUpdateCond: null,              // Update if matched
    notMatchedInsertCond: null,           // Insert if not matched
    notMatchedBySourceDeleteCond: null,   // Retain target-only rows
    cloudOptions: opts);

Environment

  • Polars.NET: v0.3.0 (also confirmed present in v0.3.1 — write.rs unchanged)
  • delta-rs: 0.31.0 (pinned by Polars.NET)
  • Storage: Azure ADLS Gen2 (HNS-enabled), using abfss:// URI scheme
  • .NET: 8.0
  • OS: Linux (Docker container on AKS)

Metadata

Labels

bug, good first issue
