Description
LazyFrame.SinkDelta() with DeltaSaveMode.Append fails on existing Delta tables with:
Create table error: A Delta Lake table already exists at that location
This occurs on Azure ADLS Gen2 (HNS-enabled) storage, but the root cause appears to be in the native_shim/src/delta/write.rs logic that applies regardless of storage backend.
Reproduction
var df = DataFrame.From(new[] { new { Id = 1, Value = "test" } });
// First write — succeeds (creates table)
df.Lazy().SinkDelta(tablePath, mode: DeltaSaveMode.Append);
// Second write — fails with "A Delta Lake table already exists"
df.Lazy().SinkDelta(tablePath, mode: DeltaSaveMode.Append);
Expected: The second Append should add rows to the existing table.
Actual: Throws PolarsError::ComputeError("Create table error: A Delta Lake table already exists at that location")
Root Cause Analysis
In native_shim/src/delta/write.rs (L104–L128):
let dt = DeltaTable::try_from_url_with_storage_options(
table_url.clone(), delta_storage_options.clone()
).await...?;
if dt.version() >= Some(0) {
match save_mode {
SaveMode::ErrorIfExists => { return Err(...); },
SaveMode::Ignore => { skip_write = true; },
_ => {} // Append falls through here — OK
}
}
Then at L210:
if table.version() < Some(0) {
// Auto-create path — calls table.create() which rejects existing tables
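table = table.create()...
The guard at L112 (dt.version() >= Some(0)) should prevent the auto-create path for existing tables. However, table.version() appears to return None, or a value below Some(0), at L210, even though the table exists and try_from_url_with_storage_options calls .load() internally. Note that in Rust None compares less than every Some value, so a None version satisfies table.version() < Some(0) and is treated the same as "table does not exist".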
The issue may be that the DeltaTable state is lost or not refreshed after the staging write phase writes .tmp_write_* Parquet files to the table directory — by the time table.version() is checked at L210, the state is stale.
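To illustrate why a stale or unloaded state would land in the auto-create branch, here is a minimal standalone sketch of the two comparisons quoted above. It assumes version() yields an Option<i64> (the integer width is an assumption; the snippets only show that it is compared against Some(0)). The function names are hypothetical and are not taken from write.rs.

```rust
// Hypothetical helpers that mirror the two guards quoted from write.rs.
// Assumption: the table version is an Option<i64>, where None means
// "no snapshot loaded" and Some(n) is a committed version.

/// Mirrors the first guard: `dt.version() >= Some(0)`.
pub fn passes_exists_guard(version: Option<i64>) -> bool {
    version >= Some(0)
}

/// Mirrors the second guard: `table.version() < Some(0)`.
pub fn takes_auto_create_path(version: Option<i64>) -> bool {
    // Option's ordering places None below every Some(_),
    // so None < Some(0) evaluates to true.
    version < Some(0)
}

/// A more explicit spelling of "no loaded state", shown only for contrast.
pub fn has_no_loaded_state(version: Option<i64>) -> bool {
    version.is_none()
}
```

Under these assumptions, a handle whose version is None at the second check would be routed to create() even if the first check saw Some(n) earlier, which is consistent with the error reported here. Whether the state is actually reset between the two checks still needs to be confirmed against write.rs.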
Impact
Any SinkDelta(Append) call to an existing Delta table fails. SinkDelta(Overwrite) works correctly, suggesting the bug is specific to the Append → auto-create code path.
Workaround
Use DataFrame.MergeDelta() instead of SinkDelta(Append). MergeDelta correctly loads the table snapshot via delta-rs's MergeBuilder and doesn't hit the auto-create path:
// Instead of: df.Lazy().SinkDelta(path, mode: DeltaSaveMode.Append, ...);
// Use:
df.MergeDelta(
path,
mergeKeys: new[] { "your_key_column" },
matchedUpdateCond: null, // Update if matched
notMatchedInsertCond: null, // Insert if not matched
notMatchedBySourceDeleteCond: null, // Retain target-only rows
cloudOptions: opts);
Environment
- Polars.NET: v0.3.0 (also confirmed present in v0.3.1; write.rs unchanged)
- delta-rs: 0.31.0 (pinned by Polars.NET)
- Storage: Azure ADLS Gen2 (HNS-enabled), using abfss:// URI scheme
- .NET: 8.0
- OS: Linux (Docker container on AKS)