feat: OverwriteAction support (replace all / partial overwrite) #106
Merged
Update iceberg-rust rev to a4a353577ad7414b065770ba970c1353325a3adb (RelationalAI/iceberg-rust#76, adds OverwriteAction). Fix three API breaks exposed by the rev bump:
- UnzippedIncrementalBatchRecordStream renamed to UnzippedIncrementalScanResult (a struct with .appends/.deletes fields instead of a tuple alias)
- to_unzipped_arrow() now returns that struct, not a bare tuple
- ArrowReader::read() now returns ScanResult; call .stream() to unwrap it
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
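The first API break swaps a positional tuple for a struct with named fields. A minimal sketch of what migrating a call site looks like, using hypothetical stand-in types (Batches is a Vec<u32> here rather than a real Arrow record-batch stream); only the names UnzippedIncrementalScanResult, appends and deletes come from the commit message:

```rust
// Hypothetical stand-in for a stream of Arrow record batches.
pub type Batches = Vec<u32>;

// Before the rev bump (illustrative): a bare tuple alias of (appends, deletes).
pub type OldUnzipped = (Batches, Batches);

// After the rev bump (illustrative): named fields replace tuple positions.
pub struct UnzippedIncrementalScanResult {
    pub appends: Batches,
    pub deletes: Batches,
}

pub fn old_to_unzipped_arrow() -> OldUnzipped {
    (vec![1, 2], vec![3])
}

pub fn new_to_unzipped_arrow() -> UnzippedIncrementalScanResult {
    UnzippedIncrementalScanResult { appends: vec![1, 2], deletes: vec![3] }
}

// An updated call site reads fields by name instead of destructuring (a, d).
pub fn call_site() -> (Batches, Batches) {
    let result = new_to_unzipped_arrow();
    (result.appends, result.deletes)
}
```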
New FFI surface for atomic table overwrites:
- IcebergOverwriteAction: accumulates added + deleted DataFile lists
- iceberg_overwrite_action_new / _free
- iceberg_overwrite_action_add_data_files: moves new files into the action
- iceberg_overwrite_action_delete_data_files: moves files-to-delete into the action
- iceberg_overwrite_action_apply: calls Transaction::overwrite().apply()
- iceberg_table_list_data_files: async walk of the manifest list to collect all live DataFile records from the current snapshot (needed so Julia can supply the delete list for a full-table replace)
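A rough sketch of the accumulate-then-apply pattern behind IcebergOverwriteAction, assuming opaque Box-allocated handles passed across the C ABI. DataFile and IcebergDataFiles are hypothetical stand-ins for the iceberg-rust and FFI types; the functions reuse the names listed above, but their bodies are illustrative rather than the PR's code. The apply step, which needs a real Transaction, is omitted, and #[no_mangle] is left off so the sketch compiles as an ordinary module:

```rust
// Hypothetical stand-in for iceberg-rust's DataFile.
#[derive(Debug, Clone, PartialEq)]
pub struct DataFile {
    pub file_path: String,
}

// Hypothetical stand-in for the FFI's IcebergDataFiles handle.
pub struct IcebergDataFiles {
    pub files: Vec<DataFile>,
}

// Accumulates files to add and files to delete until the action is applied.
#[derive(Default)]
pub struct IcebergOverwriteAction {
    pub added: Vec<DataFile>,
    pub deleted: Vec<DataFile>,
}

// Allocates an empty action and hands ownership to the caller as a raw pointer.
pub extern "C" fn iceberg_overwrite_action_new() -> *mut IcebergOverwriteAction {
    Box::into_raw(Box::new(IcebergOverwriteAction::default()))
}

/// # Safety
/// `action` must come from `iceberg_overwrite_action_new` and not be freed twice.
pub unsafe extern "C" fn iceberg_overwrite_action_free(action: *mut IcebergOverwriteAction) {
    if !action.is_null() {
        unsafe { drop(Box::from_raw(action)) };
    }
}

// Moves every file out of `files` into the chosen list; returns false on null input.
unsafe fn move_files(
    action: *mut IcebergOverwriteAction,
    files: *mut IcebergDataFiles,
    to_delete: bool,
) -> bool {
    if action.is_null() || files.is_null() {
        return false;
    }
    let (action, files) = unsafe { (&mut *action, &mut *files) };
    // Leaves an empty Vec behind in the IcebergDataFiles handle.
    let drained = std::mem::take(&mut files.files);
    if to_delete {
        action.deleted.extend(drained);
    } else {
        action.added.extend(drained);
    }
    true
}

pub unsafe extern "C" fn iceberg_overwrite_action_add_data_files(
    action: *mut IcebergOverwriteAction,
    files: *mut IcebergDataFiles,
) -> bool {
    unsafe { move_files(action, files, false) }
}

pub unsafe extern "C" fn iceberg_overwrite_action_delete_data_files(
    action: *mut IcebergOverwriteAction,
    files: *mut IcebergDataFiles,
) -> bool {
    unsafe { move_files(action, files, true) }
}
```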
- OverwriteAction struct + constructor/free
- add_data_files(action, files) / delete_data_files(action, files)
- apply(action, tx) / with_overwrite(f, tx) convenience helper
- list_data_files(table) -> DataFiles (async, walks the manifest list)
- All new symbols exported from RustyIceberg
Seven testsets, all self-contained with mktempdir + catalog_create_memory:
- OverwriteAction lifecycle (new/free/double-free)
- list_data_files on empty table
- list_data_files after append
- Overwrite replaces all existing files (2 appended files → 2 new rows)
- Overwrite add-only produces a new snapshot
- Two sequential overwrites converge correctly
- OverwriteAction error handling (freed action, null DataFiles, consumed tx)
All tables are freed in finally blocks.
with_data_file_writer is not exported from RustyIceberg, so its bare use in the test file caused an UndefVarError at runtime.
…l overwrite
- Qualify with_transaction as RustyIceberg.with_transaction (it is not exported)
- Fix the "apply on consumed transaction" test: apply() doesn't consume tx, only commit() does, so the test now commits first and then tries apply
- Add a "partial overwrite" testset: delete all + re-add kept rows + new rows, verifying mixed add_data_files calls and selective deletion
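The consumed-transaction fix hinges on which call takes ownership. A small model of that behavior, assuming an FFI handle that wraps Option<Transaction> (a common pattern, not necessarily how this crate stores it): apply() takes the transaction out and puts it back, while commit() takes it out for good, so any later apply() reports an error. Transaction and TransactionHandle are hypothetical stand-ins:

```rust
// Hypothetical stand-in for iceberg-rust's Transaction.
#[derive(Debug, Default)]
pub struct Transaction {
    pub applied_actions: usize,
}

// Hypothetical handle: `tx` becomes None once the transaction is committed.
pub struct TransactionHandle {
    tx: Option<Transaction>,
}

impl TransactionHandle {
    pub fn new() -> Self {
        Self { tx: Some(Transaction::default()) }
    }

    // Takes the transaction out, folds in one action, and puts it back,
    // so the handle remains usable afterwards.
    pub fn apply(&mut self) -> Result<(), String> {
        let mut tx = self.tx.take().ok_or("transaction already consumed")?;
        tx.applied_actions += 1;
        self.tx = Some(tx);
        Ok(())
    }

    // Takes the transaction out and leaves None behind: the handle is consumed.
    pub fn commit(&mut self) -> Result<usize, String> {
        let tx = self.tx.take().ok_or("transaction already consumed")?;
        // A real commit would write the new snapshot to the catalog here.
        Ok(tx.applied_actions)
    }
}
```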
…tion
Instead of awkwardly re-adding "kept" rows, the test calls list_data_files on the table as of an earlier snapshot (v1), which returns just the first file. The overwrite deletes only those files; the second file (appended in v2) is not in the delete list and survives intact, directly testing the expected semantics.
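The expected semantics can be modeled with plain sets. In this hypothetical sketch a snapshot is just the set of live data-file paths: the overwrite removes exactly the paths in its delete list and adds the new ones, so a file appended in v2 but absent from the v1 listing survives. It ignores manifests and commit validation entirely:

```rust
use std::collections::BTreeSet;

// Hypothetical model: a snapshot is only the set of live data-file paths.
pub type Snapshot = BTreeSet<String>;

// Removes exactly the paths in `deleted`, then adds `added`.
// Paths not listed in `deleted` are carried over unchanged.
pub fn overwrite(current: &Snapshot, deleted: &[&str], added: &[&str]) -> Snapshot {
    let mut next = current.clone();
    for path in deleted {
        next.remove(*path);
    }
    for path in added {
        next.insert(path.to_string());
    }
    next
}
```

Running it on the scenario above: deleting the v1 listing from v2 and adding one new file leaves the v2 file plus the new one.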
The previous code used bare IcebergException(UNEXPECTED, ...) which
references an undefined constant, causing UndefVarError at runtime.
Switch to parse_and_throw (same pattern as FastAppendAction) which
extracts the error code from the Rust-encoded message string, and
use Ref{Ptr{Cchar}} (not Ptr{Ptr{Cchar}}) to match the calling convention.
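On the Rust side, a Ref{Ptr{Cchar}} argument corresponds to a *mut *mut c_char out-parameter: Julia allocates one pointer-sized slot and the callee writes an owned C string into it. A hypothetical sketch of that convention; example_fail, example_free_string, the -1 return code and the message are illustrative, and the PR's actual error encoding (which parse_and_throw decodes) is not shown:

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Writes an owned, NUL-terminated error message into the caller's slot.
/// # Safety
/// `out_err` must be null or point to writable storage for one pointer.
pub unsafe extern "C" fn example_fail(out_err: *mut *mut c_char) -> i32 {
    if !out_err.is_null() {
        let msg = CString::new("example error").expect("no interior NUL");
        // Ownership of the string passes to the caller through the slot.
        unsafe { *out_err = msg.into_raw() };
    }
    -1
}

/// Frees a string previously handed out by `example_fail`.
/// # Safety
/// `s` must be null or a pointer returned through `example_fail`'s slot.
pub unsafe extern "C" fn example_free_string(s: *mut c_char) {
    if !s.is_null() {
        unsafe { drop(CString::from_raw(s)) };
    }
}
```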
…er-overwrite test
- Add #[derive(Default)] to IcebergOverwriteAction to satisfy clippy
- cargo fmt reformatting of transaction.rs and incremental_pipeline.rs
- Bump crate version 0.8.1 → 0.8.2
- Add testset: a full overwrite clears the table, then a fast append re-populates it
Both iceberg_overwrite_action_add/delete_data_files drain the Vec<DataFile> via std::mem::take but leave the IcebergDataFiles box alive. Wrap the ccall in try/finally and call free_data_files! to match the FastAppendAction pattern.
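Why the box has to be freed separately: std::mem::take swaps the vector for an empty one, so the heap-allocated handle itself is untouched. A minimal sketch, with IcebergDataFiles as a hypothetical stand-in holding file paths and data_files_new, drain_files and data_files_free as illustrative helper names (on the Julia side, free_data_files! plays the last role):

```rust
// Hypothetical stand-in for the FFI's boxed IcebergDataFiles handle.
pub struct IcebergDataFiles {
    pub files: Vec<String>,
}

pub fn data_files_new(paths: &[&str]) -> *mut IcebergDataFiles {
    let files = paths.iter().map(|p| p.to_string()).collect();
    Box::into_raw(Box::new(IcebergDataFiles { files }))
}

/// # Safety
/// `handle` must be a live pointer from `data_files_new`.
pub unsafe fn drain_files(handle: *mut IcebergDataFiles) -> Vec<String> {
    // Swaps in an empty Vec; the Box allocation behind `handle` stays alive.
    unsafe { std::mem::take(&mut (*handle).files) }
}

/// # Safety
/// `handle` must come from `data_files_new` and must not be used afterwards.
pub unsafe fn data_files_free(handle: *mut IcebergDataFiles) {
    if !handle.is_null() {
        unsafe { drop(Box::from_raw(handle)) };
    }
}
```

Without the final free, the now-empty IcebergDataFiles allocation would leak on every add/delete call.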
Picks up the fix that relaxes SnapshotProducer's precondition check to allow Overwrite snapshots that only delete files without adding new ones.
read_table_data returns nothing when there are no record batches, which is the expected state after clearing a table via delete-only overwrite.
Rust FFI:
- iceberg_data_files_len: return file count without consuming the handle
- iceberg_data_files_to_json: serialize all DataFile metadata fields to a
JSON array (content, file_path, file_format, record_count,
file_size_in_bytes, column/value/null/nan counts, bounds, split_offsets,
sort_order_id, equality_ids, first_row_id, referenced_data_file,
content_offset, content_size_in_bytes)
Julia:
- Base.length(df::DataFiles): wraps iceberg_data_files_len
- data_file_info(df::DataFiles): returns Vector{Dict{String,Any}} via JSON
Tests updated to assert file counts and metadata instead of just checking
for non-null handles.
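iceberg_data_files_len can leave the handle usable because it only needs a shared borrow, whereas a consuming call reclaims the Box. A sketch of the contrast, assuming an IcebergDataFiles stand-in that holds file paths; returning 0 for a null pointer and the data_files_into_vec helper are illustrative choices, not taken from the PR:

```rust
// Hypothetical stand-in: real handles hold Vec<DataFile>.
pub struct IcebergDataFiles {
    pub files: Vec<String>,
}

/// # Safety
/// `handle` must be null or point to a live IcebergDataFiles.
pub unsafe extern "C" fn iceberg_data_files_len(handle: *const IcebergDataFiles) -> usize {
    if handle.is_null() {
        return 0;
    }
    // Shared borrow only: ownership stays with the caller.
    unsafe { (*handle).files.len() }
}

/// # Safety
/// `handle` must come from Box::into_raw and must not be used afterwards.
pub unsafe fn data_files_into_vec(handle: *mut IcebergDataFiles) -> Vec<String> {
    // Reclaims the Box, so the handle is consumed by this call.
    let boxed = unsafe { Box::from_raw(handle) };
    let IcebergDataFiles { files } = *boxed;
    files
}
```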
rgankema approved these changes on May 29, 2026
Co-authored-by: Richard Gankema <richardgankema@gmail.com>
Summary
Adds atomic overwrite snapshot support to RustyIceberg.jl, enabling callers to replace all (or a subset of) existing Parquet files with a new set in a single Iceberg Operation::Overwrite snapshot.

Depends on: RelationalAI/iceberg-rust#76 (cherry-pick of upstream apache/iceberg-rust#2185, which adds OverwriteAction to iceberg-rust).

Changes

FFI (iceberg_rust_ffi/src/transaction.rs)
- IcebergOverwriteAction: accumulates added + deleted DataFile lists
- iceberg_overwrite_action_new / _free
- iceberg_overwrite_action_add_data_files: moves new files into the action
- iceberg_overwrite_action_delete_data_files: moves files-to-delete into the action
- iceberg_overwrite_action_apply: calls Transaction::overwrite().apply()
- iceberg_table_list_data_files: async walk of the manifest list to collect all live DataFile records from the current snapshot

Julia bindings (src/transaction.jl)
- OverwriteAction struct + constructor / free_overwrite_action!
- add_data_files(action, files) / delete_data_files(action, files)
- apply(action, tx) / with_overwrite(f, tx) convenience helper
- list_data_files(table) -> DataFiles
- All new symbols exported from RustyIceberg

Tests (test/overwrite_tests.jl)
Self-contained, no Docker: all tests use mktempdir + catalog_create_memory:
- list_data_files on empty table
- list_data_files after append

Usage
Test plan
- make run-containers && make test passes (all overwrite testsets green)

🤖 Generated with Claude Code