feat(update): add MergingSnapshotUpdate#682
Conversation
e950bed to
0eebdbf
Compare
Abstract base for merge-based snapshot operations (MergeAppend, OverwriteFiles, RowDelta, etc.), implementing the filter → write → merge pipeline consistent with Java's MergingSnapshotProducer. Also fixes SnapshotSummaryBuilder manifest count fields and a use-after-free bug in SnapshotUpdate::DeleteFile.
0eebdbf to
3085e61
Compare
| return InvalidArgument("Cannot append manifest with deleted files: {}", | ||
| manifest.manifest_path); | ||
| } | ||
| if (manifest.added_snapshot_id != kInvalidSnapshotId) { |
There was a problem hiding this comment.
Java copies append manifests that already have a snapshot id (or assigned first_row_id) instead of rejecting them. This should go through CopyManifest for parity.
|
|
||
| // Step 1: Filter data manifests. | ||
| ICEBERG_ASSIGN_OR_RAISE(auto filtered_data, | ||
| data_filter_manager_.FilterManifests( |
There was a problem hiding this comment.
Java uses SnapshotUtil.schemaFor(base, targetBranch()) here. Using metadata.Schema() underneath can evaluate filters with the wrong schema when committing to a branch behind current.
| metadata_to_update, snapshot, tracked_factory)); | ||
|
|
||
| // Track deleted data files in the summary builder. | ||
| for (const auto& file : data_filter_manager_.FilesToBeDeleted()) { |
There was a problem hiding this comment.
This counts requested deletes, not actual filtered deletes. Java builds the summary from deleted entries in filtered manifests, so missing optional deletes should not affect totals.
| } | ||
|
|
||
| ICEBERG_ASSIGN_OR_RAISE(auto schema, metadata.Schema()); | ||
| ICEBERG_ASSIGN_OR_RAISE(auto ancestors, |
There was a problem hiding this comment.
Java validationHistory fails if it cannot reach startingSnapshotId. These helpers should not silently validate a truncated ancestor range.
| return !delete_paths_.empty(); | ||
| // Also open delete manifests when a minimum sequence number is set for cleanup. | ||
| return !delete_paths_.empty() || !removed_data_file_paths_.empty() || | ||
| (manifest_content_ == ManifestContent::kDeletes && min_sequence_number_ > 0); |
There was a problem hiding this comment.
Java does not open delete manifests solely because minSequenceNumber is set; old deletes are dropped only when a delete manifest is already being filtered.
| ICEBERG_ASSIGN_OR_RAISE( | ||
| auto merged, FlushBin(bin, snapshot_id, metadata, file_io, writer_factory)); | ||
| // Each manifest consumed into the merged output (beyond the 1 output) is replaced. | ||
| replaced_manifests_count_ += static_cast<int32_t>(bin.size()) - 1; |
There was a problem hiding this comment.
Java counts each previous-snapshot manifest consumed by the merge as replaced, not bin.size() - 1. This can undercount old-only bins and overcount current-snapshot inputs.
Abstract base for merge-based snapshot operations (MergeAppend, OverwriteFiles, RowDelta, etc.), implementing the filter → write → merge pipeline consistent with Java's MergingSnapshotProducer.
Also fixes SnapshotSummaryBuilder manifest count fields and a use-after-free bug in SnapshotUpdate::DeleteFile.