Description
Apache Iceberg version
main (development)
Please describe the bug 🐞
A snapshot OVERWRITE operation can calculate the wrong summary fields when the table is only partially updated.
update_snapshot_summaries assumes that all OVERWRITE operations are full-table overwrites:
truncate_full_table=self._operation == Operation.OVERWRITE,
iceberg-python/pyiceberg/table/snapshots.py
Lines 358 to 359 in 322ebdd
if truncate_full_table and summary.operation == Operation.OVERWRITE and previous_summary is not None:
    summary = _truncate_table_summary(summary, previous_summary)
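To make the effect concrete, here is a minimal sketch. It assumes a simplified summary that tracks only total data files and total records. Totals and next_totals are hypothetical names, not pyiceberg APIs. The truncate_full_table branch mimics the full-table semantics in the snippet above, and the other branch carries the previous totals forward and applies the commit's deltas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Totals:
    """Hypothetical stand-in for two of the total-* fields in a snapshot summary."""

    data_files: int
    records: int


def next_totals(
    previous: Totals,
    added_files: int,
    added_records: int,
    removed_files: int,
    removed_records: int,
    truncate_full_table: bool,
) -> Totals:
    """Compute the totals for a new snapshot from the previous snapshot's totals."""
    if truncate_full_table:
        # Full-table semantics: every file in the previous snapshot counts as removed,
        # so the new totals contain only what this commit added.
        return Totals(data_files=added_files, records=added_records)
    # Incremental semantics: carry the previous totals forward and apply this commit's deltas.
    return Totals(
        data_files=previous.data_files - removed_files + added_files,
        records=previous.records - removed_records + added_records,
    )


previous = Totals(data_files=10, records=1000)
# A partial overwrite that replaces one 100-record file with a 60-record rewrite.
delta = dict(added_files=1, added_records=60, removed_files=1, removed_records=100)

print(next_totals(previous, **delta, truncate_full_table=False))  # Totals(data_files=10, records=960)
print(next_totals(previous, **delta, truncate_full_table=True))   # Totals(data_files=1, records=60)
```

In this simplified model, the table still holds 10 files and 960 records after the partial overwrite. The truncating path reports only what the single commit added.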
This is likely an oversight from when we implemented partial writes.
Thankfully, the table/transaction overwrite function is currently implemented as a delete + append.
The only place where the OVERWRITE operation is used is during partial deletes:
iceberg-python/pyiceberg/table/__init__.py
Line 678 in 322ebdd
with self.update_snapshot(snapshot_properties=snapshot_properties).overwrite() as overwrite_snapshot:
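A partial delete commits an OVERWRITE that swaps out only the data files containing matching rows. That is why full-table truncation does not fit it. The sketch below models this as a copy-on-write rewrite of in-memory files. DataFile and rewrite_matching_rows are hypothetical names. The sketch ignores manifests, partitioning, and any separate handling of files whose rows are all deleted.

```python
from typing import Callable, List, Tuple

# Hypothetical in-memory "data file": a file name plus the rows it holds.
DataFile = Tuple[str, List[int]]


def rewrite_matching_rows(
    files: List[DataFile], should_delete: Callable[[int], bool]
) -> Tuple[List[DataFile], List[DataFile], List[DataFile]]:
    """Split files into (kept, removed, added) for a copy-on-write partial delete."""
    kept: List[DataFile] = []
    removed: List[DataFile] = []
    added: List[DataFile] = []
    for name, rows in files:
        survivors = [row for row in rows if not should_delete(row)]
        if len(survivors) == len(rows):
            # No matching rows: the file is untouched by this commit.
            kept.append((name, rows))
            continue
        # The original file leaves the table...
        removed.append((name, rows))
        if survivors:
            # ...and is replaced by a rewritten file holding only the surviving rows.
            added.append((f"{name}-rewritten", survivors))
    return kept, removed, added


files = [("a", [1, 2, 3]), ("b", [4, 5, 6]), ("c", [7, 8, 9])]
kept, removed, added = rewrite_matching_rows(files, lambda row: row == 5)

table_after = kept + added
print(len(table_after), sum(len(rows) for _, rows in table_after))  # 3 8
print(len(added), sum(len(rows) for _, rows in added))  # 1 2
```

Deleting row 5 removes one file and adds one 2-row rewrite, so the table still holds 3 files and 8 records. Treating that commit as a full-table overwrite would instead report totals of 1 file and 2 records.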
Original thread: apache/iceberg-go#356 (comment) (thanks @arnaudbriche and @zeroshade)
Partial overwrite reproduced in #1840
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time