Open
Description
Apache Iceberg version
0.10.0
Please describe the bug 🐞
Snapshot expiration fails with TypeError: BaseModel.__init__() takes 1 positional argument but 2 were given when calling table.maintenance.expire_snapshots().older_than(...).commit().
The issue is located in .venv/lib/python3.12/site-packages/pyiceberg/table/update/__init__.py within this code:
@_apply_table_update.register(RemoveSnapshotsUpdate)
def _(update: RemoveSnapshotsUpdate, base_metadata: TableMetadata, context: _TableMetadataUpdateContext) -> TableMetadata:
    for remove_snapshot_id in update.snapshot_ids:
        if not any(snapshot.snapshot_id == remove_snapshot_id for snapshot in base_metadata.snapshots):
            raise ValueError(f"Snapshot with snapshot id {remove_snapshot_id} does not exist: {base_metadata.snapshots}")

    snapshots = [
        (
            snapshot.model_copy(update={"parent_snapshot_id": None})
            if snapshot.parent_snapshot_id in update.snapshot_ids
            else snapshot
        )
        for snapshot in base_metadata.snapshots
        if snapshot.snapshot_id not in update.snapshot_ids
    ]
    snapshot_log = [
        snapshot_log_entry
        for snapshot_log_entry in base_metadata.snapshot_log
        if snapshot_log_entry.snapshot_id not in update.snapshot_ids
    ]

    remove_ref_updates = (
        RemoveSnapshotRefUpdate(ref_name=ref_name)
        for ref_name, ref in base_metadata.refs.items()
        if ref.snapshot_id in update.snapshot_ids
    )
    remove_statistics_updates = (
        RemoveStatisticsUpdate(statistics_file.snapshot_id)
        for statistics_file in base_metadata.statistics
        if statistics_file.snapshot_id in update.snapshot_ids
    )
    updates = itertools.chain(remove_ref_updates, remove_statistics_updates)
    new_metadata = base_metadata
    for upd in updates:
        new_metadata = _apply_table_update(upd, new_metadata, context)

    context.add_update(update)
    return new_metadata.model_copy(update={"snapshots": snapshots, "snapshot_log": snapshot_log})
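The problem is that RemoveStatisticsUpdate (which inherits from a Pydantic BaseModel) is instantiated with a positional argument instead of a keyword argument. BaseModel.__init__ accepts field values only as keyword arguments, so the call raises the TypeError. Because remove_statistics_updates is a generator, the failure only surfaces when a snapshot being removed has a matching statistics file.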
To fix, the instantiation line should be changed from:
RemoveStatisticsUpdate(statistics_file.snapshot_id)
to
RemoveStatisticsUpdate(snapshot_id=statistics_file.snapshot_id)
This would comply with Pydantic’s requirement that model fields be passed as keyword arguments.
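To illustrate why the positional call fails, here is a minimal, dependency-free sketch. It assumes Pydantic v2's BaseModel.__init__ has the signature (self, /, **data); KeywordOnlyModel and RemoveStatisticsUpdateSketch are hypothetical stand-ins written for this example, not PyIceberg or Pydantic code.

```python
from typing import Any


class KeywordOnlyModel:
    """Hypothetical stand-in mimicking the assumed BaseModel.__init__ signature."""

    def __init__(self, /, **data: Any) -> None:
        # Only keyword arguments are accepted; each one becomes an attribute.
        for name, value in data.items():
            setattr(self, name, value)


class RemoveStatisticsUpdateSketch(KeywordOnlyModel):
    """Illustrative counterpart of RemoveStatisticsUpdate with a single field."""

    snapshot_id: int


def try_positional(snapshot_id: int) -> str:
    """Return the error message produced by the positional call, if any."""
    try:
        RemoveStatisticsUpdateSketch(snapshot_id)  # same shape as the buggy call
    except TypeError as exc:
        return str(exc)
    return "no error"


message = try_positional(1234)

# The keyword form, matching the proposed fix, constructs the object normally.
update = RemoveStatisticsUpdateSketch(snapshot_id=1234)

print(message)
print(update.snapshot_id)
```

The "1 positional argument" in the message is self; the snapshot id is the second, rejected one. With the real Pydantic class the mechanism should be the same, though that is not verified here against the PyIceberg source.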
Environment:
Python 3.12
PyIceberg version: 0.10.0
Steps to reproduce:
1. Load an Iceberg table with multiple snapshots
2. Call table.maintenance.expire_snapshots().older_than(cutoff_datetime).commit()
3. Observe the TypeError traceback related to BaseModel.__init__
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time