TypeError in snapshot expiration due to positional argument to BaseModel in table/update/__init__.py #2558

@vndv

Apache Iceberg version

0.10.0

Please describe the bug 🐞

Snapshot expiration fails with TypeError: BaseModel.__init__() takes 1 positional argument but 2 were given when calling table.maintenance.expire_snapshots().older_than(...).commit().

The issue is located in .venv/lib/python3.12/site-packages/pyiceberg/table/update/__init__.py within this code:

@_apply_table_update.register(RemoveSnapshotsUpdate)
def _(update: RemoveSnapshotsUpdate, base_metadata: TableMetadata, context: _TableMetadataUpdateContext) -> TableMetadata:
    for remove_snapshot_id in update.snapshot_ids:
        if not any(snapshot.snapshot_id == remove_snapshot_id for snapshot in base_metadata.snapshots):
            raise ValueError(f"Snapshot with snapshot id {remove_snapshot_id} does not exist: {base_metadata.snapshots}")

    snapshots = [
        (
            snapshot.model_copy(update={"parent_snapshot_id": None})
            if snapshot.parent_snapshot_id in update.snapshot_ids
            else snapshot
        )
        for snapshot in base_metadata.snapshots
        if snapshot.snapshot_id not in update.snapshot_ids
    ]
    snapshot_log = [
        snapshot_log_entry
        for snapshot_log_entry in base_metadata.snapshot_log
        if snapshot_log_entry.snapshot_id not in update.snapshot_ids
    ]

    remove_ref_updates = (
        RemoveSnapshotRefUpdate(ref_name=ref_name)
        for ref_name, ref in base_metadata.refs.items()
        if ref.snapshot_id in update.snapshot_ids
    )
    remove_statistics_updates = (
        RemoveStatisticsUpdate(statistics_file.snapshot_id)
        for statistics_file in base_metadata.statistics
        if statistics_file.snapshot_id in update.snapshot_ids
    )
    updates = itertools.chain(remove_ref_updates, remove_statistics_updates)
    new_metadata = base_metadata
    for upd in updates:
        new_metadata = _apply_table_update(upd, new_metadata, context)

    context.add_update(update)
    return new_metadata.model_copy(update={"snapshots": snapshots, "snapshot_log": snapshot_log})

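The problem is that RemoveStatisticsUpdate (which inherits from a Pydantic BaseModel) is instantiated with a positional argument instead of a keyword argument. BaseModel.__init__ accepts fields only as keyword arguments, so the call raises the TypeError. Because the call sits inside a generator expression filtered on statistics_file.snapshot_id, it only runs when a snapshot being removed has an associated statistics file.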

To fix, the instantiation line should be changed from:

RemoveStatisticsUpdate(statistics_file.snapshot_id)

to

RemoveStatisticsUpdate(snapshot_id=statistics_file.snapshot_id)

This would comply with Pydantic’s requirement that model fields be passed as keyword arguments.
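To illustrate why the positional call fails, here is a minimal sketch that does not import pyiceberg or Pydantic. KeywordOnlyModel and RemoveStatisticsUpdateSketch are hypothetical stand-ins, and the sketch assumes that Pydantic's BaseModel.__init__ is declared roughly as __init__(self, /, **data), which is what the error message in this report suggests.

```python
from typing import Any


class KeywordOnlyModel:
    """Hypothetical stand-in for a Pydantic-style base model.

    Assumption: like BaseModel, it takes `self` plus keyword arguments only.
    """

    def __init__(self, /, **data: Any) -> None:
        # Store every keyword argument as an attribute (no validation here).
        for name, value in data.items():
            setattr(self, name, value)


class RemoveStatisticsUpdateSketch(KeywordOnlyModel):
    """Illustrative stand-in, not the real pyiceberg class."""

    snapshot_id: int


def try_positional(snapshot_id: int) -> str:
    """Return the error message produced by the positional call, if any."""
    try:
        RemoveStatisticsUpdateSketch(snapshot_id)  # mirrors the failing call
    except TypeError as exc:
        return str(exc)
    return "no error"


message = try_positional(123)

# Keyword form, mirroring the proposed fix.
fixed = RemoveStatisticsUpdateSketch(snapshot_id=123)

print(message)
print(fixed.snapshot_id)
```

The positional call passes two positional arguments (self and the snapshot id) to an initializer that accepts only one, which matches the wording of the reported TypeError. The keyword form succeeds.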

Environment:

Python 3.12

PyIceberg version: 0.10.0

Steps to reproduce:

Load an Iceberg table with multiple snapshots

Call table.maintenance.expire_snapshots().older_than(cutoff_datetime).commit()

Observe the TypeError traceback related to BaseModel.__init__

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
