_SnapshotProducer._summary() unreasonably slow #2673

@Anton-Tarazi

Description

Apache Iceberg version

Latest

Please describe the bug 🐞

_SnapshotProducer.commit(), which is called whenever rows are added to or deleted from a table, is surprisingly slow. I traced this to _SnapshotProducer._summary(): for every added or deleted DataFile, _summary reads the self._transaction.table_metadata property, and each read makes an unnecessary copy of the metadata.

#1903 introduced this regression, and I don't believe the performance impact is as insignificant as stated there. Copying the metadata O(number of added / deleted data files) times is expensive for large writes.
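To illustrate the access pattern, here is a minimal, self-contained sketch. It is not PyIceberg code: ToyTransaction, summarize_per_file, summarize_hoisted, and the dictionary layout are hypothetical stand-ins, and the copy-on-access property is an assumption modelled on the behaviour described above. It only shows why reading such a property inside the loop costs one copy per data file, while reading it once before the loop costs a single copy.

```python
import copy


class ToyTransaction:
    """Hypothetical stand-in for a transaction whose metadata property copies on access."""

    def __init__(self, metadata):
        self._metadata = metadata
        self.copies = 0  # how many times the metadata has been copied

    @property
    def table_metadata(self):
        # Assumption: every access builds a fresh copy of the metadata.
        self.copies += 1
        return copy.deepcopy(self._metadata)


def summarize_per_file(txn, data_files):
    """Reads the property inside the loop: one metadata copy per data file."""
    added_records = 0
    spec_id = None
    for data_file in data_files:
        spec_id = txn.table_metadata["default_spec_id"]
        added_records += data_file["record_count"]
    return {"added-records": added_records, "spec-id": spec_id}


def summarize_hoisted(txn, data_files):
    """Reads the property once before the loop: a single metadata copy."""
    metadata = txn.table_metadata
    spec_id = metadata["default_spec_id"]
    added_records = sum(data_file["record_count"] for data_file in data_files)
    return {"added-records": added_records, "spec-id": spec_id}


# Toy inputs: 1000 data files and a small metadata dictionary.
files = [{"record_count": 10} for _ in range(1000)]
toy_metadata = {"default_spec_id": 0, "schemas": [{"id": i} for i in range(50)]}

slow_txn = ToyTransaction(toy_metadata)
slow_result = summarize_per_file(slow_txn, files)

fast_txn = ToyTransaction(toy_metadata)
fast_result = summarize_hoisted(fast_txn, files)
```

Both functions return the same summary, but slow_txn.copies ends at 1000 and fast_txn.copies at 1. Whether hoisting the read is the right fix inside _summary() depends on whether the metadata can change while the summary is being built, which this sketch does not model.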

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
