Open
Description
Apache Iceberg version
Latest
Please describe the bug 🐞
_SnapshotProducer.commit(), which is called whenever rows are added to or deleted from a table, is surprisingly slow. I traced this to _SnapshotProducer._summary(): for every added or deleted DataFile, _summary() accesses the self._transaction.table_metadata property, which unnecessarily copies the metadata on each access.
#1903 introduced this regression, and I don't believe the performance impact is as insignificant as stated there. Making O(number of added/deleted data files) metadata copies is expensive for large writes.
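To illustrate the pattern, here is a simplified sketch, not the library's actual code. FakeTransaction, summarize_slow, and summarize_fast are hypothetical names, and the metadata is a plain dict rather than a real table metadata object. It assumes only what is described above: a property that copies the metadata on every access, read once per data file inside a loop. Reading the property once before the loop reduces the number of copies from one per file to one in total.

```python
import copy


class FakeTransaction:
    """Hypothetical stand-in for a transaction whose table_metadata
    property returns a fresh copy of the metadata on every access."""

    def __init__(self, metadata):
        self._metadata = metadata
        self.copies = 0  # counts how many times the metadata was copied

    @property
    def table_metadata(self):
        self.copies += 1
        return copy.deepcopy(self._metadata)


def summarize_slow(txn, data_files):
    """Reads the property inside the loop: one metadata copy per file."""
    total_records = 0
    for data_file in data_files:
        spec_id = txn.table_metadata["partition_spec"]["spec_id"]
        if data_file["spec_id"] == spec_id:
            total_records += data_file["record_count"]
    return total_records


def summarize_fast(txn, data_files):
    """Reads the property once before the loop: a single metadata copy."""
    spec_id = txn.table_metadata["partition_spec"]["spec_id"]
    total_records = 0
    for data_file in data_files:
        if data_file["spec_id"] == spec_id:
            total_records += data_file["record_count"]
    return total_records


def make_transaction():
    # Illustrative metadata only; real table metadata is far larger.
    return FakeTransaction(
        {"partition_spec": {"spec_id": 0}, "schemas": list(range(100))}
    )


files = [{"spec_id": 0, "record_count": 10} for _ in range(1000)]

slow_txn = make_transaction()
slow_total = summarize_slow(slow_txn, files)

fast_txn = make_transaction()
fast_total = summarize_fast(fast_txn, files)

print(slow_total, slow_txn.copies)
print(fast_total, fast_txn.copies)
```

Both functions compute the same result; only the copy count differs. Whether the real fix should hoist the property access like this or avoid the copy inside the property itself is a design question for the maintainers.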
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time