Description
Feature Request / Improvement
We care about this because, with Parquet manifests, we cannot reuse the immutable list returned by the "get" method of the base file. That means we leak an object for every manifest. It is not a huge deal, but we should probably do something about it.
--
BaseFile stores split offsets internally as a long[], but splitOffsets() wraps it in a new List<Long> via ArrayUtil.toUnmodifiableLongList on every invocation. When file metadata is being read and rewritten (e.g., during manifest rewriting or format conversion), this means each entry needlessly allocates a list that is immediately consumed and discarded.
Other fields like partitionData are stored and returned as-is. Split offsets could similarly cache or reuse the List<Long> representation, or callers within the core module could use the existing package-private splitOffsetArray() to pass the raw long[] through without conversion.
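The caching option above could look roughly like the following. This is a minimal sketch, not Iceberg's actual code: SplitOffsetsSketch is a hypothetical stand-in for BaseFile, the field name splitOffsetsView is made up for illustration, and plain JDK collections are used in place of ArrayUtil.toUnmodifiableLongList. It also assumes the long[] is never mutated after the list view is first built.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a file-metadata class; NOT Iceberg's BaseFile.
class SplitOffsetsSketch {
  private final long[] splitOffsets;   // raw storage; may be null
  private List<Long> splitOffsetsView; // built lazily on first access, then reused

  SplitOffsetsSketch(long[] splitOffsets) {
    this.splitOffsets = splitOffsets;
  }

  // Package-private accessor for callers that can consume the raw array directly.
  long[] splitOffsetArray() {
    return splitOffsets;
  }

  // Returns the same unmodifiable list on every call instead of allocating a new one.
  List<Long> splitOffsets() {
    if (splitOffsets == null) {
      return null;
    }
    if (splitOffsetsView == null) {
      List<Long> boxed = new ArrayList<>(splitOffsets.length);
      for (long offset : splitOffsets) {
        boxed.add(offset);
      }
      splitOffsetsView = Collections.unmodifiableList(boxed);
    }
    return splitOffsetsView;
  }

  public static void main(String[] args) {
    SplitOffsetsSketch file = new SplitOffsetsSketch(new long[] {4L, 1024L});
    // Both calls return the identical cached instance.
    System.out.println(file.splitOffsets() == file.splitOffsets());
    System.out.println(file.splitOffsets());
  }
}
```

The lazy field is not synchronized. Concurrent first calls could each build a list, which is harmless for correctness but means the single-instance guarantee only holds for single-threaded access. Callers that only copy the offsets into another file object could skip the list entirely and pass the array from splitOffsetArray().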
Query engine
None
Willingness to contribute
- I can contribute this improvement/feature independently
- I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- I cannot contribute this improvement/feature at this time