Skip to content

Cache splitOffsets list to avoid repeated allocations#15625

Open
kuldeep27396 wants to merge 4 commits intoapache:mainfrom
kuldeep27396:fix/splitoffsets-caching
Open

Cache splitOffsets list to avoid repeated allocations#15625
kuldeep27396 wants to merge 4 commits intoapache:mainfrom
kuldeep27396:fix/splitoffsets-caching

Conversation

@kuldeep27396
Copy link

Why this change

Resolves #15622

Problem: BaseFile.splitOffsets() was creating a new List wrapper on every invocation via ArrayUtil.toUnmodifiableLongList(). When file metadata is read and rewritten (e.g., during manifest rewriting or format conversion), each entry needlessly allocates a list that is immediately consumed and discarded.

Solution: Cache the List representation after the first conversion, similar to how other fields like partitionData are handled. The cache is transient so it won't be serialized.

Changes

  • Added private transient List splitOffsetsList field to cache the converted list
  • Modified splitOffsets() to return the cached list instead of creating a new one on each call
  • The cached list is created lazily on first access

Performance Impact

This eliminates unnecessary object allocations when:

  • Reading and rewriting manifest files
  • Format conversions
  • Any scenario where splitOffsets() is called multiple times on the same BaseFile instance

The change is minimal and focused, following the existing pattern used for other cached fields in the class.

BaseFile.splitOffsets() was creating a new List<Long> wrapper on every
invocation via ArrayUtil.toUnmodifiableLongList(). When file metadata is
read and rewritten (e.g., during manifest rewriting or format conversion),
each entry needlessly allocates a list that is immediately consumed and
discarded.

This change caches the List<Long> representation after the first conversion,
similar to how other fields like partitionData are handled. The cache is
transient so it won't be serialized.

Fixes apache#15622
@github-actions github-actions bot added the core label Mar 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BaseFile.splitOffsets() allocates a new List on every call

1 participant