@EnricoMi (Contributor) commented Feb 11, 2026

What changes were proposed in this pull request?

This introduces a generic FileSystemSegmentManagedBuffer, which wraps a segment of a file on a Hadoop FileSystem. FallbackStorage then uses it to read block data lazily.
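A minimal sketch of what such a buffer could look like, assuming a constructor taking a Hadoop Configuration, a Path, and the segment's offset and length. The class name matches the PR, but the shape of the implementation is a guess, not the PR's actual code; depending on the Spark version, ManagedBuffer declares further abstract methods (e.g. convertToNettyForSsl()) that a real implementation would also need to override:

```scala
import java.io.{EOFException, InputStream}
import java.nio.ByteBuffer

import io.netty.buffer.Unpooled
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.spark.network.buffer.ManagedBuffer
import org.apache.spark.network.util.LimitedInputStream

// Hypothetical sketch, not the PR's actual code: wraps [offset, offset+length)
// of a file on a Hadoop FileSystem and defers all I/O until the buffer is read.
class FileSystemSegmentManagedBuffer(
    hadoopConf: Configuration,
    path: Path,
    offset: Long,
    length: Long) extends ManagedBuffer {

  override def size(): Long = length

  // Opens the file and seeks to the segment only when a consumer asks for it.
  override def createInputStream(): InputStream = {
    val in = path.getFileSystem(hadoopConf).open(path)
    in.seek(offset)
    new LimitedInputStream(in, length)
  }

  // Materializes the segment on demand; assumes the segment fits in an Int.
  override def nioByteBuffer(): ByteBuffer = {
    val in = createInputStream()
    try {
      val bytes = new Array[Byte](length.toInt)
      var pos = 0
      while (pos < bytes.length) {
        val n = in.read(bytes, pos, bytes.length - pos)
        if (n < 0) throw new EOFException(s"Unexpected end of $path")
        pos += n
      }
      ByteBuffer.wrap(bytes)
    } finally {
      in.close()
    }
  }

  // Nothing is held open between reads, so reference counting is a no-op,
  // mirroring Spark's existing FileSegmentManagedBuffer.
  override def retain(): ManagedBuffer = this
  override def release(): ManagedBuffer = this

  override def convertToNetty(): AnyRef = Unpooled.wrappedBuffer(nioByteBuffer())
}
```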

Why are the changes needed?

The ShuffleBlockFetcherIterator iterates over various sources of block data: local, host-local, push-merged local, remote, and fallback storage blocks. It goes to great lengths to keep memory consumption low during iteration. When the iterator is created, ShuffleBlockFetcherIterator.initialize() creates a ManagedBuffer for each local, host-local, and push-merged local block. Only when ShuffleBlockFetcherIterator.next() is called does the ManagedBuffer actually read the data of the next block.

Remote blocks are fetched only up to a configured number of bytes in flight (spark.reducer.maxBytesInFlight).

Currently, FallbackStorage.read returns a ManagedBuffer that already holds the data. Fallback storage blocks are therefore fully read in ShuffleBlockFetcherIterator.initialize(): all of the iterator's shuffle data that originates on the fallback storage is held in memory before iteration even starts.
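To illustrate the difference, here is a hedged paraphrase of the two read patterns. The helper names (readEagerly, readLazily) and the FileSystemSegmentManagedBuffer constructor shape are assumptions for illustration, not the PR's actual code:

```scala
import java.nio.ByteBuffer

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.network.buffer.{ManagedBuffer, NioManagedBuffer}

object FallbackReadPatterns {
  // Eager pattern: the whole segment is copied to the heap up front, so
  // initialize() pays the full memory cost before iteration starts.
  def readEagerly(fs: FileSystem, dataFile: Path, offset: Long, length: Long): ManagedBuffer = {
    val in = fs.open(dataFile)
    try {
      in.seek(offset)
      val bytes = new Array[Byte](length.toInt)
      in.readFully(bytes)
      new NioManagedBuffer(ByteBuffer.wrap(bytes))
    } finally {
      in.close()
    }
  }

  // Lazy pattern: only the segment coordinates are captured here; the bytes
  // are read when next() actually consumes the block.
  def readLazily(hadoopConf: Configuration, dataFile: Path, offset: Long, length: Long): ManagedBuffer =
    new FileSystemSegmentManagedBuffer(hadoopConf, dataFile, offset, length)
}
```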

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests for FileSystemSegmentManagedBuffer and ShuffleBlockFetcherIterator.

ShuffleBlockFetcherIterator is now explicitly tested with fallback storage blocks.
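As a sketch of the kind of laziness check such a unit test might make, here is a standalone example against a local file, using the hypothetical class shape from above; this is illustrative, not the PR's actual test suite:

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.Files

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object FileSystemSegmentManagedBufferCheck {
  def main(args: Array[String]): Unit = {
    val file = Files.createTempFile("segment", ".data")
    Files.write(file, "0123456789".getBytes(UTF_8))

    // Constructing the buffer performs no I/O; it only records coordinates
    // (offset 2, length 5 of the file above).
    val buffer = new FileSystemSegmentManagedBuffer(
      new Configuration(), new Path(file.toUri), 2L, 5L)
    assert(buffer.size() == 5L)

    // Reading happens only here, and yields exactly bytes 2..6 ("23456").
    val in = buffer.createInputStream()
    try {
      assert(new String(in.readAllBytes(), UTF_8) == "23456")
    } finally {
      in.close()
    }
  }
}
```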

Was this patch authored or co-authored using generative AI tooling?

No.

@EnricoMi force-pushed the fallback-storage-lazy-read branch from 2abc314 to 851362e on Feb 11, 2026, 13:50
@EnricoMi (Contributor, Author) commented Feb 11, 2026

@dongjoon-hyun this fixes a memory issue in ShuffleBlockFetcherIterator when reading from fallback storage.
