Clip read-ahead cache reads to file size for small/last blocks#13264
Merged
akankshamahajan15 merged 1 commit intoMay 25, 2026
Merged
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes AsyncFileReadAheadCache::read_impl to avoid issuing per-block reads that extend past EOF, which previously caused oversized cache allocations for small/trailing blocks and could break backends that reject out-of-range reads (e.g., HTTP range reads).
Changes:
- Compute each block’s start offset and clip the read length to
min(m_block_size, fileSize - blockStart). - Use the clipped per-block length when invoking
readBlock(...), preventing over-allocation and out-of-range reads.
akankshamahajan15
approved these changes
May 22, 2026
Contributor
Author
|
@akankshamahajan15 do we need to trigger the tests to run to get this merged? |
Contributor
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Contributor
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Contributor
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Contributor
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Contributor
Result of foundationdb-pr-clang on Linux RHEL 9
|
Contributor
Result of foundationdb-pr on Linux RHEL 9
|
Contributor
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
This was referenced May 28, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
AsyncFileReadAheadCache::read_implissues every block read withlength = m_block_size, regardless of where the block sits in the file.readBlockthen allocates aCacheBlockof exactly that size and callsm_f->read(..., m_block_size, offset).For a file smaller than one block, or the trailing block of any file, this means:
m_block_size = 1 MiBallocates a full 1 MiB buffer.IAsyncFile::readis asked for bytes past EOF. Most backends tolerate this (they return a short read), but some, notably HTTP-range blob store reads, can reject or misbehave on a range that extends past the object's end, surfacing as a spurious read failure on small files.The function already clips the overall request to fileSize - offset near the top, but the per-block reads still happen at the full
m_block_size.Solution
Clip each per-block read to the bytes remaining in the file, matching the
blockStart/std::minpattern already used a few lines further down in the copy loop:fileSizeis captured at the top of the function andlastBlockToStartis already clamped tolastBlockNumInFile, sofileSize - blockStartis strictly positive within this loop.readLengthis bounded above bym_block_size (int), so the narrowing intoreadBlock(..., int length, ...)is safe.I would also like to backport this to 7.3 (and 7.4) if it is acceptable, since we have run into an issue with the blob store reads failing.