Clip read-ahead cache reads to file size for small/last blocks #13264

Merged
akankshamahajan15 merged 1 commit into apple:main from arnav-ag:arnav/fix-read-ahead-cache-small-files on May 25, 2026

@arnav-ag arnav-ag commented May 22, 2026

Problem

AsyncFileReadAheadCache::read_impl issues every block read with length = m_block_size, regardless of where the block sits in the file. readBlock then allocates a CacheBlock of exactly that size and calls m_f->read(..., m_block_size, offset).

For a file smaller than one block, or the trailing block of any file, this means:

  • The cache over-allocates: e.g. a 100-byte file with m_block_size = 1 MiB allocates a full 1 MiB buffer.
  • The underlying IAsyncFile::read is asked for bytes past EOF. Most backends tolerate this (they return a short read), but some, notably HTTP-range blob store reads, can reject or misbehave on a range that extends past the object's end, surfacing as a spurious read failure on small files.

The function already clips the overall request to fileSize - offset near the top, but each per-block read is still issued with the full m_block_size.
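
To put a number on the overshoot described above, here is a minimal standalone sketch. The helper name bytesRequestedPastEof is hypothetical and is not part of FoundationDB; it only models the arithmetic of an unclipped, full-block read against a known file size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only (not FoundationDB code).
// Models an unclipped read that always asks for a full block and returns
// how many of the requested bytes lie past the end of the file.
int64_t bytesRequestedPastEof(int64_t fileSize, int blockSize, int blockNum) {
    assert(fileSize > 0 && blockSize > 0 && blockNum >= 0);
    int64_t blockStart = static_cast<int64_t>(blockNum) * blockSize;
    assert(blockStart < fileSize); // the block must begin inside the file
    int64_t requestEnd = blockStart + blockSize; // unclipped: full block
    return std::max<int64_t>(0, requestEnd - fileSize);
}
```

For the 100-byte file and 1 MiB block size mentioned above, this gives 1048476 bytes requested past EOF, which is also the amount of buffer space that can never hold file data.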

Solution

Clip each per-block read to the bytes remaining in the file, matching the blockStart/std::min pattern already used a few lines further down in the copy loop:

int64_t blockStart = (int64_t)blockNum * f->m_block_size;
int readLength = std::min<int64_t>(f->m_block_size, fileSize - blockStart);
fblock = readBlock(f.getPtr(), readLength, blockStart);
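
As a rough standalone illustration of the same clipping, the sketch below plans the reads for a range of blocks without touching any FoundationDB types. BlockRead and planClippedReads are hypothetical names introduced here; the real code calls readBlock directly as shown in the snippet above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type, for illustration only: one planned backend read.
struct BlockRead {
    int64_t offset; // byte offset of the block start
    int length;     // clipped read length, at most blockSize
};

// Hypothetical helper: plan one clipped read per block in
// [firstBlock, lastBlock], mirroring the blockStart/std::min pattern.
std::vector<BlockRead> planClippedReads(int64_t fileSize, int blockSize,
                                        int firstBlock, int lastBlock) {
    assert(fileSize > 0 && blockSize > 0);
    assert(0 <= firstBlock && firstBlock <= lastBlock);
    std::vector<BlockRead> reads;
    for (int blockNum = firstBlock; blockNum <= lastBlock; ++blockNum) {
        int64_t blockStart = static_cast<int64_t>(blockNum) * blockSize;
        assert(blockStart < fileSize); // caller clamps lastBlock to the file
        // The min is bounded by blockSize (an int), so the cast cannot overflow.
        int readLength = static_cast<int>(
            std::min<int64_t>(blockSize, fileSize - blockStart));
        reads.push_back({blockStart, readLength});
    }
    return reads;
}
```

With a 2500-byte file and a 1000-byte block size, blocks 0 through 2 get lengths 1000, 1000 and 500, so no read extends past the end of the file.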

fileSize is captured at the top of the function and lastBlockToStart is already clamped to lastBlockNumInFile, so fileSize - blockStart is strictly positive within this loop. readLength is bounded above by m_block_size (int), so the narrowing into readBlock(..., int length, ...) is safe.
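
The bound argument above can be checked mechanically. The sketch below assumes lastBlockNumInFile is computed as (fileSize - 1) / blockSize for a non-empty file; that formula is an assumption made for illustration and is not taken from this PR. Under that assumption, every block up to the clamped last block has at least one byte remaining and a clipped length that fits in [1, blockSize].

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed definition, for illustration only: index of the last block that
// contains at least one byte of a non-empty file.
int64_t lastBlockNumInFile(int64_t fileSize, int blockSize) {
    assert(fileSize > 0 && blockSize > 0);
    return (fileSize - 1) / blockSize;
}

// Hypothetical check: returns true if, for every block up to the clamped
// last block, fileSize - blockStart >= 1 and the clipped length lies in
// [1, blockSize], so narrowing it to int is safe.
bool clippedLengthsInRange(int64_t fileSize, int blockSize) {
    int64_t last = lastBlockNumInFile(fileSize, blockSize);
    for (int64_t blockNum = 0; blockNum <= last; ++blockNum) {
        int64_t blockStart = blockNum * blockSize;
        int64_t remaining = fileSize - blockStart;
        int64_t readLength = std::min<int64_t>(blockSize, remaining);
        if (remaining < 1 || readLength < 1 || readLength > blockSize) {
            return false;
        }
    }
    return true;
}
```

The edge cases worth trying are a file of exactly one block, a file one byte longer than a block boundary, and a one-byte file; all three satisfy the check under the assumed definition.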


I would also like to backport this to 7.3 (and 7.4) if that is acceptable, since we have run into blob store read failures.

Copilot AI left a comment

Pull request overview

This PR fixes AsyncFileReadAheadCache::read_impl to avoid issuing per-block reads that extend past EOF, which previously caused oversized cache allocations for small/trailing blocks and could break backends that reject out-of-range reads (e.g., HTTP range reads).

Changes:

  • Compute each block’s start offset and clip the read length to min(m_block_size, fileSize - blockStart).
  • Use the clipped per-block length when invoking readBlock(...), preventing over-allocation and out-of-range reads.

@arnav-ag

@akankshamahajan15 do we need to trigger a test run to get this merged?

@foundationdb-ci

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 3184187
  • Duration 0:21:06
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 3184187
  • Duration 0:34:25
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 3184187
  • Duration 0:44:05
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 3184187
  • Duration 0:46:49
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 3184187
  • Duration 0:51:04
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 3184187
  • Duration 0:54:45
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 3184187
  • Duration 1:00:43
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@akankshamahajan15 akankshamahajan15 merged commit 5ec236e into apple:main May 25, 2026
7 checks passed
4 participants