Skip to content

[fix](filecache) avoid BE crash when finalize misses local cache writer#62389

Merged
hello-stephen merged 1 commit into
apache:masterfrom
freemandealer:task-master-pick-pr-8321-avoid-be-crash-when-fin
May 12, 2026
Merged

[fix](filecache) avoid BE crash when finalize misses local cache writer#62389
hello-stephen merged 1 commit into
apache:masterfrom
freemandealer:task-master-pick-pr-8321-avoid-be-crash-when-fin

Conversation

@freemandealer
Copy link
Copy Markdown
Member

Problem

FSFileCacheStorage::finalize() relied on DCHECK to assume the writer entry always existed in _key_to_writer. In release builds a missing writer entry could still fall through, dereference end(), and crash BE.

Root Cause

When (hash, offset) is missing from _key_to_writer, the current finalize path moves and erases through an invalid iterator. The target branch already uses the newer finalize(key, size) signature, so this PR ports the hardening change to that path.

Solution

  • return InternalError immediately when finalize cannot find the writer entry
  • include hash, offset, cache type, and expiration time in the error message for diagnosis
  • keep the normal finalize path unchanged when the writer exists
  • rely on the existing FileBlock::set_downloaded() cleanup path to abort the pending cache write and reset block state

Tests

  • BlockFileCacheTest.fs_file_cache_storage_finalize_missing_writer_returns_error
  • DORIS_TOOLCHAIN=clang DISABLE_BE_JAVA_EXTENSIONS=ON ENABLE_INJECTION_POINT=ON ENABLE_CACHE_LOCK_DEBUG=0 ENABLE_PCH=0 sh run-be-ut.sh --clean --run --filter=BlockFileCacheTest.fs_file_cache_storage_finalize_missing_writer_returns_error -j 39
  • Source patch: selectdb/selectdb-core#8321

FSFileCacheStorage::finalize() relied on DCHECK to assume the writer entry was always present in _key_to_writer. In release builds a missing entry could still fall through, dereference end(), and crash BE.

Return InternalError when finalize cannot find the writer instead of touching an invalid iterator. The existing FileBlock::set_downloaded() failure path already aborts the pending cache write and resets the block state, so the failure becomes recoverable and diagnosable.

Add a BE unit test that covers the missing-writer finalize path.
@Thearas
Copy link
Copy Markdown
Contributor

Thearas commented Apr 11, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@freemandealer
Copy link
Copy Markdown
Member Author

run buildall

@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 11, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.06% (20154/37984)
Line Coverage 36.60% (189526/517833)
Region Coverage 32.85% (147115/447780)
Branch Coverage 33.99% (64427/189523)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.53% (27352/37198)
Line Coverage 57.19% (295231/516249)
Region Coverage 54.41% (245902/451912)
Branch Coverage 56.02% (106488/190105)

@freemandealer
Copy link
Copy Markdown
Member Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.56% (27515/37406)
Line Coverage 57.37% (297720/518924)
Region Coverage 54.56% (247631/453843)
Branch Coverage 56.15% (107190/190909)

@freemandealer
Copy link
Copy Markdown
Member Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.33% (20367/38193)
Line Coverage 36.87% (191907/520521)
Region Coverage 33.18% (149211/449686)
Branch Coverage 34.30% (65278/190336)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.56% (27515/37406)
Line Coverage 57.37% (297713/518924)
Region Coverage 54.56% (247628/453843)
Branch Coverage 56.15% (107191/190909)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.59% (27527/37406)
Line Coverage 57.41% (297925/518924)
Region Coverage 54.59% (247766/453843)
Branch Coverage 56.17% (107229/190909)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.59% (27528/37406)
Line Coverage 57.41% (297933/518924)
Region Coverage 54.60% (247803/453843)
Branch Coverage 56.17% (107232/190909)

1 similar comment
@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.59% (27528/37406)
Line Coverage 57.41% (297933/518924)
Region Coverage 54.60% (247803/453843)
Branch Coverage 56.17% (107232/190909)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.59% (27528/37406)
Line Coverage 57.41% (297932/518924)
Region Coverage 54.60% (247793/453843)
Branch Coverage 56.17% (107232/190909)

@hello-stephen
Copy link
Copy Markdown
Contributor

skip buildall

@hello-stephen hello-stephen merged commit 9b4382a into apache:master May 12, 2026
31 of 33 checks passed
github-actions Bot pushed a commit that referenced this pull request May 12, 2026
…er (#62389)

## Problem

`FSFileCacheStorage::finalize()` relied on `DCHECK` to assume the writer
entry always existed in `_key_to_writer`. In release builds a missing
writer entry could still fall through, dereference `end()`, and crash
BE.

## Root Cause

When `(hash, offset)` is missing from `_key_to_writer`, the current
finalize path moves and erases through an invalid iterator. The target
branch already uses the newer `finalize(key, size)` signature, so this
PR ports the hardening change to that path.

## Solution

- return `InternalError` immediately when finalize cannot find the
writer entry
- include hash, offset, cache type, and expiration time in the error
message for diagnosis
- keep the normal finalize path unchanged when the writer exists
- rely on the existing `FileBlock::set_downloaded()` cleanup path to
abort the pending cache write and reset block state

## Tests

-
`BlockFileCacheTest.fs_file_cache_storage_finalize_missing_writer_returns_error`
- `DORIS_TOOLCHAIN=clang DISABLE_BE_JAVA_EXTENSIONS=ON
ENABLE_INJECTION_POINT=ON ENABLE_CACHE_LOCK_DEBUG=0 ENABLE_PCH=0 sh
run-be-ut.sh --clean --run
--filter=BlockFileCacheTest.fs_file_cache_storage_finalize_missing_writer_returns_error
-j 39`
- Source patch: `selectdb/selectdb-core#8321`
github-actions Bot pushed a commit that referenced this pull request May 12, 2026
…er (#62389)

## Problem

`FSFileCacheStorage::finalize()` relied on `DCHECK` to assume the writer
entry always existed in `_key_to_writer`. In release builds a missing
writer entry could still fall through, dereference `end()`, and crash
BE.

## Root Cause

When `(hash, offset)` is missing from `_key_to_writer`, the current
finalize path moves and erases through an invalid iterator. The target
branch already uses the newer `finalize(key, size)` signature, so this
PR ports the hardening change to that path.

## Solution

- return `InternalError` immediately when finalize cannot find the
writer entry
- include hash, offset, cache type, and expiration time in the error
message for diagnosis
- keep the normal finalize path unchanged when the writer exists
- rely on the existing `FileBlock::set_downloaded()` cleanup path to
abort the pending cache write and reset block state

## Tests

-
`BlockFileCacheTest.fs_file_cache_storage_finalize_missing_writer_returns_error`
- `DORIS_TOOLCHAIN=clang DISABLE_BE_JAVA_EXTENSIONS=ON
ENABLE_INJECTION_POINT=ON ENABLE_CACHE_LOCK_DEBUG=0 ENABLE_PCH=0 sh
run-be-ut.sh --clean --run
--filter=BlockFileCacheTest.fs_file_cache_storage_finalize_missing_writer_returns_error
-j 39`
- Source patch: `selectdb/selectdb-core#8321`
yiguolei pushed a commit that referenced this pull request May 13, 2026
…l cache writer #62389 (#63179)

Cherry-picked from #62389

Co-authored-by: zhengyu <zhangzhengyu@selectdb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/4.0.x dev/4.1.1-merged reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants