Skip to content

branch-4.1: [fix](cloud) fix segment footer CORRUPTION not triggering file cache retry #61386#61426

Merged
yiguolei merged 1 commit intobranch-4.1from
auto-pick-61386-branch-4.1
Mar 18, 2026
Merged

branch-4.1: [fix](cloud) fix segment footer CORRUPTION not triggering file cache retry #61386#61426
yiguolei merged 1 commit intobranch-4.1from
auto-pick-61386-branch-4.1

Conversation

@github-actions
Copy link
Contributor

Cherry-picked from #61386

…retry (#61386)

## Proposed changes

The three-tier retry logic in `Segment::_open()` (static method) was
structured as `if-else-if`, so when `open_file()` succeeded but
`_parse_footer()` returned `CORRUPTION`, the retry branch was
unreachable.

### Root cause

```cpp
// Before: if-else-if structure
auto st = fs->open_file(path, &file_reader, &reader_options);
if (st) {                                  // open_file succeeded (almost always)
    segment->_file_reader = ...;
    st = segment->_open(stats);            // _parse_footer() → CORRUPTION stored in st
} else if (st.is<CORRUPTION>() && ...) {   // UNREACHABLE: already entered `if` branch
    // Tier 1: clear cache, retry
    // Tier 2: bypass cache, read remote directly
}
RETURN_IF_ERROR(st);  // CORRUPTION returned without any retry
```

`open_file()` only opens a file handle and rarely returns `CORRUPTION`.
The actual footer checksum validation happens inside `_parse_footer()`
(called via `segment->_open()`). Because the retry was in an `else if`
guarded by the same `st` from `open_file()`, it was never reachable for
the common `_parse_footer()` corruption case.

### Fix

Change `else if` to a separate `if` block, so CORRUPTION from either
`open_file()` or `_parse_footer()` triggers the three-tier retry:

```cpp
// After: independent if blocks
if (st) {
    segment->_file_reader = ...;
    st = segment->_open(stats);   // _parse_footer() → CORRUPTION stored in st
}
// NOW reachable regardless of where CORRUPTION came from
if (st.is<CORRUPTION>() && reader_options.cache_type == FILE_BLOCK_CACHE) {
    // Tier 1: clear file cache, re-download from remote
    // Tier 2: bypass cache entirely, read remote directly
    // Tier 3: remote source itself corrupt (logs warning)
}
```

### Issue

Observed in cloud (S3) deployments: schema change fails with
`CORRUPTION: Bad segment file footer checksum not match`. Log analysis
confirmed that no retry log messages were ever emitted, consistent with
this code bug.

## Tests

- Added `TestFooterCorruptionTriggersRetry` to
`segment_corruption_test.cpp`
- Uses the existing `Segment::parse_footer:magic_number_corruption` sync
point to corrupt the footer magic number on the first `_parse_footer()`
call only (simulating file cache corruption)
- Verifies the segment opens successfully via the retry path
@github-actions github-actions bot requested a review from yiguolei as a code owner March 17, 2026 07:15
@hello-stephen
Copy link
Contributor

run buildall

@doris-robot
Copy link

BE UT Coverage Report

Increment line coverage 100.00% (3/3) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.88% (19290/36478)
Line Coverage 36.20% (180144/497590)
Region Coverage 32.70% (139262/425863)
Branch Coverage 33.75% (60611/179614)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (3/3) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.49% (25532/35713)
Line Coverage 54.22% (269292/496706)
Region Coverage 51.66% (222197/430138)
Branch Coverage 53.20% (95895/180259)

@yiguolei yiguolei closed this Mar 18, 2026
@yiguolei yiguolei reopened this Mar 18, 2026
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 18, 2026
@github-actions
Copy link
Contributor Author

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor Author

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit 71ecff2 into branch-4.1 Mar 18, 2026
27 of 29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants