branch-4.0: [fix](cloud) fix segment footer CORRUPTION not triggering file cache retry #61387
Closed
Hastyshell wants to merge 1 commit into apache:branch-4.0 from
…retry The three-tier retry logic in Segment::_open() was structured as if-else-if, so when open_file() succeeded but _parse_footer() returned CORRUPTION, the retry branch was unreachable. Change to independent if blocks so that CORRUPTION from _parse_footer() also triggers cache eviction and retry. Add TestFooterCorruptionTriggersRetry unit test to cover this path.
Proposed changes

Backport of #61386 to branch-4.0.

The three-tier retry logic in Segment::_open() (static method) was structured as if-else-if, so when open_file() succeeded but _parse_footer() returned CORRUPTION, the retry branch was unreachable.

Root cause

open_file() only opens a file handle and rarely returns CORRUPTION. The actual footer checksum validation happens inside _parse_footer() (called via segment->_open()). Because the retry was in an else if guarded by the same st from open_file(), it was never reachable for the common _parse_footer() corruption case.

Fix
Change else if to a separate if block, so CORRUPTION from either open_file() or _parse_footer() triggers the three-tier retry.

Issue
Observed in cloud (S3) deployments (branch-4.0): schema change fails with CORRUPTION: Bad segment file footer checksum not match. Log analysis confirmed that no retry log messages were ever emitted, consistent with this code bug.

Tests
- Add TestFooterCorruptionTriggersRetry to segment_corruption_test.cpp
- Use the Segment::parse_footer:magic_number_corruption sync point to corrupt the footer magic number on the first _parse_footer() call only

Checklist
Previously, CORRUPTION from _parse_footer() was never retried; now it correctly follows the three-tier retry.
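The control-flow difference described above can be illustrated with a minimal, self-contained sketch. This is not the Doris code: Status, Code, Stage, OpenResult, open_segment_buggy, open_segment_fixed, and kMaxAttempts are hypothetical stand-ins, and the three tiers are modelled simply as up to three attempts, which is an assumption about what "three-tier" means here.

```cpp
#include <functional>

// Hypothetical stand-ins for illustration only; these are not the Doris types.
enum class Code { OK, CORRUPTION };

struct Status {
    Code code;
    bool ok() const { return code == Code::OK; }
    bool is_corruption() const { return code == Code::CORRUPTION; }
};

// Each stage receives the attempt index (0 = first try, 1 and 2 = retries).
using Stage = std::function<Status(int)>;

struct OpenResult {
    Status status;
    int attempts;
};

// Assumption: the three tiers are modelled as at most three attempts.
const int kMaxAttempts = 3;

// One full attempt: open the file, then parse the footer if the open succeeded.
Status open_and_parse(const Stage& open_file, const Stage& parse_footer, int attempt) {
    Status st = open_file(attempt);
    if (st.ok()) {
        st = parse_footer(attempt);
    }
    return st;
}

// Buggy shape: the retry sits in an else-if on the open_file() result,
// so CORRUPTION produced later by parse_footer() never reaches it.
OpenResult open_segment_buggy(const Stage& open_file, const Stage& parse_footer) {
    Status st = open_file(0);
    OpenResult r{st, 1};
    if (st.ok()) {
        st = parse_footer(0);
    } else if (st.is_corruption()) {
        for (int a = 1; a < kMaxAttempts && st.is_corruption(); ++a) {
            st = open_and_parse(open_file, parse_footer, a);
            ++r.attempts;
        }
    }
    r.status = st;
    return r;
}

// Fixed shape: the retry is an independent if, so it checks whatever
// status is current, whether it came from open_file() or parse_footer().
OpenResult open_segment_fixed(const Stage& open_file, const Stage& parse_footer) {
    Status st = open_file(0);
    OpenResult r{st, 1};
    if (st.ok()) {
        st = parse_footer(0);
    }
    if (st.is_corruption()) {
        for (int a = 1; a < kMaxAttempts && st.is_corruption(); ++a) {
            st = open_and_parse(open_file, parse_footer, a);
            ++r.attempts;
        }
    }
    r.status = st;
    return r;
}
```

Passing a parse_footer stage that returns CORRUPTION only for attempt 0 mimics the first-call-only corruption used in the unit test: the buggy shape gives up after one attempt, while the fixed shape retries and succeeds on the second.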