release-26.2: kvserver: stop treating split/merge trigger errors as replica corruption #167377

Merged
trunk-io[bot] merged 1 commit into cockroachdb:release-26.2 from tbg:blathers/backport-release-26.2-167289 on Apr 7, 2026

tbg commented Apr 2, 2026

Backport 1/1 commits from #167289 on behalf of @tbg.


Previously, `maybeWrapReplicaCorruptionError` in `RunCommitTrigger`
escalated any unrecognized error from split/merge trigger evaluation to
a `ReplicaCorruptionError`, which crashes the process via
`setCorruptRaftMuLocked`. This meant that transient I/O errors (e.g.
cloud storage network timeouts during `MVCCIsSpanEmpty`) would fatal
the node despite not indicating actual data corruption.

Remove the corruption wrapping so that these errors simply fail the
split or merge, which will be retried.

This is a minimal fix suitable for backporting. Follow-up work can
remove the now-no-op `maybeWrapReplicaCorruptionError` wrapper entirely.

Fixes #165558
Epic: CRDB-61447

Release note (bug fix): Fixed a bug where transient I/O errors (such
as cloud storage network timeouts) during split or merge trigger
evaluation were misidentified as replica corruption, causing the node
to crash. These errors now correctly fail the operation, which is
retried automatically.


Release justification: Bug fix that prevents spurious replica corruption errors on split/merge trigger failures.

Previously, `maybeWrapReplicaCorruptionError` in `RunCommitTrigger`
escalated any unrecognized error from split/merge trigger evaluation to
a `ReplicaCorruptionError`, which crashes the process via
`setCorruptRaftMuLocked`. This meant that transient I/O errors (e.g.
cloud storage network timeouts during `MVCCIsSpanEmpty`) would fatal
the node despite not indicating actual data corruption.

Remove the corruption wrapping so that these errors simply fail the
split or merge, which will be retried.

Informs: cockroachdb#165558
Epic: CRDB-61447

Release note (bug fix): Fixed a bug where transient I/O errors (such
as cloud storage network timeouts) during split or merge trigger
evaluation were misidentified as replica corruption, causing the node
to crash. These errors now correctly fail the operation, which is
retried automatically.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

tbg requested a review from a team as a code owner April 2, 2026 06:32
blathers-crl bot added the blathers-backport and O-robot labels Apr 2, 2026
blathers-crl bot requested review from dt and stevendanna April 2, 2026 06:32

blathers-crl bot commented Apr 2, 2026

Thanks for opening a backport.

Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate.

blathers-crl bot added the backport and T-kv labels Apr 2, 2026

tbg commented Apr 3, 2026

/trunk merge

trunk-io bot commented Apr 3, 2026

😎 Merged successfully - details.

tbg commented Apr 7, 2026

/trunk merge

trunk-io bot merged commit 5c36d52 into cockroachdb:release-26.2 on Apr 7, 2026
23 checks passed
Labels

backport: Label PR's that are backports to older release branches
blathers-backport: This is a backport that Blathers created automatically.
O-robot: Originated from a bot.
T-kv: KV Team
v26.2.0-prerelease