release-25.4: kvserver: stop treating split/merge trigger errors as replica corruption #168013

Merged
trunk-io[bot] merged 1 commit into cockroachdb:release-25.4 from tbg:backport25.4-167289 on Apr 10, 2026

Conversation

@tbg (Member) commented Apr 9, 2026

Backport 1/1 commits from #167289.

/cc @cockroachdb/release


Previously, `maybeWrapReplicaCorruptionError` in `RunCommitTrigger`
escalated any unrecognized error from split/merge trigger evaluation to
a `ReplicaCorruptionError`, which crashes the process via
`setCorruptRaftMuLocked`. This meant that transient I/O errors (e.g.
cloud storage network timeouts during `MVCCIsSpanEmpty`) would fatal
the node despite not indicating actual data corruption.

Remove the corruption wrapping so that these errors simply fail the
split or merge, which will be retried.

This is a minimal fix suitable for backporting. Follow-up work can
remove the now-no-op `maybeWrapReplicaCorruptionError` wrapper entirely.

Fixes-26.2: #165558
Epic: CRDB-61447

Release note (bug fix): Fixed a bug where transient I/O errors (such
as cloud storage network timeouts) during split or merge trigger
evaluation were misidentified as replica corruption, causing the node
to crash. These errors now correctly fail the operation, which is
retried automatically.

Release justification: bug fix: transient I/O errors during split/merge incorrectly crash the node

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@tbg tbg requested a review from a team as a code owner April 9, 2026 07:46
trunk-io Bot (Contributor) commented Apr 9, 2026

😎 Merged successfully - details.

blathers-crl Bot commented Apr 9, 2026

Thanks for opening a backport.

Before merging, please confirm that it falls into one of the following categories (select one):

  • Non-production code changes OR fixes for serious issues. Non-production includes test-only changes, build system changes, etc. Serious issues are defined in the policy as correctness, stability, or security issues, data corruption/loss, significant performance regressions, breaking working and widely used functionality, or an inability to detect and debug production issues.
  • Other approved changes. These changes must be gated behind a disabled-by-default feature flag unless there is a strong justification not to. Reference the approved ENGREQ ticket in the PR body (e.g., "Fixes ENGREQ-123").

Add a brief release justification to the PR description explaining your selection.

Also, confirm that the change does not break backward compatibility and complies with all aspects of the backport policy.

All backports must be reviewed by the TL and EM for the owning area.

@blathers-crl Bot added the backport (PRs that are backports to older release branches) and T-kv (KV Team) labels Apr 9, 2026
@cockroach-teamcity (Member) commented

This change is Reviewable

@tbg tbg requested a review from arulajmani April 9, 2026 07:47
@trunk-io trunk-io Bot merged commit b828081 into cockroachdb:release-25.4 Apr 10, 2026
19 checks passed
Labels

backport (PRs that are backports to older release branches), T-kv (KV Team), target-release-25.4.10