staging-v24.3.33: release-24.3: kvserver: stop wrapping AbortSpan errors as ReplicaCorruptionError #170688

Merged
rail merged 1 commit into cockroachdb:staging-v24.3.33 from rail:backportstaging-v24.3.33-168856 on May 20, 2026

@rail rail commented May 20, 2026

Backport 1/1 commits from #168856.

/cc @cockroachdb/release


Backport 1/2 commits from #167295.

/cc @cockroachdb/release


AbortSpan read errors were wrapped as ReplicaCorruptionError, causing the
node to fatal via setCorruptRaftMuLocked. This was overly aggressive: a
failure to read from the AbortSpan is not indicative of replica corruption.
Transient I/O errors would crash the node instead of being returned to the
caller.

This is the last remaining production call site that produces
ReplicaCorruptionError (the split/merge trigger wrapping was removed in
#167289).

Informs: #165558
Epic: none

Release justification: Low-risk bug fix. One-line change that stops wrapping
a non-corruption error as ReplicaCorruptionError, preventing unnecessary
node crashes on transient I/O errors during AbortSpan reads.

Release justification:

This is the last remaining production call site that wraps errors as
`ReplicaCorruptionError`. A failure to read from the AbortSpan is not
indicative of replica corruption and should not crash the node.

This is a minimal fix suitable for backporting. A follow-up commit
removes the now-dead `ReplicaCorruptionError` infrastructure entirely.

Informs: cockroachdb#165558
Release note (bug fix): Fixed a bug where transient I/O errors reading
from the AbortSpan were misidentified as replica corruption, causing
the node to crash. These errors are now returned to the caller as
regular errors.
Epic: none

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@rail rail requested a review from a team as a code owner May 20, 2026 21:56

blathers-crl Bot commented May 20, 2026

Thanks for opening a backport.

Before merging, please confirm that it falls into one of the following categories (select one):

  • Non-production code changes OR fixes for serious issues. Non-production includes test-only changes, build system changes, etc. Serious issues are defined in the policy as correctness, stability, or security issues, data corruption/loss, significant performance regressions, breaking working and widely used functionality, or an inability to detect and debug production issues.
  • Other approved changes. These changes must be gated behind a disabled-by-default feature flag unless there is a strong justification not to. Reference the approved ENGREQ ticket in the PR body (e.g., "Fixes ENGREQ-123").

Add a brief release justification to the PR description explaining your selection.

Also, confirm that the change does not break backward compatibility and complies with all aspects of the backport policy.

All backports must be reviewed by the TL and EM for the owning area.

@blathers-crl blathers-crl Bot added the backport (Label PR's that are backports to older release branches) and T-kv (KV Team) labels May 20, 2026
@cockroach-teamcity
Member

This change is Reviewable

@rail rail merged commit 4553626 into cockroachdb:staging-v24.3.33 May 20, 2026
18 checks passed
@rail rail deleted the backportstaging-v24.3.33-168856 branch May 20, 2026 22:29

Labels

backport (Label PR's that are backports to older release branches)
T-kv (KV Team)
target-release-24.3.33

4 participants