Skip to content

Comments

staging-v25.2.13: release-25.2: keys: handle case where keys targeted by GC request straddle header#163773

Merged
rail merged 3 commits intocockroachdb:staging-v25.2.13from
rail:backportstaging-v25.2.13-163450
Feb 17, 2026
Merged

staging-v25.2.13: release-25.2: keys: handle case where keys targeted by GC request straddle header#163773
rail merged 3 commits intocockroachdb:staging-v25.2.13from
rail:backportstaging-v25.2.13-163450

Conversation

@rail
Copy link
Member

@rail rail commented Feb 17, 2026

Backport 3/3 commits from #163450.

/cc @cockroachdb/release


Backport 3/3 commits from #162271 (on behalf of @arulajmani).

/cc @cockroachdb/kv


Individual commits:

  1. keys: add TestRangeBatchWithGCRequests — adds test for GC requests where targeted keys don't fall within request header bounds.
  2. keys: handle case where keys targeted by GC request straddle header — adds special case handling to return RangeKeyMismatchError for erroneous GC requests instead of silently proceeding.
  3. kvserver: account for LocalRangeID keys extendRangeForGCRequest — handles LocalRangeID keys that can't be addressed in the global key map.

Release justification: bug fix for potential data loss via incorrect GC requests.


Release note: Fixes a rare bug where a racing split and GC request could result in the GC of data on the post-split RHS. This could, in rare cases, lead to lost writes on the RHS.

Made with Cursor

Release justification:

As we've seen in cockroachdb#162085, it's possible for the MVCC GC queue to
construct GC requests where the keys being targeted do not fall within
the request header bounds. While we should never be constructing such
requests, we'll introduce some defence in depth checks in a later
commit. This test will evolve as we do that.

Epic: none

Release note: None
As we saw in cockroachdb#162085, it is possible for the MVCC GC queue to target
keys outside of the request headers bound. To guard against this bug,
we add special case handling when determining the span touched by a
request. This should result in a RangeKeyMismatchError when such
erroneous requests are constructed by the GC queue, as opposed to
letting requests that may silently lead to data loss through.

Epic: none

Release note: None
The need for doing so was evidenced by the failure of
TestMVCCGCQueueTransactionTable. These are tricky, as they can't
be addressed in the global key map. By extension, they can't be
checked against a replica's key bounds to see if we've routed them
to the right place or not. Trying to address them returns an error,
so we explicitly need to skip them in extendRangeForGCRequest. Instead,
they need to be handled during GC command evaluation time. We already
do this for point keys; we don't for clear ranges, and I think that's
a bug we need to first prove and then fix -- see the TODO as part of
this commit.

Epic: none

Release note: None
@rail rail requested a review from a team as a code owner February 17, 2026 19:53
@blathers-crl
Copy link

blathers-crl bot commented Feb 17, 2026

Thanks for opening a backport.

Before merging, please confirm that it falls into one of the following categories (select one):

  • Non-production code changes OR fixes for serious issues. Non-production includes test-only changes, build system changes, etc. Serious issues are defined in the policy as correctness, stability, or security issues, data corruption/loss, significant performance regressions, breaking working and widely used functionality, or an inability to detect and debug production issues.
  • Other approved changes. These changes must be gated behind a disabled-by-default feature flag unless there is a strong justification not to. Reference the approved ENGREQ ticket in the PR body (e.g., "Fixes ENGREQ-123").

Add a brief release justification to the PR description explaining your selection.

Also, confirm that the change does not break backward compatibility and complies with all aspects of the backport policy.

All backports must be reviewed by the TL and EM for the owning area.

@blathers-crl blathers-crl bot added backport Label PR's that are backports to older release branches T-code-systems labels Feb 17, 2026
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@blathers-crl
Copy link

blathers-crl bot commented Feb 17, 2026

✅ PR #163773 is compliant with backport policy

Confidence: high
Critical bug criteria met: [Data corruption/loss]
Backward compatible: true
Explanation: The pull request addresses a critical bug related to data loss. The change aims to fix issues with GC requests that could potentially result in the incorrect garbage collection of data. This is specifically noted in the PR body as a bug fix for potential data loss, aligning with the critical bug criteria which mentions data corruption/loss as a critical concern. The changes include modifications in production code to handle erroneous conditions and testing updates to confirm behavior. Given that the PR resolves a critical issue and is part of necessary bug fixes, it is exempt from the feature flag requirement. The PR text includes a 'Release justification: bug fix for potential data loss via incorrect GC requests', providing explicit reasoning and aligning with backport policy exceptions. Additionally, no backward incompatibilities are introduced.

ENGREQ Check Failed: Please check one of the backport category checkboxes.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@rail rail merged commit 0702327 into cockroachdb:staging-v25.2.13 Feb 17, 2026
19 of 20 checks passed
@rail rail deleted the backportstaging-v25.2.13-163450 branch February 17, 2026 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport Label PR's that are backports to older release branches T-code-systems v25.2.13

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants