Fix 166091 gate range tombstone gc #168275
Open
thecitymouse wants to merge 2 commits into cockroachdb:master from
… failures

processReplicatedRangeTombstones assumes point-key GC fully succeeded. When a clear-range request fails (error swallowed at gc.go:978), that invariant is violated and the range tombstone phase trips the 'hiding key' assertion.

Skip range tombstone GC when info.ClearRangeSpanFailures > 0. Next GC cycle retries with a fresh snapshot.

Fixes cockroachdb#166091.

Release note (bug fix): Fixed an assertion failure in the MVCC GC queue under concurrent writes.

Signed-off-by: Kevin Leung <leungke@oregonstate.edu>

Reproduces the snapshot-vs-live-engine race from cockroachdb#166091 where a concurrent above-threshold write causes MVCCGarbageCollectPointsWithClearRange to fail.

Signed-off-by: Kevin Leung <leungke@oregonstate.edu>
Fixes #166091.
gc.Run does point-key GC, then range-tombstone GC. If point-key GC hits a clear-range error (a concurrent write above the threshold lands after
the planner snapshot), the error gets swallowed at gc.go:978 and
ClearRangeSpanFailures is incremented. Range-tombstone GC runs anyway, assumes
point keys got cleaned up, and blows up with the "hiding key" assertion.
Fix: check ClearRangeSpanFailures > 0 before running range-tombstone GC.
Skip it for this cycle, let the next pass retry with a fresh snapshot.
Txn record and abort span GC still run.
This doesn't fix the underlying snapshot/live-engine divergence in the
clear-range path — just prevents the cascade. Happy to take a different
direction if the team prefers a deeper fix here.
Testing
snapshot/live-engine race.
on master, 0/40 with fix (gate fired once, race triggered,
no hiding-key error).
Note: the unit test covers the clear-range failure path. An end-to-end test through gc.Run asserting the skip would be stronger; happy to add one if you want it.