kvserver: clear RHS state machine when moving past a split using a snapshot #73462

Open
2 tasks
sumeerbhola opened this issue Dec 4, 2021 · 0 comments
Labels
A-kv-replication  Relating to Raft, consensus, and coordination.
C-enhancement     Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv-replication  KV Replication Team

sumeerbhola commented Dec 4, 2021

(discussion in https://cockroachlabs.slack.com/archives/C02KHQMF2US/p1638412757069600)

Consider a range split where this node is lagging and has not yet applied the split, and then receives a post-split snapshot for the LHS.

  • The RHS was rebalanced away from this node. We did not delete any state machine state, since the split has not been applied yet, but we did write a RangeTombstone. That tombstone is meant to be consumed later by splitPreApply, in the rightRepl == nil code path:

        if rightRepl == nil || rightRepl.isNewerThanSplit(&split) {
        	// We're in the rare case where we know that the RHS has been removed
        	// and re-added with a higher replica ID (and then maybe removed again).
  • But this node never executes the split (so splitPreApply never runs); instead it receives and applies a post-split snapshot for this range in order to catch up. That post-split snapshot contains only the LHS state.
    • The multiSSTWriter used to clear the existing range state derives its key ranges from the RangeDescriptor in the snapshot, which covers only the LHS:

          // At the moment we'll write at most five SSTs.
          // TODO(jeffreyxiao): Re-evaluate as the default range size grows.
          keyRanges := rditer.MakeReplicatedKeyRanges(header.State.Desc)
          msstw, err := newMultiSSTWriter(ctx, kvSS.scratch, keyRanges, kvSS.sstChunkSize)

    • So the RHS's range-local and global keys would still exist after the snapshot is applied (I am assuming the RHS is not considered a subsumed replica in applySnapshot, since the snapshot's span, which is only the LHS, does not subsume the RHS). The sketch after this list makes the leak concrete.
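
To make the leak concrete, here is a minimal, runnable Go sketch (not CockroachDB code: the span type and the keys are hypothetical stand-ins for the real roachpb descriptor spans) showing that clearing only the spans derived from the LHS descriptor leaves the lagging node's RHS keys untouched:

    package main

    import (
    	"bytes"
    	"fmt"
    )

    // span is a simplified stand-in for a range descriptor's key span.
    type span struct{ start, end []byte }

    // contains reports whether key k falls in the half-open span [start, end).
    func (s span) contains(k []byte) bool {
    	return bytes.Compare(s.start, k) <= 0 && bytes.Compare(k, s.end) < 0
    }

    func main() {
    	// The pre-split range spanned [a, c); the split produced LHS [a, b)
    	// and RHS [b, c). The lagging node still holds keys from both halves.
    	lhs := span{start: []byte("a"), end: []byte("b")}
    	existing := [][]byte{[]byte("a1"), []byte("b1"), []byte("b2")}

    	// Snapshot application clears only the spans derived from the
    	// snapshot's descriptor, i.e. the LHS. Keys outside them survive.
    	for _, k := range existing {
    		if lhs.contains(k) {
    			fmt.Printf("%s: cleared by the LHS snapshot\n", k)
    		} else {
    			fmt.Printf("%s: leaked (RHS key not covered)\n", k)
    		}
    	}
    }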

Result: we have leaked state in the engine.

Steps:

  • Verify that there is a bug with a unit test.
  • When deciding what state to clear, use the wider of the two range spans: that of the snapshot's RangeDescriptor and that of the range's existing RangeDescriptor (if any). This is safe even if the RHS has not been rebalanced away, since in that case the RHS replica must be uninitialized. A sketch of the widening step follows this list.
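
A minimal sketch of that widening step, reusing the simplified span type from the sketch above (widestSpan is a hypothetical helper name; the real change would operate on the roachpb.RangeDescriptor spans inside the snapshot application path):

    // widestSpan returns the union of the snapshot descriptor's span and the
    // existing replica's span (if any), to decide which engine state to clear.
    // An empty existing span means the replica has no descriptor yet.
    func widestSpan(snap, existing span) span {
    	out := snap
    	if len(existing.start) > 0 && bytes.Compare(existing.start, out.start) < 0 {
    		out.start = existing.start
    	}
    	if bytes.Compare(existing.end, out.end) > 0 {
    		out.end = existing.end
    	}
    	return out
    }

In the scenario above, the existing (pre-split) descriptor spans [a, c) while the snapshot's LHS descriptor spans [a, b), so widestSpan yields [a, c), and the leaked RHS keys b1 and b2 would now fall inside the cleared spans.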

cc: @tbg

Jira issue: CRDB-11599
Epic: CRDB-220
