
kv: Raft snapshots required when they should not be #87553

Closed
andrewbaptist opened this issue Sep 8, 2022 · 8 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

andrewbaptist commented Sep 8, 2022

Describe the problem

Raft snapshots are occasionally sent in healthy systems. They should only be required in the case of failures or slowness of nodes.

To Reproduce

(Note this test is included in the PR)

  • Create a cluster with 3 nodes and the raft snapshot queue disabled.
  • Create a range on just 1 node.
  • Add a second node as a replica.
  • Have the replicate queue attempt to add the third node (since it avoids an even number of replicas)

Expected behavior
Adding the third node should always succeed without requiring a Raft snapshot. Most of the time it does; occasionally it fails.
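For reference, a minimal sketch of this scenario as a Go test, assuming the `DisableRaftSnapshotQueue` store testing knob and the `testcluster` helpers. It adds the voters directly under manual replication rather than driving the replicate queue, so it only approximates the merged `TestAddVotersWithoutRaftQueue`:

```go
package kvserver_test

import (
	"context"
	"testing"

	"github.com/cockroachdb/cockroach/pkg/base"
	"github.com/cockroachdb/cockroach/pkg/kv/kvserver"
	"github.com/cockroachdb/cockroach/pkg/testutils/testcluster"
)

func TestAddThirdVoterWithRaftSnapshotQueueDisabled(t *testing.T) {
	ctx := context.Background()
	tc := testcluster.StartTestCluster(t, 3, base.TestClusterArgs{
		// Manual replication keeps the scratch range on node 1 only.
		ReplicationMode: base.ReplicationManual,
		ServerArgs: base.TestServerArgs{
			Knobs: base.TestingKnobs{
				Store: &kvserver.StoreTestingKnobs{
					// With the raft snapshot queue disabled, a new replica can
					// only be caught up by the initial (upreplication) snapshot.
					DisableRaftSnapshotQueue: true,
				},
			},
		},
	})
	defer tc.Stopper().Stop(ctx)

	// The range starts out on node 1 only.
	scratchKey := tc.ScratchRange(t)
	// Add node 2 as a voter; this sends the initial snapshot.
	tc.AddVotersOrFatal(t, scratchKey, tc.Target(1))
	// Adding node 3 should likewise succeed without ever needing a raft
	// snapshot; per this issue, it occasionally does not.
	tc.AddVotersOrFatal(t, scratchKey, tc.Target(2))
}
```

In this setup, an addition that would otherwise depend on a Raft snapshot is expected to fail, which is how the problem surfaces.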

Jira issue: CRDB-19406

@andrewbaptist andrewbaptist added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team branch-release-22.2 Used to mark release blockers and technical advisories for 22.2 labels Sep 8, 2022
@blathers-crl blathers-crl bot added this to Incoming in KV Sep 8, 2022
@andrewbaptist
Collaborator Author

Short gist that covers this issue: https://gist.github.com/b4dc8e7303c6894f14bfc462e0a6169c


blathers-crl bot commented Sep 8, 2022

cc @cockroachdb/replication

@andrewbaptist andrewbaptist removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. branch-release-22.2 Used to mark release blockers and technical advisories for 22.2 labels Sep 8, 2022
@andrewbaptist andrewbaptist self-assigned this Sep 12, 2022
@nvanbenschoten
Member


blathers-crl bot commented Sep 19, 2022

cc @cockroachdb/replication

@andrewbaptist
Collaborator Author

Here is a test that will pass once this is addressed. It could be merged in a disabled state, but it probably makes more sense to include it as part of the fix for this issue.

#87554

andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Jan 24, 2023
Add testing for snapshot priorities. The test is currently disabled due
to issue cockroachdb#87553; however, it is useful to run manually for now,
and once that issue is fixed it will guard against regressions related to
snapshot priorities.

Release note: None
Epic: none
tbg added a commit to tbg/cockroach that referenced this issue Mar 21, 2023
Like many other tests, this test could flake because we'd sometimes
catch a "cannot remove learner while snapshot is in flight" error.

I think the root cause is that sometimes there are errant Raft snapshots
in the system[^1] and these get mistaken for snapshots to LEARNERs that are
still being caught up by the replicate queue. I tried to address this general
class of issues by making the check for in-flight learner snapshots not
care about *raft* snapshots.

I was able to stress TestReplicaTombstone for 30+ minutes without a
failure using that approach, whereas previously it usually failed within
a few minutes.

```
./dev test --stress pkg/kv/kvserver/ --filter TestReplicaTombstone 2>&1 | tee stress.log
[...]
2461 runs so far, 0 failures, over 35m45s
```

[^1]: cockroachdb#87553

Fixes cockroachdb#98883.

Epic: none
Release note: None
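Conceptually, the fix is a filter on which in-flight snapshots count as evidence that a learner is still being caught up. A purely hypothetical sketch follows; the names (`snapshotKind`, `snapKindInitial`, `hasInFlightLearnerSnapshot`) are invented for illustration and are not the actual kvserver code:

```go
// snapshotKind distinguishes the two kinds of outgoing snapshots discussed
// above (hypothetical type, for illustration only).
type snapshotKind int

const (
	snapKindInitial snapshotKind = iota // sent by the replicate queue to catch up a learner
	snapKindRaft                        // sent by the raft snapshot queue
)

// hasInFlightLearnerSnapshot reports whether a learner is genuinely still
// being caught up: only initial (replicate-queue) snapshots count, and errant
// raft snapshots are ignored.
func hasInFlightLearnerSnapshot(inFlight []snapshotKind) bool {
	for _, k := range inFlight {
		if k == snapKindInitial {
			return true
		}
	}
	return false
}
```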
tbg added a commit to tbg/cockroach that referenced this issue Mar 22, 2023
tbg added a commit to tbg/cockroach that referenced this issue Mar 22, 2023
craig bot pushed a commit that referenced this issue Mar 22, 2023
98741: ci: update bazel builder image r=rickystewart a=cockroach-teamcity

Release note: None
Epic: None


98878: backupccl: fix occasional TestRestoreErrorPropagates flake r=stevendanna a=adityamaru

Very rarely, under stress race, another automatic job would race with the restore and increment the error count. This would result in the count being greater than our expected value of 1. This change disables all the automatic jobs, eliminating the chance of this race.

Fixes: #98037

Release note: None

99099: kvserver: deflake TestReplicaTombstone r=andrewbaptist a=tbg


99126: kv: return error on locking request in LeafTxn r=nvanbenschoten a=miraradeva

Previously, as noted in #94290, it was possible for a LeafTxn to issue locking requests as part of SELECT FOR UPDATE. This behavior was unexpected and the RootTxn wasn't properly cleaning up the locks, resulting in others waiting for those locks to be released. The issue was resolved, in #94399, by ensuring non-default locking strength transactions don't use the streamer API and always run as RootTxn.

This patch adds an assertion on the kv side to prevent any other existing or future attempts by a LeafTxn to issue locking requests. We don't expect that there are such existing cases, so we don't expect this assertion to fail, but we will keep an eye on the nightly tests to make sure.

Fixes: #97817
Release note: None

99150: backupccl: stop logging unsanitized backup stmt in schedule executor r=stevendanna a=msbutler

Informs #99145

Release note: None

Co-authored-by: cockroach-teamcity <teamcity@cockroachlabs.com>
Co-authored-by: adityamaru <adityamaru@gmail.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Mira Radeva <mira@cockroachlabs.com>
Co-authored-by: Michael Butler <butler@cockroachlabs.com>
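As a rough illustration of the kv-side assertion described in the 99126 summary above: this is a hedged sketch only, not the actual check; it assumes the `kv.Txn.Type`, `kv.LeafTxn`, and `BatchRequest.IsLocking` identifiers, and the package paths are approximate.

```go
package kvexample

import (
	"github.com/cockroachdb/cockroach/pkg/kv"
	"github.com/cockroachdb/cockroach/pkg/kv/kvpb"
	"github.com/cockroachdb/errors"
)

// assertNoLeafTxnLocking is a hypothetical helper, not the real
// implementation: a leaf transaction must never issue a locking request such
// as those generated by SELECT FOR UPDATE.
func assertNoLeafTxnLocking(txn *kv.Txn, ba *kvpb.BatchRequest) error {
	if txn.Type() == kv.LeafTxn && ba.IsLocking() {
		return errors.AssertionFailedf(
			"LeafTxn attempted to issue a locking request: %s", ba)
	}
	return nil
}
```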
blathers-crl bot pushed a commit that referenced this issue Apr 25, 2023
tbg added a commit to tbg/raft that referenced this issue Jul 14, 2023
Prior to this commit, the leader would not take into account snapshots reported
by a follower unless they matched or exceeded the tracked PendingSnapshot index
(which is the leader's last index at the time of requesting the snapshot). This
is too inflexible: the leader should take into account any snapshot that
reconnects the follower to its log. This PR makes that change.

In doing so, it addresses long-standing problems that we've encountered in
CockroachDB. Unless you create the snapshot immediately and locally when raft
emits an MsgSnap, it's difficult/impossible to later synthesize a snapshot at
the requested index. It is possible to get one above the requested index which
raft always accepted, but CockroachDB delegates snapshots to followers who
might be behind on applying the log, and it is awkward to have to wait for log
application to send the snapshot just to satisfy an overly strict condition in
raft. Additionally, CockroachDB sends snapshots preemptively when adding a
new replica, since there are qualitative differences between an initial snapshot
and one needed to reconnect to the log, and one does not want to wait for raft
to round-trip to the follower to realize that a snapshot is needed. In this
case, the sent snapshot is commonly behind the PendingSnapshot since the leader
transitions the follower into StateProbe when a snapshot is already in flight.

Touches cockroachdb/cockroach#84242.
Touches cockroachdb/cockroach#87553.
Touches cockroachdb/cockroach#87554.
Touches cockroachdb/cockroach#97971.
See also https://github.com/cockroachdb/cockroach/blob/2b91c3829270eb512c5380201c26a3d838fc567a/pkg/kv/kvserver/raft_snapshot_queue.go#L131-L143.

Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
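The condition change can be summarized as follows. This is an illustrative sketch only, not the upstream raft diff; the parameter names loosely follow raft's Progress/PendingSnapshot terminology, and the `resumeBelowPendingSnapshot` flag stands in for the config option introduced in the later commits on this issue:

```go
// canResumeReplication decides, when a follower in StateSnapshot reports via
// MsgAppResp that it has applied a snapshot at snapIndex, whether the leader
// can move the follower back toward StateReplicate.
func canResumeReplication(
	snapIndex, pendingSnapshot, leaderFirstIndex uint64,
	resumeBelowPendingSnapshot bool,
) bool {
	// Old behavior: only a snapshot at or above the index recorded when the
	// snapshot was requested (the leader's last index at that time) counts.
	if snapIndex >= pendingSnapshot {
		return true
	}
	// Relaxed behavior: any snapshot that reconnects the follower to the
	// leader's log is enough, i.e. the leader still has the entries that
	// immediately follow the snapshot, even if snapIndex < pendingSnapshot.
	return resumeBelowPendingSnapshot && snapIndex+1 >= leaderFirstIndex
}
```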

tbg commented Jul 25, 2023

I think this is mostly done as of #106793 (the test is merged in that PR), just need #106813.

@andrewbaptist
Collaborator Author

I agree - closing, since the test TestAddVotersWithoutRaftQueue is merged.

KV automation moved this from Incoming to Closed Jul 25, 2023
erikgrinaker pushed a commit to erikgrinaker/raft that referenced this issue Nov 17, 2023
erikgrinaker pushed a commit to erikgrinaker/raft that referenced this issue Nov 17, 2023
Prior to this commit, the leader would not take into account snapshots
reported by a follower unless they matched or exceeded the tracked
PendingSnapshot index (which is the leader's last index at the time of
requesting the snapshot). This is too inflexible: the leader should take
into account any snapshot that reconnects the follower to its log. This
PR adds a config option ResumeReplicateBelowPendingSnapshot that enables
this behavior.

In doing so, it addresses long-standing problems that we've encountered
in CockroachDB. Unless you create the snapshot immediately and locally
when raft emits an MsgSnap, it's difficult/impossible to later
synthesize a snapshot at the requested index. It is possible to get one
above the requested index which raft always accepted, but CockroachDB
delegates snapshots to followers who might be behind on applying the
log, and it is awkward to have to wait for log application to send the
snapshot just to satisfy an overly strict condition in raft.
Additionally, CockroachDB sends snapshots preemptively when adding
a new replica, since there are qualitative differences between an initial
snapshot and one needed to reconnect to the log, and one does not want to
wait for raft to round-trip to the follower to realize that a snapshot
is needed. In this case, the sent snapshot is commonly behind the
PendingSnapshot since the leader transitions the follower into
StateProbe when a snapshot is already in flight.

Touches cockroachdb/cockroach#84242.
Touches cockroachdb/cockroach#87553.
Touches cockroachdb/cockroach#87554.
Touches cockroachdb/cockroach#97971.
Touches cockroachdb/cockroach#114349.
See also https://github.com/cockroachdb/cockroach/blob/2b91c3829270eb512c5380201c26a3d838fc567a/pkg/kv/kvserver/raft_snapshot_queue.go#L131-L143.

Signed-off-by: Erik Grinaker <grinaker@cockroachlabs.com>
Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
erikgrinaker added a commit to erikgrinaker/raft that referenced this issue Nov 17, 2023
erikgrinaker added a commit to erikgrinaker/raft that referenced this issue Nov 17, 2023
erikgrinaker added a commit to erikgrinaker/raft that referenced this issue Nov 17, 2023
erikgrinaker added a commit to erikgrinaker/raft that referenced this issue Jan 4, 2024
erikgrinaker added a commit to erikgrinaker/raft that referenced this issue Jan 4, 2024
erikgrinaker added a commit to erikgrinaker/raft that referenced this issue Jan 5, 2024
erikgrinaker added a commit to erikgrinaker/raft that referenced this issue Jan 9, 2024
erikgrinaker added a commit to erikgrinaker/raft that referenced this issue Jan 9, 2024
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Apr 12, 2024