-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: drop raftentry.Cache data in applySnapshot #37055
storage: drop raftentry.Cache data in applySnapshot #37055
Conversation
This PR could benefit from a storage test |
a6fa12e
to
e917cc3
Compare
|
083666b
to
7c7d0a9
Compare
5546e70
to
9a6b4fa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mod two nits. Thanks for getting this up so quickly.
Reviewed 2 of 2 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)
pkg/storage/replica_raftstorage.go, line 861 at r2 (raw file):
// Clear the cached raft log entries to ensure that old or uncommitted // entries don't impact the in-memory state. r.store.raftEntryCache.Drop(r.RangeID)
You might consider adding a TODO to the method to use it in more places.
pkg/storage/raftentry/cache_test.go, line 456 at r1 (raw file):
var wg sync.WaitGroup // Test using both Clear and Drop to remove the added entries. testutils.RunTrueAndFalse(t, "dropOrClear", func(t *testing.T, useClear bool) {
nit: this isn't a great use of RunTrueAndFalse
because it will be printed as:
TestConcurrentUpdates/dropOrClear=false
TestConcurrentUpdates/dropOrClear=true
It's not really clear what either sub-test does. Maybe consider:
for _, t := range struct{
name string
clear func()
}{
{"drop", func() { c.Drop(1) }},
{"clear", func() { c.Clear(1, 22) }},
}{
...
Release note: None
Add logic to clear cached raft log entries when applying a snapshot. Release note: None
9a6b4fa
to
f8de052
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick review and for writing the test!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/replica_raftstorage.go, line 861 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
You might consider adding a TODO to the method to use it in more places.
Anywhere in particular come to mind? A quick glance at the uses of Clear didn't reveal much to me in terms of replacement outside of some tests. One place we could use it is in replica_destroy. Where would the TODO go? Here or on the method? Either one feels like pollution without something specific in mind. If the thought was just about replica gc, I can just type a PR instead of a TODO.
pkg/storage/raftentry/cache_test.go, line 456 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: this isn't a great use of
RunTrueAndFalse
because it will be printed as:TestConcurrentUpdates/dropOrClear=false TestConcurrentUpdates/dropOrClear=true
It's not really clear what either sub-test does. Maybe consider:
for _, t := range struct{ name string clear func() }{ {"drop", func() { c.Drop(1) }}, {"clear", func() { c.Clear(1, 22) }}, }{ ...
Done
@nvanbenschoten cockroach/pkg/storage/replica_destroy.go Lines 77 to 81 in 8c4ba4b
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this where you had in mind for the additional Drop call?
I was thinking one level above that, in Replica.destroyRaftMuLocked
. After the batch
is actually committed.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to make that change now though. We'll want this to be small enough to backport.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
bors r=nvanbenschoten |
37055: storage: drop raftentry.Cache data in applySnapshot r=nvanbenschoten a=ajwerner This PR adds a new `.Drop` method to the `raftentry.Cache` which will clear all data associated with a range more efficiently than calling `.Clear` with a large index. The second commit then uses this call when applying a snapshot to ensure that stale cached raft entries are never used. Fixes #37056. Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Build succeeded |
This PR adds a regression test for cockroachdb#37056. In doing so, it also confirms the theory that cockroachdb#37055 is the proper fix for that bug. Before cockroachdb#37055, the test would get stuck with the following logs repeatedly firing: ``` I190424 00:15:52.338808 12 storage/client_test.go:1242 SucceedsSoon: expected [21 21 21], got [12 21 21] I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317 [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2 I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065 [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31 I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068 [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0] ``` After cockroachdb#37055, the test passes. Release note: None
This PR adds a regression test for cockroachdb#37056. In doing so, it also confirms the theory that cockroachdb#37055 is the proper fix for that bug. Before cockroachdb#37055, the test would get stuck with the following logs repeatedly firing: ``` I190424 00:15:52.338808 12 storage/client_test.go:1242 SucceedsSoon: expected [21 21 21], got [12 21 21] I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317 [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2 I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065 [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31 I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068 [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0] ``` After cockroachdb#37055, the test passes. Release note: None
This PR adds a regression test for cockroachdb#37056. In doing so, it also confirms the theory that cockroachdb#37055 is the proper fix for that bug. Before cockroachdb#37055, the test would get stuck with the following logs repeatedly firing: ``` I190424 00:15:52.338808 12 storage/client_test.go:1242 SucceedsSoon: expected [21 21 21], got [12 21 21] I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317 [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2 I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065 [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31 I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068 [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0] ``` After cockroachdb#37055, the test passes. Release note: None
37058: storage: create new TestSnapshotAfterTruncationWithUncommittedTail test r=nvanbenschoten a=nvanbenschoten This PR adds a regression test for #37056. In doing so, it also confirms the theory that #37055 is the proper fix for that bug. Before #37055, the test would get stuck with the following logs repeatedly firing: ``` I190424 00:15:52.338808 12 storage/client_test.go:1242 SucceedsSoon: expected [21 21 21], got [12 21 21] I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317 [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2 I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065 [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31 I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068 [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0] ``` After #37055, the test passes. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
This PR adds a regression test for cockroachdb#37056. In doing so, it also confirms the theory that cockroachdb#37055 is the proper fix for that bug. Before cockroachdb#37055, the test would get stuck with the following logs repeatedly firing: ``` I190424 00:15:52.338808 12 storage/client_test.go:1242 SucceedsSoon: expected [21 21 21], got [12 21 21] I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317 [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2 I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065 [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31 I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068 [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0] ``` After cockroachdb#37055, the test passes. Release note: None
This PR adds a new
.Drop
method to theraftentry.Cache
which will clear all data associated with a range more efficiently than calling.Clear
with a large index. The second commit then uses this call when applying a snapshot to ensure that stale cached raft entries are never used.Fixes #37056.