
kvserver: avoid hanging proposal after leader goes down #46045

Merged
merged 4 commits into from
Mar 17, 2020

Conversation

tbg
Member

@tbg tbg commented Mar 12, 2020

Deflakes gossip/chaos/nodes=9, i.e. fixes #38829.

There was a bug in range quiescence due to which commands would hang in
raft for minutes before actually getting replicated. This would occur
whenever a range was quiesced but a follower replica that didn't know
the (Raft) leader received a request. The request would be evaluated
and put into the Raft proposal buffer, and a Ready check would be
enqueued. However, no Ready would be produced (the proposal was dropped
by raft because the leader was unknown), and so the replica would not
unquiesce.

This commit prevents this by always waking up the group if the proposal
buffer was initially nonempty, even if an empty Ready is produced.
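
The gist of that fix can be sketched as follows. This is a simplified model with hypothetical types, not the actual pkg/kv/kvserver code (where the flush happens in FlushLockedWithRaftGroup against a raft.RawNode); it only illustrates the "unquiesce when the proposal buffer was nonempty, even on an empty Ready" rule:

```go
package main

import "fmt"

// proposal models a command waiting to be replicated (hypothetical type).
type proposal struct{ cmd string }

// replica models just enough state for the bug: a quiesced flag, a
// proposal buffer, and whether the raft leader is currently known.
type replica struct {
	quiesced    bool
	leaderKnown bool
	propBuf     []proposal
}

// flush hands buffered proposals to raft. When the leader is unknown,
// raft silently drops them (raft.ErrProposalDropped in etcd/raft) and
// no Ready is produced.
func (r *replica) flush() (numFlushed int, readyProduced bool) {
	numFlushed = len(r.propBuf)
	r.propBuf = r.propBuf[:0]
	// Proposals only make it into a Ready if the leader is known.
	readyProduced = numFlushed > 0 && r.leaderKnown
	return numFlushed, readyProduced
}

// handleReady applies the fixed logic: wake the group whenever the
// proposal buffer was nonempty, even if the resulting Ready is empty.
// Previously the group only woke up when a Ready was produced, so a
// dropped proposal left the replica quiesced and the command hanging.
func (r *replica) handleReady() {
	numFlushed, readyProduced := r.flush()
	if readyProduced || numFlushed > 0 {
		// Waking the group restarts ticks/elections, so a leader gets
		// (re-)discovered and the dropped proposal is reproposed.
		r.quiesced = false
	}
}

func main() {
	r := &replica{quiesced: true, leaderKnown: false,
		propBuf: []proposal{{cmd: "put"}}}
	r.handleReady()
	fmt.Println("quiesced:", r.quiesced) // woke up despite an empty Ready
}
```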

It goes further than that by trying to ensure that a leader is always
known while quiesced. Previously, on an incoming request to quiesce, we
did not verify that the raft group had learned the leader's identity.
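
The added precondition can be modeled as below (simplified and hypothetical; `none` stands in for etcd/raft's raft.None sentinel, and the real check involves the full quiescence decision in the kvserver code):

```go
package main

import "fmt"

// none mirrors etcd/raft's raft.None: the zero value meaning
// "no leader known".
const none uint64 = 0

// shouldQuiesce models the new precondition: only quiesce if the raft
// group has learned the leader's identity. Previously this check was
// missing, so a replica could quiesce with leaderID == none and later
// drop incoming proposals on the floor.
func shouldQuiesce(leaderID uint64) bool {
	return leaderID != none
}

func main() {
	fmt.Println(shouldQuiesce(none)) // false: refuse to quiesce leaderless
	fmt.Println(shouldQuiesce(3))    // true: leader known, safe to quiesce
}
```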

One shortcoming here is that in the situation in which the proposal
would originally hang "forever", it will now hang for up to one
heartbeat or election timeout, where ideally it would be proposed more
reactively. Since this is so rare, I didn't try to address it here.
Instead, refer to the ideas in

#37906 (comment)

and

#21849

for future changes that could mitigate this.

Without this PR, the test would fail around 10% of the time. With this
change, it passed 40 iterations in a row without a hitch, via:

./bin/roachtest run -u tobias --count 40 --parallelism 10 --cpu-quota 1280 gossip/chaos/nodes=9

Release justification: bug fix
Release note (bug fix): a rare case in which requests to a quiesced
range could hang in the KV replication layer was fixed. This would
manifest as a message saying "have been waiting ... for proposing" even
though no loss of quorum occurred.

petermattis and others added 3 commits March 12, 2020 20:23
Deflake `gossip/chaos` by adding a missing
`waitForFullReplication`. This test loops, killing a node and then
verifying that the remaining nodes in the cluster stabilize on the same
view of gossip connectivity. Periodically the test was failing because
gossip wasn't stabilizing. The root issue was that the SQL query to
retrieve the gossip connectivity from one node was hanging, and that
query was hanging due to an unavailable range. Logs show that the
leaseholder for that range was on a down node and that the range only
seemed to contain a single replica. This could happen near the start of
the test if we started killing nodes before full replication was
achieved.

Fixes cockroachdb#38829

Release note: None
Release justification: testing change
Release note: None
Release justification: comment-only change
Release note: None
@cockroach-teamcity
Member

This change is Reviewable

Collaborator

@petermattis petermattis left a comment


:lgtm: Thanks for tracking this down! Might want to get another set of #kv eyes on this in order to spread the knowledge.

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 2 of 2 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)

Member

@nvanbenschoten nvanbenschoten left a comment


:lgtm: nice find! How far back do we need to backport this?

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 2 of 2 files at r3, 3 of 3 files at r4.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @tbg)


pkg/kv/kvserver/replica_raft.go, line 415 at r4 (raw file):

	err := r.withRaftGroupLocked(true, func(raftGroup *raft.RawNode) (bool, error) {
		numFlushed := r.mu.proposalBuf.Len()
		if err := r.mu.proposalBuf.FlushLockedWithRaftGroup(raftGroup); err != nil {

I would lean towards returning the number of commands that were flushed from this function (i.e. used in FlushLockedWithRaftGroup). That avoids extra atomic accesses that seem a little racy, though I can't construct a scenario where anything actually goes wrong.


pkg/kv/kvserver/replica_raft.go, line 415 at r4 (raw file):

	err := r.withRaftGroupLocked(true, func(raftGroup *raft.RawNode) (bool, error) {
		numFlushed := r.mu.proposalBuf.Len()
		if err := r.mu.proposalBuf.FlushLockedWithRaftGroup(raftGroup); err != nil {

An alternative approach is to propagate raft.ErrProposalDropped errors up to here and unquiesce in that case as well. I'll defer to your preference.


pkg/kv/kvserver/store_raft.go, line 230 at r4 (raw file):

		// quiescing if there's outstanding work.
		r.mu.Lock()
		status := r.raftStatusRLocked()

Would BasicStatus() do?

@tbg tbg force-pushed the gossip-chaos branch 2 times, most recently from 903f1e5 to 71e5ab5 Compare March 16, 2020 21:01
Member Author

@tbg tbg left a comment


bors r=nvanbenschoten,petermattis

> How far back do we need to backport this?

I don't see why this wasn't always a problem (though the roachtest didn't always fail...)
I'll definitely backport to 19.2 and there should be a similar simplified patch we can make to 19.1.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @nvanbenschoten and @petermattis)


pkg/kv/kvserver/replica_raft.go, line 415 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

An alternative approach is to propagate raft.ErrProposalDropped errors up to here and unquiesce in that case as well. I'll defer to your preference.

I'll leave as is.

@tbg
Member Author

tbg commented Mar 17, 2020

bors pls

bors r=nvanbenschoten,petermattis

@craig
Contributor

craig bot commented Mar 17, 2020

Build succeeded

@craig craig bot merged commit 021781a into cockroachdb:master Mar 17, 2020
Member

@nvanbenschoten nvanbenschoten left a comment


Reviewed 4 of 4 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale)

@tbg tbg deleted the gossip-chaos branch April 6, 2020 16:05