storage: re-enqueue Raft groups on paginated application #31568

Open
wants to merge 1 commit into
base: master
from

@nvanbenschoten
Member

nvanbenschoten commented Oct 18, 2018

Fixes #31330.

This change re-enqueues Raft groups for processing immediately if they
still have more to do after a Raft ready iteration. This comes up in
practice when a Range has sufficient load to force Raft application
pagination. See #31330 for a discussion on the symptoms this can
cause.

Release note (bug fix): Fix bug where Raft followers could fall behind
leaders with entry application, causing stalls during splits.
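
The re-enqueue behavior described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: the `group` and `scheduler` types, `processReady`, and the byte limit are all hypothetical stand-ins, and the sketch only assumes that each ready iteration applies a bounded amount of committed entries and reports whether more remain.

```go
package main

import "fmt"

// group is a hypothetical stand-in for a Raft group. Each element of pending
// is the size in bytes of one committed but not yet applied entry.
type group struct {
	id      int
	pending []int
	applied int
}

// scheduler is a hypothetical FIFO queue of group IDs awaiting processing.
type scheduler struct {
	queue  []int
	groups map[int]*group
}

func (s *scheduler) enqueue(id int) { s.queue = append(s.queue, id) }

// processReady applies pending entries until adding the next one would exceed
// maxBytes (it always applies at least one entry) and reports whether
// unapplied entries remain. This models paginated application.
func (g *group) processReady(maxBytes int) bool {
	used := 0
	for len(g.pending) > 0 {
		size := g.pending[0]
		if used > 0 && used+size > maxBytes {
			break
		}
		used += size
		g.pending = g.pending[1:]
		g.applied++
	}
	return len(g.pending) > 0
}

// run drains the queue and returns the number of ready iterations performed.
// A group that still has work after an iteration is re-enqueued immediately
// instead of waiting for some later, unrelated event to schedule it again.
func (s *scheduler) run(maxBytes int) int {
	iterations := 0
	for len(s.queue) > 0 {
		id := s.queue[0]
		s.queue = s.queue[1:]
		iterations++
		if s.groups[id].processReady(maxBytes) {
			s.enqueue(id)
		}
	}
	return iterations
}

func main() {
	g := &group{id: 1, pending: []int{40, 40, 40, 40, 40}}
	s := &scheduler{groups: map[int]*group{1: g}}
	s.enqueue(1)
	iters := s.run(100)
	fmt.Println(iters, g.applied, len(g.pending))
}
```

With five 40-byte entries and a 100-byte limit, each iteration applies at most two entries, so the group is processed three times in a row before its backlog is empty. Without the re-enqueue in `run`, the first iteration would leave three entries unapplied until something else scheduled the group.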

@nvanbenschoten nvanbenschoten requested review from bdarnell and petermattis Oct 18, 2018

@nvanbenschoten nvanbenschoten requested a review from cockroachdb/core-prs as a code owner Oct 18, 2018

@cockroach-teamcity

Member

cockroach-teamcity commented Oct 18, 2018

This change is Reviewable

@petermattis

:lgtm:

Have you stressed this new test? Mucking with proposal application like that seems fragile. I don't have a better suggestion, though.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/storage/replica_test.go, line 9622 at r1 (raw file):

	repl := tc.repl

	// Propose a command to Raft and block its application.

Technically, you block application and then propose the command.

@nvanbenschoten

TFTR.

Have you stressed this new test?

I stressed it for 10 minutes without issue.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/storage/replica_test.go, line 9622 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Technically, you block application and then propose the command.

Done.

@tschottdorf

:lgtm:

Reviewed 1 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale)


pkg/storage/replica_test.go, line 9586 at r2 (raw file):

// TestApplyPaginatedCommittedEntries tests that a Raft group's committed
// entries are quickly applied, even if their application is paginated due to
// the RaftMaxSizePerMsg configuration. This is a regession test for #31330.

nit: regession


pkg/storage/replica_test.go, line 9659 at r2 (raw file):

	// small RaftMaxSizePerMsg.
	close(blockRaftApplication)
	const maxWait = 5 * time.Second

5s might be too aggressive to apply 50 commands when stressed sufficiently. Perhaps bump this a bit. Or we'll just leave it and find out.
