
raft: scaffolding for store liveness and leader leases#123789

Draft

nvb wants to merge 30 commits into cockroachdb:master from nvb:nvanbenschoten/leaderLeasePrototype

@nvb (Contributor) commented May 7, 2024

We can use this PR to split up the prototyping effort. There is work to do in three areas: below the new StoreLiveness interface, in Raft (using StoreLiveness), and above Raft (using the new Status.LeadSupportUntil field).

To collaborate on this, we can push new commits to this branch. Try to avoid force pushing to prevent skew.

blathers-crl bot commented May 7, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity (Member)

This change is Reviewable

nvb and others added 4 commits May 7, 2024 22:09
This commit creates a new replicaRLockedStoreLiveness intermediary type,
which translates replica IDs to store IDs using the RangeDescriptor of a
read-locked (RLocked) replica and uses the resulting store ID to call
into a real (but as yet unimplemented) StoreLiveness instance.
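The translation layer can be illustrated with a minimal Go sketch. The `StoreLiveness` method (`SupportFor`), the `Epoch` type, and the simplified descriptor types are hypothetical stand-ins, not the real CockroachDB or raft APIs. The sketch also assumes the caller already holds the replica's read lock.

```go
package main

// Simplified, hypothetical stand-ins for the roachpb identifier types.
type ReplicaID int32
type StoreID int32

// Epoch is a hypothetical store liveness epoch.
type Epoch int64

// StoreLiveness is a hypothetical store-level liveness interface.
type StoreLiveness interface {
	// SupportFor reports the epoch in which the local store supports the
	// given store, and whether it supports it at all.
	SupportFor(id StoreID) (Epoch, bool)
}

// Simplified stand-ins for the replica and range descriptors.
type ReplicaDescriptor struct {
	ReplicaID ReplicaID
	StoreID   StoreID
}

type RangeDescriptor struct {
	InternalReplicas []ReplicaDescriptor
}

// replicaRLockedStoreLiveness translates replica IDs into store IDs using
// the descriptor of a replica whose mutex the caller has read-locked.
type replicaRLockedStoreLiveness struct {
	desc *RangeDescriptor
	sl   StoreLiveness
}

// storeIDFor looks up the store that hosts the given replica.
func (r replicaRLockedStoreLiveness) storeIDFor(id ReplicaID) (StoreID, bool) {
	for _, rd := range r.desc.InternalReplicas {
		if rd.ReplicaID == id {
			return rd.StoreID, true
		}
	}
	return 0, false
}

// SupportFor translates the replica ID and delegates to store liveness.
// Replicas missing from the descriptor are treated as unsupported.
func (r replicaRLockedStoreLiveness) SupportFor(id ReplicaID) (Epoch, bool) {
	storeID, ok := r.storeIDFor(id)
	if !ok {
		return 0, false
	}
	return r.sl.SupportFor(storeID)
}

// fakeStoreLiveness is a map-backed StoreLiveness used for illustration.
type fakeStoreLiveness map[StoreID]Epoch

func (f fakeStoreLiveness) SupportFor(id StoreID) (Epoch, bool) {
	e, ok := f[id]
	return e, ok
}
```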
This patch adds and implements two new message types, MsgFortify and
MsgFortifyResp. Using these, the leader broadcasts a fortification
request to all its followers when it wins an election. It handles the
responses by updating its leadSupport map.

Things still missing:
1. We're not evaluating whether we're supported by a majority of
followers, and until when.
2. We're not persisting lead and leadEpoch to disk.
3. Re-fortification isn't handled.
4. Testing.

Release note: None
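The fortification handshake from this commit can be sketched as follows. The message and leader types are simplified and hypothetical; the real raftpb.Message has many more fields. The leader broadcasts MsgFortify after winning an election and records each follower's supported epoch in leadSupport from MsgFortifyResp. Quorum evaluation, persistence and re-fortification are left out, as in the list of missing pieces.

```go
package main

import "sort"

type PeerID uint64
type Epoch int64

// MessageType distinguishes the two message kinds in this sketch.
type MessageType int

const (
	MsgFortify MessageType = iota
	MsgFortifyResp
)

// Message is a hypothetical, simplified stand-in for raftpb.Message.
type Message struct {
	Type      MessageType
	From, To  PeerID
	LeadEpoch Epoch // epoch in which From's store supports the leader's store
	Reject    bool  // set when From's store does not support the leader's store
}

type leader struct {
	id          PeerID
	peers       []PeerID
	leadSupport map[PeerID]Epoch
}

// becomeLeader resets tracked support and returns the MsgFortify broadcast
// sent to every follower once the election is won.
func (l *leader) becomeLeader() []Message {
	l.leadSupport = make(map[PeerID]Epoch)
	var msgs []Message
	for _, p := range l.peers {
		if p == l.id {
			continue
		}
		msgs = append(msgs, Message{Type: MsgFortify, From: l.id, To: p})
	}
	return msgs
}

// followerRespond builds a follower's MsgFortifyResp from its store
// liveness view of the leader's store.
func followerRespond(m Message, epoch Epoch, supported bool) Message {
	return Message{Type: MsgFortifyResp, From: m.To, To: m.From, LeadEpoch: epoch, Reject: !supported}
}

// handleFortifyResp records the epoch in which a follower supports the leader.
func (l *leader) handleFortifyResp(m Message) {
	if m.Type != MsgFortifyResp || m.Reject {
		return
	}
	l.leadSupport[m.From] = m.LeadEpoch
}

// supporters returns the followers recorded in leadSupport, sorted by ID.
func (l *leader) supporters() []PeerID {
	out := make([]PeerID, 0, len(l.leadSupport))
	for p := range l.leadSupport {
		out = append(out, p)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}
```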
@arulajmani force-pushed the nvanbenschoten/leaderLeasePrototype branch from d4662c5 to 7fdc252 on May 14, 2024 20:02
This patch mocks store liveness in datadriven tests and adds a few
directives to bump epochs and withdraw support. It then constructs
an "interesting" store liveness state and runs a new election. In doing
so, we ensure that MsgFortifyResp messages are populated correctly based
on the store liveness state.

Release note: None
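Here is a hypothetical sketch of how such a mock could be structured. Each store has a current epoch, and support is tracked per (supporter, supported) pair. `bumpEpoch` and `withdrawSupport` stand in for the test directives. The method and directive names are illustrative only, because the PR text does not give the actual datadriven directive names.

```go
package main

type StoreID int32
type Epoch int64

// supportKey identifies a (supporter, supported) pair of stores.
type supportKey struct{ supporter, supported StoreID }

// mockStoreLiveness is a hypothetical test double for store liveness.
type mockStoreLiveness struct {
	epochs  map[StoreID]Epoch    // current epoch of each store
	support map[supportKey]Epoch // epoch supported; absent means no support
}

// newMockStoreLiveness starts every store at epoch 1, with every store
// supporting every other store in its current epoch.
func newMockStoreLiveness(stores ...StoreID) *mockStoreLiveness {
	m := &mockStoreLiveness{
		epochs:  make(map[StoreID]Epoch),
		support: make(map[supportKey]Epoch),
	}
	for _, s := range stores {
		m.epochs[s] = 1
	}
	for _, a := range stores {
		for _, b := range stores {
			m.support[supportKey{a, b}] = m.epochs[b]
		}
	}
	return m
}

// bumpEpoch models a bump-epoch directive: support recorded for the
// store's old epoch no longer counts.
func (m *mockStoreLiveness) bumpEpoch(s StoreID) { m.epochs[s]++ }

// withdrawSupport models a withdraw-support directive.
func (m *mockStoreLiveness) withdrawSupport(supporter, supported StoreID) {
	delete(m.support, supportKey{supporter, supported})
}

// grantSupport records support for the supported store's current epoch.
func (m *mockStoreLiveness) grantSupport(supporter, supported StoreID) {
	m.support[supportKey{supporter, supported}] = m.epochs[supported]
}

// supportFor returns the epoch in which supporter supports supported, but
// only if that support is for the supported store's current epoch.
func (m *mockStoreLiveness) supportFor(supporter, supported StoreID) (Epoch, bool) {
	e, ok := m.support[supportKey{supporter, supported}]
	if !ok || e != m.epochs[supported] {
		return 0, false
	}
	return e, true
}
```

A follower in such a test would populate its MsgFortifyResp from `supportFor(followerStore, leaderStore)`.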
blathers-crl bot commented May 15, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.


arulajmani and others added 3 commits May 16, 2024 14:59
This patch implements re-fortification. Now, on every tick, the leader
tries to re-fortify any peers that were never fortified or that have
withdrawn support from a fortified epoch. We only do this if the peer's
store is currently supporting the leader's store; otherwise, we wait
until it does.

Release note: None
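The per-tick decision can be sketched as a filter over the leader's peers. In this hypothetical sketch, `supportFn` stands in for a store liveness lookup that reports whether, and in which epoch, a peer's store currently supports the leader's store. A peer is treated as having withdrawn support when that epoch differs from the one recorded at fortification time.

```go
package main

type PeerID uint64
type Epoch int64

// supportFn is a hypothetical store liveness lookup: the epoch in which a
// peer's store currently supports the leader's store, if at all.
type supportFn func(peer PeerID) (Epoch, bool)

type leader struct {
	id          PeerID
	peers       []PeerID
	leadSupport map[PeerID]Epoch // epoch each peer was fortified at
}

// peersToRefortify returns the peers to send MsgFortify to on this tick:
// peers never fortified, or whose supported epoch has changed since
// fortification. Peers whose stores do not currently support the leader
// are skipped until support is re-established.
func (l *leader) peersToRefortify(support supportFn) []PeerID {
	var out []PeerID
	for _, p := range l.peers {
		if p == l.id {
			continue
		}
		cur, ok := support(p)
		if !ok {
			continue // wait until the peer's store supports the leader
		}
		fortified, wasFortified := l.leadSupport[p]
		if !wasFortified || fortified != cur {
			out = append(out, p)
		}
	}
	return out
}
```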
This commit adds a ClockTimestamp to RaftMessageRequestBatch.
@miraradeva force-pushed the nvanbenschoten/leaderLeasePrototype branch 2 times, most recently from 489fde9 to d20e4c2 on May 22, 2024 20:32
This patch is based on Sumeer's prototype in cockroachdb#122547. The
main differences are simplifications enabled by the algorithm changes
between take 3 and take 4.

This patch adds only the basic heartbeating capability and structure of
the store liveness fabric. The algorithm's logic will come in separate
commits.
@miraradeva force-pushed the nvanbenschoten/leaderLeasePrototype branch from d20e4c2 to 80f5be6 on May 23, 2024 17:57
miraradeva and others added 3 commits May 29, 2024 14:17
Two main changes:
- The transport is no longer RPC-based; it is streaming instead, with a
  per-node send queue for outgoing heartbeat requests and separate
  per-store receive queues for incoming heartbeat requests and
  responses.
- The algorithm logic is in a very basic state. Persistence is not
  implemented yet (there are TODOs), and there are no tests yet.
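Here is a rough sketch of the queue layout, using buffered Go channels. It has one outgoing queue per destination node and separate per-store queues for incoming requests and responses. The message type and the drop-when-full policy are assumptions made for illustration: a dropped heartbeat would simply be superseded by the next one. The maps are not guarded, so the sketch is not safe for concurrent use.

```go
package main

type NodeID int32
type StoreID int32

// Message is a hypothetical store liveness heartbeat or heartbeat response.
type Message struct {
	FromStore, ToStore StoreID
	ToNode             NodeID
	IsResponse         bool
}

// transport holds one send queue per destination node (each drained by a
// single stream to that node) and separate per-store receive queues for
// incoming requests and responses.
type transport struct {
	sendQueues map[NodeID]chan Message
	reqQueues  map[StoreID]chan Message
	respQueues map[StoreID]chan Message
	queueSize  int
}

func newTransport(queueSize int) *transport {
	return &transport{
		sendQueues: make(map[NodeID]chan Message),
		reqQueues:  make(map[StoreID]chan Message),
		respQueues: make(map[StoreID]chan Message),
		queueSize:  queueSize,
	}
}

// offer performs a non-blocking send, reporting whether m was queued.
func offer(q chan Message, m Message) bool {
	select {
	case q <- m:
		return true
	default:
		return false
	}
}

// enqueueSend places m on its destination node's send queue.
func (t *transport) enqueueSend(m Message) bool {
	q, ok := t.sendQueues[m.ToNode]
	if !ok {
		q = make(chan Message, t.queueSize)
		t.sendQueues[m.ToNode] = q
	}
	return offer(q, m)
}

// deliver routes a received message to the destination store's request
// or response queue.
func (t *transport) deliver(m Message) bool {
	queues := t.reqQueues
	if m.IsResponse {
		queues = t.respQueues
	}
	q, ok := queues[m.ToStore]
	if !ok {
		q = make(chan Message, t.queueSize)
		queues[m.ToStore] = q
	}
	return offer(q, m)
}
```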
This makes the transport look more like etcd/raft.
@nvb force-pushed the nvanbenschoten/leaderLeasePrototype branch from cf7420e to 4d36801 on May 30, 2024 02:27
nvb and others added 2 commits May 29, 2024 22:33
This patch introduces a new SupportTracker struct that tracks the support
provided by followers to a leader. The leader can then use this tracked
support to calculate its QSE (quorum supported expiration), which is used
by higher layers.

Release note: None
@arulajmani force-pushed the nvanbenschoten/leaderLeasePrototype branch from 4c2d7d7 to 3b969e3 on May 30, 2024 18:29
nvb and others added 2 commits May 30, 2024 18:47
The state machine should only campaign if it isn't supporting the
leader. The only exception is if the leader has explicitly asked a
follower to do so by initiating a campaignTransfer.

Epic: none

Release note: None
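The campaign rule reduces to a small predicate. In this hypothetical sketch, `supportingLeader` stands in for a store liveness check of whether the follower's store still supports the leader's store. `campaignTransfer` mirrors the campaign type named in the commit message. The surrounding types are simplified stand-ins.

```go
package main

// follower is a hypothetical, simplified view of the state consulted
// before campaigning.
type follower struct {
	supportingLeader bool // whether this peer's store still supports the leader's store
}

type campaignType int

const (
	campaignElection campaignType = iota
	campaignTransfer
)

// shouldCampaign reports whether the state machine may campaign. It may not
// while it still supports the current leader, unless the leader explicitly
// asked it to via a leadership transfer.
func (f follower) shouldCampaign(t campaignType) bool {
	if t == campaignTransfer {
		return true
	}
	return !f.supportingLeader
}
```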
@arulajmani force-pushed the nvanbenschoten/leaderLeasePrototype branch from 7a2d26c to 3a35370 on May 31, 2024 14:57
@nvb force-pushed the nvanbenschoten/leaderLeasePrototype branch from 95ffbfb to 2adef81 on June 7, 2024 19:43
@nvb force-pushed the nvanbenschoten/leaderLeasePrototype branch from 2067153 to f927061 on June 7, 2024 20:38

4 participants