Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: reject lease proposal locally when leader not known #118435

Open
kvoli opened this issue Jan 29, 2024 · 7 comments · Fixed by #127082
Open

kv: reject lease proposal locally when leader not known #118435

kvoli opened this issue Jan 29, 2024 · 7 comments · Fixed by #127082
Labels
A-kv-replication Relating to Raft, consensus, and coordination. A-leader-leases Related to the introduction of leader leases C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-postmortem Originated from a Postmortem action item. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-3 Issues/test failures with no fix SLA T-kv KV Team

Comments

@kvoli
Copy link
Collaborator

kvoli commented Jan 29, 2024

Is your feature request related to a problem? Please describe.

If a replica doesn't know who the leader is (possible just rejoined cluster) we still let the lease acquisition request be proposed, despite knowing it will fail.

// - if the leader is not known, we don't do anything special here to

This can hang if the replica is behind and is still catching up e.g. just rejoined the cluster.

Describe the solution you'd like

Reject the proposal, similar to the known leader case and return a NotLeaseHolderError.

if leaderKnownAndEligible && !ownsCurrentLease && !b.testing.allowLeaseProposalWhenNotLeader {

Additional context
Related #118266

Jira issue: CRDB-35731

Epic CRDB-37522

@kvoli kvoli added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs A-kv-replication Relating to Raft, consensus, and coordination. P-2 Issues/test failures with a fix SLA of 3 months labels Jan 29, 2024
Copy link

blathers-crl bot commented Jan 29, 2024

cc @cockroachdb/replication

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Feb 2, 2024

I'm not sure this will work. If the range does not have quorum nor a lease, and we don't attempt to propose a lease, then the replica circuit breakers will never trip and the client will keep retrying indefinitely -- exactly the scenario replica circuit breakers are meant to avoid.

If we are to do this, then we'll have to implement circuit breakers at the DistSender level instead. That may come for free with #105168.

@Schtick Schtick added the O-postmortem Originated from a Postmortem action item. label Feb 8, 2024
@nvanbenschoten
Copy link
Member

If the range does not have quorum nor a lease, and we don't attempt to propose a lease, then the replica circuit breakers will never trip and the client will keep retrying indefinitely -- exactly the scenario replica circuit breakers are meant to avoid.

How does proposing a lease help the range reach a quorum and avoid the indefinite retry? When a replica is in this state where it does not know the leader, it is simultaneously in the process of campaigning and trying to establish or learn leadership. Placing a proposal to sit idle in Replica.mu.proposals and wedging a client doesn't really help anybody. Returning a NotLeaseHolderError allows the client to retry on other replicas, allowing it to search for the leaseholder.

Are you saying that we will need client-side intervention in the case where all replicas are disconnected and no leader can be elected, to avoid an infinite NotLeaseHolderError retry loop? If so, I agree. The DistSender inTransferRetry backoff loop can help this from spinning, but we probably also want to integrate this with client-side circuit breakers.

This does feel like the client's responsibility though, right?

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Feb 20, 2024

Yeah, I think that's a better way to handle it.

Today though, we rely on replica circuit breakers to avoid clients getting permanently stuck on an unavailable range (e.g. one that's lost quorum). If we simply remove this lease acquisition without also adding client-side circuit breakers then we're effectively removing that protection, leaving clients to get stuck on unavailable ranges again.

As a short-term, backportable fix I think we should get #118737 in now. Then once we have range-level DistSender circuit breakers we can reject these proposals (and consider removing replica circuit breakers entirely). Note that range-level circuit breakers are out of scope for #118943, and probably not landing in 24.1 given stability is nearly upon us.

@erikgrinaker
Copy link
Contributor

How does proposing a lease help the range reach a quorum and avoid the indefinite retry? When a replica is in this state where it does not know the leader, it is simultaneously in the process of campaigning and trying to establish or learn leadership.

One additional aspect here is that if the range is quiesced but has lost its leaseholder/leader, we have to wake up the replica so that it can actually hold an election and gain leadership. Proposing a lease acquisition will indirectly do that. If we reject lease acquisitions then we need to make sure we still unquiesce the range -- it's possible that it already does so, given we only reject this in the proposal buffer, but we should check and add a test case for this.

@nvanbenschoten
Copy link
Member

One additional aspect here is that if the range is quiesced but has lost its leaseholder/leader, we have to wake up the replica so that it can actually hold an election and gain leadership. Proposing a lease acquisition will indirectly do that. If we reject lease acquisitions then we need to make sure we still unquiesce the range -- it's possible that it already does so, given we only reject this in the proposal buffer, but we should check and add a test case for this.

+1 to all of this. There is a call to maybeUnquiesceLocked in propBuf.flushLocked and some discussion around unquiesceAndWakeLeader in handleRaftReadyRaftMuLocked's call to propBuf.FlushLockedWithRaftGroup. We would need to make sure that we properly handle rejecting the lease request but still unquiescing in both of these paths. From a quick reading of the code, it looks like we do. A test case would be worthwhile.

andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Mar 12, 2024
Previously we could propose a lease change if the leader was unknown.
This request could stall as we waited for the lease request to fail. Not
knowing the leader means we can't immediately submit this lease
proposal. We rely on the client to retry requests until a new leader has
been established.

Epic: none
Fixes: cockroachdb#120073
Fixes: cockroachdb#118435

Release note: None
@nvanbenschoten nvanbenschoten added release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 labels Mar 13, 2024
@nicktrav nicktrav added the P-3 Issues/test failures with no fix SLA label Apr 29, 2024
@nvanbenschoten nvanbenschoten changed the title kvserver: reject lease proposal locally when leader not known kv: reject lease proposal locally when leader not known Apr 30, 2024
@nvanbenschoten nvanbenschoten added the A-leader-leases Related to the introduction of leader leases label Jun 6, 2024
@exalate-issue-sync exalate-issue-sync bot added A-leader-leases Related to the introduction of leader leases and removed A-leader-leases Related to the introduction of leader leases labels Jun 6, 2024
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 17, 2024
Fixes cockroachdb#125676.

This commit sets the `kv.lease.reject_on_leader_unknown.enabled` cluster
setting to true in the c2c roachtests. This should avoid lease requests
from getting stuck after a restart with intentionally high replication
lag in `c2c/disconnect`.

We can remove this when we complete cockroachdb#118435.

Release note: None
craig bot pushed a commit that referenced this issue Jun 18, 2024
125813: kv: reject lease request on unknown leader in c2c roachtests r=nvanbenschoten a=nvanbenschoten

Fixes #125676.

This commit sets the `kv.lease.reject_on_leader_unknown.enabled` cluster setting to true in the c2c roachtests. This should avoid lease requests from getting stuck after a restart with intentionally high replication lag in `c2c/disconnect`.

We can remove this when we complete #118435.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 25, 2024
Closes cockroachdb#123498.
Informs cockroachdb#118435.

This commit extracts lease request and lease transfer validation into
the `kv/kvserver/leases` package which was introduced in cockroachdb#124682. This
avoids the scattered leases validity checks which were previously spread
between `replica_range_lease.go`, `replica_proposal_buf.go`
`cmd_lease_request.go`, `cmd_lease_transfer.go`.

This makes the logic easier to understand and easier to test. It will
also make the logic easier to adjust when we make the following two
changes in service of cockroachdb#118435:
- default to rejecting lease requests on unknown leaders
- perform lease request validation above raft, like we do for lease
  transfers

This second change also ties in with Leader leases (cockroachdb#123847). Until we
make that change, we will occasionally need to propose expiration-based
leases (e.g. when we're not the leader) just for them to be rejected by
the proposal buffer.

Release note: None
@arulajmani arulajmani assigned nvanbenschoten and unassigned pav-kv Jul 3, 2024
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 8, 2024
Closes cockroachdb#123498.
Informs cockroachdb#118435.

This commit extracts lease request and lease transfer validation into
the `kv/kvserver/leases` package which was introduced in cockroachdb#124682. This
avoids the scattered leases validity checks which were previously spread
between `replica_range_lease.go`, `replica_proposal_buf.go`
`cmd_lease_request.go`, `cmd_lease_transfer.go`.

This makes the logic easier to understand and easier to test. It will
also make the logic easier to adjust when we make the following two
changes in service of cockroachdb#118435:
- default to rejecting lease requests on unknown leaders
- perform lease request validation above raft, like we do for lease
  transfers

This second change also ties in with Leader leases (cockroachdb#123847). Until we
make that change, we will occasionally need to propose expiration-based
leases (e.g. when we're not the leader) just for them to be rejected by
the proposal buffer.

Release note: None
craig bot pushed a commit that referenced this issue Jul 8, 2024
126182: kv: consolidate range lease validation r=arulajmani a=nvanbenschoten

Closes #123498.
Informs #118435.

This commit extracts lease request and lease transfer validation into the `kv/kvserver/leases` package which was introduced in #124682. This avoids the scattered leases validity checks which were previously spread between `replica_range_lease.go`, `replica_proposal_buf.go` `cmd_lease_request.go`, `cmd_lease_transfer.go`.

This makes the logic easier to understand and easier to test. It will also make the logic easier to adjust when we make the following two changes in service of #118435:
- default to rejecting lease requests on unknown leaders
- perform lease request validation above raft, like we do for lease transfers

This second change also ties in with Leader leases (#123847). Until we make that change, we will occasionally need to propose expiration-based leases (e.g. when we're not the leader) just for them to be rejected by the proposal buffer.

This is a fairly large PR, and I'm willing to split it up if that makes sense.

Release note: None

126845: orchestration: released CockroachDB version 24.1.2. Next version: 24.1.3 r=ZhouXing19 a=cockroach-teamcity


Release note: None
Epic: None
Release justification: non-production (release infra) change.


Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Justin Beaver <teamcity@cockroachlabs.com>
asg0451 pushed a commit to asg0451/cockroach that referenced this issue Jul 10, 2024
Closes cockroachdb#123498.
Informs cockroachdb#118435.

This commit extracts lease request and lease transfer validation into
the `kv/kvserver/leases` package which was introduced in cockroachdb#124682. This
avoids the scattered leases validity checks which were previously spread
between `replica_range_lease.go`, `replica_proposal_buf.go`
`cmd_lease_request.go`, `cmd_lease_transfer.go`.

This makes the logic easier to understand and easier to test. It will
also make the logic easier to adjust when we make the following two
changes in service of cockroachdb#118435:
- default to rejecting lease requests on unknown leaders
- perform lease request validation above raft, like we do for lease
  transfers

This second change also ties in with Leader leases (cockroachdb#123847). Until we
make that change, we will occasionally need to propose expiration-based
leases (e.g. when we're not the leader) just for them to be rejected by
the proposal buffer.

Release note: None
craig bot pushed a commit that referenced this issue Jul 15, 2024
127082: kv: don't propose lease acquisition if raft leader is unknown, by default r=nvanbenschoten a=nvanbenschoten

Fixes #118435.
Fixes #120073.

This commit updates the `kv.lease.reject_on_leader_unknown.enabled` cluster setting added in 282d953 to be enabled by default. This setting instructs a Raft follower who is not aware of the current Raft leader to reject lease acquisitions immediately, rather than waiting for the local replica establishing connectivity to its quorum (which may never happen).

In cases where a range truly has no leader, we now rely on the client to retry requests (with backoff) until a new leader has been established.

This behavior is is still configurable with the cluster setting, so we can disable it if we find that it causes issues.

Release note: None

127169: roachtest: fix broken failover test naming r=nvanbenschoten a=nvanbenschoten

failover roachtest names were recently broken by 04e093f, which unintentionally introduced variable aliasing and caused the "/read-only" and "/read-write" strings to be appended to every test name. This commit fixes the names and avoids the variable shadowing to make the code less error-prone.

The roachtest names are now fixed:
```sh
➜ roachtest list ^failover
Listing tests which match regex "^failover" and are compatible with cloud "gce".

failover/chaos/read-only [kv]
failover/chaos/read-only/lease=expiration [kv]
failover/chaos/read-write [kv]
failover/chaos/read-write/lease=expiration [kv]
failover/liveness/blackhole [kv]
failover/liveness/blackhole-recv [kv]
failover/liveness/blackhole-recv/lease=expiration [kv]
failover/liveness/blackhole-send [kv]
failover/liveness/blackhole-send/lease=expiration [kv]
failover/liveness/blackhole/lease=expiration [kv]
failover/liveness/crash [kv]
failover/liveness/crash/lease=expiration [kv]
failover/liveness/deadlock [kv]
failover/liveness/deadlock/lease=expiration [kv]
failover/liveness/disk-stall [kv]
failover/liveness/disk-stall/lease=expiration [kv]
failover/liveness/pause [kv]
failover/liveness/pause/lease=expiration [kv]
failover/non-system/blackhole [kv]
failover/non-system/blackhole-recv [kv]
failover/non-system/blackhole-recv/lease=expiration [kv]
failover/non-system/blackhole-send [kv]
failover/non-system/blackhole-send/lease=expiration [kv]
failover/non-system/blackhole/lease=expiration [kv]
failover/non-system/crash [kv]
failover/non-system/crash/lease=expiration [kv]
failover/non-system/deadlock [kv]
failover/non-system/deadlock/lease=expiration [kv]
failover/non-system/disk-stall [kv]
failover/non-system/disk-stall/lease=expiration [kv]
failover/non-system/pause [kv]
failover/non-system/pause/lease=expiration [kv]
failover/partial/lease-gateway [kv]
failover/partial/lease-gateway/lease=expiration [kv]
failover/partial/lease-leader [kv]
failover/partial/lease-leader/lease=expiration [kv]
failover/partial/lease-liveness [kv]
failover/partial/lease-liveness/lease=expiration [kv]
failover/system-non-liveness/blackhole [kv]
failover/system-non-liveness/blackhole-recv [kv]
failover/system-non-liveness/blackhole-recv/lease=expiration [kv]
failover/system-non-liveness/blackhole-send [kv]
failover/system-non-liveness/blackhole-send/lease=expiration [kv]
failover/system-non-liveness/blackhole/lease=expiration [kv]
failover/system-non-liveness/crash [kv]
failover/system-non-liveness/crash/lease=expiration [kv]
failover/system-non-liveness/deadlock [kv]
failover/system-non-liveness/deadlock/lease=expiration [kv]
failover/system-non-liveness/disk-stall [kv]
failover/system-non-liveness/disk-stall/lease=expiration [kv]
failover/system-non-liveness/pause [kv]
failover/system-non-liveness/pause/lease=expiration [kv]
```

Epic: None
Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig craig bot closed this as completed in 6c784d4 Jul 15, 2024
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Aug 9, 2024
Closes cockroachdb#128642.
Reopens cockroachdb#118435.
Reopens cockroachdb#120073.

This commit is a partial revert of cockroachdb#127082. It switches the default value for
the `kv.lease.reject_on_leader_unknown.enabled` cluster setting back to `false`.
This avoids the regression we see in cockroachdb#128642 and may also help stabilize cockroachdb#127724.

When manually testing range failover with epoch-based leases, we see that the
cluster setting is causing hung raft elections. As hypothesized in cockroachdb#128642, by
redirecting immediately after unquiescing a range, finding a dead leader, and
campaigning (`shouldCampaignOnWake`), instead of just sitting tight and waiting
for a raft leader election to complete, we trigger multiple replicas to campaign
for leadership in quick succession. This leads to a hung election and requires a
new election to be called after an additional (jittered) election timeout. We
don't see this behavior with the setting disabled.

Release note: None
craig bot pushed a commit that referenced this issue Aug 9, 2024
128708: kv: propose lease acquisition if raft leader is unknown, by default r=nvanbenschoten a=nvanbenschoten

Closes #128642.
Reopens #118435.
Reopens #120073.

This commit is a partial revert of #127082. It switches the default value for the `kv.lease.reject_on_leader_unknown.enabled` cluster setting back to `false`. This avoids the regression we see in #128642 and may also help stabilize #127724.

When manually testing range failover with epoch-based leases, we see that the cluster setting is causing hung raft elections. As hypothesized in #128642, by redirecting immediately after unquiescing a range, finding a dead leader, and campaigning (`shouldCampaignOnWake`), instead of just sitting tight and waiting for a raft leader election to complete, we trigger multiple replicas to campaign for leadership in quick succession. This leads to a hung election and requires a new election to be called after an additional (jittered) election timeout. We don't see this behavior with the setting disabled.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@nvanbenschoten nvanbenschoten removed their assignment Aug 12, 2024
@nvanbenschoten
Copy link
Member

In #128642, we found that this caused a regression for quiesced ranges. One idea is to tie the default for this setting to the per-range lease type preference (Replica.desiredLeaseTypeRLocked). We can default this value to true for ranges that don't quiesce (those using expiration and leader leases) and false for ranges that do quiesce (those using epoch-based leases).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. A-leader-leases Related to the introduction of leader leases C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-postmortem Originated from a Postmortem action item. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-3 Issues/test failures with no fix SLA T-kv KV Team
Projects
No open projects
Status: Incoming
Development

Successfully merging a pull request may close this issue.

6 participants