kvcoord: retry ReplicaUnavailableError against other replicas #118737
Conversation
I have to look into what the error type addition means for backports. It's a bit unfortunate, but I don't know if it can be avoided. @nvanbenschoten I don't think your idea of rejecting lease requests if we're not the leader will work with replica circuit breakers in their current form, see #118435 (comment).
Force-pushed from 47ba91e to e2f1316.
This shouldn't affect backports, afaict. We just implement
Reviewed 1 of 1 files at r1, 3 of 3 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/dist_sender.go
line 2673 at r3 (raw file):
```go
if rue := (*kvpb.ReplicaUnavailableError)(nil); errors.As(err, &rue) {
	if lh.IsSame(rue.Replica) {
		return nil, replicaUnavailableError
```
There is a problematic case worth considering as it could be common.
Initial state - r1-r5 healthy, r2 is the leaseholder, and r3 is lagging.
Fault: r2 fails (becomes unavailable)
New state:
r1 - leaseholder
r2 - unavailable
r3 - lagging
r4, r5 - not-lagging
Client has ordered replicas as r2, r3, ...
Hits r2 - gets replica unavailable
Hits r3 - gets NotLeaseholder with leaseholder = r2.
Client hits this check and returns the `ReplicaUnavailableError`. Future requests will hit the same replicas (r2, r3) and keep returning the error until r3 catches up.
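To make the loop concrete, here is a toy, self-contained Go model of this scenario (all names and the transport ordering are illustrative assumptions, not the actual DistSender code); it shows how the check quoted above gives up before ever reaching the healthy replicas:

```go
package main

import "fmt"

// Per-replica behavior in the scenario above: r2's breaker has
// tripped, r3 is lagging and still believes r2 holds the lease.
type response int

const (
	success response = iota
	replicaUnavailable
	nlhePointingAtR2
)

func trySend(r int) response {
	switch r {
	case 2:
		return replicaUnavailable
	case 3:
		return nlhePointingAtR2
	default:
		return success
	}
}

func send(order []int) string {
	sawRUE := false
	for _, r := range order {
		switch trySend(r) {
		case replicaUnavailable:
			sawRUE = true // remember the RUE, try the next replica
		case nlhePointingAtR2:
			if sawRUE {
				// The NLHE points back at the replica that already
				// returned a RUE, so the pre-fix check returns the RUE
				// without ever trying the healthy r1, r4, r5.
				return "ReplicaUnavailableError"
			}
		case success:
			return fmt.Sprintf("success from r%d", r)
		}
	}
	return "transport exhausted"
}

func main() {
	// The client's cached ordering starts with the failed leaseholder.
	fmt.Println(send([]int{2, 3, 1, 4, 5})) // -> ReplicaUnavailableError
}
```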
pkg/kv/kvclient/kvcoord/dist_sender.go
line 2941 at r3 (raw file):
```go
return noMoreReplicasErr(
	ambiguousError,
	nil, // ignore the replicaUnavailableError, retry with new routing info
```
I'm not sure if this matters, but should we also clear the `replicaUnavailableError` since we are re-creating the routing? I'm not sure what the implications of not clearing it would be.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/dist_sender.go
line 2673 at r3 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
There is a problematic case worth considering as it could be common.
Initial state - r1-r5 healthy, r2 is the leaseholder, and r3 is lagging.
Fault: r2 fails (becomes unavailable)
New state:
r1 - leaseholder
r2 - unavailable
r3 - lagging
r4, r5 - not-lagging
Client has ordered replicas as r2, r3, ...
Hits r2 - gets replica unavailable
Hits r3 - gets NotLeaseholder with leaseholder = r2.
Client hits this check and returns the `ReplicaUnavailableError`. Future requests will hit the same replicas (r2, r3) and keep returning the error until r3 catches up.
I didn't think this case would be any different from what already happens with other errors, e.g. in the case of a dead node where we'll hit a connection refused. But I think it is, because `MoveToFront()` will simply swap the assumed leaseholder with the next pending replica. So if we hit r3 -> r2, and r2 fails again, we'll move on to r4 instead of spinning on r3.
There is another way we could handle this: ignore NLHEs that point back to a RUE, and instead keep trying replicas until we run out. I don't think this should end up in NLHE loops either (if nodes disagree on the NLHE leaseholder), but I should double-check.
Does that sound reasonable? I'll update the PR when I get a chance.
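A minimal sketch of that proposed check, using toy stand-in types (the real `kvpb` error types and DistSender state are more involved):

```go
// Toy stand-ins for the real error types; names and fields here are
// illustrative assumptions, not the actual CockroachDB API.
type notLeaseHolderError struct{ leaseholder int }
type replicaUnavailableError struct{ replica int }

// shouldSkipNLHE reports whether a NotLeaseHolderError should be
// ignored because it points back at the replica that already failed
// with a ReplicaUnavailableError; if so, we keep walking the transport
// instead of chasing a stale lease hint.
func shouldSkipNLHE(nlhe *notLeaseHolderError, rue *replicaUnavailableError) bool {
	return rue != nil && nlhe.leaseholder == rue.replica
}
```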
pkg/kv/kvclient/kvcoord/dist_sender.go
line 2941 at r3 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
I'm not sure if this matters, but should we also clear the `replicaUnavailableError` since we are re-creating the routing? I'm not sure what the implications of not clearing it would be.
We're in a different scope, so we can't clear it here. It doesn't matter either, since the error will be returned back up past `sendToReplicas`, and we'll re-enter it with a new, empty `replicaUnavailableError`.
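In other words, the variable is a local of each `sendToReplicas` invocation. A schematic sketch (the real function takes many more parameters):

```go
// Schematic only: illustrates the scoping, not the real signature.
func sendToReplicas( /* ctx, ba, routing, ... */ ) error {
	// Fresh on every (re-)entry, including after a routing refresh, so
	// a ReplicaUnavailableError recorded in a previous invocation can
	// never leak into the next pass over the replicas.
	var replicaUnavailableError error
	_ = replicaUnavailableError
	// ... per-replica send loop ...
	return nil
}
```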
Force-pushed from e2f1316 to f94e018.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/dist_sender.go
line 2673 at r3 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
I didn't think this case would be any different from what already happens with other errors, e.g. in the case of a dead node where we'll hit a connection refused. But I think it is, because `MoveToFront()` will simply swap the assumed leaseholder with the next pending replica. So if we hit r3 -> r2, and r2 fails again, we'll move on to r4 instead of spinning on r3.
There is another way we could handle this: ignore NLHEs that point back to a RUE, and instead keep trying replicas until we run out. I don't think this should end up in NLHE loops either (if nodes disagree on the NLHE leaseholder), but I should double-check.
Does that sound reasonable? I'll update the PR when I get a chance.
Updated the PR to simply skip NLHEs pointing back to a RUE. I believe this should avoid retry loops, since we only update the leaseholder with newer generations and don't retry replicas that haven't had an NLHE point to them.
Nice catch, thanks!
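A toy sketch of the generation guard mentioned above (field names are assumptions; the real lease bookkeeping tracks sequences and epochs):

```go
// Toy lease descriptor; only the fields needed for the illustration.
type lease struct {
	replica    int   // purported leaseholder's replica ID
	generation int64 // monotonically increasing lease generation
}

// maybeUpdateLeaseholder accepts a suggested leaseholder only if the
// hint is newer than what we already know. Because generations only
// move forward, two replicas with stale, mutually pointing hints
// cannot bounce the client back and forth indefinitely.
func maybeUpdateLeaseholder(known *lease, hint lease) bool {
	if hint.generation <= known.generation {
		return false // stale or duplicate hint: ignore it
	}
	*known = hint
	return true
}
```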
Yeah, I never feel particularly confident about changes here, and I forget how it works between each time I touch it. Could definitely do with a better test harness. TFTR!
bors r+
Hm, I'll have to double-check the interactions with AmbiguousError here -- I think we have to wrap these as ambiguous if they are commits, since the breaker can trip while the command is in flight.
bors r-
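A hedged sketch of that wrapping (toy code; the `isCommit` flag and error shape are illustrative assumptions): once a committing batch has been sent, a breaker that trips mid-flight cannot tell us whether the command applied, so the failure must surface as ambiguous rather than definite.

```go
package sketch // a standalone illustration, not DistSender code

import "fmt"

// wrapIfAmbiguousCommit converts a replica-unavailable failure on a
// committing batch into an ambiguous result: the command may or may
// not have applied before the breaker tripped.
func wrapIfAmbiguousCommit(err error, isCommit bool) error {
	if err == nil || !isCommit {
		return err
	}
	return fmt.Errorf("result is ambiguous: %w", err)
}
```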
Canceled.
Epic: none
Release note: None
This allows accessing the error via `kvpb.Error.GetDetail()`.
Epic: none
Release note: None
Previously, a `ReplicaUnavailableError` returned by a tripped replica circuit breaker was not retried by the DistSender, instead failing fast back to the client. This was sort of by design, since these errors are intended to avoid clients getting stuck in infinite retry loops.

However, a single replica having a tripped circuit breaker does not imply that the range is unavailable -- it's possible that e.g. this replica is partially partitioned away from the quorum, or is a stale replica waiting to be caught up. If a DistSender attempts this replica first, e.g. because it was the last-known leaseholder or it's the lowest-latency replica, then it will always error out, giving the appearance of an unavailable range to the client.

This patch adds DistSender retry handling of `ReplicaUnavailableError`, which attempts to contact other replicas in case a functional lease exists elsewhere, while also taking care to avoid retry loops where other replicas return a `NotLeaseHolderError` pointing back to the unavailable replica.

Epic: none

Release note (bug fix): if an individual replica's circuit breaker had tripped but the range was otherwise functional, e.g. because the replica was partially partitioned away from the leaseholder, it was possible for a gateway to persistently error when contacting this replica instead of retrying against a functional leaseholder elsewhere. The gateway will now retry such errors against other replicas once.
Force-pushed from f94e018 to 2523c1c.
Indeed, updated the PR to mark these as ambiguous when used with commits.
bors r+
Build succeeded:
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.

error creating merge commit from d83eed2 to blathers/backport-release-23.1-118737: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

You may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.

🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is dev-inf.
kvserver: add test cases for `ReplicaUnavailableError` mark

kvpb: add error type for `ReplicaUnavailableError`

This allows accessing the error via `kvpb.Error.GetDetail()`.

kvcoord: retry `ReplicaUnavailableError` against other replicas

Previously, a `ReplicaUnavailableError` returned by a tripped replica circuit breaker was not retried by the DistSender, instead failing fast back to the client. This was sort of by design, since these errors are intended to avoid clients getting stuck in infinite retry loops.

However, a single replica having a tripped circuit breaker does not imply that the range is unavailable -- it's possible that e.g. this replica is partially partitioned away from the quorum, or is a stale replica waiting to be caught up. If a DistSender attempts this replica first, e.g. because it was the last-known leaseholder or it's the lowest-latency replica, then it will always error out, giving the appearance of an unavailable range to the client.

This patch adds DistSender retry handling of `ReplicaUnavailableError`, which attempts to contact other replicas in case a functional lease exists elsewhere, while also taking care to avoid retry loops where other replicas return a `NotLeaseHolderError` pointing back to the unavailable replica.

Resolves #118266.
Resolves #105167.

Epic: none

Release note (bug fix): if an individual replica's circuit breaker had tripped but the range was otherwise functional, e.g. because the replica was partially partitioned away from the leaseholder, it was possible for a gateway to persistently error when contacting this replica instead of retrying against a functional leaseholder elsewhere. The gateway will now retry such errors against other replicas once.
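To contrast with the pre-fix model shown earlier in the conversation, here is a toy, self-contained Go sketch of the retry policy this description outlines (all names are illustrative stand-ins, not the actual DistSender implementation): a RUE is remembered rather than returned, NLHEs pointing back at it are ignored, and the RUE only surfaces once the transport is exhausted.

```go
package main

import "fmt"

type response struct {
	ok          bool
	unavailable bool // ReplicaUnavailableError from a tripped breaker
	leaseHint   int  // nonzero: NLHE pointing at this replica
}

// trySend models each replica's behavior in the scenario from the
// review discussion: r2's breaker has tripped, r3 lags with a stale
// lease hint, everyone else is healthy.
func trySend(r int) response {
	switch r {
	case 2:
		return response{unavailable: true}
	case 3:
		return response{leaseHint: 2}
	default:
		return response{ok: true}
	}
}

func send(order []int) string {
	rueReplica := 0 // replica that returned a RUE, if any
	for _, r := range order {
		res := trySend(r)
		switch {
		case res.ok:
			return fmt.Sprintf("success from r%d", r)
		case res.unavailable:
			rueReplica = r // remember it, keep trying other replicas
		case res.leaseHint != 0 && res.leaseHint == rueReplica:
			// The NLHE points back at the unavailable replica: ignore
			// the stale hint and keep walking the transport.
		}
	}
	return "transport exhausted: return the ReplicaUnavailableError"
}

func main() {
	// Same cached ordering as before; the request now reaches r1.
	fmt.Println(send([]int{2, 3, 1, 4, 5})) // -> success from r1
}
```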