-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvclient: separate out proxy sending #124066
base: master
Are you sure you want to change the base?
kvclient: separate out proxy sending #124066
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
83b9103
to
931aac6
Compare
931aac6
to
a97a68f
Compare
Note the only test failure was on TestProtectedTimestampManagement which is unrelated and tracked as #124167 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the simplicity of this new approach.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @arulajmani)
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1311 at r1 (raw file):
The proxy node inserts any range information
Can we update this comment to reflect the conditional nature of this insert? We won't replace newer information with stale information, will we?
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1312 at r1 (raw file):
but doesn't use it for routing decisions
It would be useful to give a few examples of what this means in practice. For instance, if the proxy node knows that the range has been split, it will still send the request. Similarly, if it knows that the lease has been moved, it will still send to the proxy-requested leaseholder.
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1312 at r1 (raw file):
// information, the client will update its cache and routing information and // retry locally. The proxy node inserts any range information but doesn't use // it for routing decisions.
Do we have any tests that demonstrate the change in behavior for proxying that this change is making? If not, should we?
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1335 at r1 (raw file):
requestToSend.ProxyRangeInfo = nil requestToSend.Replica = transport.NextReplica() br, err := transport.SendNext(ctx, requestToSend)
Should we be consulting dist sender circuit breakers here, like we do in sendToReplicas
?
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1340 at r1 (raw file):
if err != nil { // Convert the error from an error on the transport to a // ProxyFailedError This can then be sent over the wire and decoded by
"ProxyFailedError This"
missing punctuation.
pkg/kv/kvclient/kvcoord/dist_sender.go
line 2045 at r1 (raw file):
// a sendError. Node.maybeProxyRequest specifically excludes this error from // propagation over the wire and instead return the NLHE from local evaluation. var ProxyFailedWithSendError = kvpb.NewErrorf("proxy request failed with send error")
It looks like this is no longer used. Is that intentional? If so, can be remove it?
a97a68f
to
39f5191
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - It would have been easier if I had done this first, but I'm glad this worked out this way.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1311 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The proxy node inserts any range information
Can we update this comment to reflect the conditional nature of this insert? We won't replace newer information with stale information, will we?
I added some clarification.
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1312 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
but doesn't use it for routing decisions
It would be useful to give a few examples of what this means in practice. For instance, if the proxy node knows that the range has been split, it will still send the request. Similarly, if it knows that the lease has been moved, it will still send to the proxy-requested leaseholder.
I added more context here.
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1312 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we have any tests that demonstrate the change in behavior for proxying that this change is making? If not, should we?
Yes we should - let me try and create a test for this. I am going to focus on just the case where the proxy node has newer information that includes a split and validate the request is not split.
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1335 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we be consulting dist sender circuit breakers here, like we do in
sendToReplicas
?
Good catch. I added dist sender tracking. I didn't pull all of the logic, and I added a TODO to refactor this a little bit. There is a special case that the normal logic handles related to "local" requests since they don't cancel the same way. This doesn't apply to proxy requests.
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1340 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"ProxyFailedError This"
missing punctuation.
Fixed
pkg/kv/kvclient/kvcoord/dist_sender.go
line 2045 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It looks like this is no longer used. Is that intentional? If so, can be remove it?
Removed
39f5191
to
cc134ae
Compare
cc134ae
to
38e0461
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1312 at r1 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
Yes we should - let me try and create a test for this. I am going to focus on just the case where the proxy node has newer information that includes a split and validate the request is not split.
I added a test for this case. I put it as a separate commit for now to validate that it indeed reproduces the issue (with an error because of different proxy results). I can squash the commits or leave them separate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @arulajmani)
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1311 at r1 (raw file):
The proxy node inserts any newer range information but doesn't use it for routing decisions
The meaning of "it" is ambiguous here. Consider splitting into:
The request carries range information, which may be newer, older, or the same as that in the proxy node's range cache. If the request carries newer information than what is known on the proxy node, the proxy node's cache is updated. However, even if the request carries old information, the proxy node's cache is never used to inform routing decisions.
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1335 at r1 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
Good catch. I added dist sender tracking. I didn't pull all of the logic, and I added a TODO to refactor this a little bit. There is a special case that the normal logic handles related to "local" requests since they don't cancel the same way. This doesn't apply to proxy requests.
I'm confused how we're able to get away with not calling replicaCircuitBreakerToken.Done
. Aren't we tracking state in the ReplicaCircuitBreaker
that requires a Done call, like inflightReqs
?
Also, which special case are you referring to, and why does it not apply to proxy requests?
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1313 at r3 (raw file):
// retry locally. The proxy node inserts any newer range information but doesn't // use it for routing decisions, outated information is ignored. The transport // is built used based on the information in the request not the local cache. As
"build used based"
pkg/kv/kvclient/kvcoord/dist_sender_test.go
line 6307 at r2 (raw file):
// TestProxyRequestDoesNotSplit tests that a proxy request does not split even // if the proxy node would normally split the request . Spliting the request can
nit: remove space after request.
pkg/kv/kvclient/kvcoord/dist_sender_test.go
line 6361 at r2 (raw file):
// Track how many times the transport is called. sendCount := atomic.Int64{}
Consider splitting sendCount
into a count of non-proxied requests and proxied requests (var sendCount, proxyCount atomic.Int32
). That way, you'll have sendCount.Load() == 0
in the main part of this test, which would avoid confused by readers.
If a proxy request is against a range which is subsequently split, the request should still follow what the original client requested rather than the updated cache. This prevents complex errors from split requests as they can't be processed correctly. Informs: cockroachdb#123965 Release note: None
Previously in DistSender, a proxy request was handled using the same logic within sendPartialBatch and sendToReplicas. There were short circuits to handle the different cases of retries, but in cases with changing range boundaries, it could return the wrong error. This change intercepts proxy request at the start of DistSender.Send and sends to the transport bypassing the rest of the DistSender retry logic. This simplifies the code for proxy requests. Epic: none Fixes: cockroachdb#123965 Informs: cockroachdb#123146 Release note: None
38e0461
to
85595c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1311 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The proxy node inserts any newer range information but doesn't use it for routing decisions
The meaning of "it" is ambiguous here. Consider splitting into:
The request carries range information, which may be newer, older, or the same as that in the proxy node's range cache. If the request carries newer information than what is known on the proxy node, the proxy node's cache is updated. However, even if the request carries old information, the proxy node's cache is never used to inform routing decisions.
Thanks, I rewrite this entire comment, so please take one more look through it. There is a lot of subtly here so I agree its important to make it clear.
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1335 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm confused how we're able to get away with not calling
replicaCircuitBreakerToken.Done
. Aren't we tracking state in theReplicaCircuitBreaker
that requires a Done call, likeinflightReqs
?Also, which special case are you referring to, and why does it not apply to proxy requests?
You're right! I do need to call Done here to prevent a leak. This code is somewhat confusing because of the way transport.SendNext
handles cancellation. I originally thought that Done was only necessary to handle the difference between local and remote evaluation, but is also needed to prevent the request leak.
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1313 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"build used based"
Fixed
pkg/kv/kvclient/kvcoord/dist_sender_test.go
line 6307 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: remove space after request.
Done
pkg/kv/kvclient/kvcoord/dist_sender_test.go
line 6361 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Consider splitting
sendCount
into a count of non-proxied requests and proxied requests (var sendCount, proxyCount atomic.Int32
). That way, you'll havesendCount.Load() == 0
in the main part of this test, which would avoid confused by readers.
I made the reset explicit. It is hard to cleanly determine in the transport whether it is a proxy request or not and rather than trying to parse it out, it is simpler and more resilient to explicitly reset it.
Previously in DistSender, a proxy request was handled using the same logic within sendPartialBatch and sendToReplicas. There were short circuits to handle the different cases of retries, but in cases with changing range boundaries, it could return the wrong error. This change intercepts proxy request at the start of DistSender.Send and sends to the transport bypassing the rest of the DistSender retry logic. This simplifies the code for proxy requests.
Epic: none
Fixes: #123965
Informs: #123146
Release note: None