Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvclient: separate out proxy sending #124066

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

andrewbaptist
Copy link
Collaborator

@andrewbaptist andrewbaptist commented May 13, 2024

Previously in DistSender, a proxy request was handled using the same logic within sendPartialBatch and sendToReplicas. There were short circuits to handle the different cases of retries, but in cases with changing range boundaries, it could return the wrong error. This change intercepts proxy request at the start of DistSender.Send and sends to the transport bypassing the rest of the DistSender retry logic. This simplifies the code for proxy requests.

Epic: none
Fixes: #123965
Informs: #123146

Release note: None

Copy link

blathers-crl bot commented May 13, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andrewbaptist andrewbaptist marked this pull request as ready for review May 13, 2024 23:04
@andrewbaptist andrewbaptist requested a review from a team as a code owner May 13, 2024 23:04
@andrewbaptist
Copy link
Collaborator Author

Note the only test failure was on TestProtectedTimestampManagement which is unrelated and tracked as #124167

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the simplicity of this new approach.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @arulajmani)


pkg/kv/kvclient/kvcoord/dist_sender.go line 1311 at r1 (raw file):

The proxy node inserts any range information

Can we update this comment to reflect the conditional nature of this insert? We won't replace newer information with stale information, will we?


pkg/kv/kvclient/kvcoord/dist_sender.go line 1312 at r1 (raw file):

but doesn't use it for routing decisions

It would be useful to give a few examples of what this means in practice. For instance, if the proxy node knows that the range has been split, it will still send the request. Similarly, if it knows that the lease has been moved, it will still send to the proxy-requested leaseholder.


pkg/kv/kvclient/kvcoord/dist_sender.go line 1312 at r1 (raw file):

// information, the client will update its cache and routing information and
// retry locally. The proxy node inserts any range information but doesn't use
// it for routing decisions.

Do we have any tests that demonstrate the change in behavior for proxying that this change is making? If not, should we?


pkg/kv/kvclient/kvcoord/dist_sender.go line 1335 at r1 (raw file):

	requestToSend.ProxyRangeInfo = nil
	requestToSend.Replica = transport.NextReplica()
	br, err := transport.SendNext(ctx, requestToSend)

Should we be consulting dist sender circuit breakers here, like we do in sendToReplicas?


pkg/kv/kvclient/kvcoord/dist_sender.go line 1340 at r1 (raw file):

	if err != nil {
		// Convert the error from an error on the transport to a
		// ProxyFailedError This can then be sent over the wire and decoded by

"ProxyFailedError This"

missing punctuation.


pkg/kv/kvclient/kvcoord/dist_sender.go line 2045 at r1 (raw file):

// a sendError. Node.maybeProxyRequest specifically excludes this error from
// propagation over the wire and instead return the NLHE from local evaluation.
var ProxyFailedWithSendError = kvpb.NewErrorf("proxy request failed with send error")

It looks like this is no longer used. Is that intentional? If so, can be remove it?

Copy link
Collaborator Author

@andrewbaptist andrewbaptist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - It would have been easier if I had done this first, but I'm glad this worked out this way.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)


pkg/kv/kvclient/kvcoord/dist_sender.go line 1311 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The proxy node inserts any range information

Can we update this comment to reflect the conditional nature of this insert? We won't replace newer information with stale information, will we?

I added some clarification.


pkg/kv/kvclient/kvcoord/dist_sender.go line 1312 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

but doesn't use it for routing decisions

It would be useful to give a few examples of what this means in practice. For instance, if the proxy node knows that the range has been split, it will still send the request. Similarly, if it knows that the lease has been moved, it will still send to the proxy-requested leaseholder.

I added more context here.


pkg/kv/kvclient/kvcoord/dist_sender.go line 1312 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we have any tests that demonstrate the change in behavior for proxying that this change is making? If not, should we?

Yes we should - let me try and create a test for this. I am going to focus on just the case where the proxy node has newer information that includes a split and validate the request is not split.


pkg/kv/kvclient/kvcoord/dist_sender.go line 1335 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we be consulting dist sender circuit breakers here, like we do in sendToReplicas?

Good catch. I added dist sender tracking. I didn't pull all of the logic, and I added a TODO to refactor this a little bit. There is a special case that the normal logic handles related to "local" requests since they don't cancel the same way. This doesn't apply to proxy requests.


pkg/kv/kvclient/kvcoord/dist_sender.go line 1340 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"ProxyFailedError This"

missing punctuation.

Fixed


pkg/kv/kvclient/kvcoord/dist_sender.go line 2045 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It looks like this is no longer used. Is that intentional? If so, can be remove it?

Removed

@andrewbaptist andrewbaptist requested a review from a team as a code owner June 17, 2024 19:15
Copy link
Collaborator Author

@andrewbaptist andrewbaptist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)


pkg/kv/kvclient/kvcoord/dist_sender.go line 1312 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

Yes we should - let me try and create a test for this. I am going to focus on just the case where the proxy node has newer information that includes a split and validate the request is not split.

I added a test for this case. I put it as a separate commit for now to validate that it indeed reproduces the issue (with an error because of different proxy results). I can squash the commits or leave them separate.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @arulajmani)


pkg/kv/kvclient/kvcoord/dist_sender.go line 1311 at r1 (raw file):

The proxy node inserts any newer range information but doesn't use it for routing decisions

The meaning of "it" is ambiguous here. Consider splitting into:

The request carries range information, which may be newer, older, or the same as that in the proxy node's range cache. If the request carries newer information than what is known on the proxy node, the proxy node's cache is updated. However, even if the request carries old information, the proxy node's cache is never used to inform routing decisions.


pkg/kv/kvclient/kvcoord/dist_sender.go line 1335 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

Good catch. I added dist sender tracking. I didn't pull all of the logic, and I added a TODO to refactor this a little bit. There is a special case that the normal logic handles related to "local" requests since they don't cancel the same way. This doesn't apply to proxy requests.

I'm confused how we're able to get away with not calling replicaCircuitBreakerToken.Done. Aren't we tracking state in the ReplicaCircuitBreaker that requires a Done call, like inflightReqs?

Also, which special case are you referring to, and why does it not apply to proxy requests?


pkg/kv/kvclient/kvcoord/dist_sender.go line 1313 at r3 (raw file):

// retry locally. The proxy node inserts any newer range information but doesn't
// use it for routing decisions, outated information is ignored. The transport
// is built used based on the information in the request not the local cache. As

"build used based"


pkg/kv/kvclient/kvcoord/dist_sender_test.go line 6307 at r2 (raw file):

// TestProxyRequestDoesNotSplit tests that a proxy request does not split even
// if the proxy node would normally split the request . Spliting the request can

nit: remove space after request.


pkg/kv/kvclient/kvcoord/dist_sender_test.go line 6361 at r2 (raw file):

	// Track how many times the transport is called.
	sendCount := atomic.Int64{}

Consider splitting sendCount into a count of non-proxied requests and proxied requests (var sendCount, proxyCount atomic.Int32). That way, you'll have sendCount.Load() == 0 in the main part of this test, which would avoid confused by readers.

If a proxy request is against a range which is subsequently split, the
request should still follow what the original client requested rather
than the updated cache. This prevents complex errors from split requests
as they can't be processed correctly.

Informs: cockroachdb#123965

Release note: None
Previously in DistSender, a proxy request was handled using the same
logic within sendPartialBatch and sendToReplicas. There were short
circuits to handle the different cases of retries, but in cases with
changing range boundaries, it could return the wrong error. This change
intercepts proxy request at the start of DistSender.Send and sends to
the transport bypassing the rest of the DistSender retry logic. This
simplifies the code for proxy requests.

Epic: none
Fixes: cockroachdb#123965
Informs: cockroachdb#123146

Release note: None
Copy link
Collaborator Author

@andrewbaptist andrewbaptist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)


pkg/kv/kvclient/kvcoord/dist_sender.go line 1311 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The proxy node inserts any newer range information but doesn't use it for routing decisions

The meaning of "it" is ambiguous here. Consider splitting into:

The request carries range information, which may be newer, older, or the same as that in the proxy node's range cache. If the request carries newer information than what is known on the proxy node, the proxy node's cache is updated. However, even if the request carries old information, the proxy node's cache is never used to inform routing decisions.

Thanks, I rewrite this entire comment, so please take one more look through it. There is a lot of subtly here so I agree its important to make it clear.


pkg/kv/kvclient/kvcoord/dist_sender.go line 1335 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm confused how we're able to get away with not calling replicaCircuitBreakerToken.Done. Aren't we tracking state in the ReplicaCircuitBreaker that requires a Done call, like inflightReqs?

Also, which special case are you referring to, and why does it not apply to proxy requests?

You're right! I do need to call Done here to prevent a leak. This code is somewhat confusing because of the way transport.SendNext handles cancellation. I originally thought that Done was only necessary to handle the difference between local and remote evaluation, but is also needed to prevent the request leak.


pkg/kv/kvclient/kvcoord/dist_sender.go line 1313 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"build used based"

Fixed


pkg/kv/kvclient/kvcoord/dist_sender_test.go line 6307 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: remove space after request.

Done


pkg/kv/kvclient/kvcoord/dist_sender_test.go line 6361 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Consider splitting sendCount into a count of non-proxied requests and proxied requests (var sendCount, proxyCount atomic.Int32). That way, you'll have sendCount.Load() == 0 in the main part of this test, which would avoid confused by readers.

I made the reset explicit. It is hard to cleanly determine in the transport whether it is a proxy request or not and rather than trying to parse it out, it is simpler and more resilient to explicitly reset it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql/logictest: TestParallel failed
3 participants