kv,server: implement error serialization support in gRPC #56208
Comments
Is there anything blocking this? This would be great!
+cc @erikgrinaker.
61617: lint: forbid gRPC Status.WithDetails() due to gogoproto issues r=tbg a=erikgrinaker

gRPC's `Status.WithDetails()` allows callers to attach Protobuf-structured details to gRPC errors. Unfortunately, this does not work with gogoproto types, since they're stored in an `Any` field internally and gogo types are not registered in the standard Protobuf type registry. Calling `Details()` with such a type will return an unmarshalling error in place of the detail.

To avoid this problem, this patch adds a linter that disallows any use of `Status.WithDetails()`.

Note that there is a separate package `github.com/gogo/status` that emulates the standard gRPC `status` package using gogoproto Protobufs instead. However, due to the uncertain future of the gogoproto project we are hesitant to introduce additional gogo dependencies.

Related to #56208 and cockroachdb/errors#63.

Release justification: non-production code changes

Release note: None

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
62214: vendor: bump cockroachdb/errors to 1.8.3 r=knz a=erikgrinaker

This fixes the "skippy peanut butter" vulnerability in error Protobufs, includes new functionality for en/decoding gRPC `Status` errors, and recompiles all Protobufs to be compatible with gogoproto 1.2 used by CockroachDB.

Resolves cockroachlabs/support#876, related to #56208.

Release note: None

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
62608: *: improve handling of permanent errors r=tbg a=erikgrinaker

**\*: improve handling of permanent errors**

RPC errors are normally retried. However, in some cases the errors are permanent such that retries are futile, and this can cause operations to appear to hang as they keep retrying -- e.g. when running operations on a decommissioned node.

There is already some detection of permanent errors, but it is incomplete. This patch attempts to improve coverage of permanent errors, in particular in the context of decommissioned nodes, and adds test cases for these scenarios.

Release note: None

**roachpb: propagate gRPC Status errors across Error**

`roachpb.Error` uses `errors.EncodeError()` and `errors.DecodeError()` to preserve the original structured error. Unfortunately, these did not handle gRPC `Status` errors, resulting in a generic unstructured error after decoding. Since this is used while propagating errors through the KV layer, it could prevent detection of the original error type.

Support for gRPC Status encoding was added in cockroachdb/errors 1.8.3, this patch registers the error en/decoder such that these errors are preserved across `roachpb.Error`. A later patch will extend existing error handling code to make better use of these.

Release note: None

Resolves #62233, resolves #61470. Touches #56208.

Decommissioned nodes will keep running a bunch of async processes that try (and fail) to communicate with the cluster. These have not been addressed here, see #62693.

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Can someone (@ajwerner?) please spell out how this issue depends on #56378 (as per #56378 (comment))?
I don't know about #56378. I was just hoping to fix some string matching on grpc errors when I stumbled into this.
@erikgrinaker I think you've completed this, right? Is there still work to do?
Lots of work to do. We will basically have to add Protobuf serialization support for all relevant structured errors, make sure they can reliably traverse gRPC boundaries, and replace all the error string matching we currently do with structured error matching.
We have marked this issue as stale because it has been inactive for |
We currently have code in CRDB that looks like the following:
`cockroach/pkg/cli/error.go`, lines 356 to 364 in 9ba4404
We're essentially string matching on the specific error and using that in our control flow; it makes for fragile code. As of cockroachdb/errors#14, our errors package now has the infrastructure to support error serialization in gRPC. It'd be nice to implement it and start promoting this `errors.Is(...)` usage pattern across RPC boundaries, in the same way we do for "local" errors.

Jira issue: CRDB-2971