Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv,server: implement error serialization support in gRPC #56208

Open
irfansharif opened this issue Nov 2, 2020 · 7 comments
Open

kv,server: implement error serialization support in gRPC #56208

irfansharif opened this issue Nov 2, 2020 · 7 comments
Labels
A-cc-enablement Pertains to current CC production issues or short-term projects A-error-handling Error messages, error propagations/annotations A-kv-server Relating to the KV-level RPC server A-server-networking Pertains to network addressing,routing,initialization C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. T-kv KV Team T-server-and-security DB Server & Security

Comments

@irfansharif
Copy link
Contributor

irfansharif commented Nov 2, 2020

We currently have code in CRDB that looks like the following:

cockroach/pkg/cli/error.go

Lines 356 to 364 in 9ba4404

// Are we trying to re-initialize an initialized cluster?
if strings.Contains(err.Error(), server.ErrClusterInitialized.Error()) {
// We really want to use errors.Is() here but this would require
// error serialization support in gRPC.
// This is not yet performed in CockroachDB even though the error
// library now has infrastructure to do so, see:
// https://github.com/cockroachdb/errors/pull/14
return server.ErrClusterInitialized
}

We're essentially string matching on the specific error and using that in our control flow; it makes for fragile code. As of cockroachdb/errors#14, our errors package now has the infrastructure to support error serialization in gRPC. It'd be nice to implement it and start promoting this errors.Is(...) usage pattern across RPC boundaries, in the same way we do for "local" errors.

Jira issue: CRDB-2971

@irfansharif irfansharif added C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. A-error-handling Error messages, error propagations/annotations labels Nov 2, 2020
@irfansharif irfansharif added this to To do in DB Server & Security via automation Nov 2, 2020
@knz knz moved this from To do to Tech debt in DB Server & Security Nov 18, 2020
@ajwerner
Copy link
Contributor

Is there anything blocking this? This would be great!

@irfansharif
Copy link
Contributor Author

+cc @erikgrinaker.

@erikgrinaker erikgrinaker self-assigned this Mar 3, 2021
craig bot pushed a commit that referenced this issue Mar 8, 2021
61617: lint: forbid gRPC Status.WithDetails() due to gogoproto issues r=tbg a=erikgrinaker

gRPC's `Status.WithDetails()` allows callers to attach
Protobuf-structured details to gRPC errors. Unfortunately, this does not
work with gogoproto types, since they're stored in an `Any` field
internally and gogo types are not registered in the standard Protobuf
type registry. Calling `Details()` with such a type will return an
unmarshalling error in place of the detail.

To avoid this problem, this patch adds a linter that disallows any use of
`Status.WithDetails()`.

Note that there is a separate package `github.com/gogo/status` that
emulates the standard gRPC `status` package using gogoproto Protobufs
instead. However, due to the uncertain future of the gogoproto project
we are hesitant to introduce additional gogo dependencies.

Related to #56208 and cockroachdb/errors#63.

Release justification: non-production code changes
Release note: None

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Mar 19, 2021
62214: vendor: bump cockroachdb/errors to 1.8.3 r=knz a=erikgrinaker

This fixes the "skippy peanut butter" vulnerability in error Protobufs,
includes new functionality for en/decoding gRPC `Status` errors, and
recompiles all Protobufs to be compatible with gogoproto 1.2 used by
CockroachDB.

Resolves cockroachlabs/support#876, related to #56208.

Release note: None

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Mar 31, 2021
62608: *: improve handling of permanent errors r=tbg a=erikgrinaker

***: improve handling of permanent errors**

RPC errors are normally retried. However, in some cases the errors are
permanent such that retries are futile, and this can cause operations to
appear to hang as they keep retrying -- e.g. when running operations on
a decommissioned node. There is already some detection of permanent
errors, but it is incomplete.

This patch attempts to improve coverage of permanent errors, in
particular in the context of decommissioned nodes, and adds test cases
for these scenarios.

Release note: None

**roachpb: propagate gRPC Status errors across Error**

`roachpb.Error` uses `errors.EncodeError()` and `errors.DecodeError()`
to preserve the original structured error. Unfortunately, these did not
handle gRPC `Status` errors, resulting in a generic unstructured error
after decoding. Since this is used while propagating errors through the
KV layer, it could prevent detection of the original error type.

Support for gRPC Status encoding was added in cockroachdb/errors 1.8.3,
this patch registers the error en/decoder such that these errors are
preserved across `roachpb.Error`. A later patch will extend existing
error handling code to make better use of these.

Release note: None

Resolves #62233, resolves #61470. Touches #56208.

Decommissioned nodes will keep running a bunch of async processes that
try (and fail) to communicate with the cluster. These have not been
addressed here, see #62693.

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
@andreimatei
Copy link
Contributor

andreimatei commented Apr 4, 2021

Can someone (@ajwerner ?) please spell out how this issue depends on #56378 (as per #56378 (comment)) ?

@ajwerner
Copy link
Contributor

ajwerner commented Apr 5, 2021

I don't know about #56378. I was just hoping to fix some string matching on grpc errors when I stumbled into this.

@knz
Copy link
Contributor

knz commented May 28, 2021

@erikgrinaker I think you've completed this right? Is there still work to do?

@erikgrinaker
Copy link
Contributor

Lots of work to do. We will basically have to add Protobuf serialization support for all relevant structured errors, make sure they can reliably traverse gRPC boundaries, and replace all the error string matching we currently do with structured error matching.

@knz knz added this to Incoming in KV via automation May 28, 2021
@knz knz changed the title server: implement error serialization support in gRPC kv,server: implement error serialization support in gRPC May 28, 2021
@knz knz added the A-kv-server Relating to the KV-level RPC server label May 28, 2021
@jlinder jlinder added T-kv KV Team T-server-and-security DB Server & Security labels Jun 16, 2021
@knz knz added A-server-networking Pertains to network addressing,routing,initialization A-cc-enablement Pertains to current CC production issues or short-term projects labels Jul 29, 2021
@lunevalex lunevalex removed this from Incoming in KV Oct 1, 2021
@github-actions
Copy link

github-actions bot commented Sep 6, 2023

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cc-enablement Pertains to current CC production issues or short-term projects A-error-handling Error messages, error propagations/annotations A-kv-server Relating to the KV-level RPC server A-server-networking Pertains to network addressing,routing,initialization C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. T-kv KV Team T-server-and-security DB Server & Security
Projects
DB Server & Security
  
Tech debt & Grouping attempts
Development

No branches or pull requests

6 participants