-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error codes, the wire protocol, and clients #462
Comments
The current state of mixing error codes from various sources is indeed unfortunate. Since the integer contained in the I agree that backward compatibility shouldn't be a big issue, but there are few codes used in go-dqlite and perhaps LXD so please double check to convert those at the same time. |
Yeah, something like this is what I had in mind when I proposed making the code ranges disjoint.
Noted and will do. |
Here's a concrete proposal for allocating the 64-bit space of return codes that's available for the FAILURE response. Note that we use
|
I think it's fine to assume that Thinking about this a bit more though, I'm not entirely sure that we should expose libuv/raft errors. Ideally the integer returned to the client should be helpful to make decisions about what to do, or help a human understand what went wrong (together with the string error message). Having the wire protocol return, say, Perhaps what we should return is basically a superset of SQLite codes, augmented with codes that are specific to dqlite, either because they are related to wire protocol operations unrelated to SQLite (e.g. If I recall correctly what I described was actually the original design, although the actual implementation didn't quite match it. In general I believe a library or subsystem should abstract away the errors coming from lower layers, the stdlib does that with the kernel/hardware, raft does that with the stdlib/libuv (or should), SQLite does that with the stdlib (it does not "pass through" any error). This is helpful because the library should decide which errors and failure modes fit the model at hand, and ideally also provide documentation about what the client should do for each failure mode (for example if it should retry in some form). Sorry to derail the conversation a bit, just my 2 cents, YMMV. |
Note also that I believe |
I definitely agree that this is the kind of information that should ultimately be returned by the C client library functions. The question is where in the code we do the collapsing from a large, variegated space of error codes down into "SQLite codes plus". I'm not particularly attached to doing this collapsing on the client side, but I think it should happen centrally, in exactly one place in the combined server+client codebase. Everywhere else we should just propagate error codes from other APIs. One reason to do it on the client side is that the client will have its own error conditions, so it already has to do some munging to translate errors into SQLit-ese; if we put the rest of the munging in the same place it might be easier to reason about. I think any error constants we define should start with DQLITE_, not SQLITE_. |
In my mind it's more "layering" than "collapsing" (but maybe it's just a matter of terminology). Let's take a concrete example from the raft library (in static int uvWriterIoSetup(unsigned n, aio_context_t *ctx, char *errmsg)
{
int rv;
rv = UvOsIoSetup(n, ctx);
if (rv != 0) {
switch (rv) {
case UV_EAGAIN:
ErrMsgPrintf(errmsg, "AIO events user limit exceeded");
rv = RAFT_TOOMANY;
break;
default:
UvOsErrMsg(errmsg, "io_setup", rv);
rv = RAFT_IOERR;
break;
}
return rv;
}
return 0;
} In this case the errors happening in the libuv layer get handled in the raft layer: they are mapped into the raft error codes set and augmented with a more specific description. The client code consuming the raft library could then choose to handle If we were returning So, no libuv error codes should be returned by raft, only Similarly, when dqlite hits a raft error, it should do the same as the raft code above is doing when hitting a libuv error: it should handle it and map it to its own error codes (the augmented In this sense there would be no "central" place where a collapsing happens, because each error that dqlite receives from a bottom layer (e.g. raft) needs to be handled on the spot, because that's where we have the context to take a meaningful decision about what dqlite-level error code to return and what error message string to generate.
I think the reasoning above applies to the dqlite client code as well. The errors it receives from the wire-level protocol are basically the equivalent of a lower layer (think dqlite handling raft errors, or raft handling libuv errors), they need to be handled, possibly translating/augmenting them (in case of the wire-level errors most of the time I guess they will be just propagated verbatim, but there might be a few exceptions). On top of that there will be the errors from its own error conditions, as you mention. Even in this case the handling/munging/translation happens right on the spot when the error occurs.
Are you talking about wire protocol error constants or about For the wire protocol I believe we agree that those constants should be a super set of |
My concern with the "layering" approach is that we have to decide (and ideally document) on a function-by-function basis what set of return codes is being used. Then, whenever you call a function from another translation unit (or even another function in the same file), you have to check the semantics of its return code in order to determine whether to propagate it or turn it into something else. Once we've got the conventions established it's not so bad, but getting to that state from the current situation where for many functions it's not immediately clear (to me at least) what return codes are used seems like a non-trivial amount of work. But I take your point about masking errors from lower levels of the implementation that might not be useful to callers. |
In my mind if a function in raft calls a libuv function it should immediately handle and possibly translate the error code it gets. That being said, we have just one exception to that in the code base, that I know of, which is the With the above in mind, there shouldn't be cases where you need to check the semantics of the function to determine the return codes set.
As I said, I think for raft we are already good. It should be apparent by grepping the code for For dqlite, I'm not entirely sure, but I wouldn't expect the situation to be a lot different, perhaps it will need some more love.
I believe the most useful thing for callers is to have a uniform (same error codes set), bounded (you know exactly which errors from that set could be returned and under which conditions) and documented list of failure modes, precisely as the stdlib functions. We'll probably not get to that ideal any time soon, but we can opportunistically move towards that vision. |
PS: I've doubled check in the raft code, by grepping |
Thanks -- I'm getting a clearer picture of the status quo. Would you say that all fallible internal dqlite functions should use either SQLite error codes (plus our dqlite-specific additions) or the dqlite set, i.e. all raft/libuv/stdlib errors should be "masked" before returning from the containing function? (With the idea that dqlite codes will eventually surface as the return value of a dqlite.h function, while SQLite-ish codes will be send over the wire.) What are the principles you'd use to determine which of these two sets a particular function should use? |
Yes, I'd say that's the idea.
I'd distinguish two types of functions, the ones that are being called during the handling of a request (essentially anything called directly or indirectly by The first type should return SQLite-sh error codes because those codes will be encoded in the response and sent to the client (so not surfaced by any The second type should return dqlite-specific error codes, tailored to make sense for the |
Right now, dqlite reports errors to the client through the FAILURE response, which consists of an integer error code and a string. This is a reasonable design, but there are some problems with the implementation:
One thing I'd like to do to start addressing this would be to make the DQLITE_* and RAFT_* error codes disjoint from each other and from the SQLite codes (and the libuv codes), so that we can unambiguously interpret the return value of any fallible function in the codebase. That's a backwards-incompatible change, because these constants appear in public header files, but I don't expect any breakage in practice if we were to implement this. Then we could keep the current design of sending a single error code in the FAILURE response, and the client would have enough information to discern e.g. whether it's an SQLite error that should be passsed back to the caller as-is.
A backwards-compatible alternative, suggested by @stgraber, would be to add some fields to the end of the FAILURE response that hold error codes from a specific, well-defined domain. I don't think any existing clients actually look at the existing, generic error code, but if they do we could migrate them to examining these specific fields instead. If one of the new fields were reserved for an SQLite code, that would make life pretty easy for the C client.
The text was updated successfully, but these errors were encountered: