sql: clarify DB error messages that seep through the client connection #4421
Comments
I think we should first spend some time figuring out exactly when these messages occur before we settle on a better, more informative phrasing for them. Also, in some cases it might be possible to hide them entirely by automatically retrying the transaction. @andreimatei, care to comment?
I think we've already decided (?) to expose all transaction retry/abort On Tue, Feb 16, 2016 at 5:20 PM kena notifications@github.com wrote:
@tschottdorf I'm not sure I fully understand your answer. Regardless of which syntax we use for error messages, they still need to be simplified and documented. Right now (correct me if I am wrong) we have neither a proper user-level explanation for each of these messages nor a summary of what the user-programmer should do in response to each of them. IMHO this description work should precede any discussion about which format to report the errors in.
@knz yes, just a heads up that the errors you collected will all melt into one (which then needs documentation). For example, I just scanned your list again and these two are "bug territory":
^- looks like an honest bug. Can you repro that?
^- this is #3087, which I haven't been able to track down (failure to repro).
After upgrading to a more recent version I can't find this specific
is that similar/equivalent?
I get this one when I forget to reset my clock drifting before starting It hasn't been happening as much lately (my code has gotten better at
What I'm planning on doing is expose all the retry-able errors (mostly printed with "pq: retry" in the CLI) and some non-retry-able errors (from the KV perspective) (I think the ones printed with "pq: txn aborted") as a single error message, with a single error code, to the user. And then the user can issue something like So let's go through your list again:
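To make the "single error message, single error code" idea concrete, here is a minimal client-side sketch of what reacting to it could look like. Everything in it is an assumption for illustration: `TxnError`, `RETRY_CODE`, and `run_with_retry` are hypothetical names, and the code value `"40001"` (the standard SQLSTATE for serialization_failure) is only a placeholder, not something decided in this thread.

```python
import time

# Assumption: all retryable failures reach the client under one code.
# "40001" is a placeholder value, not a code chosen in this discussion.
RETRY_CODE = "40001"


class TxnError(Exception):
    """Hypothetical stand-in for a driver error that carries an error code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def run_with_retry(txn_fn, max_attempts=5, base_delay=0.0):
    """Call txn_fn, retrying only when it fails with RETRY_CODE.

    Any other error, or a retryable error on the last attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except TxnError as err:
            if err.code != RETRY_CODE or attempt == max_attempts:
                raise
            # Exponential backoff between attempts (no wait when base_delay is 0).
            time.sleep(base_delay * 2 ** (attempt - 1))
```

With one code to check, the client needs no knowledge of the individual internal messages: it either retries the whole transaction or reports the error.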
Ah, ok. If you're letting your clocks drift close to or above the offset, then that's an expected error. In that issue, it happens during Docker acceptance tests, where all nodes share the same clock - no bueno.
This is what happens if the (central) transaction record indicates that you need to restart. It's a bit unusual to see in practice unless the transaction is taking a lot of time and isn't doing much - for example,
I actually know how this can happen, but I would still classify it as a bug: if the first write in a txn is multi-range and fails retryably on the second range, the command will fail with that error on the first range (since the previous attempt was actually successful there, so we already have a transaction table entry from BeginTransaction). I believe this should not reoccur once #4443 is fixed.
(While looking at #4036)
New error messages cropping up recently:
(These happen when nodes become unreachable on the network during transactions; see also #5452.)
@knz, @andreimatei I moved this off of the 1.0 milestone. If you disagree, please comment on how this work will be scheduled and reset the 1.0 milestone.
We've made huge strides with this over the last 3 years; closing.
As mentioned in today's meeting we should document the messages that SQL users may see through their connection. Right now the messages convey a lot of debugging information about the DB internals, although in many cases the proper course of action is to simply say "retry the transaction". So really what we should do is:
For point (1) it was suggested during the meeting to start by making an inventory. Here is what I directly experienced during my work on #4036:
This happens when the server has already reported an error and the client tries to send more queries instead of closing the failed transaction.
This happens when the cluster is not "on time" (with a big offset).
This happens when the cluster is not "on time" (with a small offset).