Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: Txn.CleanupOnError logs if transaction is already aborted #10490

Closed
bdarnell opened this issue Nov 6, 2016 · 5 comments
Closed

client: Txn.CleanupOnError logs if transaction is already aborted #10490

bdarnell opened this issue Nov 6, 2016 · 5 comments
Assignees
Milestone

Comments

@bdarnell
Copy link
Contributor

bdarnell commented Nov 6, 2016

Txn.CleanupOnError doesn't return an error, but it calls log.Errorf if an error occurs during rollback, which will happen if the transaction is already known to be aborted (this error comes from the client package; no request is sent to the server in this case).

We should at least handle the case of an already-aborted transaction silently, and for other kinds of errors log.Error is probably not the right way to handle it - this should probably go into the trace and not the server's log.

@petermattis petermattis added this to the 1.0 milestone Feb 23, 2017
@dianasaur323
Copy link
Contributor

@bdarnell Assigning you on this, but please feel free to reassign!

@a-robinson
Copy link
Contributor

I don't fully understand the interaction between different errors here. It doesn't look possible to get a proper TransactionAbortedError from the rollback code within txn.CleanupOnError. Are the "already-aborted" transactions you're talking about hitting the "does not exist" errors that I see on some of our production nodes?

@a-robinson
Copy link
Contributor

@bdarnell ^

@bdarnell
Copy link
Contributor Author

The error that prompted this issue was

internal/client/txn.go:602: attempting to use transaction with wrong status or finalized: ABORTED; abort caused by: retry txn "sql/executor.go:464 sql txn" id=b31485b7 key=/Table/51/1/159918331268857857/0 rw=true pri=0.02860422 iso=SERIALIZABLE stat=PENDING epo=1 ts=1468873601.672242849,1 orig=1468873601.672242849,1 max=1468873601.909616479,0 wto=false rop=false

This might be some sort of concurrency problem because we check for Status == PENDING in CleanupOnError. It may also no longer be an issue any more because there have been other changes to client.Txn (including Nathan's work for concurrency).

This is mainly a log spam issue - if it's happening a lot, we should probably change it from log.Errorf to log.Eventf. If it's no longer happening, we can leave it alone and close this issue. Which prod cluster were you seeing this on?

@a-robinson
Copy link
Contributor

Hm, I didn't see any of those.

On blue, I only saw two rollback errors, caused by deadlines being exceeded.
On cobalt, there weren't any rollback errors.
On adriatic, there were hundreds of rollback errors caused by the transaction not existing.
On indigo, there are hundreds of "does not exist" errors on 2 of the 9 nodes.

Although I decided to check cyan now as well, and it does have a single instance of the transaction already being aborted. It has 5-10 "already committed" errors per node, though.

None of this seems particularly bad to me. If anything, I'd say the error really worth quieting down a bit is the "does not exist" error, since it appears that the draining process before shutdown can lead to a lot of them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants