-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: prioritize TransactionAbortedError above all others #44460
kv: prioritize TransactionAbortedError above all others #44460
Conversation
cc @knz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/kv/txn_coord_sender.go, line 767 at r1 (raw file):
// error, the transaction has been rolled back and the state was updated in // handleRetryableErrLocked(). if err.Transaction.ID == ba.Txn.ID {
Drive by comment: should we push these if err.Transaction.ID == ba.Txn.ID {
checks into handleRetryableErrLocked
? It's strange to me that we handle the inverse case (the txn is aborted) in there but this case at each caller.
pkg/roachpb/errors.go, line 79 at r1 (raw file):
// ignore them). Also, the TxnCoordSender likes to assume that a // TransactionAbortedError is the only way it finds about an aborted // transaction, and so it benefits about all other errors being merged into a
s/it benefits about/it benefits from/
The DistSender and the DistSQLReceiver need to choose one error among possibly multiple to hand out to a client. Before this patch, non-retriable errors would take priority over a TransactionAbortedError. This was a problem because the TxnCoordSender assumes that the only way it'll find out about an aborted transaction is through a TransactionAbortedError. When that assumption fails, the TxnCoordSender doesn't stop the heartbeat loop, but the heartbeat loop reasonably doesn't like to run on an aborted txn. This patch makes the TransactionAbortedError have the highest priority, thus hopefully making the TxnCoordSender's assumption valid. Besides fixing this bug, prioritizing TransactionAbortedError generally seems like the right thing to do, as it puts us in a better position to allow upcoming savepoints to rollback past errors: one cannot rollback past a TransactionAbortedError and so it seems like a good idea to prefer to return this error. From the perspective of a particular transaction, a TransactionAbortedError is the most unrecoverable error of them all (even though it is retriable at a higher level). Fixes cockroachdb#43879 Release note: None
a1325b0
to
a3538bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/txn_coord_sender.go, line 767 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Drive by comment: should we push these
if err.Transaction.ID == ba.Txn.ID {
checks intohandleRetryableErrLocked
? It's strange to me that we handle the inverse case (the txn is aborted) in there but this case at each caller.
sounds good. I'll do it separately.
pkg/roachpb/errors.go, line 79 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/it benefits about/it benefits from/
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/kv/txn_coord_sender.go, line 767 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
sounds good. I'll do it separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
44457: colexec: reduce test runtime r=yuzefovich a=asubiotto **colexec: reduce maximum number of outputs in TestHashRouterRandom** We would previously create up to coldata.BatchSize() possible outputs for a HashRouter. This ends up creating one goroutine per output, which would slow down the runtime of this test dramatically. Since in a production deployment, this is the number of nodes, the number of outputs has been limited to 128. Release note: None (testing change) **colexec: iterate over less possible batch sizes during tests** For any call to runTests, the test itself would be run under a couple varying input batch sizes from 1 up to coldata.BatchSize(). Since we added randomization of the batch size to tests, this loop results in tests taking longer than necessary since the cases with differing input batch sizes would be equivalent to running the tests with a different coldata.BatchSize(). The exception to this is testing with a smaller input batch size than coldata.BatchSize() which is tested by keeping the test of using batchSize=1. runTests therefore now runs with two input batch sizes only: 1 and coldata.BatchSize(). It is difficult to compare the runtime of different `make test` invocations due to the amount of randomization in the package, but this should provide a nice improvement in the time taken to run tests. Release note: None (testing change) 44460: kv: prioritize TransactionAbortedError above all others r=andreimatei a=andreimatei The DistSender and the DistSQLReceiver need to choose one error among possibly multiple to hand out to a client. Before this patch, non-retriable errors would take priority over a TransactionAbortedError. This was a problem because the TxnCoordSender assumes that the only way it'll find out about an aborted transaction is through a TransactionAbortedError. When that assumption fails, the TxnCoordSender doesn't stop the heartbeat loop, but the heartbeat loop reasonably doesn't like to run on an aborted txn. This patch makes the TransactionAbortedError have the highest priority, thus hopefully making the TxnCoordSender's assumption valid. Besides fixing this bug, prioritizing TransactionAbortedError generally seems like the right thing to do, as it puts us in a better position to allow upcoming savepoints to rollback past errors: one cannot rollback past a TransactionAbortedError and so it seems like a good idea to prefer to return this error. From the perspective of a particular transaction, a TransactionAbortedError is the most unrecoverable error of them all (even though it is retriable at a higher level). Fixes #43879 Release note: None Co-authored-by: Alfonso Subiotto Marques <alfonso@cockroachlabs.com> Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Build succeeded |
The DistSender and the DistSQLReceiver need to choose one error among
possibly multiple to hand out to a client. Before this patch,
non-retriable errors would take priority over a TransactionAbortedError.
This was a problem because the TxnCoordSender assumes that the only way
it'll find out about an aborted transaction is through a
TransactionAbortedError. When that assumption fails, the TxnCoordSender
doesn't stop the heartbeat loop, but the heartbeat loop reasonably
doesn't like to run on an aborted txn.
This patch makes the TransactionAbortedError have the highest priority,
thus hopefully making the TxnCoordSender's assumption valid.
Besides fixing this bug, prioritizing TransactionAbortedError generally
seems like the right thing to do, as it puts us in a better position to
allow upcoming savepoints to rollback past errors: one cannot rollback
past a TransactionAbortedError and so it seems like a good idea to
prefer to return this error. From the perspective of a particular
transaction, a TransactionAbortedError is the most unrecoverable error
of them all (even though it is retriable at a higher level).
Fixes #43879
Release note: None