sql: ensure auto-retrying transactions respect the statement_timeout #53968
@andreimatei I know we talked about moving making cockroach/pkg/sql/conn_executor_exec.go Line 618 in 41375eb
queryDone function for now.
Thanks for your work @arulajmani! Heads up I've added this PR to the list of requested backports to the 20.2 branch. #53662
pkg/sql/run_control_test.go
Outdated
t.Fatal(err)
}

timer := time.AfterFunc(1*time.Second, func() {
nit: could you add a comment here mentioning that this is here to verify that the timeout error is returned before the 2 second interval passed to force_retry elapses?
LGTM
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @arulajmani, and @rafiss)
pkg/sql/conn_executor_exec.go, line 394 at r1 (raw file):
// There's no need to proceed with execution if the timer has already expired.
if timerDuration < 0 {
	ex.cancelQuery(stmt.queryID)
Is this ex.cancelQuery() call necessary? If it isn't, I wouldn't do it since it just leads to questions about what async work we are canceling.
pkg/sql/run_control_test.go, line 869 at r1 (raw file):
_, err = conn.QueryContext(ctx, `SET statement_timeout = '0.1s'`)
if err != nil {
nit: require.NoError(t, err)
pkg/sql/run_control_test.go, line 873 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: could you add a comment here mentioning that this is here to verify that the timeout error is returned before the 2 second interval passed to force_retry elapses?
better yet, get rid of this since it's broken anyway (can't t.Fatal() on any goroutine but the main one) and just measure the elapsed time until you get the error. No need for asynchrony.
pkg/sql/run_control_test.go, line 878 at r1 (raw file):
_, err = conn.QueryContext(ctx, `SELECT crdb_internal.force_retry('2s')`)
if !testutils.IsError(err, "pq: query execution canceled due to statement timeout") {
just fyi, consider require.Regexp(t, "pq: ...", err)
pkg/sql/run_control_test.go, line 885 at r1 (raw file):
timer.Stop()
// Same test as above, except in an explicit transaction.
instead of repeating, use testutils.RunTrueAndFalse() and construct the query dynamically.
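One way the "construct the query dynamically" suggestion could look, as a rough sketch: `queriesFor` is a hypothetical helper, and a plain loop over both boolean values stands in for the `testutils.RunTrueAndFalse()` helper so the example stays self-contained. The exact statements the real test issues are an assumption here.

```go
package main

import "fmt"

// queriesFor builds the statements for one test variant. It is a hypothetical
// helper: when explicitTxn is true the retry-forcing statement is wrapped in
// BEGIN/COMMIT, otherwise it runs as an implicit transaction.
func queriesFor(explicitTxn bool) []string {
	stmts := []string{"SELECT crdb_internal.force_retry('2s')"}
	if explicitTxn {
		stmts = append([]string{"BEGIN"}, append(stmts, "COMMIT")...)
	}
	return stmts
}

func main() {
	// Exercise both variants with the same body instead of duplicating the test.
	for _, explicitTxn := range []bool{true, false} {
		fmt.Printf("explicitTxn=%t: %q\n", explicitTxn, queriesFor(explicitTxn))
	}
}
```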
32fbf08 to 1b8f903
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/sql/conn_executor_exec.go, line 394 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Is this ex.cancelQuery() call necessary? If it isn't, I wouldn't do it since it just leads to questions about what async work are we canceling.
It was in the earlier structure, where the assumption was query timeouts were a special case of cancelled queries. I've untangled that bit by checking and overwriting for timed out queries outside the cancelled queries check in the queryDone function, which allows me to get rid of this ex.cancelQuery call here.
pkg/sql/run_control_test.go, line 878 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
just fyi, consider require.Regexp(t, "pq: ...", err)
Done.
pkg/sql/run_control_test.go, line 885 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
instead of repeating, use testutils.RunTrueAndFalse() and construct the query dynamically.
Much cleaner, done!
1b8f903 to 0bd69ca
bors r=andreimatei
Build failed (retrying...):
bors r- looks like this is failing CI
Canceled.
Previously, if a statement exceeded the statement_timeout in the conn_executor, but failed with a retryable error, we would still retry it. This was because even though we overrode the error, we didn't override the `fsm.Event` and `fsm.EventPayload` which actually dictate the next transition. This patch overrides them, thereby ensuring that even if a timed out query/cancelled query encountered a retryable error, we do not transition into retrying it.
Fixes cockroachdb#52845
Release justification: low risk, high benefit changes to existing functionality
Release note (bug fix): queries that can be automatically retried did not respect the `statement_timeout` earlier, which is now fixed.
0bd69ca to 9f2ba69
Oops looks like I was branched off quite an old master, the test interface I was using was apparently changed. bors r=andreimatei
Build succeeded:
do you think the test failure in https://teamcity.cockroachdb.com/viewLog.html?buildId=2269713&buildTypeId=Cockroach_UnitTests is related to this PR?
This PR might have made that test flaky, as 1ms may not be enough time to execute
Let's also backport this PR and the test change as well (to the 20.2 branch)
Previously, if a user set a really low statement_timeout value, there would be no way to reset it/remove the statement_timeout entirely. To get around this, all `SET` statements are now exempt from the statement timeout. This should also fix some of the flakes we were seeing in checks were decoupled in cockroachdb#53968. `SET` statements aren't canceled as no one checks for a canceled context, which meant this exemption existed implicitly before cockroachdb#53968.
Closes cockroachdb#54372
Release note: None
54415: sql: exempt `SET` statements from the statement_timeout r=andreimatei a=arulajmani
Previously, if a user set a really low statement_timeout value, there would be no way to reset it/remove the statement_timeout entirely. To get around this, all `SET` statements are now exempt from the statement timeout. This should also fix some of the flakes we were seeing in checks were decoupled in #53968. `SET` statements aren't canceled as no one checks for a canceled context, which meant this exemption existed implicitly before #53968.
Closes #54372
Release note: None
Co-authored-by: arulajmani <arulajmani@gmail.com>