You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Under normal circumstances it is highly desirable to let the binary under test in a go_test rule time out before Bazel allows the test to timeout. This is because when the binary under test times out (e.g. using -test.timeout), it dumps stacktraces and some other potentially useful info.
In cockroach we use this commit which checks the TEST_TIMEOUT and, when spawning the test subprocess, passes -test.timeout N where N is 3 seconds less than the TEST_TIMEOUT. (The 3 second figure is selected arbitrarily and is meant to give "enough time" for the binary to clean up.) This means that if you run bazel test :test_target --test_timeout 60, then the actual timeout will be 57 seconds (as 60 - 3 = 57).
While the disparity between the go test-level timeout and the Bazel-level timeout may be confusing at first, it creates a hugely superior UX. Compare the test.log output from a timed-out test without that patch:
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //pkg/cmd/dev:dev_test
-----------------------------------------------------------------------------
=== RUN TestDataDriven
-- Test timed out at 2023-11-07 17:49:38 UTC --
This says almost nothing except that a timeout occurred, and the test is not given any time to clean up before being unceremoniously killed. Compare that to with the patch:
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //pkg/cmd/dev:dev_test
-----------------------------------------------------------------------------
=== RUN TestDataDriven
panic: test timed out after 2s
running tests:
TestDataDriven (2s)
goroutine 7 [running]:
testing.(*M).startAlarm.func1()
GOROOT/src/testing/testing.go:2259 +0x320
created by time.goFunc
GOROOT/src/time/sleep.go:176 +0x38
goroutine 1 [chan receive]:
testing.(*T).Run(0x140000a7380, {0x1051300c0?, 0x66f0a118049?}, 0x10531a668)
...
Now we get stack traces.
This change carries two "regressions" in the UX:
The timeout you pass to --timeout is not the real timeout, and the "real" timeout is a few seconds shorter. In cockroach we have gotten used to this and factor it in, though the behavior could be confusing for others.
Timed-out tests show as FAILED instead of TIMEOUT in the Bazel-produced test summary which may be technically inaccurate.
For my taste, neither of these "regressions" outweighs the benefit gained from allowing the test process time to gracefully clean up, report stack traces, etc.
What version of rules_go are you using?
release-0.42
What version of gazelle are you using?
v0.33.0
What version of Bazel are you using?
6.2.1
Does this issue reproduce with the latest releases of all the above?
Yes
What operating system and processor architecture are you using?
MacOS/ARM64
Any other potentially useful information about your toolchain?
No
The text was updated successfully, but these errors were encountered:
The UX is much better that way, but I wonder whether there is a more correct implementation: When the timeout is exceeded, Bazel sends a SIGTERM to the process, followed by a SIGKILL after a grace period. Instead of anticipating the timeout, could we just build a signal handler into our test runner that panics on SIGTERM?
I am not sure how that could be implemented. Assuming it could be, that would indeed be preferable and would address the regressions I pointed out. We would probably also switch to doing that in cockroach as well in lieu of our patch.
Under normal circumstances it is highly desirable to let the binary under test in a
go_test
rule time out before Bazel allows the test to timeout. This is because when the binary under test times out (e.g. using-test.timeout
), it dumps stacktraces and some other potentially useful info.In
cockroach
we use this commit which checks theTEST_TIMEOUT
and, when spawning the test subprocess, passes-test.timeout N
whereN
is 3 seconds less than theTEST_TIMEOUT
. (The 3 second figure is selected arbitrarily and is meant to give "enough time" for the binary to clean up.) This means that if you runbazel test :test_target --test_timeout 60
, then the actual timeout will be 57 seconds (as 60 - 3 = 57).While the disparity between the
go test
-level timeout and the Bazel-level timeout may be confusing at first, it creates a hugely superior UX. Compare thetest.log
output from a timed-out test without that patch:This says almost nothing except that a timeout occurred, and the test is not given any time to clean up before being unceremoniously killed. Compare that to with the patch:
Now we get stack traces.
This change carries two "regressions" in the UX:
--timeout
is not the real timeout, and the "real" timeout is a few seconds shorter. Incockroach
we have gotten used to this and factor it in, though the behavior could be confusing for others.FAILED
instead ofTIMEOUT
in the Bazel-produced test summary which may be technically inaccurate.For my taste, neither of these "regressions" outweighs the benefit gained from allowing the test process time to gracefully clean up, report stack traces, etc.
What version of rules_go are you using?
release-0.42
What version of gazelle are you using?
v0.33.0
What version of Bazel are you using?
6.2.1
Does this issue reproduce with the latest releases of all the above?
Yes
What operating system and processor architecture are you using?
MacOS/ARM64
Any other potentially useful information about your toolchain?
No
The text was updated successfully, but these errors were encountered: