Improvement/feature request: consider allowing process under test to timeout BEFORE Bazel-level `--test_timeout` #3746

rickystewart · 2023-11-07T17:53:58Z

Under normal circumstances it is highly desirable to let the binary under test in a go_test rule time out before Bazel allows the test to timeout. This is because when the binary under test times out (e.g. using -test.timeout), it dumps stacktraces and some other potentially useful info.

In cockroach we use this commit which checks the TEST_TIMEOUT and, when spawning the test subprocess, passes -test.timeout N where N is 3 seconds less than the TEST_TIMEOUT. (The 3 second figure is selected arbitrarily and is meant to give "enough time" for the binary to clean up.) This means that if you run bazel test :test_target --test_timeout 60, then the actual timeout will be 57 seconds (as 60 - 3 = 57).

While the disparity between the go test-level timeout and the Bazel-level timeout may be confusing at first, it creates a hugely superior UX. Compare the test.log output from a timed-out test without that patch:

exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //pkg/cmd/dev:dev_test
-----------------------------------------------------------------------------
=== RUN   TestDataDriven
-- Test timed out at 2023-11-07 17:49:38 UTC --

This says almost nothing except that a timeout occurred, and the test is not given any time to clean up before being unceremoniously killed. Compare that to with the patch:

exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //pkg/cmd/dev:dev_test
-----------------------------------------------------------------------------
=== RUN   TestDataDriven
panic: test timed out after 2s
running tests:
	TestDataDriven (2s)

goroutine 7 [running]:
testing.(*M).startAlarm.func1()
	GOROOT/src/testing/testing.go:2259 +0x320
created by time.goFunc
	GOROOT/src/time/sleep.go:176 +0x38

goroutine 1 [chan receive]:
testing.(*T).Run(0x140000a7380, {0x1051300c0?, 0x66f0a118049?}, 0x10531a668)
...

Now we get stack traces.

This change carries two "regressions" in the UX:

The timeout you pass to --timeout is not the real timeout, and the "real" timeout is a few seconds shorter. In cockroach we have gotten used to this and factor it in, though the behavior could be confusing for others.
Timed-out tests show as FAILED instead of TIMEOUT in the Bazel-produced test summary which may be technically inaccurate.

For my taste, neither of these "regressions" outweighs the benefit gained from allowing the test process time to gracefully clean up, report stack traces, etc.

What version of rules_go are you using?

release-0.42

What version of gazelle are you using?

v0.33.0

What version of Bazel are you using?

6.2.1

Does this issue reproduce with the latest releases of all the above?

Yes

What operating system and processor architecture are you using?

MacOS/ARM64

Any other potentially useful information about your toolchain?

No

The text was updated successfully, but these errors were encountered:

fmeum · 2023-11-08T16:03:04Z

The UX is much better that way, but I wonder whether there is a more correct implementation: When the timeout is exceeded, Bazel sends a SIGTERM to the process, followed by a SIGKILL after a grace period. Instead of anticipating the timeout, could we just build a signal handler into our test runner that panics on SIGTERM?

rickystewart · 2023-11-08T22:59:58Z

I am not sure how that could be implemented. Assuming it could be, that would indeed be preferable and would address the regressions I pointed out. We would probably also switch to doing that in cockroach as well in lieu of our patch.

rickystewart mentioned this issue Nov 7, 2023

bazel: use different approach for test timeouts cockroachdb/cockroach#113945

Merged

fmeum mentioned this issue Nov 9, 2023

Panic when a test times out #3749

Merged

fmeum closed this as completed in #3749 Dec 11, 2023

rickystewart mentioned this issue Feb 29, 2024

Test binaries swallow SIGTERM; printed stack traces misleading #3875

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improvement/feature request: consider allowing process under test to timeout BEFORE Bazel-level `--test_timeout` #3746

Improvement/feature request: consider allowing process under test to timeout BEFORE Bazel-level `--test_timeout` #3746

rickystewart commented Nov 7, 2023

fmeum commented Nov 8, 2023

rickystewart commented Nov 8, 2023

Improvement/feature request: consider allowing process under test to timeout BEFORE Bazel-level --test_timeout #3746

Improvement/feature request: consider allowing process under test to timeout BEFORE Bazel-level --test_timeout #3746

Comments

rickystewart commented Nov 7, 2023

What version of rules_go are you using?

What version of gazelle are you using?

What version of Bazel are you using?

Does this issue reproduce with the latest releases of all the above?

What operating system and processor architecture are you using?

Any other potentially useful information about your toolchain?

fmeum commented Nov 8, 2023

rickystewart commented Nov 8, 2023

Improvement/feature request: consider allowing process under test to timeout BEFORE Bazel-level `--test_timeout` #3746

Improvement/feature request: consider allowing process under test to timeout BEFORE Bazel-level `--test_timeout` #3746