Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvement/feature request: consider allowing process under test to timeout BEFORE Bazel-level --test_timeout #3746

Closed
rickystewart opened this issue Nov 7, 2023 · 2 comments · Fixed by #3749

Comments

@rickystewart
Copy link
Contributor

Under normal circumstances it is highly desirable to let the binary under test in a go_test rule time out before Bazel allows the test to timeout. This is because when the binary under test times out (e.g. using -test.timeout), it dumps stacktraces and some other potentially useful info.

In cockroach we use this commit which checks the TEST_TIMEOUT and, when spawning the test subprocess, passes -test.timeout N where N is 3 seconds less than the TEST_TIMEOUT. (The 3 second figure is selected arbitrarily and is meant to give "enough time" for the binary to clean up.) This means that if you run bazel test :test_target --test_timeout 60, then the actual timeout will be 57 seconds (as 60 - 3 = 57).

While the disparity between the go test-level timeout and the Bazel-level timeout may be confusing at first, it creates a hugely superior UX. Compare the test.log output from a timed-out test without that patch:

exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //pkg/cmd/dev:dev_test
-----------------------------------------------------------------------------
=== RUN   TestDataDriven
-- Test timed out at 2023-11-07 17:49:38 UTC --

This says almost nothing except that a timeout occurred, and the test is not given any time to clean up before being unceremoniously killed. Compare that to with the patch:

exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //pkg/cmd/dev:dev_test
-----------------------------------------------------------------------------
=== RUN   TestDataDriven
panic: test timed out after 2s
running tests:
	TestDataDriven (2s)

goroutine 7 [running]:
testing.(*M).startAlarm.func1()
	GOROOT/src/testing/testing.go:2259 +0x320
created by time.goFunc
	GOROOT/src/time/sleep.go:176 +0x38

goroutine 1 [chan receive]:
testing.(*T).Run(0x140000a7380, {0x1051300c0?, 0x66f0a118049?}, 0x10531a668)
...

Now we get stack traces.

This change carries two "regressions" in the UX:

  1. The timeout you pass to --timeout is not the real timeout, and the "real" timeout is a few seconds shorter. In cockroach we have gotten used to this and factor it in, though the behavior could be confusing for others.
  2. Timed-out tests show as FAILED instead of TIMEOUT in the Bazel-produced test summary which may be technically inaccurate.

For my taste, neither of these "regressions" outweighs the benefit gained from allowing the test process time to gracefully clean up, report stack traces, etc.

What version of rules_go are you using?

release-0.42

What version of gazelle are you using?

v0.33.0

What version of Bazel are you using?

6.2.1

Does this issue reproduce with the latest releases of all the above?

Yes

What operating system and processor architecture are you using?

MacOS/ARM64

Any other potentially useful information about your toolchain?

No

@fmeum
Copy link
Collaborator

fmeum commented Nov 8, 2023

The UX is much better that way, but I wonder whether there is a more correct implementation: When the timeout is exceeded, Bazel sends a SIGTERM to the process, followed by a SIGKILL after a grace period. Instead of anticipating the timeout, could we just build a signal handler into our test runner that panics on SIGTERM?

@rickystewart
Copy link
Contributor Author

I am not sure how that could be implemented. Assuming it could be, that would indeed be preferable and would address the regressions I pointed out. We would probably also switch to doing that in cockroach as well in lieu of our patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants