Fix a flaky shell test #2311

Merged
merged 4 commits into from
Aug 22, 2023

@triarius triarius commented Aug 22, 2023

The shell test runs the test binary in a separate process. That binary is supposed to sleep forever, but the Go runtime terminates it because all of its goroutines are asleep: the main goroutine is blocked and nothing else is running, so the runtime reports a deadlock. This PR fixes the issue by making the test binary do something in the main goroutine.

I've also kept the changes I made to make the test more debuggable. Here is the output from a debugging run. Note the line "fatal error: all goroutines are asleep - deadlock!".

DONE 1 tests in 3.267s
gotestsum -- -count=1 -v ./internal/job/shell -run ^TestLockFileRetriesAndTimesOut$
✖  internal/job/shell (14ms)

=== Failed
=== FAIL: internal/job/shell TestLockFileRetriesAndTimesOut (0.01s)
    shell_test.go:319: acquiring lock in other process: /tmp/shelltest3604204178/my.lock
2023/08/22 13:20:35 Locking /tmp/shelltest3604204178/my.lock
2023/08/22 13:20:35 Acquired lock /tmp/shelltest3604204178/my.lock
fatal error: all goroutines are asleep - deadlock!

goroutine 1 [select (no cases)]:
github.com/buildkite/agent/v3/internal/job/shell_test.acquiringLockHelperProcess()
	/home/narthana/devel/buildkite/agent/internal/job/shell/main_test.go:49 +0x2d1
github.com/buildkite/agent/v3/internal/job/shell_test.TestMain(0xc0000666f0?)
	/home/narthana/devel/buildkite/agent/internal/job/shell/main_test.go:23 +0x4f
main.main()
	_testmain.go:77 +0x1c6
    shell_test.go:324: assertion failed: error is nil, not "context deadline exceeded" (context.DeadlineExceeded context.deadlineExceededError)

DONE 1 tests, 1 failure in 0.229s

@triarius triarius requested a review from a team August 22, 2023 03:22
Co-authored-by: Josh Deprez <jd@buildkite.com>
if err != nil {
	return cmd, err
}
assert.NilError(t, err)
Contributor

So this fails the test immediately if err != nil? (Looks like it... had to look it up since I'm not yet familiar with this particular assert lib). Then there would be no need to have an error return on the function.

Contributor Author

Yeah, good point. I've made it not return the error. If the next assertion passes, then it's a false positive. So I think it's fine to exit early.
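
The pattern discussed in this thread, a test helper that fails the test itself instead of returning an error, can be sketched as below. This is an illustration under assumptions: `acquire`, `mustAcquire`, `tb`, and `recorder` are hypothetical names, and the sketch uses a `Fatalf`-style interface shaped like the standard `testing` package rather than the assert library used in the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// tb is the subset of testing.TB that this sketch needs.
// *testing.T satisfies it.
type tb interface {
	Helper()
	Fatalf(format string, args ...any)
}

// acquire is a hypothetical stand-in for a fallible setup step.
func acquire(path string) (string, error) {
	if path == "" {
		return "", errors.New("empty lock path")
	}
	return "lock:" + path, nil
}

// mustAcquire fails the test on error instead of returning the error,
// so callers do not need an error return to check.
func mustAcquire(t tb, path string) string {
	t.Helper()
	lock, err := acquire(path)
	if err != nil {
		t.Fatalf("acquiring lock %q: %v", path, err)
	}
	return lock
}

// recorder is a fake tb used to run the sketch outside `go test`.
// Unlike (*testing.T).Fatalf, it does not stop the calling goroutine.
type recorder struct {
	failed bool
	msg    string
}

func (r *recorder) Helper() {}

func (r *recorder) Fatalf(format string, args ...any) {
	r.failed = true
	r.msg = fmt.Sprintf(format, args...)
}

func main() {
	ok := &recorder{}
	fmt.Println(mustAcquire(ok, "my.lock"), ok.failed)

	bad := &recorder{}
	mustAcquire(bad, "")
	fmt.Println(bad.failed, bad.msg)
}
```

In a real test, passing `t *testing.T` means `Fatalf` stops the test at the failing setup step, so later assertions never run against a half-initialised state.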

It does not need to return an error, as it fails the test immediately if an error occurs within it.
@triarius triarius merged commit 05e681e into main Aug 22, 2023
1 check passed
@triarius triarius deleted the pdp-1494-fix-flaky-shell-test branch August 22, 2023 05:38