Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flaky test: TestPromptForConfirmation/case=reader_closed #4948

Closed
thaJeztah opened this issue Mar 17, 2024 · 7 comments · Fixed by #4957
Closed

flaky test: TestPromptForConfirmation/case=reader_closed #4948

thaJeztah opened this issue Mar 17, 2024 · 7 comments · Fixed by #4957

Comments

@thaJeztah
Copy link
Member

Description

This test looks to be flaky on macOS at least; seen it fail a couple of times.

ok  	github.com/docker/cli/cli	0.242s	coverage: 28.2% of statements
--- FAIL: TestPromptForConfirmation (0.38s)
    --- FAIL: TestPromptForConfirmation/case=reader_closed (0.19s)
        utils_test.go:222: PromptForConfirmation did not return after promptReader was closed
FAIL
panic: send on closed channel

goroutine 111 [running]:
github.com/docker/cli/cli/command_test.TestPromptForConfirmation.func8.2()
	/Users/runner/work/cli/cli/src/github.com/docker/cli/cli/command/utils_test.go:209 +0x6d
created by github.com/docker/cli/cli/command_test.TestPromptForConfirmation.func8 in goroutine 110
	/Users/runner/work/cli/cli/src/github.com/docker/cli/cli/command/utils_test.go:207 +0x3aa
@thaJeztah
Copy link
Member Author

cc @Benehiko

@Benehiko Benehiko self-assigned this Mar 18, 2024
@laurazard
Copy link
Member

Good point from @tonistiigi here – #4888 (comment).

A more holistic approach would be to always set up signal handling in the CLI, and use it to cancel the command's context, and just migrate most places in the codebase that are setting up their own signal handling to instead handle context cancellation – funny that we do this for plugins but not for anything else.

@thaJeztah
Copy link
Member Author

Ah, yes, I saw that comment, and wanted to ask @krissetto about it as well.

A more holistic approach would be to always set up signal handling in the CLI, and use it to cancel the command's context,

Sounds like a sensible approach yes (or at least, that's my first response; contexts are always fun)

@Benehiko
Copy link
Member

Benehiko commented Mar 18, 2024

Good point from @tonistiigi here – #4888 (comment).

A more holistic approach would be to always set up signal handling in the CLI, and use it to cancel the command's context, and just migrate most places in the codebase that are setting up their own signal handling to instead handle context cancellation – funny that we do this for plugins but not for anything else.

I think this can only work once we have wired up context.Context. There are some PRs I've already made for this to support otel tracing. So I would think we could move the signal termination out from the prompt function to the main function running the cobra command once these PRs are merged.

@krissetto
Copy link
Contributor

Good catch. I agree with always setting up a signal handler in the CLI and cancelling with the context, but i'm not sure how many changes it'd require in the codebase ATM to pass the context everywhere it might be needed.

I think what confused me was this part of the signal.NotifyContext docs:

The stop function unregisters the signal behavior, which, like signal.Reset, may restore the default behavior for a given signal. For example, the default behavior of a Go program receiving os.Interrupt is to exit. Calling NotifyContext(parent, os.Interrupt) will change the behavior to cancel the returned context.

I thought that once a signal got consumed, the original behavior would be restored (so the default signal handling behavior of the application).


That said, I think this specific panic is only related to the test, as the code itself doesn't explicitly close any channels (the result chan in this case) in a way that could result in this issue.

Particularly this part at lines 205-210 of utils_test.go might be at fault

result := make(chan promptResult, 1)
defer close(result)
go func() {
    r, err := command.PromptForConfirmation(ctx, promptReader, promptOut, "")
    result <- promptResult{r, err}
}()

@thaJeztah
Copy link
Member Author

Oh! I think I meant to @ @Benehiko, and for some reason pinged @krissetto 🙈 - well, both of you are here now, so all good 😂

@Benehiko
Copy link
Member

Looking at this error, I think the timeout might be too little for some machines. I've not been able to reproduce this flake locally on my laptop so the flake might be dependent on the machine running it.

I'm guessing the timeout for the context is reached (100 milliseconds) and then the test is ended with t.Fatal() in the process of this, the function executing the test exits closing the result channel. However, the goroutine is still running waiting for a result from the prompt function. The prompt does return eventually and tries to send on the closed result channel causing the program to panic.

This is a tricky one since there is no way to guarantee that closing the reader will exit the prompt in a given time frame since this would be dependent on the OS and machine hardware. I could prevent the panic by not closing the channels inside the test and leaving it up to the garbage collector and hope that an increase to say 500ms or 1 second timeout is enough for most cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants