Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple Deployers in 1.24.0 Causes Logging shutdown issue #5840

Closed
dhodun opened this issue May 13, 2021 · 3 comments · Fixed by #5858
Closed

Multiple Deployers in 1.24.0 Causes Logging shutdown issue #5840

dhodun opened this issue May 13, 2021 · 3 comments · Fixed by #5858
Assignees
Labels
kind/bug Something isn't working priority/p1 High impact feature/bug.

Comments

@dhodun
Copy link
Collaborator

dhodun commented May 13, 2021

Expected behavior

Ctl + C at the end of a Skaffold Dev loop shuts down appropriately

Actual behavior

Ctl + C causes an immediate failure and stack trace of Skaffold

Information

  • Skaffold version: 1.24
  • Operating system: MacOs Big Sur 11.2.3
  • Installed via: curl

Steps to reproduce the behavior

  1. skaffold dev when using more than one deployer (we're using helm and kustomize)
  2. After everything stabilizes, CTL + C

May be related to

#5809

Logs

^Cpanic: close of closed channel

goroutine 1 [running]:
github.com/GoogleContainerTools/skaffold/pkg/skaffold/kubernetes/logger.(*LogAggregator).StopLogger(0xc000a401e0)
/skaffold/pkg/skaffold/kubernetes/logger/log.go:145 +0x34
github.com/GoogleContainerTools/skaffold/pkg/skaffold/deploy.DeployerMux.StopLogger(0xc000840c00, 0x7, 0x8)
/skaffold/pkg/skaffold/deploy/deploy_mux.go:102 +0x5a
github.com/GoogleContainerTools/skaffold/pkg/skaffold/runner/v1.(*SkaffoldRunner).Dev(0xc00088a900, 0x62767e0, 0xc00022e140, 0x621fe60, 0xc00051e810, 0xc000618180, 0x7, 0x8, 0x0, 0x0)
/skaffold/pkg/skaffold/runner/v1/dev.go:313 +0x182f
github.com/GoogleContainerTools/skaffold/cmd/skaffold/app/cmd.runDev.func5(0x62a3800, 0xc00088a900, 0xc0006a0d00, 0x9, 0x10, 0x2, 0x2)
/skaffold/cmd/skaffold/app/cmd/dev.go:68 +0x276
github.com/GoogleContainerTools/skaffold/cmd/skaffold/app/cmd.withRunner(0x62767e0, 0xc00022e140, 0x621fe60, 0xc00051e810, 0xc000d5bbc0, 0x0, 0xc)
/skaffold/cmd/skaffold/app/cmd/runner.go:55 +0x114
github.com/GoogleContainerTools/skaffold/cmd/skaffold/app/cmd.runDev(0x62767e0, 0xc00022e140, 0x621fe60, 0xc00051e810, 0x0, 0x0)
/skaffold/cmd/skaffold/app/cmd/dev.go:63 +0x178
github.com/GoogleContainerTools/skaffold/cmd/skaffold/app/cmd.(*builder).NoArgs.func1(0xc0003fc500, 0xc0009cba40, 0x0, 0x3, 0x0, 0x0)
/skaffold/cmd/skaffold/app/cmd/commands.go:129 +0x98
github.com/spf13/cobra.(*Command).execute(0xc0003fc500, 0xc0009cba10, 0x3, 0x3, 0xc0003fc500, 0xc0009cba10)
/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:852 +0x453
github.com/spf13/cobra.(*Command).ExecuteC(0xc0003fc000, 0x10, 0x0, 0x0)
/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960 +0x349
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897
github.com/spf13/cobra.(*Command).ExecuteContext(...)
/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:890
github.com/GoogleContainerTools/skaffold/cmd/skaffold/app.Run(0x621f1a0, 0xc00012e008, 0x621f1a0, 0xc00012e010, 0x0, 0x0)
/skaffold/cmd/skaffold/app/skaffold.go:47 +0x13d
main.main()
/skaffold/cmd/skaffold/skaffold.go:37 +0x5c

@MarlonGamez MarlonGamez added kind/bug Something isn't working priority/p1 High impact feature/bug. labels May 13, 2021
@dhodun
Copy link
Collaborator Author

dhodun commented May 13, 2021

Strangely enough, I didn't see this issue when running skaffold debug from within Cloud Code, using 1.24.

@nkubala nkubala self-assigned this May 13, 2021
@nkubala
Copy link
Contributor

nkubala commented May 13, 2021

this is happening because in #5809, we changed the control of the logger to be transferred to the individual deployers, meaning each one is responsible for stopping its own logger. however, when we have multiple deployers sharing the same logger, that Stop() call results in an already closed channel having close() called it again, which panics.

the fix here is to implement the follow up work from #5809 (comment), which will prevent multiple calls to Stop() or any other methods that shouldn't be called multiple times.

@gsquared94
Copy link
Collaborator

gsquared94 commented May 17, 2021

This can be reproduced in examples/multi-config-microservices.
Also the port-forwarder fails intermittently, I think due to the same issue.

...
Deployments stabilized in 1.221 second
Watching for changes...
port forwarding deployment-leeroy-web-default-8080 got terminated: output: error: error upgrading connection: unable to upgrade connection: pod not found ("leeroy-web-7c5768c696-lf2vd_default")

Port forwarding deployment/leeroy-app in namespace default, remote port http -> 127.0.0.1:9001
Port forwarding service/leeroy-app in namespace default, remote port 50051 -> 127.0.0.1:50053

The recent changes in #5809 and others moved the logAggregator and portForwarder inside the Deployer. But in deployer_mux.go each registered deployer calls the Start and Stop methods multiple times. These invocations resolve to a single instance of logAggregator and portForwarder and those aren't implemented to be tolerant to multiple invocations.

I feel it's best to revert these series of changes to a prior stable state and rethink the implementation. Also discussed with @briandealwis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working priority/p1 High impact feature/bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants