Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: fix issues with leaving invoke containers running #1257

Merged
merged 1 commit into from Aug 9, 2022

Conversation

tonistiigi
Copy link
Member

Currently, when running --invoke, there exist the following issues:

When invoke container is not shut down by gracefully killing the container process itself then the container is left running in buildkit. This means that it also holds onto resources and cache records in buildx du do not become releasable. To reproduce, switch IO to monitor mode and then type "exit" or ctrl-D/SIGINT.

When switching IO to monitor mode and issuing "rollback" command the buildx process goes to 100% CPU (200% on repeat). Looks like there is a busy loop.

The release problem should be fixed on buildkit side as well as buildkit should handle misbehaving clients. I plan to fix it but we need this for old daemons anyway.

Tbh, I never figured out where the 100% CPU was coming from. Profiling CPU/Blocking was quite useless showing lots of work in stdlib runtime.selectgo but no call path. I tried to debug the new for-select code in monitor, but I couldn't confirm that it started to loop. But after I fixed the release bug the 100% CPU was gone as well.

@ktock

Signed-off-by: Tonis Tiigi tonistiigi@gmail.com

@tonistiigi tonistiigi added this to the v0.9.0 milestone Aug 8, 2022
@@ -643,7 +643,16 @@ func Invoke(ctx context.Context, cfg ContainerConfig) error {
return errors.Errorf("result must be provided")
}
c, res := cfg.ResultCtx.Client, cfg.ResultCtx.Res
_, err := c.Build(ctx, client.SolveOpt{}, "buildx", func(ctx context.Context, c gateway.Client) (*gateway.Result, error) {

mainCtx := ctx
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth looking into this case in buildkit client later on so hacks like this are not required. It looks like that if this context gets called in a certain timeframe the /solve grpc call is canceled before the callback returns, causing some calls inside callback to fail with "no such job". Build() should never return if the callback has not returned, even if the grpc already goes down.

@@ -673,7 +682,8 @@ func Invoke(ctx context.Context, cfg ContainerConfig) error {
if err != nil {
return nil, err
}
defer ctr.Release(ctx)
defer ctr.Release(context.TODO())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, if ctx is already canceled then this does nothing.

go func() {
m.rollback(ctx, containerConfig)
}()
m.rollback(ctx, containerConfig)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really part of the fix but seemed useless goroutine.

@@ -52,6 +50,10 @@ func RunMonitor(ctx context.Context, containerConfig build.ContainerConfig, relo
go func() {
defer close(doneCh)
defer in.Close()
go func() {
<-ctx.Done()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to handle appcontext canceling from sigint. Otherwise this goroutine will keep running although everything else shuts down. Worth thinking through this goroutine structure again to make the cancel path more clear and avoid parallel goroutines with no defined shutdown order. Atm there is a chance for a small race if this happens when the initial Invoke() call is also in the middle of processing.

Copy link
Collaborator

@ktock ktock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants