-
Notifications
You must be signed in to change notification settings - Fork 626
Description
Contributing guidelines
- I've read the contributing guidelines and wholeheartedly agree
I've found a bug and checked that ...
- ... the documentation does not mention anything about my problem
- ... there are no open or closed issues that are related to my problem
Description
In our CI we setup buildx builder with 2 nodes - arm64 and amd64 to build multi-arch docker images. CI runs docker inspect --bootstrap to check that both of them are up and ready. When one node fails (because of a timeout) the inspect command ignores it, prints the summary and completes with exit code 0.
Expected behaviour
docker inspect --bootstrap should exit with non-zero exit-code when one of the nodes fails to start. When command fails it's possible to catch a non-zero status code and retry bootstraping the nodes.
Actual behaviour
When single node fails the driver.Boot returns an error which is stored in the channel errCh. printer.Wait() never returns an error. So err is nil and len(errCh) == 1. So the last return statement is executed with true, nil. For more details see my gist with minimal reproducible example.
Lines 178 to 206 in 65c4756
| baseCtx := ctx | |
| eg, _ := errgroup.WithContext(ctx) | |
| errCh := make(chan error, len(toBoot)) | |
| for _, idx := range toBoot { | |
| func(idx int) { | |
| eg.Go(func() error { | |
| pw := progress.WithPrefix(printer, b.NodeGroup.Nodes[idx].Name, len(toBoot) > 1) | |
| _, err := driver.Boot(ctx, baseCtx, b.nodes[idx].Driver, pw) | |
| if err != nil { | |
| b.nodes[idx].Err = err | |
| errCh <- err | |
| } | |
| return nil | |
| }) | |
| }(idx) | |
| } | |
| err = eg.Wait() | |
| close(errCh) | |
| err1 := printer.Wait() | |
| if err == nil { | |
| err = err1 | |
| } | |
| if err == nil && len(errCh) == len(toBoot) { | |
| return false, <-errCh | |
| } | |
| return true, err | |
| } |
printer.Wait():
buildx/util/progress/printer.go
Lines 39 to 45 in 65c4756
| func (p *Printer) Wait() error { | |
| p.closeOnce.Do(func() { | |
| close(p.status) | |
| <-p.done | |
| }) | |
| return p.err | |
| } |
pw.err is assigned here
buildx/util/progress/printer.go
Line 133 in 65c4756
| pw.warnings, pw.err = d.UpdateFrom(ctx, pw.status) |
but UpdateFrom never sets the error, except the ctx.Done case.
| func (d Display) UpdateFrom(ctx context.Context, ch chan *client.SolveStatus) ([]client.VertexWarning, error) { |
Buildx version
v0.14.0
Docker info
No response
Builders list
Configuration
Here is the link to this gist with code to reproduce bug in the (b *Builder) Boot() function:
https://gist.github.com/moleus/6bf1ad37f47dd86ed12f4c4345601e22
Build logs
+ docker buildx inspect --bootstrap
#1 [arm64-demo-jenkins-job-name internal] booting buildkit
#1 waiting for 1 pods to be ready
#1 ...
#2 [amd64-demo-jenkins-job-name internal] booting buildkit
#2 ...
#1 [arm64-demo-jenkins-job-name internal] booting buildkit
#1 waiting for 1 pods to be ready 93.3s done
#1 DONE 93.4s
#2 [amd64-demo-jenkins-job-name internal] booting buildkit
#2 waiting for 1 pods to be ready 109.5s done
#2 ERROR: expected 1 replicas to be ready, got 0
------
> [amd64-demo-jenkins-job-name internal] booting buildkit:
------
Name: back-demo-buildx <- *comment* this should not be printed. Should exit with 1
Driver: kubernetes
Last Activity: 2024-10-10 09:56:43 +0000 UTC
Nodes:
Name: amd64-demo-jenkins-job-name
Endpoint: kubernetes:///back-demo-buildx?deployment=amd64-demo-jenkins-job-name&kubeconfig=
Driver Options: loadbalance="random" namespace="demo-namespace" ...
Status: inactive
BuildKit daemon flags: --oci-worker-gc-keepstorage 8000 --allow-insecure-entitlement=network.host
Platforms: linux/amd64*
Name: arm64-demo-jenkins-job-name
Endpoint: kubernetes:///back-demo-buildx?deployment=arm64-demo-jenkins-job-name&kubeconfig=
Driver Options: loadbalance="random" namespace="demo-namespace" ...
Status: inactive
BuildKit daemon flags: --oci-worker-gc-keepstorage 8000 --allow-insecure-entitlement=network.host
Platforms: linux/arm64*
Additional info
I suggest replacing
if err == nil && len(errCh) == len(toBoot) { // errCh contains only one error -> condition fails
return false, <-errCh
}
return true, err // returns true, nilwith
if err == nil && len(errCh) > 0 {
return false, <-errCh
}
return true, err // returns true, nilso inspect fails when one of the nodes fails to setup