Failure to push when using multi-node context #177
I don't see any errors in the output you posted. The difference in output between multi-node and single-node builds is expected. You can get the logs by running
Right, I can't see any either, hehe. I will take a look at the buildkit logs with the debug flag asap and update the issue when I have them. Is buildkit supposed to connect docker daemons in the way I said before, or have I missed something that might be vital?
Small update: I checked the buildkit container logs and found the following repeated throughout:
From the looks of it, it seems like the specific daemon has issues with both pulling from and pushing to the specific repositories; could this be the issue? The client SHOULD be logged in, as it is able to push with a single daemon... and I'm guessing that it is only the client that needs to "log in", right?
Oh, and a note: I push to my gitlab registry first, then docker hub.
Thought I'd update this issue with a bit more information. When I push my image, it pushes the first tag only.
So, the debug logs did give me quite a bit of information. The first thing that I saw was the following:
Something that I would personally think should produce a warning :) The second thing that I noticed, which is probably the actual error, was the following:
It seems that when using multiple nodes, they have lost their auth token. This is really odd, as it works perfectly fine if I remove the secondary node... This should probably not just be a warning, but rather an error. If the builder is not authed it won't be able to do anything, hence it should just crash imho.
@tonistiigi I will gladly hand out some more debug logs if that would help, but I would prefer to do that privately as it might contain keys and stuff that I would rather not share openly ;P
Hi there!
I am experiencing the same issue, any updates on this? Edit: I fixed my problem by not using a custom registry port. Apparently the port does not get propagated to the builder correctly.
Still issues here. Using public registries, so no specific ports set here :/ If anyone with insight wishes for more logs or anything, let me know and I'll gladly send them over!
I found something that might help you in your investigation. On my systems, buildx does not send any username information when pushing (or pulling from cache) multi-arch images. I do not know how it is nonetheless authenticated; I guess the builder caches only the digest auth token and not the username string. Theory: some registries require the username string (the actor.name field in the registry metadata / notifications) but it is not set.
To further support my argument, I pulled the access logs for a push from buildx from my nginx gateway:
As you can see, only the GET requests for auth tokens are properly authenticated using HTTP auth, the subsequent PUT requests actually uploading data are not, which is perfectly ok since auth is achieved with the token. However,
Edit: Strange thing, regular docker push uses the same pattern:
But this time the notification sent by the registry contains the actor:
So maybe it is an issue with the registry itself? Does anyone have a clue? Here are the logs of the registry. For Buildx push:
And for regular docker push (which generated correct notifications):
Apparently the registry never even receives any form of HTTP auth. Which makes sense, since Portus handles my digest auth using tokens. So maybe a registry bug? Edit: The critical difference is that buildx does not append the account GET parameter to the HTTP token request (compare the two token requests shown above). So probably a buildx/buildkit bug?
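To make that difference concrete, here is a small sketch of the two token-request URL shapes described above, following the Docker Registry v2 token auth flow. This is not the actual containerd code; the realm, service, scope, and username values are made-up placeholders.

```shell
# Illustrative only: build the token-request URL with and without the
# "account" parameter. All values below are placeholders.
token_url() {
  # $1=realm  $2=service  $3=scope  $4=account (empty string to omit it)
  url="$1?service=$2&scope=$3"
  [ -n "$4" ] && url="${url}&account=$4"
  printf '%s\n' "$url"
}

# What buildx/buildkit sent (no account parameter):
token_url 'https://registry.example.com/v2/token' 'registry.example.com' 'repository:myorg/myimage:pull,push' ''
# What plain "docker push" sends (account parameter appended):
token_url 'https://registry.example.com/v2/token' 'registry.example.com' 'repository:myorg/myimage:pull,push' 'myuser'
```

A registry frontend that reads the username from the `account` parameter would see it only in the second request, which matches the nginx access logs above.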
@StarGate01, awesome debugging/research, thank you for that! Hopefully fixable with all that info! :)
Thanks! I actually fixed it already, working on creating pull requests, building forked images, deployment to my CI etc... It was a one-line fix in containerd: StarGate01/containerd@2caabd9
Haha wow... Oh well, this will improve my build process a whole lot, I owe you one! :)
Alright, I provide the following to the community:
You can use the unmodified buildx command with the patched buildkit like this: I tested it locally and on my CI, and it works. I use https://github.com/SUSE/Portus and https://github.com/Quiq/docker-registry-ui and both tools now read the username correctly. Let's hope this fixes your problem too! :)
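For anyone wanting to try a patched buildkit with an otherwise stock buildx: the docker-container driver accepts a custom buildkit image via `--driver-opt image=...`. In this sketch the commands are printed rather than executed, so it stays self-contained, and the image name is a placeholder for wherever the patched build was pushed:

```shell
# Dry-run sketch: print (instead of run) the commands that would point a
# builder at a custom buildkit image. The image name is a placeholder.
BUILDKIT_IMAGE='example/buildkit:patched'

run() { printf '%s\n' "$*"; }   # echo the command line instead of executing it

run docker buildx create --name patched-builder \
  --driver docker-container \
  --driver-opt "image=$BUILDKIT_IMAGE" --use
run docker buildx inspect patched-builder --bootstrap
```

Dropping the `run` prefix turns this into the real setup, assuming the patched image is actually published under that name.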
So apparently, adding the … However, now I don't really know how to fix your problem - maybe your registry auth controller has similar issues? What setup are you using?
Ah, that sucks. Well, my setup is quite basic: 1x AMD64 machine, 1x ARM64 machine, both set up as nodes/contexts in docker and added as nodes in the builder context when setting up buildx/buildkit on the primary node (amd64). Both clients are logged in to the registries through the .docker/config.json file, both are set up to be experimental in the daemon (and client), but I guess the client login should make no real difference really. I will take a look at your patched image and see if it works with that though!
Same issue with a similar setup here, an ARMv6/7 and an AMD64 builder. The registry I'm targeting is the Docker Hub. I'm trying to push to both zabbixmultiarch/zabbix-agent2 and zabbixmultiarch/zabbix-agent2-alpine:
Is there a way to properly retag multiarch images? Something like the following:
Does this problem only occur when you are using multiple physical builder nodes? When I build a multi-arch image using the official buildkit and QEMU/binfmt (architecture emulation on one physical machine), the pushed tags and images look good.
- Make latest the first tag since there's a bug in buildx docker/buildx#177 - Tag with current git commit id
Can't speak for others, but in my case it works fine in non-multi-node builds; I can do it with qemu without any issues, but the build is a lot slower on an emulated arch than on a native one. Edit: Guessing you mean with your image; missed that it was you who commented the other day, hehe.
This only occurs when multiple nodes are building the images. I tested both extensively. When building with more than one node, the first tag gets pushed and nothing else. When only one node is involved everything is fine, i.e. all tags/images get pushed as expected.
What I am trying to build (zabbix agent2) only builds fine when running on ARM. QEMU doesn't cut it. Edit:
# Buildx setup
docker context create --use --docker "host=ssh://pi@raspberrypi.local" raspberrypi
docker buildx create --use --name test raspberrypi
docker buildx create --append --name test default
# Code
git clone https://github.com/pschmitt/zabbix-docker-multiarch /tmp/zabbix-docker-multiarch
/tmp/zabbix-docker-multiarch/build.sh agent2-alpine -p
I have the same issue. The output is different too! It looks like the merging of the manifests triggers an error. With the bug (multihost):
Without the bug (single host):
EDIT: buildx master has the same issue
I still experience this issue... As a workaround I no longer do any heavy work in the docker images, but rather on each native machine, and import the binaries into the image as a final step with a single node. It's annoying to have to do it like that, but it works...
Ok, I tried to work around this issue in several ways:
Here's how:
for tag in $BUILD_TAGS; do
  docker buildx build --platform="$DOCKER_PLATFORMS" --pull --push --cache-from="$BUILD_TAG" --cache-to="type=inline" --tag="$tag" .
done
Here are the two resulting images: https://hub.docker.com/r/silex/emacs/tags?page=1&name=25-dev and https://hub.docker.com/r/silex/emacs/tags?page=1&name=25.3-dev. We see that for arm the sha256 digests are the same, but for amd64/i386 they changed. How am I supposed to "docker tag" a multiarch image? I'm now thinking of
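On the "how do I docker tag a multiarch image" question: `docker buildx imagetools create` can create a new tag pointing at an existing manifest list entirely registry-side, without rebuilding anything. A dry-run sketch (the commands are printed rather than executed; the repository and tags are taken from the links above, so adjust before running for real):

```shell
# Dry-run sketch: print the imagetools command that would "retag" a
# multi-arch image by creating a new manifest list tag from an existing one.
run() { printf '%s\n' "$*"; }

run docker buildx imagetools create \
  --tag silex/emacs:25-dev \
  silex/emacs:25.3-dev

# Afterwards, inspect both tags and compare the per-arch digests:
run docker buildx imagetools inspect silex/emacs:25-dev
```

Because this only writes a new manifest list referencing the already-pushed per-arch manifests, the per-arch digests should stay identical across the two tags.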
This is basically how I build my images nowadays: https://github.com/jitesoft/docker-node-base/blob/master/.gitlab-ci.yml (in the docker command I use a helper file I made myself which basically generates a tag list (-t image:tag -t image:tag)) and it works fine (using qemu on the build machine, which is single node). The actual building of the binaries I do on build machines which are using the arch they are intended to run on. So for my aarch64 images, I use an aarch64 host; on amd64, an amd64 host. It works, and works quite well, but I still would want this to work too, as it would make a lot of things much easier, hehe...
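The tag-list helper mentioned above isn't linked, so here is a hypothetical reimplementation of the idea: a function that expands an image name plus a list of tags into the repeated `-t image:tag` flags that `buildx build` accepts (the image and tag names are just examples):

```shell
# Hypothetical tag-list generator: builds repeated "-t image:tag" flags.
tag_flags() {
  image="$1"; shift
  flags=''
  for tag in "$@"; do
    flags="$flags -t $image:$tag"
  done
  printf '%s\n' "${flags# }"   # strip the leading space
}

tag_flags jitesoft/node-base latest 14 14.17.0
# → -t jitesoft/node-base:latest -t jitesoft/node-base:14 -t jitesoft/node-base:14.17.0
```

The output would then be spliced into a single invocation, e.g. `docker buildx build $(tag_flags image tag1 tag2) --push .`.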
Sorry for the close and reopen, using GitHub mobile and thought it was "close window", not "close issue"!
@Johannestegner: be careful, your images are not multiarch (because you also did the You can see the diff with
@Silex yes, quay does not have multi-arch support in their registry. My gitlab and docker hub images are, though :) But thanks! :)
Ah ok, so you push a single tag to gitlab/docker hub and multiple tags to quay.io.
Well, I push multiple tags to both, but in the gitlab/docker hub case I have all tags in the build command (you can use multiple -t image:tag with buildx build) and for quay I iterate through the tags and push them. :)
Yes, but that does not work when you have multiple hosts; I mean, this is what this issue is about, no? I also have multiple
@Silex it's been a while since I tried it the last time, but I think only my manifest gets pushed, while the tags are not. If I use a single host to push, all the tags and the manifest work fine.
My setup is 1 gitlab worker for amd64/i386 and 1 arm machine for the arm images. The gitlab worker does the pushing:
Then I build using:
Only the first tag gets pushed.
Does the tag really get pushed though? Looking at the logs, it looks like it pushes the manifest, then it's done... :)
@Johannestegner: I mean https://hub.docker.com/r/silex/emacs/tags?page=1&name=26.3 is multiarch, so it had to be pushed at some point? But maybe you're right and it "works" randomly. Also, what does it mean that only the manifest is pushed? Maybe I need to brush up on manifests vs tags.
Each docker "image" consists of a manifest and a bunch of layers. The manifest references the layers, and a tag references a manifest. When it comes to multi-arch, the manifest is split up into the same image but with different architectures, so each tag points to a manifest list, which covers multiple architectures, each of which references a bunch of layers. From my testing and research, the manifest is pushed fine, but that's it: the layers are not pushed and neither are the tags (as they come after the pushing of the layers).
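To illustrate that explanation, here is a minimal hand-written manifest list (the digests are fake, the repository hypothetical). A tag resolves to this list; each entry points at a per-arch image manifest, which in turn references that arch's layers:

```shell
# A fabricated multi-arch manifest list, roughly the shape that
# "docker manifest inspect some/image:tag" prints for a real image.
cat > /tmp/example-manifest-list.json <<'EOF'
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    { "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:aaaa",
      "platform": { "architecture": "amd64", "os": "linux" } },
    { "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:bbbb",
      "platform": { "architecture": "arm64", "os": "linux" } }
  ]
}
EOF
# List the architectures this tag covers:
grep -o '"architecture": "[a-z0-9]*"' /tmp/example-manifest-list.json
```

If only this list is pushed but the per-arch manifests or layers it references are missing from the registry, pulls by tag will fail even though the registry shows a recent update.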
Thanks for the explanations. What I observe is that
More information at docker/buildx#177
So I tried a multi-node build and push to Docker Hub; although it said the push succeeded, I can't find the image. Later I tried pushing to Harbor, and this time I finally saw the images: both the arm and amd images were untagged and were split into two separate OCI images instead of one. The image SHA is also different from the one in the output. My theory is that because it builds on two nodes, it treats it as two image builds, and once a node finishes building it pushes on its own. Now I am trying to output to a file and then push the file, but the problem is that the oci output type is not supported for multi-node builds.
Also experiencing this :(
For those who are also struggling with this, there is an open issue on Gitlab: https://gitlab.com/gitlab-org/gitlab/-/issues/389577 The workaround is to add
@psalkowski If you are hitting a provenance-related issue, specifically with multi-node, then make sure you are using the latest buildx, and if you still see the problem open a new issue with reproduction steps. This issue is for something different (likely registry related).
Haven't tried this since my last comment, I think, so I will try asap (it will require some re-configuring of my build machines) and see if it's still an issue. Kind of an old issue now, so it would be nice to be able to close it ;)
I will try to describe my issue as well as I can... much of the setup was done by trial and error, so I could just be doing it wrong, but the documentation for multi-node buildx instances is quite sparse at the moment, so it's the best I could do!
Scenario:
I have connected an ARM64 machine's docker daemon to one of my AMD64 machines through buildx, to be able to utilize the native ARM64 engine for ARM64 builds. While the engines seem to work great together while building, when they are about to push they stop after the AMD64 machine has pushed the first tag.
There are no errors in the build logs, so it looks like everything was built and pushed successfully, even though no images are pushed to the registries.
I have tried connecting to the remote daemon both over tcp and over ssh, but neither seems to work for me.
Connection was done through
docker buildx create --append --name <builder-name> tcp://ip:2375
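For reference, the append step accepts either endpoint form; here is a dry-run sketch of both (the commands are printed, not executed, and the builder name, IP, and user are placeholders):

```shell
# Print the two ways of appending a remote daemon as a builder node.
run() { printf '%s\n' "$*"; }

# Over plain tcp (requires the remote daemon to listen on port 2375):
run docker buildx create --append --name mybuilder tcp://192.0.2.10:2375
# Over ssh (endpoint given directly, or via a docker context created first):
run docker buildx create --append --name mybuilder ssh://user@192.0.2.10
```

Either way the node list can be checked afterwards with `docker buildx inspect mybuilder`.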
It does, though, seem like the registries notice that something was being pushed, as they say that the last update was "1 minute ago", while no images have been changed.
The last part of the build log looks like this when it fails to push successfully:
While a "real" successful one looks like the following (using only the AMD64 machine with Qemu):
If there are any logs that I can't find that would help in debugging the issue or any more information that you could need, please let me know!
Docker version output:
AMD machine:
ARM machine:
Buildx version:
AMD machine:
ARM machine: