Zarf local docker fallback not working on MacOS+DockerDesktop+ContainerdMode #2584

Closed
RothAndrew opened this issue Jun 5, 2024 · 8 comments · Fixed by #2593

Comments

@RothAndrew
Contributor

RothAndrew commented Jun 5, 2024

Environment

Device and OS: M2 macbook pro
App version: have tried with both 0.33.0 and 0.34.0
Kubernetes distro being used: k3d v1.27.4+k3s1
Other: Docker Desktop with Containerd mode turned on

Steps to reproduce

# Build an image locally, but DON'T push it. This tag does NOT exist in `registry.example.com`.
docker build -t registry.example.com/myimage:1.2.3 .

# Build the zarf package that references `registry.example.com/myimage:1.2.3` in the zarf.yaml
zarf package create --confirm --skip-sbom --no-progress

# Try to deploy it
zarf package deploy zarf-package-foo-arm64-1.2.3.tar.zst --confirm --no-progress -l debug

Expected result

Package is deployed

Actual Result

The package gets created just fine. As expected, it says "Falling back to local 'docker' images, failed to find the manifest on a remote". But it fails on deployment with this error:

[image: screenshot of the deployment error]

If I decompress the zarf package and look at that /images/blobs/sha256 folder, there indeed is not a file with that name.

[image: screenshot of the /images/blobs/sha256 folder listing]

The string a02b607f0d337d98c48e812611a4289e8e10b81e5832685393292d83b059835c DOES appear in the zarf package's checksums.txt file, next to a sha that DOES exist in the images/blobs/sha256 folder.

In checksums.txt:

<snip>
a02b607f0d337d98c48e812611a4289e8e10b81e5832685393292d83b059835c images/blobs/sha256/ad69e88322c92fe909723f882c4c8213d412bbadfef687c7cf5e360adba141b6
<snip>
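The mismatch above can be spotted mechanically. The following is a minimal sketch, assuming each line of checksums.txt has the form "<sha256> <path>" as in the excerpt; the function name `mismatchedBlobs` is hypothetical and not part of Zarf.

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"strings"
)

// mismatchedBlobs scans checksums.txt-style content ("<sha256> <path>" per
// line) and returns the paths under images/blobs/sha256/ whose file name
// differs from the checksum recorded for them.
func mismatchedBlobs(checksums string) []string {
	var bad []string
	sc := bufio.NewScanner(strings.NewReader(checksums))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue // skip blank or unexpected lines
		}
		sum, p := fields[0], fields[1]
		if !strings.HasPrefix(p, "images/blobs/sha256/") {
			continue // only image blobs are named after their digest
		}
		if path.Base(p) != sum {
			bad = append(bad, p)
		}
	}
	return bad
}

func main() {
	// The line quoted in this issue.
	const sample = "a02b607f0d337d98c48e812611a4289e8e10b81e5832685393292d83b059835c images/blobs/sha256/ad69e88322c92fe909723f882c4c8213d412bbadfef687c7cf5e360adba141b6\n"
	for _, p := range mismatchedBlobs(sample) {
		fmt.Println("name does not match checksum:", p)
	}
}
```

This only compares names against the recorded checksums; it does not read the blob files, so it cannot tell whether the file name or the recorded checksum is the wrong one.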

Severity/Priority

I'd say medium? For the first time ever, I'm developing an application and a zarf package together. In the dev/test cycle, I 100% do not want to be building the docker image and pushing it to a registry just to be pulling it back down. I want to build the image locally, build the zarf package locally, and deploy to a local cluster.

@RothAndrew
Contributor Author

Of note: if I keep everything else exactly the same but go ahead and push the image to my docker registry, everything works fine. But I really, really, really don't want to do that.

@RothAndrew
Contributor Author

RothAndrew commented Jun 5, 2024

Edit: moved to new issue: #2586

Side-topic: I'd absolutely LOVE a way to specify that images should ONLY be pulled from the local docker daemon. Perhaps something like:

zarf package create --confirm --local-docker-only

with the ability to first pull all images that are referenced in the zarf.yaml in case there are any that are being used that are upstream dependencies

zarf package pull-images
docker build -t registry.example.com/myimage:1.2.3 .
zarf package create --confirm --local-docker-only

Why? Because release-please controls all my semver versions. So, up in the registry there is definitely a v1.2.3 present, but I'm now developing v1.2.4. I don't want to have to change versions everywhere; I want release-please to handle that for me. So, locally the version is still specified as v1.2.3, but I don't want the image from the registry. I want the local image that has my changes.

I'm working around it now by running the zarf package create in a docker container like:

docker network rm no-internet-net || true
docker network create --internal no-internet-net
docker run --platform linux/amd64 --rm -v $(pwd):/work -v /var/run/docker.sock:/var/run/docker.sock -w /work/zarf --network no-internet-net ghcr.io/defenseunicorns/build-harness/build-harness:2.0.24 uds zarf package create --architecture $(scripts/get_arch.sh) --confirm --skip-sbom --no-progress
docker network rm no-internet-net

Using the custom no-internet-net network makes the container run without internet connectivity.

Back to the issue at hand: I have tried doing things with just straight zarf, no docker stuff, and it still fails whenever it does the local docker fallback.

@AustinAbro321
Contributor

AustinAbro321 commented Jun 5, 2024

Thanks for the detailed issue! If you want to create a separate issue for the local-docker-only flag I think it'd be a good feature to add.

The most interesting thing I'm seeing is from checksums.txt. File names for image blobs in OCI should be just the sha256sum of that file. It looks to me like the correct content is getting placed in the image blob, but the blob is being named incorrectly.
A few questions:

  • If you run zarf dev sha256sum /images/blobs/sha256/ad69e88322c92fe909723f882c4c8213d412bbadfef687c7cf5e360adba141b6, do you get the name of the file or a02b607f0d337d98c48e812611a4289e8e10b81e5832685393292d83b059835c? This will help us verify that the file is really named incorrectly, and that the problem is not Zarf recording the wrong checksum.
  • What type of file is /images/blobs/sha256/ad69e88322c92fe909723f882c4c8213d412bbadfef687c7cf5e360adba141b6? It'll be a blob, json manifest, or docker image config file.
  • Are you able to reproduce with any other images?
  • Are you able to reproduce on amd64 hardware?

@RothAndrew
Contributor Author

RothAndrew commented Jun 5, 2024

Troubleshooting notes:

  • The issue happens on ARM MacOS, using Docker Desktop with Containerd mode turned on. It does not happen when Containerd mode is off.
  • But, I can't turn Containerd mode off, because then other things break, like multi-arch building

@phillebaba
Member

After some pairing, we determined that the error is caused by the wrong hash being used for the config layer. As stated before, the hash and the file name should match for blob layers. The content of the file is correct and produces the correct hash. The file, however, is named after the hash of the index layer. We managed to reproduce the issue after determining that Docker with the Containerd snapshotter was required.
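To illustrate why a misnamed config blob breaks deployment, here is a minimal sketch that checks whether every digest referenced by an OCI image manifest has a blob file of the same name. It is an illustration under assumptions, not Zarf's or crane's actual logic: the `missingBlobs` helper and the shortened digests are made up, and only the `config` and `layers` fields of the manifest are modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// descriptor and manifest model only the fields of an OCI image manifest
// that this check needs.
type descriptor struct {
	Digest string `json:"digest"`
}

type manifest struct {
	Config descriptor   `json:"config"`
	Layers []descriptor `json:"layers"`
}

// missingBlobs returns every digest referenced by the manifest that has no
// file of the same name in the blob directory. present holds the file names
// found under blobs/sha256 (hex digests without the "sha256:" prefix).
func missingBlobs(manifestJSON []byte, present map[string]bool) ([]string, error) {
	var m manifest
	if err := json.Unmarshal(manifestJSON, &m); err != nil {
		return nil, err
	}
	var missing []string
	for _, d := range append([]descriptor{m.Config}, m.Layers...) {
		name := strings.TrimPrefix(d.Digest, "sha256:")
		if !present[name] {
			missing = append(missing, d.Digest)
		}
	}
	return missing, nil
}

func main() {
	// Shortened, made-up digests for illustration only.
	raw := []byte(`{"config":{"digest":"sha256:cfg"},"layers":[{"digest":"sha256:layer1"}]}`)
	// The config blob was written under the index digest ("idx") instead.
	present := map[string]bool{"idx": true, "layer1": true}
	missing, err := missingBlobs(raw, present)
	if err != nil {
		panic(err)
	}
	fmt.Println("missing blobs:", missing) // missing blobs: [sha256:cfg]
}
```

In this model the content exists on disk, but the manifest looks it up by a name that was never written, which matches the symptom reported at deploy time.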

After studying crane's writing logic, it becomes clear that it does not hash the content of the config file to determine its hash. Instead, it calls the image's config name function.
https://github.com/google/go-containerregistry/blob/3764db238e3ebf35a3ea0da696287701214859b9/pkg/v1/layout/write.go#L356-L366

The config name function's implementation differs based on the source of the image, which would explain why the problem only occurs for local images. For local images, the Docker client is used to fetch the config name.
https://github.com/google/go-containerregistry/blob/3764db238e3ebf35a3ea0da696287701214859b9/pkg/v1/daemon/image.go#L177-L181

It turns out that the config name comes from the ID returned in image inspect. What is probably happening is that the ID returned differs when running Docker standalone and Docker with Containerd snapshotter. We will need to produce some example code which shows that this is the actual issue.
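The underlying check is small: a config blob's name in an OCI layout should be the SHA-256 of its raw bytes, so an ID reported by the daemon is only usable as the config name if it equals that digest. The sketch below uses only the standard library. The function names `configDigest` and `idMatchesConfig` and the stand-in config bytes are hypothetical. With go-containerregistry, the raw bytes would presumably come from the image's `RawConfigFile` method, but that wiring is an assumption and is not shown here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// configDigest derives the config blob's digest from its raw bytes. This is
// the name the blob is expected to have in an OCI layout.
func configDigest(rawConfig []byte) string {
	sum := sha256.Sum256(rawConfig)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// idMatchesConfig reports whether an image ID reported by the daemon is
// actually the digest of the raw config bytes.
func idMatchesConfig(reportedID string, rawConfig []byte) bool {
	return reportedID == configDigest(rawConfig)
}

func main() {
	// Stand-in config content; a real config file is a larger JSON document.
	rawConfig := []byte(`{"architecture":"arm64","os":"linux"}`)

	// An ID that is the config digest passes the check.
	fmt.Println(idMatchesConfig(configDigest(rawConfig), rawConfig))

	// An ID that is some other digest (for example an index digest) fails it.
	fmt.Println(idMatchesConfig("sha256:0000", rawConfig))
}
```

A check like this would flag the situation described above, where the reported ID is a valid digest of something other than the config.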

@RothAndrew
Contributor Author

RothAndrew commented Jun 5, 2024

Thanks @phillebaba , appreciate the thoroughness.

Any idea what the next step might be?

For now, I'm gonna look into running a local registry:2 combined with the --registry-override flag, but I have a feeling it's gonna be super janky and I'm gonna hate it.

@RothAndrew RothAndrew changed the title from "Zarf local docker fallback not working" to "Zarf local docker fallback not working on MacOS+DockerDesktop+ContainerdMode" on Jun 5, 2024
@phillebaba
Member

I managed to very easily reproduce this issue.

package main

import (
	"fmt"

	"github.com/google/go-containerregistry/pkg/name"
	"github.com/google/go-containerregistry/pkg/v1/daemon"
)

func main() {
	err := run()
	if err != nil {
		panic(err)
	}
}

func run() error {
	ref, err := name.ParseReference("docker.io/library/alpine:latest@sha256:77726ef6b57ddf65bb551896826ec38bc3e53f75cdde31354fbffb4f25238ebd")
	if err != nil {
		return err
	}
	img, err := daemon.Image(ref)
	if err != nil {
		return err
	}
	configName, err := img.ConfigName()
	if err != nil {
		return err
	}
	fmt.Println("config name", configName)
	return nil
}

Running without Containerd snapshotter.

config name sha256:1d34ffeaf190be23d3de5a8de0a436676b758f48f835c3a2d4768b798c15a7f1

Running with Containerd snapshotter.

config name sha256:77726ef6b57ddf65bb551896826ec38bc3e53f75cdde31354fbffb4f25238ebd

These are the same results that we found when pairing.

Long term, this needs to be fixed upstream, but in the meantime @AustinAbro321 will add a workaround to fix this in the next release.

@AustinAbro321
Contributor

@RothAndrew #2593 should be the band-aid fix for this issue. It's working on my Mac; feel free to try it out.

lucasrod16 pushed a commit that referenced this issue Jun 6, 2024
## Description

Fixes #2584 

## Checklist before merging

- [ ] Test, docs, adr added or updated as needed
- [ ] [Contributor Guide Steps](https://github.com/defenseunicorns/zarf/blob/main/.github/CONTRIBUTING.md#developer-workflow) followed
AustinAbro321 added a commit that referenced this issue Jul 23, 2024
## Description

Fixes #2584

## Checklist before merging

- [ ] Test, docs, adr added or updated as needed
- [ ] [Contributor Guide Steps](https://github.com/defenseunicorns/zarf/blob/main/.github/CONTRIBUTING.md#developer-workflow) followed

Signed-off-by: Austin Abro <AustinAbro321@gmail.com>