Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: push distribution images to GHCR #4130

Merged
merged 1 commit into from Oct 31, 2023

Conversation

milosgajdos
Copy link
Member

This enables pushing distribution images into GHCR.

NOTE: This is in addition to the Docker Hub push which remains in place unchanged

Closes #4125

PTAL @crazy-max 😄

This addition enables pushing distribution images into GHCR.
This is in addition to the Docker Hub push which remains in place.

Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
Copy link
Contributor

@crazy-max crazy-max left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Note: if you want to mirror current images on Docker Hub to GHCR you can use https://github.com/crazy-max/ghaction-dockerhub-mirror

@thaJeztah
Copy link
Member

@crazy-max would there be advantages in using that? (thinking if using that would provide a better guarantee that both registries have the exact same digest)?

@crazy-max
Copy link
Contributor

@crazy-max would there be advantages in using that? (thinking if using that would provide a better guarantee that both registries have the exact same digest)?

You mean the mirror action? If so it should be a one-time use and not push here but use from another repo if user has the credentials ofc. Or just this script https://github.com/crazy-max/ghaction-dockerhub-mirror/blob/5eefa440a0262261065dd3a57cebc860c8e135f0/action.yml#L35-L68 if brave enough 😅

@milosgajdos
Copy link
Member Author

@crazy-max would there be advantages in using that? (thinking if using that would provide a better guarantee that both registries have the exact same digest)?

The image is built only once by the CI. The very same image is pushed to two distinct registries: I dont understand the point around "the same digest". Pushing the very same image doesn't change its digest.

@Jamstah
Copy link
Collaborator

Jamstah commented Oct 25, 2023

Except when the digest changes on push because of various compression and other reasons of course :)

@milosgajdos
Copy link
Member Author

milosgajdos commented Oct 25, 2023

Except when the digest changes on push because of various compression and other reasons of course :)

If the digests change in transit then we are pretty doomed I'd say. Can you give me an example of where/when you've seen a case like that?

Assume the current CI behaviour:

  • an image is built and packaged and stored locally (on the GHA VM)
  • the image is pushed by the same client to 2 different registries

If the image itself was built twice then sure: docker builds are still not reproducible under some circumstances, but the build here is done only once; image manifest is generated and all the layers/indices appropriately referenced.

If there is a compression in transit, that should not change the digests of the layers as they're specified in the built image manifest. Curious.

@Jamstah
Copy link
Collaborator

Jamstah commented Oct 25, 2023

I can't say I have a deep understanding, but I do know that doing a docker pull docker push cycle doesn't guarantee the same digest in the target registry. There's an obvious case of manifest lists, but docker can also change compression on the way down/up.

Depending on the capabilities of the target registries, I believe docker can push differently too.

It was a throwaway comment really, does it matter if the digests are different on each registry? We should sign the images in some way, and a signature includes the registry/repo/tag too iirc.

@milosgajdos
Copy link
Member Author

I do know that doing a docker pull docker push cycle doesn't guarantee the same digest in the target registry.

@dmcgowan @corhere can you please confirm this as this strikes mild panic in my body 😄 if I have an image manifest that explicitly claims specific digests, I'd expect that when I pull those I'd get the content referenced by the digests -- upstream (this repo) certainly does behave this way, so I'd like to know if there are any registries that are compromising the digests referenced int he manifests or if there are any clis that mess about and compromise the integrity of the built artifacts. I hope no registry/runtime (client) is rewriting the manifests or doing something funky 😄

does it matter if the digests are different on each registry?

Yes, it does. If we are going to push to both we must make sure the users are getting the same content uniquely identifiable by the sha256 digest as listed in the image manifests when the build succeeds

We should sign the images in some way, and a signature includes the registry/repo/tag too iirc.

Yes we should. Referrers API was meant to address attaching signatures (and other MD) to the images but that's got a way to go to be codified in the spec. That being said, I believe buildkit can now include attestations. Is that right @crazy-max ? Do we have to switch some flags on to get those? That will give the end users at least some way to verify what's in the image, etc.

@Jamstah
Copy link
Collaborator

Jamstah commented Oct 25, 2023

It doesn't tend to be the registry, it tends to be the clients.

For example, skopeo has cases where it will change digest on copy (eg containers/skopeo#1451), I actually added an option to skopeo to make it fail in these cases instead of pushing a modified manifest.

A docker images doesn't actually have a digest just from building it, but docker does store digests from the registry when pulling them now, I don't know if it ensures the same digest if it is then pushed to a different registry.

An example of a simple pull->tag->push resulting in a different digest, even with the repos both being on docker hub:

jammy:~$ docker image ls --digests
REPOSITORY                   TAG                DIGEST    IMAGE ID       CREATED       SIZE
keycloak/keycloak-operator   test               <none>    9c2bd5654bd6   2 weeks ago   448MB
jammy:~$ docker pull busybox:latest
latest: Pulling from library/busybox
3f4d90098f5b: Pull complete 
Digest: sha256:3fbc632167424a6d997e74f52b878d7cc478225cffac6bc977eedfe51c7f4e79
Status: Downloaded newer image for busybox:latest
docker.io/library/busybox:latest
jammy:~$ docker image ls --digests
REPOSITORY                   TAG                DIGEST                                                                    IMAGE ID       CREATED        SIZE
keycloak/keycloak-operator   test               <none>                                                                    9c2bd5654bd6   2 weeks ago    448MB
busybox                      latest             sha256:3fbc632167424a6d997e74f52b878d7cc478225cffac6bc977eedfe51c7f4e79   a416a98b71e2   3 months ago   4.26MB
jammy:~$ docker tag busybox:latest docker.io/jammyhewitt/busybox:latest
jammy:~$ docker image ls --digests
REPOSITORY                   TAG                DIGEST                                                                    IMAGE ID       CREATED        SIZE
keycloak/keycloak-operator   test               <none>                                                                    9c2bd5654bd6   2 weeks ago    448MB
jammyhewitt/busybox          latest             <none>                                                                    a416a98b71e2   3 months ago   4.26MB
busybox                      latest             sha256:3fbc632167424a6d997e74f52b878d7cc478225cffac6bc977eedfe51c7f4e79   a416a98b71e2   3 months ago   4.26MB
jammy:~$ docker push docker.io/jammyhewitt/busybox:latest
The push refers to repository [docker.io/jammyhewitt/busybox]
3d24ee258efc: Mounted from library/busybox 
latest: digest: sha256:023917ec6a886d0e8e15f28fb543515a5fcd8d938edb091e8147db4efed388ee size: 528
jammy:~$ docker image ls --digests
REPOSITORY                   TAG                DIGEST                                                                    IMAGE ID       CREATED        SIZE
keycloak/keycloak-operator   test               <none>                                                                    9c2bd5654bd6   2 weeks ago    448MB
jammyhewitt/busybox          latest             sha256:023917ec6a886d0e8e15f28fb543515a5fcd8d938edb091e8147db4efed388ee   a416a98b71e2   3 months ago   4.26MB
busybox                      latest             sha256:3fbc632167424a6d997e74f52b878d7cc478225cffac6bc977eedfe51c7f4e79   a416a98b71e2   3 months ago   4.26MB
jammy:~$ 

@Jamstah
Copy link
Collaborator

Jamstah commented Oct 25, 2023

We've found the only way to get digest stability when pushing to multiple repos is to push to one repo, and then copy it from there to everywhere else with skopeo and its --preserve-digests option

@milosgajdos
Copy link
Member Author

It doesn't tend to be the registry, it tends to be the clients.

Yes, as I said earlier

Assume the current CI behaviour:
an image is built and packaged and stored locally (on the GHA VM)
the image is pushed by the same client to 2 different registries

Emphasis on the image is pushed by the same client 😄 which is the case in our CI pipeline. I see no issue with this PR 🤷‍♂️ but again, I'm willing to be persuaded and learn something new.

I know the clients can be silly sometimes, but that's not our concern: our concern here is making the identical image available from different registries

@Jamstah
Copy link
Collaborator

Jamstah commented Oct 25, 2023

Yes, what we're doing should be safe, but I don't believe it is guaranteed to be safe. There's a reason docker stores multiple repo digests for each image in the local store. And like I said, I don't think it matters if the digests are different in each repo.

Also, I honestly thought I had already approved this, my comments were certainly not saying that I thought the PR shouldn't be going in :)

@milosgajdos
Copy link
Member Author

PTAL @thaJeztah

@thaJeztah
Copy link
Member

can you please confirm this as this strikes mild panic in my body 😄 if I have an image manifest that explicitly claims specific digests, I'd expect that when I pull those I'd get the content referenced by the digests

It depends...

With the containerd image-store integration, pushes are reproducible, because the image, image-manifest, and compressed layers are created once, and stored on disk. Pulling an image also keeps the distribution image (and layers) in the content store. This also means that images from different registries (or using different compression) will be stored twice, but may (after extracting) produce the same image.

This is an essential difference in design between containerd's image store and the original design of the docker daemon image store (using "graph-drivers"). The graph-driver store was designed around the local image; the image config and un-compressed layers. These layers are fully reproducible, and when pulling an image it would extract the layers (but using tar-split to reconstruct the original, uncompressed tar).

However it does not keep any of the distribution artifacts, as the runtime has no use for the compressed layers, nor manifest-index, or manifest; those are discarded (but metadata about them kept); the design here was both to only handle the artifacts used for the runtime itself (distributing it being a separate concern), as well as optimizing for storage.

When pushing an image to the same registry, it would detect layers that are not modified, and would not even re-push them, but when pushing to a different registry, it will create new content (manifest, compressed layers), which is non-reproducible; so pushing to 2 different registries will result in different digests. This scenario was less common, as well as "transfer content between registries" (pull from one, push to another) mostly out of scope, and better suited for dedicated tools.

Containerd's design was different from the start, and more focus on reproducibility of distributed artifacts, at the cost of storage (considered to be "cheap" for production scenarios). So containerd keeps the distributions artifacts when pulling, keeps BOTH compressed, and uncompressed / extracted artifacts (and when using multiple snapshotters; multiple copies of extracted artifacts).

As to the bake-action; I think it does not use the docker (graph-driver) image-store and produces distribution artifacts once (pushing multiple times), but, honestly not deeply familiar with the approach it takes; I'd have to quickly defer to @crazy-max on that one.

@milosgajdos
Copy link
Member Author

Thanks, @thaJeztah this was a super interesting insight into how all the machinery works.

I'm not gonna lie, it made me feel intensely depressed on the one hand (how we accepted this in our industry kinda boggles my mind), but very happy on the other (my personal laptop is running off NIX store where I dont have to deal with this 😄 ).

This opens a question though, should we take a different approach to this PR? Any recommendation @crazy-max ?

@davidspek
Copy link
Collaborator

Just thought I'd mention I also do this in the #4083 PR for the release automation already.

@milosgajdos milosgajdos merged commit ecb475a into distribution:main Oct 31, 2023
12 checks passed
@milosgajdos milosgajdos deleted the push-ghcr branch October 31, 2023 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Push to github artifact registry as well as docker hub
6 participants