[proposal] Improve taggers#2804
Conversation
b1ad931 to
f4ebe71
Compare
| + we introduce a `latest` tagger tag tags images with `:latest`. | ||
| + the `latest` tagger is used by default instead of the `git` tagger. | ||
| + `sha256` is completely changed to use a digest of the artifact's inputs as the tag. | ||
| Something like https://github.com/GoogleContainerTools/skaffold/pull/2301 |
There was a problem hiding this comment.
For the sha256 digest, I believe it'd make the most sense to derive it from the resulting image (therefore tagging after the build). Otherwise, unless the Skaffold checksum implementation completely reproduces the Docker's one, there always be missing things, or on the contrary, new tags for the unchanged docker image. Examples:
- Implement checksum computation for sha256 tagger (fix issue #2290) #2301 hashes
Dockerfileandskaffold.yaml. Therefore, any formatting changes in those files will affect the checksum but won't affect the image content. - The implementation above hashes the working directory (which I assume is the directory containing the
skaffold.yamlfile), however, the resulting image may contain files and other build artifacts from the outside. Therefore, the changes outside of the working directory will affect the image content but won't affect the tag.
There was a problem hiding this comment.
FWIW #2301 was not merged. I don't think we should hash Dockerfile and skaffold.yaml into the image.
To your second point: given the same exact commit, an image built twice with changes in external/remote dependencies will result in different sha256 digests given by the registry / image ID given by the docker daemon. Skaffold uses that and is rendered into the fully qualified image names in kubernetes, e.g. a git tagger based image has the git based tag + the digest: gcr.io/k8s-skaffold/gke-loadbalancer:v0.37.1-136-g174ea947a@sha256:7a3419895994a6a9e0c251fa5b3e32134aff987c3e0ec046bf81398c48e07cb4
Having this digest as a tag as well seems superfluous.
The tag can be the same - like in the case of latest, or envTemplate or dateTime (if the same image is built the exact same moment) - but the image digest will be different.
Now, we might still have an issue with the naming - sha256 seemingly generates expectations - how about depencyDigest / inputDigest / localChecksum ?
There was a problem hiding this comment.
FWIW #2301 was not merged. I don't think we should hash Dockerfile and skaffold.yaml into the image.
Whether it was merged or not is irrelevant. I just pointed out the downsides of the proposed solution.
Skaffold uses that and is rendered into the fully qualified image names in kubernetes, e.g. a
gittagger based image has the git based tag + the digest:gcr.io/k8s-skaffold/gke-loadbalancer:v0.37.1-136-g174ea947a@sha256:7a3419895994a6a9e0c251fa5b3e32134aff987c3e0ec046bf81398c48e07cb4Having this digest as a tag as well seems superfluous.
This is indeed superfluous. If we have the tag which fully identifies the image contents, nothing else is needed by definition. I'm looking for something like gcr.io/k8s-skaffold/gke-loadbalancer:7a3419895994a6a9e0c251fa5b3e32134aff987c3e0ec046bf81398c48e07cb4 which is how Skaffold v0.21.1 works.
The tag can be the same - like in the case of latest, or envTemplate or dateTime (if the same image is built the exact same moment) - but the image digest will be different.
This is the part I don't get. Are we talking about docker image tags or some other tags? As far as I understand, multiple images with different digests cannot have the same name and tags. Even if they could, how would you reference a specific image if more than one have the same name and tag?
Now, we might still have an issue with the naming -
sha256seemingly generates expectations - how aboutdepencyDigest/inputDigest/localChecksum?
What I believe would be ideal is imageDigest derived directly from the image after it's built. Please see #2713 for the details.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
There was a problem hiding this comment.
OK, I read a bit more to understand the use cases that motivate sha256 tagger:
@morozov's use case: can't use latest + need a content dependent tag (repo + potential external contents)
From #2713 (comment):
The use case is that when multiple images are built from a repository, I need images to be tagged the same (sha256) if their content is not affected by a commit. Can this be achieved after reworking the tagger in #1482?
Mainly motivated by:
FATA[0007] build failed: build failed: building [quay.io/redacted/redacted]: build artifact: writing image "quay.io/redacted/redacted:latest": MANIFEST_INVALID: manifest invalid; map[message:manifest schema version not supported]
As you can see, the image is tagged as latest and is rejected by the repository versioning policy.
@elisiano's use case: prefixing implemented with envTemplate + {{.DIGEST}}
#2162 (comment)
Do we care about the actual content, or the digest of the input is sufficient?
If we care about the content digest (including external effects in non-deterministic builds, that don't only depend on the input sources in the repo) that will have to go back to "post build tagging" model - instead of "calculate the tag, and ask the Builder to build&tag".
There was a problem hiding this comment.
@morozov Docker images that are pushed to a registry can always be referenced by their digest.
For example, the alpine:3.10, as it is today, can be referenced by:
- Their digest:
alpine@sha256:72c42ed48c3a2db31b7dafe17d275b634664a708d901ec9fd57b1529280f01fb - Tag and digest:
alpine:3.10@sha256:72c42ed48c3a2db31b7dafe17d275b634664a708d901ec9fd57b1529280f01fb. It is nicer because it's both immutable and useful for the user who can read the version easily.
If you want to reference images by something that's immutable and content addressable, you should use any of those two references. Skaffold uses the second form whenever possible (that is when images are pushed)
You are asking Skaffold to tag by digest. It would produce references that look like that:
alpine:72c42ed48c3a2db31b7dafe17d275b634664a708d901ec9fd57b1529280f01fb@sha256:72c42ed48c3a2db31b7dafe17d275b634664a708d901ec9fd57b1529280f01fb. That seems really superfluous. It is redundant and doesn't offer any meaningful information to a human. (Was 72c42ed48c3a2db31b7dafe17d275b634664a708d901ec9fd57b1529280f01fb produces before 7746df395af22f04212cd25a92c1d6dbc5a06a0ca9579a229ef43008d4d1302a?)
wdyt? Could you try again explaining why you'd still need to tag images by their digest? I might me missing something important
There was a problem hiding this comment.
@dgageot all the above makes sense with the only exception described in #2804 (comment). It's sort of a corner case and there may be other solutions to it.
To recap: we have two sets of repositories:
-
Production repositories (one per image), where the images are referenced as:
quay.io/company/image1@sha256:$DIGEST1quay.io/company/image2@sha256:$DIGEST2
These images don't require extra tagging and can be referenced by their
sha256digest. -
Experimental repository (just for dev deployments):
quay.io/company/experimental:image1@sha256:$DIGEST11quay.io/company/experimental:image1@sha256:$DIGEST12quay.io/company/experimental:image2@sha256:$DIGEST21quay.io/company/experimental:image2@sha256:$DIGEST22
Right now, the images are referenced by their tags generated by
skaffoldusing a template (e.g.{{.IMAGE_NAME}}:project1-{{.DIGEST_HEX}}). The tags contain the image name (for identification purposes) and the digest (to be content-addressable). This will not work since one tag cannot reference more than one digest.
The question is, how do we store those experimental/dev/ephemeral images in one repository while having them content-addressable and identifiable by image name.
There was a problem hiding this comment.
Thank you @morozov for the clarifications. It really helps.
For what it's worth, when using the full form image:tag@sha256:digest, the tag is ignored so you can have both quay.io/company/experimental:image1@sha256:$DIGEST_1 and quay.io/company/experimental:image1@sha256:$DIGEST_2. Doesn't really make sense but the correct images will still be used.
In your case, images will always be referenced by a long identifier, any of the following:
quay.io/company/experimental:image1-UNIQUE-IDquay.io/company/experimental:image1-DIGESTquay.io/company/experimental:image1-UNIQUE-ID@sha256:DIGESTquay.io/company/experimental:image1-DIGEST@sha256:DIGESTquay.io/company/experimental:image1@sha256:DIGEST(image1is ignored but still gives a hint to the user that they are using the right image)quay.io/company/experimental@sha256:DIGEST.
The last two options are possible with both the current state of Skaffold and the proposed state. And they look ok to me when I compare them with the other options. wdyt?
I understand you'd rather have option 2 or 4 but it requires computing tags after the build which is something we want to avoid because we've been there and it's a mess.
Another possibility is use the inputDigest as a pseudo unique id. I think it's kinda ok but as you said, there's no bijection between such id and the digest.
There was a problem hiding this comment.
For what it's worth, when using the full form
image:tag@sha256:digest, the tag is ignored so you can have bothquay.io/company/experimental:image1@sha256:$DIGEST_1andquay.io/company/experimental:image1@sha256:$DIGEST_2. Doesn't really make sense but the correct images will still be used.
As far as I understand, it will work like this from the referencing standpoint. But what about the repository? Will skaffold be able to create two images in the same repository with the same tag but different digests?
I understand you'd rather have option 2 or 4 but it requires computing tags after the build
I don't really have a preference in which syntax I'll end up using. I need a migration path from v0.29 to the latest one. Option #2 looks closest to what we have now; #4 indeed looks redundant.
If you believe that #5 will work, I could give it some more testing but I'll need some extra guidance. What I tried was:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
mariadb latest 97df12fa9319 6 months ago 369MB
postgres latest d03e2a8f3ed4 8 months ago 312MB
$ docker tag mariadb:latest experimental:project1
$ docker tag postgres:latest experimental:project1
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
mariadb latest 97df12fa9319 6 months ago 369MB
experimental project1 d03e2a8f3ed4 8 months ago 312MB
postgres latest d03e2a8f3ed4 8 months ago 312MB
As you can see, there's only one experimental:project1 tag that points to the same image as postgres:latest.
There was a problem hiding this comment.
@elisiano what is the real application of the project- prefixes in the experimental repository? Are they used for some automated housekeeping or mostly in order for a human to understand what project a specific image belongs to? In other words, what are the consequences of not using tags in the experimental repository but instead always using the @sha256 digest?
balopat
left a comment
There was a problem hiding this comment.
Thank you for putting this much thought into this. I like the direction, have questions and some answers.
|
|
||
| ## Proposal | ||
|
|
||
| + we introduce a `latest` tagger tag tags images with `:latest`. |
There was a problem hiding this comment.
This is basically a current sha256, right? I.e it would still use the @sha256:sdfsdfsdf digest at deplyoment time? Or just pure myimage:latest?
How does that behave when the use has an image that already has a tag? overrides the tag?
Can you elaborate these here?
There was a problem hiding this comment.
See below: "No matter the tagger, Skaffold will keep on using immutable references in manifests"
| + we introduce a `latest` tagger tag tags images with `:latest`. | ||
| + the `latest` tagger is used by default instead of the `git` tagger. | ||
| + `sha256` is completely changed to use a digest of the artifact's inputs as the tag. | ||
| Something like https://github.com/GoogleContainerTools/skaffold/pull/2301 |
There was a problem hiding this comment.
FWIW #2301 was not merged. I don't think we should hash Dockerfile and skaffold.yaml into the image.
To your second point: given the same exact commit, an image built twice with changes in external/remote dependencies will result in different sha256 digests given by the registry / image ID given by the docker daemon. Skaffold uses that and is rendered into the fully qualified image names in kubernetes, e.g. a git tagger based image has the git based tag + the digest: gcr.io/k8s-skaffold/gke-loadbalancer:v0.37.1-136-g174ea947a@sha256:7a3419895994a6a9e0c251fa5b3e32134aff987c3e0ec046bf81398c48e07cb4
Having this digest as a tag as well seems superfluous.
The tag can be the same - like in the case of latest, or envTemplate or dateTime (if the same image is built the exact same moment) - but the image digest will be different.
Now, we might still have an issue with the naming - sha256 seemingly generates expectations - how about depencyDigest / inputDigest / localChecksum ?
|
|
||
| ## Open Issues/Questions | ||
|
|
||
| + How do we handle users who didn't configure a tagger and were happy with the default |
There was a problem hiding this comment.
We can introduce a new version - skaffold fix should include git tagger between the two versions. this is crappy if there is no actual config change...maybe we can wait until there is a config change? I think we might have to consider default value changes as config change.
c07677e to
b6dd5b9
Compare
|
@dgageot i'd love to get this cleaned up and merged, we're really lacking clarification here in our docs. it seems like we've answered most of the big questions here so it's mainly wording changes. |
|
@nkubala I’ll try to take some time next week. I would also like to make some progress! |
b6dd5b9 to
41438ea
Compare
0e1fa75 to
4e971b4
Compare
Signed-off-by: David Gageot <david@gageot.net>
Signed-off-by: David Gageot <david@gageot.net>
Signed-off-by: David Gageot <david@gageot.net>
Signed-off-by: David Gageot <david@gageot.net>
ca9e61b to
de83401
Compare
Signed-off-by: David Gageot <david@gageot.net>
de83401 to
2e184a5
Compare
Signed-off-by: David Gageot david@gageot.net