Skip to content

[proposal] Improve taggers#2804

Merged
nkubala merged 6 commits intoGoogleContainerTools:masterfrom
dgageot:improve-taggers
Nov 12, 2019
Merged

[proposal] Improve taggers#2804
nkubala merged 6 commits intoGoogleContainerTools:masterfrom
dgageot:improve-taggers

Conversation

@dgageot
Copy link
Copy Markdown
Contributor

@dgageot dgageot commented Sep 4, 2019

Signed-off-by: David Gageot david@gageot.net

@codecov
Copy link
Copy Markdown

codecov Bot commented Sep 4, 2019

Codecov Report

Merging #2804 into master will not change coverage.
The diff coverage is n/a.

@dgageot dgageot force-pushed the improve-taggers branch 3 times, most recently from b1ad931 to f4ebe71 Compare September 5, 2019 09:41
Comment thread docs/design_proposals/digest-tagger.md Outdated
+ we introduce a `latest` tagger tag tags images with `:latest`.
+ the `latest` tagger is used by default instead of the `git` tagger.
+ `sha256` is completely changed to use a digest of the artifact's inputs as the tag.
Something like https://github.com/GoogleContainerTools/skaffold/pull/2301
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sha256 digest, I believe it'd make the most sense to derive it from the resulting image (therefore tagging after the build). Otherwise, unless the Skaffold checksum implementation completely reproduces the Docker's one, there always be missing things, or on the contrary, new tags for the unchanged docker image. Examples:

  1. Implement checksum computation for sha256 tagger (fix issue #2290) #2301 hashes Dockerfile and skaffold.yaml. Therefore, any formatting changes in those files will affect the checksum but won't affect the image content.
  2. The implementation above hashes the working directory (which I assume is the directory containing the skaffold.yaml file), however, the resulting image may contain files and other build artifacts from the outside. Therefore, the changes outside of the working directory will affect the image content but won't affect the tag.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW #2301 was not merged. I don't think we should hash Dockerfile and skaffold.yaml into the image.
To your second point: given the same exact commit, an image built twice with changes in external/remote dependencies will result in different sha256 digests given by the registry / image ID given by the docker daemon. Skaffold uses that and is rendered into the fully qualified image names in kubernetes, e.g. a git tagger based image has the git based tag + the digest: gcr.io/k8s-skaffold/gke-loadbalancer:v0.37.1-136-g174ea947a@sha256:7a3419895994a6a9e0c251fa5b3e32134aff987c3e0ec046bf81398c48e07cb4

Having this digest as a tag as well seems superfluous.

The tag can be the same - like in the case of latest, or envTemplate or dateTime (if the same image is built the exact same moment) - but the image digest will be different.

Now, we might still have an issue with the naming - sha256 seemingly generates expectations - how about depencyDigest / inputDigest / localChecksum ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW #2301 was not merged. I don't think we should hash Dockerfile and skaffold.yaml into the image.

Whether it was merged or not is irrelevant. I just pointed out the downsides of the proposed solution.

Skaffold uses that and is rendered into the fully qualified image names in kubernetes, e.g. a git tagger based image has the git based tag + the digest: gcr.io/k8s-skaffold/gke-loadbalancer:v0.37.1-136-g174ea947a@sha256:7a3419895994a6a9e0c251fa5b3e32134aff987c3e0ec046bf81398c48e07cb4

Having this digest as a tag as well seems superfluous.

This is indeed superfluous. If we have the tag which fully identifies the image contents, nothing else is needed by definition. I'm looking for something like gcr.io/k8s-skaffold/gke-loadbalancer:7a3419895994a6a9e0c251fa5b3e32134aff987c3e0ec046bf81398c48e07cb4 which is how Skaffold v0.21.1 works.

The tag can be the same - like in the case of latest, or envTemplate or dateTime (if the same image is built the exact same moment) - but the image digest will be different.

This is the part I don't get. Are we talking about docker image tags or some other tags? As far as I understand, multiple images with different digests cannot have the same name and tags. Even if they could, how would you reference a specific image if more than one have the same name and tag?

Now, we might still have an issue with the naming - sha256 seemingly generates expectations - how about depencyDigest / inputDigest / localChecksum ?

What I believe would be ideal is imageDigest derived directly from the image after it's built. Please see #2713 for the details.

This comment was marked as off-topic.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I read a bit more to understand the use cases that motivate sha256 tagger:

@morozov's use case: can't use latest + need a content dependent tag (repo + potential external contents)

From #2713 (comment):

The use case is that when multiple images are built from a repository, I need images to be tagged the same (sha256) if their content is not affected by a commit. Can this be achieved after reworking the tagger in #1482?

Mainly motivated by:

FATA[0007] build failed: build failed: building [quay.io/redacted/redacted]: build artifact: writing image "quay.io/redacted/redacted:latest": MANIFEST_INVALID: manifest invalid; map[message:manifest schema version not supported]
As you can see, the image is tagged as latest and is rejected by the repository versioning policy.

@elisiano's use case: prefixing implemented with envTemplate + {{.DIGEST}}
#2162 (comment)

Do we care about the actual content, or the digest of the input is sufficient?
If we care about the content digest (including external effects in non-deterministic builds, that don't only depend on the input sources in the repo) that will have to go back to "post build tagging" model - instead of "calculate the tag, and ask the Builder to build&tag".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@morozov Docker images that are pushed to a registry can always be referenced by their digest.

For example, the alpine:3.10, as it is today, can be referenced by:

  • Their digest: alpine@sha256:72c42ed48c3a2db31b7dafe17d275b634664a708d901ec9fd57b1529280f01fb
  • Tag and digest: alpine:3.10@sha256:72c42ed48c3a2db31b7dafe17d275b634664a708d901ec9fd57b1529280f01fb. It is nicer because it's both immutable and useful for the user who can read the version easily.

If you want to reference images by something that's immutable and content addressable, you should use any of those two references. Skaffold uses the second form whenever possible (that is when images are pushed)

You are asking Skaffold to tag by digest. It would produce references that look like that:
alpine:72c42ed48c3a2db31b7dafe17d275b634664a708d901ec9fd57b1529280f01fb@sha256:72c42ed48c3a2db31b7dafe17d275b634664a708d901ec9fd57b1529280f01fb. That seems really superfluous. It is redundant and doesn't offer any meaningful information to a human. (Was 72c42ed48c3a2db31b7dafe17d275b634664a708d901ec9fd57b1529280f01fb produces before 7746df395af22f04212cd25a92c1d6dbc5a06a0ca9579a229ef43008d4d1302a?)

wdyt? Could you try again explaining why you'd still need to tag images by their digest? I might me missing something important

Copy link
Copy Markdown
Contributor

@morozov morozov Oct 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dgageot all the above makes sense with the only exception described in #2804 (comment). It's sort of a corner case and there may be other solutions to it.

To recap: we have two sets of repositories:

  1. Production repositories (one per image), where the images are referenced as:

    • quay.io/company/image1@sha256:$DIGEST1
    • quay.io/company/image2@sha256:$DIGEST2

    These images don't require extra tagging and can be referenced by their sha256 digest.

  2. Experimental repository (just for dev deployments):

    • quay.io/company/experimental:image1@sha256:$DIGEST11
    • quay.io/company/experimental:image1@sha256:$DIGEST12
    • quay.io/company/experimental:image2@sha256:$DIGEST21
    • quay.io/company/experimental:image2@sha256:$DIGEST22

    Right now, the images are referenced by their tags generated by skaffold using a template (e.g. {{.IMAGE_NAME}}:project1-{{.DIGEST_HEX}}). The tags contain the image name (for identification purposes) and the digest (to be content-addressable). This will not work since one tag cannot reference more than one digest.

The question is, how do we store those experimental/dev/ephemeral images in one repository while having them content-addressable and identifiable by image name.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @morozov for the clarifications. It really helps.

For what it's worth, when using the full form image:tag@sha256:digest, the tag is ignored so you can have both quay.io/company/experimental:image1@sha256:$DIGEST_1 and quay.io/company/experimental:image1@sha256:$DIGEST_2. Doesn't really make sense but the correct images will still be used.

In your case, images will always be referenced by a long identifier, any of the following:

  1. quay.io/company/experimental:image1-UNIQUE-ID
  2. quay.io/company/experimental:image1-DIGEST
  3. quay.io/company/experimental:image1-UNIQUE-ID@sha256:DIGEST
  4. quay.io/company/experimental:image1-DIGEST@sha256:DIGEST
  5. quay.io/company/experimental:image1@sha256:DIGEST (image1 is ignored but still gives a hint to the user that they are using the right image)
  6. quay.io/company/experimental@sha256:DIGEST.

The last two options are possible with both the current state of Skaffold and the proposed state. And they look ok to me when I compare them with the other options. wdyt?

I understand you'd rather have option 2 or 4 but it requires computing tags after the build which is something we want to avoid because we've been there and it's a mess.

Another possibility is use the inputDigest as a pseudo unique id. I think it's kinda ok but as you said, there's no bijection between such id and the digest.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, when using the full form image:tag@sha256:digest, the tag is ignored so you can have both quay.io/company/experimental:image1@sha256:$DIGEST_1 and quay.io/company/experimental:image1@sha256:$DIGEST_2. Doesn't really make sense but the correct images will still be used.

As far as I understand, it will work like this from the referencing standpoint. But what about the repository? Will skaffold be able to create two images in the same repository with the same tag but different digests?

I understand you'd rather have option 2 or 4 but it requires computing tags after the build

I don't really have a preference in which syntax I'll end up using. I need a migration path from v0.29 to the latest one. Option #‌2 looks closest to what we have now; #‌4 indeed looks redundant.

If you believe that #‌5 will work, I could give it some more testing but I'll need some extra guidance. What I tried was:

$ docker images
REPOSITORY                     TAG                 IMAGE ID            CREATED             SIZE
mariadb                        latest              97df12fa9319        6 months ago        369MB
postgres                       latest              d03e2a8f3ed4        8 months ago        312MB

$ docker tag mariadb:latest experimental:project1

$ docker tag postgres:latest experimental:project1

$ docker images
REPOSITORY                     TAG                 IMAGE ID            CREATED             SIZE
mariadb                        latest              97df12fa9319        6 months ago        369MB
experimental                   project1            d03e2a8f3ed4        8 months ago        312MB
postgres                       latest              d03e2a8f3ed4        8 months ago        312MB

As you can see, there's only one experimental:project1 tag that points to the same image as postgres:latest.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elisiano what is the real application of the project- prefixes in the experimental repository? Are they used for some automated housekeeping or mostly in order for a human to understand what project a specific image belongs to? In other words, what are the consequences of not using tags in the experimental repository but instead always using the @sha256 digest?

balopat
balopat previously requested changes Sep 6, 2019
Copy link
Copy Markdown
Contributor

@balopat balopat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for putting this much thought into this. I like the direction, have questions and some answers.

Comment thread docs/design_proposals/digest-tagger.md Outdated
Comment thread docs/design_proposals/digest-tagger.md Outdated

## Proposal

+ we introduce a `latest` tagger tag tags images with `:latest`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically a current sha256, right? I.e it would still use the @sha256:sdfsdfsdf digest at deplyoment time? Or just pure myimage:latest?
How does that behave when the use has an image that already has a tag? overrides the tag?
Can you elaborate these here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below: "No matter the tagger, Skaffold will keep on using immutable references in manifests"

Comment thread docs/design_proposals/digest-tagger.md
Comment thread docs/design_proposals/digest-tagger.md Outdated
Comment thread docs/design_proposals/digest-tagger.md Outdated
+ we introduce a `latest` tagger tag tags images with `:latest`.
+ the `latest` tagger is used by default instead of the `git` tagger.
+ `sha256` is completely changed to use a digest of the artifact's inputs as the tag.
Something like https://github.com/GoogleContainerTools/skaffold/pull/2301
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW #2301 was not merged. I don't think we should hash Dockerfile and skaffold.yaml into the image.
To your second point: given the same exact commit, an image built twice with changes in external/remote dependencies will result in different sha256 digests given by the registry / image ID given by the docker daemon. Skaffold uses that and is rendered into the fully qualified image names in kubernetes, e.g. a git tagger based image has the git based tag + the digest: gcr.io/k8s-skaffold/gke-loadbalancer:v0.37.1-136-g174ea947a@sha256:7a3419895994a6a9e0c251fa5b3e32134aff987c3e0ec046bf81398c48e07cb4

Having this digest as a tag as well seems superfluous.

The tag can be the same - like in the case of latest, or envTemplate or dateTime (if the same image is built the exact same moment) - but the image digest will be different.

Now, we might still have an issue with the naming - sha256 seemingly generates expectations - how about depencyDigest / inputDigest / localChecksum ?


## Open Issues/Questions

+ How do we handle users who didn't configure a tagger and were happy with the default
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can introduce a new version - skaffold fix should include git tagger between the two versions. this is crappy if there is no actual config change...maybe we can wait until there is a config change? I think we might have to consider default value changes as config change.

Comment thread docs/design_proposals/digest-tagger.md
Comment thread docs/design_proposals/digest-tagger.md Outdated
Comment thread docs/design_proposals/digest-tagger.md Outdated
Comment thread docs/design_proposals/digest-tagger.md
Comment thread docs/design_proposals/digest-tagger.md Outdated
Comment thread docs/design_proposals/digest-tagger.md Outdated
Comment thread docs/design_proposals/digest-tagger.md Outdated
@nkubala
Copy link
Copy Markdown
Contributor

nkubala commented Oct 11, 2019

@dgageot i'd love to get this cleaned up and merged, we're really lacking clarification here in our docs. it seems like we've answered most of the big questions here so it's mainly wording changes.

@dgageot
Copy link
Copy Markdown
Contributor Author

dgageot commented Oct 11, 2019

@nkubala I’ll try to take some time next week. I would also like to make some progress!

@dgageot
Copy link
Copy Markdown
Contributor Author

dgageot commented Oct 14, 2019

@tejal29 @nkubala @morozov @balopat, I've tried to clean things up and respond to questions. Any comment?

@dgageot dgageot requested a review from balopat October 30, 2019 16:23
Signed-off-by: David Gageot <david@gageot.net>
Signed-off-by: David Gageot <david@gageot.net>
Signed-off-by: David Gageot <david@gageot.net>
Signed-off-by: David Gageot <david@gageot.net>
Signed-off-by: David Gageot <david@gageot.net>
@dgageot dgageot force-pushed the improve-taggers branch 2 times, most recently from ca9e61b to de83401 Compare November 6, 2019 11:53
Signed-off-by: David Gageot <david@gageot.net>
Copy link
Copy Markdown
Contributor

@nkubala nkubala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything here seems good to me. I'll merge this now, and we can get started on the changes proposed here. thanks again for putting this together @dgageot!

@nkubala nkubala dismissed balopat’s stale review November 12, 2019 21:00

requested changes were addressed

@nkubala nkubala merged commit 2f3a5f7 into GoogleContainerTools:master Nov 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants