doc/spec: tags as a first class object #173

Closed
wants to merge 5 commits into from

stevvooe
Collaborator

Included in this PR is a strawman proposal for first-class tags for use in the registry.

This proposal will be followed up with a companion proposal that handles first-class signature objects.

Signed-off-by: Stephen J Day <stephen.day@docker.com>

@ncdc

ncdc commented Feb 14, 2015

cc @smarterclayton @vbatts

Tags in docker are missing several features that one might expect when coming
from other systems. In git, one can tag any commit and then sign that tag,
providing a name to a given revision and the ability to verify that the name
was assigned by a trusted party. We'd like bring this

... ?

Collaborator Author

I was trying to get this out as a preview. I'll give this a full once over today.

@ncdc

ncdc commented Feb 14, 2015

Would this be the flow for resolving a tag to a manifest and then pulling by tag+digest?

  1. GET /v2/<name>/tags/<tag>
  2. Record digest from target.digests[0]
  3. GET /v2/<name>/manifests/<tag>/<digest> where <digest> is the value recorded in step 2
  4. Create container
  5. ... Time passes ...
  6. Decide to create container using same image as the previous container
  7. GET /v2/<name>/manifests/<tag>/<digest> where <digest> is the value recorded in step 2
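The flow above can be sketched as a pair of URL builders plus the digest selection from step 2. This is only an illustration under assumptions: the `target.digests` shape comes from the proposal text, the tag+digest manifest route is the one being asked about (not a confirmed API), and every function name and digest value here is hypothetical.

```python
from typing import Any, Dict


def tag_url(name: str, tag: str) -> str:
    """Step 1: route for fetching the tag object, as proposed in this PR."""
    return f"/v2/{name}/tags/{tag}"


def canonical_digest(tag_object: Dict[str, Any]) -> str:
    """Step 2: treat the first entry of target.digests as canonical."""
    digests = tag_object["target"]["digests"]
    if not digests:
        raise ValueError("tag object has no target digests")
    return digests[0]


def pinned_manifest_url(name: str, tag: str, digest: str) -> str:
    """Steps 3 and 7: hypothetical tag+digest manifest route from the question."""
    return f"/v2/{name}/manifests/{tag}/{digest}"


# Example tag object; the digest values are made up for illustration.
example_tag = {"target": {"digests": ["sha256:aaaa", "sha512:bbbb"]}}

# The external client records the digest once (step 2) ...
recorded = canonical_digest(example_tag)
first_pull = pinned_manifest_url("library/ubuntu", "14.04", recorded)
# ... and reuses it later (step 7), even if the tag has moved in the meantime.
later_pull = pinned_manifest_url("library/ubuntu", "14.04", recorded)
```

Because the recorded digest is stored by the client, both pulls resolve to the same URL regardless of where the tag points by then.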

@thaJeztah
Member

@stevvooe maybe I've misinterpreted this proposal, but would this assist in resolving moby/moby#8689 for V2 registries?

A list of digests that identify the target object. The first entry
should be considered canonical.
</dd>
</dl>
Collaborator

Size should also be added to this list for additional security

@stevvooe
Collaborator Author

@ncdc

> Would this be the flow for resolving a tag to a manifest and then pulling by tag+digest?

This would require saving the history of the tag. Currently, we have support for this but mostly because that's the docker model. If we really want "git-like" tags, we may need to decide to part with this use case.

I am considering proposing a new /v2/<name>/manifests/<digest> route, which makes a lot more sense with the changed model of tags. We could optionally just support this through the blob api.

> @stevvooe maybe I've misinterpreted this proposal, but would this assist in resolving moby/moby#8689 for V2 registries?

This proposal is oriented towards defining a more formal model of tags. It seems that the expected behavior in moby/moby#8689 is that if a tag A is pulled, all equivalent tags (i.e. aliases) that point at the same content are also pulled. On its face, I am not sure this fits in with the proposed lighter-weight model.

There are two possibilities to implement the behavior described in moby/moby#8689 in the context of this proposal:

  1. Literal approach: docker pull mongo:2 pulls all aliases of "2". This requires the ability to identify tags pointing at equivalent content from the API.
  2. Pull all tags: docker pull mongo:2 pulls all tags for "mongo". Equivalent tags are identified client-side and populated after pulling the content for "mongo:2".

Some details here need to be worked out but option 2 is much better from an API and security perspective. It would also benefit from the ability to tag content that is not yet resolved on the client-side.
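As a rough sketch of the client-side matching in option 2, equivalent tags can be found by comparing target digests once the full tag listing is available. The tag-to-digest mapping below is an assumed, simplified input (not a defined API response), and the tag names and digests are invented.

```python
from typing import Dict, List


def find_aliases(tag_digests: Dict[str, str], tag: str) -> List[str]:
    """Return the other tags whose target digest equals that of `tag`."""
    target = tag_digests[tag]
    return sorted(t for t, d in tag_digests.items() if d == target and t != tag)


# Made-up tag listing for a "mongo" repository (tag -> target digest).
mongo_tags = {
    "2": "sha256:1111",
    "2.6": "sha256:1111",
    "2.6.7": "sha256:1111",
    "2.4": "sha256:2222",
    "latest": "sha256:1111",
}

# After pulling the content for "mongo:2", the client can populate these aliases.
aliases_of_2 = find_aliases(mongo_tags, "2")
```

The comparison itself is cheap; the cost is in fetching the full tag listing, which is the concern raised later in the thread.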

@ncdc

ncdc commented Feb 16, 2015

@stevvooe

> This would require saving the history of the tag

By the registry? In my flow, I probably wasn't clear enough that step 2, record the digest for the tag, would be done by an external client (e.g. OpenShift). As long as the manifest@digest still exists in storage, I assume that's good enough for an external client, right?

> I am considering proposing a new "/v2//manifests/" route

I think something got stripped in between v2 and manifests?

@thaJeztah
Member

> It seems that the expected behavior in moby/moby#8689 is that if a tag A is pulled, all equivalent tags (i.e. aliases) that point at the same content are also pulled

The thought behind that feature was to keep images "in sync", i.e., ubuntu:14.04 and ubuntu:latest (currently) point to the same image on the registry. Doing docker pull ubuntu (which defaults to ubuntu:latest), will only update ubuntu:latest, causing ubuntu:latest and ubuntu:14.04 to produce a different result when used.

That's just to explain the original background on this, though. For a moment, I thought this proposal would make pulling (updating) aliases possible again.

Would matching images still be possible, though with V2? Querying all tags client-side to look for possible aliases is probably too heavy for this. (Some issues were reported in docker for images that had hundreds of tags, causing docker to become very slow to fetch all those tags for comparison)

I'll probably have to live with this change (not a huge problem). Thanks for clarifying though, much appreciated!

@stevvooe
Collaborator Author

@ncdc

> By the registry? In my flow, I probably wasn't clear enough that step 2, record the digest for the tag, would be done by an external client (e.g. OpenShift). As long as the manifest@digest still exists in storage, I assume that's good enough for an external client, right?

In the registry, if <name>:<tag>@<digest> is to be supported, one may have to save the history. This is less strong than my original statement. But yes, as long as manifest@digest still exists, it should be good enough. This proposal attempts to make <name>@<digest> a more viable method of specifying an object reference.

@stevvooe
Collaborator Author

@thaJeztah Thanks for clarifying the intent of the feature.

While it would be great to continue to support such behavior, there are a number of security implications for managing a mutable "tracking tag" or alias. What if updates get blocked? What if a tag no longer points to the same content as an alias? Really, the tags for a repository need to be treated as a set or individually. Partial groups open up security holes during updates (ping @dmcgowan for details). Single tags can be verified reliably to a certain degree. Either way, we'd need another method to handle this securely ("tag groups" have been discussed).

> Would matching images still be possible, though with V2? Querying all tags client-side to look for possible aliases is probably too heavy for this. (Some issues were reported in docker for images that had hundreds of tags, causing docker to become very slow to fetch all those tags for comparison)

The first answer is that querying tags in v2 needs to be more efficient. Currently, it is, but not quite enough for this proposal. We'd need to bolster the API with pagination (or tag groups!). A better answer would be to consider the case of "tracking tag", such as "latest". It is trivial to check if this tag has been updated. If there are "equivalent" tags, as detected on the client-side, those could be concurrently checked for updates.

> I'll probably have to live with this change (not a huge problem). Thanks for clarifying though, much appreciated!

I'll go one further and predict that you'll be quite happy with the direction. The goal is to make these edge cases more obvious and more secure. The implementation of power features on top of this concept should be more straightforward than before.

@dmcgowan
Collaborator

@thaJeztah I will add to what Stephen mentioned about potential security holes. Just as when a user requests an individual tag, there needs to be a signed statement matching the tag to the content address of the manifest. Getting all tags inherently has the same problem: the result needs to be treated as a secure list of tags. If this list is not secure, it opens a security hole: the list could be altered by adding tags, removing tags, or pointing tags at older versions.

@stevvooe would you say the tags list as an object should be considered a separate proposal? I would like to allow both the tag list and tags be handled by a separate system, but the separate system could just be responsible for pointing a named tag to a content-hash of the tag object, or a hash of the tag list object.

@thaJeztah
Member

@stevvooe @dmcgowan I see the problem, yes. I think the issue is "tags" currently aren't immutable, and can be re-used. Making :latest a special case (like :head) as a "tracking tag" could work, but would also be a breaking change because in many repos, :latest isn't actually :latest.

If I understand your comment correctly, with V2 it would no longer be possible to re-assign :latest to a different image (i.e., use it for the next Ubuntu LTS release). Tag groups probably wouldn't solve this either, because that would require the "latest" tag to be moved to a different group (14.04 + latest -> 16.04 + latest).

In addition, some images use a version-scheme where a tag is "temporarily" an alias for :latest, for example, currently ubuntu:14.04 and ubuntu:14.04.1 are the same image; both receive updates. Once ubuntu:14.04.2 is released, that tag will be an alias for ubuntu:14.04 and the ubuntu:14.04.1 image will no longer receive updates.

I think there will always be a need for "tracking tags"; people expect they can "subscribe" to a repository and are guaranteed to have their images updated with the latest (security) updates when they docker pull.

There's also a need to be able to have the opposite; being able to build a container from a specific version of an image/tag. Perhaps a different form should be used for immutable and "rolling / tracking" tags, e.g. ubuntu:14.04@adedf3a.

I really wonder what solution can be found to solve both use-cases and address the security concerns you mentioned.

I think it might be good to have the maintainers of the "official images" take part in the discussion as well, because this may influence the way those images need to be "tagged" in the future.

Gosh, this is hard to put in words, LOL.

@vbatts

vbatts commented Feb 17, 2015

Like @ncdc was asking, with this, are we giving the vocabulary for fetching exact previous images/builds, similar to pull-by-id? Also trying to address the registry-side garbage collection?

@stevvooe stevvooe added this to the Registry/Future milestone Feb 17, 2015
@stevvooe
Collaborator Author

> Gosh, this is hard to put in words, LOL.

This is the only statement above that I am certain of ;).

I hope I have answered most of your concerns or at least helped you to understand. I believe the proposal does actually cover most of the use cases you've brought up. If you have actions or changes you'd like me to take in regards to this proposal, please call them out explicitly.

Here are the properties of the "new" tags that might help to reason about these scenarios:

  1. A tag points at a single piece of content, called the "target".
  2. A target may be anything, but for our purposes it is a manifest.
  3. The target of a tag may change (tags are mutable), but timely updates are not guaranteed (tags may be out of date).
  4. The contents of a tag, including the namespace, may be confirmed by a signature or collection of signatures.

These properties cover most of what you've described above. A tag "latest" will be able to track updates and parties will be able to confirm the signatures on the "latest" tag (guaranteeing you have the latest "latest" is another problem).

> There's also a need to be able to have the opposite; being able to build a container from a specific version of an image/tag. Perhaps a different form should be used for immutable and "rolling / tracking" tags, e.g. ubuntu:14.04@adedf3a.

All content in the V2 registry is content-addressable, so immutable tags aren't really needed. The syntax you propose is what would identify a specific manifest revision with the given tag.
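To make the `name:tag@digest` form concrete, here is a minimal parsing sketch. It is not the Docker reference grammar; it only assumes that the digest follows the first `@` and that a tag is whatever follows a `:` in the last path component. The function name is hypothetical.

```python
from typing import Optional, Tuple


def parse_reference(ref: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split a reference into (name, tag, digest); tag and digest may be None."""
    name_tag, sep, digest = ref.partition("@")
    last_slash = name_tag.rfind("/")
    colon = name_tag.rfind(":")
    # A colon before the last slash belongs to a host:port, not to a tag.
    if colon > last_slash:
        name, tag = name_tag[:colon], name_tag[colon + 1:]
    else:
        name, tag = name_tag, None
    return name, tag, (digest if sep else None)


# The example from the discussion: a tag pinned to a specific revision.
pinned = parse_reference("ubuntu:14.04@adedf3a")
```

With only a digest (`ubuntu@sha256:...`) the tag is `None`, which corresponds to the pure content-addressed reference discussed earlier in the thread.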

Don't read too much into the idea of "Tag Groups". They're basically a "MacGuffin" at this point.

@stevvooe
Collaborator Author

@vbatts

> Like @ncdc was asking, with this, are we giving the vocabulary for fetching exact previous images/builds, similar to pull-by-id?

I think this is covered in @ncdc's moby/moby#10740.

> Also trying to address the registry-side garbage collection?

Right now, anything unreferenced by a manifest is open for garbage collection. Deleting a given tag deletes all the history of that tag. We may want to modify this approach such that a manifest that is unreferenced by a tag is open to garbage collection, but this may be too aggressive.

We still need to put together a formal garbage collection proposal.
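The two collection policies mentioned above can be contrasted with simple set arithmetic. This is a toy model under assumptions, not how the registry stores data: manifests are modeled as a mapping from manifest ID to the set of blobs it references, tags as a mapping from tag name to manifest ID, and all names are invented.

```python
from typing import Dict, Set


def unreferenced_blobs(blobs: Set[str], manifests: Dict[str, Set[str]]) -> Set[str]:
    """Current rule: a blob is collectable if no manifest references it."""
    referenced = set().union(*manifests.values())
    return blobs - referenced


def untagged_manifests(manifests: Dict[str, Set[str]], tags: Dict[str, str]) -> Set[str]:
    """Stricter rule under discussion: a manifest is collectable if no tag points at it."""
    return set(manifests) - set(tags.values())


# Toy repository state (all identifiers are made up).
blobs = {"layer-a", "layer-b", "layer-c"}
manifests = {"m1": {"layer-a"}, "m2": {"layer-a", "layer-b"}}
tags = {"latest": "m2"}

collectable_blobs = unreferenced_blobs(blobs, manifests)
collectable_manifests = untagged_manifests(manifests, tags)
```

In this toy state the stricter rule would collect `m1`, which is exactly the kind of previously tagged manifest that a client pulling by digest might still want.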

@thaJeztah
Member

> I believe the proposal does actually cover most of the use cases you've brought up. If you have actions or changes you'd like me to take in regards to this proposal, please call them out explicitly.

Thanks for putting my mind at ease, I hadn't followed the changes too closely recently and (clearly) couldn't get my head around what exactly the changes would imply.

@stevvooe stevvooe mentioned this pull request Mar 5, 2015
4 tasks
@stevvooe stevvooe removed this from the Registry/Future milestone Mar 18, 2015
|---------------------------------------------------------|--------------------------------------------------|
| `application/vnd.docker.distribution.tag.v1+json` | Base tag object |
| `application/vnd.docker.distribution.tag.v1+jws` | tag object wrapped in one or more jws signatures |
| `application/vnd.docker.distribution.tag.v1+prettyjws` | tag object with jws signatures in pretty format |

I'd suggest we just get rid of this for simplification purposes. Having two media types that differ only in whitespace seems strange. We should favour the compacted one.

Collaborator Author

prettyjws and jws are formats that differ in more than just whitespace. One has the payload embedded as json while the other has it encoded in base64.
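The difference can be illustrated with a small sketch. In JWS JSON serialization the payload is carried as unpadded base64url text, whereas the pretty form described here keeps the object as readable JSON. The tag object fields below are invented, the signatures are left empty, and the exact layout of the pretty form is an assumption for illustration only.

```python
import base64
import json


def jws_payload(obj: dict) -> str:
    """Encode an object as an unpadded base64url JWS payload string."""
    raw = json.dumps(obj, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_jws_payload(payload: str) -> dict:
    """Reverse jws_payload: restore padding, decode, and parse the JSON."""
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Hypothetical tag object; the field names are illustrative only.
tag_object = {"name": "library/ubuntu", "tag": "14.04"}

# "jws" style: the payload is opaque base64url text next to the signatures.
jws_form = {"payload": jws_payload(tag_object), "signatures": []}

# "prettyjws" style (assumed layout): the payload stays readable JSON.
pretty_form = dict(tag_object, signatures=[])
```

A reader can inspect `pretty_form` directly, while `jws_form` must be decoded first, so the two media types differ in more than whitespace.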

@stevvooe
Collaborator Author

stevvooe commented Jan 7, 2016

@RichardScothern @aaronlehmann I'm not sure what our plans are, but let's make a decision about carrying this or closing this PR.

@RichardScothern
Contributor

I don't think Signatures are in our future thoughts.

I don't think adding tagging endpoints to the registry is desirable either. Tagging should eventually be in a separate service.

@stevvooe
Collaborator Author

@RichardScothern What service?

@stevvooe
Collaborator Author

@RichardScothern Reading back through, this proposal is still pretty solid. We just need to remove the part about signatures and it will be fine.

@RichardScothern
Contributor

@stevvooe notary?

@RichardScothern
Contributor

@stevvooe : I think we are at cross communications here.

The registry can front the tag endpoint specified in this proposal, and eventually defer to notary; until then it won't be a separate service.

Tagging endpoint will introduce extra round-trips for the client, but will alleviate a lot of issues mentioned in #1296. @aaronlehmann ?

@aaronlehmann
Contributor

While I appreciate efforts to fix the issues in #1296, I'm not generally enthusiastic about adding extra round trips. They can slow down pull and push processes quite significantly, particularly because they usually involve object store latency. Querying a tag is something that can't be parallelized with other tasks, so the latency will be noticeable.

@stevvooe
Collaborator Author

@aaronlehmann

> They can slow down pull and push processes quite significantly, particularly because they usually involve object store latency. Querying a tag is something that can't be parallelized with other tasks, so the latency will be noticeable.

I think we can fix this by changing the architecture around tag handling. We could even use the tag API to seed the task of moving to the transactional store, since this is where most of the expensive round-trip tasks are concentrated.

While it does step on the toes of notary, we could also design the tag API as being more batch oriented, rather than request oriented, eliminating round trips.

@dmcgowan
Collaborator

Additions to specifications are no longer being considered here. The specification has moved to https://github.com/opencontainers/distribution-spec

This discussion may continue or be referenced as part of notary v2, with the consideration of signed tags.

@dmcgowan dmcgowan closed this Feb 23, 2020