doc/spec: generic distribution content manifests #62
Conversation
josh!
vincent!
@jlhawn Per our discussion, it may be prudent to remove the version or tag field from the manifest. The revision of the manifest should be its digest. What version it is should be provided by an external tag.
@stevvooe I'm loving that idea.
> Specifies the schema version of the manifest as an integer. This document
> describes version `1` only.
>
> - **`repository`** *string*
Let's just call this "name".
So, just to make sure I'm thinking of this in the right context: this content manifest format would be an alternate, more generic format than what exists in the distribution/v2 registry today? Does the API change as well, given that instead of blobsums there is the more generic dependency? Or does the client change to walk dependencies to find layers (for the traditional Docker client pull case), both in dependent content manifests and in the "leaf nodes" of actual layer data, e.g. by media type?
Yes, we will deprecate the current format soon.
The next iteration of the API shouldn't change much. Today's "image-centric" API has two basic types of objects: image manifests and layer blobs. In the future everything will just be a content-addressable blob in a named repository. Tags will be a collection of links in a repository (tag_name -> blob_digest). We'll also be adding detached signatures of content blobs and tags.
The client already does something like this to fetch all of the blobs listed in the manifest, but it will have to be updated to support nested manifest dependencies and content federation. It should be a relatively simple recursive content discovery and fetching algorithm.
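To make that concrete, here is a minimal sketch of such a recursive discovery-and-fetch loop in Go. The `Client`, `Manifest`, and `Dependency` names and the manifest media-type check are assumptions for illustration only; they are not part of the distribution codebase or this spec.

```go
package fetchsketch

import (
	"encoding/json"
	"fmt"
)

// Dependency points at another blob in the repository, which may itself be
// a nested manifest or a leaf blob such as layer data. (Hypothetical type.)
type Dependency struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
}

// Manifest is a stand-in for the generic content manifest under discussion.
type Manifest struct {
	Name         string       `json:"name"`
	Dependencies []Dependency `json:"dependencies"`
}

// Client abstracts "give me the bytes and media type for this digest".
type Client interface {
	GetBlob(repo, digest string) (data []byte, mediaType string, err error)
}

// Fetch walks the dependency graph: nested manifests are recursed into,
// everything else is treated as a leaf blob. `seen` avoids refetching
// content that is shared by digest.
func Fetch(c Client, repo, digest, manifestMediaType string, seen map[string]bool) error {
	if seen[digest] {
		return nil
	}
	seen[digest] = true

	data, mediaType, err := c.GetBlob(repo, digest)
	if err != nil {
		return err
	}
	if mediaType != manifestMediaType {
		fmt.Printf("fetched leaf blob %s (%s)\n", digest, mediaType)
		return nil
	}

	var m Manifest
	if err := json.Unmarshal(data, &m); err != nil {
		return err
	}
	for _, dep := range m.Dependencies {
		if err := Fetch(c, repo, dep.Digest, manifestMediaType, seen); err != nil {
			return err
		}
	}
	return nil
}
```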
> - **`digest`** *string*
>
>   The base64url-encoded SHA384 digest of the object.
Please allow algorithm agility here. TUF has a good example, which is an object of algorithm name -> hex bytes mappings.
Example:
"hashes": {
"sha256": "d03b00f125367bcd2237c6a65c442f865b3aac0ba11864d64c0f69ced766e011"
}
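For illustration, a minimal Go sketch of verifying a blob against such a `hashes` object. It assumes a plain map of algorithm name to hex digest; algorithms the consumer does not recognize are skipped, which is the property that allows a new algorithm to be phased in alongside an old one.

```go
package hashcheck

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"hash"
)

// newHash returns a hasher for the algorithms this consumer understands.
func newHash(alg string) (hash.Hash, bool) {
	switch alg {
	case "sha256":
		return sha256.New(), true
	case "sha384":
		return sha512.New384(), true
	case "sha512":
		return sha512.New(), true
	}
	return nil, false
}

// Verify checks the content against every hash it knows how to compute and
// requires at least one recognized algorithm to be present in the map.
func Verify(content []byte, hashes map[string]string) error {
	verified := 0
	for alg, want := range hashes {
		h, ok := newHash(alg)
		if !ok {
			continue // a newer algorithm this consumer does not know about yet
		}
		h.Write(content)
		if got := hex.EncodeToString(h.Sum(nil)); got != want {
			return fmt.Errorf("%s mismatch: got %s, want %s", alg, got, want)
		}
		verified++
	}
	if verified == 0 {
		return fmt.Errorf("no supported hash algorithm in manifest")
	}
	return nil
}
```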
I want to leave open flexibility here as well, if desired. We already have existing code which places labels on the digest string (https://github.com/docker/distribution/blob/master/digest/digest.go) so the one I suggested here could look like:
"sha384+b64url:5zhS4u9AtGZ7QSsgeEqbVnCk5s9nfwk_gM1Ex6Uxwh2sKIeRJ4LaW0rg55Tx4X-Y"
The reason why an object with potentially multiple hashes is desirable is that it allows you to phase in a new algorithm while still using the old one, keeping backwards compatibility with consumers that do not know about the new algorithm.
that is quite valuable, yes. +1
@stevvooe want to discuss the technical implications of having multiple digests map to the same object? I believe we already handle this today by linking to a "canonical" digest right?
@titanous The digest strings already include the algorithm with the hash (i.e. "sha256:d03..."), allowing some flexibility. Likely, there will be some need to support multiple digests.
Here are some questions I've been pondering along these lines:
1. How would having multiple hashes work as a content addressable identifier?
2. What becomes the "canonical" identifier?
3. Should all content be hashed under multiple hashes?
4. Do we maintain an index for each algorithm for every object hash to every other algorithm?
Good answers here would help support this suggestion.
> How would having multiple hashes work as a content addressable identifier?
> What becomes the "canonical" identifier?

Here's one solution: use an array; the first element is always treated as an opaque canonical identifier as well as a hash.

> Should all content be hashed under multiple hashes?

One algorithm should be chosen for now (SHA-512 or maybe SHA-384). If/when transitioning to another algorithm makes sense, then all content should be hashed with both the new algorithm and the old one until there are no legacy consumers left.

> Do we maintain an index for each algorithm for every object hash to every other algorithm?

Not sure what this question is about; have I answered it above?
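A small sketch of that array idea, with hypothetical names: the first element is the opaque canonical identifier, and any later elements are alternate digests kept around during a transition.

```go
package canonical

// Digests is an ordered list of algorithm-labelled digest strings,
// e.g. ["sha384:...", "sha256:..."]. (Hypothetical type.)
type Digests []string

// Canonical returns the identifier used for content addressing.
func (d Digests) Canonical() (string, bool) {
	if len(d) == 0 {
		return "", false
	}
	return d[0], true
}

// Matches reports whether a reference by any listed algorithm points at
// this content, so consumers still using an old algorithm keep working.
func (d Digests) Matches(ref string) bool {
	for _, dg := range d {
		if dg == ref {
			return true
		}
	}
	return false
}
```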
@stevvooe I am wondering if supporting multiple hash identifiers could be left up to the clients to specify multiple algorithms when storing content. If there is no reference to the new algorithm, then there is no requirement to automatically generate the new identifier. It can simply be done as an optimization to preemptively add an index entry to keep clients from pushing blobs which may already exist with a different hash. Essentially if no one tries to store content with the new hash identifier, it is reasonable to assume no one has a reference to fetch by it.
> Do we maintain an index for each algorithm for every object hash to every other algorithm?
>
> Not sure what this question is about; have I answered it above?

Apologies for the word soup. The question is: if we have hash functions `h0`, `h1` and `h2`, do we need to maintain indexes mapping `hi(A)` to `hj(A)`, where `i` and `j` can each be 0, 1, or 2?

@dmcgowan This optimization sounds reasonable. I think this approach might be bolstered by the concept of a "canonical" digest. For example, if content is uploaded with an older hash (say sha256, instead of sha384), the content would be verified against sha256 but also stored as sha384.

Are we opening the design up to collision attacks by linking a more secure digest to a less secure digest?
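For illustration, a sketch of that upload path under the assumptions above: the blob is hashed with both algorithms, verified against whichever digest the client referenced, stored under the newer (canonical) digest, and the older digest is linked to it. The names and the in-memory index are hypothetical.

```go
package aliasindex

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
)

// Index maps any known digest (old or new algorithm) to the canonical one.
type Index map[string]string

// Put hashes the blob with both algorithms, checks the client-supplied
// reference, and records the sha256 digest as an alias of the canonical
// sha384 digest. Note the collision concern above: the alias link is only
// as strong as the weaker algorithm used in the reference.
func (idx Index) Put(content []byte, refDigest string) (canonical string, ok bool) {
	s256 := sha256.Sum256(content)
	s384 := sha512.Sum384(content)
	old := "sha256:" + hex.EncodeToString(s256[:])
	canonical = "sha384:" + hex.EncodeToString(s384[:])

	if refDigest != old && refDigest != canonical {
		return "", false // content does not match the digest it was referenced by
	}
	idx[old] = canonical
	idx[canonical] = canonical
	return canonical, true
}
```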
@stevvooe Why do we have […]? Also, the plural use of that word ([…]).

(apologies if this is a silly question); Are the […]?

@thaJeztah This specification makes no recommendation for how the […]. Think of this type as a flexible primitive. @harche I'll update the comment above accordingly.

@stevvooe Thanks for the clarification, I was under the assumption that […].
@stevvooe I've dug a little deeper into the label usage; maybe we could use this quite well for the different ARM types and all the other CPU architectures.
Example 2: ARMv8 (64-bit) machine
These are all the detailed infos a Docker engine would need to push or pull the appropriate image, but they should be mandatory (the labels […]). All the values could be easily and automatically gathered. In this case the logic to select the right image architecture only needs to be implemented in the Docker engine for a pull or a push; even the compatibility logic will only be needed in the engine. So we could keep the Docker registry itself very clean, just storing the platform labels and using those labels to fetch the images which match the requested platform where the Docker engine is running.

Here is an example of how to use […]. I also ran some recent tests on different devices (Mac, Linux and even Android) with a small basic implementation in Go: https://github.com/DieterReuter/osarch#results-on-different-devices

As I already said, that's just a rough idea of how to determine the platform labels in an easy and elegant way. And if anyone needs more specific labels, like a byte order or ABI, they are free to just add new additional labels in their container runtime. There's no need to change the behaviour of the Docker registry.
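A rough sketch (not the linked osarch code) of gathering such platform labels from the Go runtime; the label names `os` and `architecture` are assumptions based on this discussion.

```go
package platformlabels

import "runtime"

// Labels returns the key/value pairs an engine could attach when pushing an
// image, so the registry can match a pull request to the right platform.
func Labels() map[string]string {
	return map[string]string{
		"os":           runtime.GOOS,   // e.g. "linux", "darwin"
		"architecture": runtime.GOARCH, // e.g. "amd64", "arm", "arm64"
		// A finer-grained CPU variant (e.g. ARMv7 vs ARMv8) is not exposed by
		// the runtime package and would need extra detection (GOARM at build
		// time, /proc/cpuinfo, etc.); that is out of scope for this sketch.
	}
}
```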
This is a follow-on to PR distribution#62, and it borrows much of the format from distribution#993, but uses specific formats for the image manifest and manifest list (fat manifest) instead of a combined generic format. The intent of this proposed manifest format is to allow multi-arch, and allow for full content-addressability of images in the Docker engine. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
A proposal for generic content manifests that can be used for
any type of application that can be represented as a JSON config
and a collection of blobs of data, or may be composed of other
such applications.
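For readers skimming the thread, a hedged sketch of what such a generic manifest could look like as Go types. Only the schema version, the repository name, and per-object digests appear in the excerpts above; the remaining field names are assumptions, not the final specification.

```go
package manifestsketch

// Descriptor references one piece of content by digest. (Hypothetical type.)
type Descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
}

// Manifest is an illustrative shape for the proposed generic content
// manifest: a JSON config plus a collection of blobs, which may themselves
// be other manifests.
type Manifest struct {
	SchemaVersion int          `json:"schemaVersion"` // this proposal describes version 1
	Name          string       `json:"name"`          // the repository name
	Config        Descriptor   `json:"config"`        // the application's JSON config
	Dependencies  []Descriptor `json:"dependencies"`  // blobs or nested manifests
}
```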