doc/spec: generic distribution content manifests #62

Closed
wants to merge 1 commit into from

Conversation

jlhawn
Contributor

@jlhawn jlhawn commented Jan 13, 2015

A proposal for generic content manifests that can be used for
any type of application that can be represented as a JSON config
and a collection of blobs of data, or may be composed of other
such applications.

@jlhawn jlhawn force-pushed the generic_manifest branch 2 times, most recently from e847f76 to fead316 Compare January 14, 2015 18:24
@vbatts

vbatts commented Jan 14, 2015

josh!

@jlhawn
Contributor Author

jlhawn commented Jan 14, 2015

vincent!

@stevvooe
Collaborator

@jlhawn Per our discussion, it may be prudent to remove the version or tag field from the manifest. The revision of the manifest should be its digest. What version it is should be provided by an external tag.

@jlhawn
Contributor Author

jlhawn commented Jan 29, 2015

@stevvooe I'm loving that idea.

Specifies the schema version of the manifest as an integer. This document
describes version `1` only.

- **`repository`** *string*
Collaborator

Let's just call this "name".

@jlhawn
Contributor Author

jlhawn commented Feb 7, 2015

@vbatts @ncdc @estesp please take a look at the draft spec document.

The way that we've outlined dependencies should enable layer federation (#88) for images, as well as the ability to verify that your content is based on other content.

@jlhawn jlhawn changed the title Generic Distribution Content Manifests doc/spec: Generic Distribution Content Manifests Feb 7, 2015
@stevvooe stevvooe changed the title doc/spec: Generic Distribution Content Manifests doc/spec: generic distribution content manifests Feb 7, 2015
@jlhawn jlhawn self-assigned this Feb 7, 2015
@estesp
Contributor

estesp commented Feb 7, 2015

So, just to make sure I'm thinking of this in the right context, this content manifest format would be an alternate, more generic format than what exists in the distribution/v2 registry today? Does the API change as well given instead of blobsums there is the more generic dependency? Or the client changes to walk dependencies to find layers (for the traditional Docker client pull case); both in dependent other content manifests, as well as the "leaf nodes" of actual layer data -- e.g. of the media type application/vnd.docker.container.image.layer+x-gtar?

@jlhawn
Contributor Author

jlhawn commented Feb 7, 2015

this content manifest format would be an alternate, more generic format than what exists in the distribution/v2 registry today?

Yes, we will deprecate the current format soon.

Does the API change as well given instead of blobsums there is the more generic dependency?

The next iteration of the API shouldn't change much. Today's "image-centric" API has two basic types of objects: image manifests and layer blobs. In the future everything will just be a content-addressable blob in a named repository. Tags will be a collection of links in a repository (tag_name -> blob_digest). We'll also be adding detached signatures of content blobs and tags.

Or the client changes to walk dependencies to find layers (for the traditional Docker client pull case)

The client already does something like this to fetch all of the blobs listed in the manifest, but it will have to be updated to support nested manifest dependencies and content federation. It should be a relatively simple recursive content discovery and fetching algorithm.
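
The recursive discovery described here can be sketched roughly as follows. This is only an illustration under stated assumptions: `Object`, `Store`, and `Walk` are hypothetical names, the store is an in-memory map rather than a registry, and the digests and media types are placeholders, not values from the spec.

```go
package main

import "fmt"

// Object is a hypothetical stand-in for a content-addressable blob.
// A manifest lists the digests it depends on; a leaf such as a layer lists none.
type Object struct {
	MediaType    string
	Dependencies []string
}

// Store maps digest -> object. A real client would issue registry requests instead.
type Store map[string]Object

// Walk visits every object reachable from root exactly once, depth first,
// and returns the digests in the order they were fetched.
func Walk(s Store, root string) ([]string, error) {
	seen := map[string]bool{}
	var order []string
	var visit func(d string) error
	visit = func(d string) error {
		if seen[d] {
			return nil // shared dependency, already fetched
		}
		seen[d] = true
		obj, ok := s[d]
		if !ok {
			return fmt.Errorf("missing object %s", d)
		}
		order = append(order, d)
		for _, dep := range obj.Dependencies {
			if err := visit(dep); err != nil {
				return err
			}
		}
		return nil
	}
	if err := visit(root); err != nil {
		return nil, err
	}
	return order, nil
}

func main() {
	// Placeholder digests: an application manifest that depends on a base
	// manifest and on one layer of its own.
	s := Store{
		"sha256:app":    {MediaType: "manifest", Dependencies: []string{"sha256:base", "sha256:layer2"}},
		"sha256:base":   {MediaType: "manifest", Dependencies: []string{"sha256:layer1"}},
		"sha256:layer1": {MediaType: "layer"},
		"sha256:layer2": {MediaType: "layer"},
	}
	order, err := Walk(s, "sha256:app")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(order)
}
```

The `seen` set is what keeps a blob shared by several manifests from being fetched twice; content federation would presumably change where each lookup goes, not the shape of the walk.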

@ncdc

ncdc commented Feb 9, 2015

@jlhawn I read over this quickly and my first impression is LGTM, but I'd like to take a deeper look when I have some more time, hopefully later today.

/cc @aweiteka - please take a look w.r.t. layer federation


- **`digest`** *string*

The base64url-encoded SHA384 digest of the object.

Please allow algorithm agility here. TUF has a good example, which is an object of algorithm name -> hex bytes mappings.

Example:

"hashes": {
  "sha256": "d03b00f125367bcd2237c6a65c442f865b3aac0ba11864d64c0f69ced766e011"
}

Contributor Author

I want to leave open flexibility here as well, if desired. We already have existing code which places labels on the digest string (https://github.com/docker/distribution/blob/master/digest/digest.go) so the one I suggested here could look like:

"sha384+b64url:5zhS4u9AtGZ7QSsgeEqbVnCk5s9nfwk_gM1Ex6Uxwh2sKIeRJ4LaW0rg55Tx4X-Y"
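
A minimal sketch of how such a labeled digest string could be parsed. `Digest` and `ParseDigest` are hypothetical names for illustration and are not the API of the linked digest package; the `+b64url` label follows the example above, and defaulting to hex when no encoding label is present is an assumption.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"strings"
)

// Digest is a hypothetical parsed form of "<algorithm>[+<encoding>]:<value>".
type Digest struct {
	Algorithm string
	Encoding  string
	Sum       []byte
}

// ParseDigest splits the label from the value and decodes the value.
// Assumption: a missing encoding label means hex.
func ParseDigest(s string) (Digest, error) {
	parts := strings.SplitN(s, ":", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return Digest{}, fmt.Errorf("malformed digest %q", s)
	}
	label := strings.SplitN(parts[0], "+", 2)
	d := Digest{Algorithm: label[0], Encoding: "hex"}
	if len(label) == 2 {
		d.Encoding = label[1]
	}
	var err error
	switch d.Encoding {
	case "hex":
		d.Sum, err = hex.DecodeString(parts[1])
	case "b64url":
		d.Sum, err = base64.RawURLEncoding.DecodeString(parts[1])
	default:
		err = fmt.Errorf("unknown encoding %q", d.Encoding)
	}
	if err != nil {
		return Digest{}, err
	}
	return d, nil
}

func main() {
	d, err := ParseDigest("sha384+b64url:5zhS4u9AtGZ7QSsgeEqbVnCk5s9nfwk_gM1Ex6Uxwh2sKIeRJ4LaW0rg55Tx4X-Y")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d.Algorithm, d.Encoding, len(d.Sum))
}
```

Because the algorithm travels with the value, a consumer can reject or skip digests whose label it does not recognize instead of misinterpreting the bytes.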

The reason why an object with potentially multiple hashes is desirable is it allows you to phase in a new algorithm while still using the old, keeping backwards compatibility with consumers that do not know about the new algorithm.
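
A rough sketch of the consumer side of that phase-in, assuming the `"hashes"` object shape from the example above (algorithm name mapped to hex digest). `Verify`, `HexSum`, and the `known` table are hypothetical names; skipping unknown algorithms while requiring at least one known match is one possible policy, not something the spec draft prescribes.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"errors"
	"fmt"
	"hash"
)

// known lists the algorithms this (hypothetical) consumer understands.
var known = map[string]func() hash.Hash{
	"sha256": sha256.New,
	"sha512": sha512.New,
}

// HexSum returns the hex digest of content under alg, or false if alg is unknown.
func HexSum(alg string, content []byte) (string, bool) {
	newHash, ok := known[alg]
	if !ok {
		return "", false
	}
	h := newHash()
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil)), true
}

// Verify checks content against every listed hash whose algorithm is known.
// Unknown algorithms are skipped, so an older consumer keeps working when a
// producer starts publishing a newer algorithm next to the old one.
func Verify(content []byte, hashes map[string]string) error {
	checked := 0
	for alg, want := range hashes {
		got, ok := HexSum(alg, content)
		if !ok {
			continue
		}
		if got != want {
			return fmt.Errorf("%s digest mismatch", alg)
		}
		checked++
	}
	if checked == 0 {
		return errors.New("no supported digest algorithm listed")
	}
	return nil
}

func main() {
	content := []byte("example blob")
	sum, _ := HexSum("sha256", content)
	hashes := map[string]string{
		"sha256":         sum,
		"some-newer-alg": "value this consumer cannot check",
	}
	if err := Verify(content, hashes); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("verified")
}
```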

Contributor Author

that is quite valuable, yes. +1

Contributor Author

@stevvooe want to discuss the technical implications of having multiple digests map to the same object? I believe we already handle this today by linking to a "canonical" digest right?

Collaborator

@titanous The digest strings already include the algorithm with the hash (i.e. "sha256:d03..."), allowing some flexibility. Likely, there will be some need for multiple digests.

Here are some questions I've been pondering along these lines:

How would having multiple hashes work as a content addressable identifier?
What becomes the "canonical" identifier?
Should all content be hashed under multiple hashes?
Do we maintain an index for each algorithm for every object hash to every other algorithm?

Good answers here would help support this suggestion.

How would having multiple hashes work as a content addressable identifier?
What becomes the "canonical" identifier?

Here's one solution: use an array whose first element is always treated as an opaque canonical identifier as well as a hash.

Should all content be hashed under multiple hashes?

One algorithm should be chosen for now (SHA-512 or maybe SHA-384). If/when transitioning to another algorithm makes sense, then all content should be hashed with the new algorithm and the old one until there are no legacy consumers left.

Do we maintain an index for each algorithm for every object hash to every other algorithm?

Not sure what this question is about, have I answered it above?

Collaborator

@stevvooe I am wondering if support for multiple hash identifiers could be left to the clients, which would specify multiple algorithms when storing content. If there is no reference to the new algorithm, then there is no requirement to automatically generate the new identifier. It can simply be done as an optimization to preemptively add an index entry to keep clients from pushing blobs which may already exist with a different hash. Essentially, if no one tries to store content with the new hash identifier, it is reasonable to assume no one has a reference to fetch by it.

Collaborator

@titanous

Do we maintain an index for each algorithm for every object hash to every other algorithm?

Not sure what this question is about, have I answered it above?

Apologies for the word soup. The question is, if we have hash functions h_0, h_1 and h_2, do we need to maintain indexes mapping h_i(A) to h_j(A), where i and j can each take any value 0, 1, 2?

@dmcgowan This optimization sounds reasonable. I think this approach might be bolstered by the concept of a "canonical" digest. For example, if content is uploaded with an older hash (say sha256, instead of sha384), the content would be verified against sha256 but also stored as sha384.

Are we opening the design up to collision attacks by linking a more secure digest to a less secure digest?
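
The "canonical digest plus index entry" idea from this exchange could look roughly like the sketch below. Everything here is an assumption for illustration: `BlobStore`, `Put`, and `Get` are hypothetical names, the store is in memory, sha384 is picked as the canonical algorithm only because the example above uses it, and digests are hex-encoded.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"errors"
	"fmt"
)

func sha256Digest(b []byte) string {
	sum := sha256.Sum256(b)
	return "sha256:" + hex.EncodeToString(sum[:])
}

func sha384Digest(b []byte) string {
	sum := sha512.Sum384(b)
	return "sha384:" + hex.EncodeToString(sum[:])
}

// BlobStore keeps each blob once, under its canonical (sha384) digest, and
// records an alias only for digests that clients actually pushed with.
type BlobStore struct {
	blobs   map[string][]byte // canonical digest -> content
	aliases map[string]string // any accepted digest -> canonical digest
}

func NewBlobStore() *BlobStore {
	return &BlobStore{blobs: map[string][]byte{}, aliases: map[string]string{}}
}

// Put verifies content against the digest the client claims, then stores it
// under the canonical digest and links the claimed digest to it.
func (s *BlobStore) Put(claimed string, content []byte) (string, error) {
	if claimed != sha256Digest(content) && claimed != sha384Digest(content) {
		return "", errors.New("content does not match claimed digest")
	}
	canonical := sha384Digest(content)
	s.blobs[canonical] = content
	s.aliases[canonical] = canonical
	s.aliases[claimed] = canonical
	return canonical, nil
}

// Get resolves any known digest to the canonical one and returns the content.
func (s *BlobStore) Get(digest string) ([]byte, bool) {
	canonical, ok := s.aliases[digest]
	if !ok {
		return nil, false
	}
	b, ok := s.blobs[canonical]
	return b, ok
}

func main() {
	s := NewBlobStore()
	content := []byte("layer data")
	canonical, err := s.Put(sha256Digest(content), content)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	_, byOld := s.Get(sha256Digest(content))
	_, byNew := s.Get(canonical)
	fmt.Println(byOld, byNew)
}
```

With one canonical digest, each additional algorithm needs only a single alias map into it rather than a pairwise index between all algorithms. The sha256 alias in this sketch is only as collision-resistant as sha256 itself, which is exactly the concern raised in the question above.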

@dmp42 dmp42 assigned stevvooe and unassigned jlhawn Jul 23, 2015
@harche
Contributor

harche commented Jul 29, 2015

@stevvooe Why do we have Labels in Manifest when we already have them in Descriptor?

Also, the plural form of that word (Labels) indicates that you want to have multiple Labels per Manifest. How do you plan to accommodate multiple OS/Arch in just a map[string]string?

@thaJeztah
Member

(Apologies if this is a silly question.) Are the Labels here intended to be directly copied from the labels that are set in Docker? Because those are indeed simple key/value pairs (i.e. a label with a given name can only have a single value; if it's specified multiple times, the previous value is overwritten).

@stevvooe
Collaborator

@thaJeztah This specification makes no recommendation for how the Labels are to be used. Yes, they can be copied directly to carry user labels, but they can also be used to provide resolution hints for consuming processes.

Think of this type as a flexible primitive.

@harche Labels can be present at the manifest or descriptor level. The manifest may have labels that are different from the individual target labels, such as who created it. We may even want them on tags. They are just a convention we are trying to follow from the other data structures.

I'll update the comment above, accordingly.

@harche
Contributor

harche commented Jul 29, 2015

@stevvooe Thanks for the clarification, I was under the assumption that Labels would serve to declare only OS/Arch information.

@DieterReuter

@stevvooe I've dug a little deeper into the label usage; maybe we could use this quite well for the different ARM types and all the other CPU architectures.
Example 1: Raspberry Pi 1

  "platform": {
    "os": "linux",
    "arch": "arm",
    "model": "armv6l"
  }

Example 2: ARMv8 (64-bit) machine

  "platform": {
    "os": "linux",
    "arch": "arm",
    "model": "aarch64"
  }

These are all the details a Docker engine would need to push or pull the appropriate image, but they should be mandatory (at least the os and arch labels) in future versions. For amd64 there’s no need to use the model label, and we keep the same functionality as today in v1.

All the values could be easily and automatically gathered:
os: from GOOS env
arch: from GOARCH env
model: from uname -m (this could also be done from Go using a simple kernel syscall, which works on Linux and Android as well)

In this case, the logic to select the right image architecture only needs to be implemented in the Docker engine, for a pull or a push. Even the compatibility logic is only needed in the engine. So the Docker registry itself can stay very clean: it just stores the platform labels and uses them to fetch the images that match the platform on which the requesting Docker engine is running.

Here is an example of how to use uname -a from Go: https://github.com/DieterReuter/osarch/blob/master/osarch_linux.go#L10

I also ran some recent tests on different devices (Mac, Linux and even Android) with a small basic implementation in Go: https://github.com/DieterReuter/osarch#results-on-different-devices
This tool could be extended to Windows too, and could grow into a general libosarch package/lib for gathering this information easily (maybe with reference implementations in Go and C).

As I already said, that's just a rough idea of how to determine the platform labels in an easy and elegant way. And if anyone needs more specific labels, like byte order or ABI, they are free to add new labels in their container runtime. There is no need to change the behaviour of the Docker registry.
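
The engine-side selection described above might be sketched like this. It is only an illustration: `Platform`, `Image`, `Local`, `Matches`, and `Select` are hypothetical names, the model value is passed in by the caller (for example from `uname -m`) rather than read via a syscall, and treating an image without a model label as matching any model is an assumption based on the amd64 remark.

```go
package main

import (
	"fmt"
	"runtime"
)

// Platform mirrors the proposed "platform" labels.
type Platform struct {
	OS    string
	Arch  string
	Model string // optional, e.g. "armv6l"; empty means "any model"
}

// Image pairs a placeholder digest with the platform labels stored for it.
type Image struct {
	Digest   string
	Platform Platform
}

// Local builds the host platform from the Go runtime constants. The model is
// supplied by the caller because it is not available from the runtime package.
func Local(model string) Platform {
	return Platform{OS: runtime.GOOS, Arch: runtime.GOARCH, Model: model}
}

// Matches reports whether an image's labels are acceptable for the host:
// os and arch must be equal, and model must be equal when the image sets one.
func Matches(image, host Platform) bool {
	if image.OS != host.OS || image.Arch != host.Arch {
		return false
	}
	return image.Model == "" || image.Model == host.Model
}

// Select returns the first candidate whose labels match the host.
func Select(candidates []Image, host Platform) (Image, bool) {
	for _, c := range candidates {
		if Matches(c.Platform, host) {
			return c, true
		}
	}
	return Image{}, false
}

func main() {
	candidates := []Image{
		{Digest: "sha256:amd64", Platform: Platform{OS: "linux", Arch: "amd64"}},
		{Digest: "sha256:armv8", Platform: Platform{OS: "linux", Arch: "arm", Model: "aarch64"}},
		{Digest: "sha256:rpi1", Platform: Platform{OS: "linux", Arch: "arm", Model: "armv6l"}},
	}
	// Pretend to be a Raspberry Pi 1 instead of calling Local, so the result
	// does not depend on the machine running the example.
	host := Platform{OS: "linux", Arch: "arm", Model: "armv6l"}
	if img, ok := Select(candidates, host); ok {
		fmt.Println(img.Digest)
	}
	fmt.Println(Local(""))
}
```

Keeping `Matches` in the engine means the registry only has to store and return the labels, which is the split of responsibilities proposed above.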

@jlhawn
Contributor Author

jlhawn commented Sep 17, 2015

Closing in favor of #993 by @estesp

@jlhawn jlhawn closed this Sep 17, 2015
aaronlehmann added a commit to aaronlehmann/distribution that referenced this pull request Oct 5, 2015
This is a follow-on to PR distribution#62, and it borrows much of the format
from distribution#993, but uses specific formats for the image manifest and manifest
list (fat manifest) instead of a combined generic format.

The intent of this proposed manifest format is to allow multi-arch, and
allow for full content-addressability of images in the Docker engine.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@avianbrooks

Github
Duplicate of #3654
