doc/spec: generic distribution content manifests #62

Closed
wants to merge 1 commit into from

Conversation

@jlhawn
Contributor

@jlhawn jlhawn commented Jan 13, 2015

A proposal for generic content manifests that can be used for
any type of application that can be represented as a JSON config
and a collection of blobs of data, or may be composed of other
such applications.

@jlhawn jlhawn force-pushed the jlhawn:generic_manifest branch 2 times, most recently from e847f76 to fead316 Jan 14, 2015
@vbatts
Contributor

@vbatts vbatts commented Jan 14, 2015

josh!

@jlhawn
Contributor Author

@jlhawn jlhawn commented Jan 14, 2015

vincent!

@stevvooe
Collaborator

@stevvooe stevvooe commented Jan 29, 2015

@jlhawn Per our discussion, it may be prudent to remove the version or tag field from the manifest. The revision of the manifest should be its digest. What version it is should be provided by external tag.

@jlhawn
Contributor Author

@jlhawn jlhawn commented Jan 29, 2015

@stevvooe I'm loving that idea.


@jlhawn jlhawn force-pushed the jlhawn:generic_manifest branch 2 times, most recently from d18489a to b1b4c13 Feb 7, 2015
Specifies the schema version of the manifest as an integer. This document
describes version `1` only.

- **`repository`** *string*


@stevvooe

stevvooe Feb 7, 2015
Collaborator

Let's just call this "name".


```json
{
"schemaVersion": 1,


@stevvooe

stevvooe Feb 7, 2015
Collaborator

Should this be version 2?


@dmcgowan

dmcgowan Feb 10, 2015
Collaborator

I was thinking the same thing earlier. It will ensure that existing clients don't accidentally consume this, since there is a check for this value.

@jlhawn
Contributor Author

@jlhawn jlhawn commented Feb 7, 2015

@vbatts @ncdc @estesp please take a look at the draft spec document.

The way that we've outlined dependencies should enable layer federation (#88) for images, as well as the ability to verify that your content is based on other content.

@jlhawn jlhawn changed the title Generic Distribution Content Manifests doc/spec: Generic Distribution Content Manifests Feb 7, 2015
@stevvooe stevvooe changed the title doc/spec: Generic Distribution Content Manifests doc/spec: generic distribution content manifests Feb 7, 2015
@jlhawn jlhawn self-assigned this Feb 7, 2015
@estesp
Contributor

@estesp estesp commented Feb 7, 2015

So, just to make sure I'm thinking of this in the right context, this content manifest format would be an alternate, more generic format than what exists in the distribution/v2 registry today? Does the API change as well given instead of blobsums there is the more generic dependency? Or the client changes to walk dependencies to find layers (for the traditional Docker client pull case); both in dependent other content manifests, as well as the "leaf nodes" of actual layer data -- e.g. of the media type application/vnd.docker.container.image.layer+x-gtar?

@jlhawn
Contributor Author

@jlhawn jlhawn commented Feb 7, 2015

this content manifest format would be an alternate, more generic format than what exists in the distribution/v2 registry today?

Yes, we will deprecate the current format soon.

Does the API change as well given instead of blobsums there is the more generic dependency?

The next iteration of the API shouldn't change much. Today's "image-centric" API has two basic types of objects: image manifests and layer blobs. In the future, everything will just be a content-addressable blob in a named repository. Tags will be a collection of links in a repository (tag_name -> blob_digest). We'll also be adding detached signatures of content blobs and tags.

Or the client changes to walk dependencies to find layers (for the traditional Docker client pull case)

The client already does something like this to fetch all of the blobs listed in the manifest, but it will have to be updated to support nested manifest dependencies and content federation. It should be a relatively simple recursive content discovery and fetching algorithm.
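For illustration only, the recursive discovery described above could look roughly like the following Go sketch. The `Descriptor`, `Manifest`, `Store`, and `MapStore` types and the `Walk` function are hypothetical names invented for this example, not part of the draft spec or of the distribution codebase. The manifest media type is the one from the draft's example, and the layer media type is the one mentioned earlier in this thread.

```go
package main

import "fmt"

// Media types taken from this discussion; everything else below is hypothetical.
const (
	manifestMediaType = "application/vnd.docker.distribution.content.manifest.v1+json"
	layerMediaType    = "application/vnd.docker.container.image.layer+x-gtar"
)

// Descriptor is a hypothetical reference to a blob: its media type and digest.
type Descriptor struct {
	MediaType string
	Digest    string
}

// Manifest is a hypothetical, simplified manifest: a list of dependencies,
// each of which is either a leaf blob or another manifest.
type Manifest struct {
	Dependencies []Descriptor
}

// Store is a hypothetical stand-in for a registry client.
type Store interface {
	GetManifest(digest string) (Manifest, bool)
}

// MapStore is an in-memory Store used only for illustration.
type MapStore map[string]Manifest

func (m MapStore) GetManifest(digest string) (Manifest, bool) {
	v, ok := m[digest]
	return v, ok
}

// Walk returns the digests of all leaf blobs reachable from d, depth first.
// The seen map ensures that shared dependencies are visited only once.
func Walk(s Store, d Descriptor, seen map[string]bool) ([]string, error) {
	if seen[d.Digest] {
		return nil, nil
	}
	seen[d.Digest] = true

	// Anything that is not a manifest is treated as a leaf blob to fetch.
	if d.MediaType != manifestMediaType {
		return []string{d.Digest}, nil
	}

	m, ok := s.GetManifest(d.Digest)
	if !ok {
		return nil, fmt.Errorf("manifest %s not found", d.Digest)
	}

	var blobs []string
	for _, dep := range m.Dependencies {
		sub, err := Walk(s, dep, seen)
		if err != nil {
			return nil, err
		}
		blobs = append(blobs, sub...)
	}
	return blobs, nil
}
```

A client would call `Walk` with the descriptor of the top-level manifest and an empty `seen` map, then fetch each returned digest. Content federation (fetching a dependency from a different repository) is deliberately left out of this sketch.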

@ncdc
Contributor

@ncdc ncdc commented Feb 9, 2015

@jlhawn I read over this quickly and my first impression is LGTM, but I'd like to take a deeper look when I have some more time, hopefully later today.

/cc @aweiteka - please take a look w.r.t. layer federation

@jlhawn jlhawn force-pushed the jlhawn:generic_manifest branch from b1b4c13 to c28b97d Feb 9, 2015
@jlhawn jlhawn added the in progress label Feb 9, 2015

- **`digest`** *string*

The base64url-encoded SHA384 digest of the object.


@titanous

titanous Feb 10, 2015

Please allow algorithm agility here. TUF has a good example, which is an object of algorithm name -> hex bytes mappings.

Example:

"hashes": {
  "sha256": "d03b00f125367bcd2237c6a65c442f865b3aac0ba11864d64c0f69ced766e011"
}


@jlhawn

jlhawn Feb 10, 2015
Author Contributor

I want to leave open flexibility here as well, if desired. We already have existing code which places labels on the digest string (https://github.com/docker/distribution/blob/master/digest/digest.go) so the one I suggested here could look like:

"sha384+b64url:5zhS4u9AtGZ7QSsgeEqbVnCk5s9nfwk_gM1Ex6Uxwh2sKIeRJ4LaW0rg55Tx4X-Y"
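As a rough sketch of how a labeled digest string of this shape could be produced and parsed in Go: the `LabeledDigest` type and the `ParseLabeled` and `EncodeSHA384` functions are hypothetical and are not the API of the existing digest package. Treating a missing `+encoding` suffix as hex is an assumption made for this example.

```go
package main

import (
	"crypto/sha512"
	"encoding/base64"
	"fmt"
	"strings"
)

// LabeledDigest is a hypothetical parsed form of "algorithm[+encoding]:value".
type LabeledDigest struct {
	Algorithm string
	Encoding  string
	Value     string
}

// ParseLabeled splits a digest string into its label and value.
// Assumption: when no "+encoding" suffix is present, the value is hex.
func ParseLabeled(s string) (LabeledDigest, error) {
	label, value, ok := strings.Cut(s, ":")
	if !ok || label == "" || value == "" {
		return LabeledDigest{}, fmt.Errorf("invalid digest %q", s)
	}
	alg, enc, _ := strings.Cut(label, "+")
	if enc == "" {
		enc = "hex"
	}
	return LabeledDigest{Algorithm: alg, Encoding: enc, Value: value}, nil
}

// EncodeSHA384 returns the unpadded base64url SHA-384 digest of content,
// prefixed with the "sha384+b64url" label used in the comment above.
func EncodeSHA384(content []byte) string {
	sum := sha512.Sum384(content)
	return "sha384+b64url:" + base64.RawURLEncoding.EncodeToString(sum[:])
}
```

A SHA-384 sum is 48 bytes, which encodes to exactly 64 base64url characters with no padding, so padded and unpadded encodings agree for this algorithm.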


@titanous

titanous Feb 10, 2015

The reason why an object with potentially multiple hashes is desirable is it allows you to phase in a new algorithm while still using the old, keeping backwards compatibility with consumers that do not know about the new algorithm.
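To make the phase-in idea concrete, here is a minimal Go sketch of a consumer that verifies content against a `hashes` object and skips algorithms it does not know. The `supported` table and the `HexDigest` and `Verify` functions are hypothetical names for this example. The policy shown (every known algorithm must match, and at least one must be known) is one possible choice, not something this thread has settled.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"errors"
	"hash"
)

// supported lists the algorithms this (hypothetical) consumer understands.
var supported = map[string]func() hash.Hash{
	"sha256": sha256.New,
	"sha384": sha512.New384,
	"sha512": sha512.New,
}

// HexDigest returns the lowercase hex digest of content under alg,
// or false if alg is not supported.
func HexDigest(alg string, content []byte) (string, bool) {
	newHash, ok := supported[alg]
	if !ok {
		return "", false
	}
	h := newHash()
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil)), true
}

// Verify checks content against every entry in hashes whose algorithm is
// supported. Unknown algorithms are skipped, so an older consumer keeps
// working while a new algorithm is phased in.
func Verify(content []byte, hashes map[string]string) error {
	checked := 0
	for alg, want := range hashes {
		got, ok := HexDigest(alg, content)
		if !ok {
			continue // algorithm unknown to this consumer
		}
		if got != want {
			return errors.New("digest mismatch for " + alg)
		}
		checked++
	}
	if checked == 0 {
		return errors.New("no supported algorithm in hashes")
	}
	return nil
}
```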


@jlhawn

jlhawn Feb 10, 2015
Author Contributor

that is quite valuable, yes. +1


@jlhawn

jlhawn Feb 10, 2015
Author Contributor

@stevvooe want to discuss the technical implications of having multiple digests map to the same object? I believe we already handle this today by linking to a "canonical" digest right?


@stevvooe

stevvooe Feb 10, 2015
Collaborator

@titanous The digest strings already include the algorithm with the hash (i.e. "sha256:d03..."), allowing some flexibility. Likely, there will be some need for multiple digests.

Here are some questions I've been pondering along these lines:

How would having multiple hashes work as a content addressable identifier?
What becomes the "canonical" identifier?
Should all content be hashed under multiple hashes?
Do we maintain an index for each algorithm for every object hash to every other algorithm?

Good answers here would help support this suggestion.


@titanous

titanous Feb 10, 2015

How would having multiple hashes work as a content addressable identifier?
What becomes the "canonical" identifier?

Here's one solution: use an array, the first element is always treated as an opaque canonical identifier as well as a hash.

Should all content be hashed under multiple hashes?

One algorithm should be chosen for now (SHA-512 or maybe SHA-384). If/when transitioning to another algorithm makes sense, then all content should be hashed with the new algorithm and the old one until there are no legacy consumers left.

Do we maintain an index for each algorithm for every object hash to every other algorithm?

Not sure what this question is about, have I answered it above?


@dmcgowan

dmcgowan Feb 10, 2015
Collaborator

@stevvooe I am wondering if supporting multiple hash identifiers could be left up to the clients to specify multiple algorithms when storing content. If there is no reference to the new algorithm, then there is no requirement to automatically generate the new identifier. It can simply be done as an optimization to preemptively add an index entry to keep clients from pushing blobs which may already exist with a different hash. Essentially if no one tries to store content with the new hash identifier, it is reasonable to assume no one has a reference to fetch by it.


@stevvooe

stevvooe Feb 10, 2015
Collaborator

@titanous

Do we maintain an index for each algorithm for every object hash to every other algorithm?

Not sure what this question is about, have I answered it above?

Apologies for the word soup. The question is, if we have hash functions h0, h1 and h2, do we need to maintain indexes mapping hi(A) to hj(A), where i, j can have any value 0, 1, 2?

@dmcgowan This optimization sounds reasonable. I think this approach might be bolstered by the concept of a "canonical" digest. For example, if content is uploaded with an older hash (say sha256, instead of sha384), the content would be verified against sha256 but also stored as sha384.

Are we opening the design up to collision attacks by linking a more secure digest to a less secure digest?
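A rough Go sketch of the canonical-digest idea discussed above, under the assumption that sha384 is the canonical algorithm and sha256 the older one. The `BlobStore` type and its methods are hypothetical. Only alias-to-canonical links are recorded, and only when a client actually uploads under the older digest, so no full index between every pair of algorithms is kept. The sketch does not address the collision question raised above.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"errors"
)

// BlobStore is a hypothetical in-memory store keyed by canonical digest.
type BlobStore struct {
	blobs   map[string][]byte // canonical (sha384) digest -> content
	aliases map[string]string // older digest -> canonical digest
}

func NewBlobStore() *BlobStore {
	return &BlobStore{blobs: map[string][]byte{}, aliases: map[string]string{}}
}

func digest256(b []byte) string {
	s := sha256.Sum256(b)
	return "sha256:" + hex.EncodeToString(s[:])
}

func digest384(b []byte) string {
	s := sha512.Sum384(b)
	return "sha384:" + hex.EncodeToString(s[:])
}

// Put verifies content against the digest claimed by the client, stores it
// under the canonical sha384 digest, and records an alias if the client
// used the older sha256 digest. It returns the canonical digest.
func (s *BlobStore) Put(claimed string, content []byte) (string, error) {
	canonical := digest384(content)
	switch claimed {
	case canonical:
		// already canonical, nothing to link
	case digest256(content):
		s.aliases[claimed] = canonical
	default:
		return "", errors.New("content does not match claimed digest")
	}
	s.blobs[canonical] = content
	return canonical, nil
}

// Get resolves an alias to its canonical digest before looking up content.
func (s *BlobStore) Get(digest string) ([]byte, bool) {
	if canonical, ok := s.aliases[digest]; ok {
		digest = canonical
	}
	b, ok := s.blobs[digest]
	return b, ok
}
```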

"mediaType": "application/vnd.docker.distribution.content.manifest.v1+json",
"size": 1024,
"repository": "isv.example.com/base/system",
"digest": "yKuiDWRdxLyLw5u9w7aEJ4doaLyBKJOAnMPjlmMcm-j7_CIP2zviko5jNGC5pofU"


@stevvooe

stevvooe Feb 10, 2015
Collaborator

Please show these examples in digest package format:

sha384...:yKuiDWRdxLyLw5u9w7aEJ4doaLyBKJOAnMPjlmMcm-j7_CIP2zviko5jNGC5pofU
@jlhawn jlhawn force-pushed the jlhawn:generic_manifest branch 2 times, most recently from 589eb52 to 448a612 Feb 11, 2015
@stevvooe
Collaborator

@stevvooe stevvooe commented Jul 23, 2015

@harche The repeat of Target is intended. In this model, we always have a target, with data around providing context to that target. You have the correct PR but we'd probably need to merge the tags and manifest PRs for this effort.

@dmp42 dmp42 assigned stevvooe and unassigned jlhawn Jul 23, 2015
@harche
Contributor

@harche harche commented Jul 29, 2015

@stevvooe Why do we have Labels in Manifest when we already have them in Descriptor?

Also, the plural form of that word (Labels) indicates that you want to have multiple Labels per Manifest. How do you plan to accommodate multiple OS/Arch in just map[string]string?

@thaJeztah
Member

@thaJeztah thaJeztah commented Jul 29, 2015

(apologies if this is a silly question); Are the Labels here intended to be directly copied from the labels that are set in Docker? Because those are indeed simple key/value pairs (I.e. a label with a given name can only have a single value - if it's specified multiple times, the previous value is overwritten)

@stevvooe
Collaborator

@stevvooe stevvooe commented Jul 29, 2015

@thaJeztah This specification makes no recommendation for how the Labels are to be used. Yes, they can be directly copied, to allow user labels but they can also be used to provide resolution hints for consuming processes.

Think of this type as a flexible primitive.

@harche Labels can be present at the manifest or descriptor level. The manifest may have labels that are different from the individual target labels, such as who created it. We may even want them on tag. They are just a convention we are trying to follow from the other data structures.

I'll update the comment above, accordingly.

@harche
Contributor

@harche harche commented Jul 29, 2015

@stevvooe Thanks for the clarification, I was under the assumption that Labels would serve to declare only OS/Arch information.

@DieterReuter

@DieterReuter DieterReuter commented Aug 5, 2015

@stevvooe I've dug a little bit deeper into the label usage; maybe we could use this quite well for the different ARM types and all the other CPU architectures.
Example 1: Raspberry Pi 1

  "platform": {
    "os": "linux",
    "arch": "arm",
    "model": "armv6l"
  }

Example 2: ARMv8 (64-bit) machine

  "platform": {
    "os": "linux",
    "arch": "arm",
    "model": "aarch64"
  }

This is all the detailed information a Docker engine would need to push or pull the appropriate image, but the labels should be mandatory (at least os and arch) for future versions. For amd64 there's no need to use the model label, and we keep the same functionality as today in v1.

All the values could be easily and automatically gathered:
os: from GOOS env
arch: from GOARCH env
model: from uname -m (this could also be done from Go with a simple kernel syscall, which works on Linux and Android as well)

In this case, the logic to select the right image architecture only needs to be implemented in the Docker engine, for a pull or a push. Even the compatibility logic will only be needed in the engine. So the Docker registry itself could stay very clean: it would just store the platform labels and use them to fetch the images that match the platform where the Docker engine is running.
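One way the engine-side selection could work, sketched in Go: the `Image` type, the `Select` function, and the matching rule are assumptions for illustration. Here `os` and `arch` must be present and equal, while `model` is compared only when the image declares it, which matches the suggestion that amd64 images need no model label. The digests in any test data are placeholders.

```go
package main

// Image is a hypothetical candidate: a digest plus its labels.
type Image struct {
	Digest string
	Labels map[string]string
}

// matches reports whether img can run on host under the rule assumed here:
// "os" and "arch" are mandatory and must be equal; "model" is optional on
// the image and is checked only if the image declares it.
func matches(img Image, host map[string]string) bool {
	for _, key := range []string{"os", "arch"} {
		if v, ok := img.Labels[key]; !ok || v != host[key] {
			return false
		}
	}
	if m, ok := img.Labels["model"]; ok && m != host["model"] {
		return false
	}
	return true
}

// Select returns the first image whose platform labels match the host.
func Select(images []Image, host map[string]string) (Image, bool) {
	for _, img := range images {
		if matches(img, host) {
			return img, true
		}
	}
	return Image{}, false
}
```

Any richer compatibility logic, such as an ARMv7 host accepting an armv6l image, would replace the equality check on `model` and would still live entirely in the engine.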

Here is an example of how to use uname -a from Go: https://github.com/DieterReuter/osarch/blob/master/osarch_linux.go#L10

I also ran some recent tests on different devices (Mac, Linux and even Android) with a small basic implementation in Go: https://github.com/DieterReuter/osarch#results-on-different-devices
This tool could be extended for Windows too, and could be developed into a general libosarch package/lib to gather this information easily (maybe with a reference implementation in Go and C).

As I already said, that's just a rough idea of how to determine the platform labels in an easy and elegant way. And anyone who needs more specific labels, like byte order or ABI, is free to add additional labels in their container runtime. There is no need to change the behaviour of the Docker registry.

@jlhawn
Contributor Author

@jlhawn jlhawn commented Sep 17, 2015

Closing in favor of #993 by @estesp

@jlhawn jlhawn closed this Sep 17, 2015
aaronlehmann added a commit to aaronlehmann/distribution that referenced this pull request Oct 5, 2015
This is a follow-on to PR docker#62, and it borrows much of the format
from docker#993, but uses specific formats for the image manifest and manifest
list (fat manifest) instead of a combined generic format.

The intent of this proposed manifest format is to allow multi-arch, and
allow for full content-addressability of images in the Docker engine.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
aaronlehmann added a commit to aaronlehmann/distribution that referenced this pull request Oct 5, 2015
aaronlehmann added a commit to aaronlehmann/distribution that referenced this pull request Oct 14, 2015
aaronlehmann added a commit to aaronlehmann/distribution that referenced this pull request Oct 14, 2015
aaronlehmann added a commit to aaronlehmann/distribution that referenced this pull request Oct 14, 2015
aaronlehmann added a commit to aaronlehmann/distribution that referenced this pull request Oct 14, 2015
aaronlehmann added a commit to aaronlehmann/distribution that referenced this pull request Oct 14, 2015
aaronlehmann added a commit to aaronlehmann/distribution that referenced this pull request Oct 16, 2015
aaronlehmann added a commit to aaronlehmann/distribution that referenced this pull request Dec 18, 2015
aaronlehmann added a commit to aaronlehmann/distribution that referenced this pull request Dec 18, 2015
aaronlehmann added a commit to aaronlehmann/distribution that referenced this pull request Dec 18, 2015
aaronlehmann added a commit to aaronlehmann/distribution that referenced this pull request Dec 18, 2015
BrianBland added a commit to BrianBland/distribution that referenced this pull request Dec 28, 2015
BrianBland added a commit to BrianBland/distribution that referenced this pull request Jan 6, 2016