doc/spec: generic distribution content manifests #62
Conversation
josh!
vincent!
@jlhawn Per our discussion, it may be prudent to remove the version or tag field from the manifest. The revision of the manifest should be its digest. What version it is should be provided by an external tag.
@stevvooe I'm loving that idea.
> Specifies the schema version of the manifest as an integer. This document
> describes version `1` only.
>
> - **`repository`** *string*
Let's just call this "name".
So, just to make sure I'm thinking of this in the right context: this content manifest format would be an alternate, more generic format than what exists in the distribution/v2 registry today? Does the API change as well, given that instead of blobsums there is the more generic dependency? Or does the client change to walk dependencies to find layers (for the traditional Docker client pull case), both in dependent content manifests and in the "leaf nodes" of actual layer data, e.g. by media type?
Yes, we will deprecate the current format soon.
The next iteration of the API shouldn't change much. Today's "image-centric" API has two basic types of objects: image manifests and layer blobs. In the future everything will just be a content-addressable blob in a named repository. Tags will be a collection of links in a repository (tag_name -> blob_digest). We'll also be adding detached signatures of content blobs and tags.
The client already does something like this to fetch all of the blobs listed in the manifest, but it will have to be updated to support nested manifest dependencies and content federation. It should be a relatively simple recursive content discovery and fetching algorithm.
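To make that concrete, here is a minimal sketch of such a recursive discovery-and-fetch loop in Go. The `Client`, `Manifest`, and `Dependency` names and the manifest media-type check are assumptions for illustration only; they are not part of the distribution codebase or this spec.

```go
package fetchsketch

import (
	"encoding/json"
	"fmt"
)

// Dependency points at another blob in the repository, which may itself be
// a nested manifest or a leaf blob such as layer data. (Hypothetical type.)
type Dependency struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
}

// Manifest is a stand-in for the generic content manifest under discussion.
type Manifest struct {
	Name         string       `json:"name"`
	Dependencies []Dependency `json:"dependencies"`
}

// Client abstracts "give me the bytes and media type for this digest".
type Client interface {
	GetBlob(repo, digest string) (data []byte, mediaType string, err error)
}

// Fetch walks the dependency graph: nested manifests are recursed into,
// everything else is treated as a leaf blob. `seen` avoids refetching
// content that is shared by digest.
func Fetch(c Client, repo, digest, manifestMediaType string, seen map[string]bool) error {
	if seen[digest] {
		return nil
	}
	seen[digest] = true

	data, mediaType, err := c.GetBlob(repo, digest)
	if err != nil {
		return err
	}
	if mediaType != manifestMediaType {
		fmt.Printf("fetched leaf blob %s (%s)\n", digest, mediaType)
		return nil
	}

	var m Manifest
	if err := json.Unmarshal(data, &m); err != nil {
		return err
	}
	for _, dep := range m.Dependencies {
		if err := Fetch(c, repo, dep.Digest, manifestMediaType, seen); err != nil {
			return err
		}
	}
	return nil
}
```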
> - **`digest`** *string*
>
>   The base64url-encoded SHA384 digest of the object.
Please allow algorithm agility here. TUF has a good example, which is an object of algorithm name -> hex bytes mappings.
Example:
"hashes": {
"sha256": "d03b00f125367bcd2237c6a65c442f865b3aac0ba11864d64c0f69ced766e011"
}
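For illustration, a minimal Go sketch of verifying a blob against such a `hashes` object. It assumes a plain map of algorithm name to hex digest; algorithms the consumer does not recognize are skipped, which is the property that allows a new algorithm to be phased in alongside an old one.

```go
package hashcheck

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"hash"
)

// newHash returns a hasher for the algorithms this consumer understands.
func newHash(alg string) (hash.Hash, bool) {
	switch alg {
	case "sha256":
		return sha256.New(), true
	case "sha384":
		return sha512.New384(), true
	case "sha512":
		return sha512.New(), true
	}
	return nil, false
}

// Verify checks the content against every hash it knows how to compute and
// requires at least one recognized algorithm to be present in the map.
func Verify(content []byte, hashes map[string]string) error {
	verified := 0
	for alg, want := range hashes {
		h, ok := newHash(alg)
		if !ok {
			continue // a newer algorithm this consumer does not know about yet
		}
		h.Write(content)
		if got := hex.EncodeToString(h.Sum(nil)); got != want {
			return fmt.Errorf("%s mismatch: got %s, want %s", alg, got, want)
		}
		verified++
	}
	if verified == 0 {
		return fmt.Errorf("no supported hash algorithm in manifest")
	}
	return nil
}
```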
I want to leave open flexibility here as well, if desired. We already have existing code which places labels on the digest string (https://github.com/docker/distribution/blob/master/digest/digest.go) so the one I suggested here could look like:
"sha384+b64url:5zhS4u9AtGZ7QSsgeEqbVnCk5s9nfwk_gM1Ex6Uxwh2sKIeRJ4LaW0rg55Tx4X-Y"
The reason why an object with potentially multiple hashes is desirable is that it allows you to phase in a new algorithm while still using the old one, keeping backwards compatibility with consumers that do not know about the new algorithm.
that is quite valuable, yes. +1
@stevvooe want to discuss the technical implications of having multiple digests map to the same object? I believe we already handle this today by linking to a "canonical" digest right?
@titanous The digest strings already include the algorithm with the hash (i.e. "sha256:d03..."), allowing some flexibility. Likely, there will be some need to support multiple digests.
Here are some questions I've been pondering along these lines:
1. How would having multiple hashes work as a content addressable identifier?
2. What becomes the "canonical" identifier?
3. Should all content be hashed under multiple hashes?
4. Do we maintain an index for each algorithm for every object hash to every other algorithm?
Good answers here would help support this suggestion.
> How would having multiple hashes work as a content addressable identifier?
> What becomes the "canonical" identifier?

Here's one solution: use an array; the first element is always treated as an opaque canonical identifier as well as a hash.

> Should all content be hashed under multiple hashes?

One algorithm should be chosen for now (SHA-512 or maybe SHA-384). If/when transitioning to another algorithm makes sense, then all content should be hashed with both the new algorithm and the old one until there are no legacy consumers left.

> Do we maintain an index for each algorithm for every object hash to every other algorithm?

Not sure what this question is about; have I answered it above?
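A small sketch of that array idea, with hypothetical names: the first element is the opaque canonical identifier, and any later elements are alternate digests kept around during a transition.

```go
package canonical

// Digests is an ordered list of algorithm-labelled digest strings,
// e.g. ["sha384:...", "sha256:..."]. (Hypothetical type.)
type Digests []string

// Canonical returns the identifier used for content addressing.
func (d Digests) Canonical() (string, bool) {
	if len(d) == 0 {
		return "", false
	}
	return d[0], true
}

// Matches reports whether a reference by any listed algorithm points at
// this content, so consumers still using an old algorithm keep working.
func (d Digests) Matches(ref string) bool {
	for _, dg := range d {
		if dg == ref {
			return true
		}
	}
	return false
}
```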
@stevvooe I am wondering if supporting multiple hash identifiers could be left up to the clients to specify multiple algorithms when storing content. If there is no reference to the new algorithm, then there is no requirement to automatically generate the new identifier. It can simply be done as an optimization to preemptively add an index entry to keep clients from pushing blobs which may already exist with a different hash. Essentially if no one tries to store content with the new hash identifier, it is reasonable to assume no one has a reference to fetch by it.
> Do we maintain an index for each algorithm for every object hash to every other algorithm?
>
> Not sure what this question is about; have I answered it above?

Apologies for the word soup. The question is: if we have hash functions `h0`, `h1` and `h2`, do we need to maintain indexes mapping `hi(A)` to `hj(A)`, where `i` and `j` can each be 0, 1, or 2?

@dmcgowan This optimization sounds reasonable. I think this approach might be bolstered by the concept of a "canonical" digest. For example, if content is uploaded with an older hash (say sha256, instead of sha384), the content would be verified against sha256 but also stored as sha384.

Are we opening the design up to collision attacks by linking a more secure digest to a less secure digest?
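For illustration, a sketch of that upload path under the assumptions above: the blob is hashed with both algorithms, verified against whichever digest the client referenced, stored under the newer (canonical) digest, and the older digest is linked to it. The names and the in-memory index are hypothetical.

```go
package aliasindex

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
)

// Index maps any known digest (old or new algorithm) to the canonical one.
type Index map[string]string

// Put hashes the blob with both algorithms, checks the client-supplied
// reference, and records the sha256 digest as an alias of the canonical
// sha384 digest. Note the collision concern above: the alias link is only
// as strong as the weaker algorithm used in the reference.
func (idx Index) Put(content []byte, refDigest string) (canonical string, ok bool) {
	s256 := sha256.Sum256(content)
	s384 := sha512.Sum384(content)
	old := "sha256:" + hex.EncodeToString(s256[:])
	canonical = "sha384:" + hex.EncodeToString(s384[:])

	if refDigest != old && refDigest != canonical {
		return "", false // content does not match the digest it was referenced by
	}
	idx[old] = canonical
	idx[canonical] = canonical
	return canonical, true
}
```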
@stevvooe Why do we have […]? Also, the plural use of that word ([…]).

(apologies if this is a silly question); Are the […]?

@thaJeztah This specification makes no recommendation for how the […]. Think of this type as a flexible primitive. @harche I'll update the comment above accordingly.

@stevvooe Thanks for the clarification, I was under the assumption that […].
@stevvooe I've dug a little deeper into the label usage; maybe we could use this quite well for the different ARM types and all the other CPU architectures.
Example 2: ARMv8 (64-bit) machine
These are all the detailed infos a Docker engine would need to push or pull the appropriate image, but they should be mandatory (the labels […]). All the values could be easily and automatically gathered. In this case the logic to select the right image architecture only needs to be implemented in the Docker engine for a pull or a push; even the compatibility logic will only be needed in the engine. So we could keep the Docker registry itself very clean, just storing the platform labels and using those labels to fetch the images which match the requested platform where the Docker engine is running.

Here is an example of how to use […]. I also ran some recent tests on different devices (Mac, Linux and even Android) with a small basic implementation in Go: https://github.com/DieterReuter/osarch#results-on-different-devices

As I already said, that's just a rough idea of how to determine the platform labels in an easy and elegant way. And if anyone needs more specific labels, like a byte order or ABI, they are free to just add new additional labels in their container runtime. There's no need to change the behaviour of the Docker registry.
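A rough sketch (not the linked osarch code) of gathering such platform labels from the Go runtime; the label names `os` and `architecture` are assumptions based on this discussion.

```go
package platformlabels

import "runtime"

// Labels returns the key/value pairs an engine could attach when pushing an
// image, so the registry can match a pull request to the right platform.
func Labels() map[string]string {
	return map[string]string{
		"os":           runtime.GOOS,   // e.g. "linux", "darwin"
		"architecture": runtime.GOARCH, // e.g. "amd64", "arm", "arm64"
		// A finer-grained CPU variant (e.g. ARMv7 vs ARMv8) is not exposed by
		// the runtime package and would need extra detection (GOARM at build
		// time, /proc/cpuinfo, etc.); that is out of scope for this sketch.
	}
}
```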
This is a follow-on to PR distribution#62, and it borrows much of the format from distribution#993, but uses specific formats for the image manifest and manifest list (fat manifest) instead of a combined generic format. The intent of this proposed manifest format is to allow multi-arch, and allow for full content-addressability of images in the Docker engine. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
A proposal for generic content manifests that can be used for
any type of application that can be represented as a JSON config
and a collection of blobs of data, or may be composed of other
such applications.
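For readers skimming the thread, a hedged sketch of what such a generic manifest could look like as Go types. Only the schema version, the repository name, and per-object digests appear in the excerpts above; the remaining field names are assumptions, not the final specification.

```go
package manifestsketch

// Descriptor references one piece of content by digest. (Hypothetical type.)
type Descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
}

// Manifest is an illustrative shape for the proposed generic content
// manifest: a JSON config plus a collection of blobs, which may themselves
// be other manifests.
type Manifest struct {
	SchemaVersion int          `json:"schemaVersion"` // this proposal describes version 1
	Name          string       `json:"name"`          // the repository name
	Config        Descriptor   `json:"config"`        // the application's JSON config
	Dependencies  []Descriptor `json:"dependencies"`  // blobs or nested manifests
}
```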