docs/spec: Proposal for new manifest format #1068

aaronlehmann · 2015-10-05T21:40:13Z

This is a follow-on to PR #62, and it borrows much of the format
from #993, but uses specific formats for the image manifest and manifest
list (fat manifest) instead of a combined generic format.

The intent of this proposed manifest format is to allow multi-arch, and
allow for full content-addressability of images in the Docker engine.

Signed-off-by: Aaron Lehmann aaron.lehmann@docker.com

aaronlehmann · 2015-10-05T21:46:36Z

cc @stevvooe @estesp @tonistiigi @mattmoor

mattmoor · 2015-10-05T22:25:35Z

docs/spec/manifest-v2-2.md

+- **`layers`** *array*
+
+    The layer list is ordered starting from the base image (opposite order of schema1).
+    The `baseLayer` label is intended to improve human comprehensibility of the layer


Perhaps it would be better if the specification left room for this kind of unstructured annotation without calling them out? This way, different implementations can add/remove information intended for humans / UI without a schema change?

For examples, in the future we might want to know the "top" layer, or per-layer what it's Dockerfile directive was, or...

Maybe this should be "comment": "base layer"?

I'm fine with that, or we can avoid calling it out in the spec. The intention was to clarify that the label in the example doesn't have any semantic meaning to the engine.

dmcgowan · 2015-10-07T20:39:04Z

We should add a compatibility section at the end which will suggest how we will support clients.

The original suggestion is that clients which support this would provide an "Accept" header and the registry will now set the "Content-Type". When either of these headers are missing the assumption made by the other endpoint would be that schema version 1 is used.

waghpooja · 2015-10-07T20:51:19Z

docs/spec/manifest-v2-2.md

+
+- **`layers`** *array*
+
+    The layer list is ordered starting from the base image (opposite order of schema1).


What is the motivation behind reversing this order from the one in schema1?

The ordering in schema1 was awkward because engines ended up having to process it in reverse. The base layer needs to be added to the graph first, but this was at the end of the list in schema1.

estesp · 2015-10-13T20:14:11Z

I like this as a cleaner alternate to #993. I assume we still have some questions to answer based on the questions in-line above from @mattmoor and others, but given we can find answers to those, I'm LGTM on this spec design for supporting multi-arch.

bowlofeggs · 2015-10-13T20:38:42Z

Will this format allow there to be a canonical format for manifests so that there is exactly one way to calculate the digest of a manifest? @TomasTomecek has been working on writing a Python library to calculate the digest of a manifest which has been tricky due to manifests not having a "correct" way to be formatted (things like spacing, ordering of the keys, how to remove signatures, etc.)

The signature is particularly troublesome, since it is included inside the manifest. Most crypto signing libraries transmit the signature outside of the data that is being signed instead of inside it (think of PGP). I think that design would really make digest validation much easier. Another option is to consider a signed manifest and the same manifest with different or no signatures not to be the "same" so they get different digests.

mattmoor · 2015-10-13T21:18:41Z

@rbarlow Very big +1, this drove us nuts because there is also this, which would make things very straightforward.

ncdc · 2015-10-14T13:42:53Z

docs/spec/manifest-v2-2.md

+- **`schemaVersion`** *int*
+
+  This field specifies the image manifest schema version as an integer. A
+  manifest list uses version `3` to distinguish it from an image manifest.


This seems a bit strange to me. Doesn't this imply that if you want to rev the regular manifest version, the next available value would be 4?

I brought this up in the DWG meeting last week as well.

I think that if these objects are going to have perpetually distinct schema versions, we should split them into distinct API objects (with their own schema number to boot).

I've been trying to articulate this for a couple meetings now, probably badly. I will see if I can put together a proposal that better articulates it. I think where we left things last week was to wait for Steve to weigh in this week.

stevvooe · 2015-10-14T20:26:07Z

@rbarlow The content used to calculate the digest of a manifest is set by the original generator of the raw bytes. This avoids the need for specifications like Canonical JSON that purport to bring stable generation when they really don't. There are two many cross-language variations JSON generation to always get this right. This is handled simply by saving the raw bytes of the serialized manifest. This is also flexible in that it doesn't prevent implementations from generating deterministic manifests but doesn't require them where they are not needed.

bowlofeggs · 2015-10-14T22:27:05Z

@stevvooe The problem is that signatures get inserted into the manifest, changing it. If a client needs to calculate the digest of a retrieved manifest that has signatures, that client must carefully remove the signatures and try to guess how the manifest was formatted when its digest was calculated. Since there is no canonical format for the manifest, there isn't a guarantee that doing this is possible.

If the signatures were instead stored separately of the manifest, there would be no need for any client to perform any manipulation at all on the manifest to calculate the digest. This is how PGP emails work, for example. The data being signed and the signature are separate, rather than having the signature inside the data that is being signed. This makes signature verification straightforward. The Docker manifest's inclusion of the signatures within the manifest itself make it difficult to calculate the digest of the manifest and also make signature verification difficult.

I am proposing that the digest and the signatures be separated from the manifest so that a client can calculate a plain old checksum on exactly the digest bits that were retrieved without having to do any manipulation at all. For example, something like this would be viable:

{'key1': 'value1', 'key2': 'value2', etc...}
------- BEGIN DOCKER SIGNATURES -------
[SIGNATURE_1, SIGNATURE_2, etc.]
------- END DOCKER SIGNATURES -------

In the example above, the client would strip the signature block and run the digest algorithm on the exact bits that are left over (the JSON). This way the client doesn't have to manipulate the JSON at all. If the original manifest used spaces and newlines or did it all in one line, the client calculates the checksum the same way - on whatever bits were received. This makes digest validation so much simpler, and it also aids in signature verification which suffers from the same problem.

Does that make sense?

aaronlehmann · 2015-10-14T22:29:56Z

@rbarlow: This revision of the manifest won't include in-line signatures. Signatures will be handled outside the manifest by signing the manifest's digest, for example with Docker Content Trust.

This will be much less confusing, and I think it resolves the concern about canonical vs. non-canonical JSON.

bowlofeggs · 2015-10-14T22:39:35Z

@aaronlehmann That is excellent news! Any hints on what sort of time frame we can expect this change to be included in Docker? I and some others are working on integrating Pulp with Docker's v2 API, and this digest calculation is currently an issue for us. Knowing a scale of time might help inform our decision about what we should do in the meantime.

aaronlehmann · 2015-10-14T22:45:51Z

@rbarlow: Support for this new manifest format is planned for Docker 1.10, which will ship in a few months. I'm not sure what your compatibility plans are, but one thing I'd keep in mind is that you would need to handle the old manifest format if you want to interoperate with engines older than Docker 1.10, or registries that haven't been updated to support the new format.

I'm going to add some details on backward compatibility when I update this proposal.

aaronlehmann · 2015-10-14T23:20:06Z

I've updated the proposal with the following changes:

s/mediaType/mediatype/
schemaVersion 2 used throughout, instead of a combination of 2 and 3
The label field has been replaced by a more specific way of specifying a platform. The platform key maps to an object containing fields for the architecture, OS, CPU variant, and required CPU features.
Added a backward compatibility section, explaining how the registry will support versions of the engine which don't support the new manifest format.

PTAL

aaronlehmann · 2015-12-16T17:42:41Z

I was only thinking that the uploads might be ordered in a way where the dependencies can't be checked immediately. This probably won't be a problem in practice. BTW, the code that's in progress for the Docker registry validates dependencies referenced by the manifest, with an exception for the proxy mode.

harche · 2015-12-17T15:10:57Z

@aaronlehmann @stevvooe @estesp Should I create a separate issue to discuss how the CLI going to look like for fat manifests?

stevvooe · 2015-12-17T22:50:03Z

@harche Yes. This would probably make the most sense. We can have the proposal stage in the distribution project for now and dispatch to docker/docker depending on where the actual feature will land.

aaronlehmann · 2015-12-18T01:11:17Z

I've updated this proposal to make a few minor changes:

Include the content type in-band in both image manifests and manifest lists. This makes it easy to distinguish between image manifests and manifests lists in case there's no content type available out-of-band.
Changed mediatype to mediaType, because this is the way the distribution codebase already serializes Descriptor objects, and I don't want to deviate from that. If people prefer to change this serialization (and it's safe to do so), I can change it back.

I feel like this is ready to be merged. PTAL.

I have working code that implements the formats in this proposal, which I'll open as a PR against docker/distribution once #1268 is merged.

estesp · 2015-12-18T04:17:02Z

Thanks @aaronlehmann ; looks good and glad to see the other PRs showing up with implementation. I'm "non-maintainer LGTM" on this one! :)

RichardScothern · 2015-12-18T05:49:12Z

LGTM

On Thu, Dec 17, 2015 at 8:17 PM, Phil Estes notifications@github.com
wrote:

Thanks @aaronlehmann https://github.com/aaronlehmann ; looks good and
glad to see the other PRs showing up with implementation. I'm
"non-maintainer LGTM" on this one! :)

—
Reply to this email directly or view it on GitHub
#1068 (comment).

stevvooe · 2015-12-18T19:30:29Z

Include the content type in-band in both image manifests and manifest lists. This makes it easy to distinguish between image manifests and manifests lists in case there's no content type available out-of-band.

Content-type should always be in-band, from the trusted source. Content-type returned by registries (out-of-band) should not be trusted. We might want to say this explicitly, as this is important for security.

Changed mediatype to mediaType, because this is the way the distribution codebase already serializes Descriptor objects, and I don't want to deviate from that. If people prefer to change this serialization (and it's safe to do so), I can change it back.

Please don't. Leave as "mediatype". The fact that descriptor has camelCase is unintentional. In general, keys should be case-insensitive.

aaronlehmann · 2015-12-18T19:32:45Z

Please don't. Leave as "mediatype". The fact that descriptor has camelCase is unintentional. In general, keys should be case-insensitive.

Would you like me to change it to mediatype in descriptor as well? My implementation of the schema2 manifest uses distribution.Descriptor directly, and I would like to avoid forking it.

stevvooe · 2015-12-18T19:45:24Z

Would you like me to change it to mediatype in descriptor as well? My implementation of the schema2 manifest uses distribution.Descriptor directly, and I would like to avoid forking it

@aaronlehmann I don't think we use that anywhere, so go ahead.

aaronlehmann · 2015-12-18T19:54:44Z

Reverted the mediatype change; will change the Descriptor serialization to match in #1281.

aaronlehmann · 2015-12-18T19:57:45Z

@stevvooe: Changing the Descriptor serialization would change affect the notification serialization format. Are we confident enough that implementation will be case-insensitive to change the casing there?

stevvooe · 2015-12-18T20:02:19Z

@aaronlehmann I guess we can leave it alone and change to mediaType in this specification. We have no guarantee that consumers wrote appropriately agile code to handle case-insensitivity. Sorry for the confusion.

aaronlehmann · 2015-12-18T20:05:52Z

Changed back. No worries.

This is a follow-on to PR distribution#62, and it borrows much of the format from distribution#993, but uses specific formats for the image manifest and manifest list (fat manifest) instead of a combined generic format. The intent of this proposed manifest format is to allow multi-arch, and allow for full content-addressability of images in the Docker engine. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>

dmp42 · 2015-12-19T00:24:57Z

LGTM

stevvooe · 2015-12-19T00:25:00Z

LGTM

docs/spec: Proposal for new manifest format

Updates broken links for distribution#1068

GordonTheTurtle added the status/1-design-review label Oct 5, 2015

mattmoor reviewed Oct 5, 2015
View reviewed changes

RichardScothern mentioned this pull request Oct 6, 2015

Distribution does not conform to canonical format #1066

Closed

3 tasks

waghpooja reviewed Oct 7, 2015
View reviewed changes

estesp mentioned this pull request Oct 13, 2015

docs/spec: Proposal for generic content distribution manifest #993

Closed

ncdc reviewed Oct 14, 2015
View reviewed changes

aaronlehmann force-pushed the manifest-proposal branch from dae2a31 to b15e4e1 Compare October 14, 2015 23:17

aaronlehmann force-pushed the manifest-proposal branch 5 times, most recently from f2bc72f to c68abe3 Compare October 16, 2015 16:57

aaronlehmann force-pushed the manifest-proposal branch from c68abe3 to e6e2e2e Compare December 18, 2015 01:08

aaronlehmann mentioned this pull request Dec 18, 2015

Implement schema2 manifest formats #1281

Merged

aaronlehmann force-pushed the manifest-proposal branch from e6e2e2e to cf8135a Compare December 18, 2015 19:54

aaronlehmann force-pushed the manifest-proposal branch from cf8135a to e8d6dbd Compare December 18, 2015 20:05

aaronlehmann force-pushed the manifest-proposal branch from e8d6dbd to 2e3f493 Compare December 18, 2015 20:26

stevvooe added a commit that referenced this pull request Dec 19, 2015

Merge pull request #1068 from aaronlehmann/manifest-proposal

ed3346e

docs/spec: Proposal for new manifest format

stevvooe merged commit ed3346e into distribution:master Dec 19, 2015

thaJeztah pushed a commit to thaJeztah/distribution that referenced this pull request Apr 22, 2021

Updates broken links for distribution#1068

f986183

thaJeztah pushed a commit to thaJeztah/distribution that referenced this pull request Apr 22, 2021

Merge pull request distribution#1110 from joaofnfernandes/broken-links

011e7ff

Updates broken links for distribution#1068

thaJeztah mentioned this pull request Nov 19, 2021

Remove support for pulling v2 schema1 moby/moby#42300

Draft

thaJeztah pushed a commit to thaJeztah/distribution that referenced this pull request Jan 19, 2022

Updates broken links for distribution#1068

042cf0f

thaJeztah pushed a commit to thaJeztah/distribution that referenced this pull request Jan 19, 2022

Merge pull request distribution#1110 from joaofnfernandes/broken-links

6bc4951

Updates broken links for distribution#1068

thaJeztah mentioned this pull request Nov 25, 2022

manifest/schema1: mark docker manifest v2, schema 1 deprecated #3804

Merged

dylanrhysscott pushed a commit to digitalocean/docker-distribution that referenced this pull request Jan 5, 2023

Updates broken links for distribution#1068

ef7a75c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs/spec: Proposal for new manifest format #1068

docs/spec: Proposal for new manifest format #1068

aaronlehmann commented Oct 5, 2015

aaronlehmann commented Oct 5, 2015

mattmoor Oct 5, 2015

aaronlehmann Oct 6, 2015

dmcgowan commented Oct 7, 2015

waghpooja Oct 7, 2015

aaronlehmann Oct 7, 2015

estesp commented Oct 13, 2015

bowlofeggs commented Oct 13, 2015

mattmoor commented Oct 13, 2015

ncdc Oct 14, 2015

mattmoor Oct 14, 2015

stevvooe commented Oct 14, 2015

bowlofeggs commented Oct 14, 2015

aaronlehmann commented Oct 14, 2015

bowlofeggs commented Oct 14, 2015

aaronlehmann commented Oct 14, 2015

aaronlehmann commented Oct 14, 2015

aaronlehmann commented Dec 16, 2015

harche commented Dec 17, 2015

stevvooe commented Dec 17, 2015

aaronlehmann commented Dec 18, 2015

estesp commented Dec 18, 2015

RichardScothern commented Dec 18, 2015

stevvooe commented Dec 18, 2015

aaronlehmann commented Dec 18, 2015

stevvooe commented Dec 18, 2015

aaronlehmann commented Dec 18, 2015

aaronlehmann commented Dec 18, 2015

stevvooe commented Dec 18, 2015

aaronlehmann commented Dec 18, 2015

dmp42 commented Dec 19, 2015

stevvooe commented Dec 19, 2015


		- `layers` array

		The layer list is ordered starting from the base image (opposite order of schema1).

docs/spec: Proposal for new manifest format #1068

docs/spec: Proposal for new manifest format #1068

Conversation

aaronlehmann commented Oct 5, 2015

aaronlehmann commented Oct 5, 2015

mattmoor Oct 5, 2015

Choose a reason for hiding this comment

aaronlehmann Oct 6, 2015

Choose a reason for hiding this comment

dmcgowan commented Oct 7, 2015

waghpooja Oct 7, 2015

Choose a reason for hiding this comment

aaronlehmann Oct 7, 2015

Choose a reason for hiding this comment

estesp commented Oct 13, 2015

bowlofeggs commented Oct 13, 2015

mattmoor commented Oct 13, 2015

ncdc Oct 14, 2015

Choose a reason for hiding this comment

mattmoor Oct 14, 2015

Choose a reason for hiding this comment

stevvooe commented Oct 14, 2015

bowlofeggs commented Oct 14, 2015

aaronlehmann commented Oct 14, 2015

bowlofeggs commented Oct 14, 2015

aaronlehmann commented Oct 14, 2015

aaronlehmann commented Oct 14, 2015

aaronlehmann commented Dec 16, 2015

harche commented Dec 17, 2015

stevvooe commented Dec 17, 2015

aaronlehmann commented Dec 18, 2015

estesp commented Dec 18, 2015

RichardScothern commented Dec 18, 2015

stevvooe commented Dec 18, 2015

aaronlehmann commented Dec 18, 2015

stevvooe commented Dec 18, 2015

aaronlehmann commented Dec 18, 2015

aaronlehmann commented Dec 18, 2015

stevvooe commented Dec 18, 2015

aaronlehmann commented Dec 18, 2015

dmp42 commented Dec 19, 2015

stevvooe commented Dec 19, 2015