New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs/spec: Proposal for new manifest format #1068
Conversation
- **`layers`** *array* | ||
|
||
The layer list is ordered starting from the base image (opposite order of schema1). | ||
The `baseLayer` label is intended to improve human comprehensibility of the layer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it would be better if the specification left room for this kind of unstructured annotation without calling them out? This way, different implementations can add/remove information intended for humans / UI without a schema change?
For examples, in the future we might want to know the "top" layer, or per-layer what it's Dockerfile directive was, or...
Maybe this should be "comment": "base layer"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with that, or we can avoid calling it out in the spec. The intention was to clarify that the label in the example doesn't have any semantic meaning to the engine.
We should add a compatibility section at the end which will suggest how we will support clients. The original suggestion is that clients which support this would provide an "Accept" header and the registry will now set the "Content-Type". When either of these headers are missing the assumption made by the other endpoint would be that schema version 1 is used. |
|
||
- **`layers`** *array* | ||
|
||
The layer list is ordered starting from the base image (opposite order of schema1). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the motivation behind reversing this order from the one in schema1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ordering in schema1 was awkward because engines ended up having to process it in reverse. The base layer needs to be added to the graph first, but this was at the end of the list in schema1.
Will this format allow there to be a canonical format for manifests so that there is exactly one way to calculate the digest of a manifest? @TomasTomecek has been working on writing a Python library to calculate the digest of a manifest which has been tricky due to manifests not having a "correct" way to be formatted (things like spacing, ordering of the keys, how to remove signatures, etc.) The signature is particularly troublesome, since it is included inside the manifest. Most crypto signing libraries transmit the signature outside of the data that is being signed instead of inside it (think of PGP). I think that design would really make digest validation much easier. Another option is to consider a signed manifest and the same manifest with different or no signatures not to be the "same" so they get different digests. |
- **`schemaVersion`** *int* | ||
|
||
This field specifies the image manifest schema version as an integer. A | ||
manifest list uses version `3` to distinguish it from an image manifest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit strange to me. Doesn't this imply that if you want to rev the regular manifest version, the next available value would be 4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I brought this up in the DWG meeting last week as well.
I think that if these objects are going to have perpetually distinct schema versions, we should split them into distinct API objects (with their own schema number to boot).
I've been trying to articulate this for a couple meetings now, probably badly. I will see if I can put together a proposal that better articulates it. I think where we left things last week was to wait for Steve to weigh in this week.
@rbarlow The content used to calculate the digest of a manifest is set by the original generator of the raw bytes. This avoids the need for specifications like Canonical JSON that purport to bring stable generation when they really don't. There are two many cross-language variations JSON generation to always get this right. This is handled simply by saving the raw bytes of the serialized manifest. This is also flexible in that it doesn't prevent implementations from generating deterministic manifests but doesn't require them where they are not needed. |
@stevvooe The problem is that signatures get inserted into the manifest, changing it. If a client needs to calculate the digest of a retrieved manifest that has signatures, that client must carefully remove the signatures and try to guess how the manifest was formatted when its digest was calculated. Since there is no canonical format for the manifest, there isn't a guarantee that doing this is possible. If the signatures were instead stored separately of the manifest, there would be no need for any client to perform any manipulation at all on the manifest to calculate the digest. This is how PGP emails work, for example. The data being signed and the signature are separate, rather than having the signature inside the data that is being signed. This makes signature verification straightforward. The Docker manifest's inclusion of the signatures within the manifest itself make it difficult to calculate the digest of the manifest and also make signature verification difficult. I am proposing that the digest and the signatures be separated from the manifest so that a client can calculate a plain old checksum on exactly the digest bits that were retrieved without having to do any manipulation at all. For example, something like this would be viable:
In the example above, the client would strip the signature block and run the digest algorithm on the exact bits that are left over (the JSON). This way the client doesn't have to manipulate the JSON at all. If the original manifest used spaces and newlines or did it all in one line, the client calculates the checksum the same way - on whatever bits were received. This makes digest validation so much simpler, and it also aids in signature verification which suffers from the same problem. Does that make sense? |
@rbarlow: This revision of the manifest won't include in-line signatures. Signatures will be handled outside the manifest by signing the manifest's digest, for example with Docker Content Trust. This will be much less confusing, and I think it resolves the concern about canonical vs. non-canonical JSON. |
@aaronlehmann That is excellent news! Any hints on what sort of time frame we can expect this change to be included in Docker? I and some others are working on integrating Pulp with Docker's v2 API, and this digest calculation is currently an issue for us. Knowing a scale of time might help inform our decision about what we should do in the meantime. |
@rbarlow: Support for this new manifest format is planned for Docker 1.10, which will ship in a few months. I'm not sure what your compatibility plans are, but one thing I'd keep in mind is that you would need to handle the old manifest format if you want to interoperate with engines older than Docker 1.10, or registries that haven't been updated to support the new format. I'm going to add some details on backward compatibility when I update this proposal. |
dae2a31
to
b15e4e1
Compare
I've updated the proposal with the following changes:
PTAL |
f2bc72f
to
c68abe3
Compare
I was only thinking that the uploads might be ordered in a way where the dependencies can't be checked immediately. This probably won't be a problem in practice. BTW, the code that's in progress for the Docker registry validates dependencies referenced by the manifest, with an exception for the proxy mode. |
@aaronlehmann @stevvooe @estesp Should I create a separate issue to discuss how the CLI going to look like for fat manifests? |
@harche Yes. This would probably make the most sense. We can have the proposal stage in the distribution project for now and dispatch to docker/docker depending on where the actual feature will land. |
c68abe3
to
e6e2e2e
Compare
I've updated this proposal to make a few minor changes:
I feel like this is ready to be merged. PTAL. I have working code that implements the formats in this proposal, which I'll open as a PR against |
Thanks @aaronlehmann ; looks good and glad to see the other PRs showing up with implementation. I'm "non-maintainer LGTM" on this one! :) |
LGTM On Thu, Dec 17, 2015 at 8:17 PM, Phil Estes notifications@github.com
|
Content-type should always be in-band, from the trusted source. Content-type returned by registries (out-of-band) should not be trusted. We might want to say this explicitly, as this is important for security.
Please don't. Leave as "mediatype". The fact that descriptor has camelCase is unintentional. In general, keys should be case-insensitive. |
Would you like me to change it to |
@aaronlehmann I don't think we use that anywhere, so go ahead. |
e6e2e2e
to
cf8135a
Compare
Reverted the |
@stevvooe: Changing the |
@aaronlehmann I guess we can leave it alone and change to |
cf8135a
to
e8d6dbd
Compare
Changed back. No worries. |
This is a follow-on to PR distribution#62, and it borrows much of the format from distribution#993, but uses specific formats for the image manifest and manifest list (fat manifest) instead of a combined generic format. The intent of this proposed manifest format is to allow multi-arch, and allow for full content-addressability of images in the Docker engine. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
e8d6dbd
to
2e3f493
Compare
LGTM |
1 similar comment
LGTM |
docs/spec: Proposal for new manifest format
Updates broken links for distribution#1068
Updates broken links for distribution#1068
This is a follow-on to PR #62, and it borrows much of the format
from #993, but uses specific formats for the image manifest and manifest
list (fat manifest) instead of a combined generic format.
The intent of this proposed manifest format is to allow multi-arch, and
allow for full content-addressability of images in the Docker engine.
Signed-off-by: Aaron Lehmann aaron.lehmann@docker.com