Distribution does not conform to canonical format #1066
This is unfortunately correct. I do not know the history of this, but manifest signatures are tied to a non-canonical JSON format. The manifest specification is being replaced (see here); perhaps canonical JSON can be an aim for that.
In the meantime, we should probably remove the erroneous documentation. @dcmgowan , @stevvooe thoughts?
The manifest specification does not conform to canonical JSON, so when we either 1) change the documentation or 2) have a manifest spec that does produce canonical JSON, I would be happy to close this issue.
I'm not sure I understand what the issue is here. I don't see anything in https://github.com/docker/distribution/blob/master/docs/spec/json.md prohibiting Unicode characters. It says
The issue is that the document specifies a canonical JSON format which distribution does not conform to (points 3 and 4).
The current manifest format
Discussed offline with Derek. Closing.
Quoting the document:
Should I understand this to mean that you have created a specification which is in fact not being used? I am going through the manifest spec and it doesn't say anything about encoding, whitespace, or key order. On the other hand, it mentions something interesting:
When can we expect a proper, not provisional, manifest? To give you some background, I am working on code (in Python) which should output the digest of a provided manifest. In order to do so, I need to understand how distribution calculates the manifest digest. I was quite surprised that the digest is computed from an indented, UTF-8-encoded manifest (when the RFC suggests using Unicode).
Note the JSON RFC says:
@aaronlehmann does this mean that there is not a canonical format for manifests at all, or simply that the registry does not enforce a format?
@rbarlow There is no canonical format for manifests.
Please make sure to read the entire sentence and any included qualifiers. In the case of the manifest, it has a canonical ordering that differs from sorted. In general, you should avoid "round tripping" manifests. De-serialize the contents, but only re-serialize if the data fields have changed. Effectively, the byte contents should only be generated once. If this care is not taken, generating digests becomes inconsistent and error prone.
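To illustrate why round tripping is risky, here is a minimal Python sketch. The manifest bytes and the `digest` helper are hypothetical, and it assumes the digest is a SHA-256 hash over the exact bytes; it is not taken from the distribution codebase.

```python
import hashlib
import json


def digest(raw: bytes) -> str:
    """Hash the exact bytes given (assumed form: sha256:<hex>)."""
    return "sha256:" + hashlib.sha256(raw).hexdigest()


# Hypothetical manifest bytes as some client produced them:
# indented, with keys in a non-sorted order.
original = b'{\n   "name": "example",\n   "architecture": "amd64"\n}'

# Round trip: parse and serialize again without changing any field.
round_tripped = json.dumps(json.loads(original)).encode("utf-8")

# The data is identical, but whitespace differs, so the bytes differ.
same_data = json.loads(original) == json.loads(round_tripped)
same_bytes = original == round_tripped

print(same_data, same_bytes)
print(digest(original))
print(digest(round_tripped))
```

The two printed digests differ even though no field changed, which is the inconsistency the comment above warns about.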
How do you compute the digest then?
I read this as: "Manifests and digests are very fragile and you should NOT play with those." It would be nice if computing the digest were made easier, or if there were a tool for such a job (by tool I mean a binary/executable which provides output for a given input; not a service).
Yes, but isn't the code which generates the manifest and digest part of the distribution codebase?
I know what I'm thinking. Here's a snip from dockerfile:
Here's how it looks in manifest:
If you also look at json.org, grammar for strings mentions unicode characters. Therefore what I would expect is this:
Well, actually this:
(I still don't understand why you escape
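The difference between escaped and UTF-8-encoded characters can be reproduced with Python's standard `json` module. This is only a sketch with a made-up value; it does not show what distribution itself emits.

```python
import hashlib
import json

# Hypothetical field value containing a non-ASCII character.
value = {"label": "café"}

# Default ensure_ascii=True writes the character as a \u00e9 escape.
escaped = json.dumps(value).encode("utf-8")

# ensure_ascii=False keeps the character and encodes it as UTF-8 bytes.
utf8 = json.dumps(value, ensure_ascii=False).encode("utf-8")

# Both parse to the same value, yet the serialized bytes are different,
# so any digest computed over the bytes is different as well.
same_value = json.loads(escaped) == json.loads(utf8)
digest_escaped = hashlib.sha256(escaped).hexdigest()
digest_utf8 = hashlib.sha256(utf8).hexdigest()

print(escaped)
print(utf8)
print(same_value, digest_escaped == digest_utf8)
```

Both forms are valid JSON for the same string, so a digest can only be reproduced by knowing which form the producer actually wrote.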
I'm sorry if I wasn't completely clear. You are astute in acknowledging that stable hash generation of non-deterministic formats (i.e. JSON) is typically fragile. However, this does not mean that you cannot "play" with the contents. It just means that one should preserve the bytes and only regenerate them on a change. This is how the registry approaches this. It deserializes the content, saves the raw bytes, and reads the appropriate fields. When the actual data is stored, it just uses the raw bytes, so the client's hash and the registry's digest will always match. This is both secure and preserves the stability of the hash without relying on differing libraries agreeing on deterministic generation.
We would be more than happy to accept a contribution for a tool that accomplishes this. The packages available in the distribution project are more than capable of making this straightforward. Please let us know if you need more guidance.
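The approach described above (keep the raw bytes, parse only to read fields, regenerate only on a change) could be sketched in Python roughly as follows. `StoredManifest`, `with_changes`, and the choice of indentation are hypothetical names and assumptions for illustration, not the registry's actual implementation, and the sketch ignores signatures entirely.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class StoredManifest:
    raw: bytes              # exact bytes as received; never regenerated
    fields: Dict[str, Any]  # parsed view, used only for reading values

    @classmethod
    def from_bytes(cls, raw: bytes) -> "StoredManifest":
        return cls(raw=raw, fields=json.loads(raw))

    @property
    def digest(self) -> str:
        # Assumption: the digest is SHA-256 over the raw bytes.
        return "sha256:" + hashlib.sha256(self.raw).hexdigest()

    def with_changes(self, **changes: Any) -> "StoredManifest":
        # No field actually changes: keep the original bytes and digest.
        if all(k in self.fields and self.fields[k] == v
               for k, v in changes.items()):
            return self
        # A field changed: serialize exactly once to get the new bytes.
        updated = {**self.fields, **changes}
        raw = json.dumps(updated, indent=3).encode("utf-8")
        return StoredManifest(raw=raw, fields=updated)
```

A caller would build the object once with `StoredManifest.from_bytes(body)`, read what it needs from `fields`, and store or hash `raw` unchanged, so the digest stays tied to the bytes the client sent.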
The question is: what would be a suitable place for such a tool? The first thing which comes to my mind is docker engine (that's why I submitted this issue back then) since:
We already have a set of tools which accomplish something similar:
So my question is: do you want such a tool to be a part of the docker-maintained codebase, or is it supposed to be an external thing?
@TomasTomecek I'm very confused by your request. If you need to calculate digests for a manifest, just implement it in Python. Just make sure to follow the above advice (make sure to spend some time carefully reading my responses). If you need a tool to support that, go ahead and implement it in Go. If you want to submit it back to a Docker project, we would welcome it.
My last response had only a single message:
Is there a chance that you will merge it?
@TomasTomecek Yes, if it is well done and fits a need. My confusion comes from the need for a CLI tool. Generating manifests and calculating their digest should be trivial in Python.
@stevvooe here it is: moby/moby#17402
Right now when I fetch the manifest via curl:
it contains, at the same time, unicode-specified characters and utf8-encoded characters
https://docs.docker.com/v1.6/registry/spec/json/
https://github.com/docker/distribution/blob/master/docs/spec/json.md
Edited: updated spec link, struck obsolete text