Stop allowing invalid manifest lists which include blobs in their list in version 3.0.0 #3452
I think "will" is probably a bit strong here; opencontainers/artifacts#41 is still in discussion, and I don't believe there's an issue/PR in this project for supporting it yet. I believe this should be our intent for a 3.x.x release, though.
I don't think that's a sufficient transition. Just saying v3.0.0 will have OCI artefact support doesn't give all the users (not just buildx, it seems) sufficient time to adapt away from something that has worked for many years. Saying that "there are alternatives" doesn't give users time to write and test code, to upgrade, and so on. It would also be good if OCI could clarify the spec.
This is something I would personally like to push for strongly before any action is taken. This isn't the only "ambiguity" we've had to deal with in the OCI spec. Someone who got involved in this [and other] project(s) after the spec had been discussed doesn't have the context of the original intentions, and frankly, trying to recover it by endless browsing of the [closed] issues and PRs solves nothing. Getting the OCI spec clarified is the best step forward before we break more things for people than we fix.
v3.0.0 is where we should introduce breaking changes. I don't want to discourage adoption with dramatic changes by any means, but @joaodrp has done some investigation on support for this: out of DockerHub, ACR (Azure), GHCR (GitHub), ECR (Amazon), Artifactory, Quay, GCR (Google), and GitLab, only DockerHub, ACR, GHCR, and GitLab allow these cache images to be pushed. So these image indexes are already something without broad support.
@brackendawson, thanks for creating this issue. I want to share my opinion on a few things:
Regarding the buildkit/buildx issue, I've been having a long conversation about that in docker/buildx#173, starting at docker/buildx#173 (comment), so I won't repeat too much of it here. I'd like to know whether the backward-compatible change that I proposed in docker/buildx#173 (comment) is viable, though. I'm approaching this in a constructive way, not pointing fingers or trying to blame anyone. To me, this seems to be the path of least resistance to get instant compatibility with most registries (if not all) while preserving the UX.
With regard to what an OCI Image Index should contain within the `manifests` array, I think the spec is clear enough that these should be manifests — it links to the OCI Image Manifest Specification when it says this, and it doesn't mention other kinds of objects that could be included. The wording is reasonably narrow; it shouldn't be necessary to enumerate everything you shouldn't put there. It doesn't restrict the media types of those manifests, but that does not mean the array is open to any object that can be described by a descriptor.
I voted 👎 since the goal of this implementation was always to be a superset of what is supported by OCI, with stricter behavior gated by configuration. The supported types in OCI use …
The "unknown" vs. "unexpected" distinction is key here. "Unexpected" does not appear in the current OCI Image Index spec. As for "unknown", I believe this relates to the paragraph in the spec saying that implementations should not error on media types they do not recognize.
So registries should not refuse index references with media types that they don't know. However, using the buildkit indexes as an example (that's the only one I know):

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:136482bf81d1fa351b424ebb8c7e34d15f2c5ed3fc0b66b544b8312bda3d52d9",
      "size": 2427917,
      "annotations": {
        "buildkit/createdat": "2021-07-02T17:08:09.095229615Z",
        "containerd.io/uncompressed": "sha256:3461705ddf3646694bec0ac1cc70d5d39631ca2f4317b451c01d0f0fd0580c90"
      }
    },
    ...
    {
      "mediaType": "application/vnd.buildkit.cacheconfig.v0",
      "digest": "sha256:7aa23086ec6b7e0603a4dc72773ee13acf22cdd760613076ea53b1028d7a22d8",
      "size": 1654
    }
  ]
}
```

In this case, the `application/vnd.oci.image.layer.v1.tar+gzip` and `application/vnd.buildkit.cacheconfig.v0` entries are references to blobs, not manifests. So, the fact that we find these references within the `manifests` array does not make them manifests. Regardless, reading the spec, we stop way before any mention of "unknown" media types: on the very first line, where the purpose of an index is described as pointing to image manifests.
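As a minimal sketch of the kind of validation being debated here (the set of "manifest" media types below is an assumption for illustration, based on the OCI image spec and Docker schema 2 types; the real distribution codebase is Go and structured differently), a registry could flag index entries whose media type is not a manifest type:

```python
import json

# Media types that identify manifests. NOTE: this list is an assumption for
# illustration; it covers the OCI image spec and Docker schema 2 types.
MANIFEST_MEDIA_TYPES = {
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
}


def non_manifest_entries(index_json):
    """Return descriptors in an index whose mediaType is not a manifest type."""
    index = json.loads(index_json)
    return [
        d for d in index.get("manifests", [])
        if d.get("mediaType") not in MANIFEST_MEDIA_TYPES
    ]


# A cut-down version of the buildkit cache index above: both entries are
# blob references, so both would be flagged.
buildkit_index = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
         "digest": "sha256:136482bf81d1fa...", "size": 2427917},
        {"mediaType": "application/vnd.buildkit.cacheconfig.v0",
         "digest": "sha256:7aa23086ec6b...", "size": 1654},
    ],
})
bad = non_manifest_entries(buildkit_index)  # both descriptors are flagged
```

Note that this is exactly the check whose strictness is disputed: a permissive reading of the spec would skip unknown entries rather than reject the whole index.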
So if an index references anything besides manifests, it's not compliant. We don't even need to get to the point of looking at media types. In this case (buildkit), this is not only about the OCI spec, though. Before OCI indexes were adopted, Docker manifest lists were being used, and their spec is even stricter about this.
So this was already a non-conformance issue before the OCI media types were adopted.
What is the migration plan for clients with an immediate cutover in 3.0 (assuming artifact-spec gets approved and implemented)? Do clients need to support both and automatically detect which registry they are talking to? What about content that was already pushed to a registry as an OCI index? Will there be a migration plan for content between 2.x and 3.x registries? As a user, I'd like to see a migration plan that doesn't break workflows during the migration. Also, my reading of the image-spec seems to differ from that of many of the distribution maintainers: I feel that OCI index entries with a media type that is not a known manifest type should simply be ignored. There seems to be a desire for the registry to do something with every entry in the index, which runs counter to the image spec.
Criticisms I've heard of my suggestion fall into a few categories.
The knowledge that GC can and should happen on those blobs should discourage many from depending on this for their own "data storage in a registry" solutions. That should help prevent this behavior from sprawling to other registry client projects.
This strikes me as very strange. Allowing these OCI Image Indexes to be pushed with a success status, then (possibly) deleting all of the referenced objects out from under them at some unknown point in the future, is really odd to me. Currently, the garbage collector doesn't delete unknown media types, and I don't believe we should update it to do so.
It's reasonable to expect a manifest list to be a list of manifests, and it's reasonable for us to enforce that moving forward with v3.0.0. We perform validation on manifest pushes, so there's clearly some intent in the design of distribution to reject manifests which are invalid.
We'd need to determine all the breaking changes we're going to make in 3.0.0 before we can really talk about migration plans. Personally, I think we need to deal with the existing manifest lists with layer references, regardless of whether we allow pushes of this nature in the future.
It's the same behavior that already exists today if we upload a blob and don't upload an associated manifest. The blob upload succeeds, but the GC can remove that blob in the future. It's my responsibility as a client to ensure that blob has a valid reference to avoid the GC. To me this is expected behavior. I also hesitate to use GC behavior as a leading reason to decide what to accept on push, since GC doesn't run automatically in this project anyway.
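The blob-lifecycle argument above can be pictured with a deliberately simplified mark-and-sweep model (this is an illustration of the concept only, not how distribution's garbage collector is actually implemented): blobs that no manifest references become sweep candidates, whether they were orphaned uploads or cache blobs smuggled in via an index the registry doesn't treat as referencing them.

```python
def gc_candidates(manifests, all_blobs):
    """Simplified mark-and-sweep model: the mark phase collects every
    digest referenced by some manifest; anything left over is a sweep
    candidate. Real registry GC is more involved than this."""
    referenced = set()
    for manifest in manifests:
        referenced.update(manifest.get("references", []))
    return all_blobs - referenced


# One manifest references two blobs; the third blob is unreferenced,
# so it can be collected, just like an orphaned blob upload.
manifests = [{"references": ["sha256:aaa", "sha256:bbb"]}]
all_blobs = {"sha256:aaa", "sha256:bbb", "sha256:ccc"}
orphans = gc_candidates(manifests, all_blobs)  # {"sha256:ccc"}
```

Under this model, the client's responsibility is exactly as stated above: get a valid manifest reference in place, or accept that the blob may disappear.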
It is, but it's not known to the tooling that is looking for manifests. My concern with enforcing validity of the mediaType field of an index entry is that we may break future use cases in ways the spec describes as a point of extensibility. I'd also like to see this fixed in artifact-spec first, rather than focusing on blocking the existing use cases. Otherwise we are saying "don't do that" without offering a better option.
I don't think this is the case. Let's look at a few examples.

**Helm**

```json
{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:31fb454efb3c69fafe53672598006790122269a1b3b458607dbe106aba7059ef",
      "size": 354,
      "annotations": {
        "org.opencontainers.image.ref.name": "localhost:5000/myrepo/mychart:2.7.0"
      }
    }
  ]
}
```

```json
{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.cncf.helm.config.v1+json",
    "digest": "sha256:8ec7c0f2f6860037c19b54c3cfbab48d9b4b21b485a93d87b64690fdb68c2111",
    "size": 117
  },
  "layers": [
    {
      "mediaType": "application/tar+gzip",
      "digest": "sha256:1b251d38cfe948dfc0a5745b7af5ca574ecb61e52aed10b19039db39af6e1617",
      "size": 2487
    }
  ]
}
```

**WASM**

```json
{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.wasm.config.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2
  },
  "layers": [
    {
      "mediaType": "application/vnd.wasm.content.layer.v1+wasm",
      "digest": "sha256:4c7915b4c1f9b0c13f962998e4199ceb00db39a4a7fa4554f40ae0bed83d9510",
      "size": 1624962
    }
  ]
}
```

**Homebrew**

```json
{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:6324fea418ad479592ef3c21dafa2a6ffc3188d92426420f6257f5b32a7c0841",
    "size": 225
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:4f6bf2e51a4952f4cc83760c1c732a5b0742ab693e977ae88df71f655b9dee7a",
      "size": 16236572,
      "annotations": {
        "org.opencontainers.image.title": "rclone--1.55.1.catalina.bottle.tar.gz"
      }
    }
  ],
  "annotations": {
    ...
  }
}
```

None of these are container images or had to wait for an artifacts spec, and none uses an index to aggregate non-manifest references. I think we'd really like to understand why others can't do the same, and the technical reasons behind it.
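The common pattern in these examples can be sketched as follows: instead of placing blob references in an index's `manifests` array, the artifact's blobs go in an image manifest's `layers`, and the `config` media type identifies the artifact. The builder below is an illustration only; the `application/vnd.example.cacheconfig.v0+json` media type is hypothetical, not something buildkit or any tool actually uses.

```python
import json


def artifact_manifest(config_media_type, config_desc, layer_descs):
    """Build an OCI image manifest for an arbitrary artifact, in the style
    of the Helm/WASM/Homebrew examples: blobs live in `layers`, and the
    config mediaType identifies what kind of artifact this is."""
    return json.dumps({
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "config": dict(config_desc, mediaType=config_media_type),
        "layers": layer_descs,
    }, indent=2)


# Hypothetical reshaping of a build cache into this pattern; the media
# type and digests here are placeholders, not real buildkit values.
manifest = artifact_manifest(
    "application/vnd.example.cacheconfig.v0+json",
    {"digest": "sha256:7aa23086ec6b...", "size": 1654},
    [{"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:136482bf81d1...", "size": 2427917}],
)
```

Because the result is a plain image manifest, every registry that validates manifest pushes already accepts it, which is the point the comment above is making.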
The examples there are all for a single layer. It sounds like the concern from the buildx maintainers was that a multi-layer manifest would violate the manifest spec's definition of layers.
I think sorting this one out will be best handled on the OCI side by clarifying the specs, and then distribution/distribution can follow the result. Today's OCI call discussed it, and it's on their agenda again next week to give more people time to review the issue and attend.
I raised an issue to discuss the Buildkit use case: moby/buildkit#2251
I'm 👍 on solving this in a more consistent way, such as moby/buildkit#2251. That said, those who run instances of this project shouldn't be blocked, so I'm torn: I don't want to suggest that GitLab or others running distribution instances shouldn't find a way to meet their customers' needs while a consistent solution is driven.
This issue exists to document the maintainers' votes or objections on making this change; please reply with your 👍.
Since v2.3.0 (the underlying bug has existed since v2.1.1), distribution has incorrectly allowed OCI image indexes and manifest lists ("fat manifests") to be uploaded which reference digests that were uploaded as blobs, rather than only digests uploaded as manifests. The buildx client has made use of this oversight to upload binary build caches as blobs and then reference them from an OCI image index. This works against distribution registries because of a workaround set to be removed in #3365, but some of those registries will still GC the blobs.
We want to merge #3365 to fix a separate 500 error on a bad request where users try to get a manifest using the digest of a blob. What to do about this, knowing about the buildx behaviour, was discussed on the 6th July 2021 weekly call, and we think the change should be included in v3.0.0. A distribution registry upgrading to this version will break the build pipelines of users who push buildx caches in this way.
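The 500-vs-404 behaviour mentioned above can be sketched in miniature (a toy model under assumed in-memory stores; the actual #3365 change is to distribution's Go manifest service, not this code): when a manifest GET arrives with a digest that only matches a blob, the registry should answer "manifest unknown" (404) rather than erroring with a 500.

```python
def get_manifest(manifest_store, blob_store, digest):
    """Toy model of manifest retrieval by digest. A digest that exists
    only as a blob is not a manifest, so the correct answer is 404
    ("manifest unknown"), not a 500 internal error."""
    if digest in manifest_store:
        return 200, manifest_store[digest]
    if digest in blob_store:
        return 404, None  # a blob, not a manifest: manifest unknown
    return 404, None      # digest not present at all


# Assumed in-memory stores for illustration:
manifests = {"sha256:mmm": b'{"schemaVersion": 2}'}
blobs = {"sha256:bbb": b"binary cache data"}
```

With this in place, a client probing a blob digest via the manifests endpoint gets a clean 404 instead of an internal error.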
We think targeting v3.0.0 is appropriate because:

- v3.0.0 should come with OCI artefact support, which would be the correct way for buildx to store the cache blobs.