oci buildcache: handle pagination of tags #43136
fyi @zackgalbreath |
cc @haampie |
Note that gitlab is not following the OCI spec: they are supposed to return all tags. It could be that their registry is a Docker registry, which may or may not behave differently from an OCI registry. If that is the case, then check whether you can infer from the response headers that it is a Docker registry and not an OCI registry, and only then use their Link response header if it is documented (where potentially you can assume that it does not follow the Link header RFC either...)
Incredible: opencontainers/distribution-spec#461 (comment) "missed in review". How can you miss basic things like this in a spec?
Well, now I've caught up a bit on the discussion regarding how the spec handles tag listing. Maybe we should wait to proceed with this until that gets ironed out? |
So my TL;DR to see if we're on the same page: The OCI spec (accidentally) omits a key part about pagination in case of many tags. Right now it says the registry should not do pagination unless the user sets a limit. The current Spack implementation conforms to the spec (we don't set a limit). In practice registries implement pagination by default because they wanna avoid possible denial of service attacks with large requests. How they implement it is registry specific: some (like whatever gitlab uses) set A strategy that still conforms to the spec is to request with
@haampie Following up on our slack conversation this morning:

TL;DR: gitlab, github, and dockerhub all use

Some extra details: I couldn't find any evidence that quay.io actually implements the oci distribution spec. To me it appears they have their own custom rest api, which, as you mentioned, returns json objects for the list tags functionality, where each response looks like:
So I don't think we should take inspiration from quay.io here, since it seems the odd one out. But let me know if you disagree. However, I tested gitlab, dockerhub, and github container registry, and discovered they all use For dockerhub, I learned you can't actually use the registry api to query tags for the "official" images, like e.g. If you know of a "non-official" image on dockerhub with a lot of tags, I'd like to list the tags and see what I get without a limiting query, but I think at this point, it's clear some of the major OCI registries use |
Okay, I'm fine with the Also add a test |
To me it looks like not, unless
Hm okay. Can run unquote regardless, it's a no-op in that case. In principle it's not necessary because From the docs notice we claim to support quay.io, so need to deal with it separately. Maybe take inspiration from here, it's also link header based. |
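For readers following the Link header discussion, here is a minimal sketch of pulling the next-page URL out of a `Link` response header. It is not the code in this PR: the function name `next_page_url`, the example hostname, and the simplified regular expression are assumptions for illustration, and the pattern does not implement full RFC 8288 parsing (for instance, it would be confused by a comma inside the link target).

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Simplified pattern for one link-value such as: </v2/foo/tags/list?last=x>; rel="next"
# Assumption: targets contain no commas and rel holds a single relation type.
_NEXT_LINK = re.compile(r'<([^>]*)>\s*;[^,]*\brel\s*=\s*"?next\b"?', re.IGNORECASE)


def next_page_url(current_url: str, link_header: Optional[str]) -> Optional[str]:
    """Return the absolute URL of the next page, or None when there is no next page."""
    if not link_header:
        return None
    match = _NEXT_LINK.search(link_header)
    if match is None:
        return None
    # The target may be relative, so resolve it against the URL that was requested.
    return urljoin(current_url, match.group(1))
```

Resolving the target with `urljoin` means the caller does not need to care whether a registry sends a relative path or an absolute URL.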
Any news here @scottwittenburg? |
I didn't have the time to add support for quay.io at the time, so I set this aside. I'm hoping to pick it up again in the next week or so. |
d5027ce
to
1a8ee18
I need this myself for https://github.com/spack/github-actions-buildcache, so I've pushed a change for improved |
Self-approving, can you confirm this is OK for you @scottwittenburg?
This fixes an issue where ghcr, gitlab and possibly other container registries paginate tags by default, which violates the OCI spec v1.0, but is common practice (the spec was broken itself). After this commit, you can create build cache indices of > 100 specs on ghcr. Co-authored-by: Harmen Stoppels <me@harmenstoppels.nl>
Setting up an example ci pipeline on gitlab.com, I discovered that `registry.gitlab.com` only returns 100 tags at a time, so my oci binary cache index only ever contained the first 100 specs alphabetically.

This PR updates the mock oci registry to paginate results as described in the oci distribution spec, and adds a `list_tags` method to the `oci.py` module to handle the paginated results.
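As an illustration of what handling paginated results involves, the loop below collects tags across pages. It is a sketch under stated assumptions, not the `list_tags` implementation from this PR: `collect_tags`, the `FetchPage` callable, and the `max_pages` guard are hypothetical names. The only detail taken from the distribution spec is that each page is a JSON object with a `tags` array.

```python
import json
from typing import Callable, List, Optional, Tuple

# Hypothetical fetcher: takes a URL and returns (response body, next-page URL or None).
FetchPage = Callable[[str], Tuple[str, Optional[str]]]


def collect_tags(first_url: str, fetch_page: FetchPage, max_pages: int = 1000) -> List[str]:
    """Follow next-page URLs until none is left and return all tags in order."""
    tags: List[str] = []
    url: Optional[str] = first_url
    # max_pages is a safety net against a registry that keeps returning a next link.
    for _ in range(max_pages):
        if url is None:
            break
        body, url = fetch_page(url)
        # Some registries may send "tags": null for an empty repository.
        tags.extend(json.loads(body).get("tags") or [])
    return tags
```

Passing the fetcher in as a callable keeps the loop independent of any HTTP library, so it can be exercised against a mock registry that serves a fixed set of pages.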