"no supported platform found in manifest list" / "no matching manifest for XXX in the manifest list entries" #3835
TLDR: Not all architectures are created equal, but perhaps even more importantly, not all build servers we have access to are equal in performance, power, or ability to process builds reliably.
Important: Please do not post here with reports of individual image issues -- we're aware of the overall problem, and this issue is a discussion of solving it generally. Off-topic comments will be deleted.
When we merge an update PR to https://github.com/docker-library/official-images, it triggers Jenkins build jobs over in https://doi-janky.infosiftr.net/job/multiarch/ (see #2289 for more details on our multiarch approach).
Sometimes, we'll have non-
Thus, manifest lists under the
Our current method for combating the main facet of this problem (missing
As for triggering jobs more directly, the GitHub webhooks support in Jenkins makes certain assumptions about how jobs and pipelines are structured/triggered, so we can't use GitHub's webhooks to effectively trigger these jobs without additional custom development to sit between the two systems. Instead, we rely on the built-in Jenkins polling mechanism. This has been fine (we haven't noticed any scalability issues with how often we're polling), and even if we were triggering builds more aggressively, that would solve only half the problem, since our build queues would then just pile up faster.
One solution that has been proposed is to wait until all architectures successfully build before publishing the relevant manifest list. If a naïve version of this suggestion were implemented right now, we would have no image updates published because our
One compromise would be to use the Jenkins Node API (https://doi-janky.infosiftr.net/computer/multiarch-s390x/api/json) to determine whether a particular builder is down in order to determine whether to block on builds of that architecture. Additionally, we could try to get creative with checking pending builds / queue length for a particular architecture's builds to determine whether or not a given architecture is significantly backlogged and thus a good candidate for not waiting.
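A rough sketch of the node check described above, in Python. It assumes the node's JSON carries the boolean `offline` and `temporarilyOffline` fields that Jenkins reports for a computer; the function names and the shape of the `nodes` mapping are hypothetical, and a missing field is treated as "not down".

```python
import json
from urllib.request import urlopen


def fetch_node_info(url: str) -> dict:
    """Fetch a node's JSON description from the Jenkins API (network call)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def builder_is_down(node_info: dict) -> bool:
    """Treat a node as down if Jenkins marks it offline in either way.

    Assumption: a missing field means the node is not offline.
    """
    return bool(node_info.get("offline") or node_info.get("temporarilyOffline"))


def architectures_to_block_on(nodes: dict) -> list:
    """Return the architectures whose builders are up, sorted by name.

    `nodes` maps an architecture name to that builder's node JSON
    (a hypothetical structure chosen for this sketch).
    """
    return sorted(arch for arch, info in nodes.items() if not builder_is_down(info))
```

In use, something like `fetch_node_info("https://doi-janky.infosiftr.net/computer/multiarch-s390x/api/json")` would supply one entry of `nodes`. Queue length and pending builds are not covered here.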
We could also attempt to determine when a particular tag was added/merged, and set a time limit for some number of hours before we just assume it must be backlogged, failing, or down and move along without that tag, but this is slightly more complicated (since we don't have a modification time for a particular tag directly, and really can only determine that information on an image level without complex Git walking / image manifest file parsing). Perhaps even just a time limit on the image level would be enough, but in the case of our
Related issues: (non-comprehensive)
Hey @tianon Thanks for introducing me to this issue yesterday. Initial thoughts:
The core of the problem seems to be the availability of amd64 images (breaks my heart being an Arm guy - but it's a fair statement of the situation today!). We therefore need to make sure that the fat manifest is only published once the amd64 build has completed successfully?
How about implementing a system whereby the manifest is only published when a list of 'gold' architectures have built successfully? This way, you could ensure that the amd64 issue never rears its ugly head again. It would also mean that for specific images, say popular base images like Alpine, the manifest is only published when, for example, amd64, arm64v8, ppc64le and s390x are successful.
It's not an ideal solution, but it could act as a way of stabilising things whilst a better solution could be implemented.
One last thought - if this were to be implemented, it would be useful to have a global 'gold architecture' list and a per-project delta from that list.
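To make the global list plus per-project delta concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `GLOBAL_GOLD` contents, the `add`/`remove` delta format, and the function names are hypothetical and not part of any existing tooling.

```python
# Hypothetical global default: only amd64 must succeed before publishing.
GLOBAL_GOLD = frozenset({"amd64"})


def gold_for(image: str, deltas: dict) -> set:
    """Apply an image's delta (architectures to add or remove) to the global list."""
    delta = deltas.get(image, {})
    return (set(GLOBAL_GOLD) | set(delta.get("add", ()))) - set(delta.get("remove", ()))


def ready_to_publish(image: str, built: set, deltas: dict) -> bool:
    """Publish the manifest list only once every gold architecture has built."""
    return gold_for(image, deltas) <= set(built)


# Example delta: a popular base image additionally requires three more architectures.
EXAMPLE_DELTAS = {"alpine": {"add": ["arm64v8", "ppc64le", "s390x"]}}
```

With `EXAMPLE_DELTAS`, an image with no delta is publishable as soon as amd64 builds, while `alpine` waits for all four architectures.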
What advice do you have for developers affected by this bug? I'm using an image which depends on docker-library/tomcat and I've been unable to build for about a half hour. I read your post pretty carefully, I think, but didn't see any mention of a workaround. Based on what I've read, this is not a problem that can be solved on my side, I would just have to wait.
If that is the case, is there any way for me to do maybe an API query to estimate a wait time until dockerhub reaches a consistent state?
@MattF-NSIDC fair point -- this was intended as a tracking issue for the problem and discussion around how to solve the crux of it properly; I think a short blurb here about how to work around it in the meantime is definitely appropriate. Here's my current recommendation:
If you rely on a specific image, use https://github.com/docker-library/repo-info (linked from every image description) to find the exact
If you're looking for a specific architecture, use the architecture-specific namespace to find it (as linked from both https://github.com/docker-library/official-images#architectures-other-than-amd64 and every image description under "Supported architectures").
As for ETA, even if we find a reasonable solution to "wait for things to be available", we'll still have a limit on how long we wait before pushing whatever we've got, which will likely be on the order of hours but still less than 24 (because IMO, 24h ought to be the absolute maximum we wait before we storm ahead, and should roughly match our current builds-scheduling timing).
@m5p3nc3r yeah, the "gold architectures" solution is basically exactly the solution I wrote in a quick note on my desk the first time we had this problem.
I don't love it, but it does seem like the closest we can get, and it's definitely going to be better than what we're doing now. It would also allow us to trigger all builds as soon as possible after merge and let everything trickle through. The main challenge I see is how to implement the "timeout" functionality, but perhaps we simply punt and put the timeout on the full image instead of individual tags and call it a day.
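For what it's worth, an image-level timeout on top of the gold list could be as small as the Python sketch below. It assumes we can get a merge time per image; `MAX_WAIT`, `should_publish`, and the parameter names are hypothetical, and the 24-hour value just mirrors the upper bound suggested earlier in this thread.

```python
from datetime import datetime, timedelta

# Hypothetical upper bound on waiting before pushing whatever has built.
MAX_WAIT = timedelta(hours=24)


def should_publish(merged_at: datetime, now: datetime, gold: set, built: set,
                   max_wait: timedelta = MAX_WAIT) -> bool:
    """Decide whether to publish an image's manifest lists.

    Publish immediately once every gold architecture has built; otherwise
    publish anyway once the image-level timeout has elapsed since the merge.
    """
    if set(gold) <= set(built):
        return True
    return now - merged_at >= max_wait
```

The timeout applies to the whole image rather than to individual tags, so it sidesteps the per-tag modification time problem at the cost of being coarser.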