Multi-arch: Windows image pull does not pull best match #6693

Open
fsiegmund opened this issue Mar 18, 2022 · 8 comments
@fsiegmund

Description

If a multi-arch Windows image contains multiple Windows images with the same build version but different Update Build Revisions (UBRs), the first image listed in the manifest is pulled, even if an image is available that matches both the build and the UBR of the host OS. This causes unexpected container startup delays of several minutes on Kubernetes nodes that already have Windows base images deployed.

With recent Windows versions, images with a non-matching UBR can be pulled. However, the official doc still says "Windows requires the host OS version to match the container OS version" (https://hub.docker.com/_/microsoft-windows-servercore). I am therefore submitting this issue as a bug instead of an enhancement.
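To illustrate the behaviour described above, here is a simplified sketch in Go (not containerd's actual code): if the matcher compares only the major.minor.build portion of OSVersion, it never consults the UBR, so list order decides between two 17763 entries.

```go
// Simplified sketch of "first compatible entry wins" when the matcher ignores the UBR.
package main

import (
	"fmt"
	"strings"
)

type entry struct{ os, arch, osVersion string }

// sameBuild compares only the major.minor.build prefix, ignoring the UBR.
func sameBuild(a, b string) bool {
	pa, pb := strings.Split(a, "."), strings.Split(b, ".")
	if len(pa) < 3 || len(pb) < 3 {
		return false
	}
	return pa[0] == pb[0] && pa[1] == pb[1] && pa[2] == pb[2]
}

func pickFirstMatch(host entry, index []entry) (entry, bool) {
	for _, e := range index {
		if e.os == host.os && e.arch == host.arch && sameBuild(e.osVersion, host.osVersion) {
			return e, true // first compatible entry wins; the UBR is never consulted
		}
	}
	return entry{}, false
}

func main() {
	host := entry{os: "windows", arch: "amd64", osVersion: "10.0.17763.1817"}
	index := []entry{
		{os: "windows", arch: "amd64", osVersion: "10.0.17763.1098"},
		{os: "windows", arch: "amd64", osVersion: "10.0.17763.1817"},
	}
	if picked, ok := pickFirstMatch(host, index); ok {
		fmt.Println(picked.osVersion) // prints 10.0.17763.1098, not the exact UBR match
	}
}
```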

Steps to reproduce the issue

Find the OS version of the Kubernetes Windows node, e.g. 10.0.17763.1817
Create a multi-arch image for OS versions 10.0.17763.1098 and 10.0.17763.1817
Launch a pod with a Windows container and the multi-arch image
Describe the results you received:

Image 10.0.17763.1098 is pulled

Describe the results you expected:

Preference is given to the image that exactly matches the host OS, in this case 10.0.17763.1817. The image pull would then not pull base image layers if the node already caches servercore / nanoserver images.

What version of containerd are you using?

All containerd versions.

Any other relevant information

Tested on a local box, AKS and GKE. AKS and GKE use different OS versions for their Windows nodes, which means that at the moment a multi-arch image can't be created that pulls the correct sub-image in both environments.

Show configuration if it is related to CRI plugin.

No response

@TBBle
Contributor

TBBle commented Mar 19, 2022

I think the behaviour in the existing reproduction case is correct. Although https://hub.docker.com/_/microsoft-windows-servercore does say

Windows requires the host OS version to match the container OS version.

if you follow through to the linked pages, you can see that "host OS version" is meant to be read as, e.g., Windows Server 2022, and that the patch version is discussed separately.

There's also a worked example further down that page, which mentions this:

Windows Server 2016 needs both major and minor versions (14393.1715 in this example) to be supported without Hyper-V isolation. Windows Server version 1709 only needs the major version (16299 in this example) to match.

So it would be reasonable for example to apply this logic to Windows Server 2016 only. We're going to have even looser matching rules for versions after Windows Server 2022 once #6508 is implemented, so the precedent is reasonable. Although I don't think we even support Windows Server 2016? It's not run on CI, anyway.

There's also further discussion at https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/update-containers#host-and-container-version-compatibility, which also links to the "February 11, 2020" security update, where there was a container-only patch release, so the explicit instruction was to use a patch release newer than the base image.

See also #6508 (comment) where I explored patch-release matching in more detail with a focus on this latter case.

@fsiegmund
Author

If, as an image producer, I went through the effort to create a multi-arch image with e.g. 10.0.17763.2114 and 10.0.17763.2686, I obviously would expect that both can be used, not just 10.0.17763.2686. E.g. a node with version 10.0.17763.2114 would prefer image 10.0.17763.2114, not 10.0.17763.2686. The question of why someone would want to do this is another issue, but as stated in #6508, "practically there's almost no reason for an image index to list multiple patch versions of the same build id", and almost is not zero.

I'm just trying to make an argument here for a minor addition to the image sort logic in the presence of the same OS version and multiple UBRs, which is an unexpected use case, but nevertheless a valid configuration.

I'm not suggesting using the host UBR when there is no exact match of the OS version, as discussed in #6508. The only logic change I suggest is to prefer the exact match when the OS version is an exact match as well, and otherwise use the max UBR.

I understand there may be other factors that are against it, but I can't see one.
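As a rough illustration of the rule proposed above, here is a minimal sketch built on containerd's platforms.MatchComparer interface; ubrPreferringMatcher, sameBuild and ubr are hypothetical names, not existing containerd code, and the sketch only reorders Windows entries that share the same build.

```go
// Hypothetical sketch: among entries of the same build, prefer the one whose
// full OSVersion (including UBR) equals the host's, otherwise the highest UBR.
package main

import (
	"strconv"
	"strings"

	"github.com/containerd/containerd/platforms"
	specs "github.com/opencontainers/image-spec/specs-go/v1"
)

// ubrPreferringMatcher wraps a default matcher (e.g. platforms.Default()) and
// only changes the ordering between Windows entries of the same build.
type ubrPreferringMatcher struct {
	platforms.MatchComparer
	hostOSVersion string // e.g. "10.0.17763.1817"
}

// sameBuild reports whether two OSVersion strings share major.minor.build.
func sameBuild(a, b string) bool {
	pa, pb := strings.SplitN(a, ".", 4), strings.SplitN(b, ".", 4)
	return len(pa) >= 3 && len(pb) >= 3 &&
		pa[0] == pb[0] && pa[1] == pb[1] && pa[2] == pb[2]
}

// ubr extracts the fourth component (the Update Build Revision), or -1 if absent.
func ubr(v string) int {
	parts := strings.SplitN(v, ".", 4)
	if len(parts) < 4 {
		return -1
	}
	n, err := strconv.Atoi(parts[3])
	if err != nil {
		return -1
	}
	return n
}

// Less orders entries of the same Windows build: the exact host OSVersion
// first, then higher UBRs; everything else falls back to the wrapped matcher.
func (m ubrPreferringMatcher) Less(a, b specs.Platform) bool {
	if a.OS == "windows" && b.OS == "windows" &&
		a.OSVersion != b.OSVersion && sameBuild(a.OSVersion, b.OSVersion) {
		if a.OSVersion == m.hostOSVersion {
			return true
		}
		if b.OSVersion == m.hostOSVersion {
			return false
		}
		return ubr(a.OSVersion) > ubr(b.OSVersion)
	}
	return m.MatchComparer.Less(a, b)
}
```

With this ordering, a host on 10.0.17763.1817 would sort the 10.0.17763.1817 entry first, while a host whose exact UBR is not in the list would still get the highest available UBR.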

@TBBle
Contributor

TBBle commented Mar 22, 2022

Another reason for the existence of a manifest list with multiple UBRs for the same host version is new builds on the new UBR being added to that manifest list over time. The docker manifest tool, for example, lets you add to an existing manifest, but not remove items from it, which is what we do from our CI pipeline.

And in this case, and in the general case of a manifest list on any platform, I personally would want to select the most up-to-date reasonable list entry. I would find "matches the host's OS patch version" (or any behaviour other than "take the latest") surprising when I know that matching is not necessary, and it would, for example, refuse to accept a container which contains a security update without manually removing the previous version of the container from the manifest list.

I was curious, and looking at Docker, it seems the current logic there is actually "random" for multiple images in the list that differ only in UBR. I see there's a ticket over there with the same description as this one (moby/moby#42423), which I see was punted over here, but without discussion of the proposed change itself. So no useful input from that side yet.

@fsiegmund
Author

OK, this use case is of course a conflict. So how do you clean up old images, if the images are locked in by the manifest? I suppose you also add a new tag to the manifest and change a deployment config to use the new version? Otherwise a cached version might still be used. I also assume you only target containerd deployments, since, as you found out, moby's logic is different.

In our CI/CD all images are considered immutable, including manifest images. To make patches, we just submit a new PR, which builds the whole collection of images with new tags.

Maybe this becomes a philosophical question: what is the purpose of the docker manifest list?

https://docs.docker.com/engine/reference/commandline/manifest/ :

"Ideally a manifest list is created from images that are identical in function for different os/arch combinations."

My understanding is that the spirit of the docker manifest list is to simplify image management and deployment for different targets, and that all images in the manifest are equally valid options. I therefore would expect an exact match to the target to always win, and if none is found, an algorithm to pick the next best option. Otherwise, why would the image be in the list? I would not expect to see a mix of valid and obsolete images.

The ability to target a specific Windows version also plays a role when you run pods with multiple containers. In our case, we use three different Windows containers coming from different CI/CD build pipelines. If every container pulls an image with a different max UBR, the image load time would just double or triple. However, if every image has a list of supported target versions, and an exact match wins, the UBR of the host would be used to select the version that ensures maximum sharing of base layers. During the transition of version upgrades, there would be no performance penalty; when nodes are upgraded, a different version from the list would be used.

For Linux, we rarely worry about image size and pull time (usually < 100 MB). For Windows we can't ignore them, since we are in the GB range.

The ticket to moby was mine, too :-).

I hope I made my point clear enough, and I see that there is a bit of ambiguity about what a docker manifest list is supposed to be and how it is used. Maybe there is a better solution to target specific Windows deployments. We will certainly have to abandon manifest lists as not accurate enough and go back to the old way if the decision is to close this issue with no further action.

@TBBle
Contributor

TBBle commented Mar 22, 2022

It wasn't until this discussion that I realised that the CI nodes might not be pulling the latest image from the manifest list. I haven't actually looked (I don't maintain that system anymore, I just submit builds to the queue), but if anyone had asked, I'd have guessed it used the latest patch release as the 'obvious' behaviour. Supporting multiple patch releases wasn't my intention when we set up the system; it was more an accident of having scripts using docker manifest create --amend because we build different platforms at different times, and sometimes rebuild the image from source because of things like the February 2020 security update issue requiring a new base version.

In fact, thinking about it, and then checking the source, I realise that docker manifest create --amend is not actually amending the existing manifest; it's replacing it. So I've been using it wrong the whole time, and probably do not have multiple patch releases accumulating in my manifest list as I had assumed. >_< (This also means I might have given the wrong instructions to my successor in maintaining that system; hopefully they worked it out, or the Windows Server 2022 build would have erased the Windows Server 2019 build and the Linux build from the manifest list, although that wouldn't fail until someone flushed the images on a build host, or deployed a new build host. Erk.)

As it happens, in this flow the produced image is much larger than the Server Core image underneath, so optimising for Server Core pulls was not really on our radar: it's only a minor difference in the first pull, and after that it doesn't matter because the current local image will be used until it is manually flushed out, or left idle long enough for docker system prune --filter to remove it.

So it turns out I don't have the horse in this race that I thought I had.

All that said, I still think taking a non-latest version from the list of compatible versions fails the principle of least-surprise. I do recognise that my "least-surprise" is not universal.

I feel that this is the wrong layer to solve the problem you're describing. For example, in the case of "When nodes are upgraded, a different version from the list would be used": unless you've pushed new images for all those containers with the new patch release (if one even exists), and the same image is pre-loaded onto that machine as part of the upgrade (i.e. it's more than just an in-place version upgrade), then you're back to possibly pulling several different Server Core base images at Pod startup. This suggests to me that solving the problem in this way is fragile, and depends on external factors (from the perspective of the version matcher).

I think that solving what you want here would be better handled at a higher layer that provides a manifest list matcher which knows exactly which versions are suitable from external information, like "which base images are already pulled" and perhaps "which containers are in the pod". (Although if you have multiple Server Core versions already available locally, then you don't really need the containers to match each other; as long as they use a version that's locally available, you save a few GB of pull on the first usage.) I'm not sure that's actually feasible with the current APIs, though, so it's only a thought. It would avoid needing to implement low-level behaviours based on the implied high-level meaning of those low-level behaviours.

@jterry75
Contributor

In general I agree. This is a solvable problem, but I don't think this is the right default globally. A custom matcher for k8s, if they wish to solve this, would make it possible, and nothing is stopping that today. But as a best practice, IMO the platform should always apply the highest patch number it can, unless a previous version is explicitly referenced by sha/kb/version. When someone says "pull ltsc2019", I don't believe they want the version that matches the host explicitly; they want the most patched ltsc2019 container image. Just my opinion though, others' thoughts are welcome.
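For contrast with the earlier sketch, the "always take the highest patch" policy described here would be the same wrapper with the host check dropped; latestUBRMatcher is again a hypothetical name, reusing the sameBuild and ubr helpers from the sketch above.

```go
// latestUBRMatcher: same wrapper shape as ubrPreferringMatcher above, but with
// no special case for the host's own UBR.
type latestUBRMatcher struct {
	platforms.MatchComparer
}

// Less always prefers the higher UBR among Windows entries of the same build.
func (m latestUBRMatcher) Less(a, b specs.Platform) bool {
	if a.OS == "windows" && b.OS == "windows" &&
		a.OSVersion != b.OSVersion && sameBuild(a.OSVersion, b.OSVersion) {
		return ubr(a.OSVersion) > ubr(b.OSVersion)
	}
	return m.MatchComparer.Less(a, b)
}
```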

@fsiegmund
Author

Custom matcher sounds good! I did not know that this was supported. I will try to make a case for this in k8s and figure out where that change should be added (any hints welcome).

Note: I doubt that any public image has a manifest list with the same OSVersion and multiple UBRs. moby, for example, does not pick the latest.

@jterry75
Contributor

@fsiegmund - I think you could make this a CRI option to use a different matcher and put it in the CRI plugin itself. Then it would be the caller's choice how to resolve on PullImage.
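As a rough sketch of what "caller's choice on PullImage" could look like with the containerd Go client, a custom matcher can already be supplied at pull time via containerd.WithPlatformMatcher. The ubrPreferringMatcher value below is the hypothetical type sketched earlier (assumed to be defined alongside this code), the image reference is made up, and the proposed CRI config option itself does not exist today.

```go
// Sketch: a caller choosing its own platform matcher for a pull.
package main

import (
	"context"
	"log"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
	"github.com/containerd/containerd/platforms"
)

func main() {
	// Connect to containerd on a Windows host (default named pipe address).
	client, err := containerd.New(`\\.\pipe\containerd-containerd`)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := namespaces.WithNamespace(context.Background(), "k8s.io")

	// Hypothetical: the ubrPreferringMatcher sketched in the earlier comment;
	// hostOSVersion would be read from the node rather than hard-coded.
	matcher := ubrPreferringMatcher{
		MatchComparer: platforms.Default(),
		hostOSVersion: "10.0.17763.1817",
	}

	// WithPlatformMatcher lets the caller, rather than the default platform
	// logic, decide which manifest-list entry is resolved for the pull.
	img, err := client.Pull(ctx, "registry.example.com/app:1.0",
		containerd.WithPullUnpack,
		containerd.WithPlatformMatcher(matcher))
	if err != nil {
		log.Fatal(err)
	}
	log.Println("pulled", img.Name())
}
```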
