New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow containerd on Windows 11 to use Windows Server 2022 images #8137
Conversation
Hi @hach-que. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
I don't know what |
Hi @hach-que You need to sign it off by |
Windows 11 clients support running process-isolated containers of 10.0.20348 and above. Or to put it another way, Windows 11 clients can run Windows Server 2022 containers even when their build numbers don't match. Docker Engine already supports this scenario; this change updates containerd to make it also support this scenario. The [container/OS version compatibility page](https://learn.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/version-compatibility?tabs=windows-server-2022%2Cwindows-11#windows-client-host-os-compatibility) documents this level of compatibility between the client OS and server OS versions. I didn't explicitly add support for Windows 10 clients as the current version of Windows 10 is no longer compatible with the LTSC2019 container images, and there's no way to downgrade a Windows 10 install to the required version. Windows 11 compatibility with LTSC2022 is much more stable in practice than Windows 10, and Microsoft is committed to permitting any combination of Windows Server 2022 images and Windows 11 clients as outlined in the [Windows Server 2022 and beyond for containers blog post](https://techcommunity.microsoft.com/t5/containers/windows-server-2022-and-beyond-for-containers/ba-p/2712487). Signed-off-by: June Rhodes <504826+hach-que@users.noreply.github.com>
672dd76
to
d2ed134
Compare
Should be good now. |
Signed-off-by: June Rhodes <504826+hach-que@users.noreply.github.com>
bd63c20
to
1c58d8a
Compare
Looks like the failing test doesn't have anything to do with the code changes and might just be a broken environment:
|
cc @kevpar @kiashok, the windows platform code strikes again! 😆 Thanks for this change @hach-que, I think it makes sense (and probably should be supported). I could've sworn I remembered some other change with the same motivation but can't find it so 🤷♂️. I'll try and look at this soon if no one else gets to it |
5002cc1
to
5b101ae
Compare
Wondering if this is also something that could have a utility in hcsshim, so that there's a more canonical location to compare if a version is "compatible". |
Signed-off-by: June Rhodes <504826+hach-que@users.noreply.github.com>
5b101ae
to
54afdd0
Compare
After going through the effort of writing my own Kubernetes manager (😅) so I could deploy out a custom build of containerd to the Windows nodes, I can confirm that these changes work. I did backport them to 1.6.18 for this test. (Note the different kernel number presented by cmd.exe) |
func isClientOS() bool { | ||
k, err := registry.OpenKey(registry.LOCAL_MACHINE, `SOFTWARE\Microsoft\Windows NT\CurrentVersion`, registry.QUERY_VALUE) | ||
if err != nil { | ||
return false | ||
} | ||
defer k.Close() | ||
|
||
installationType, _, err := k.GetStringValue("InstallationType") | ||
if err != nil { | ||
return false | ||
} | ||
|
||
return installationType == "Client" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thaJeztah Thinking on your comment, this kind of feels like it could be a helper in hcsshim/osversion also..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevpar Curious on your thoughts. This regkey approach was recommended (6 years ago however) at least as an alternative to GetProductInfo: https://learn.microsoft.com/en-us/previous-versions/windows/desktop/legacy/ee391629(v=vs.85)
@@ -53,6 +55,31 @@ func (m windowsmatcher) Match(p specs.Platform) bool { | |||
if strings.HasPrefix(p.OSVersion, m.osVersionPrefix) { | |||
return true | |||
} | |||
if m.isClientOS && p.OSVersion != "" { | |||
split := strings.Split(p.OSVersion, ".") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, what this is parsing isn't semver (it's Windows 4-part OS versions like 10.0.20348.1850), so from what I can tell, that library won't actually parse them correctly. 😅
FWIW this is a "subset" of #7856, which aimed to allow pulling/running any image on RS5+ under Hyper-V isolation. While the discussion on that (and the first instance this came up in #7431) was left at adding dedicated platform matchers for each container isolation type (host, process, and hyper-v) instead of using the single default matcher, IMHO this particular corner-case is reasonable to put there. |
@mxpv not entirely sure if these versions should be parsed with SemVer semantics (don't think they are strictly SemVer, but Microsoft's own format and semantics) (also see https://github.com/microsoft/hcsshim/blob/5871d0c4436f131c377655a3eb09fc9b5065f11d/osversion/osversion_windows.go#L10-L17). That said; as there may be uses outside of containerd itself to handle these versions; both parsing, comparing versions, as well as determining if versions are compatible (which turned out to be impossible in some cases based on version alone, as it sometimes depends on specific Windows kernel versions) Taking all of that into account, I'd love to see this being either part of hcsshim, or even a separate module (if dependency trees become complicated), so that there's a canonical truth for these kind of comparisons (some of which may require deep insight into kernel features / compatibility). See my earlier comments above, and #8137 (comment)
|
@kevpar any thoughts / opinions on the above? 🤗 |
I'm not sure this change is semantically correct, as the stable ABI guarantee isn't "Windows 11 clients", it's all about the build numbers. It just happens that Windows 11 was the first thing in the wild that meets the criteria, and the only one at the time of writing for that blog post. The actual feature is that Windows Server 2022 containers (i.e. The explicit commitment (per microsoft/Windows-Containers#117 (comment)) is that Windows Server 2022-based containers will be functional on at least the next formal Windows Server release, so I think the safest approach is to allow any host 10.0.20348 or greater to pull any image between 10.0.20348 and its own version (although until the next Windows Server release is much closer, there won't be any images greater than 10.0.20348 anyway, as MS don't produce container image bases for VNext anymore). So I'd suggest getting rid of the The latter will come out somewhat like what the Hyper-V isolation matcher will need to look like too, although from memory the challenge with Hyper-V matching is that the knowledge of "We're using HyperV isolation" is not currently used at platform-matcher time. I believe the intent there was to use a different matcher selected by the CRI layer when running in HyperV isolation. On that note, would it make sense to split |
If the guarantee around stability is subject to change (i.e. as you said, LTSC2022 will run on 2025, but perhaps not 2028), then does it make sense to have - in addition to sensible defaults - some way of overriding this in the configuration? That way when new builds of Windows 11 or Windows Server are pushed out, that users of containerd can just override the compatibility in the configuration file without having to patch the containerd binary? i.e. that way if I wanted to override this to get the effect I currently get through patching, I could do:
|
It's an interesting idea. To some extent, I'm not too fussed about this problem as I expect the Windows team to signal an ABI break in the LTSC series with plenty of time to update containerd (since it'll appear at some point in the Insiders builds and/or Azure Stack HCI too; containerd test suite can be run on such hosts fairly easily, we did that late in the LTSC 2022 release cycle for example to shake out issues.) I recognise that I may be wrong about this, and I don't know if containerd support-lifecycle policy would disrecommend running years-old versions of containerd on newer Windows releases. It's very possible they'd be broken for other reasons, but not certain. That said, in the success case, the "select the highest OS build-number that passes the matcher" logic being added here will mean that as long as a LTSC"2028"-based image has been pushed as well as the LTSC2022-based image to the image list, the "LTSC2028" image will be chosen on LTSC2028 or later. Of course, an the LTSC2022-based image will be chosen on LTSC2022, and "LTSC2025", because an "LTSC2028"-based image will be newer, and hence rejected. This requires no config hooks. So the failure case where that configuration option makes a difference is one where only an LTSC2022-based image is available: the mooted config change is turning an "incompatible version" container-start failure into a "no matching version found" pull failure on an "LTSC2028" host. It wouldn't be able to turn a failure-case into a success-case, just move the failure slightly earlier. Which seems reasonably-low value for introducing a new configuration knob for a currently-configless system. Note: The All the above is on the assumption that an "LTSC2028" host can't run an LTSC2022-based image. We don't know when "LTSC2028" here will happen, if ever; all we know is we aren't currently promised that will never happen; we are promised that the next LTSC release won't be "LTSC2028". Which is why I don't think it's worth putting a lot of effort into trying to predict this situation now, since we're talking about guessing more than a full Windows Server LTSC cycle into the future, at least. |
@TBBle That might be true for Windows Server builds, but I expect the breakage for Windows 11 to be a lot less signalled, and I can totally see there being a Windows 11 update at some point after 2025 that unexpectedly switches compatibility from "2022" to "2022+2025" and then "2025 only", outside the normal Windows Server release cycle. I don't want us to punt on making this configurable (i.e. hard-coding it so that LTSC2022 is only accepted between 10.0.20348 and 10.0.22621) and then we're doing this dance again when 23H1 or Server 2025 comes out. If the guarantee is not "2022 images will work forever for 10.0.20348 onwards", then I think having a configurable override is beneficial for those folks who are not running containerd on a Windows Server LTSC release because their blocker on getting things working will be "update the configuration" and not "patch and build new binaries". |
The same thing applies for Windows 11, in that you can't use the config options to turn a failure case into a success case, only a different failure. So there's no one who can "update the configuration to change the matched versions" to unblock themselves.
That's not what we should be doing. We should be encoding for LTSC2022-based image is accepted for host 10.0.20348 or greater, because the only time that produces a failure is when the host cannot run the LTSC2022-based image, and there is no entry in the manifest list for a newer base. And in that case, there is no success option, because that second clause is also a runtime requirement even if you pull an image without a manifest list (and hence don't invoke the platform matcher in the first place). Based on the replies, I suspect my suggested implementation here hasn't been clear. The simple logic for the Windows (process-isolation) platform matcher should be: If host < 20348: Require exact release version (i.e. 10.0.XXXX). That way, if a usable image is in the list, it will always be chosen, no configuration necessary, "forever"*. A base-image ABI break may allow an unusable image to be chosen but only if there is no usable image available. And we can add that once we know about it, to improve the failure description. *: Assuming the current "Stable API" approach remains in-use. If MS revert to "exact match only" semantics for some reason, then this logic will still pick a usable image if one exists, as that's a degenerate case. However, other matching schemas may not magically work. |
I saw your update while I was writing that response:
Seems reasonable to me. |
Yes, that's correct. If we drop the |
/ok-to-test |
@dims: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
This feature would be really helpful. Can I know if this PR is still active? |
containerd 1.7 broke HostProcess support for filesystem drivers (due to the new silo stuff), so at the moment I'm just running containerd 1.6.18 with this patched change. I haven't had a chance to come back to this PR since I wouldn't be able to migrate to the upstream/merged version anyway. |
Is there an open ticket for that new issue somewhere? I had a quick poke around and couldn't see one, but maybe I'm not looking in the right place or using the right keywords. Edit: I see it: microsoft/hcsshim#1699 |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This PR is stale because it has been open 90 days with no activity. This PR will be closed in 7 days unless new comments are made or the stale label is removed. |
This PR was closed because it has been stalled for 7 days with no activity. |
Windows 11 clients support running process-isolated containers of 10.0.20348 and above. Or to put it another way, Windows 11 clients can run Windows Server 2022 containers even when their build numbers don't match. Docker Engine already supports this scenario; this change updates containerd to make it also support this scenario.
The container/OS version compatibility page documents this level of compatibility between the client OS and server OS versions.
I didn't explicitly add support for Windows 10 clients as the current version of Windows 10 is no longer compatible with the LTSC2019 container images, and there's no way to downgrade a Windows 10 install to the required version. Windows 11 compatibility with LTSC2022 is much more stable in practice than Windows 10, and Microsoft is committed to permitting any combination of Windows Server 2022 images and Windows 11 clients as outlined in the Windows Server 2022 and beyond for containers blog post.