Platform matcher: allow for running of newer Win images under Hyper-V. #7856
Conversation
Hi @aznashwan. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Note that the referenced compatibility matrix on Microsoft's website is a bit outdated and doesn't claim that running 2022 images on W10/WS2019 under Hyper-V is supported yet, but it does work, so it's just a matter of the docs getting updated. This is a sister PR to moby/moby#44489, which enables the same behavior in the Docker daemon.
As far as I remember the matrix was set up like that for a reason; I was always hesitant to add in explicit forward compat (higher-versioned guest on lower host) as there may be weird things that fail. The very first thing I tried on an rs5 machine was whether we could launch a 19h1 UVM, and that worked, so it's been "a thing" for a while 😳. cc @kevpar, as I vaguely remember us looking into this.

Agreed forward compat might be risky, looping in @msscotb and @helsaawy for their take. IMO backwards compat (older guest image on newer host) is a pretty legitimate use case though and should be considered.

100% agree on backcompat, I'm happy to see this!
```go
			match: false,
		},
		{
			platform: imagespec.Platform{
				Architecture: "amd64",
				OS:           "windows",
			},
			// If there is no platform.OSVersion, we assume it can run:
```
Fair point... are there many Windows images without this? What did Docker do here?
The current behavior in upstream Docker is to accept images with empty or malformed OSVersion strings (i.e. if strings.Split(OSVersion, ".") yields fewer than 3 components).
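To illustrate the Docker-side behavior described above, here is a minimal sketch; the helper name is mine, not Moby's, and this is only the "is the OSVersion specific enough to match on?" check, assuming the major.minor.build convention of Windows version strings:

```go
package main

import (
	"fmt"
	"strings"
)

// osVersionIsSpecific reports whether an image's OSVersion string has at
// least the major.minor.build components and can therefore participate in
// build-number matching. Empty or malformed strings fail this check, and
// the image is then accepted unconditionally (hypothetical helper).
func osVersionIsSpecific(osVersion string) bool {
	return len(strings.Split(osVersion, ".")) >= 3
}

func main() {
	fmt.Println(osVersionIsSpecific("10.0.17763.3650")) // full version: true
	fmt.Println(osVersionIsSpecific(""))                // empty: false, so accept
	fmt.Println(osVersionIsSpecific("10.0"))            // truncated: false, so accept
}
```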
Wanna wait to hear @helsaawy and @msscotb's take on forward compat before reviewing too much more 😬 I hit the same realization as you pretty early on that it does seem to work, but I never took it for granted haha. I'd probably err on the side of using that compat matrix on the site as the source of truth for now; did we need lower host -> higher guest for a feature?

The primary driver for the Docker PR was allowing desktop users with updated Windows 10 hosts to run Server 2022 containers for dev purposes. At the end of the day HCS will have the final say on whether or not the container will run, so the most "damage" this could do is waste some time/storage by downloading an image which may not run in the end.

Speaking on the logic side, we agreed it was acceptable to allow backward and forward compat on the moby side, since HCS will ultimately verify.

SGTM. I'll give this a look sometime this week then.
Integration tests aren't happy with this 😞:
It's really weird that 2022 seems to be the unhappy one but 2019 isn't. My best guess currently is that the change in the platform comparison logic leads to containerd considering the 2019 manifest of the busybox:1.29-2 test image "good enough" and trying to run that in standard process isolation mode (which fails). I'll try to make the comparison logic prefer build numbers closer to the host's, but I think that would further complicate the platform matcher logic on Windows.
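The "prefer builds closer to the host" idea mentioned above could be sketched as a sort predicate like the following; this is a hypothetical helper of my own naming, not code from the PR, assuming candidates have already passed the compatibility check:

```go
package main

import "fmt"

// closerBuildLess reports whether candidate build a should be preferred
// over candidate build b for the given host build, by comparing absolute
// distance from the host (hypothetical ordering helper, not merged code).
func closerBuildLess(hostBuild, a, b int) bool {
	da, db := hostBuild-a, hostBuild-b
	if da < 0 {
		da = -da
	}
	if db < 0 {
		db = -db
	}
	return da < db
}

func main() {
	// Host is WS2022 (20348): its own build should outrank WS2019's (17763),
	// so the 2022 manifest would be chosen over the 2019 one.
	fmt.Println(closerBuildLess(20348, 20348, 17763)) // true
}
```

Such an ordering would make a WS2022 host pick the 2022 manifest of busybox:1.29-2 instead of falling back to the 2019 one.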
Following the addition of Hyper-V isolation, it should be possible for Windows hosts to run container images which are older or newer than the host build if the Windows host is version RS5 (1809) or above.

This patch updates the Windows platform matching logic to allow for build number mismatches between Windows images and hosts iff the host version is at least RS5.

Signed-off-by: Nashwan Azhari <nazhari@cloudbasesolutions.com>
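The relaxed rule the commit message describes can be sketched roughly as follows; the function and constant names are mine, not the actual containerd implementation, and HCS still has the final say at container-start time:

```go
package main

import "fmt"

// rs5Build is the build number of Windows Server 2019 / version 1809 (RS5).
const rs5Build = 17763

// buildsCompatible sketches the relaxed matching rule: an exact build
// match always runs (plain process isolation), and mismatched builds are
// tolerated only when the host is RS5 or newer, relying on Hyper-V
// isolation to bridge the gap (hypothetical helper).
func buildsCompatible(hostBuild, guestBuild int) bool {
	if hostBuild == guestBuild {
		return true // process isolation works, no Hyper-V needed
	}
	return hostBuild >= rs5Build
}

func main() {
	fmt.Println(buildsCompatible(19045, 20348)) // W10 host, WS2022 guest: true
	fmt.Println(buildsCompatible(14393, 17763)) // pre-RS5 WS2016 host: false
}
```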
I'm surprised to see that the implementation for this new functionality didn't include a new platform matcher that had Hyper-V-isolated-container-specific logic and was selected at the appropriate times. Adding a new platform matcher for Windows Hyper-V isolated containers was even suggested here: #7431 (comment). I'm concerned that changing how the 'default' platform matcher behaves when performing image pulls on Windows could cause bad user experiences.
#7431 flew under my radar as I was tunnel-visioned on enabling running any image under Hyper-V, though I reached the same conclusion you did on there currently not being a good way to inject a matcher during the pull process, and I did not see a cleaner course of action which doesn't involve a refactor. Even if specifying a matcher were possible, there would be some contexts (e.g. standalone ctr pulls) where one couldn't be provided anyway.

Either way, I agree the current approach of overriding the default matcher is not ideal, so I'll convert this PR to a draft and at the very least separate the matching logic into a dedicated matcher.
In the standalone ctr pull scenario, the user would know if they want to run in Hyper-V and so would specify that info. Is that a reasonable expectation?
@marosset @jsturtevant @aznashwan I am not sure we can support different matchers for WCOW process and Hyper-V isolation right away during image pull, as we don't yet have that information, right? I was looking into the suggestion made here: #7431 (comment) - I don't see the PodSandboxConfig having a field for runtime handler. Are you suggesting that we introduce a new field to WindowsPodSandboxConfig to indicate whether it's a Hyper-V isolated pod, just like we have a field for HPC under WindowsSandboxSecurityContext?

For the short term, I think loosening up the Windows platform matcher logic to not block on an image pull would help a lot while we work on a long-term approach to specify the runtime class at image pull itself.
If I remember correctly, the PullImage CRI request has an annotation for which a HyperV-ness value was defined. It's mentioned in one of the various discussions here, but I can't immediately lay my hands on it. (Edit: #6491 (comment))

It bugs me that HyperV-ness is handled differently from Host Process Containers, but I think that's water under the bridge now. (Either both should be runtime handlers, or both should be part of the sandbox security context.)

Unfortunately, #7431 (comment) discovered that there's at least one underlying use of the 'default platform matcher' (and others have been mentioned in passing) which would need to be cleaned up before the intended "choose platform matcher based on CRI request" implementation will work. In the meantime, referencing a specific image (rather than an image index) should work, and if any platform matchers are invoked in that flow, I'd consider it a bug to be fixed.
Using the annotation field to specify which mode/runtime class to pull the image for would be a bigger change that needs kubelet and other tools to also change, right? It's probably a longer-term fix that needs a KEP. The idea of this PR is to bring in a shorter-term fix where we loosen up the default Windows platform matcher to allow an image to be pulled. In either case I think we should not block during the pull - we should pull an image based on what we think is best from the index. If HCS is not happy with what is being used while creating/starting a container, then we can choke at that level instead of blocking the pull.

@jsturtevant @marosset @kevpar thoughts?
My understanding of the direction taken at this point is the other way 'round: if the user is doing something other than process isolation, then they can override the sandbox image and any other image indexes they use to reference the exact OS version image they want, but the default approach is to make process isolation the "works by default" case. That decision may have been from back when k8s explicitly did not support Hyper-V isolation on Windows, though. (I suspect they still don't, which would be a strong argument for me to keep the default behaviours aligned around process isolation, even though it's a trade-off of less security for more performance and hence is not always the best choice.)

Hyper-V isolation also has a further complication (seen in #8348): you need to pull the correct sandbox image OS version based on the selected OS version of other images in the sandbox, and CRI cannot currently represent that; so choosing to make Hyper-V pick the latest build in existence by default will then only work correctly if every image index in the pod has an image available for the same OS version as its own latest build. At some point we might be able to improve that by arranging to cross-check pulls for the same Hyper-V-isolated sandbox so they all have the same build, but my memory of CRI flows is that the only image we know of when processing a CRI pull request is the sandbox, so at best we can ensure images taken from an image list are all compatible with it, which is probably about right.

Anyway, without some more time in the code and for consideration, I don't think Hyper-V-isolation-supporting pulls should be the default, as there are still conflicting use-cases about which one to pick from a list in those cases, and data-flow in CRI makes that hard to resolve nicely. Process-isolation by default is the zero-config option. If you have to configure HyperV isolation as a runtime anyway, perhaps that's a place to specify the build-number to use, which can then be used for the platform matcher for both the sandbox image and images pulled for that runtime.
"Process-isolation by default is the zero-config option. If you have to configure HyperV isolation as a runtime anyway, perhaps that's a place to specify the build-number to use, which can then be used for the platform matcher for both the sandbox image and images pulled for that runtime." -- This is the option we are exploring for the long term (users can specify a runtime class, and the runtime class will have options to specify the platform, OS version, and sandbox isolation for us to decide which platform handler to call into). But this would require a KEP and might be more than a year's worth of work to get all the code checked in after the KEP is approved. Till such time, I think loosening up the image pull is probably the only feasible short-term option left?!
I meant using different runtime classes for different platform versions. That'd be transparent to k8s (it just sees the runtime class name).

Same with the image-pull annotation: it's already populated from the Pod annotations by kubelet, so it's user-exposed now and requires no k8s changes to consume in containerd-cri. The ideal would be that the pull operation gets explicit access to the runtime_handler name (#6657 (comment)), which does require a KEP. In the meantime, containerd 1.7 already implements an annotation for this. It was implemented for per-runtime Snapshotters, but handily that means all the flows external to containerd-cri are already handled and being improved. I'm not aware of per-runtime Snapshotters being used in the wild yet; it's a new feature of containerd 1.7.0 (#6899), so I can't point at any existing user-facing docs that demonstrate this.

So I think the soonest-effective change to make to achieve the goal here is for the containerd-cri runtime class config to be able to specify a PlatformMatcher like it can specify a Snapshotter, and to ensure we fix any code which unconditionally uses a default platform matcher so it honours that setting. I disagree with making the default wider because it'll break existing workflows, which do not specify a runtime class.
If we have this information in the CRI code path, couldn't we select the appropriate matcher via code (this was the approach @marosset took)? I guess specifying it via configuration makes it more extensible?

After a few other conversations, I agree we can't make it wider.
Right now, the containerd code doesn't even know that a runtime is going to use Hyper-V isolation; that config option is actually part of an opaque blob passed down into hcsshim, and we should avoid parsing it in containerd, particularly at higher layers like this. (AFAIR we do parse it in order to add debug flags at one point, but that's down near the actual call to the runtime shim. containerd-cri should not know about this at all.)

So we'd need to expose something to containerd config, and I personally think it'd be better to expose a platform matcher choice rather than trying to expose a bunch of knobs that affect a single platform matcher, since once we bring LCOW into the mix, we need both os (and in future maybe os.arch) selection options, and os.version ranges, and that config probably needs a way to say "delta from host build" to make it reasonably cut-and-pastable; explicitly specifying the current default (process isolation, match host for all of os/arch/build) would be surprisingly verbose. That said, the range of possible platform matcher implementations is also wide, so maybe that won't be a user-nice experience either if we end up with a dozen options. So we might need a few platform matcher implementations plus some specific config options for those implementations. Which can also be the worst of both worlds, so it needs careful consideration.

Either way, some actual iteration will be needed to pull together the various existing and mooted platform matcher needs. I do think it will be unavoidable to expose them to configuration somehow, because we already have multiple feature requests that require different results for the same ImagePull request even if HyperV-ness is known, e.g. this one and #8348: AFAICT they're both valid use-cases, so they can't be resolved by simply changing behaviour based on the Hyper-V isolation flag.
(Todo: Track down the ticket where the user wanted the platform matcher to prefer an older os.build that was already in the content store over a newer os.build that would need to be downloaded, as it's somewhat related here.)

#7431 (comment) was in the context of Host-Process Containers, which containerd-cri does know about (it's a CRI field), and so we can automatically select a specific platform matcher for that case. HPC also has pretty unambiguous rules: we can take any image matching os/arch, as long as it has either an os.version <= the host's, or no os.version at all. (Arch only matters when Windows ARM containers exist again, and we may not need an arch match depending on how emulation works in that hypothetical future. We can unburn that bridge when we come to it.) We do need to fix the bugs that comment revealed of using the default platform matcher inappropriately, interfering with an override.
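The unambiguous HPC rule stated above could be sketched like this; the helper name and parsing details are mine (hypothetical, not containerd's actual matcher), assuming OSVersion strings follow the major.minor.build convention:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hpcImageAllowed sketches the Host-Process Container rule: any image
// already matching the host's os/arch is allowed if its build number is
// at most the host's, or if it carries no OSVersion at all
// (hypothetical helper, not the real implementation).
func hpcImageAllowed(hostOSVersion, imageOSVersion string) bool {
	if imageOSVersion == "" {
		return true // no os.version at all: always acceptable
	}
	host := strings.Split(hostOSVersion, ".")
	image := strings.Split(imageOSVersion, ".")
	if len(host) < 3 || len(image) < 3 {
		return true // malformed versions are not blocked by this check
	}
	hostBuild, err1 := strconv.Atoi(host[2])
	imageBuild, err2 := strconv.Atoi(image[2])
	if err1 != nil || err2 != nil {
		return true
	}
	return imageBuild <= hostBuild // image build must not exceed the host's
}

func main() {
	fmt.Println(hpcImageAllowed("10.0.20348", "10.0.17763")) // older image: true
	fmt.Println(hpcImageAllowed("10.0.17763", "10.0.20348")) // newer image: false
	fmt.Println(hpcImageAllowed("10.0.20348", ""))           // no version: true
}
```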
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This PR is stale because it has been open 90 days with no activity. This PR will be closed in 7 days unless new comments are made or the stale label is removed. |
This PR was closed because it has been stalled for 7 days with no activity. |