Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add format for platform string #6

Merged
merged 1 commit into from
Apr 30, 2024

Conversation

kiashok
Copy link
Contributor

@kiashok kiashok commented Jan 31, 2024

Platform string to be of the form <os>[(<osversion>)]|<arch>|<os>[(<OSVersion>)]/<arch>[/<variant>]

See discussion in this PR: containerd/containerd#9609

PLEASE NOTE: The intent of this PR is NOT to solve the pull problems that exist for hyperV/process isolated containers OR to solve any platform matcher issue. This is plainly to fix the bug reported earlier to stop auto populating the host OS version in the formatting functions by using the grammar mentioned above (it gives the user a chance to specify the OSVersion if they want to. If one if not specified, this change will not autofill the host OS Version and will instead leave it empty. If the platform matchers do not like this, it is for them to handle it and not the platforms.Parse() or platforms.Format() functions).

@kiashok
Copy link
Contributor Author

kiashok commented Jan 31, 2024

Adding everyone who were part of discussion in containerd PR here : @dmcgowan @cpuguy83 @aravindhp @jsturtevant @TBBle @tonistiigi @thaJeztah

@tianon
Copy link
Member

tianon commented Feb 1, 2024

Not as an opinion, but just to share prior art: https://github.com/google/go-containerregistry/blob/8dadbe76ff8c20d0e509406f04b7eade43baa6c1/pkg/v1/platform.go#L54-L75 (os:version/architecture/variant)

@kiashok kiashok changed the title Add grammar/format for platform string Add format for platform string Feb 1, 2024
@kiashok kiashok force-pushed the addFormatParsing branch 3 times, most recently from 8721eff to a11cef4 Compare February 1, 2024 03:55
@dmcgowan
Copy link
Member

dmcgowan commented Feb 1, 2024

@tianon I almost like it but it seems to actually be os/architecture/variant:version, which older parsers would attribute to architecture or variant. The format you suggested doesn't seem too bad either if that is easier to parse. I do like having the :version always parsed with the os (before the first slash) so even on older parser any error would show up as an unknown OS with the version.

@dmcgowan
Copy link
Member

dmcgowan commented Feb 1, 2024

Thanks @kiashok for opening this here!

@kiashok
Copy link
Contributor Author

kiashok commented Feb 1, 2024

@tianon I almost like it but it seems to actually be os/architecture/variant:version, which older parsers would attribute to architecture or variant. The format you suggested doesn't seem too bad either if that is easier to parse. I do like having the :version always parsed with the os (before the first slash) so even on older parser any error would show up as an unknown OS with the version.

@dmcgowan does this current change for os(osversion) look good to you or do you want me to change to os:osversion ?

database.go Outdated Show resolved Hide resolved
@cpuguy83
Copy link
Member

cpuguy83 commented Feb 2, 2024

I still think its safer to include key=value inside of the (), i.e. windows(version=10.0.20348.2227).
I'm not sure how to include os.features in there, I'd expect comma separated from version=<v> but then how do you represent a list like os.features.

@kiashok kiashok force-pushed the addFormatParsing branch 2 times, most recently from f585855 to 508d5d9 Compare February 2, 2024 01:33
@TBBle
Copy link

TBBle commented Feb 2, 2024

There's only one known value for os.features, and no one uses it (the win32k feature is always available as of like 2018 with the death of bare-metal nanoserver, and no one validates against it anyway AFAIK), so I'd suggest ignoring os.features, beyond acknowledging that we future changes may introduce multiple os sub-elements in the OCI Image Spec Platform object at some point.

os.features being an optional list was a surprising hassle to deal with for simply copying Platform objects in Go (due to overlap in how the Go JSON support handles empty or absent lists, and the possibility for accidentally mutating other copies) so I'm hoping that "list of strings" doesn't happen in the future. (Also, comparison semantics for the list were never defined, since there was only ever 0 or 1 element...)

Similarly, we might get rid of os.features itself in the future, either just dropping it, or if it gets subsumed into whatever the OCI Image Compat WG comes up with as their generalised approach. OCI Image Compat WG appears to be discussing hierarchical trees of features (in a separate artifact) and are not yet (AFAIK) looking at how that would interact with Image Indexes or the Platform Object: the topic has been raised a couple of times, so it's not forgotten, just at the far end of the process for now.

(Note, I seem to have misplaced my Slack link, so my OCI Image Compat WG notes here are based on perusing the docs and skimming the meeting transcripts. If anyone is better-connected at the moment, feel free to correct me.)

@kiashok
Copy link
Contributor Author

kiashok commented Feb 2, 2024

There's only one known value for os.features, and no one uses it (the win32k feature is always available as of like 2018 with the death of bare-metal nanoserver, and no one validates against it anyway AFAIK), so I'd suggest ignoring os.features, beyond acknowledging that we future changes may introduce multiple os sub-elements in the OCI Image Spec Platform object at some point.

os.features being an optional list was a surprising hassle to deal with for simply copying Platform objects in Go (due to overlap in how the Go JSON support handles empty or absent lists, and the possibility for accidentally mutating other copies) so I'm hoping that "list of strings" doesn't happen in the future. (Also, comparison semantics for the list were never defined, since there was only ever 0 or 1 element...)

Similarly, we might get rid of os.features itself in the future, either just dropping it, or if it gets subsumed into whatever the OCI Image Compat WG comes up with as their generalised approach. OCI Image Compat WG appears to be discussing hierarchical trees of features (in a separate artifact) and are not yet (AFAIK) looking at how that would interact with Image Indexes or the Platform Object: the topic has been raised a couple of times, so it's not forgotten, just at the far end of the process for now.

(Note, I seem to have misplaced my Slack link, so my OCI Image Compat WG notes here are based on perusing the docs and skimming the meeting transcripts. If anyone is better-connected at the moment, feel free to correct me.)

I agree. I am not sure we need to include osFeature as it is not being used today and I have heard conversations of wanting to remove that field from ocispec as well.

@kiashok
Copy link
Contributor Author

kiashok commented Feb 2, 2024

I still think its safer to include key=value inside of the (), i.e. windows(version=10.0.20348.2227). I'm not sure how to include os.features in there, I'd expect comma separated from version=<v> but then how do you represent a list like os.features.

Hmm I think the format is mostly defined in code and the rest of the fields are not prepended with "os", "arch" or "variant". So no not having "version" would make it uniform and consistent with the rest of the grammar for platform string.

@TBBle
Copy link

TBBle commented Feb 3, 2024

Off the top of my head, for Windows, the host OS should always have an os.version value, but a container image may have a meaningful empty/absent os.version value (but no distinction between empty-string and absent).

So for the current situation of only os.version, not labelling the version is fine: windows(10.0.20348) and windows but in future (I'm going to use os.features as an example), do we need to worry about/avoid situations like windows(10.0.20348/win32k) and windows(/win32k)?

Also, only a mild concern for this PR, but what about cases where absence and empty-string need to be distinguished? The discussion on the earlier PR mooted such a case when parsing these strings, in that we need to inform the caller of platforms.Parse whether the user-provided platform string (i.e. --platform arg for nerdctl pull) included an empty version (requesting a Host Process Container-compatible image) or not specifying a version (probably meant "Best choice for current OS"). With the current syntax, that requires distinguishing windows and windows() in platforms.Parse somehow. (nerdctl build and nerdctl run potentially have different behaviours here, which isn't ideal, so this is only a moot.)

This case is hard because Go's Platform object doesn't have a way to directly make that distinction, and also tells the JSON codec not to make that distinction, but the spec allows it. So I don't have a better solution in-mind right now than either creating a special token string which is used for either the windows or windows() case, or extending platforms.Parse to return information about which values were not specified. (Which would also apply to inferred arch/os/variant values too although I don't currently have a use-case for that).

@kiashok
Copy link
Contributor Author

kiashok commented Feb 6, 2024

Off the top of my head, for Windows, the host OS should always have an os.version value, but a container image may have a meaningful empty/absent os.version value (but no distinction between empty-string and absent).

So for the current situation of only os.version, not labelling the version is fine: windows(10.0.20348) and windows but in future (I'm going to use os.features as an example), do we need to worry about/avoid situations like windows(10.0.20348/win32k) and windows(/win32k)?

Also, only a mild concern for this PR, but what about cases where absence and empty-string need to be distinguished? The discussion on the earlier PR mooted such a case when parsing these strings, in that we need to inform the caller of platforms.Parse whether the user-provided platform string (i.e. --platform arg for nerdctl pull) included an empty version (requesting a Host Process Container-compatible image) or not specifying a version (probably meant "Best choice for current OS"). With the current syntax, that requires distinguishing windows and windows() in platforms.Parse somehow. (nerdctl build and nerdctl run potentially have different behaviours here, which isn't ideal, so this is only a moot.)

This case is hard because Go's Platform object doesn't have a way to directly make that distinction, and also tells the JSON codec not to make that distinction, but the spec allows it. So I don't have a better solution in-mind right now than either creating a special token string which is used for either the windows or windows() case, or extending platforms.Parse to return information about which values were not specified. (Which would also apply to inferred arch/os/variant values too although I don't currently have a use-case for that).

if for example you do a ctr image pull and --platform is windows or windows() it means that osversion is not given and windows image pull will fail at platform matcher as no host OSVersion can be "" . We should not support a behavior of empty host OS version to mean that they wanted a host process container compatible image. For windows, this should be an error. If HPC compatible image was needed, an appropriate image for that same much be constructed with OSVersion "" .

For the OSFeature, I am not sure we need to cover all its cases as we do not know what it could entail as its type is []string and it is not being used and has plans to be removed. If it starts being used, we can revisit then IMO. If in the future it does become needed, I think it might be more helpful to have a separate field for it rather than trying to fit it into OS(OSVersion)?

cc @jsturtevant @kevpar

@TBBle
Copy link

TBBle commented Feb 6, 2024

if for example you do a ctr image pull and --platform is windows or windows() it means that osversion is not given and windows image pull will fail at platform matcher as no host OSVersion can be "" . We should not support a behavior of empty host OS version to mean that they wanted a host process container compatible image. For windows, this should be an error. If HPC compatible image was needed, an appropriate image for that same much be constructed with OSVersion "" .

So I understand, are you suggesting that in this case, --platform windows and --platform windows() should parse to the current host version, and if you want to explicitly say "Host Process-compatible images only" (or "data-only image", although the value of that use-case for in an image index is unclear to me) you use --platform windows("")?

I'm not sure that an empty osversion in --platform would cause an image pull failure, since any osversion value in an image should match an empty osversion in the host, as Host Process Containers can run with any image. However, I'm not totally sure on the semantics of nerdctl pull --platform; it'd be nice to be consistent with nerdctl run and nerdctl build, but the former has --privileged which would make more sense to use as the "Host Process Container" activator, while nerdctl pull and nerdctl build both have use-cases for building non-runnable containers that don't exist with nerdctl run. (So maybe nerdctl build and nerdctl run already need different semantics for --platform here...)

@kiashok
Copy link
Contributor Author

kiashok commented Feb 7, 2024

if for example you do a ctr image pull and --platform is windows or windows() it means that osversion is not given and windows image pull will fail at platform matcher as no host OSVersion can be "" . We should not support a behavior of empty host OS version to mean that they wanted a host process container compatible image. For windows, this should be an error. If HPC compatible image was needed, an appropriate image for that same much be constructed with OSVersion "" .

So I understand, are you suggesting that in this case, --platform windows and --platform windows() should parse to the current host version, and if you want to explicitly say "Host Process-compatible images only" (or "data-only image", although the value of that use-case for in an image index is unclear to me) you use --platform windows("")?

I'm not sure that an empty osversion in --platform would cause an image pull failure, since any osversion value in an image should match an empty osversion in the host, as Host Process Containers can run with any image. However, I'm not totally sure on the semantics of nerdctl pull --platform; it'd be nice to be consistent with nerdctl run and nerdctl build, but the former has --privileged which would make more sense to use as the "Host Process Container" activator, while nerdctl pull and nerdctl build both have use-cases for building non-runnable containers that don't exist with nerdctl run. (So maybe nerdctl build and nerdctl run already need different semantics for --platform here...)

@TBBle an example of a base image for HPC is as follows:

image

they do not have an OSVersion. Windows platform matchers need the host where image is being pulled for example or whatever string is mentioned for --platform to have the osversion. If one is not present, it will fail pulling layers. A simple test of pulling nanoserver 2022 with transfer service (ctr image pull by default used transfer service) and inspecting the content and image store will show that layer is not pulled as windows platform matcher fails.
When platforms pkg was part of containerd, this change was made to fill in the osversion as the default host osversion if one was specified : but we got feedback here containerd/containerd#9609 that this should not be done by default and needs to be changed.

@TBBle
Copy link

TBBle commented Feb 8, 2024

My point was that (as far as I recall; I don't have a HPC execution setup at-hand right now) I can also run the nanoserver LTSC 2022 image or images derived from it as a Host Process Container. So for a few-hundred megabytes, I don't need to build separate images for privileged and non-privileged execution.

So in my mind, nerdctl run --privileged mcr.microsoft.com/windows/server:ltsc2022 should naturally produce a HPC-based container, with an entire (unused) Windows nanoserver OS copy mounted at my container root.

So in this case, the host is "" (which thanks to the joys of Go's JSON parser is equal to "no os.version value"), and it should be able to fall-back to an image with os.version 10.0.20348 (or any other value) if an image with absent, null, or empty-string os.version is not available in the target image index.

The feedback at containerd/containerd#9609 is not that "this pull should not work", the feedback is that "this pull should not be made to work by fudging in the host os.version when parsing a platform string or receiving Platform objects from callers". Only the caller of platform.Parse knows if inserting particular details from the host into the parsed string is appropriate, based on what they're going to do with it, and later users (like the Platform Matcher) should not then be overriding the caller's choices in the Platform object.

Basically, Host Process Containers need a different Platform Matcher behaviour, like Hyper-V Isolated containers do. "Exact match" is only correct for Process Isolated containers. We should not be trying to massage the input inside the Platform Matcher so that an "Exact Match" check will choose the right image; Platform Matchers are customisation points, the data should reflect truth. (Sadly, the Platform Matcher API doesn't provide a way to control the rules used, and AFAIK there's still some outstanding bugs or design issues when using a non-default Platform Matcher... and gosh, adding test coverage and fixing that has been on my TODO list for much longer than I realised. >_<)

For nerdctl run or nerdctl build (or nerdctl mount if that was a thing), it's actually pretty easy to describe the semantics: Host Process Mode would be controlled by the --privileged flag as you'd expect, and if you point it at an image list, the correct image can be chosen based on user-intent.

The challenge which I raised at the start of this discussion is that this user-intent is not visible in nerdctl pull or ctr pull: nerdctl pull --privileged is semantically weird, and if you're pulling the image so you can mount it and grab a binary out (the containerd managed opt use-case), you probably want the smallest image, assuming the image list is the same binary wrapped up for multiple platforms.

We've seen the exact same issue in CRI, that the "ImagePull" operation hits a snag when the correct image to pull varies based on what's going to be done with it later. In nerdctl, assuming that the nerdctl pull is for a process-isolation nerdctl run isn't a bad assumption, but we can't make that assumption in containerd logic. The primary BuildKit maintainer seemed pretty clear that if you pass a --platform to it, e.g., via nerdctl build, you should get the same result irrespective of the host platform, so the needed handling is different there.

@kiashok
Copy link
Contributor Author

kiashok commented Feb 8, 2024

My point was that (as far as I recall; I don't have a HPC execution setup at-hand right now) I can also run the nanoserver LTSC 2022 image or images derived from it as a Host Process Container. So for a few-hundred megabytes, I don't need to build separate images for privileged and non-privileged execution.

Yes you can run definitely run the nanoserver as base but the only need for HPC is that the container image OSVersion can be "". There is no requirement that host OSVersion should or can be "" too. That is what I was meaning to put across. Since the other containerd PR suggested that we do not auto populate OSVersion's if it came in as empty and if we want to assume that host OSVersion being empty/"" and means that we assume they are asking for HPC image is behavior change that needs to be made in the windows platform matcher and not in containerd/platforms.

So in my mind, nerdctl run --privileged mcr.microsoft.com/windows/server:ltsc2022 should naturally produce a HPC-based container, with an entire (unused) Windows nanoserver OS copy mounted at my container root.

So in this case, the host is "" (which thanks to the joys of Go's JSON parser is equal to "no os.version value"), and it should be able to fall-back to an image with os.version 10.0.20348 (or any other value) if an image with absent, null, or empty-string os.version is not available in the target image index.

I am not familiar with nerdctl behavior and would have to review that before I have a thought on that behavior vs containerd windows platform matcher.

The feedback at containerd/containerd#9609 is not that "this pull should not work", the feedback is that "this pull should not be made to work by fudging in the host os.version when parsing a platform string or receiving Platform objects from callers". Only the caller of platform.Parse knows if inserting particular details from the host into the parsed string is appropriate, based on what they're going to do with it, and later users (like the Platform Matcher) should not then be overriding the caller's choices in the Platform object.

I don't think I meant that the PR said that :) I was referring to the fact that the feedback on that PR suggested that we do not auto populate OSVersion in platforms package if it was passed in as "" and instead it is a behavior that the windows platform matcher should take responsibility for.

Basically, Host Process Containers need a different Platform Matcher behaviour, like Hyper-V Isolated containers do. "Exact match" is only correct for Process Isolated containers. We should not be trying to massage the input inside the Platform Matcher so that an "Exact Match" check will choose the right image; Platform Matchers are customisation points, the data should reflect truth. (Sadly, the Platform Matcher API doesn't provide a way to control the rules used, and AFAIK there's still some outstanding bugs or design issues when using a non-default Platform Matcher... and gosh, adding test coverage and fixing that has been on my TODO list for much longer than I realised. >_<)

For nerdctl run or nerdctl build (or nerdctl mount if that was a thing), it's actually pretty easy to describe the semantics: Host Process Mode would be controlled by the --privileged flag as you'd expect, and if you point it at an image list, the correct image can be chosen based on user-intent.

The challenge which I raised at the start of this discussion is that this user-intent is not visible in nerdctl pull or ctr pull: nerdctl pull --privileged is semantically weird, and if you're pulling the image so you can mount it and grab a binary out (the containerd managed opt use-case), you probably want the smallest image, assuming the image list is the same binary wrapped up for multiple platforms.

We've seen the exact same issue in CRI, that the "ImagePull" operation hits a snag when the correct image to pull varies based on what's going to be done with it later. In nerdctl, assuming that the nerdctl pull is for a process-isolation nerdctl run isn't a bad assumption, but we can't make that assumption in containerd logic. The primary BuildKit maintainer seemed pretty clear that if you pass a --platform to it, e.g., via nerdctl build, you should get the same result irrespective of the host platform, so the needed handling is different there.

Inside a UVM for hyperv cases too it is just like a process isolated container - exact match of the UVM image and container image is still needed.
For process isolated we need exact host and guest match. For HyperV there is still a compat check between the host and UVM image (KEP 4216 has more details about the compat matrix and what UVMs are supported on which Windows host OS) + UVM image and the container image needs an exact match. There is already in KEP 4216 in v1.29 k8s and I have some PRs out in containerd to support the changes needed for this KEP (PRs are being broken to smaller ones for easier review and changes are still in progress). This KEP is to help support different behaviors for process vs hyperV vs HPC and give more flexibility.

I just don't think that the parser functions in containerd/platform is the right place to address all of this. The functions in containerd/platform repo should barely only parse what is given and set the ocispec.Platform fields or convert them to a string representation as needed.
This PR is only to set a grammar for the string representation of ocispec.Platform and not to address platform matching of windows. The latter should be handled in containerd/containerd and is already in the works in order to address kep 4216/image pull per runtime class.

@TBBle
Copy link

TBBle commented Feb 8, 2024

Yes you can run definitely run the nanoserver as base but the only need for HPC is that the container image OSVersion can be "". There is no requirement that host OSVersion should or can be "" too. That is what I was meaning to put across.

That makes sense; reading back, I think we might have been agreeing at cross-purposes.

I was assuming that if I was executing in Host Process mode, the Host's Platform object would have empty OSVersion, but hadn't really questioned that assumption. A HPC-specific Platform Matcher would never look at the host's OSVersion (it's irrelevant) so carrying the host's OSVersion isn't a problem, i.e. the DefaultPlatform object can be sensibly used.

There was some discussion that when BuildKit is told to build for --platform windows/amd64, it would not use the host's OSVersion, but some fixed OSVersion value, and maybe that value would be "". However, that was mostly mooting, and isn't really a platforms.Parse issue, it's up to BuildKit (in this case) to decide if and how to fill in unspecified Platform object fields after parsing.

@kiashok
Copy link
Contributor Author

kiashok commented Feb 13, 2024

@dmcgowan @mikebrow @TBBle @cpuguy83 could you please take a look when you have some time please? This change would be needed for image pull per runtime class : containerd/containerd#9832. Thanks!

@@ -18,7 +18,7 @@ package platforms

// DefaultString returns the default string specifier for the platform.
func DefaultString() string {
return Format(DefaultSpec())
return FormatAll(DefaultSpec())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is changing defaults here.

/cc @tonistiigi @tianon

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems expected to me, but highlights a possible issue.

A quick survey of the containerd code calling this function suggests that by-and-large it's used to hold a string that is given back to Parse later; so in the same process it's fine, and in fact due to the removal of auto-insertion of Windows host OS version in Parse, this one needs to be FormatAll or we're changing the observed behaviour of Parse(DefaultString()) on Windows.

The Windows v2 Runtime manager's supported platforms list might need to be hard-coded though. It already hard-codes linux/amd64 (that should probably be linux/$(GOARCH)...?).

Some cases however are in the client library, and perhaps the string is being processed in containerd on the other side of the (stable API!) remote connection. If that's the case then depending on the direction of mismatch, we're either generating strings old containerd can't parse, or generating strings which new containerd will parse into a platform object unexpectedly absent OS version.

I haven't checked to confirm that the stable API actually contains these platform-as-strings, the ones I was able to trace through GitHub's WebUI seemed to still be Parsed in the client library code, so maybe this problematic case doesn't exist here.

Direct consumers of this library will be affected by the defaults change, but AFAIK this library is not 1.0-versioned?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussion in a Moby maintainers call, and there were still some concerns if this could break things 🤔 (we should probably check in what cases)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmcgowan should we cut "1.x" tag from this repo or change DefaultString() to use Format() and have a new function that uses FormatAll() ?

database.go Outdated Show resolved Hide resolved
platforms_test.go Show resolved Hide resolved
platforms_test.go Show resolved Hide resolved
platforms.go Outdated Show resolved Hide resolved
@kiashok kiashok force-pushed the addFormatParsing branch 2 times, most recently from 220143f to caa7d77 Compare April 8, 2024 19:37
@kiashok
Copy link
Contributor Author

kiashok commented Apr 8, 2024

@thaJeztah review comments addressed. Could you please take a look and sign off if changes look ok? Thanks!

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed there were some stray calls to normalizeOSAndVersion that are no longer needed (but would result in parsing multiple times)

One other comment about the redundant parsing in normalizeOSAndVersion itself (which could save for another regex being executed altogether), but that's less critical.

platforms.go Outdated Show resolved Hide resolved
platforms.go Outdated Show resolved Hide resolved
platforms.go Outdated Show resolved Hide resolved
database.go Outdated Show resolved Hide resolved
@kiashok kiashok requested a review from thaJeztah April 11, 2024 17:14
@kiashok kiashok dismissed thaJeztah’s stale review April 11, 2024 17:15

responded to review comments

@kiashok
Copy link
Contributor Author

kiashok commented Apr 11, 2024

I noticed there were some stray calls to normalizeOSAndVersion that are no longer needed (but would result in parsing multiple times)

One other comment about the redundant parsing in normalizeOSAndVersion itself (which could save for another regex being executed altogether), but that's less critical.

Responded to the same review comment here:
#6 (comment)

@kiashok
Copy link
Contributor Author

kiashok commented Apr 22, 2024

@dmcgowan can we merge these changes?

@dmcgowan
Copy link
Member

@tianon @thaJeztah Any concern that we should address before merging this PR?

Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM per windows / docker team acceptance of the new default format

defaults.go Outdated Show resolved Hide resolved
Platform string to be of the form
<os>[(<osversion>)]|<arch>|<os>[(<OSVersion>)]/<arch>[/<variant>]
OSVersion is optional only and currently used only by Windows OS.

Signed-off-by: Kirtana Ashok <kiashok@microsoft.com>
@kiashok
Copy link
Contributor Author

kiashok commented Apr 29, 2024

LGTM per windows / docker team acceptance of the new default format

@dmcgowan @mikebrow Can we merge the PR now?

cc @thaJeztah @TBBle @tianon

@tianon
Copy link
Member

tianon commented Apr 30, 2024

@tianon @thaJeztah Any concern that we should address before merging this PR?

I think the addition to the format is fine and makes sense; I'm still very worried about how it's going to fit into the overall API / flows of containerd itself (and whether we're wrong about the number of applications passing these strings across a process or even system boundary, as in moby/moby#47679 for example).

Without more changes, this will be mostly ignored by even ctr, right? (Since the existing vectors don't check the field?)

If maintainers feel strongly that it's fine, I don't feel so strongly to try to block it, but I'm definitely worried that fixing/implementing this functionality properly is going to require looking at a much bigger swath of code and UX points than just this platforms module and CRI.

@kiashok
Copy link
Contributor Author

kiashok commented Apr 30, 2024

@tianon @thaJeztah Any concern that we should address before merging this PR?

I think the addition to the format is fine and makes sense; I'm still very worried about how it's going to fit into the overall API / flows of containerd itself (and whether we're wrong about the number of applications passing these strings across a process or even system boundary, as in moby/moby#47679 for example).

Without more changes, this will be mostly ignored by even ctr, right? (Since the existing vectors don't check the field?)

If maintainers feel strongly that it's fine, I don't feel so strongly to try to block it, but I'm definitely worried that fixing/implementing this functionality properly is going to require looking at a much bigger swath of code and UX points than just this platforms module and CRI.

We need to vendor in this pkg to containerd for supporting image pull per runtime class (kep 4216) which is being requested from Kata and the windows side as well. After vendor, If ctr specifies a platform with the OS version it will be taken into account. If the caller does not specify the OS version, nothing should change. Also, there a Format() and FormatAll() API that folks can use to pick old vs new behavior while converting ocispec.Platform to a string format.
This also fixes the bug we have today where we are forcing OSVersion to be the defaultSpec()

@dmcgowan
Copy link
Member

@tianon I agree with your concern, we should keep this string from creeping into cross system/process APIs. Its fine for it be used in UX cases (CLI or config files) but the full proto/json value should be used in APIs and storage.

I think we can move forward on this PR though since it is just defining the string. More discussion is still needed on the Moby and containerd side where this string might be (mis)used.

@dmcgowan dmcgowan merged commit c1438e9 into containerd:main Apr 30, 2024
7 checks passed
@thaJeztah
Copy link
Member

Sorry for being late (again); my last comment was a quick one as I was in the middle of a maintainers call where we were discussing these changes; I asked others to also post their input, but I forgot to post a summary. Let me post a brief one of what I recall came up;

  • The latest iteration of changes in this PR looked good, and no real concerns
  • The DefaultString() is the one we found to be a potential foot-gun; we found at least one use (in buildx) where it was potentially used in a client (i.e. running on a machine that may not match the daemon / container environment).
  • We could consider deprecating it (in favor of manually doing FormatAll(DefaultSpec()))
  • Or to rename some functions to be more explicitly indicating "this produces a platform for the local environment"
  • At least we should (in follow-ups) improve GoDoc for these functions to be crystal clear on what it does (and when to do / don't use).
  • We should verify if all versions used produce the correct version (ISTR different permutations of obtaining the version can produce the wrong result on binaries that is not manifested)

That last bullet should probably be looked at more widely; from a quick look during that call, we found various existing code in this module / repository that (sometimes implicitly) sets defaults based on the current environment. We should make sure that those functions clearly document how they should be used, but perhaps also look at more explicit options to pass things (OS-version most notably), possibly deprecating "convenience" functions that do so automatically.

In either case; those can be looked at in follow-ups; this change on its own LGTM (sorry for the delay).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants