Pass down resources to CRI #4113
base: master
Conversation
marquiz
commented
Jun 28, 2023
- One-line PR description: KEP for extending the CRI API to pass down unmodified resource information from the kubelet to the CRI runtime.
- Issue link: Pass down resources to CRI #4112
- Other comments:
Co-authored-by: Antti Kervinen <antti.kervinen@intel.com>
@marquiz: GitHub didn't allow me to request PR reviews from the following users: fidencio. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
+ map<string, k8s.io.apimachinery.pkg.api.resource.Quantity> requests = 2;
+ map<string, k8s.io.apimachinery.pkg.api.resource.Quantity> limits = 3;
should the keys here be a special type instead of unstructured?
I don't think it's possible to have something like `type ResourceName string`
in protobuf. Please correct me if I'm wrong.
ping @haircommander, are you satisfied with the reply (close as resolved)?
/retitle Pass down resources to CRI
@marquiz We need to check how this will work with DRA and CDI devices, i.e. whether we have enough information to know which devices need to be added to the sandbox just from the resource claim name.
@marquiz There is already some code for sandbox sizing (accumulation of CPU and memory resources) for reference: kubernetes/kubernetes#104886, which we leverage in Kata. What are the plans for this interface, deprecate or keep it?
Thanks @marquiz.
It would be good to get more concrete details on the use cases that this would enable.
There is also the question of complex devices that are managed by device plugins, where there isn't a clear mapping from the resources entry (e.g. `vendor.com/xpu: 1`) to the resources added to the container, or DRA, where the associated `resources.requests.claims` entry is not mentioned.
#### Story 3
As a cluster administrator, I want to install an OCI hook/runc wrapper/NRI
Could you expand on this use case? How does extending the CRI translate to modifications in the OCI runtime specification which is interpreted by runc (or wrappers)?
The CRI changes (in this KEP) would not directly translate to anything in the OCI config. They are just "informational" data that a possible hook/wrapper/plugin can then use to tweak the OCI config. Say you want to do customized CPU pinning in your plugin. I'll flesh out this section some more...
@elezar I updated Story 3, PTAL
Thanks.
Resolved?
requests:
  cpu: 100m
  memory: 100M
  vendor.com/xpu: 1
For clarification: This does not indicate the properties of the resource that was actually allocated for a container requesting one of these devices?
That's very much true. I think I'll add a note about this in the KEP somewhere
@elezar I added a note about device plugin resources after this example. WDYT?
  WindowsPodSandboxConfig windows = 9;
+
+ // Kubernetes resource spec of the containers in the pod.
+ PodResourceConfig pod_resources = 10;
@MikeZappa87 since you recently shared something along these lines for the networking capabilities, this KEP is also meant to interface with NRI.
Another point to consider is how we're going to integrate (or not) these enhancements with the new containerd Sandbox API.
@zvonkok that one is just the native resources and gives the resources in the "obfuscated" form, i.e. not telling the actual requests/limits (plus it's for Linux resources only). I think we wouldn't, or even couldn't, touch this, i.e. we'd keep it.
With each release make sure we ship a GPU enabled rootfs/initrd. Fixes: kata-containers#6554 DependsOn: kata-containers#6664 kata-containers#6595 kata-containers#6993 kata-containers#6949 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
gpu: reintroduce pcie_root_port and add pcie_switch_port. In Kubernetes we still do not have proper VM sizing at the sandbox creation level. This KEP tries to mitigate that: kubernetes/enhancements#4113, but this can take some time until Kube and containerd or other runtimes have those changes rolled out. Before, we used a static config of VFIO ports, and we introduced CDI support, which needs a patched containerd. We want to eliminate the patched containerd in the GPU case as well. Fixes: kata-containers#8860
- support sidecar containers: instead of separate lists for init and regular containers, have one list and include the type of container (init, sidecar, regular)
- add notes about mounts and devices when describing changes to CreateContainer and UpdateContainerResources requests
- update description of kubelet: more accurate description of what information is included in each CRI request
- fix typos
- kep.yaml: update milestone
Thanks @tallclair for the review
The goal is to not require any changes to existing CRI implementations. The CRI runtime can ignore the data if it doesn't need/want to pre-allocate or pre-optimize resources for the pod. The idea is to enable a kind of "forward lookup" into the future for those who need it. This probably needs to be better communicated in the proposal (and in the API, with comments and naming). Thoughts?
This is a very valid point. The proposal was now changed: instead of separate lists for init and regular containers, it now has one list that contains all containers, each element in the list including the type of container (init, sidecar or regular).
I believe that will cause gray hairs/problems in some scenarios, e.g. in VM sizing and CoCo. For example, how would you aggregate resource limits? Also, you could make better decisions in case an init container requests a lot of resources relative to the regular containers. In CoCo, knowing exactly what resources each container needs helps implement the principle of least privilege/a smaller attack surface (no sharing of unnecessary mounts between containers, for example). Ref e.g.: #4113 (comment)
I pushed an update last week but forgot to leave a comment:
/assign
While sidecars did need to be addressed by the original proposal, it misses the bigger picture I was trying to raise here: exposing this pod lifecycle information to the container runtime will create friction for future k8s changes with pod lifecycle implications. Any pod lifecycle change then becomes a potentially breaking change for the runtime, so we would need to manage runtime version skew in a way we didn't before. This is why I prefer to keep as much of the lifecycle logic in the kubelet as we can.