
Pass down resources to CRI #4113

Open
wants to merge 12 commits into master

Conversation

marquiz
Contributor

@marquiz commented Jun 28, 2023

  • One-line PR description: KEP for extending the CRI API to pass down unmodified resource information from the kubelet to the CRI runtime.
  • Other comments:

Co-authored-by: Antti Kervinen <antti.kervinen@intel.com>
@k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Jun 28, 2023
@k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 28, 2023
@marquiz
Contributor Author

marquiz commented Jun 28, 2023

/cc @haircommander @mikebrow @zvonkok @fidencio @kad

@k8s-ci-robot
Contributor

@marquiz: GitHub didn't allow me to request PR reviews from the following users: fidencio.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @haircommander @mikebrow @zvonkok @fidencio @kad

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Comment on lines 289 to 290
+ map<string, k8s.io.apimachinery.pkg.api.resource.Quantity> requests = 2;
+ map<string, k8s.io.apimachinery.pkg.api.resource.Quantity> limits = 3;
Contributor

should the keys here be a special type instead of unstructured?

Contributor Author

I don't think it's possible to have something like `type ResourceName string` in protobuf: proto3 map keys are restricted to integral and string scalar types, and there are no named type aliases. Please correct me if I'm wrong.

Contributor Author

ping @haircommander, are you satisfied with the reply (close as resolved)?

@marquiz mentioned this pull request Jul 18, 2023
@marquiz
Contributor Author

marquiz commented Jul 18, 2023

/retitle Pass down resources to CRI

@k8s-ci-robot changed the title from "KEP: Initial version of the Pass down resources to CRI" to "Pass down resources to CRI" Jul 18, 2023
@bart0sh moved this from Triage to Needs Reviewer in SIG Node PR Triage Jul 20, 2023
@zvonkok

zvonkok commented Aug 2, 2023

@marquiz We need to check how this will work with DRA and CDI devices, i.e. whether the resource claim name alone gives us enough information to know which devices need to be added to the sandbox.

@zvonkok

zvonkok commented Aug 2, 2023

@marquiz There is already some code for sandbox sizing (accumulation of CPU and memory resources) that we leverage in Kata, see kubernetes/kubernetes#104886 for reference. What are the plans for this interface: deprecate it or keep it?

@zvonkok

zvonkok commented Aug 2, 2023

@bergwolf @egernst FYI

Contributor

@elezar left a comment

Thanks @marquiz.

It would be good to get more concrete details on the use cases that this would enable.
There is also the question of complex devices that are managed by device plugins where there isn't a clear mapping from the resources entry (e.g. vendor.com/xpu: 1) to the resources added to the container, or DRA where the associated resources.requests.claims entry is not mentioned.


#### Story 3

As a cluster administrator, I want to install an OCI hook/runc wrapper/NRI
Contributor

Could you expand on this use case? How does extending the CRI translate to modifications in the OCI runtime specification which is interpreted by runc (or wrappers)?

Contributor Author

The CRI changes (in this KEP) would not directly translate to anything in the OCI config. They are just "informational": a hook/wrapper/plugin can use the data to tweak the OCI config, say, to do customized CPU pinning. I'll put some more flesh on this section...

Contributor Author

@elezar I updated Story 3, PTAL

Contributor

Thanks.

Contributor Author

Resolved?

requests:
  cpu: 100m
  memory: 100M
  vendor.com/xpu: 1
Contributor

For clarification: This does not indicate the properties of the resource that was actually allocated for a container requesting one of these devices?

Contributor Author

That's very much true. I think I'll add a note about this in the KEP somewhere.

Contributor Author

@elezar I added a note about device plugin resources after this example. WDYT?

keps/sig-node/4112-passdown-resources-to-cri/README.md (outdated, resolved)
WindowsPodSandboxConfig windows = 9;
+
+ // Kubernetes resource spec of the containers in the pod.
+ PodResourceConfig pod_resources = 10;
Member

@MikeZappa87 since you shared recently something along these lines for the networking capabilities, this KEP also means to interface with NRI

@zvonkok

zvonkok commented Aug 3, 2023

Another point to consider is how we're going to integrate or not these enhancements with the new containerd Sandbox API.

@marquiz
Contributor Author

marquiz commented Aug 3, 2023

There is already some code for sandbox sizing, accumulation of resources CPU and Memory for reference: kubernetes/kubernetes#104886 that we leverage in Kata, what are the plans for this interface, deprecate or keep it?

@zvonkok that one covers just the native resources, and it gives them in an "obfuscated" form, i.e. without telling the actual requests/limits (plus it's for Linux resources only). I think we wouldn't, or even couldn't, touch that interface, i.e. we'd keep it.

@zvonkok

zvonkok commented Aug 3, 2023

@zvonkok

zvonkok commented Sep 1, 2023

Since the DevicePlugin API supports CDI devices with this KEP: #4011, we should try to add more restrictions and requirements on how we want to design this passthrough interface. @marquiz FYI

zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Mar 12, 2024
With each release make sure we ship a GPU enabled rootfs/initrd

Fixes: kata-containers#6554

DependsOn: kata-containers#6664 kata-containers#6595 kata-containers#6993 kata-containers#6949

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>

gpu: reintroduce pcie_root_port and add pcie_switch_port

In Kubernetes we still do not have proper VM sizing
at sandbox creation level. This KEP tries to mitigate
that: kubernetes/enhancements#4113 but this can take
some time until Kube and containerd or other runtimes
have those changes rolled out.

Before we used a static config of VFIO ports, and we
introduced CDI support which needs a patched containerd.
We want to eliminate the patched containerd in the GPU case
as well.

Fixes: kata-containers#8860

 SQUASH

runtimeclass example

fabricmanager
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Mar 14, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Mar 15, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Apr 5, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Apr 9, 2024
- support sidecar containers: instead of separate lists for init and
  regular containers, have one list and include the type of container
  (init, sidecar, regular)
- add notes about mounts and devices when describing changes to
  CreateContainer and UpdateContainerResources requests
- update description of kubelet: more accurate description of what
  information is included in each CRI request
- fix typos
- kep.yaml: update milestone
@marquiz
Contributor Author

marquiz commented Apr 9, 2024

Thanks @tallclair for the review

I'm concerned that this KEP as is forces the container runtime to reimplement too much of the container lifecycle, which in the best case puts a burden on CRI implementation maintainers, and in the worst case could slow down future Kubernetes feature development.

The goal is to not require any changes in existing CRI implementations. The CRI runtime can simply ignore the data if it doesn't need/want to pre-allocate or pre-optimize resources for the pod. The idea is to enable a kind of "forward lookup" into the future for those who need it. This probably needs to be communicated better in the proposal (and in the API, with comments and naming). Thoughts?

For example, the proposed API separates out init containers & regular containers, but sidecar containers blur those lines. Now, calculating the maximum resource requirements for the pod involves accounting for sidecar containers: https://github.com/kubernetes/kubernetes/blob/4a4f5dbc079e85e63f62178af962cb65bd60d987/pkg/api/v1/resource/helpers.go#L50. I don't think we should treat this as a 1-off change.

This is a very valid point. The proposal was now changed: instead of separate lists for init and regular containers, it now has one list that contains all containers, each element in the list including the type of container (init, sidecar or regular).

Why not have the Kubelet create a pod-level aggregated view of the resources? Similar to what is already done with the sandbox annotations, but without translating to the platform-specific types?

I believe that would cause gray hairs/problems in some scenarios, e.g. in VM sizing and CoCo. For example, how would you aggregate resource limits? Also, you could make better decisions in the case where an init container requests a lot of resources compared to the regular containers. In CoCo, knowing exactly what resources each container needs helps implement the principle of least privilege/smaller attack surface (no sharing of unnecessary mounts between containers, for example).

Ref e.g.: #4113 (comment)
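To illustrate the limit-aggregation problem, here is a small Python sketch (purely illustrative; the container roles and CPU numbers are hypothetical, not taken from the KEP) showing how a single pod-level sum over-provisions a VM compared to sizing from the per-container breakdown:

```python
def vm_cpu_estimates(init_limit, app_limit, sidecar_limit):
    """Two ways a VM-based runtime could size a sandbox from CPU limits
    (illustrative sketch: single resource, one container per role)."""
    # Pod-level aggregation: one number, all limits added up.
    naive_sum = init_limit + app_limit + sidecar_limit
    # Per-container data: the init container never runs concurrently
    # with the app container; only the sidecar overlaps both phases.
    true_peak = max(init_limit + sidecar_limit,
                    app_limit + sidecar_limit)
    return naive_sum, true_peak

# init: 4 CPUs, app: 2 CPUs, sidecar: 1 CPU
naive, peak = vm_cpu_estimates(4, 2, 1)
print(naive, peak)  # 7 5 -> the aggregate over-provisions by 2 CPUs
```

Any single pre-aggregated number forces one of these choices on the kubelet; passing the per-container spec lets the runtime pick the semantics it needs.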

zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Apr 10, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Apr 10, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Apr 11, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Apr 11, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Apr 11, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Apr 12, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Apr 15, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Apr 15, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Apr 15, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Apr 16, 2024
@marquiz
Contributor Author

marquiz commented Apr 16, 2024

I pushed an update last week but forgot to leave a comment:

  • sidecar containers: instead of separate lists for init and regular containers, have one list and include the type of container (init, sidecar, regular)
  • add notes about mounts and devices when describing changes to CreateContainer and UpdateContainerResources requests
  • update description of kubelet changes: more accurate description of what information is included in each CRI request
  • fix typos
  • kep.yaml: updated milestone to v1.31

zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Apr 16, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Apr 18, 2024
@tallclair
Member

/assign

zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Apr 29, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request Apr 29, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request May 2, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request May 2, 2024
zvonkok added a commit to zvonkok/kata-containers that referenced this pull request May 2, 2024
@tallclair
Member

For example, the proposed API separates out init containers & regular containers, but sidecar containers blur those lines. Now, calculating the maximum resource requirements for the pod involves accounting for sidecar containers: https://github.com/kubernetes/kubernetes/blob/4a4f5dbc079e85e63f62178af962cb65bd60d987/pkg/api/v1/resource/helpers.go#L50. I don't think we should treat this as a 1-off change.

This is a very valid point. The proposal was now changed: instead of separate lists for init and regular containers, it now has one list that contains all containers, each element in the list including the type of container (init, sidecar or regular).

While sidecars did need to be addressed by the original proposal, it misses the big picture I was trying to raise here: exposing this pod lifecycle information into the container runtime will create friction for future k8s changes with pod lifecycle implications. Now any pod lifecycle change is potentially a breaking change to the runtime, so we need to manage runtime version skew in a way we didn't before. This is why I prefer to keep as much of the lifecycle logic in the Kubelet as we can.
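The sidecar-aware accounting in the linked helper can be sketched roughly as follows (a simplified illustration of the idea, for a single resource, ignoring pod overhead and in-place resize; not the actual kubelet code):

```python
def effective_pod_request(containers):
    """containers: (type, request) pairs in pod declaration order, where
    type is 'init', 'sidecar' (restartable init) or 'regular'."""
    long_running = 0   # regular + sidecar containers run together
    sidecars = 0       # sidecars keep running through later init steps
    init_peak = 0
    for ctype, req in containers:
        if ctype == 'regular':
            long_running += req
        elif ctype == 'sidecar':
            long_running += req
            sidecars += req
            init_peak = max(init_peak, sidecars)
        else:  # plain init container: runs alone with earlier sidecars
            init_peak = max(init_peak, req + sidecars)
    return max(long_running, init_peak)

# A sidecar started before a heavy init step counts on top of it:
print(effective_pod_request(
    [('sidecar', 1), ('init', 5), ('regular', 1)]))  # 6, not 5
```

This is exactly the kind of lifecycle knowledge the comment argues should stay in the kubelet rather than be re-implemented, and kept in sync, by every runtime.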

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status
Projects
SIG Node PR Triage
Needs Reviewer