Adding Alpha criteria for KEP to support IPPR & Pod Level Resources #5423
Conversation
Skipping CI for Draft Pull Request.
/assign @tallclair
3. Decrease pod-level cgroup (if needed)
4. Increase container resources

Note: As part of a resize operation, users will now be permitted to add or modify pod-level resources within the Pod specification. Upon successful update, the Kubelet will ensure its internal checkpoint for the Pod is updated to reflect these new resource definitions. Importantly, to optimize performance and prevent redundant operations, the Kubelet will only trigger an actual cgroup resize for the Pod's sandbox if the specified pod-level resources are not equal to the aggregated sum of the individual container-level resources.
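To make the note above concrete, here is a minimal Go sketch of that skip-if-equal check. It is illustrative only, not the actual Kubelet code: it assumes the alpha pod-level `PodSpec.Resources` field, it only looks at requests, and the helper names (`aggregateContainerRequests`, `needsPodCgroupResize`) are hypothetical.

```go
// Illustrative sketch only -- not the Kubelet implementation. It compares the
// declared pod-level requests against the aggregate of the container-level
// requests and reports whether a pod cgroup resize would actually be needed.
package example

import (
	v1 "k8s.io/api/core/v1"
)

// aggregateContainerRequests sums the requests of the regular containers.
// (A real aggregation would also account for init containers, sidecars, and
// pod overhead; this is simplified for illustration.)
func aggregateContainerRequests(pod *v1.Pod) v1.ResourceList {
	total := v1.ResourceList{}
	for _, c := range pod.Spec.Containers {
		for name, qty := range c.Resources.Requests {
			sum := total[name]
			sum.Add(qty)
			total[name] = sum
		}
	}
	return total
}

// needsPodCgroupResize returns true only when the pod-level requests differ
// from the container-level aggregate, mirroring the optimization in the note.
func needsPodCgroupResize(pod *v1.Pod) bool {
	if pod.Spec.Resources == nil {
		return false
	}
	agg := aggregateContainerRequests(pod)
	for name, want := range pod.Spec.Resources.Requests {
		if have, ok := agg[name]; !ok || want.Cmp(have) != 0 {
			return true
		}
	}
	return false
}
```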
@tallclair I couldn't find your comment because I changed the origin of my branch. I tried to address your question about allowing adding pod-level resources as a part of resize. PTAL
We are going to allow adding & removing container-level resources. I feel like the use case is pretty weak, so I'm not too worried about this constraint, but it is a minor inconsistency.
/assign @mrunalp
This is mostly OK; the two bits I'd like to get clarified before approving are:
- defaulting
- the Resources field in PodStatus
@ndixita: Can not set label lead-opted-in: Must be member in one of these teams: [release-team-enhancements release-team-leads sig-api-machinery-leads sig-apps-leads sig-architecture-leads sig-auth-leads sig-autoscaling-leads sig-cli-leads sig-cloud-provider-leads sig-cluster-lifecycle-leads sig-contributor-experience-leads sig-docs-leads sig-instrumentation-leads sig-k8s-infra-leads sig-multicluster-leads sig-network-leads sig-node-leads sig-release-leads sig-scalability-leads sig-scheduling-leads sig-security-leads sig-storage-leads sig-testing-leads sig-windows-leads]
@soltysh added some details under "Surfacing Pod Resource Requirements" to add some clarity. PTAL
/approve
the PRR
controller to inspect running and pending pods to quickly determine their
effective resource requirements.

* The field will first be populated in API server defaulting logic.
There is a weird consequence of defaulting in the apiserver: if a pod is resized before it is scheduled, then the desired resources will be different. Do we care about this scenario?
This would only be temporary until it is updated by the kubelet in generateAPIPodStatus, which presumably should happen quickly, right? I think that seems fine?
Should happen quickly in most cases, but if the pod is unschedulable it could be long-lasting. It is also sort of a resize pending/in-progress state, but without the associated condition. The other question it raises is, should the scheduler schedule based on the desired, original, or max resources?
What's the alternative? Leaving it empty until the kubelet gets to it?
> Should happen quickly in most cases, but if the pod is unschedulable it could be long-lasting.

I think it is better to have the scheduler set PodStatus.Resources after it binds a pod to a node in that case. If a pod is resized before the scheduler sets PodStatus.Resources, it will be nil, and controllers that need the pod's request information in that case could compute it from the pod spec using the resource helper.
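A rough sketch of the fallback described here, under two assumptions: the PodStatus.Resources field is the one proposed in this KEP (not yet in released APIs), and the "resource helper" is PodRequests from k8s.io/component-helpers/resource. Both names may differ from what actually ships.

```go
// Illustrative controller-side fallback, not code from the KEP. It prefers the
// proposed PodStatus.Resources field (an assumption from this KEP) when a
// writer has populated it, and otherwise derives the effective requests from
// the PodSpec via the resource helper.
package example

import (
	v1 "k8s.io/api/core/v1"
	resourcehelpers "k8s.io/component-helpers/resource"
)

// effectivePodRequests returns the pod's effective resource requests.
func effectivePodRequests(pod *v1.Pod) v1.ResourceList {
	// Use the surfaced status field when it is set (e.g. after the kubelet has
	// actuated the pod).
	if pod.Status.Resources != nil && pod.Status.Resources.Requests != nil {
		return pod.Status.Resources.Requests
	}
	// Otherwise (e.g. the pod is still unscheduled, or was resized before the
	// status was written), compute the aggregate from the spec, which accounts
	// for pod-level resources when they are set.
	return resourcehelpers.PodRequests(pod, resourcehelpers.PodResourcesOptions{})
}
```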
I think I'm missing the context on the motivation for defaulting this value before it gets to the kubelet, so I can't really speak to the consequences of not doing that.
But I think:

> Should happen quickly in most cases, but if the pod is unschedulable it could be long-lasting. It is also sort of a resize pending/in-progress state, but without the associated condition.

We're already having conversations about putting the kubelet in the scheduling path to deal with other resize / scheduling race conditions. If we prioritize that for 1.35 and include this in that discussion, the problems here could become much easier to resolve.

> The other question it raises is, should the scheduler schedule based on the desired, original, or max resources?

The kubelet admits based on desired and wouldn't see the original, so the scheduler probably would need to do the same?

> If a pod is resized before the scheduler sets PodStatus.Resources, it will be nil and controllers that need the info about pod requests in that case could compute it from the pod spec using the resource helper

If controllers are going to have to handle the case that PodStatus.Resources could be nil anyway, then I'm not seeing the benefit of having the apiserver or scheduler set PodStatus.Resources.
I think we are converging towards the same thing. If we want to address the other resize/scheduling race conditions, it makes sense for the kubelet to set PodStatus.Resources. The idea behind this field is to allow controllers to inspect scheduled pods to quickly determine their effective resource requirements.
For unscheduled pods, controllers can rely on the component resource helpers to determine the requests from the PodSpec object.
Agreed that there is not much benefit in having the scheduler set PodStatus.Resources in that case. It would just help users understand what defaults the API server applies to the PodSpec before it is scheduled, in case that's needed, but there are cons to that with respect to the other resize/scheduling race conditions.
Leaving PodStatus.Resources unset for now sounds good. The Kubelet owns this field, so I don't think the scheduler should touch it (especially not after binding to the node). It's really only safe for another component to set it on create; after that, the Kubelet should be the only writer.
The motivation for defaulting it is that components may want to use this field instead of the component-helper to calculate pod-level resource totals. If we don't set it in defaulting, it can't be used for admission (e.g. quota controls) or by any controllers that run before the pod is started.
LGTM
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: mrunalp, natasha41575, ndixita, soltysh.
/lgtm