KEP-4816 update for beta in 1.34 #5261

Merged
merged 1 commit into kubernetes:master on Jun 20, 2025

Conversation

mortent
Member

@mortent mortent commented Apr 27, 2025

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Apr 27, 2025
@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Apr 27, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Scheduling Apr 27, 2025
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 27, 2025

Scheduling a claim that uses this feature may take a bit longer, if it is
necessary to go deeper into the list of alternative options before finding a
suitable device. We can measure this impact in alpha.
Member Author


I'm not sure if we can easily measure this, as it largely depends on the structure of the ResourceClaim.

But in general each subrequest requires the same amount of work as a regular request, so a request with two subrequests will in the worst case take about twice as long as a single request. The maximum number of subrequests per request is 8, so in the worst case, where none of the eight subrequests succeed, it would take 8 times longer than a normal request.

Since the subrequests are tried in priority order, the extra work is only needed in situations where the first subrequest cannot be satisfied, i.e. situations where using just a single request would have failed to allocate devices for the request.
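
To make the worst-case bound concrete, here is a minimal sketch of the first-match loop described above, with made-up types and a fake allocation check. This is not the actual DRA scheduler plugin code; it only illustrates why the cost is bounded by the number of subrequests times the cost of a single request.

```go
// Hypothetical illustration of why a prioritized-list request costs at most
// N times a plain request: subrequests are tried in priority order and the
// search stops at the first one that can be satisfied. All names are made up;
// this is not the real DRA allocator.
package main

import "fmt"

type SubRequest struct {
	Name        string
	DeviceClass string
}

// tryAllocate stands in for the per-request allocation work; each call costs
// roughly as much as allocating a single, non-prioritized request.
func tryAllocate(sub SubRequest) bool {
	// Real device matching against the available pool would happen here.
	return sub.DeviceClass == "small-gpu" // pretend only this class has free devices
}

// allocateFirstAvailable returns the name of the first subrequest that fits,
// or false if none do (worst case: N calls to tryAllocate, with N <= 8).
func allocateFirstAvailable(subs []SubRequest) (string, bool) {
	for _, sub := range subs {
		if tryAllocate(sub) {
			return sub.Name, true
		}
	}
	return "", false
}

func main() {
	subs := []SubRequest{
		{Name: "big", DeviceClass: "large-gpu"},
		{Name: "small", DeviceClass: "small-gpu"},
	}
	if name, ok := allocateFirstAvailable(subs); ok {
		fmt.Println("allocated subrequest:", name)
	}
}
```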

without this feature. However, the references to the original request in the
`DeviceRequestAllocationResult.Request` field will be on the format
`<main request>/<subrequest>` for `DeviceRequest`s using the feature. The
driver must be updated to understand this format in order to correctly
Member


This seems like a risk. Is there any way we could version this?

At some point as DRA goes into beta we probably want to maintain skew support for the drivers? I would want to be able to upgrade Kubernetes without worrying about the drivers failing immediately.

cc @aojea


Sorry for the nitpicking but

At some point as DRA goes into beta

I assume we mean as DRA goes into GA? (it is already in beta)

Member


I assume we mean as DRA goes into GA? (it is already in beta)

Well ... parts of DRA are beta, but there are many parts still in alpha (like this KEP).

I think the overall point stands that we should be avoiding breakage by now?

Member


And no, not as it goes to GA; GA should only be a rename and an additional stability commitment, see #5242

https://groups.google.com/a/kubernetes.io/g/dev/c/thuIea5aYtw/m/YYcOY2dhEwAJ

Kubernetes production clusters are trusted to be stable by our users. Beta is our state where features are accessible for usage in production clusters, with a feature gate to disable the feature if it malfunctions. This is “GA-quality without field feedback”. We have updated the KEP template to help guide future KEPs (ref) by explicitly indicating that beta must be complete and promotion from beta to GA must have no significant change for the release. PRR reviewers will check for these criteria during the approval process and help contributors and sig leads explicitly handle promotion criteria.

Member Author


There are no changes to the format of these references for ResourceClaims that don't use the PrioritizedList feature. So enabling this feature does not affect any existing claims, regardless of whether the driver is upgraded to support the feature.

What this means is that:

  • The user needs to make sure that the feature is enabled on the cluster and that the drivers they are running do support this feature. If they don't plan to use the feature, no action is needed.
  • If the driver on a node is rolled back to a version that doesn't support the feature, pods that reference claims using the feature will not run on the node.

I think this is expected behavior and it will be the case for any new DRA feature that requires changes to the drivers.
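
As an illustration of the driver-side handling discussed here, a minimal sketch: only the `<main request>/<subrequest>` separator comes from the KEP text, while the helper name and surrounding code are hypothetical.

```go
// Hypothetical driver-side helper: split a DeviceRequestAllocationResult.Request
// value into the main request and optional subrequest name. A claim that does
// not use PrioritizedList keeps the plain "<request>" form, so the second
// return value is empty in that case. This is a sketch, not code from any
// actual DRA driver.
package main

import (
	"fmt"
	"strings"
)

func splitRequestRef(ref string) (mainRequest, subRequest string) {
	parts := strings.SplitN(ref, "/", 2)
	if len(parts) == 2 {
		return parts[0], parts[1]
	}
	return ref, ""
}

func main() {
	fmt.Println(splitRequestRef("gpu"))       // "gpu" and "" (feature not used)
	fmt.Println(splitRequestRef("gpu/small")) // "gpu" and "small" (prioritized list)
}
```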

Member


ACK.

If the driver on a node is rolled back to a version that doesn't support the feature, pods that reference claims using the feature will not run on the node.

Assuming the driver support and kubelet level are the same? Or some other mechanism?


Beyond the scope of this KEP, I think we should be thinking about how DRA drivers evolve safely with new capabilities.

For example, CNI config is versioned so implementations advertise which version they're using when writing the config (which is read by the CRI).

This reminds me of discussions with @tallclair about node/kubelet capabilities <> scheduler following in-place resize.

Member Author


Assuming the driver support and kubelet level are the same? Or some other mechanism?

The kubelet has no changes related to this feature, so a rollback of the kubelet will not matter. Obviously, the kubelet needs to be at a version that supports DRA (1.32+) and have the DRA feature enabled for anything DRA-related to work.

The driver needs to support this feature, otherwise it will not be able to understand the reference to the selected DeviceRequest in the ResourceClaim status.

I agree that some way to express capabilities supported by drivers would be useful here, at the very least so that users can determine which DRA features a specific version of a driver supports. We will have similar questions for many/most of the other DRA features that are in flight, as many of them will require support in the drivers.
Not sure if we want drivers to advertise capabilities in a similar way to what is being discussed for the kubelet, but it seems like we have a very similar challenge for drivers.
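
Purely as a strawman for the capability question (no such API exists today), a sketch of what driver capability advertisement could look like. Every name below is hypothetical.

```go
// Strawman only: a hypothetical way for a driver to advertise which DRA
// features it understands, so users or tooling can check before relying on a
// feature. None of these types or names are real API fields.
package main

import "fmt"

type DriverCapabilities struct {
	Driver   string
	Features []string // e.g. "PrioritizedList"
}

// Supports reports whether the driver has declared support for a feature.
func (c DriverCapabilities) Supports(feature string) bool {
	for _, f := range c.Features {
		if f == feature {
			return true
		}
	}
	return false
}

func main() {
	caps := DriverCapabilities{
		Driver:   "gpu.example.com",
		Features: []string{"PrioritizedList"},
	}
	fmt.Println(caps.Supports("PrioritizedList")) // true
}
```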

Member

@johnbelamaric johnbelamaric left a comment


We discussed adding a timeout for DRA scheduling for this feature - or perhaps just in general. Is that part of #4381 or this KEP?

@mortent
Member Author

mortent commented Jun 7, 2025

We discussed adding a timeout for DRA scheduling for this feature - or perhaps just in general. Is that part of #4381 or this KEP?

That is not specific to the PrioritizedList feature, so I think we should handle that as part of #4381

@mortent
Member Author

mortent commented Jun 9, 2025

/wg device-management

@k8s-ci-robot k8s-ci-robot added the wg/device-management Categorizes an issue or PR as relevant to WG Device Management. label Jun 9, 2025
@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation Jun 10, 2025
Contributor

@deads2k deads2k left a comment


PRR metrics addition requested. PRR lgtm otherwise.

@mortent
Member Author

mortent commented Jun 12, 2025

PRR metrics addition requested. PRR lgtm otherwise.

Thanks! @deads2k can I add you as the PRR reviewer for this change since John is OOO?

@@ -1010,7 +1017,6 @@ ensure they are handled by the scheduler as described in this KEP.
#### Beta

- Gather feedback
- Implement node scoring
- Evaluate feasibilty of cluster auto scaler implementation
Member


Was it evaluated?

Member Author


This is something we want to do and it is tracked in #4970. It was also discussed in #5261 (comment). However, it is not on the roadmap for this year.

Member


I meant the "Evaluate feasibilty of cluster auto scaler implementation" point

Member Author


Yeah, I don't think we have any concerns about the cluster autoscaler integration with this feature, but I will verify. Good catch.

Member


Please state it explicitly in the KEP before merging it

Contributor


@mortent this still seems to be outstanding

Member Author


SIG Autoscaling has confirmed we don't have any concerns about this and has approved this PR. I've updated the KEP.

One indicator are unexpected restarts of the cluster control plane components
(kube-scheduler, apiserver, kube-controller-manager).

If the `scheduler_pending_pods` metric in the kube-scheduler suddenly increases,
Member


Or when it's constant, i.e. no pods are being scheduled?

Member Author


Yeah, I've updated the language to clarify.

@deads2k
Contributor

deads2k commented Jun 16, 2025

PRR lgtm. Approving for PRR only

/approve

@macsko
Member

macsko commented Jun 18, 2025

/approve as SIG Scheduling

/hold for #5261 (comment)

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 18, 2025
@jackfrancis

@haircommander @mortent I'll take a look for SIG Autoscaling shortly, stand by

rollout. Similarly, consider large clusters and how enablement/disablement
will rollout across nodes.
-->
Workloads that doesn't use the Prioritized List feature should not be impacted,


This is the key cluster-autoscaler story for the near term (CA + DRA doesn't currently implement the prioritized alternatives feature). We can make a point to identify non-implemented 1.34 DRA features when CA ships its 1.34 equivalent release, to ensure that CA customers understand how they can use DRA. (And we will follow that pattern going forward as new DRA functionality arrives and CA implements support for it in its releases.)

Member Author


Thanks. And just to confirm, there is nothing about this feature that we think will be problematic to support in the Cluster Autoscaler and therefore should stop us from taking this to beta?


Confirmed.


I think the problematic part to evaluate was how to solve the scoring part in Cluster Autoscaler. The node-local part of choosing the most optimal device on a given Node should work out of the box with the vendored DRA scheduler plugin.

So IMO we should revisit the feasibility when scoring is in scope. Unless we have a more generic CA scoring integration in place by then, we can probably write a CA Expander that would prefer the node groups with more optimal devices, right @jackfrancis?
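
A rough sketch of that Expander idea, using hypothetical types rather than the real cluster-autoscaler expander interface: rank candidate node groups by the highest-priority subrequest their devices can satisfy and scale up the best one.

```go
// Hypothetical sketch of an Expander-style choice that prefers node groups
// offering devices for the highest-priority subrequest of a claim. The types
// and scoring are invented for illustration; the real cluster-autoscaler
// expander interface is different.
package main

import "fmt"

type NodeGroupOption struct {
	Name string
	// Rank of the best subrequest this node group's devices could satisfy
	// (0 = most preferred subrequest in the claim's prioritized list).
	BestSubrequestRank int
}

// pickBestOption returns the node group satisfying the highest-priority
// subrequest; options is assumed to be non-empty.
func pickBestOption(options []NodeGroupOption) NodeGroupOption {
	best := options[0]
	for _, o := range options[1:] {
		if o.BestSubrequestRank < best.BestSubrequestRank {
			best = o
		}
	}
	return best
}

func main() {
	options := []NodeGroupOption{
		{Name: "pool-small-gpu", BestSubrequestRank: 1},
		{Name: "pool-large-gpu", BestSubrequestRank: 0},
	}
	fmt.Println("scale up:", pickBestOption(options).Name)
}
```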


sgtm

@jackfrancis

/lgtm
/approve for SIG Autoscaling

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 20, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, jackfrancis, macsko, mortent

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mortent mortent force-pushed the Update4816ForBeta branch from 17fe043 to d6bbba9 Compare June 20, 2025 17:47
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 20, 2025
@haircommander
Contributor

/lgtm
/hold

awaiting one confirmation from @jackfrancis #5261 (comment) but I think we can tag this. Jack, would you unhold when you confirm?

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 20, 2025
@jackfrancis

/lgtm

@jackfrancis

/hold cancel

thank you @mortent @haircommander and everyone 🚀

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 20, 2025
@k8s-ci-robot k8s-ci-robot merged commit 2197b2e into kubernetes:master Jun 20, 2025
3 of 4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jun 20, 2025
@github-project-automation github-project-automation bot moved this from Needs Triage to Done in SIG Scheduling Jun 20, 2025
@pohly pohly moved this from 👀 In review to ✅ Done in Dynamic Resource Allocation Jun 23, 2025
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.
Projects
SIG Scheduling: Done
Dynamic Resource Allocation: Done