Skip to content

kep-3695-beta update #5346

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 19, 2025
Merged

Conversation

guptaNswati
Copy link
Contributor

/wg device-management
/assign @moshe010
/assign @klueska
/assign @@johnbelamaric for PRR

Signed-off-by: Swati Gupta <swatig@nvidia.com>
@k8s-ci-robot k8s-ci-robot added the wg/device-management Categorizes an issue or PR as relevant to WG Device Management. label May 27, 2025
Copy link

linux-foundation-easycla bot commented May 27, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot
Copy link
Contributor

Welcome @guptaNswati!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 27, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @guptaNswati. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 27, 2025
Co-authored-by: Kevin Klues <klueska@gmail.com>
@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation May 28, 2025
@kannon92
Copy link
Contributor

kannon92 commented Jun 2, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 2, 2025
Signed-off-by: Swati Gupta <swatig@nvidia.com>
@ffromani
Copy link
Contributor

@SergeyKanzhelev

Please update to the latest template according to #3695 (comment)

It is using the latest template. See KEP readme using the [latest template](https://github.com/kubernetes/enhancements/tree/master/keps/NNNN-kep-template) has been merged into the k/enhancements repo.

Sorry, i am confused if you are referring to something else.

You need to check the KEP layout of this PR matches the layout in the latest KEP template available. When folks update the KEP template some sections may move/be created/merged/removed, some new PRR questions can arise, and so forth

@SergeyKanzhelev
Copy link
Member

As @ffromani mentioned some template sections might have moved or were added. I am relying on this comment: #3695 (comment) that tells the the KEP readme is not following the last template.

@guptaNswati
Copy link
Contributor Author

Got it. thanks both. I see i updated the implementation history but that is within the format.

- Kubernetes 1.27: Alpha version of the KEP.

- Kubernetes 1.34: Beta version of the KEP.

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple questions not answered, which are requirement for beta promotion.

@@ -274,8 +274,9 @@ These cases will be added in the existing e2e tests:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Earlier in the doc:

  1. Please make sure to check appropriate boxes in the ## Release Signoff Checklist.
  2. Missing links in the integration tests section, see template, and in the e2e section as well, see template. Either of the two is required for beta promotion, and it looks like you had a requirement for e2e during alpha, so I expect those to be completed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Please make sure to check appropriate boxes in the ## Release Signoff Checklist.

This was addressed - thank you.

  • Missing links in the integration tests section, see template, and in the e2e section as well, see template. Either of the two is required for beta promotion, and it looks like you had a requirement for e2e during alpha, so I expect those to be completed.

This one still holds. We need links for integration and e2e based on the template in the appropriate section. I believe e2es were added in kubernetes/kubernetes#116846 so you should be able to quickly fill those in. Not sure if there are others.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still holds, I see Francesco mentioned several tests that were added, can we make sure they are explicitly linked in this document?

Signed-off-by: Swati Gupta <swatig@nvidia.com>
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 16, 2025
Copy link
Contributor

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sig-node review: provisional LGTM. Two outstanding items from my POV.

  1. (nonblocking, in scope for Beta, "just" extra work) turns out the CI config is obsolete, and the podresources api tests run incidentally. We should add them explicitly (like we do for example for cpumanager) and we should have a presubmit lane (like we do for example for pod level resources). I think it is unfair to account all this work into this KEP, but some of it make sense to be done in the context of the beta graduation. Already begun, PRs linked. We may need to fix or add some e2e tests, but I think this is totally fair in the context of the beta graduation.
  2. (blocking) clarify the Get semantics as discussed. Because of how the API works, there is a TOCTOU unavoidable issue. There is no atomicity or consistency guarantee between List or Get, nor a concept of a session (which I would push back, btw). So it is possible, albeit probably unlikely, that Get is called with stale information (e.g pod becomes terminated between calls). We need to clarify and document the error paths

Once the above are addressed, LGTM.

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of my previous comments still hold, and are blocking to merging this KEP.

@@ -274,8 +274,9 @@ These cases will be added in the existing e2e tests:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Please make sure to check appropriate boxes in the ## Release Signoff Checklist.

This was addressed - thank you.

  • Missing links in the integration tests section, see template, and in the e2e section as well, see template. Either of the two is required for beta promotion, and it looks like you had a requirement for e2e during alpha, so I expect those to be completed.

This one still holds. We need links for integration and e2e based on the template in the appropriate section. I believe e2es were added in kubernetes/kubernetes#116846 so you should be able to quickly fill those in. Not sure if there are others.

@@ -333,7 +334,7 @@ The API becomes available again. The API is stateless, so no recovery is needed,

###### Are there any tests for feature enablement/disablement?

e2e test will demonstrate that when the feature gate is disabled, the API returns the appropriate error code.
e2e test will demonstrate that when the feature gate is disabled, the API returns the appropriate error code. (https://github.com/kubernetes/kubernetes/pull/116846)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linked PR isn't testing feature enablement/disablement, or am I misreading it? The closest place where you test this feature gate is https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/podresources/server_v1_test.go but there you only turn this on, but I don't see the requested on/off test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a on/off test scattered across the existing tests: https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.1/test/e2e_node/podresources_test.go#L977 and https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.1/test/e2e_node/podresources_test.go#L1066
We can use a PR to make the tests more explicit and some changes are needed if the FG goes to default on: the FG status should be set explicitly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, can we make sure this is listed here?

Copy link
Contributor

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, we need one more update please

@guptaNswati
Copy link
Contributor Author

@soltysh PTAL, i tried to answers these #5346 (comment).

If these are satisfactory, then the only pending items:

  • adding e2e test link. I am still not sure what is an acceptable answer for this which unblocks this PR.
  • update Get() based on @ffromani feedback.

Signed-off-by: Swati Gupta <swatig@nvidia.com>
Signed-off-by: Swati Gupta <swatig@nvidia.com>
@guptaNswati guptaNswati force-pushed the kep-3695-beta-update branch from a804a04 to ae728ef Compare June 19, 2025 06:36
Copy link
Contributor

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

sig-node review: my comment about Get was addressed. We may need to review for the nect cycle (beta2? GA?), but the current changes are sufficient.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 19, 2025
@ffromani
Copy link
Contributor

my 2c about e2e tests (k/k/test/e2e_node/podresources_test.go):

  • we have FG on/off test dispersed among others, should be clear and have its own block (Context in ginkgo lingo)
  • I would like to see more tests for Get to cover e.g. multi-container pods, pods not having exclusive resources
  • I would like tests to exercise the expected happy path (List + Get on each returned pod)
  • I would like to see a test which compares the output of List and Get in the happy path
  • I would like more tests to exercise both the error paths, e.g. pod found but bogus container; asking about terminated pod.

My PR kubernetes/kubernetes#132028 does a bit of cleanup in the e2e tests so it can be wise to wait for it to merge (it's a fix we need anyway) and base the work on top of it

Fixing the test-infra lane to better handle the e2e tests is already ongoing: kubernetes/kubernetes#132345 and can proceed in parallel

My understanding is that doing all the above is fair in the same cycle on which we attempt to graduate to beta.

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still answers missing inside the document itself, but since they were provided as comments, I'm willing to conditionally approve as is. Please make sure to either add the missing links before merging this PR, or shortly after.

/approve
the PRR section

@@ -333,7 +334,7 @@ The API becomes available again. The API is stateless, so no recovery is needed,

###### Are there any tests for feature enablement/disablement?

e2e test will demonstrate that when the feature gate is disabled, the API returns the appropriate error code.
e2e test will demonstrate that when the feature gate is disabled, the API returns the appropriate error code. (https://github.com/kubernetes/kubernetes/pull/116846)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, can we make sure this is listed here?

@@ -274,8 +274,9 @@ These cases will be added in the existing e2e tests:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still holds, I see Francesco mentioned several tests that were added, can we make sure they are explicitly linked in this document?

@klueska
Copy link
Contributor

klueska commented Jun 19, 2025

There are still answers missing inside the document itself, but since they were provided as comments, I'm willing to conditionally approve as is. Please make sure to either add the missing links before merging this PR, or shortly after.

/approve

the PRR section

Thanks @soltysh. Given the holiday in the US today I don't think @guptaNswati will have time to make the update before tomorrow. I will work with her early next week to ensure all of the information in the comments gets incorporated.

@mrunalp, @haircommander, @SergeyKanzhelev can we please get a sig-node approval to move this forward.

Thanks!

@ffromani
Copy link
Contributor

@mrunalp, @haircommander, @SergeyKanzhelev can we please get a sig-node approval to move this forward.

I think they are all based in the US. I'd be supporting (FWIW) an exception if the KEP slips just because there is no time left for an approval review because of the holidays.

Copy link
Contributor

@mrunalp mrunalp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving based on @klueska review.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: guptaNswati, mrunalp, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 19, 2025
@k8s-ci-robot k8s-ci-robot merged commit be276fb into kubernetes:master Jun 19, 2025
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jun 19, 2025
@guptaNswati
Copy link
Contributor Author

Thank all you for getting this merged. I have opened a follow-up PR to address the comments #5430

@pohly pohly moved this from 👀 In review to ✅ Done in Dynamic Resource Allocation Jun 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.