-
Notifications
You must be signed in to change notification settings - Fork 1.5k
kep-3695-beta update #5346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kep-3695-beta update #5346
Conversation
Signed-off-by: Swati Gupta <swatig@nvidia.com>
Welcome @guptaNswati! |
Hi @guptaNswati. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Co-authored-by: Kevin Klues <klueska@gmail.com>
/ok-to-test |
Signed-off-by: Swati Gupta <swatig@nvidia.com>
You need to check the KEP layout of this PR matches the layout in the latest KEP template available. When folks update the KEP template some sections may move/be created/merged/removed, some new PRR questions can arise, and so forth |
As @ffromani mentioned some template sections might have moved or were added. I am relying on this comment: #3695 (comment) that tells the the KEP readme is not following the last template. |
Got it. thanks both. I see i updated the implementation history but that is within the format.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple questions not answered, which are requirement for beta promotion.
@@ -274,8 +274,9 @@ These cases will be added in the existing e2e tests: | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Earlier in the doc:
- Please make sure to check appropriate boxes in the
## Release Signoff Checklist
. - Missing links in the integration tests section, see template, and in the e2e section as well, see template. Either of the two is required for beta promotion, and it looks like you had a requirement for e2e during alpha, so I expect those to be completed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Please make sure to check appropriate boxes in the
## Release Signoff Checklist
.
This was addressed - thank you.
This one still holds. We need links for integration and e2e based on the template in the appropriate section. I believe e2es were added in kubernetes/kubernetes#116846 so you should be able to quickly fill those in. Not sure if there are others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still holds, I see Francesco mentioned several tests that were added, can we make sure they are explicitly linked in this document?
Signed-off-by: Swati Gupta <swatig@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sig-node review: provisional LGTM. Two outstanding items from my POV.
- (nonblocking, in scope for Beta, "just" extra work) turns out the CI config is obsolete, and the podresources api tests run incidentally. We should add them explicitly (like we do for example for cpumanager) and we should have a presubmit lane (like we do for example for pod level resources). I think it is unfair to account all this work into this KEP, but some of it make sense to be done in the context of the beta graduation. Already begun, PRs linked. We may need to fix or add some e2e tests, but I think this is totally fair in the context of the beta graduation.
- (blocking) clarify the
Get
semantics as discussed. Because of how the API works, there is a TOCTOU unavoidable issue. There is no atomicity or consistency guarantee betweenList
orGet
, nor a concept of a session (which I would push back, btw). So it is possible, albeit probably unlikely, thatGet
is called with stale information (e.g pod becomes terminated between calls). We need to clarify and document the error paths
Once the above are addressed, LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of my previous comments still hold, and are blocking to merging this KEP.
@@ -274,8 +274,9 @@ These cases will be added in the existing e2e tests: | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Please make sure to check appropriate boxes in the
## Release Signoff Checklist
.
This was addressed - thank you.
This one still holds. We need links for integration and e2e based on the template in the appropriate section. I believe e2es were added in kubernetes/kubernetes#116846 so you should be able to quickly fill those in. Not sure if there are others.
@@ -333,7 +334,7 @@ The API becomes available again. The API is stateless, so no recovery is needed, | |||
|
|||
###### Are there any tests for feature enablement/disablement? | |||
|
|||
e2e test will demonstrate that when the feature gate is disabled, the API returns the appropriate error code. | |||
e2e test will demonstrate that when the feature gate is disabled, the API returns the appropriate error code. (https://github.com/kubernetes/kubernetes/pull/116846) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linked PR isn't testing feature enablement/disablement, or am I misreading it? The closest place where you test this feature gate is https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/podresources/server_v1_test.go but there you only turn this on, but I don't see the requested on/off test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a on/off test scattered across the existing tests: https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.1/test/e2e_node/podresources_test.go#L977 and https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.1/test/e2e_node/podresources_test.go#L1066
We can use a PR to make the tests more explicit and some changes are needed if the FG goes to default on: the FG status should be set explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, can we make sure this is listed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, we need one more update please
@soltysh PTAL, i tried to answers these #5346 (comment). If these are satisfactory, then the only pending items:
|
Signed-off-by: Swati Gupta <swatig@nvidia.com>
Signed-off-by: Swati Gupta <swatig@nvidia.com>
a804a04
to
ae728ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
sig-node review: my comment about Get
was addressed. We may need to review for the nect cycle (beta2? GA?), but the current changes are sufficient.
my 2c about e2e tests (k/k/test/e2e_node/podresources_test.go):
My PR kubernetes/kubernetes#132028 does a bit of cleanup in the e2e tests so it can be wise to wait for it to merge (it's a fix we need anyway) and base the work on top of it Fixing the test-infra lane to better handle the e2e tests is already ongoing: kubernetes/kubernetes#132345 and can proceed in parallel My understanding is that doing all the above is fair in the same cycle on which we attempt to graduate to beta. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still answers missing inside the document itself, but since they were provided as comments, I'm willing to conditionally approve as is. Please make sure to either add the missing links before merging this PR, or shortly after.
/approve
the PRR section
@@ -333,7 +334,7 @@ The API becomes available again. The API is stateless, so no recovery is needed, | |||
|
|||
###### Are there any tests for feature enablement/disablement? | |||
|
|||
e2e test will demonstrate that when the feature gate is disabled, the API returns the appropriate error code. | |||
e2e test will demonstrate that when the feature gate is disabled, the API returns the appropriate error code. (https://github.com/kubernetes/kubernetes/pull/116846) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, can we make sure this is listed here?
@@ -274,8 +274,9 @@ These cases will be added in the existing e2e tests: | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still holds, I see Francesco mentioned several tests that were added, can we make sure they are explicitly linked in this document?
Thanks @soltysh. Given the holiday in the US today I don't think @guptaNswati will have time to make the update before tomorrow. I will work with her early next week to ensure all of the information in the comments gets incorporated. @mrunalp, @haircommander, @SergeyKanzhelev can we please get a sig-node approval to move this forward. Thanks! |
I think they are all based in the US. I'd be supporting (FWIW) an exception if the KEP slips just because there is no time left for an approval review because of the holidays. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving based on @klueska review.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: guptaNswati, mrunalp, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Thank all you for getting this merged. I have opened a follow-up PR to address the comments #5430 |
One-line PR description: Update KEP to prepare for beta in 1.34
Issue link: DRA: Extend PodResources to include resources from Dynamic Resource Allocation #3695
/wg device-management
/assign @moshe010
/assign @klueska
/assign @@johnbelamaric for PRR