-
Notifications
You must be signed in to change notification settings - Fork 1.5k
KEP-4033: graduate to GA #5370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-4033: graduate to GA #5370
Conversation
marquiz
commented
Jun 4, 2025
- One-line PR description: graduate KEP-4033 to GA
- Issue link: Discover cgroup driver from CRI #4033
- Other comments:
<!-- | ||
Describe the consequences on existing workloads (e.g., if this is a runtime | ||
feature, can it break the existing applications?). | ||
In GA, the fallback behavior (and the kubelet `--cgroup-driver` flag) is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we require a few releases before removing the flag as per deprecation policy. Please update. (The flag can do nothing, it just should be around marked as deprecated)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are more than a few places within our own repos that are still setting this flag
https://cs.k8s.io/?q=cgroup(.)%3Fdriver&i=nope&literal=nope&files=&excludeFiles=&repos=
And then users may be opting to configure it indirectly, e.g. through kubeadm or kops configurations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check, I updated the PR to align with the official deprecation policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now changed so that the fallback behavior is kept as long as the cmdline flag and config option.
message, e.g.: | ||
|
||
``` | ||
cgroupDriver option has been deprecated and will be dropped in a future release. Please upgrade to a CRI implementation that supports cgroup-driver detection. | ||
``` | ||
|
||
The `--cgroup-driver` flag and the cgroupDriver configuration option will be | ||
deprecated and have no effect when support for the feature is graduated to GA. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation behind making CRI auto-detection a hard requirement and ignoring explicit user configuration (which would enable using older runtimes for example?), the flag doesn't seem like a lot of code to retain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 same question. seems to add risk with little benefit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now changed so that the fallback behavior is kept as long as the cmdline flag and config option. These will be marked as deprecated and will eventually be removed as per the K8s deprecation policy.
flag). Further, the kubeletConfig field and `--cgroup-driver` flag will be | ||
marked as deprecated, to be dropped when support for the feature is adopted by | ||
CRI-O and containerd. Usage of the deprecated setting will produce a log | ||
flag). For beta, the kubeletConfig field and `--cgroup-driver` flag will be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"will be" or "have been"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They were not truly deprecated in beta (except for the log message saying/lying so). Now changed the wording to
In beta, resorting to the fallback behavior will produce a log message like:
message, e.g.: | ||
|
||
``` | ||
cgroupDriver option has been deprecated and will be dropped in a future release. Please upgrade to a CRI implementation that supports cgroup-driver detection. | ||
``` | ||
|
||
The `--cgroup-driver` flag and the cgroupDriver configuration option will be | ||
deprecated and have no effect when support for the feature is graduated to GA. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 same question. seems to add risk with little benefit?
You can take a look at one potential example of such test in: | ||
https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282 | ||
--> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
presumably the line below can now reference those tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removed block was a comment block and not related to this KEP or feature.
checking if there are objects with field X set) may be a last resort. Avoid | ||
logs or events for this purpose. | ||
--> | ||
|
||
Kubelet and container runtime version. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if my nodes are going NotReady, how can I tell that it is due to this feature, without having to log in to all 10,000 nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not now happen as long as the fallback behavior is in place.
After the deprecated functionality is removed this will be similar to other kubelet startup errors. The node will be set to NotReady by the node-controller
(visible in node events). When the kubelet does not start I don't know what other methods you have other than examine kubelet logs (log in or via scraped logs)
Thoughts @haircommander ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could maybe have an event emitted but that's not all that reliable
Propose to retain fallback behavior in GA, but, mark the the --cgroup-driver flag and cgroupDriver config option as deprecated and eventually drop the according to the Kubernetes deprecation policy.
Updated: Changed to proposal to retain fallback behavior in GA, but, mark the the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
/lgtm |
Signed-off-by: Peter Hunt <pehunt@redhat.com>
5c040c3
to
caadc0f
Compare
a few questions
These are similar in spirit to the thread here: #5370 (comment) and I think we can do much better than, "oh, the kubelet exits". For instance, in 1.34, you could add a metric indicating that the CRI is too old for a future version. |
/reopen not sure why it closed |
@haircommander: Failed to re-open PR: state cannot be changed. There are no new commits on the marquiz:devel/kep-cgroup-driver branch. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/reopen |
@haircommander: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
I added a metric |
…d 1.7 will be dropped in 1.36 Signed-off-by: Peter Hunt <pehunt@redhat.com>
15db405
to
c783802
Compare
I think that addresses both John and my concerns. Thanks. PRR lgtm |
/assign @mrunalp as official approver for the KEP |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, marquiz, mrunalp, SergeyKanzhelev The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |