Skip to content

KEP-4033: graduate to GA #5370

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 18, 2025

Conversation

marquiz
Copy link
Contributor

@marquiz marquiz commented Jun 4, 2025

  • One-line PR description: graduate KEP-4033 to GA
  • Other comments:

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jun 4, 2025
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 4, 2025
@marquiz
Copy link
Contributor Author

marquiz commented Jun 4, 2025

/cc @haircommander @mrunalp @mikebrow

<!--
Describe the consequences on existing workloads (e.g., if this is a runtime
feature, can it break the existing applications?).
In GA, the fallback behavior (and the kubelet `--cgroup-driver` flag) is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we require a few releases before removing the flag as per deprecation policy. Please update. (The flag can do nothing, it just should be around marked as deprecated)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are more than a few places within our own repos that are still setting this flag
https://cs.k8s.io/?q=cgroup(.)%3Fdriver&i=nope&literal=nope&files=&excludeFiles=&repos=

And then users may be opting to configure it indirectly, e.g. through kubeadm or kops configurations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check, I updated the PR to align with the official deprecation policy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now changed so that the fallback behavior is kept as long as the cmdline flag and config option.

message, e.g.:

```
cgroupDriver option has been deprecated and will be dropped in a future release. Please upgrade to a CRI implementation that supports cgroup-driver detection.
```

The `--cgroup-driver` flag and the cgroupDriver configuration option will be
deprecated and have no effect when support for the feature is graduated to GA.
Copy link
Member

@BenTheElder BenTheElder Jun 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation behind making CRI auto-detection a hard requirement and ignoring explicit user configuration (which would enable using older runtimes for example?), the flag doesn't seem like a lot of code to retain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 same question. seems to add risk with little benefit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now changed so that the fallback behavior is kept as long as the cmdline flag and config option. These will be marked as deprecated and will eventually be removed as per the K8s deprecation policy.

flag). Further, the kubeletConfig field and `--cgroup-driver` flag will be
marked as deprecated, to be dropped when support for the feature is adopted by
CRI-O and containerd. Usage of the deprecated setting will produce a log
flag). For beta, the kubeletConfig field and `--cgroup-driver` flag will be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"will be" or "have been"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They were not truly deprecated in beta (except for the log message saying/lying so). Now changed the wording to

In beta, resorting to the fallback behavior will produce a log message like:

message, e.g.:

```
cgroupDriver option has been deprecated and will be dropped in a future release. Please upgrade to a CRI implementation that supports cgroup-driver detection.
```

The `--cgroup-driver` flag and the cgroupDriver configuration option will be
deprecated and have no effect when support for the feature is graduated to GA.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 same question. seems to add risk with little benefit?

You can take a look at one potential example of such test in:
https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
-->

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presumably the line below can now reference those tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removed block was a comment block and not related to this KEP or feature.

checking if there are objects with field X set) may be a last resort. Avoid
logs or events for this purpose.
-->

Kubelet and container runtime version. The
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if my nodes are going NotReady, how can I tell that it is due to this feature, without having to log in to all 10,000 nodes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not now happen as long as the fallback behavior is in place.

After the deprecated functionality is removed this will be similar to other kubelet startup errors. The node will be set to NotReady by the node-controller (visible in node events). When the kubelet does not start I don't know what other methods you have other than examine kubelet logs (log in or via scraped logs)

Thoughts @haircommander ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could maybe have an event emitted but that's not all that reliable

Propose to retain fallback behavior in GA, but, mark the the
--cgroup-driver flag and cgroupDriver config option as deprecated and
eventually drop the according to the Kubernetes deprecation policy.
@marquiz
Copy link
Contributor Author

marquiz commented Jun 11, 2025

Updated: Changed to proposal to retain fallback behavior in GA, but, mark the the --cgroup-driver flag and the CgroupDriver config option as deprecated and eventually drop the according to the Kubernetes deprecation policy.

Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 11, 2025
@haircommander
Copy link
Contributor

/lgtm

Signed-off-by: Peter Hunt <pehunt@redhat.com>
@haircommander haircommander force-pushed the devel/kep-cgroup-driver branch from 5c040c3 to caadc0f Compare June 16, 2025 17:06
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 16, 2025
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jun 16, 2025
@deads2k
Copy link
Contributor

deads2k commented Jun 16, 2025

a few questions

  1. How will the sig decide on when the right time is?
  2. How will the kubelet warn cluster-admins that their CRIs are too old or incompatible before hard-failing in an unrecoverable way?

These are similar in spirit to the thread here: #5370 (comment) and I think we can do much better than, "oh, the kubelet exits". For instance, in 1.34, you could add a metric indicating that the CRI is too old for a future version.

@haircommander
Copy link
Contributor

/reopen

not sure why it closed

@k8s-ci-robot
Copy link
Contributor

@haircommander: Failed to re-open PR: state cannot be changed. There are no new commits on the marquiz:devel/kep-cgroup-driver branch.

In response to this:

/reopen

not sure why it closed

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@haircommander
Copy link
Contributor

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Jun 16, 2025
@k8s-ci-robot
Copy link
Contributor

@haircommander: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 16, 2025
@haircommander
Copy link
Contributor

I added a metric cri_losing_support to give admins a hint that there will be a future version that will be incompatible. I think this + documentation will give admins warning enough. I also clarified we're dropping support for containerd 1.7 in k8s 1.36, which corresponds with containerd 1.7 losing upstream support. WDYT @deads2k

…d 1.7 will be dropped in 1.36

Signed-off-by: Peter Hunt <pehunt@redhat.com>
@haircommander haircommander force-pushed the devel/kep-cgroup-driver branch from 15db405 to c783802 Compare June 16, 2025 18:50
@deads2k
Copy link
Contributor

deads2k commented Jun 17, 2025

I added a metric cri_losing_support to give admins a hint that there will be a future version that will be incompatible. I think this + documentation will give admins warning enough. I also clarified we're dropping support for containerd 1.7 in k8s 1.36, which corresponds with containerd 1.7 losing upstream support. WDYT @deads2k

I think that addresses both John and my concerns. Thanks.

PRR lgtm
/approve

@SergeyKanzhelev
Copy link
Member

/assign @mrunalp

as official approver for the KEP

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, marquiz, mrunalp, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 18, 2025
@SergeyKanzhelev
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 18, 2025
@k8s-ci-robot k8s-ci-robot merged commit a0f8483 into kubernetes:master Jun 18, 2025
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jun 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants