KEP-5233: proposal for NodeReadinessGates #5416

Open
wants to merge 4 commits into master

Conversation

ajaysundark

  • One-line PR description: adding new KEP
  • Other comments: Incorporating feedback from the API review to include probing mechanisms as an inherent part of the design.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jun 17, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ajaysundark
Once this PR has been reviewed and has the lgtm label, please assign dchen1107 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 17, 2025
@ajaysundark
Author

This design has been discussed with more folks, and below is the summary of the key feedback:

  1. The current design with a new, explicit API is likely not necessary for the identified use cases. The recommended path forward is to first explore a simpler design that does not require a new API. This decision can be revisited if a POC or use cases demonstrate that the simpler approach is impractical or introduces unforeseen complexities.
  2. There was a strong preference for a node-local probing mechanism to report readiness. This approach is favored for its high-fidelity signals and better security posture compared to granting `nodes/status` patch permissions to multiple external agents (for contrast, see the RBAC sketch after this list).
  3. An alternative proposal based on global control (a CRD) for node readiness is undesirable due to the risk of large-scale impact from misconfiguration.
  4. Admins typically know readiness requirements before node provisioning, so mutable readiness gates are not necessary; the conditions themselves are what may change.
  5. When handling readiness states, it is important to differentiate between an agent that is not yet present and an agent that is failing.
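
To make the security trade-off in point 2 concrete: without node-local probing, each external agent would need its own grant to patch node status. A minimal sketch of such a grant, with an illustrative role name not taken from the KEP:

```yaml
# Hypothetical ClusterRole each external readiness agent would need in order
# to report node conditions itself; kubelet-driven node-local probing avoids
# handing this permission to multiple agents.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: readiness-agent-node-status  # illustrative name
rules:
- apiGroups: [""]
  resources: ["nodes/status"]
  verbs: ["get", "patch"]
```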

@k8s-ci-robot
Contributor

@ajaysundark: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-enhancements-verify | 1dfa3f8 | link | true | `/test pull-enhancements-verify` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

authors:
- "@ajaysundark"
owning-sig: sig-node
participating-sigs: []
Member


need to put sig-scheduling and reviewer/approver from us.


@lmktfy lmktfy left a comment


Overall, we have a lot of existing extension points to allow building something a lot like this, but out of tree.

We put in those extension points for a reason. So, I think we should:

  • build this out of tree
  • add our voices to calls for better add-on management

Comment on lines +560 to +562
### Initial Taints without a Central API

This approach uses `--register-with-taints` to apply multiple readiness taints at startup. Each component is then responsible for removing its own taint. This is less flexible and discoverable than a formal, versioned API for defining readiness requirements. It also adds operational complexity: every critical DaemonSet would need to tolerate every other potential readiness taint, which is unmanageable in practice when the components are managed by different teams or providers.

Suggested change
### Initial Taints without a Central API
This approach uses `--register-with-taints` to apply multiple readiness taints at startup. Each component is then responsible for removing its own taint. This is less flexible and discoverable than a formal, versioned API for defining readiness requirements. It also adds operational complexity: every critical DaemonSet would need to tolerate every other potential readiness taint, which is unmanageable in practice when the components are managed by different teams or providers.
### Initial taints (replaced), with out-of-tree controller
This approach uses `--register-with-taints` to apply a single initial taint at startup. A controller then atomically sets a set of replacement taints (configured using a custom resource) and removes the initial taint.
For each replacement taint, each component is then responsible for removing its own taint.
This is easier to maintain (no in-tree code) but requires people to run an additional controller in their cluster.
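
Either taint-based variant can also set its startup taints through the kubelet configuration file instead of the command-line flag. A minimal sketch, assuming illustrative taint keys that are not defined by the KEP:

```yaml
# KubeletConfiguration fragment applying readiness taints at node
# registration; `registerWithTaints` is the config-file counterpart of the
# `--register-with-taints` flag. Keys are illustrative placeholders.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
registerWithTaints:
- key: "readiness.example.io/network-pending"
  effect: NoSchedule
- key: "readiness.example.io/device-driver-pending"
  effect: NoSchedule
```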

matchLabels:
  readiness-requirement: "network"
requiredConditions:
- type: "network.k8s.io/CalicoReady"

Suggested change
- type: "network.k8s.io/CalicoReady"
- type: "vendor.example/NetworkReady"

requiredConditions:
- type: "network.k8s.io/CalicoReady"
- type: "network.k8s.io/NetworkProxyReady"
- type: "network.k8s.io/DRANetReady"

Suggested change
- type: "network.k8s.io/DRANetReady"
- type: "vendor.example/LowLatencyInterconnectReady"

key: "readiness.k8s.io/network-pending"
effect: NoSchedule
```


How about a CRD that defines a set of rules that map (custom) conditions to taints?

Yes, you can break your cluster with a single, misguided cluster-scoped policy, but we already have that in other places (e.g., ValidatingAdmissionPolicy).
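
A minimal sketch of what such a rule object could look like; the group, kind, and field names below are all hypothetical, not part of this KEP:

```yaml
# Hypothetical cluster-scoped rule mapping a node condition to a taint.
# An out-of-tree controller would keep the taint on the node while the
# condition is absent or False, and remove it once the condition is True.
apiVersion: readiness.example.io/v1alpha1
kind: NodeReadinessRule
metadata:
  name: network-readiness
spec:
  conditionType: "vendor.example/NetworkReady"
  taint:
    key: "readiness.example.io/network-pending"
    effect: NoSchedule
```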

# Hypothetical Kubelet Configuration
nodeReadinessProbes:
- name: "CNIReady"
  conditionType: "network.k8s.io/CalicoCNIReady"

Suggested change
  conditionType: "network.k8s.io/CalicoCNIReady"
  conditionType: "vendor.example/NetworkReady"

Comment on lines +276 to +277
- conditionType: "vendor.com/DeviceDriverReady"
- conditionType: "network.k8s.io/CalicoCNIReady"

Suggested change
- conditionType: "vendor.com/DeviceDriverReady"
- conditionType: "network.k8s.io/CalicoCNIReady"
- conditionType: "vendor.example.com/DeviceDriverReady"
- conditionType: "vendor.example/NetworkReady"

Note over NA, CNI: Node-Agent Probes for Readiness
NA->>CNI: Probe for readiness (e.g., check health endpoint)
CNI-->>NA: Report Ready
NA->>N: Patch status.conditions:<br/>network.k8s.io/CNIReady=True

We shouldn't make an assumption that network plugins use CNI.
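
Regardless of which plugin mechanism the agent probes, the observable outcome is the same: a condition written to the node's status. A sketch of the patched status, using an illustrative vendor-neutral condition type:

```yaml
# What the Node status might look like after a successful probe; the
# condition type, reason, and timestamps are illustrative placeholders.
status:
  conditions:
  - type: "vendor.example/NetworkReady"
    status: "True"
    reason: "ProbeSucceeded"
    message: "network plugin health endpoint reported ready"
    lastHeartbeatTime: "2025-06-17T10:00:00Z"
    lastTransitionTime: "2025-06-17T10:00:00Z"
```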


This approach allows critical components to directly influence when a node is ready, complementing the existing `Ready` condition with more granular, user-defined control.

### User Stories (Optional)

Suggested change
### User Stories (Optional)
### User Stories

* Defining the right set of gates requires careful consideration by the cluster administrator.

## Alternatives


For autoscaling, cluster autoscalers can directly pay attention to the existing .status.conditions on nodes. I think that's a viable alternative and one we should list.

###### How can this feature be enabled / disabled in a live cluster?

1. Feature gate (also fill in values in `kep.yaml`)
- Feature gate name:

Mention NodeReadinessGates here
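
A sketch of how the filled-in stanza in `kep.yaml` might read; the component list is an assumption based on the design discussed above, not something the KEP confirms:

```yaml
# Hypothetical kep.yaml feature-gate stanza; components are assumed.
feature-gates:
- name: NodeReadinessGates
  components:
  - kubelet
  - kube-apiserver
```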
