NodeGracefulShutdown fail to update Pod status for system critical pods #124448
Comments
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/sig node
@avestuk: The label(s) … In response to this:

> /sig node
cc @kwilczynski. I'm wondering whether we should treat this as a bug or handle it as an enhancement to the graceful shutdown feature.
This could be because the kubelet receives the node shutdown event and deletes the pod, while the KCM (kube-controller-manager) initiates pod eviction only after a 30-second delay. By the time the KCM starts the eviction process, the node has already shut down, but the API server has not yet been updated with the information that the pod has been deleted, so the pod is always in the … state.

If I understand correctly, I do think it can be improved.
The graceful node shutdown feature does not evict pods based on taints and tolerations. It first evicts all regular pods and then the system pods. In this case, CoreDNS might be running as a system pod?
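For context, the kubelet considers a pod "critical" when its resolved priority is at or above the value backing the system-cluster-critical priority class (2000000000). Below is a minimal sketch of that split; the local `pod` struct and hard-coded constant are illustrative stand-ins, not the kubelet's actual types:

```go
package main

import "fmt"

// systemCriticalPriority mirrors the priority value assigned to the
// "system-cluster-critical" priority class; hard-coded here for illustration.
const systemCriticalPriority = 2000000000

// pod is a minimal stand-in for *v1.Pod carrying only the resolved priority.
type pod struct {
	name     string
	priority int32
}

// splitByCriticality mimics the two-phase ordering described above:
// non-critical pods are terminated first, critical (system) pods second.
func splitByCriticality(pods []pod) (regular, critical []pod) {
	for _, p := range pods {
		if p.priority >= systemCriticalPriority {
			critical = append(critical, p)
		} else {
			regular = append(regular, p)
		}
	}
	return regular, critical
}

func main() {
	pods := []pod{
		{"app", 0},
		{"coredns", systemCriticalPriority},
	}
	regular, critical := splitByCriticality(pods)
	fmt.Println(len(regular), len(critical)) // prints "1 1"
}
```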
/triage needs-information
@AnishShah I found yesterday that my initial report was spurious; this does seem to be due to priority classes, and CoreDNS was indeed running as a … pod.

I'm able to consistently reproduce the failure using a minimal example deployment: when the PriorityClassName for the deployment is set to "system-cluster-critical", NodeGracefulShutdown fails.

Below are logs from a successful pod eviction due to graceful node shutdown. The pod is part of a deployment without a PriorityClassName specified.

Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.621576 718 nodeshutdown_manager_linux.go:375] "Shutdown manager killing pod with gracePeriod" pod="kube-system/test-68969998d9-7nqcn" gracePeriod=20
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.621746 718 pod_workers.go:867] "Pod is being removed by the kubelet, begin teardown" pod="kube-system/test-68969998d9-7nqcn" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" updateType="kill"
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.621834 718 pod_workers.go:963] "Notifying pod of pending update" pod="kube-system/test-68969998d9-7nqcn" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" workType="terminating"
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.621926 718 pod_workers.go:970] "Cancelling current pod sync" pod="kube-system/test-68969998d9-7nqcn" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" workType="terminating"
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.622017 718 pod_workers.go:1230] "Processing pod event" pod="kube-system/test-68969998d9-7nqcn" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" updateType="terminating"
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.622101 718 pod_workers.go:1352] "Pod worker has observed request to terminate" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6"
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.622188 718 kubelet.go:2011] "SyncTerminatingPod enter" pod="kube-system/test-68969998d9-7nqcn" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6"
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.622262 718 kubelet_pods.go:1673] "Generating pod status" podIsTerminal=false pod="kube-system/test-68969998d9-7nqcn"
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.622415 718 kubelet_pods.go:1686] "Got phase for pod" pod="kube-system/test-68969998d9-7nqcn" oldPhase="Running" phase="Running"
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.622546 718 status_manager.go:687] "updateStatusInternal" version=3 podIsFinished=false pod="kube-system/test-68969998d9-7nqcn" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" containers="(busybox state=running previous=<none>)"
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.622648 718 kubelet.go:2021] "Pod terminating with grace period" pod="kube-system/test-68969998d9-7nqcn" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" gracePeriod=20
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.622757 718 kuberuntime_container.go:749] "Killing container with a grace period override" pod="kube-system/test-68969998d9-7nqcn" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" containerName="busybox" containerID="containerd://771f1240ede543dccb522e67e8bb099e79bf716f6b5a06ef117e72a220ce7d60" gracePerio>
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.622864 718 kuberuntime_container.go:770] "Killing container with a grace period" pod="kube-system/test-68969998d9-7nqcn" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" containerName="busybox" containerID="containerd://771f1240ede543dccb522e67e8bb099e79bf716f6b5a06ef117e72a220ce7d60" gracePeriod=20
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.623182 718 status_manager.go:227] "Syncing updated statuses"
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.623306 718 status_manager.go:833] "Sync pod status" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" statusUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" version=3
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.623567 718 event.go:376] "Event occurred" object="kube-system/test-68969998d9-7nqcn" fieldPath="spec.containers{busybox}" kind="Pod" apiVersion="v1" type="Normal" reason="Killing" message="Stopping container busybox"
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.630224 718 status_manager.go:874] "Patch status for pod" pod="kube-system/test-68969998d9-7nqcn" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" patch="{\"metadata\":{\"uid\":\"459ad118-8ed7-476b-b92f-5fa2753b49e6\"}}"
Apr 30 09:38:16 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:16.632487 718 status_manager.go:881] "Status for pod is up-to-date" pod="kube-system/test-68969998d9-7nqcn" statusVersion=3
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:18.729875 718 status_manager.go:984] "Pod status is inconsistent with cached status for pod, a reconciliation should be triggered" pod="kube-system/test-68969998d9-7nqcn" statusDiff=<
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: &v1.PodStatus{
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: - Phase: "Running",
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: + Phase: "Failed",
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: Conditions: []v1.PodCondition{
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: ... // 3 identical elements
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: {Type: "ContainersReady", Status: "True", LastTransitionTime: {Time: s"2024-04-30 09:37:18 +0000 UTC"}},
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: {Type: "PodScheduled", Status: "True", LastTransitionTime: {Time: s"2024-04-30 09:37:16 +0000 UTC"}},
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: + {
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: + Type: "DisruptionTarget",
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: + Status: "True",
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: + LastTransitionTime: s"2024-04-30 09:38:16 +0000 UTC",
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: + Reason: "TerminationByKubelet",
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: + Message: "Pod was terminated in response to imminent node shutdown.",
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: + },
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: },
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: - Message: "",
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: + Message: "Pod was terminated in response to imminent node shutdown.",
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: - Reason: "",
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: + Reason: "Terminated",
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: NominatedNodeName: "",
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: HostIP: "192.168.161.11",
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: ... // 2 identical fields
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: PodIPs: {{IP: "10.2.0.131"}},
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: StartTime: s"2024-04-30 09:37:16 +0000 UTC",
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: - InitContainerStatuses: nil,
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: + InitContainerStatuses: []v1.ContainerStatus{},
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: ContainerStatuses: {{Name: "busybox", State: {Running: &{StartedAt: {Time: s"2024-04-30 09:37:17 +0000 UTC"}}}, Ready: true, Image: "docker.io/library/busybox:stable-glibc", ...}},
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: QOSClass: "BestEffort",
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: - EphemeralContainerStatuses: nil,
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: + EphemeralContainerStatuses: []v1.ContainerStatus{},
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: Resize: "",
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: ResourceClaimStatuses: nil,
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: }
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: >
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:18.730751 718 status_manager.go:833] "Sync pod status" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" statusUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" version=3
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:18.735733 718 status_manager.go:874] "Patch status for pod" pod="kube-system/test-68969998d9-7nqcn" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" patch="{\"metadata\":{\"uid\":\"459ad118-8ed7-476b-b92f-5fa2753b49e6\"}}"
Apr 30 09:38:18 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:18.735835 718 status_manager.go:881] "Status for pod is up-to-date" pod="kube-system/test-68969998d9-7nqcn" statusVersion=3
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:28.730445 718 status_manager.go:984] "Pod status is inconsistent with cached status for pod, a reconciliation should be triggered" pod="kube-system/test-68969998d9-7nqcn" statusDiff=<
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: &v1.PodStatus{
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: - Phase: "Running",
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: + Phase: "Failed",
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: Conditions: []v1.PodCondition{
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: ... // 3 identical elements
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: {Type: "ContainersReady", Status: "True", LastTransitionTime: {Time: s"2024-04-30 09:37:18 +0000 UTC"}},
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: {Type: "PodScheduled", Status: "True", LastTransitionTime: {Time: s"2024-04-30 09:37:16 +0000 UTC"}},
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: + {
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: + Type: "DisruptionTarget",
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: + Status: "True",
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: + LastTransitionTime: s"2024-04-30 09:38:16 +0000 UTC",
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: + Reason: "TerminationByKubelet",
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: + Message: "Pod was terminated in response to imminent node shutdown.",
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: + },
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: },
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: - Message: "",
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: + Message: "Pod was terminated in response to imminent node shutdown.",
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: - Reason: "",
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: + Reason: "Terminated",
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: NominatedNodeName: "",
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: HostIP: "192.168.161.11",
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: ... // 2 identical fields
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: PodIPs: {{IP: "10.2.0.131"}},
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: StartTime: s"2024-04-30 09:37:16 +0000 UTC",
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: - InitContainerStatuses: nil,
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: + InitContainerStatuses: []v1.ContainerStatus{},
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: ContainerStatuses: {{Name: "busybox", State: {Running: &{StartedAt: {Time: s"2024-04-30 09:37:17 +0000 UTC"}}}, Ready: true, Image: "docker.io/library/busybox:stable-glibc", ...}},
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: QOSClass: "BestEffort",
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: - EphemeralContainerStatuses: nil,
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: + EphemeralContainerStatuses: []v1.ContainerStatus{},
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: Resize: "",
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: ResourceClaimStatuses: nil,
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: }
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: >
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:28.731178 718 status_manager.go:833] "Sync pod status" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" statusUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" version=3
Apr 30 09:38:28 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:28.732865 718 kubelet.go:2505] "SyncLoop (housekeeping) end" duration="5ms"
Apr 30 09:38:29 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:29.241307 718 kubelet.go:2885] "Container runtime status" status="Runtime Conditions: RuntimeReady=true reason: message:, NetworkReady=true reason: message:, ContainerdHasNoDeprecationWarnings=true reason: message:"
Apr 30 09:38:29 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:29.242771 718 generic.go:224] "GenericPLEG: Relisting"
Apr 30 09:38:29 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:29.243558 718 kuberuntime_manager.go:437] "Retrieved pods from runtime" all=true
Apr 30 09:38:29 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:29.362998 718 kubelet_node_status.go:538] "Updating node status"
Apr 30 09:38:29 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:29.592638 718 status_manager.go:874] "Patch status for pod" pod="kube-system/test-68969998d9-7nqcn" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" patch="{\"metadata\":{\"uid\":\"459ad118-8ed7-476b-b92f-5fa2753b49e6\"}}"
Apr 30 09:38:29 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:29.593094 718 status_manager.go:881] "Status for pod is up-to-date" pod="kube-system/test-68969998d9-7nqcn" statusVersion=3
Apr 30 09:38:30 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:30.727400 718 kubelet.go:2466] "SyncLoop (SYNC) pods" total=1 pods=["kube-system/test-68969998d9-7nqcn"]
Apr 30 09:38:30 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:30.727876 718 pod_workers.go:963] "Notifying pod of pending update" pod="kube-system/test-68969998d9-7nqcn" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" workType="terminating"
Apr 30 09:38:38 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:38.047774 718 kubelet.go:2428] "SyncLoop RECONCILE" source="api" pods=["kube-system/test-68969998d9-7nqcn"]
Apr 30 09:38:38 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:38.049563 718 status_manager.go:874] "Patch status for pod" pod="kube-system/test-68969998d9-7nqcn" podUID="459ad118-8ed7-476b-b92f-5fa2753b49e6" patch="{\"metadata\":{\"uid\":\"459ad118-8ed7-476b-b92f-5fa2753b49e6\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"DisruptionTarget\"},{\"type\":>
Apr 30 09:38:38 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:38.049894 718 status_manager.go:883] "Status for pod updated successfully" pod="kube-system/test-68969998d9-7nqcn" statusVersion=4 status={"phase":"Failed","conditions":[{"type":"DisruptionTarget","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-04-30T09:38:37Z","reason":"TerminationByKube>
Apr 30 09:38:38 lke-e2e-a77508-test-392f71e0 kubelet[718]: I0430 09:38:38.050132 718 status_manager.go:833] "Sync pod status" podUID="8999dbb6-9b04-4300-acde-41e81f01c62a" statusUID="8999dbb6-9b04-4300-acde-41e81f01c62a" version=1

Now logs from the same test, but with the priority class name set to …
My deployment definition:

deployment, err := kubernetes.AppsV1().Deployments("kube-system").Create(ctx, &appsv1.Deployment{
ObjectMeta: metaV1.ObjectMeta{
Name: "test",
Labels: map[string]string{"foo": "bar"},
},
Spec: appsv1.DeploymentSpec{
Replicas: &replicas,
Selector: &metaV1.LabelSelector{
MatchLabels: map[string]string{"foo": "bar"},
},
Template: v1.PodTemplateSpec{
ObjectMeta: metaV1.ObjectMeta{
Labels: map[string]string{"foo": "bar"},
},
Spec: v1.PodSpec{
PriorityClassName: "system-cluster-critical",
Containers: []v1.Container{
{
Name: "busybox",
Image: "busybox:stable-glibc",
Command: []string{"sleep"},
Args: []string{"infinity"},
},
},
},
},
},
}, metaV1.CreateOptions{})
What do you mean by "NodeGracefulShutdown fails"? Do you not see a pod status update? Is it possible to share the full kubelet logs?

/retitle NodeGracefulShutdown fail to update Pod status for system critical pods
@avestuk, so this is no longer a problem related to setting up tolerations for some Pods?

The Graceful Node Shutdown manager has no means to convey that it will terminate some Pods other than setting an appropriate node status and taints; as such, a potential race to evict and/or terminate Pods with the Taint (Eviction?) manager could take place. The KEP and then the implementation most likely did not take such a scenario into account. However, you sound as if this is no longer an issue?
@avestuk, regarding your recent update: how is your Graceful Node Shutdown configured at the moment? Would you be able to share the relevant configuration? Also, are you testing on a worker node or a member of the control plane?

There is nothing specific in how a Pod of a certain priority is terminated. The order in which these are terminated is different, but the actual process is the same. Thus, I am not sure why the problem you are seeing only applies to Pods with some priority class set.

(taken from kubernetes/kubernetes/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go#L452)

func groupByPriority(shutdownGracePeriodByPodPriority []kubeletconfig.ShutdownGracePeriodByPodPriority, pods []*v1.Pod) []podShutdownGroup {
groups := make([]podShutdownGroup, 0, len(shutdownGracePeriodByPodPriority))
for _, period := range shutdownGracePeriodByPodPriority {
groups = append(groups, podShutdownGroup{
ShutdownGracePeriodByPodPriority: period,
})
}
for _, pod := range pods {
var priority int32
if pod.Spec.Priority != nil {
priority = *pod.Spec.Priority
}
// Find the group index according to the priority.
index := sort.Search(len(groups), func(i int) bool {
return groups[i].Priority >= priority
})
// 1. Those higher than the highest priority default to the highest priority
// 2. Those lower than the lowest priority default to the lowest priority
// 3. Those boundary priority default to the lower priority
// if priority of pod is:
// groups[index-1].Priority <= pod priority < groups[index].Priority
// in which case we want to pick lower one (i.e index-1)
if index == len(groups) {
index = len(groups) - 1
} else if index < 0 {
index = 0
} else if index > 0 && groups[index].Priority > priority {
index--
}
groups[index].Pods = append(groups[index].Pods, pod)
}
return groups
}

What happens when you add another Pod and set the priority class to …? What is the issue you are seeing? Are the pods' statuses not updated? Are the pods not terminated? Is the order in which things are terminated incorrect?

Related:
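To make the boundary rules in groupByPriority concrete, here is a self-contained sketch of just the sort.Search bucketing, with the kubelet config and Pod types replaced by plain local types (an assumption for illustration; the index-adjustment logic mirrors the function above, assuming the groups are sorted by ascending priority):

```go
package main

import (
	"fmt"
	"sort"
)

// group mirrors podShutdownGroup with only the fields the bucketing needs.
type group struct {
	priority int32
	pods     []string
}

// assign places a pod into the priority band that contains it, following the
// same boundary rules as groupByPriority: above the highest band clamps down,
// and a priority strictly between two bands falls into the lower one.
func assign(groups []group, pod string, priority int32) {
	index := sort.Search(len(groups), func(i int) bool {
		return groups[i].priority >= priority
	})
	if index == len(groups) {
		index = len(groups) - 1
	} else if index > 0 && groups[index].priority > priority {
		index--
	}
	groups[index].pods = append(groups[index].pods, pod)
}

func main() {
	// Two bands, e.g. priority 0 (regular) and 2000000000 (cluster-critical).
	groups := []group{{priority: 0}, {priority: 2000000000}}
	assign(groups, "app", 0)              // lands in the first band
	assign(groups, "mid", 100)            // between bands -> lower band
	assign(groups, "coredns", 2000000000) // exact boundary -> critical band
	assign(groups, "super", 2000001000)   // above highest -> highest band
	for _, g := range groups {
		fmt.Println(g.priority, g.pods)
	}
}
```

Note that a pod whose priority sits between two configured bands is grouped with the lower band, and anything above the highest configured priority is clamped into the highest group.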
Graceful Node Shutdown settings are:

I am testing on a worker node. The issue is that the pod status is not being updated to reflect …

I'll update the initial post to clarify what the current state of affairs is.
What happened?

Pods with a system-cluster-critical priority class set are not evicted by Graceful Node Shutdown when a node is slated to be shut down. Instead they are evicted by the Taint Manager once the tolerationSeconds period expires.

What did you expect to happen?

I'd expect the presence of priority classes to be orthogonal to whether or not Graceful Node Shutdown evicts the pods. If graceful node shutdown runs on node shutdown, then pods should be evicted before the Taint Manager runs.
How can we reproduce it (as minimally and precisely as possible)?
Run a deployment with the below tolerations.
Schedule a node shutdown, or trigger a shutdown event using gdbus.
Observe that pods from the deployment enter a "Terminating" state and have conditions indicating they were evicted by the Taint Manager.
The pod status is never updated to show TerminationByKubelet, and the pod stays in the Running state until the eviction by the Taint Manager.
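One way to tell which component recorded the eviction is to look for the DisruptionTarget condition with reason TerminationByKubelet in the pod status, as seen in the successful logs above. A minimal, standard-library-only sketch of that check; the local types are simplified stand-ins for v1.PodStatus, and the input would come from e.g. `kubectl get pod -o json`:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition is a minimal mirror of v1.PodCondition for this check.
type condition struct {
	Type   string `json:"type"`
	Status string `json:"status"`
	Reason string `json:"reason"`
}

// podStatus carries only the status fields relevant to the check.
type podStatus struct {
	Phase      string      `json:"phase"`
	Conditions []condition `json:"conditions"`
}

// evictedByKubelet reports whether a pod status JSON carries the
// DisruptionTarget condition that graceful node shutdown should set.
func evictedByKubelet(raw []byte) (bool, error) {
	var st podStatus
	if err := json.Unmarshal(raw, &st); err != nil {
		return false, err
	}
	for _, c := range st.Conditions {
		if c.Type == "DisruptionTarget" && c.Status == "True" && c.Reason == "TerminationByKubelet" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Abbreviated status matching the successful-eviction logs above.
	raw := []byte(`{"phase":"Failed","conditions":[{"type":"DisruptionTarget","status":"True","reason":"TerminationByKubelet"}]}`)
	ok, _ := evictedByKubelet(raw)
	fmt.Println(ok) // prints "true"
}
```

In the failing case described here, this check would stay false because the condition never reaches the API server.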
Anything else we need to know?
Kubelet logs will indicate that the node shutdown manager finished terminating the pod, but this is never reflected in the API.
Pod in the API server
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)