fix: make drain ignore DaemonSets & bypass PodDisruptionBudgets #1414

Open · wants to merge 1 commit into main
Conversation

@mjgallag commented Mar 14, 2024

What

Make kubectl drain ignore DaemonSets & bypass PodDisruptionBudgets on cluster stop.
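The resulting drain invocation can be sketched as follows (a minimal illustration; build_drain_cmd is a hypothetical helper name, not k3d code):

```python
def build_drain_cmd(node: str) -> list[str]:
    # Hypothetical sketch of the drain call after this change:
    # --ignore-daemonsets skips DaemonSet-managed pods instead of failing,
    # --disable-eviction deletes pods directly (no Eviction API call), so
    # PodDisruptionBudgets cannot block the drain.
    return [
        "kubectl", "drain", node,
        "--force",
        "--delete-emptydir-data",
        "--ignore-daemonsets",
        "--disable-eviction",
    ]

print(" ".join(build_drain_cmd("k3d-dev-runhub-server-0")))
```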

Why

Currently, kubectl drain fails outright if the cluster has DaemonSets, and it retries eviction indefinitely if the cluster has PodDisruptionBudgets (note there may be a timeout elsewhere preventing this from hanging k3d cluster/node stop).
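The retry pattern visible in the log below can be made safe by bounding it with a deadline instead of looping forever. A minimal stdlib-only sketch (function names are hypothetical, not k3d code):

```python
import time

def evict_with_retry(evict, interval=0.01, timeout=0.05):
    """Retry an eviction while the PodDisruptionBudget blocks it, but give
    up at a deadline instead of retrying forever (hypothetical sketch)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            evict()          # eviction succeeded
            return True
        except RuntimeError:  # stand-in for "would violate the PDB"
            if time.monotonic() >= deadline:
                return False  # give up instead of hanging
            time.sleep(interval)

def always_blocked():
    raise RuntimeError(
        "Cannot evict pod as it would violate the pod's disruption budget."
    )

print(evict_with_retry(always_blocked))  # gives up at the deadline: False
```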

➜  runhub git:(main) ✗ kubectl get pods --all-namespaces 
NAMESPACE      NAME                                                        READY   STATUS    RESTARTS   AGE
argocd         argo-cd-argocd-redis-7694c74974-xtvsk                       1/1     Running   0          3m23s
kube-system    local-path-provisioner-6c86858495-7qzgk                     1/1     Running   0          3m23s
kube-system    coredns-6799fbcd5-xgwjl                                     1/1     Running   0          3m23s
argocd         argo-cd-argocd-notifications-controller-688fc8f6bb-j2cbm    1/1     Running   0          3m24s
argocd         argo-cd-argocd-applicationset-controller-85545c8678-54tdw   1/1     Running   0          3m24s
kube-system    metrics-server-67c658944b-2vrf5                             1/1     Running   0          3m23s
argocd         argo-cd-argocd-dex-server-69959f496-997xd                   1/1     Running   0          3m24s
argocd         argo-cd-argocd-application-controller-0                     1/1     Running   0          3m24s
argocd         argo-cd-argocd-server-fbf845cd4-h9pks                       1/1     Running   0          3m24s
argocd         argo-cd-argocd-repo-server-b4cfb85b6-vwm9d                  1/1     Running   0          3m24s
istio-system   istiod-bc4584967-9l5j7                                      1/1     Running   0          94s
kube-system    svclb-istio-ingressgateway-00476101-8p5dg                   3/3     Running   0          77s
istio-system   istio-ingressgateway-9cc99c9db-h2zk6                        1/1     Running   0          77s
➜  runhub git:(main) ✗ kubectl drain k3d-dev-runhub-server-0 --force --delete-emptydir-data
node/k3d-dev-runhub-server-0 cordoned
error: unable to drain node "k3d-dev-runhub-server-0" due to error:cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/svclb-istio-ingressgateway-00476101-8p5dg, continuing command...
There are pending nodes to be drained:
 k3d-dev-runhub-server-0
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/svclb-istio-ingressgateway-00476101-8p5dg
➜  runhub git:(main) ✗ kubectl get pods --all-namespaces                                   
NAMESPACE      NAME                                                        READY   STATUS    RESTARTS   AGE
argocd         argo-cd-argocd-redis-7694c74974-xtvsk                       1/1     Running   0          4m
kube-system    local-path-provisioner-6c86858495-7qzgk                     1/1     Running   0          4m
kube-system    coredns-6799fbcd5-xgwjl                                     1/1     Running   0          4m
argocd         argo-cd-argocd-notifications-controller-688fc8f6bb-j2cbm    1/1     Running   0          4m1s
argocd         argo-cd-argocd-applicationset-controller-85545c8678-54tdw   1/1     Running   0          4m1s
kube-system    metrics-server-67c658944b-2vrf5                             1/1     Running   0          4m
argocd         argo-cd-argocd-dex-server-69959f496-997xd                   1/1     Running   0          4m1s
argocd         argo-cd-argocd-application-controller-0                     1/1     Running   0          4m1s
argocd         argo-cd-argocd-server-fbf845cd4-h9pks                       1/1     Running   0          4m1s
argocd         argo-cd-argocd-repo-server-b4cfb85b6-vwm9d                  1/1     Running   0          4m1s
istio-system   istiod-bc4584967-9l5j7                                      1/1     Running   0          2m11s
kube-system    svclb-istio-ingressgateway-00476101-8p5dg                   3/3     Running   0          114s
istio-system   istio-ingressgateway-9cc99c9db-h2zk6                        1/1     Running   0          114s
➜  runhub git:(main) ✗ kubectl drain k3d-dev-runhub-server-0 --force --delete-emptydir-data --ignore-daemonsets
node/k3d-dev-runhub-server-0 already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/svclb-istio-ingressgateway-00476101-8p5dg
evicting pod istio-system/istio-ingressgateway-9cc99c9db-h2zk6
evicting pod kube-system/metrics-server-67c658944b-2vrf5
evicting pod argocd/argo-cd-argocd-server-fbf845cd4-h9pks
evicting pod argocd/argo-cd-argocd-redis-7694c74974-xtvsk
evicting pod argocd/argo-cd-argocd-repo-server-b4cfb85b6-vwm9d
evicting pod argocd/argo-cd-argocd-notifications-controller-688fc8f6bb-j2cbm
evicting pod istio-system/istiod-bc4584967-9l5j7
evicting pod kube-system/local-path-provisioner-6c86858495-7qzgk
evicting pod argocd/argo-cd-argocd-dex-server-69959f496-997xd
evicting pod kube-system/coredns-6799fbcd5-xgwjl
evicting pod argocd/argo-cd-argocd-application-controller-0
evicting pod argocd/argo-cd-argocd-applicationset-controller-85545c8678-54tdw
error when evicting pods/"istiod-bc4584967-9l5j7" -n "istio-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
pod/argo-cd-argocd-notifications-controller-688fc8f6bb-j2cbm evicted
pod/argo-cd-argocd-redis-7694c74974-xtvsk evicted
pod/coredns-6799fbcd5-xgwjl evicted
I0314 08:37:18.742536   17715 request.go:697] Waited for 1.067466008s due to client-side throttling, not priority and fairness, request: GET:https://0.0.0.0:52964/api/v1/namespaces/argocd/pods/argo-cd-argocd-applicationset-controller-85545c8678-54tdw
pod/argo-cd-argocd-applicationset-controller-85545c8678-54tdw evicted
pod/argo-cd-argocd-application-controller-0 evicted
pod/argo-cd-argocd-dex-server-69959f496-997xd evicted
pod/argo-cd-argocd-repo-server-b4cfb85b6-vwm9d evicted
pod/argo-cd-argocd-server-fbf845cd4-h9pks evicted
pod/metrics-server-67c658944b-2vrf5 evicted
evicting pod istio-system/istiod-bc4584967-9l5j7
error when evicting pods/"istiod-bc4584967-9l5j7" -n "istio-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
pod/istio-ingressgateway-9cc99c9db-h2zk6 evicted
evicting pod istio-system/istiod-bc4584967-9l5j7
error when evicting pods/"istiod-bc4584967-9l5j7" -n "istio-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod istio-system/istiod-bc4584967-9l5j7
error when evicting pods/"istiod-bc4584967-9l5j7" -n "istio-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod istio-system/istiod-bc4584967-9l5j7
error when evicting pods/"istiod-bc4584967-9l5j7" -n "istio-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod istio-system/istiod-bc4584967-9l5j7
error when evicting pods/"istiod-bc4584967-9l5j7" -n "istio-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod istio-system/istiod-bc4584967-9l5j7
error when evicting pods/"istiod-bc4584967-9l5j7" -n "istio-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
pod/local-path-provisioner-6c86858495-7qzgk evicted
evicting pod istio-system/istiod-bc4584967-9l5j7
error when evicting pods/"istiod-bc4584967-9l5j7" -n "istio-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
[...the evict/retry pair above repeats every 5s until interrupted...]
^C
➜  runhub git:(main) ✗ kubectl get pods --all-namespaces                                                       
NAMESPACE      NAME                                                        READY   STATUS    RESTARTS   AGE
istio-system   istiod-bc4584967-9l5j7                                      1/1     Running   0          4m8s
kube-system    svclb-istio-ingressgateway-00476101-8p5dg                   3/3     Running   0          3m51s
argocd         argo-cd-argocd-applicationset-controller-85545c8678-wm6xs   0/1     Pending   0          97s
kube-system    coredns-6799fbcd5-456rg                                     0/1     Pending   0          97s
kube-system    local-path-provisioner-6c86858495-6p4l7                     0/1     Pending   0          97s
argocd         argo-cd-argocd-redis-7694c74974-p7cgj                       0/1     Pending   0          97s
kube-system    metrics-server-67c658944b-75r9d                             0/1     Pending   0          96s
argocd         argo-cd-argocd-repo-server-b4cfb85b6-s55wr                  0/1     Pending   0          96s
istio-system   istio-ingressgateway-9cc99c9db-p9r7d                        0/1     Pending   0          97s
argocd         argo-cd-argocd-dex-server-69959f496-xfnnb                   0/1     Pending   0          96s
argocd         argo-cd-argocd-server-fbf845cd4-c6bns                       0/1     Pending   0          96s
argocd         argo-cd-argocd-notifications-controller-688fc8f6bb-k7sfr    0/1     Pending   0          96s
argocd         argo-cd-argocd-application-controller-0                     0/1     Pending   0          95s
➜  runhub git:(main) ✗ kubectl drain k3d-dev-runhub-server-0 --force --delete-emptydir-data --ignore-daemonsets --disable-eviction
node/k3d-dev-runhub-server-0 already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/svclb-istio-ingressgateway-00476101-8p5dg
pod/istiod-bc4584967-9l5j7 deleted
node/k3d-dev-runhub-server-0 drained
➜  runhub git:(main) ✗ kubectl get pods --all-namespaces                                                                          
NAMESPACE      NAME                                                        READY   STATUS    RESTARTS   AGE
kube-system    svclb-istio-ingressgateway-00476101-8p5dg                   3/3     Running   0          4m22s
argocd         argo-cd-argocd-applicationset-controller-85545c8678-wm6xs   0/1     Pending   0          2m8s
kube-system    coredns-6799fbcd5-456rg                                     0/1     Pending   0          2m8s
kube-system    local-path-provisioner-6c86858495-6p4l7                     0/1     Pending   0          2m8s
argocd         argo-cd-argocd-redis-7694c74974-p7cgj                       0/1     Pending   0          2m8s
kube-system    metrics-server-67c658944b-75r9d                             0/1     Pending   0          2m7s
argocd         argo-cd-argocd-repo-server-b4cfb85b6-s55wr                  0/1     Pending   0          2m7s
istio-system   istio-ingressgateway-9cc99c9db-p9r7d                        0/1     Pending   0          2m8s
argocd         argo-cd-argocd-dex-server-69959f496-xfnnb                   0/1     Pending   0          2m7s
argocd         argo-cd-argocd-server-fbf845cd4-c6bns                       0/1     Pending   0          2m7s
argocd         argo-cd-argocd-notifications-controller-688fc8f6bb-k7sfr    0/1     Pending   0          2m7s
argocd         argo-cd-argocd-application-controller-0                     0/1     Pending   0          2m6s
istio-system   istiod-bc4584967-cxmjn                                      0/1     Pending   0          9s
➜  runhub git:(main) ✗ 

Implications

There were discussions in #1119 about reverting drain entirely, due to use cases more complex than the default single-node cluster (i.e. agents). If that happens, I can continue using my workaround of draining the node before calling cluster stop: https://github.com/runhub-dev/runhub/blob/7d49dc314908063699b38458240d8e2540b89d88/scripts/dev.sh#L119-L120.

While ignoring DaemonSets makes sense in the multi-node case, bypassing PodDisruptionBudgets does not if you are only stopping a single node (i.e. k3d node stop) rather than all the nodes in the cluster (i.e. k3d cluster stop). But if drain is kept as currently implemented, I feel bypassing PodDisruptionBudgets is better than an infinite retry loop (note there may be a timeout elsewhere preventing this from hanging k3d cluster/node stop).
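The distinction argued above can be sketched as a conditional flag set (a hypothetical illustration, not k3d code): bypass PodDisruptionBudgets only when every node is going down, and honor them for a single-node stop.

```python
def drain_flags(stopping_whole_cluster: bool) -> list[str]:
    # Hypothetical sketch: always skip DaemonSet-managed pods, but only
    # bypass PodDisruptionBudgets (--disable-eviction) on a full cluster
    # stop, where no other node can satisfy the budget anyway.
    flags = ["--force", "--delete-emptydir-data", "--ignore-daemonsets"]
    if stopping_whole_cluster:
        flags.append("--disable-eviction")
    return flags

print(drain_flags(True))   # cluster stop: PDBs bypassed
print(drain_flags(False))  # node stop: PDBs still honored
```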

@iwilltry42 iwilltry42 added this to the v5.7.0 milestone Apr 3, 2024