
Karpenter doesn't delete an empty NotReady node #2489

Closed
ace22b opened this issue Sep 10, 2022 · 3 comments
Labels
feature New feature or request

Comments


ace22b commented Sep 10, 2022

Version

Karpenter: v0.13.2

Kubernetes: v1.22.11-eks-18ef993

Expected Behavior

Karpenter deletes nodes that remain empty (no workloads) after it creates them, even if the node never becomes Ready.

Actual Behavior

Karpenter keeps trying to schedule pods on the NotReady node and doesn't delete it once those pods eventually get scheduled elsewhere.

Steps to Reproduce the Problem

This is a pretty rare problem; I've only seen it once and didn't have time to investigate.

This node never became ready:

NAME                                          STATUS     ROLES    AGE     VERSION
ip-x-x-x-x.us-west-2.compute.internal   NotReady   <none>   172m    v1.22.12-eks-ba74326
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 09 Sep 2022 17:27:30 -0700   Fri, 09 Sep 2022 14:46:52 -0700   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 09 Sep 2022 17:27:30 -0700   Fri, 09 Sep 2022 14:46:52 -0700   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 09 Sep 2022 17:27:30 -0700   Fri, 09 Sep 2022 14:46:52 -0700   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Fri, 09 Sep 2022 17:27:30 -0700   Fri, 09 Sep 2022 14:46:52 -0700   KubeletNotReady              [container runtime status check may not have completed yet, container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized]

After trying to schedule pods on it for a while they got scheduled elsewhere:

2022-09-09T22:40:17.842Z	DEBUG	controller.events	Normal	{"commit": "062a029", "object": {"kind":"Pod","namespace":"dev","name":"pod-683530644","uid":"c3ce48e0-28f1-478b-a55f-bf14029800a3","apiVersion":"v1","resourceVersion":"119449605"}, "reason": "NominatePod", "message": "Pod should schedule on ip-x-x-x-x.us-west-2.compute.internal"}
2022-09-09T22:40:33.287Z	DEBUG	controller.events	Normal	{"commit": "062a029", "object": {"kind":"Pod","namespace":"dev","name":"pod-724336772","uid":"af5bd6cd-6518-4d44-8aa5-6eb8dcf319e2","apiVersion":"v1","resourceVersion":"119449611"}, "reason": "NominatePod", "message": "Pod should schedule on ip-x-x-x-x.us-west-2.compute.internal"}

The only Pending pods belong to daemonsets for that node:

kubectl get pods -A | grep "Pending"

kube-system     aws-node-8kwxx                                               0/1     Pending   0               164m
...

The Karpenter logs never mention the node being empty; they only show the node being launched and many pods being assigned to it.

After a manual kubectl delete node:

2022-09-10T00:40:35.393Z	INFO	controller.termination	Cordoned node	{"commit": "062a029", "node": "ip-x-x-x-x.us-west-2.compute.internal"}
2022-09-10T00:40:35.660Z	INFO	controller.termination	Deleted node	{"commit": "062a029", "node": "ip-x-x-x-x.us-west-2.compute.internal"}

Resource Specs and Logs

I included all logs I could think of here: https://gist.github.com/ace22b/cdc0f7ff8fcf333d54f1688c9e984255

I am not able to access the node over SSM to get the kubelet log.

Thanks.

@ace22b ace22b added the bug Something isn't working label Sep 10, 2022
@spring1843
Contributor

This is expected: if a node is not Ready, it's not considered for deletion.

If Karpenter deleted a node that never became Ready, in some cases that would take away the possibility of finding out why the node never became Ready. I'm not sure deleting such nodes is the desired behavior for most cases.

@spring1843 spring1843 added feature New feature or request and removed bug Something isn't working labels Sep 12, 2022
@ace22b
Author

ace22b commented Sep 12, 2022

OK, that makes sense, thanks. I'll have to work around this with some sort of node-watcher, perhaps along the lines of the rough sketch below.
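
A minimal, untested sketch of such a watcher (the 30-minute threshold and the karpenter.sh/provisioner-name label selector are assumptions, and a real version should also confirm that only DaemonSet pods remain on the node before deleting it):

#!/usr/bin/env bash
# Rough node-watcher sketch (untested): delete Karpenter-provisioned nodes that
# have stayed NotReady longer than a threshold. The threshold, the label
# selector, and GNU date are assumptions for illustration only.
set -euo pipefail

THRESHOLD_SECONDS=$((30 * 60))   # assumed 30-minute grace period
NOW=$(date +%s)

# List node name, Ready status, and the last Ready transition time for
# Karpenter-provisioned nodes.
kubectl get nodes -l karpenter.sh/provisioner-name \
  -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Ready")].status}{" "}{.status.conditions[?(@.type=="Ready")].lastTransitionTime}{"\n"}{end}' |
while read -r name ready transition; do
  [ "$ready" = "True" ] && continue
  age=$(( NOW - $(date -d "$transition" +%s) ))   # requires GNU date
  if [ "$age" -gt "$THRESHOLD_SECONDS" ]; then
    echo "Deleting $name (NotReady for ${age}s)"
    kubectl delete node "$name"
  fi
done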

I would love for this to be configurable in the future.

@jonathan-innis
Contributor

Duplicate of kubernetes-sigs/karpenter#750; please +1 that issue if you are interested in bringing it up on the feature roadmap.
