Prior Search
What is your question?
I'm noticing a node in our production cluster with the event `Cannot disrupt Node: state node doesn't contain both a node and a nodeclaim`, and the node has now been running for 3 days. Could this be related to issue #127?
@mschnee also found aws/karpenter-provider-aws#6803, which seems related.
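In case it helps with triage, here is a minimal sketch of how the node-to-NodeClaim mapping can be checked. This is illustrative, assuming a Karpenter version with the NodeClaim API (v0.32+); only the node name and labels come from the output below.

```sh
# Hypothetical diagnostic sketch, assuming Karpenter >= v0.32 (NodeClaim API).
NODE=ip-10-0-166-82.us-west-2.compute.internal

# List every NodeClaim and the node it is bound to; if $NODE never appears,
# Karpenter has no NodeClaim for this node and reports DisruptionBlocked.
kubectl get nodeclaims -o custom-columns=NAME:.metadata.name,NODE:.status.nodeName

# Check what provisioned the node: Karpenter-managed nodes carry a
# karpenter.sh/nodepool label, while this node's labels (below) include
# eks.amazonaws.com/nodegroup, i.e. an EKS managed node group.
kubectl get node "$NODE" --show-labels
```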
Output of `kubectl describe node` for the affected node:

```text
Name: ip-10-0-166-82.us-west-2.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=arm64
beta.kubernetes.io/instance-type=m6g.medium
beta.kubernetes.io/os=linux
eks.amazonaws.com/capacityType=SPOT
eks.amazonaws.com/nodegroup=controllers-20240728225511410500000002
eks.amazonaws.com/nodegroup-image=ami-0835c99467c24da9b
eks.amazonaws.com/sourceLaunchTemplateId=lt-04000b2f2434662ae
eks.amazonaws.com/sourceLaunchTemplateVersion=12
failure-domain.beta.kubernetes.io/region=us-west-2
failure-domain.beta.kubernetes.io/zone=us-west-2b
k8s.io/cloud-provider-aws=1eca48abf50de6dbb7b17d2b5d457797
kubernetes.io/arch=arm64
kubernetes.io/hostname=ip-10-0-166-82.us-west-2.compute.internal
kubernetes.io/os=linux
node.kubernetes.io/instance-type=m6g.medium
panfactum.com/class=controller
topology.ebs.csi.aws.com/zone=us-west-2b
topology.kubernetes.io/region=us-west-2
topology.kubernetes.io/zone=us-west-2b
Annotations: alpha.kubernetes.io/provided-node-ip: 10.0.166.82
csi.volume.kubernetes.io/nodeid:
{"ebs.csi.aws.com":"i-028527e376b17a21e","secrets-store.csi.k8s.io":"ip-10-0-166-82.us-west-2.compute.internal"}
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 11 Oct 2024 08:15:46 -0500
Taints: arm64=true:NoSchedule
burstable=true:NoSchedule
controller=true:NoSchedule
spot=true:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: ip-10-0-166-82.us-west-2.compute.internal
AcquireTime: <unset>
RenewTime: Mon, 14 Oct 2024 20:20:39 -0500
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Fri, 11 Oct 2024 08:16:14 -0500 Fri, 11 Oct 2024 08:16:14 -0500 CiliumIsUp Cilium is running on this node
MemoryPressure False Mon, 14 Oct 2024 20:16:59 -0500 Fri, 11 Oct 2024 08:15:45 -0500 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 14 Oct 2024 20:16:59 -0500 Fri, 11 Oct 2024 08:15:45 -0500 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 14 Oct 2024 20:16:59 -0500 Fri, 11 Oct 2024 08:15:45 -0500 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 14 Oct 2024 20:16:59 -0500 Fri, 11 Oct 2024 08:16:06 -0500 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.0.166.82
InternalDNS: ip-10-0-166-82.us-west-2.compute.internal
Hostname: ip-10-0-166-82.us-west-2.compute.internal
Capacity:
cpu: 1
ephemeral-storage: 40894Mi
hugepages-1Gi: 0
hugepages-2Mi: 0
hugepages-32Mi: 0
hugepages-64Ki: 0
memory: 3880624Ki
pods: 110
Allocatable:
cpu: 940m
ephemeral-storage: 37518678362
hugepages-1Gi: 0
hugepages-2Mi: 0
hugepages-32Mi: 0
hugepages-64Ki: 0
memory: 3163824Ki
pods: 110
System Info:
Machine ID: ec2ade84bc798e1284d85a506964467e
System UUID: ec2ade84-bc79-8e12-84d8-5a506964467e
Boot ID: 0ca4357a-7367-45d9-b8c4-7d3c7cae8d98
Kernel Version: 6.1.109
OS Image: Bottlerocket OS 1.24.0 (aws-k8s-1.29)
Operating System: linux
Architecture: arm64
Container Runtime Version: containerd://1.7.22+bottlerocket
Kubelet Version: v1.29.5-eks-1109419
Kube-Proxy Version: v1.29.5-eks-1109419
ProviderID: aws:///us-west-2b/i-028527e376b17a21e
Non-terminated Pods: (29 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
alb-controller alb-controller-dd56b78d6-whkgc 11m (1%) 100m (10%) 83464877 (2%) 312005503 (9%) 25m
alloy alloy-tnqng 34m (3%) 100m (10%) 179272160 (5%) 429137520 (13%) 3d
argo argo-events-controller-manager-68487594-rr7jb 11m (1%) 100m (10%) 88707757 (2%) 334870395 (10%) 81m
argo events-webhook-6d98c7b976-rcc25 11m (1%) 100m (10%) 46739508 (1%) 267721196 (8%) 25m
authentik redis-4833-node-0 56m (5%) 100m (10%) 107425154 (3%) 305613202 (9%) 8m59s
aws-ebs-csi-driver ebs-csi-controller-676849595f-rzg5x 66m (7%) 100m (10%) 177293248 (5%) 465731052 (14%) 79m
aws-ebs-csi-driver ebs-csi-node-bwh2g 33m (3%) 100m (10%) 81814506 (2%) 323841192 (9%) 6m16s
cert-manager cert-manager-cainjector-6f67f8649c-mknt8 11m (1%) 100m (10%) 155131523 (4%) 397754691 (12%) 6m17s
cert-manager cert-manager-webhook-66db579977-xfgnw 11m (1%) 100m (10%) 34060758 (1%) 240362697 (7%) 25m
cilium cilium-xqp62 100m (10%) 0 (0%) 380258472 (11%) 494336013 (15%) 155m
cloudnative-pg cloudnative-pg-787ff9548d-lq79d 11m (1%) 100m (10%) 155131523 (4%) 397754691 (12%) 158m
external-snapshotter external-snapshotter-webhook-7d7c8c678d-6hvxg 11m (1%) 100m (10%) 34060758 (1%) 240362697 (7%) 76m
implentio eventbus-default-js-0 33m (3%) 100m (10%) 57060758 (1%) 270262697 (8%) 9m17s
kube-system core-dns-664d5dfc4f-bqdxs 34m (3%) 0 (0%) 99798506 (3%) 129738057 (4%) 93m
linkerd linkerd-identity-69bb59b957-n4zlb 22m (2%) 100m (10%) 35074998 (1%) 253574998 (7%) 158m
linkerd linkerd-proxy-injector-6d5778cb4d-q9dtl 11m (1%) 100m (10%) 74030518 (2%) 336804716 (10%) 3h32m
logging loki-backend-2 22m (2%) 100m (10%) 307818158 (9%) 474524292 (14%) 150m
logging loki-canary-grzjp 11m (1%) 100m (10%) 41496628 (1%) 256845072 (7%) 3d12h
logging loki-read-7f98fd5b98-n6fn7 23m (2%) 100m (10%) 235870026 (7%) 502714745 (15%) 69m
logging redis-de1a-node-2 56m (5%) 100m (10%) 121564556 (3%) 339167634 (10%) 6h5m
metabase pg-bce1-pooler-rw-6c687bcc4-cgktr 10m (1%) 100m (10%) 60Mi (1%) 280Mi (9%) 5h21m
monitoring node-exporter-gm8dh 22m (2%) 0 (0%) 47149996 (1%) 61294994 (1%) 17h
monitoring oauth2-proxy-ec6215c0214caf95-5f5f4c6dbb-bxh25 11m (1%) 100m (10%) 28817878 (0%) 51619017 (1%) 8m29s
monitoring open-telemetry-opentelemetry-operator-9c9f7f65c-t8xwd 22m (2%) 1111m (118%) 83627194 (2%) 355998068 (10%) 85m
pvc-autoresizer pvc-autoresizer-controller-5775b9dfff-nntft 23m (2%) 100m (10%) 46739508 (1%) 256845072 (7%) 81m
secrets-csi secrets-csi-h4qgs 33m (3%) 264m (28%) 69135756 (2%) 285960194 (8%) 5h25m
vault vault-2 35m (3%) 100m (10%) 258639240 (7%) 532314724 (16%) 85m
vault vault-csi-provider-2dsjz 22m (2%) 100m (10%) 58239508 (1%) 271795072 (8%) 25h
vertical-pod-autoscaler vpa-admission-controller-5584bfb85d-jcjlw 11m (1%) 100m (10%) 74030518 (2%) 292323385 (9%) 66m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 767m (81%) 3775m (401%)
memory 3225368550 (99%) 9174874866 (283%)
ephemeral-storage 200Mi (0%) 200Mi (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
hugepages-32Mi 0 (0%) 0 (0%)
hugepages-64Ki 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal DisruptionBlocked 30s (x40 over 81m) karpenter Cannot disrupt Node: state node doesn't contain both a node and a nodeclaim
```
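The event keeps recurring (x40 over 81m). A sketch of how I can keep watching it, assuming Karpenter runs as a `karpenter` deployment in a `karpenter` namespace (adjust for your install):

```sh
# The `karpenter` namespace/deployment names are assumptions about the install.
# Show all DisruptionBlocked events across namespaces:
kubectl get events -A --field-selector reason=DisruptionBlocked

# Tail the Karpenter controller logs for mentions of the affected node:
kubectl logs -n karpenter deploy/karpenter --since=1h \
  | grep ip-10-0-166-82.us-west-2.compute.internal
```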
What primary components of the stack does this relate to?
terraform
Code of Conduct