
[question]: Cannot disrupt Node: state node doesn't contain both a node and a nodeclaim #163

@wesbragagt

Description

Prior Search

  • I have already searched this project's issues to determine if a similar question has already been asked.

What is your question?

I'm noticing a node in our production cluster, which has been running for 3 days, that repeatedly reports the following event: "Cannot disrupt Node: state node doesn't contain both a node and a nodeclaim". Could this be related to issue #127?

@mschnee also found this issue, which seems related: aws/karpenter-provider-aws#6803

Name:               ip-10-0-166-82.us-west-2.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=arm64
                    beta.kubernetes.io/instance-type=m6g.medium
                    beta.kubernetes.io/os=linux
                    eks.amazonaws.com/capacityType=SPOT
                    eks.amazonaws.com/nodegroup=controllers-20240728225511410500000002
                    eks.amazonaws.com/nodegroup-image=ami-0835c99467c24da9b
                    eks.amazonaws.com/sourceLaunchTemplateId=lt-04000b2f2434662ae
                    eks.amazonaws.com/sourceLaunchTemplateVersion=12
                    failure-domain.beta.kubernetes.io/region=us-west-2
                    failure-domain.beta.kubernetes.io/zone=us-west-2b
                    k8s.io/cloud-provider-aws=1eca48abf50de6dbb7b17d2b5d457797
                    kubernetes.io/arch=arm64
                    kubernetes.io/hostname=ip-10-0-166-82.us-west-2.compute.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=m6g.medium
                    panfactum.com/class=controller
                    topology.ebs.csi.aws.com/zone=us-west-2b
                    topology.kubernetes.io/region=us-west-2
                    topology.kubernetes.io/zone=us-west-2b
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.0.166.82
                    csi.volume.kubernetes.io/nodeid:
                      {"ebs.csi.aws.com":"i-028527e376b17a21e","secrets-store.csi.k8s.io":"ip-10-0-166-82.us-west-2.compute.internal"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 11 Oct 2024 08:15:46 -0500
Taints:             arm64=true:NoSchedule
                    burstable=true:NoSchedule
                    controller=true:NoSchedule
                    spot=true:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-0-166-82.us-west-2.compute.internal
  AcquireTime:     <unset>
  RenewTime:       Mon, 14 Oct 2024 20:20:39 -0500
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Fri, 11 Oct 2024 08:16:14 -0500   Fri, 11 Oct 2024 08:16:14 -0500   CiliumIsUp                   Cilium is running on this node
  MemoryPressure       False   Mon, 14 Oct 2024 20:16:59 -0500   Fri, 11 Oct 2024 08:15:45 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Mon, 14 Oct 2024 20:16:59 -0500   Fri, 11 Oct 2024 08:15:45 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Mon, 14 Oct 2024 20:16:59 -0500   Fri, 11 Oct 2024 08:15:45 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Mon, 14 Oct 2024 20:16:59 -0500   Fri, 11 Oct 2024 08:16:06 -0500   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.0.166.82
  InternalDNS:  ip-10-0-166-82.us-west-2.compute.internal
  Hostname:     ip-10-0-166-82.us-west-2.compute.internal
Capacity:
  cpu:                1
  ephemeral-storage:  40894Mi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  hugepages-32Mi:     0
  hugepages-64Ki:     0
  memory:             3880624Ki
  pods:               110
Allocatable:
  cpu:                940m
  ephemeral-storage:  37518678362
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  hugepages-32Mi:     0
  hugepages-64Ki:     0
  memory:             3163824Ki
  pods:               110
System Info:
  Machine ID:                 ec2ade84bc798e1284d85a506964467e
  System UUID:                ec2ade84-bc79-8e12-84d8-5a506964467e
  Boot ID:                    0ca4357a-7367-45d9-b8c4-7d3c7cae8d98
  Kernel Version:             6.1.109
  OS Image:                   Bottlerocket OS 1.24.0 (aws-k8s-1.29)
  Operating System:           linux
  Architecture:               arm64
  Container Runtime Version:  containerd://1.7.22+bottlerocket
  Kubelet Version:            v1.29.5-eks-1109419
  Kube-Proxy Version:         v1.29.5-eks-1109419
ProviderID:                   aws:///us-west-2b/i-028527e376b17a21e
Non-terminated Pods:          (29 in total)
  Namespace                   Name                                                     CPU Requests  CPU Limits    Memory Requests  Memory Limits    Age
  ---------                   ----                                                     ------------  ----------    ---------------  -------------    ---
  alb-controller              alb-controller-dd56b78d6-whkgc                           11m (1%)      100m (10%)    83464877 (2%)    312005503 (9%)   25m
  alloy                       alloy-tnqng                                              34m (3%)      100m (10%)    179272160 (5%)   429137520 (13%)  3d
  argo                        argo-events-controller-manager-68487594-rr7jb            11m (1%)      100m (10%)    88707757 (2%)    334870395 (10%)  81m
  argo                        events-webhook-6d98c7b976-rcc25                          11m (1%)      100m (10%)    46739508 (1%)    267721196 (8%)   25m
  authentik                   redis-4833-node-0                                        56m (5%)      100m (10%)    107425154 (3%)   305613202 (9%)   8m59s
  aws-ebs-csi-driver          ebs-csi-controller-676849595f-rzg5x                      66m (7%)      100m (10%)    177293248 (5%)   465731052 (14%)  79m
  aws-ebs-csi-driver          ebs-csi-node-bwh2g                                       33m (3%)      100m (10%)    81814506 (2%)    323841192 (9%)   6m16s
  cert-manager                cert-manager-cainjector-6f67f8649c-mknt8                 11m (1%)      100m (10%)    155131523 (4%)   397754691 (12%)  6m17s
  cert-manager                cert-manager-webhook-66db579977-xfgnw                    11m (1%)      100m (10%)    34060758 (1%)    240362697 (7%)   25m
  cilium                      cilium-xqp62                                             100m (10%)    0 (0%)        380258472 (11%)  494336013 (15%)  155m
  cloudnative-pg              cloudnative-pg-787ff9548d-lq79d                          11m (1%)      100m (10%)    155131523 (4%)   397754691 (12%)  158m
  external-snapshotter        external-snapshotter-webhook-7d7c8c678d-6hvxg            11m (1%)      100m (10%)    34060758 (1%)    240362697 (7%)   76m
  implentio                   eventbus-default-js-0                                    33m (3%)      100m (10%)    57060758 (1%)    270262697 (8%)   9m17s
  kube-system                 core-dns-664d5dfc4f-bqdxs                                34m (3%)      0 (0%)        99798506 (3%)    129738057 (4%)   93m
  linkerd                     linkerd-identity-69bb59b957-n4zlb                        22m (2%)      100m (10%)    35074998 (1%)    253574998 (7%)   158m
  linkerd                     linkerd-proxy-injector-6d5778cb4d-q9dtl                  11m (1%)      100m (10%)    74030518 (2%)    336804716 (10%)  3h32m
  logging                     loki-backend-2                                           22m (2%)      100m (10%)    307818158 (9%)   474524292 (14%)  150m
  logging                     loki-canary-grzjp                                        11m (1%)      100m (10%)    41496628 (1%)    256845072 (7%)   3d12h
  logging                     loki-read-7f98fd5b98-n6fn7                               23m (2%)      100m (10%)    235870026 (7%)   502714745 (15%)  69m
  logging                     redis-de1a-node-2                                        56m (5%)      100m (10%)    121564556 (3%)   339167634 (10%)  6h5m
  metabase                    pg-bce1-pooler-rw-6c687bcc4-cgktr                        10m (1%)      100m (10%)    60Mi (1%)        280Mi (9%)       5h21m
  monitoring                  node-exporter-gm8dh                                      22m (2%)      0 (0%)        47149996 (1%)    61294994 (1%)    17h
  monitoring                  oauth2-proxy-ec6215c0214caf95-5f5f4c6dbb-bxh25           11m (1%)      100m (10%)    28817878 (0%)    51619017 (1%)    8m29s
  monitoring                  open-telemetry-opentelemetry-operator-9c9f7f65c-t8xwd    22m (2%)      1111m (118%)  83627194 (2%)    355998068 (10%)  85m
  pvc-autoresizer             pvc-autoresizer-controller-5775b9dfff-nntft              23m (2%)      100m (10%)    46739508 (1%)    256845072 (7%)   81m
  secrets-csi                 secrets-csi-h4qgs                                        33m (3%)      264m (28%)    69135756 (2%)    285960194 (8%)   5h25m
  vault                       vault-2                                                  35m (3%)      100m (10%)    258639240 (7%)   532314724 (16%)  85m
  vault                       vault-csi-provider-2dsjz                                 22m (2%)      100m (10%)    58239508 (1%)    271795072 (8%)   25h
  vertical-pod-autoscaler     vpa-admission-controller-5584bfb85d-jcjlw                11m (1%)      100m (10%)    74030518 (2%)    292323385 (9%)   66m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests          Limits
  --------           --------          ------
  cpu                767m (81%)        3775m (401%)
  memory             3225368550 (99%)  9174874866 (283%)
  ephemeral-storage  200Mi (0%)        200Mi (0%)
  hugepages-1Gi      0 (0%)            0 (0%)
  hugepages-2Mi      0 (0%)            0 (0%)
  hugepages-32Mi     0 (0%)            0 (0%)
  hugepages-64Ki     0 (0%)            0 (0%)
Events:
  Type    Reason             Age                 From       Message
  ----    ------             ----                ----       -------
  Normal  DisruptionBlocked  30s (x40 over 81m)  karpenter  Cannot disrupt Node: state node doesn't contain both a node and a nodeclaim
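
One way to narrow this down is to check whether the node has a NodeClaim at all. The node above carries the eks.amazonaws.com/nodegroup label, which suggests it belongs to an EKS managed node group rather than having been launched by Karpenter. The following is a minimal diagnostic sketch, not Karpenter's own logic. It assumes that Karpenter pairs a Node with its NodeClaim by matching the Node's spec.providerID against the NodeClaim's status.providerID. It also assumes that the karpenter.sh/nodepool label marks Karpenter-launched nodes, which is the Karpenter v1 convention; older releases used different labels. The helpers unmatched_nodes and provenance are hypothetical names, not part of Karpenter or kubectl.

```python
import json
import sys


def unmatched_nodes(nodes_doc: dict, nodeclaims_doc: dict) -> list[str]:
    """Return names of Nodes whose spec.providerID matches no NodeClaim.

    Assumption: Node <-> NodeClaim pairing is by provider ID
    (Node spec.providerID vs. NodeClaim status.providerID).
    """
    claimed = {
        nc.get("status", {}).get("providerID")
        for nc in nodeclaims_doc.get("items", [])
    }
    claimed.discard(None)  # NodeClaims not yet launched have no provider ID
    result = []
    for node in nodes_doc.get("items", []):
        provider_id = node.get("spec", {}).get("providerID")
        if provider_id not in claimed:
            result.append(node["metadata"]["name"])
    return result


def provenance(node: dict) -> str:
    """Guess who launched a Node from its labels (heuristic, assumed labels)."""
    labels = node.get("metadata", {}).get("labels", {})
    if "eks.amazonaws.com/nodegroup" in labels:
        return "eks-managed-nodegroup"
    if "karpenter.sh/nodepool" in labels:  # assumed Karpenter v1 label
        return "karpenter"
    return "unknown"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Inputs are the JSON output of `kubectl get nodes -o json`
    # and `kubectl get nodeclaims -o json`.
    with open(sys.argv[1]) as f:
        nodes = json.load(f)
    with open(sys.argv[2]) as f:
        claims = json.load(f)
    by_name = {n["metadata"]["name"]: n for n in nodes.get("items", [])}
    for name in unmatched_nodes(nodes, claims):
        print(f"{name}\tno NodeClaim\t{provenance(by_name[name])}")
```

Usage: save the sketch as a script, export both lists with kubectl get nodes -o json > nodes.json and kubectl get nodeclaims -o json > nodeclaims.json, then pass the two files as arguments. If the node above shows up as "no NodeClaim" with provenance eks-managed-nodegroup, that would indicate Karpenter is only observing a node it never launched, rather than losing track of one of its own.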

What primary components of the stack does this relate to?

terraform

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata
Labels: question (Further information is requested), triaging (Needs to be triaged)