
Karpenter nodes get stuck on "NotReady" state #1415

Closed
devopsjnr opened this issue Feb 24, 2022 · 15 comments

Labels
bug Something isn't working

@devopsjnr commented Feb 24, 2022

Version 0.5.6

Karpenter started creating new nodes and worked as expected.
After a while (approximately 40 minutes), some nodes switch from Ready to NotReady and stay like that for hours; nothing moves.
It seems to happen randomly; most of the pods on the NotReady nodes are in "Running" state and then move to "Terminating".

The provisioner has ttlSecondsAfterEmpty: 60, and ttlSecondsUntilExpired isn't defined.

Here are the Events from the node description:
[Screenshot: node events, including NodeHasDiskPressure]

NodeHasDiskPressure - I think Karpenter nodes start with a 20GB disk. Is it possible to extend the disk size through the provisioner? It might help with this situation.

Here is another node that has just become NotReady. This time I can't really understand why:
[Screenshot: node events]

@devopsjnr devopsjnr added the bug Something isn't working label Feb 24, 2022
@ellistarn
Contributor

Interesting -- it looks like your pods are consuming the ephemeral storage on the node. Can you list the pod specs applied to the node? Can you provide provisioner specs as well?

@devopsjnr
Author

This is an example of one of the pod specs:

spec:
  containers:
  - env:
    - name: JAVA_TOOL_OPTIONS
      value: -Xmx750m -Xms750m
    - name: prf-url
      value: prf:8080/v1
    - name: custom-url
      value: custom:8080/v1
    - name: con-url
    image: <image>
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health/liveness
        port: 8080
        scheme: HTTP
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 3
    name: <name>
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health/readiness
        port: 8080
        scheme: HTTP
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 3
    resources: {}
    securityContext: {}
    startupProbe:
      failureThreshold: 14
      httpGet:
        path: /actuator/health/liveness
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 1
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: <name>
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: <name>
  - name: <name>
  initContainers:
  - args:
    - migrate
    env:
    - name: FLYWAY_CONFIG_FILES
      value: /flyway/configs/flyway.conf
    image: <image>
    imagePullPolicy: Always
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /flyway/migrations
      name: <name>
    - mountPath: /flyway/configs
      name: <name>
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: <name>
      readOnly: true
  nodeName: ip-172-31-9-224.<region>.compute.internal
  nodeSelector:
    env: integration
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: <sa>
  serviceAccountName: <name>
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: <name>
    name: <name>
  - configMap:
      defaultMode: 420
      name: <name>
    name: <name>
  - name: kube-api-access
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace

This is the provisioner:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: integration-provisioner
spec:
  requirements:
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["m5.large", "m5.xlarge"]
  labels:
    env: integration
  limits:
    resources:
      cpu: 1000
  provider:
    subnetSelector: 
      Name: integration-private
    securityGroupSelector: 
      aws:eks:cluster-name: integration
  ttlSecondsAfterEmpty: 60

@ellistarn
Contributor

Great. I see you're not using custom launch templates or Bottlerocket, so that simplifies some concerns I had about the disk.

I'm focused on the log line NodeHasDiskPressure. You're running out of disk due to some combination of image size, logging output, etc. Have you run this workload without Karpenter before? Can you check your EC2 console and see how big the root EBS volume is on this instance? I assume it should be 20GB. Then it may be worth connecting to the instance (aws ssm start-session --target $INSTANCE_ID) and checking what's using up your disk with something like du -h.

@devopsjnr
Author

@ellistarn I have run this workload with a node group before, and the EBS volume used to be 200GB. Now with Karpenter it is only 20GB (I think this is the EKS default). Hence my question whether there's a way to choose the volume size from the provisioner, or whether 20GB is unchangeable. I would really rather not deal with launch templates.

@devopsjnr
Author

@ellistarn Now there's a new message. All these pods used to work within a node group before, and the provisioner configuration is similar to what the node group used.

[Screenshot: node events showing the messages below]

"failed to garbage collect required amount of images. Wanted to free X bytes, but freed 0 bytes"
"System OOM encountered, victim process: java, pid: 17173"

@devopsjnr
Author

devopsjnr commented Feb 25, 2022

@ellistarn Do you have any insights?
My cluster is burning.

@ellistarn
Contributor

@bwagner5 is working on this right now.
#939

@devopsjnr
Author

devopsjnr commented Feb 26, 2022

Thank you. In the meantime I am running with my own launch template, so the disk size is now 200GB.
However, nodes are still moving from Ready to NotReady some time after they come up, and they stay stuck like that for hours; I need to delete them manually (all pods are in Terminating).

@ellistarn @bwagner5 Any idea why this keeps happening?

This is how my cluster looks currently. I really need your advice because the env is down and other people are using it :(

@tzneal
Contributor

tzneal commented Feb 26, 2022

The example pod spec you listed above doesn't have any resource requests. Without them, I believe Karpenter will pack nodes until it reaches the ENI pod limit, which varies based on instance type. Are you seeing lots of OOM errors, or does kubectl describe node <nodename> show memory pressure issues?
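
For illustration, adding requests and limits to the container in the pod spec above might look like the sketch below. The values are placeholders only (sized loosely around the -Xmx750m heap plus JVM non-heap overhead), not recommendations; they should come from observed usage:

    resources:
      requests:
        cpu: 250m        # placeholder value
        memory: 1Gi      # placeholder: heap (-Xmx750m) plus non-heap overhead
      limits:
        memory: 1536Mi   # placeholder value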

@devopsjnr
Author

@tzneal I do see OOM errors.

This is an example of one of the nodes description:

Allocated resources:

  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests    Limits
  --------                    --------    ------
  cpu                         800m (20%)  1 (25%)
  memory                      712Mi (4%)  1224Mi (8%)
  ephemeral-storage           0 (0%)      0 (0%)
  hugepages-1Gi               0 (0%)      0 (0%)
  hugepages-2Mi               0 (0%)      0 (0%)
  attachable-volumes-aws-ebs  0           0
Events:
  Type     Reason                   Age                   From     Message
  ----     ------                   ----                  ----     -------
  Normal   NodeNotReady             57m (x6 over 3h12m)   kubelet  Node ip-172-31-9-97.<region>.compute.internal status is now: NodeNotReady
  Warning  SystemOOM                57m                   kubelet  System OOM encountered, victim process: java, pid: 1357
  Warning  SystemOOM                48m                   kubelet  System OOM encountered, victim process: java, pid: 648
  Warning  SystemOOM                36m                   kubelet  System OOM encountered, victim process: java, pid: 4364
  Warning  SystemOOM                19m                   kubelet  System OOM encountered, victim process: java, pid: 12247
  Normal   NodeHasNoDiskPressure    19m (x14 over 3h12m)  kubelet  Node ip-172-31-9-97.<region>.compute.internal status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientMemory  19m (x14 over 3h12m)  kubelet  Node ip-172-31-9-97.<region>.compute.internal status is now: NodeHasSufficientMemory
  Warning  SystemOOM                13m                   kubelet  System OOM encountered, victim process: java, pid: 5325

Should I add memory resource limits or requests to the provisioner itself?

@tzneal
Contributor

tzneal commented Feb 26, 2022

Karpenter already has resource requests defined for itself in its Helm chart.

In my experience, you really need memory resource requests on your containers or scheduling won't work well in Kubernetes, regardless of whether you use Karpenter or any other autoscaler.

If you look at your node output above, it says that Kubernetes is only aware of 712Mi of memory requests, but you have Java processes getting OOM killed by the kernel, so you are running out of physical memory on the node.

@tzneal
Contributor

tzneal commented Feb 26, 2022

Setting resource requests is also a best practice listed here

It's a best practice to define these requests and limits in your pod definitions. If you don't include these values, the scheduler doesn't understand what resources are needed. Without this information, the scheduler might schedule the pod on a node without sufficient resources to provide acceptable application performance.

@devopsjnr
Author

@tzneal Thanks again for your attention.

Karpenter already has resource requests defined for itself in its Helm chart.

So what is the meaning of the resource requests and limits in the provisioner's spec? (You can see my provisioner above for reference.)

In my experience, you really need memory resource requests on your containers or scheduling won't work well in Kubernetes, regardless of whether you use Karpenter or any other autoscaler.

Are you certain that adding resource requests and limits to my deployments will solve the issue?

bwagner5 changed the title from "Karpenter nods get stuck on "NotReady" state" to "Karpenter nodes get stuck on "NotReady" state" Mar 9, 2022
@bwagner5
Contributor

The Block Device Mappings PR has been merged and should be released next week. You can check out the preview docs here: https://karpenter.sh/preview/aws/provisioning/#block-device-mappings

If you are fine with the other defaults Karpenter is currently providing, you should be able to use the following mapping once we do a release:

spec:
  provider:
    blockDeviceMappings:
      - deviceName: /dev/xvda
        volumeSize: 200Gi
        volumeType: gp3
        encrypted: true
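
For context, that mapping would sit under spec.provider in the provisioner posted earlier in this thread, alongside the subnet and security group selectors. A sketch, assuming the preview syntax above (the 200Gi size mirrors the launch-template value used as a workaround):

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: integration-provisioner
spec:
  requirements:
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["m5.large", "m5.xlarge"]
  labels:
    env: integration
  limits:
    resources:
      cpu: 1000
  provider:
    subnetSelector:
      Name: integration-private
    securityGroupSelector:
      aws:eks:cluster-name: integration
    # Hypothetical addition, following the preview snippet above
    blockDeviceMappings:
      - deviceName: /dev/xvda
        volumeSize: 200Gi
        volumeType: gp3
        encrypted: true
  ttlSecondsAfterEmpty: 60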
