Daemonset-driven consolidation #731
This is the current expected behavior. Karpenter provisions enough capacity for the pods and daemonsets that exist at the time of scheduling. When you add a new daemonset, for this to work properly, Karpenter would need to replace any existing nodes that the daemonset won't fit on.
We typically recommend that you set a high priority for daemonsets to cover this use case. Scaling up a daemonset will then trigger eviction of existing pods, which feeds back into Karpenter's provisioning algorithm.
I'll update the FAQ to cover this.
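As a minimal sketch of that recommendation (the class name, priority value, and DaemonSet shown here are illustrative assumptions, not taken from this thread), a high-priority PriorityClass referenced by a DaemonSet might look like:

```yaml
# Illustrative only: the name and value are assumptions; pick a value above
# your regular workloads but below system-cluster-critical (2000000000).
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: daemonset-high-priority
value: 1000000
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: Lets new DaemonSet pods preempt regular pods so Karpenter re-provisions.
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-agent
spec:
  selector:
    matchLabels:
      app: example-agent
  template:
    metadata:
      labels:
        app: example-agent
    spec:
      # The preemption this triggers is what feeds back into Karpenter.
      priorityClassName: daemonset-high-priority
      containers:
        - name: agent
          image: example.com/agent:latest  # hypothetical image
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
```

With the high priority set, the scheduler evicts lower-priority pods to make room for the new DaemonSet pods, and Karpenter then provisions capacity for the evicted pods.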
I forgot to mention that I tried setting the priorityClassName field of the DaemonSet to system-node-critical and also once to system-cluster-critical. In both cases all pods were scheduled, but both Karpenter controllers were evicted. I will try to avoid this by changing the pod disruption budget in the values file of Karpenter's Helm chart.
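A sketch of that workaround as a standalone manifest (the namespace and label selector here are assumptions; match them to your actual Karpenter deployment, or use the Helm chart's own PDB values if your chart version exposes them):

```yaml
# Assumed namespace and labels; check the labels on your Karpenter pods first.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: karpenter
  namespace: karpenter
spec:
  # Never allow both controller replicas to be evicted at the same time.
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: karpenter
```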
I could avoid the eviction of the Karpenter controllers, which have priority
This trick doesn't always work. I removed Prometheus and installed the AWS CloudWatch agent as a DaemonSet. It also has a priority of 1000000000. One of the four pods can't be scheduled, but no node is added.
Here are some files that reflect the new situation. According to the generated
This feature is very necessary; Karpenter should adjust automatically when a new daemonset is introduced. We should NOT have to set priority classes on every single resource in the cluster. The correct solution would be: if a node can't fit a newly installed daemonset pod due to CPU/RAM, a bigger node should be provisioned automatically, large enough to hold all the pods that were housed on the old node as well as the daemonset pod.
I have been using the following Kyverno policy to make sure DaemonSets get a priority class:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-priority-class
  annotations:
    policies.kyverno.io/title: Add priority class for DaemonSets to help Karpenter.
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/minversion: 1.6.0
    policies.kyverno.io/description: Add priority class for DaemonSets to help Karpenter.
spec:
  rules:
    - name: add-priority-class-context
      match:
        any:
          - resources:
              kinds:
                - DaemonSet
      mutate:
        patchStrategicMerge:
          spec:
            template:
              spec:
                priorityClassName: system-node-critical
```
Nice @wdonne, Kyverno might be interested in taking that upstream. I think it would be useful for any autoscaler. See
Hi @tzneal, thanks for the tip. I have created a pull request: kyverno/policies#631. If it gets merged I will create another one called "set-karpenter-non-cpu-limits". It relates to a best practice when using consolidation mode. I have a third one that sets the annotation
This is NOT a solution. It will not solve 99% of cases; it simply prioritizes daemonsets over non-system-critical items. Karpenter team, I consider your product incomplete as is. Please pay attention to this comment. Community, please upvote so AWS understands that it charges money for an incomplete product. Let's not give them the idea that this ticket is somehow optional.
Couldn't have said it better myself. Can't believe this bug has been open for nearly a year :(
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
As this is an open source project: code contributions are welcome. If nobody writes the code, it doesn't get merged. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
/remove-lifecycle stale
Any update?
Requesting AWS to release a formal update in which Karpenter truly lives up to "Just-in-time Nodes for Any Kubernetes Cluster", even for this case. Depending on a plethora of workarounds that don't cover all cases calls into question the operational and production readiness of what is now considered a stable product and marketed as the new default.
We are now many versions on and this is still an issue. Any updates?
Version
Karpenter Version: v0.22.1
Kubernetes Version: v1.24.8
Hi,
I have set up Karpenter with the following cluster configuration:
This is the provisioner:
Karpenter has currently provisioned three spot instances. When installing Prometheus with Helm chart version 19.3.1, two of the five node exporters can't be scheduled. The message is: "0/5 nodes are available: 1 Too many pods. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.". The Karpenter controllers didn't output any log entries.
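The "Too many pods" part of that message suggests the nodes hit their per-node pod limit rather than running out of CPU or memory. As a hedged sketch (not taken from this thread, and the field should be verified against your Karpenter version), the v1alpha5 Provisioner API lets you raise that limit via the kubelet configuration:

```yaml
# Illustrative Provisioner fragment; the name, requirements, and maxPods
# value are assumptions, not the reporter's actual configuration.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # Assumption: raising maxPods would let the node exporter pods fit on
  # the existing nodes instead of requiring a new one.
  kubeletConfiguration:
    maxPods: 110
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
```

Note that a higher maxPods does not by itself explain why no new node was provisioned for the unschedulable DaemonSet pods, which is the core issue reported here.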
This is the values file for the chart:
This is the live manifest of the DaemonSet of the Prometheus node exporter:
This is the live manifest of one of the pods that can't be scheduled:
I also did a test with the node selector "karpenter.sh/capacity-type: on-demand". Then one of the spot instances is deleted, but no new instance is created. The DaemonSet also doesn't create any pods.
PR aws/karpenter-provider-aws#1155 should have fixed the issue of DaemonSets not being part of the scaling decision, but perhaps this is a special case? The node exporter wants a pod on each node because it wants to tap telemetry.
Best regards,
Werner.
Expected Behavior
An extra node to be provisioned.
Actual Behavior
No extra node is provisioned while two DaemonSet pods can't be scheduled.
Steps to Reproduce the Problem
I did this when there were already three Karpenter nodes, but I think you can just install Prometheus because the nodes are not full.
Resource Specs and Logs
karpenter-6d57cdbbd6-dqgcj.log
karpenter-6d57cdbbd6-lsv9f.log