
pod/airflow-worker-0 gets stuck on pending with error: 1 node(s) didn't match Pod's node affinity/selector.  #33077

@StathisKap

Description


Official Helm Chart version

1.10.0 (latest released)

Apache Airflow version

2.6.2

Kubernetes Version

k3s latest

Helm Chart configuration

Under the worker section of values.yaml:

  extraVolumes: []
  extraVolumeMounts: []

  # Select certain nodes for airflow worker pods.
  nodeSelector:
    node-role.kubernetes.io/airflow-worker: "true"
  priorityClassName: ~
  affinity: {}
  # default worker affinity is:
  #  podAntiAffinity:
  #    preferredDuringSchedulingIgnoredDuringExecution:
  #    - podAffinityTerm:
  #        labelSelector:
  #          matchLabels:
  #            component: worker
  #        topologyKey: kubernetes.io/hostname
  #      weight: 100
  tolerations: []

Docker Image customizations

FROM apache/airflow
#COPY ./dags/ ${AIRFLOW_HOME}/dags/
COPY ./requirements.txt ${AIRFLOW_HOME}/requirements.txt
RUN pip3 install --no-cache-dir apache-airflow==${AIRFLOW_VERSION} -r ${AIRFLOW_HOME}/requirements.txt

What happened

pod/airflow-worker-0 gets stuck in Pending, and I get this error:

0/3 nodes are available: 1 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 1 Preemption is not helpful for scheduling, 2 No preemption victims found for incoming pod..

I have 3 nodes, and I'm using k3s.

I've set the labels, and for some reason, when I set the node selector to my master/control-plane node it works, but when I set it to my agents it doesn't.
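One way to narrow this down is to list the label column on every node (`kubectl get nodes --show-labels`, or `kubectl get nodes -L node-role.kubernetes.io/airflow-worker`) and compare it against the pod's Node-Selectors. A minimal sketch of the exact-equality test the scheduler applies, using hypothetical label sets for the three nodes (real labels will differ):

```shell
# The worker pod's nodeSelector requires this exact key=value on a node.
selector="node-role.kubernetes.io/airflow-worker=true"

# match NODE LABELS: report whether a comma-separated label list contains
# the required key=value pair, mirroring the scheduler's equality check.
match() {
  case ",$2," in
    *",$selector,"*) echo "$1: matches" ;;
    *)               echo "$1: no match" ;;
  esac
}

# Hypothetical label sets; check the real ones with `kubectl get nodes --show-labels`.
match k3s-master  "node-role.kubernetes.io/control-plane=true"
match k3s-agent-0 "node-role.kubernetes.io/airflow-worker=true"
match k3s-agent-1 "node-role.kubernetes.io/airflow-worker=true"
```

If the agents print "no match" with their real label lists, the label key or value differs from what the chart renders into the pod spec (e.g. a typo or a missing `=true` value).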

What you think should happen instead

The pods should simply be scheduled on the agent nodes.

How to reproduce

To install k3s:

curl -sfL https://get.k3s.io | sh -

terraform {
  required_providers {
    hcloud = {
      source = "hetznercloud/hcloud"
    }
  }
}

variable "hcloud_token" {
  description = "The API token for Hetzner Cloud"
}

provider "hcloud" {
  token = var.hcloud_token
}

resource "hcloud_server" "k3s_agent" {
  count       = 2
  name        = "k3s-agent-${count.index}"
  server_type = "cx11"
  image       = "ubuntu-22.04"
  ssh_keys    = [hcloud_ssh_key.my_key.id]

  provisioner "remote-exec" {
    inline = [
      "curl -sfL https://get.k3s.io | K3S_URL=https://<IP>:6443 K3S_TOKEN='<toke>' sh -"
    ]
    connection {
      type        = "ssh"
      user        = "root"
      private_key = file("~/.ssh/id_rsa") # Replace with the correct absolute path
      host        = self.ipv4_address
    }
  }
}

resource "hcloud_ssh_key" "my_key" {
  name       = "my_key"
  public_key = file("~/.ssh/id_rsa.pub")
}

to create the agents. Then install the chart:

helm repo add apache-airflow https://airflow.apache.org
helm upgrade --debug --install airflow apache-airflow/airflow --namespace airflow -f values.yaml

Set the labels:

 kubectl label nodes k3s-agent-0 node-role.kubernetes.io/airflow-worker=true
 kubectl label nodes k3s-agent-1 node-role.kubernetes.io/airflow-worker=true
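Before upgrading, it is worth confirming the labels actually landed, e.g. with `kubectl get nodes -l node-role.kubernetes.io/airflow-worker=true`. A small sketch of that same filter applied to sample `kubectl get nodes --show-labels` output (the node names come from this setup; the output text itself is illustrative):

```shell
# Illustrative `kubectl get nodes --show-labels` output; real output differs.
nodes='k3s-master    Ready   control-plane,master   node-role.kubernetes.io/control-plane=true
k3s-agent-0   Ready   <none>                 node-role.kubernetes.io/airflow-worker=true
k3s-agent-1   Ready   <none>                 node-role.kubernetes.io/airflow-worker=true'

# Keep only nodes carrying the worker label, as the -l selector would,
# and print their names. Both agents should appear.
printf '%s\n' "$nodes" | grep 'node-role.kubernetes.io/airflow-worker=true' | awk '{print $1}'
```

If the filtered list is empty against real cluster output, the pods can never leave Pending, because no node satisfies the selector.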

Then upgrade the Helm chart and describe the worker pod:

kubectl describe pod airflow-worker-0

Name:             airflow-worker-0
Namespace:        airflow
Priority:         0
Service Account:  airflow-worker
Node:             <none>
Labels:           component=worker
                  controller-revision-hash=airflow-worker-64d7df4f8c
                  release=airflow
                  statefulset.kubernetes.io/pod-name=airflow-worker-0
                  tier=airflow
Annotations:      checksum/airflow-config: d6a9135fc4481a5bbcf6bace4a4bb82c2fd958c7af2b9c0c1f3e7ddb7715a944
                  checksum/extra-configmaps: e862ea47e13e634cf17d476323784fa27dac20015550c230953b526182f5cac8
                  checksum/extra-secrets: e9582fdd622296c976cbc10a5ba7d6702c28a24fe80795ea5b84ba443a56c827
                  checksum/kerberos-keytab: 80979996aa3c1f48c95dfbe9bb27191e71f12442a08c0ed834413da9d430fd0e
                  checksum/metadata-secret: cd6de1cad5366c38201917e3ed1ac78bec2655c819758d1fa68bbe0b6539968b
                  checksum/pgbouncer-config-secret: 1dae2adc757473469686d37449d076b0c82404f61413b58ae68b3c5e99527688
                  checksum/result-backend-secret: 98a68f230007cfa8f5d3792e1aff843a76b0686409e4a46ab2f092f6865a1b71
                  checksum/webserver-secret-key: 668251e56927d3d78c4037169030b342f4270aff8e247721420789ede6176254
                  cluster-autoscaler.kubernetes.io/safe-to-evict: true
Status:           Pending
IP:
IPs:              <none>
Controlled By:    StatefulSet/airflow-worker
Init Containers:
  wait-for-airflow-migrations:
    Image:      stathiskap/custom-airflow:0.0.1
    Port:       <none>
    Host Port:  <none>
    Args:
      airflow
      db
      check-migrations
      --migration-wait-timeout=60
    Environment:
      AIRFLOW__WEBSERVER__EXPOSE_CONFIG:    true
      AIRFLOW__CORE__FERNET_KEY:            <set to the key 'fernet-key' in secret 'airflow-fernet-key'>                      Optional: false
      AIRFLOW__CORE__SQL_ALCHEMY_CONN:      <set to the key 'connection' in secret 'airflow-airflow-metadata'>                Optional: false
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN:  <set to the key 'connection' in secret 'airflow-airflow-metadata'>                Optional: false
      AIRFLOW_CONN_AIRFLOW_DB:              <set to the key 'connection' in secret 'airflow-airflow-metadata'>                Optional: false
      AIRFLOW__WEBSERVER__SECRET_KEY:       <set to the key 'webserver-secret-key' in secret 'airflow-webserver-secret-key'>  Optional: false
      AIRFLOW__CELERY__BROKER_URL:          <set to the key 'connection' in secret 'airflow-broker-url'>                      Optional: false
    Mounts:
      /opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
      /opt/airflow/config/airflow_local_settings.py from config (ro,path="airflow_local_settings.py")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wthl2 (ro)
Containers:
  worker:
    Image:      stathiskap/custom-airflow:0.0.1
    Port:       8793/TCP
    Host Port:  0/TCP
    Args:
      bash
      -c
      exec \
      airflow celery worker
    Liveness:  exec [sh -c CONNECTION_CHECK_MAX_COUNT=0 exec /entrypoint python -m celery --app airflow.executors.celery_executor.app inspect ping -d celery@$(hostname)] delay=10s timeout=20s period=60s #success=1 #failure=5
    Environment:
      DUMB_INIT_SETSID:                     0
      AIRFLOW__WEBSERVER__EXPOSE_CONFIG:    true
      AIRFLOW__CORE__FERNET_KEY:            <set to the key 'fernet-key' in secret 'airflow-fernet-key'>                      Optional: false
      AIRFLOW__CORE__SQL_ALCHEMY_CONN:      <set to the key 'connection' in secret 'airflow-airflow-metadata'>                Optional: false
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN:  <set to the key 'connection' in secret 'airflow-airflow-metadata'>                Optional: false
      AIRFLOW_CONN_AIRFLOW_DB:              <set to the key 'connection' in secret 'airflow-airflow-metadata'>                Optional: false
      AIRFLOW__WEBSERVER__SECRET_KEY:       <set to the key 'webserver-secret-key' in secret 'airflow-webserver-secret-key'>  Optional: false
      AIRFLOW__CELERY__BROKER_URL:          <set to the key 'connection' in secret 'airflow-broker-url'>                      Optional: false
    Mounts:
      /opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
      /opt/airflow/config/airflow_local_settings.py from config (ro,path="airflow_local_settings.py")
      /opt/airflow/dags from dags (ro)
      /opt/airflow/logs from logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wthl2 (ro)
  worker-log-groomer:
    Image:      stathiskap/custom-airflow:0.0.1
    Port:       <none>
    Host Port:  <none>
    Args:
      bash
      /clean-logs
    Environment:
      AIRFLOW__LOG_RETENTION_DAYS:  15
    Mounts:
      /opt/airflow/logs from logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wthl2 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  logs:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  logs-airflow-worker-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      airflow-airflow-config
    Optional:  false
  dags:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  airflow-dags
    ReadOnly:   false
  kube-api-access-wthl2:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              node-role.kubernetes.io/airflow-worker=true
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  27m                default-scheduler  0/3 nodes are available: 1 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 1 Preemption is not helpful for scheduling, 2 No preemption victims found for incoming pod..
  Warning  FailedScheduling  12m (x3 over 22m)  default-scheduler  0/3 nodes are available: 1 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 1 Preemption is not helpful for scheduling, 2 No preemption victims found for incoming pod..

Anything else

It happens every time.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Labels

area:helm-chart (Airflow Helm Chart), kind:bug (This is clearly a bug), needs-triage (new issues that we didn't triage yet)
