Pods that end in CreateContainerConfigError are not deleted by Airflow after DAG ends #19704

@ghost

Description

Apache Airflow version

2.1.3

Operating System

Debian 10

Versions of Apache Airflow Providers

apache-airflow-providers-apache-cassandra==2.0.1
apache-airflow-providers-apache-hive==2.0.2
apache-airflow-providers-celery==2.1.0
apache-airflow-providers-cncf-kubernetes==2.0.2
apache-airflow-providers-ftp==2.0.1
apache-airflow-providers-http==2.0.1
apache-airflow-providers-imap==2.0.1
apache-airflow-providers-jdbc==2.0.1
apache-airflow-providers-mysql==2.1.1
apache-airflow-providers-papermill==2.0.1
apache-airflow-providers-postgres==2.2.0
apache-airflow-providers-sftp==2.1.1
apache-airflow-providers-sqlite==2.0.1
apache-airflow-providers-ssh==2.1.1
apache-airflow-providers-google==5.1.0
apache-airflow-providers-apache-beam==3.1.0

Deployment

Other Docker-based deployment

Deployment details

Kubernetes - GKE
AIRFLOW__KUBERNETES__DELETE_WORKER_PODS = True
AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE = True
CeleryKubernetesExecutor
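For anyone matching these settings against airflow.cfg: Airflow reads environment variables of the form AIRFLOW__{SECTION}__{KEY}, so the two variables above correspond to options in the [kubernetes] section. A minimal sketch of that mapping follows; the helper name env_to_option is hypothetical and is not part of Airflow.

```python
from typing import Tuple


def env_to_option(name: str) -> Tuple[str, str]:
    """Split an AIRFLOW__{SECTION}__{KEY} variable name into (section, key).

    Hypothetical helper for illustration only; Airflow does this internally.
    """
    # maxsplit=2 keeps single underscores inside the key intact.
    prefix, section, key = name.split("__", 2)
    if prefix != "AIRFLOW":
        raise ValueError(f"not an Airflow config variable: {name}")
    return section.lower(), key.lower()


for var in (
    "AIRFLOW__KUBERNETES__DELETE_WORKER_PODS",
    "AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE",
):
    section, key = env_to_option(var)
    print(f"[{section}] {key} = True")
```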

What happened

I set the AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG parameter to a wrong, nonexistent image tag.
Before I noticed, Airflow had created a few pods that failed with ImagePullBackOff. They are stuck restarting with the message Back-off pulling image "xxx", and the pod status is CreateContainerConfigError.
The scheduler logs say nothing about these pods. It seems Airflow abandoned them when the DAGs entered the failed state because the pods never became ready.
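To make the affected pods easier to spot, here is a minimal sketch that checks one pod from `kubectl get pods -o json` output for the waiting reasons mentioned above. It is not Airflow code. The function name stuck_reason, the STUCK_REASONS set, and the sample pod are assumptions made for illustration; ErrImagePull is included because Kubernetes reports it before backing off to ImagePullBackOff.

```python
from typing import Optional

# Waiting reasons from this report, plus ErrImagePull (an assumption, see above).
STUCK_REASONS = {"CreateContainerConfigError", "ImagePullBackOff", "ErrImagePull"}


def stuck_reason(pod: dict) -> Optional[str]:
    """Return the first waiting reason that marks the pod as stuck, else None.

    `pod` is one item from `kubectl get pods -o json`, as a plain dict.
    """
    status = pod.get("status") or {}
    statuses = (status.get("initContainerStatuses") or []) + (
        status.get("containerStatuses") or []
    )
    for container in statuses:
        waiting = (container.get("state") or {}).get("waiting") or {}
        if waiting.get("reason") in STUCK_REASONS:
            return waiting["reason"]
    return None


# Hypothetical pod shaped like the ones described in this report.
sample = {
    "metadata": {"name": "example-task-pod"},
    "status": {
        "phase": "Pending",
        "containerStatuses": [
            {
                "name": "base",
                "state": {
                    "waiting": {
                        "reason": "ImagePullBackOff",
                        "message": 'Back-off pulling image "xxx"',
                    }
                },
            }
        ],
    },
}
print(stuck_reason(sample))
```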

What you expected to happen

No response

How to reproduce

  • Set these configuration options:
    • AIRFLOW__KUBERNETES__DELETE_WORKER_PODS = True
    • AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE = True
  • Use CeleryKubernetesExecutor
  • Set AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG to a nonexistent tag
  • Run some Airflow tasks
  • Check that pods are failing and are in CreateContainerConfigError state.
  • Mark the task as finished/failed or wait for tasks to fail.
  • Pods are still there and will remain until deleted manually
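Until this is fixed, the leftover pods from the steps above can be cleaned up outside Airflow. The following is a rough sketch using the official kubernetes Python client, not a proposed patch. The function names, the dry_run default, and the optional label_selector are assumptions; the namespace and any selector need to be adapted to your deployment.

```python
from typing import List, Optional

STUCK_REASONS = frozenset(
    {"CreateContainerConfigError", "ImagePullBackOff", "ErrImagePull"}
)


def make_core_api():
    """Build a CoreV1Api client (needs the `kubernetes` package and a kubeconfig)."""
    # Imported lazily so the helpers below can be used without the package.
    from kubernetes import client, config

    config.load_kube_config()
    return client.CoreV1Api()


def is_stuck(pod) -> bool:
    """True if any container of the pod is waiting for one of STUCK_REASONS."""
    for cs in pod.status.container_statuses or []:
        waiting = cs.state.waiting if cs.state else None
        if waiting is not None and waiting.reason in STUCK_REASONS:
            return True
    return False


def delete_stuck_pods(
    api,
    namespace: str,
    label_selector: Optional[str] = None,
    dry_run: bool = True,
) -> List[str]:
    """List stuck pods in `namespace`; delete them only when dry_run is False."""
    kwargs = {"label_selector": label_selector} if label_selector else {}
    pods = api.list_namespaced_pod(namespace, **kwargs).items
    names = [p.metadata.name for p in pods if is_stuck(p)]
    if not dry_run:
        for name in names:
            api.delete_namespaced_pod(name=name, namespace=namespace)
    return names
```

Calling delete_stuck_pods(make_core_api(), "airflow") only reports the pod names ("airflow" is a placeholder namespace); passing dry_run=False performs the deletion. Keeping the dry run as the default is a deliberate choice so a too-broad selector does not remove healthy pods.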

Anything else

It happens every time.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Labels

area:core, kind:bug, pending-response, stale
