In Airflow 3, interruptions of the Airflow worker pod on k8s are not handled #58936

@nclaeys

Apache Airflow version

Other Airflow 2/3 version (please specify below)

If "Other Airflow 2/3 version" selected, which one?

3.0.6

What happened?

Since upgrading to Airflow 3, we have noticed that spot interruptions and other container interruptions are no longer handled correctly.

As a result, the on_kill method of Airflow operators is not triggered, and resources are left running when the pod is deleted. The root cause is that the new Airflow Task SDK does not propagate interruption signals to the subprocess that executes the task.

What you think should happen instead?

When an Airflow worker pod receives a SIGTERM or SIGINT, it should shut down gracefully. This means triggering the on_kill method of the operator that is running at that time. Currently, it just shuts down; in our case, this means that, for example, the pods launched by the Airflow worker are not cleaned up.

Interruptions are handled correctly in Airflow 2.

How to reproduce

Run an Airflow task with a custom operator that launches a pod which sleeps for 15 minutes. Implement the operator's on_kill method so that it logs when it is triggered.

Note: the operator does not use the KubernetesPodOperator to interact with Kubernetes.
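
A stand-in for such an operator is sketched here, assuming the same `execute(context)` / `on_kill()` contract as Airflow's BaseOperator but written without importing Airflow so it runs on its own. `FakePodOperator`, its `running_pods` bookkeeping, and the pod name are hypothetical; the real operator would create and delete the pod through the Kubernetes API.

```python
import logging
import threading

log = logging.getLogger("repro")


class FakePodOperator:
    """Hypothetical stand-in for the custom operator in the reproduction.

    execute() records a "launched" pod and waits up to sleep_seconds
    (15 minutes by default); on_kill() logs the call and "deletes" the pod.
    """

    def __init__(self, sleep_seconds: float = 15 * 60):
        self.sleep_seconds = sleep_seconds
        self.running_pods: set = set()
        self.on_kill_called = False
        self.started = threading.Event()  # set once the pod is "launched"
        self._killed = threading.Event()  # set by on_kill() to stop waiting

    def execute(self, context: dict) -> None:
        pod_name = f"sleeper-{context.get('run_id', 'manual')}"
        self.running_pods.add(pod_name)
        log.info("launched pod %s", pod_name)
        self.started.set()
        # Stand-in for polling the pod: returns early if on_kill() fires.
        if not self._killed.wait(self.sleep_seconds):
            # The pod ran to completion, so it is no longer running.
            self.running_pods.discard(pod_name)

    def on_kill(self) -> None:
        self.on_kill_called = True
        log.warning("on_kill triggered, deleting pods %s", sorted(self.running_pods))
        self.running_pods.clear()
        self._killed.set()
```

In a graceful shutdown, calling `on_kill()` while `execute()` is waiting ends the wait early and leaves no pods recorded as running; the reported bug is that `on_kill()` is never called at all, so the pod keeps running.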

Then interrupt (gracefully kill) the worker pod and notice that the operator's on_kill method is not called.
From reading the code, I would expect the Task SDK to call it.

task_runner.py contains code to trigger a task's on_kill on interrupt, but it is not being invoked.
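
The general mechanism such code relies on can be sketched as a SIGTERM handler, installed in the process that runs the task, which calls the task's `on_kill()` before exiting. This is a simplified illustration, not a copy of task_runner.py; `install_on_kill_handler` is a hypothetical name, and `task` is assumed to be any object with an `on_kill()` method.

```python
import signal
import sys


def install_on_kill_handler(task) -> None:
    """Call task.on_kill() when this process receives SIGTERM, then exit.

    Hypothetical helper approximating the behaviour expected from the
    Task SDK's task runner.
    """

    def _on_term(signum, frame):
        try:
            task.on_kill()
        finally:
            # Exit with the shell convention for "terminated by signal": 128 + signum.
            sys.exit(128 + signum)

    signal.signal(signal.SIGTERM, _on_term)
```

A handler like this only runs if SIGTERM actually reaches the task process. If the parent process exits or is killed without relaying the signal, the handler never fires, which would match the behaviour described in this issue.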

Operating System

Kubernetes

Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes==10.6.0
apache-airflow-providers-common-compat==1.7.3
apache-airflow-providers-common-io==1.6.2
apache-airflow-providers-common-sql==1.27.5
apache-airflow-providers-opsgenie==4.0.0
apache-airflow-providers-postgres==6.2.3
apache-airflow-providers-slack==7.3.2
apache-airflow-providers-smtp==2.2.0
apache-airflow-providers-standard==1.4.1

Deployment

Other 3rd-party Helm chart

Deployment details

We run Airflow on Kubernetes. It is a custom setup, but the deployment is similar to the official Helm chart.
We use our own operators to abstract away some logic for users.

Anything else?

/

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!
