Description
Apache Airflow version
Other Airflow 2/3 version (please specify below)
If "Other Airflow 2/3 version" selected, which one?
3.0.6
What happened?
Since upgrading to Airflow 3, we have noticed that spot interruptions and other container interruptions are no longer handled correctly.
As a result, the on_kill of the running operator is not triggered, so resources are left running when the pod gets deleted. The root cause is that the new Airflow SDK does not propagate the interruption signals to the subprocess executing the task.
What you think should happen instead?
When an Airflow worker pod gets a SIGTERM or SIGINT, it should shut down gracefully. This means triggering the on_kill function of the operator running at that time. Currently the worker just shuts down. In our case this means, for example, that the pods launched by the Airflow worker are not cleaned up.
The interruptions are correctly handled on Airflow 2.
How to reproduce
Run an Airflow task with a custom operator that launches a pod that sleeps for 15 minutes. Implement the on_kill function for the operator and log when it is triggered.
Note: the operator does not use the KubernetesPodOperator to interact with Kubernetes.
Then interrupt (gracefully kill) the worker pod and observe that the on_kill function of the operator does not get called.
From reading the code, calling on_kill is what I would expect the Task SDK to do.
task_runner.py has code to trigger the on_kill of a task on an interrupt, but this is not happening.
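A minimal operator for the reproduction could look like the sketch below. It is not our actual operator: `SleepingOperator`, `sleep_seconds` and `on_kill_called` are hypothetical names, and a plain `time.sleep` stands in for launching and waiting on a pod. The fallback `BaseOperator` class exists only so the snippet can be exercised without Airflow installed.

```python
import logging
import time

try:
    from airflow.sdk import BaseOperator  # Airflow 3 import path
except ImportError:
    # Minimal stand-in so the sketch can be run without Airflow installed.
    class BaseOperator:
        def __init__(self, task_id: str, **kwargs):
            self.task_id = task_id

log = logging.getLogger(__name__)


class SleepingOperator(BaseOperator):
    """Hypothetical operator: sleeps in execute() and logs when on_kill() runs."""

    def __init__(self, sleep_seconds: float = 15 * 60, **kwargs):
        super().__init__(**kwargs)
        self.sleep_seconds = sleep_seconds
        self.on_kill_called = False

    def execute(self, context):
        # Assumption: sleeping here replaces "launch a pod and wait for it".
        log.info("Sleeping for %s seconds", self.sleep_seconds)
        time.sleep(self.sleep_seconds)

    def on_kill(self):
        # This message is expected in the task log when the worker pod is
        # terminated while execute() is still running.
        self.on_kill_called = True
        log.warning("on_kill triggered for task %s", self.task_id)
```

Run a task with this operator, delete the worker pod while it sleeps, and check the task log for the "on_kill triggered" message.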
Operating System
Kubernetes
Versions of Apache Airflow Providers
apache-airflow-providers-cncf-kubernetes==10.6.0
apache-airflow-providers-common-compat==1.7.3
apache-airflow-providers-common-io==1.6.2
apache-airflow-providers-common-sql==1.27.5
apache-airflow-providers-opsgenie==4.0.0
apache-airflow-providers-postgres==6.2.3
apache-airflow-providers-slack==7.3.2
apache-airflow-providers-smtp==2.2.0
apache-airflow-providers-standard==1.4.1
Deployment
Other 3rd-party Helm chart
Deployment details
We run Airflow on Kubernetes. It is a custom setup but similar deployment to the official helm chart.
We use our own operators to abstract away some logic for users.
Anything else?
/
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct