-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Open
Labels
Description
Apache Airflow version
3.1.6
If "Other Airflow 3 version" selected, which one?
3.1.2
What happened?
When a task is OOM-killed (container/process terminated with SIGKILL), the task appears to fail silently:
Task-level on_failure_callback is not executed
In logs/behavior, the executor reports the task as SUCCESS
Because the scheduler never observes a FAILED state (and no mismatch is detected), it does not trigger task-level failure handling/callbacks
What you think should happen instead?
When a task process is terminated with SIGKILL:
The executor should report the task as FAILED (or otherwise indicate abnormal termination)
Airflow should then trigger task-level on_failure_callback (and any related failure handling)
How to reproduce
Create a task which can OOM and has task level failure call back and it wont be triggered
from airflow.operators.python import PythonOperator
from datetime import datetime
def func():
a = "asd"
while True:
a += a*100000
def task_failure_callback(context):
"""Callback function called when a task fails."""
task_instance = context.get('task_instance')
dag_run = context.get('dag_run')
print(f"Task {task_instance.task_id} failed in DAG {dag_run.dag_id} with run_id {dag_run.run_id}")
with DAG(
dag_id="oom_example",
start_date=datetime(2024, 1, 1),
schedule="@daily",
catchup=False,
doc_md=__doc__,
tags=["oom"],
) as dag:
oom_task = PythonOperator(
task_id="oom_task",
python_callable=func,
retries=0,
on_failure_callback=task_failure_callback,
)
Operating System
debian
Versions of Apache Airflow Providers
No response
Deployment
Astronomer
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct