Description
Apache Airflow version
2.5.1
What happened
I think this issue has already been discussed in several other issues, but I am still in a situation where I basically cannot use the KubernetesExecutor.
With this config:
executor = KubernetesExecutor
[scheduler]
job_heartbeat_sec = 60
schedule_after_task_execution = False
scheduler_health_check_threshold = 30
scheduler_heartbeat_sec = 10
and this very simple DAG:
import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def sigterm_debug(ds, **kwargs):
    run_date = str(ds)
    print("run date is : " + str(run_date))
    import time
    time.sleep(500)
    print("yeaah not killed ...")


dag = DAG(
    dag_id="debug",
    description="Debug",
    schedule_interval="30 21,9 * * *",
    start_date=datetime.datetime(2023, 1, 1),
    catchup=False,
    dagrun_timeout=datetime.timedelta(hours=3)
)

sigterm_debug_task = PythonOperator(
    task_id='sigterm_debug_task',
    python_callable=sigterm_debug,
    execution_timeout=datetime.timedelta(hours=1),
    retries=2,
    dag=dag)

sigterm_debug_task
The DAG is killed by the scheduler, which raises the infamous exception "PID of job runner does not match" at the exact moment the job_heartbeat_sec interval elapses.
Note: the pod itself is fine. It really is Airflow that kills the task, after which the pod exits with an exit code > 0.
Note 2: increasing job_heartbeat_sec to a very high value lets the DAG finish successfully, but then the pod is never reaped until the job_heartbeat_sec interval elapses.
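For readers unfamiliar with the kind of check behind this error, the sketch below is a simplified model of a heartbeat-time PID comparison. It is not Airflow's actual code: `check_job_runner_pid`, its parameters, and the use of `RuntimeError` are hypothetical choices made for illustration. Only the error message text is taken from this report.

```python
from typing import Optional


def check_job_runner_pid(recorded_pid: Optional[int], current_pid: int) -> None:
    """Raise if the PID stored for the task differs from the running process.

    Hypothetical helper: a simplified model of the comparison, not Airflow's API.
    """
    # No PID has been recorded yet, so there is nothing to compare against.
    if recorded_pid is None:
        return
    # A mismatch is treated as "another process owns this task" and aborts it.
    if recorded_pid != current_pid:
        raise RuntimeError("PID of job runner does not match")


# Matching PIDs pass silently; a mismatch raises.
check_job_runner_pid(recorded_pid=1234, current_pid=1234)
try:
    check_job_runner_pid(recorded_pid=1234, current_pid=5678)
except RuntimeError as exc:
    print(exc)
```

Under this model, the task body can be perfectly healthy and still be aborted, which is consistent with the pod being fine until the check fires.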
What you think should happen instead
Everything should be working as expected :)
How to reproduce
With the latest version, my config, and this simple DAG.
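As a rough aid for reproducing the timing, the sketch below parses the settings from this report and lists the moments at which a job heartbeat would fall during the task's 500-second sleep. It makes two assumptions: `executor` sits under a `[core]` section (the report omits that header), and heartbeats fire at exact multiples of `job_heartbeat_sec` counted from task start, which is a simplification. `heartbeat_times` is a hypothetical helper, not part of Airflow.

```python
import configparser
from typing import List

# Settings copied from the report; the [core] header is an assumption.
CONFIG = """
[core]
executor = KubernetesExecutor

[scheduler]
job_heartbeat_sec = 60
schedule_after_task_execution = False
scheduler_health_check_threshold = 30
scheduler_heartbeat_sec = 10
"""

TASK_SLEEP_SEC = 500  # matches time.sleep(500) in the DAG


def heartbeat_times(config_text: str, task_duration_sec: int) -> List[int]:
    """Return the seconds (from task start) at which a heartbeat would fall."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    interval = parser.getint("scheduler", "job_heartbeat_sec")
    return list(range(interval, task_duration_sec + 1, interval))


times = heartbeat_times(CONFIG, TASK_SLEEP_SEC)
print(f"first heartbeat at {times[0]}s, {len(times)} heartbeats before the task ends")
```

With these values the first heartbeat falls 60 seconds into a 500-second task, so a kill "at the exact moment" of the heartbeat should show up about one minute after the task starts.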
Operating System
Docker/Kubernetes
Versions of Apache Airflow Providers
No response
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else
100% reproducible with this config.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct