
DAGs run with the KubernetesExecutor are being killed with "PID of job runner does not match" #29389

@ut0mt8


Apache Airflow version

2.5.1

What happened

I think this problem has already been discussed in several issues, but I am still in a situation where I basically cannot use the KubernetesExecutor.

With this configuration:

[core]
executor = KubernetesExecutor

[scheduler]
job_heartbeat_sec = 60
schedule_after_task_execution = False
scheduler_health_check_threshold = 30
scheduler_heartbeat_sec = 10

and this very simple DAG:

import datetime
import time

from airflow import DAG
# Airflow 2 import path; airflow.operators.python_operator is deprecated.
from airflow.operators.python import PythonOperator


def sigterm_debug(ds, **kwargs):
    # ds (the run's logical date, YYYY-MM-DD) is injected from the task context.
    print("run date is : " + str(ds))
    # Long sleep so the task is still running when the job heartbeat fires.
    time.sleep(500)
    print("yeaah not killed ...")


dag = DAG(
    dag_id="debug",
    description="Debug",
    schedule="30 21,9 * * *",  # 09:30 and 21:30 every day
    start_date=datetime.datetime(2023, 1, 1),
    catchup=False,
    dagrun_timeout=datetime.timedelta(hours=3),
)

sigterm_debug_task = PythonOperator(
    task_id="sigterm_debug_task",
    python_callable=sigterm_debug,
    execution_timeout=datetime.timedelta(hours=1),
    retries=2,
    dag=dag,
)

the task is killed, with Airflow raising the infamous exception "PID of job runner does not match" at exactly the moment the job heartbeat runs (every job_heartbeat_sec seconds).
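For context, here is a simplified, hypothetical model of the comparison behind this message. As far as I understand, in Airflow 2.5 the check runs in the local task job's heartbeat callback: it compares the PID recorded on the task instance with the PID of the running task-runner process and, when the task runs as another user (run_as_user), compares the parent of the recorded PID instead. The function and parameter names below are illustrative assumptions, not Airflow's API, and RuntimeError stands in for AirflowException.

```python
from typing import Optional


def pid_matches(
    recorded_pid: Optional[int],
    current_pid: int,
    runs_as_other_user: bool = False,
    parent_of_recorded_pid: Optional[int] = None,
) -> bool:
    """Hypothetical model of the heartbeat PID check (not Airflow's real API).

    recorded_pid: PID stored for the task instance in the metadata DB.
    current_pid: PID of the task-runner process the heartbeat is watching.
    runs_as_other_user: user impersonation adds one process level, so the
        parent of the recorded PID is what has to match.
    parent_of_recorded_pid: parent PID of recorded_pid (used only when
        runs_as_other_user is True).
    """
    if recorded_pid is None:
        # No PID recorded: the runner already finished, nothing to compare.
        return True
    if runs_as_other_user:
        return parent_of_recorded_pid == current_pid
    return recorded_pid == current_pid


def heartbeat_check(recorded_pid: Optional[int], current_pid: int, **kwargs) -> None:
    """Fail the way the task does when the PIDs disagree (RuntimeError as a stand-in)."""
    if not pid_matches(recorded_pid, current_pid, **kwargs):
        raise RuntimeError("PID of job runner does not match")
```

In this model a single mismatch on the first heartbeat is enough to fail the task, which is consistent with the kill landing exactly on job_heartbeat_sec.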

Note: the pod itself is healthy. It really is Airflow that kills the task, after which the pod exits with a non-zero exit code.

Note 2: setting job_heartbeat_sec to a very high value lets the DAG finish successfully, but then the pod is not reaped until job_heartbeat_sec has elapsed.
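A back-of-the-envelope sketch of why a large job_heartbeat_sec hides the failure: if the mismatch is detected on the first job heartbeat, the task survives only when it finishes before that heartbeat fires. This assumes heartbeats fire at exact multiples of job_heartbeat_sec after the task starts (no jitter); the function names are hypothetical.

```python
from typing import List


def heartbeat_times(task_seconds: int, job_heartbeat_sec: int) -> List[int]:
    """Seconds after task start at which a heartbeat fires while the task runs.

    Assumes heartbeats fire exactly every job_heartbeat_sec (no jitter)
    and only count if they happen strictly before the task finishes.
    """
    return list(range(job_heartbeat_sec, task_seconds, job_heartbeat_sec))


def survives_first_heartbeat_kill(task_seconds: int, job_heartbeat_sec: int) -> bool:
    """True if the task finishes before any heartbeat (and its PID check) fires."""
    return not heartbeat_times(task_seconds, job_heartbeat_sec)


# The debug task sleeps for 500 s.
print(heartbeat_times(500, 60))                 # [60, 120, 180, 240, 300, 360, 420, 480]
print(survives_first_heartbeat_kill(500, 60))   # False: killed at the ~60 s heartbeat
print(survives_first_heartbeat_kill(500, 600))  # True: no heartbeat before 500 s
```

Under these assumptions, raising job_heartbeat_sec only postpones the check past the end of the task rather than removing the mismatch.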

What you think should happen instead

The task should run to completion (the 500-second sleep ends and "yeaah not killed ..." is printed) without being killed by Airflow.

How to reproduce

Use the latest version (2.5.1), the configuration above, and this simple DAG.

Operating System

Docker/Kubernetes

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else

100% reproducible with this config.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
