Race condition when running multiple schedulers #19038

Closed
timgriffiths opened this issue Oct 18, 2021 · 4 comments
Labels
affected_version:2.2, area:core, area:HA-Scheduler, kind:bug, provider:cncf-kubernetes

Comments

timgriffiths commented Oct 18, 2021

Apache Airflow version

2.2.0 (latest released)

Operating System

Debian GNU/Linux 10 (buster)

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==2.3.0
apache-airflow-providers-celery==2.1.0
apache-airflow-providers-cncf-kubernetes==2.0.3
apache-airflow-providers-docker==2.2.0
apache-airflow-providers-elasticsearch==2.0.3
apache-airflow-providers-ftp==2.0.1
apache-airflow-providers-google==6.0.0
apache-airflow-providers-grpc==2.0.1
apache-airflow-providers-hashicorp==2.1.1
apache-airflow-providers-http==2.0.1
apache-airflow-providers-imap==2.0.1
apache-airflow-providers-microsoft-azure==3.2.0
apache-airflow-providers-mysql==2.1.1
apache-airflow-providers-odbc==2.0.1
apache-airflow-providers-postgres==2.3.0
apache-airflow-providers-redis==2.0.1
apache-airflow-providers-sendgrid==2.0.1
apache-airflow-providers-sftp==2.1.1
apache-airflow-providers-slack==4.1.0
apache-airflow-providers-sqlite==2.0.1
apache-airflow-providers-ssh==2.2.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

Helm deployed using the official Apache Airflow Helm chart

What happened

We recently upgraded to 2.2.0 but have now noticed some of our jobs being killed by the scheduler not long after they start.

So we are using KubernetesPodOperator to launch all our tasks.

What I can see happening is:

  • Scheduler 1 -> queues the job ... it then launches the intermediate pod
  • Scheduler 2 -> oh, a queued job that I haven't seen before ... let me re-schedule that for you
  • Scheduler 1 -> I can't queue that again ... something's gone wrong, let me clean up what I was doing
  • Scheduler 1 -> kills the pod, which kills the successfully running pod
  • Scheduler 1 -> let's queue that again ... it then launches the intermediate pod
  • Scheduler 2 -> oh, a queued job that I haven't seen before ... let me re-schedule that for you
    and repeat

Tracking it back, this seems to have been introduced in #18152. Because this function is now run on a schedule, you can get into a situation where a task has been launched correctly but the scheduler that kicked it off hasn't yet had time to move the task instance out of the queued state.

i.e. this function:

def clear_not_launched_queued_tasks(self, session=None) -> None:
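
For context, here is a simplified sketch of what that function does (a paraphrase, not the actual Airflow source, and the label selector below is illustrative): it looks up task instances stuck in QUEUED, checks whether a matching worker pod already exists, and resets the task instance to SCHEDULED when it finds none. That pod check is the race window described above, since it can run before the launching scheduler has actually created the pod.

from airflow.models import TaskInstance
from airflow.utils.state import State

def clear_not_launched_queued_tasks_sketch(session, kube_client, namespace):
    # Simplified paraphrase of the real method in kubernetes_executor.py.
    queued = session.query(TaskInstance).filter(TaskInstance.state == State.QUEUED).all()
    for ti in queued:
        # Look for a worker pod whose labels match this task instance
        # (illustrative label selector, not the exact labels Airflow uses).
        selector = f"dag_id={ti.dag_id},task_id={ti.task_id},try_number={ti.try_number}"
        pods = kube_client.list_namespaced_pod(namespace=namespace, label_selector=selector)
        if not pods.items:
            # No pod found, so assume the task was never launched and hand it
            # back to the scheduler. With more than one scheduler this can fire
            # in the window between "task set to QUEUED" and "pod created" on
            # another scheduler, resetting a task that is in fact launching.
            ti.state = State.SCHEDULED
    session.commit()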

What you expected to happen

Tasks that have been scheduled shouldn't be killed

How to reproduce

Start up at least 2 schedulers.

Launch a set of tasks using the KubernetesPodOperator (or anything else that delays a task instance moving out of the queued state); a minimal example DAG is sketched below.
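
A minimal reproduction sketch (the DAG id, namespace and image are placeholders): many short parallel KubernetesPodOperator tasks give one scheduler plenty of queued task instances to "clean up" while another scheduler is still busy creating pods.

from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

with DAG(
    dag_id="race_condition_repro",
    start_date=datetime(2021, 10, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Many short-lived parallel pods widen the window in which a second
    # scheduler sees QUEUED task instances with no matching pod yet.
    for i in range(50):
        KubernetesPodOperator(
            task_id=f"sleep_{i}",
            name=f"sleep-{i}",
            namespace="airflow",
            image="busybox",
            cmds=["sh", "-c", "sleep 30"],
        )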

Anything else

The workaround at the moment seems to be to run just 1 scheduler (see the values override sketched below), but it would be great if this could be patched.
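
For deployments on the official Helm chart, pinning to a single scheduler is just a values override (assuming the chart's scheduler.replicas setting, which the official chart exposes):

scheduler:
  replicas: 1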

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

@timgriffiths added the area:core and kind:bug labels on Oct 18, 2021
@uranusjr
Member

cc @andrewgodwin

@andrewgodwin
Contributor

Hm, yes, this is probably a lot more likely after my commit but would also have been present before (since that function would still run on boot).

I'm curious why it queues it, though, if the pod is there - the function was meant, by my understanding, to only move it from QUEUED to SCHEDULED if there's no pod that would match. Is the pod around in this case when the task instance gets moved back to SCHEDULED?

@jedcunningham added the affected_version:2.2 label on Oct 19, 2021
@eladkal changed the title from "I believe merge 18152 has introduced a race condition when running multiple schedulers" to "Race condition when running multiple schedulers" on Nov 22, 2021
@eladkal added the area:HA-Scheduler and provider:cncf-kubernetes labels on Nov 22, 2021
@george-zubrienko

Seeing this issue a lot with 3 schedulers on 2.2.2, KubernetesExecutor on k8s 1.21. The code is a PythonOperator calling a webservice. We do get some logs on startup though - this is right before the job code starts:

[2021-12-07, 13:32:00 UTC] {taskinstance.py:1262} INFO - Executing <Task(PythonOperator): ...> on 2021-12-07 06:00:00+00:00
[2021-12-07, 13:32:00 UTC] {standard_task_runner.py:52} INFO - Started process 13 to run task
[2021-12-07, 13:32:00 UTC] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', ....]
[2021-12-07, 13:32:00 UTC] {standard_task_runner.py:77} INFO - Job 129472: Subtask ...
[2021-12-07, 13:32:05 UTC] {local_task_job.py:211} WARNING - State of this instance has been externally set to queued. Terminating instance.
[2021-12-07, 13:32:05 UTC] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 13
[2021-12-07, 13:32:07 UTC] {process_utils.py:66} INFO - Process psutil.Process(pid=13, status='terminated', exitcode=1, started='13:32:00') (13) terminated with exit code 1

The next try for this task started running, but then:

[2021-12-07, 13:32:04 UTC] {chained.py:84} INFO - DefaultAzureCredential acquired a token from EnvironmentCredential
[2021-12-07, 13:32:06 UTC] {taskinstance.py:1411} ERROR - Received SIGTERM. Terminating subprocesses.
[2021-12-07, 13:32:06 UTC] {taskinstance.py:1703} ERROR - Task failed with exception

And then retry 3 finally goes through. We only see this issue on pipelines with a high number of parallel tasks - in our case, 3 task pools with 48 + 48 + 90 total capacity. Also, with >1 scheduler, the scheduler pods sometimes print this to the logs:

sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL:  Process 1368 waits for ShareLock on transaction 30815670; blocked by process 1160.
Process 1160 waits for ShareLock on transaction 30815664; blocked by process 1368.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (13513,3) in relation "task_instance"

@jedcunningham
Member

This should be fixed in 2.2.3 via #19904.
