Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backfill dagrun mistakenly evaluated as deadlocked #26044

Closed
2 tasks done
wolfier opened this issue Aug 29, 2022 · 3 comments · Fixed by #26161
Closed
2 tasks done

Backfill dagrun mistakenly evaluated as deadlocked #26044

wolfier opened this issue Aug 29, 2022 · 3 comments · Fixed by #26161
Labels
area:backfill Specifically for backfill related kind:bug This is a clearly a bug

Comments

@wolfier
Copy link
Contributor

wolfier commented Aug 29, 2022

Apache Airflow version

Other Airflow 2 version

What happened

I used a bash operator to run a backfill command. The dagrun was marked as failed and was alerted for a deadlock even though the task instances themselves were still ran normally. This happens occasionally.

[2022-08-23, 10:54:59 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:59 UTC] {dagrun.py:585} ERROR - Deadlock; marking run <DagRun load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental @ 2022-08-15 08:00:00+00:00: backfill__2022-08-15T08:00:00+00:00, externally triggered: False> failed
...
[2022-08-23, 10:55:19 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:19 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 0 | succeeded: 4 | running: 0 | failed: 0 | skipped: 5 | deadlocked: 0 | not ready: 0
[2022-08-23, 10:55:19 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:19 UTC] {local_executor.py:390} INFO - Shutting down LocalExecutor; waiting for running tasks to finish.  Signal again if you don't want to wait.
[2022-08-23, 10:55:19 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:19 UTC] {backfill_job.py:879} INFO - Backfill done. Exiting.

Here is full backfill log.

[2022-08-23, 10:54:00 UTC] {subprocess.py:74} INFO - Running command: ['bash', '-c', 'cd $AIRFLOW_HOME && airflow dags backfill -s "2022-08-15 00:00:00" -e "2022-08-16 00:00:00"  -c \'{"start_val":"1","end_val":"4"}\' --rerun-failed-tasks --reset-dagruns --yes load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental']
...
[2022-08-23, 10:54:21 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:21 UTC] {task_command.py:371} INFO - Running <TaskInstance: load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental.source.extract_withdrawals_venmo_withdrawal_aud_incremental_load backfill__2022-08-15T08:00:00+00:00 [queued]> on host a8870cb5a3e0
[2022-08-23, 10:54:19 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:19 UTC] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental', 'source.extract_withdrawals_venmo_withdrawal_aud_incremental_load', 'backfill__2022-08-15T08:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/withdrawals_venmo_withdrawal_aud_jdbc_to_redshift_incremental_load.py', '--cfg-path', '/tmp/tmp92x61y3k']
[2022-08-23, 10:54:24 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:24 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 8 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 8
[2022-08-23, 10:54:29 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:29 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 8 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 8
[2022-08-23, 10:54:34 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:34 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 8 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 8
[2022-08-23, 10:54:39 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:39 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 8 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 8
[2022-08-23, 10:54:44 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:44 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 8 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 8
[2022-08-23, 10:54:49 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:49 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 8 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 8
[2022-08-23, 10:54:54 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:54 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 8 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 8
[2022-08-23, 10:54:59 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:59 UTC] {dagrun.py:585} ERROR - Deadlock; marking run <DagRun load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental @ 2022-08-15 08:00:00+00:00: backfill__2022-08-15T08:00:00+00:00, externally triggered: False> failed
[2022-08-23, 10:54:59 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:59 UTC] {dagrun.py:609} INFO - DagRun Finished: dag_id=load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental, execution_date=2022-08-15 08:00:00+00:00, run_id=backfill__2022-08-15T08:00:00+00:00, run_start_date=None, run_end_date=2022-08-23 10:54:59.121952+00:00, run_duration=None, state=failed, external_trigger=False, run_type=backfill, data_interval_start=2022-08-15 08:00:00+00:00, data_interval_end=2022-08-16 08:00:00+00:00, dag_hash=None
[2022-08-23, 10:54:59 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:59 UTC] {dagrun.py:795} WARNING - Failed to record duration of <DagRun load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental @ 2022-08-15 08:00:00+00:00: backfill__2022-08-15T08:00:00+00:00, externally triggered: False>: start_date is not set.
[2022-08-23, 10:54:59 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:59 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 8 | succeeded: 1 | running: 0 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 8
[2022-08-23, 10:54:59 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:59 UTC] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental', 'destination.post_marker_staging_withdrawals_venmo_withdrawal_aud', 'backfill__2022-08-15T08:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/withdrawals_venmo_withdrawal_aud_jdbc_to_redshift_incremental_load.py', '--cfg-path', '/tmp/tmpd1nq6xe2']
[2022-08-23, 10:54:59 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:59 UTC] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental', 'destination.post_marker_fdg_pii_fact_aw_venmo_withdrawal_aud', 'backfill__2022-08-15T08:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/withdrawals_venmo_withdrawal_aud_jdbc_to_redshift_incremental_load.py', '--cfg-path', '/tmp/tmps6ah6zww']
[2022-08-23, 10:55:04 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:04 UTC] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental', 'destination.post_marker_fdg_pii_fact_aw_venmo_withdrawal_aud', 'backfill__2022-08-15T08:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/withdrawals_venmo_withdrawal_aud_jdbc_to_redshift_incremental_load.py', '--cfg-path', '/tmp/tmps6ah6zww']
[2022-08-23, 10:55:04 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:04 UTC] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental', 'destination.post_marker_staging_withdrawals_venmo_withdrawal_aud', 'backfill__2022-08-15T08:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/withdrawals_venmo_withdrawal_aud_jdbc_to_redshift_incremental_load.py', '--cfg-path', '/tmp/tmpd1nq6xe2']
[2022-08-23, 10:55:04 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:04 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 3 | succeeded: 1 | running: 2 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 3
[2022-08-23, 10:55:06 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:06 UTC] {task_command.py:371} INFO - Running <TaskInstance: load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental.destination.post_marker_fdg_pii_fact_aw_venmo_withdrawal_aud backfill__2022-08-15T08:00:00+00:00 [queued]> on host a8870cb5a3e0
[2022-08-23, 10:55:06 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:06 UTC] {task_command.py:371} INFO - Running <TaskInstance: load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental.destination.post_marker_staging_withdrawals_venmo_withdrawal_aud backfill__2022-08-15T08:00:00+00:00 [queued]> on host a8870cb5a3e0
[2022-08-23, 10:55:09 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:09 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 1 | succeeded: 3 | running: 0 | failed: 0 | skipped: 5 | deadlocked: 0 | not ready: 1
[2022-08-23, 10:55:09 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:09 UTC] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental', 'post_execution.rotate_checkpoint_withdrawals_venmo_withdrawal_aud', 'backfill__2022-08-15T08:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/withdrawals_venmo_withdrawal_aud_jdbc_to_redshift_incremental_load.py', '--cfg-path', '/tmp/tmpkve4mv_q']
[2022-08-23, 10:55:14 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:14 UTC] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental', 'post_execution.rotate_checkpoint_withdrawals_venmo_withdrawal_aud', 'backfill__2022-08-15T08:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/withdrawals_venmo_withdrawal_aud_jdbc_to_redshift_incremental_load.py', '--cfg-path', '/tmp/tmpkve4mv_q']
[2022-08-23, 10:55:14 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:14 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 0 | succeeded: 3 | running: 1 | failed: 0 | skipped: 5 | deadlocked: 0 | not ready: 0
[2022-08-23, 10:55:15 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:15 UTC] {task_command.py:371} INFO - Running <TaskInstance: load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental.post_execution.rotate_checkpoint_withdrawals_venmo_withdrawal_aud backfill__2022-08-15T08:00:00+00:00 [queued]> on host a8870cb5a3e0
[2022-08-23, 10:55:19 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:19 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 0 | succeeded: 4 | running: 0 | failed: 0 | skipped: 5 | deadlocked: 0 | not ready: 0
[2022-08-23, 10:55:19 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:19 UTC] {local_executor.py:390} INFO - Shutting down LocalExecutor; waiting for running tasks to finish.  Signal again if you don't want to wait.
[2022-08-23, 10:55:19 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:19 UTC] {backfill_job.py:879} INFO - Backfill done. Exiting.

What you think should happen instead

The DAG is not deadlocked but still somehow was still evaluated as deadlocked.

How to reproduce

No response

Operating System

N/A

Versions of Apache Airflow Providers

No response

Deployment

Astronomer

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@ephraimbuddy
Copy link
Contributor

@wolfier , can you share the dag you used to reproduce yours?

@eladkal eladkal added area:backfill Specifically for backfill related and removed area:core labels Aug 31, 2022
@ephraimbuddy
Copy link
Contributor

So, using the dag at #25353, I reproduced the deadlock. Not only in backfill but on normal run with mini scheduler turned off.
This case is similar to #25200 (comment), however, filtering the unfinished task does not solve it.
cc: @ashb

@wolfier
Copy link
Contributor Author

wolfier commented Sep 1, 2022

Unfortunately, I am unable to share the DAG itself but I can describe how the DAG is setup.

There is a DAG that runs a BashOperator with the bash command of airflow backfill. The referenced DAG to backfill could be in any shape/size meaning this issue could happen to any DAG in the deployment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:backfill Specifically for backfill related kind:bug This is a clearly a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants