What happened
While running a backfill for one of our DAGs, we noticed that the backfill stalled after completing all the tasks for a given DAG run. max_active_runs for this DAG was set to 1, so the entire backfill stalled even though every task in the last DAG run completed successfully.
What you think should happen instead
I would assume that once all the tasks in a DAG run are complete during a backfill (whether succeeded, skipped, or failed), the backfill should mark the DAG run with the proper state and proceed with the rest of the backfill.
How to reproduce
Here is a simulacrum of our DAG with all the actual logic stripped out:
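The DAG source itself did not survive the page capture. The following is a hypothetical reconstruction inferred only from the task IDs visible in the backfill logs below; the callables, trigger rules, schedule, and dependency wiring are guesses, not the reporter's actual code:

```python
# Hypothetical reconstruction of the stripped-down DAG, inferred from the
# task IDs in the backfill logs ("first", "subgroup.run_and_skip",
# "subgroup.run_fine", "subgroup.always_succeed", "final").
# Trigger rules and schedule are assumptions, not the reporter's code.
from datetime import datetime

from airflow.exceptions import AirflowSkipException
from airflow.models import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.task_group import TaskGroup
from airflow.utils.trigger_rule import TriggerRule


def skip():
    # Deliberately skip, mirroring the "skipped" states in the logs.
    raise AirflowSkipException("deliberately skipped")


def noop():
    pass


with DAG(
    dag_id="sample_dag_with_skip",
    start_date=datetime(2022, 7, 1),
    schedule_interval="@daily",
    max_active_runs=1,  # the setting the stall appears to interact with
    catchup=False,
) as dag:
    first = PythonOperator(task_id="first", python_callable=noop)

    with TaskGroup(group_id="subgroup") as subgroup:
        run_and_skip = PythonOperator(task_id="run_and_skip", python_callable=skip)
        run_fine = PythonOperator(task_id="run_fine", python_callable=noop)
        always_succeed = PythonOperator(
            task_id="always_succeed",
            python_callable=noop,
            trigger_rule=TriggerRule.ALL_DONE,
        )
        [run_and_skip, run_fine] >> always_succeed

    final = PythonOperator(
        task_id="final",
        python_callable=noop,
        trigger_rule=TriggerRule.ALL_DONE,
    )

    first >> subgroup >> final
```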
Note that the DAG is still shown as "running" even though the last task ended several minutes ago.
Here's the backfill command I ran for this DAG: airflow dags backfill -s 2022-07-25 -e 2022-07-26 sample_dag_with_skip --reset-dagruns -y
And here are the logs from the backfill process:
Backfill logs
airflow@42a81ed08a3d:~$ airflow dags backfill -s 2022-07-25 -e 2022-07-26 sample_dag_with_skip --reset-dagruns -y
/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/cli/commands/dag_command.py:57 PendingDeprecationWarning: --ignore-first-depends-on-past is deprecated as the value is always set to True
[2022-07-27 23:15:44,655] {dagbag.py:507} INFO - Filling up the DagBag from /usr/local/airflow/openverse_catalog/dags
[2022-07-27 23:15:44,937] {urls.py:74} INFO - https://creativecommons.org/publicdomain/zero/1.0 was rewritten to https://creativecommons.org/publicdomain/zero/1.0/
[2022-07-27 23:15:44,948] {media.py:63} INFO - Initialized image MediaStore with provider brooklynmuseum
[2022-07-27 23:15:44,948] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,948] {media.py:186} INFO - Output path: /var/workflow_output/brooklynmuseum_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,952] {media.py:63} INFO - Initialized image MediaStore with provider europeana
[2022-07-27 23:15:44,952] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,952] {media.py:186} INFO - Output path: /var/workflow_output/europeana_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,953] {media.py:63} INFO - Initialized image MediaStore with provider finnishmuseums
[2022-07-27 23:15:44,953] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,953] {media.py:186} INFO - Output path: /var/workflow_output/finnishmuseums_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,955] {media.py:63} INFO - Initialized image MediaStore with provider flickr
[2022-07-27 23:15:44,955] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,955] {media.py:186} INFO - Output path: /var/workflow_output/flickr_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,957] {media.py:63} INFO - Initialized audio MediaStore with provider freesound
[2022-07-27 23:15:44,957] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,957] {media.py:186} INFO - Output path: /var/workflow_output/freesound_audio_v001_20220727231544.tsv
[2022-07-27 23:15:44,959] {media.py:63} INFO - Initialized audio MediaStore with provider jamendo
[2022-07-27 23:15:44,959] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,959] {media.py:186} INFO - Output path: /var/workflow_output/jamendo_audio_v001_20220727231544.tsv
[2022-07-27 23:15:44,961] {media.py:63} INFO - Initialized image MediaStore with provider met
[2022-07-27 23:15:44,961] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,961] {media.py:186} INFO - Output path: /var/workflow_output/met_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,962] {media.py:63} INFO - Initialized image MediaStore with provider museumsvictoria
[2022-07-27 23:15:44,962] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,962] {media.py:186} INFO - Output path: /var/workflow_output/museumsvictoria_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,964] {media.py:63} INFO - Initialized image MediaStore with provider nypl
[2022-07-27 23:15:44,964] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,964] {media.py:186} INFO - Output path: /var/workflow_output/nypl_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,965] {media.py:63} INFO - Initialized image MediaStore with provider phylopic
[2022-07-27 23:15:44,965] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,965] {media.py:186} INFO - Output path: /var/workflow_output/phylopic_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,967] {media.py:63} INFO - Initialized image MediaStore with provider rawpixel
[2022-07-27 23:15:44,967] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,967] {media.py:186} INFO - Output path: /var/workflow_output/rawpixel_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,968] {media.py:63} INFO - Initialized image MediaStore with provider sciencemuseum
[2022-07-27 23:15:44,968] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,968] {media.py:186} INFO - Output path: /var/workflow_output/sciencemuseum_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,970] {media.py:63} INFO - Initialized image MediaStore with provider smithsonian
[2022-07-27 23:15:44,970] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,970] {media.py:186} INFO - Output path: /var/workflow_output/smithsonian_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,971] {media.py:63} INFO - Initialized image MediaStore with provider smk
[2022-07-27 23:15:44,972] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,972] {media.py:186} INFO - Output path: /var/workflow_output/smk_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,974] {media.py:63} INFO - Initialized image MediaStore with provider waltersartmuseum
[2022-07-27 23:15:44,974] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,974] {media.py:186} INFO - Output path: /var/workflow_output/waltersartmuseum_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,976] {media.py:63} INFO - Initialized audio MediaStore with provider wikimedia_audio
[2022-07-27 23:15:44,976] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,976] {media.py:186} INFO - Output path: /var/workflow_output/wikimedia_audio_audio_v001_20220727231544.tsv
[2022-07-27 23:15:44,976] {media.py:63} INFO - Initialized image MediaStore with provider wikimedia
[2022-07-27 23:15:44,976] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,977] {media.py:186} INFO - Output path: /var/workflow_output/wikimedia_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,980] {media.py:63} INFO - Initialized image MediaStore with provider wordpress
[2022-07-27 23:15:44,980] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,980] {media.py:186} INFO - Output path: /var/workflow_output/wordpress_image_v001_20220727231544.tsv
[2022-07-27 23:15:45,043] {executor_loader.py:106} INFO - Loaded executor: LocalExecutor
[2022-07-27 23:15:45,176] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'first', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmpfowbb78c']
[2022-07-27 23:15:50,050] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'first', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmpfowbb78c']
[2022-07-27 23:15:50,060] {dagrun.py:647} WARNING - Failed to get task '<TaskInstance: sample_dag_with_skip.skipme backfill__2022-07-25T00:00:00+00:00 [skipped]>' for dag 'sample_dag_with_skip'. Marking it as removed.
[2022-07-27 23:15:50,061] {dagrun.py:647} WARNING - Failed to get task '<TaskInstance: sample_dag_with_skip.always_run backfill__2022-07-25T00:00:00+00:00 [success]>' for dag 'sample_dag_with_skip'. Marking it as removed.
[2022-07-27 23:15:50,063] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 2 | tasks waiting: 7 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 5
[2022-07-27 23:15:50,071] {dagbag.py:507} INFO - Filling up the DagBag from /usr/local/***/openverse_catalog/dags/simple_backfill_example.py
[2022-07-27 23:15:50,089] {task_command.py:369} INFO - Running <TaskInstance: sample_dag_with_skip.first backfill__2022-07-25T00:00:00+00:00 [queued]> on host 42a81ed08a3d
[2022-07-27 23:15:50,486] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:15:50,487] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:15:50,488] {base_aws.py:206} INFO - Credentials retrieved from login
[2022-07-27 23:15:50,488] {base_aws.py:100} INFO - Retrieving region_name from Connection.extra_config['region_name']
[2022-07-27 23:15:50,525] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:15:55,064] {dagrun.py:647} WARNING - Failed to get task '<TaskInstance: sample_dag_with_skip.skipme backfill__2022-07-25T00:00:00+00:00 [removed]>' for dag 'sample_dag_with_skip'. Marking it as removed.
[2022-07-27 23:15:55,065] {dagrun.py:647} WARNING - Failed to get task '<TaskInstance: sample_dag_with_skip.always_run backfill__2022-07-25T00:00:00+00:00 [removed]>' for dag 'sample_dag_with_skip'. Marking it as removed.
[2022-07-27 23:15:55,067] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 2 | tasks waiting: 7 | succeeded: 1 | running: 0 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 5
[2022-07-27 23:15:55,081] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'subgroup.run_and_skip', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmp1uesref4']
[2022-07-27 23:15:55,170] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'subgroup.run_fine', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmpib__7p5u']
[2022-07-27 23:16:00,058] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'subgroup.run_fine', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmpib__7p5u']
[2022-07-27 23:16:00,058] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'subgroup.run_and_skip', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmp1uesref4']
[2022-07-27 23:16:00,063] {dagrun.py:647} WARNING - Failed to get task '<TaskInstance: sample_dag_with_skip.skipme backfill__2022-07-25T00:00:00+00:00 [removed]>' for dag 'sample_dag_with_skip'. Marking it as removed.
[2022-07-27 23:16:00,064] {dagrun.py:647} WARNING - Failed to get task '<TaskInstance: sample_dag_with_skip.always_run backfill__2022-07-25T00:00:00+00:00 [removed]>' for dag 'sample_dag_with_skip'. Marking it as removed.
[2022-07-27 23:16:00,065] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 2 | tasks waiting: 5 | succeeded: 1 | running: 2 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 3
[2022-07-27 23:16:00,077] {dagbag.py:507} INFO - Filling up the DagBag from /usr/local/***/openverse_catalog/dags/simple_backfill_example.py
[2022-07-27 23:16:00,080] {dagbag.py:507} INFO - Filling up the DagBag from /usr/local/***/openverse_catalog/dags/simple_backfill_example.py
[2022-07-27 23:16:00,096] {task_command.py:369} INFO - Running <TaskInstance: sample_dag_with_skip.subgroup.run_fine backfill__2022-07-25T00:00:00+00:00 [queued]> on host 42a81ed08a3d
[2022-07-27 23:16:00,097] {task_command.py:369} INFO - Running <TaskInstance: sample_dag_with_skip.subgroup.run_and_skip backfill__2022-07-25T00:00:00+00:00 [queued]> on host 42a81ed08a3d
[2022-07-27 23:16:00,504] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:00,505] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:00,506] {base_aws.py:206} INFO - Credentials retrieved from login
[2022-07-27 23:16:00,506] {base_aws.py:100} INFO - Retrieving region_name from Connection.extra_config['region_name']
[2022-07-27 23:16:00,514] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:00,515] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:00,516] {base_aws.py:206} INFO - Credentials retrieved from login
[2022-07-27 23:16:00,516] {base_aws.py:100} INFO - Retrieving region_name from Connection.extra_config['region_name']
[2022-07-27 23:16:00,541] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:00,559] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:05,071] {dagrun.py:647} WARNING - Failed to get task '<TaskInstance: sample_dag_with_skip.skipme backfill__2022-07-25T00:00:00+00:00 [removed]>' for dag 'sample_dag_with_skip'. Marking it as removed.
[2022-07-27 23:16:05,072] {dagrun.py:647} WARNING - Failed to get task '<TaskInstance: sample_dag_with_skip.always_run backfill__2022-07-25T00:00:00+00:00 [removed]>' for dag 'sample_dag_with_skip'. Marking it as removed.
[2022-07-27 23:16:05,073] {dagrun.py:583} ERROR - Deadlock; marking run <DagRun sample_dag_with_skip @ 2022-07-25 00:00:00+00:00: backfill__2022-07-25T00:00:00+00:00, externally triggered: False> failed
[2022-07-27 23:16:05,073] {dagrun.py:607} INFO - DagRun Finished: dag_id=sample_dag_with_skip, execution_date=2022-07-25 00:00:00+00:00, run_id=backfill__2022-07-25T00:00:00+00:00, run_start_date=None, run_end_date=2022-07-27 23:16:05.073628+00:00, run_duration=None, state=failed, external_trigger=False, run_type=backfill, data_interval_start=2022-07-25 00:00:00+00:00, data_interval_end=2022-07-26 00:00:00+00:00, dag_hash=None
[2022-07-27 23:16:05,075] {dagrun.py:799} WARNING - Failed to record duration of <DagRun sample_dag_with_skip @ 2022-07-25 00:00:00+00:00: backfill__2022-07-25T00:00:00+00:00, externally triggered: False>: start_date is not set.
[2022-07-27 23:16:05,075] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 5 | succeeded: 1 | running: 0 | failed: 0 | skipped: 2 | deadlocked: 0 | not ready: 3
[2022-07-27 23:16:05,096] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'subgroup.always_succeed', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmpulo4p958']
[2022-07-27 23:16:10,065] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'subgroup.always_succeed', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmpulo4p958']
[2022-07-27 23:16:10,067] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 3 | succeeded: 1 | running: 1 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 1
[2022-07-27 23:16:10,084] {dagbag.py:507} INFO - Filling up the DagBag from /usr/local/***/openverse_catalog/dags/simple_backfill_example.py
[2022-07-27 23:16:10,104] {task_command.py:369} INFO - Running <TaskInstance: sample_dag_with_skip.subgroup.always_succeed backfill__2022-07-25T00:00:00+00:00 [queued]> on host 42a81ed08a3d
[2022-07-27 23:16:10,500] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:10,501] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:10,502] {base_aws.py:206} INFO - Credentials retrieved from login
[2022-07-27 23:16:10,502] {base_aws.py:100} INFO - Retrieving region_name from Connection.extra_config['region_name']
[2022-07-27 23:16:10,537] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:15,074] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 3 | succeeded: 2 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 1
[2022-07-27 23:16:15,091] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'final', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmp7ifr68s2']
[2022-07-27 23:16:20,073] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'final', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmp7ifr68s2']
[2022-07-27 23:16:20,075] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 2 | running: 1 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:16:20,094] {dagbag.py:507} INFO - Filling up the DagBag from /usr/local/***/openverse_catalog/dags/simple_backfill_example.py
[2022-07-27 23:16:20,114] {task_command.py:369} INFO - Running <TaskInstance: sample_dag_with_skip.final backfill__2022-07-25T00:00:00+00:00 [queued]> on host 42a81ed08a3d
[2022-07-27 23:16:20,522] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:20,523] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:20,524] {base_aws.py:206} INFO - Credentials retrieved from login
[2022-07-27 23:16:20,524] {base_aws.py:100} INFO - Retrieving region_name from Connection.extra_config['region_name']
[2022-07-27 23:16:20,561] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:25,082] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:16:30,083] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:16:35,089] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:16:40,093] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:16:45,099] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:16:50,105] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:16:55,112] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:00,117] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:05,122] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:10,128] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:15,133] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:20,137] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:25,141] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:30,145] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:35,149] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:40,153] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:45,157] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:50,161] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:55,166] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:18:00,169] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:18:05,174] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:18:10,177] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
^C[2022-07-27 23:18:11,199] {backfill_job.py:870} WARNING - Backfill terminated by user.
[2022-07-27 23:18:11,199] {local_executor.py:390} INFO - Shutting down LocalExecutor; waiting for running tasks to finish. Signal again if you don't want to wait.
It's worth noting that I tried to replicate this with a DAG that had only a single skip task, and with a DAG shaped skip -> run -> skip, and both backfills completed successfully. So my guess is that there's an odd interaction between the TaskGroup, skipped tasks, trigger rules, and possibly max_active_runs.
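As a purely illustrative toy model (this is NOT Airflow's actual implementation), here is how mislabeling skipped task instances as "removed" could make an otherwise-finished run look perpetually unfinished, consistent with the "Marking it as removed" warnings and the endlessly repeating "[backfill progress]" lines above. The state names and the `run_is_finished` helper are invented for the sketch:

```python
# Toy model, NOT Airflow's actual code: a run-completion check that
# only recognizes these states as terminal.
TERMINAL = {"success", "failed", "skipped"}


def run_is_finished(task_states):
    """A run is finished when every task instance is in a terminal state."""
    return all(state in TERMINAL for state in task_states)


# All tasks actually finished, but the two skipped tasks were
# re-labeled "removed" (as in the warnings in the logs above):
actual = ["success", "skipped", "skipped", "success"]
observed = ["success", "removed", "removed", "success"]

print(run_is_finished(actual))    # True  -> run would be marked done
print(run_is_finished(observed))  # False -> run appears stuck forever
```

If the real completion check behaves anything like this toy version, the run would never be reported as finished, which would also explain why max_active_runs=1 blocks the next run from starting.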
I'll try to see if I can continue to pare down the DAG to see if there are pieces I can throw out and still replicate the error. I don't think I'd be comfortable submitting a PR for this one because my gut says it's probably deep in the bowels of the codebase 😅 If it's something clear or straightforward though, I'd be happy to take a stab at it! I'd just need to be pointed in the right direction within the codebase 🙂
Apache Airflow version
2.3.0
Operating System
Debian GNU/Linux 11 (bullseye)
Versions of Apache Airflow Providers
Deployment
Docker-Compose
Deployment details
This is a custom configured Docker image, but doesn't deviate too much from a standard deployment: https://github.com/WordPress/openverse-catalog/blob/main/docker/airflow/Dockerfile
Are you willing to submit PR?
Code of Conduct