Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backfill stalls with certain combination of skipped tasks & trigger rules #25353

Closed
1 of 2 tasks
AetherUnbound opened this issue Jul 27, 2022 · 4 comments · Fixed by #26161
Closed
1 of 2 tasks

Backfill stalls with certain combination of skipped tasks & trigger rules #25353

AetherUnbound opened this issue Jul 27, 2022 · 4 comments · Fixed by #26161
Labels
area:backfill Specifically for backfill related kind:bug This is a clearly a bug

Comments

@AetherUnbound
Copy link
Contributor

Apache Airflow version

2.3.0

What happened

While trying to run a backfill for one of our DAGs, we noticed that the backfill stalled after completing all the tasks for a given DAG. The max_active_runs for this DAG was set to 1, so the entire backfill stalled even though all the tasks in the last DAG it ran completed successfully.

What you think should happen instead

I would assume that once all the tasks are complete in a DAG (whether succeeded, skipped, or failed) during a backfill, the backfill should mark the DAG with the proper state and proceed on with the rest of the tasks.

How to reproduce

Here is simulacrum of our DAG with all the actual logic stripped out:

from datetime import datetime

from airflow.decorators import dag
from airflow.exceptions import AirflowSkipException
from airflow.operators.python import PythonOperator
from airflow.utils.task_group import TaskGroup
from airflow.utils.trigger_rule import TriggerRule


def skipme():
    raise AirflowSkipException("Skip")


def run():
    return


@dag(
    schedule_interval="@daily",
    start_date=datetime(2022, 7, 14),
    catchup=False,
    max_active_runs=1,
)
def sample_dag_with_skip():
    a = PythonOperator(
        task_id="first",
        python_callable=run,
    )

    with TaskGroup(group_id="subgroup") as tg:
        b = PythonOperator(
            task_id="run_and_skip",
            trigger_rule=TriggerRule.NONE_SKIPPED,
            python_callable=skipme,
        )
        c = PythonOperator(
            task_id="run_fine",
            trigger_rule=TriggerRule.NONE_SKIPPED,
            python_callable=skipme,
        )

        d = PythonOperator(
            task_id="gather",
            python_callable=run,
        )

        e = PythonOperator(
            task_id="always_succeed",
            trigger_rule=TriggerRule.ALL_DONE,
            python_callable=run,
        )

        [b, c] >> d >> e

    f = PythonOperator(
        task_id="final",
        trigger_rule=TriggerRule.ALL_DONE,
        python_callable=run,
    )

    a >> tg >> f


skip_dag = sample_dag_with_skip()

Here's a screenshot of the DAG:
image

Note that the DAG is still shown as "running" even though the last task ended several minutes ago:
image

Here's the backfill command I ran for this DAG: airflow dags backfill -s 2022-07-25 -e 2022-07-26 sample_dag_with_skip --reset-dagruns -y

And here are the logs from the backfill process:

Backfill logs
airflow@42a81ed08a3d:~$ airflow dags backfill -s 2022-07-25 -e 2022-07-26 sample_dag_with_skip --reset-dagruns -y
/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/cli/commands/dag_command.py:57 PendingDeprecationWarning: --ignore-first-depends-on-past is deprecated as the value is always set to True
[2022-07-27 23:15:44,655] {dagbag.py:507} INFO - Filling up the DagBag from /usr/local/airflow/openverse_catalog/dags
[2022-07-27 23:15:44,937] {urls.py:74} INFO - https://creativecommons.org/publicdomain/zero/1.0 was rewritten to https://creativecommons.org/publicdomain/zero/1.0/
[2022-07-27 23:15:44,948] {media.py:63} INFO - Initialized image MediaStore with provider brooklynmuseum
[2022-07-27 23:15:44,948] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,948] {media.py:186} INFO - Output path: /var/workflow_output/brooklynmuseum_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,952] {media.py:63} INFO - Initialized image MediaStore with provider europeana
[2022-07-27 23:15:44,952] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,952] {media.py:186} INFO - Output path: /var/workflow_output/europeana_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,953] {media.py:63} INFO - Initialized image MediaStore with provider finnishmuseums
[2022-07-27 23:15:44,953] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,953] {media.py:186} INFO - Output path: /var/workflow_output/finnishmuseums_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,955] {media.py:63} INFO - Initialized image MediaStore with provider flickr
[2022-07-27 23:15:44,955] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,955] {media.py:186} INFO - Output path: /var/workflow_output/flickr_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,957] {media.py:63} INFO - Initialized audio MediaStore with provider freesound
[2022-07-27 23:15:44,957] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,957] {media.py:186} INFO - Output path: /var/workflow_output/freesound_audio_v001_20220727231544.tsv
[2022-07-27 23:15:44,959] {media.py:63} INFO - Initialized audio MediaStore with provider jamendo
[2022-07-27 23:15:44,959] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,959] {media.py:186} INFO - Output path: /var/workflow_output/jamendo_audio_v001_20220727231544.tsv
[2022-07-27 23:15:44,961] {media.py:63} INFO - Initialized image MediaStore with provider met
[2022-07-27 23:15:44,961] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,961] {media.py:186} INFO - Output path: /var/workflow_output/met_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,962] {media.py:63} INFO - Initialized image MediaStore with provider museumsvictoria
[2022-07-27 23:15:44,962] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,962] {media.py:186} INFO - Output path: /var/workflow_output/museumsvictoria_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,964] {media.py:63} INFO - Initialized image MediaStore with provider nypl
[2022-07-27 23:15:44,964] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,964] {media.py:186} INFO - Output path: /var/workflow_output/nypl_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,965] {media.py:63} INFO - Initialized image MediaStore with provider phylopic
[2022-07-27 23:15:44,965] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,965] {media.py:186} INFO - Output path: /var/workflow_output/phylopic_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,967] {media.py:63} INFO - Initialized image MediaStore with provider rawpixel
[2022-07-27 23:15:44,967] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,967] {media.py:186} INFO - Output path: /var/workflow_output/rawpixel_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,968] {media.py:63} INFO - Initialized image MediaStore with provider sciencemuseum
[2022-07-27 23:15:44,968] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,968] {media.py:186} INFO - Output path: /var/workflow_output/sciencemuseum_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,970] {media.py:63} INFO - Initialized image MediaStore with provider smithsonian
[2022-07-27 23:15:44,970] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,970] {media.py:186} INFO - Output path: /var/workflow_output/smithsonian_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,971] {media.py:63} INFO - Initialized image MediaStore with provider smk
[2022-07-27 23:15:44,972] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,972] {media.py:186} INFO - Output path: /var/workflow_output/smk_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,974] {media.py:63} INFO - Initialized image MediaStore with provider waltersartmuseum
[2022-07-27 23:15:44,974] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,974] {media.py:186} INFO - Output path: /var/workflow_output/waltersartmuseum_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,976] {media.py:63} INFO - Initialized audio MediaStore with provider wikimedia_audio
[2022-07-27 23:15:44,976] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,976] {media.py:186} INFO - Output path: /var/workflow_output/wikimedia_audio_audio_v001_20220727231544.tsv
[2022-07-27 23:15:44,976] {media.py:63} INFO - Initialized image MediaStore with provider wikimedia
[2022-07-27 23:15:44,976] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,977] {media.py:186} INFO - Output path: /var/workflow_output/wikimedia_image_v001_20220727231544.tsv
[2022-07-27 23:15:44,980] {media.py:63} INFO - Initialized image MediaStore with provider wordpress
[2022-07-27 23:15:44,980] {media.py:168} INFO - No given output directory. Using OUTPUT_DIR from environment.
[2022-07-27 23:15:44,980] {media.py:186} INFO - Output path: /var/workflow_output/wordpress_image_v001_20220727231544.tsv
[2022-07-27 23:15:45,043] {executor_loader.py:106} INFO - Loaded executor: LocalExecutor
[2022-07-27 23:15:45,176] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'first', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmpfowbb78c']
[2022-07-27 23:15:50,050] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'first', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmpfowbb78c']
[2022-07-27 23:15:50,060] {dagrun.py:647} WARNING - Failed to get task '<TaskInstance: sample_dag_with_skip.skipme backfill__2022-07-25T00:00:00+00:00 [skipped]>' for dag 'sample_dag_with_skip'. Marking it as removed.
[2022-07-27 23:15:50,061] {dagrun.py:647} WARNING - Failed to get task '<TaskInstance: sample_dag_with_skip.always_run backfill__2022-07-25T00:00:00+00:00 [success]>' for dag 'sample_dag_with_skip'. Marking it as removed.
[2022-07-27 23:15:50,063] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 2 | tasks waiting: 7 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 5
[2022-07-27 23:15:50,071] {dagbag.py:507} INFO - Filling up the DagBag from /usr/local/***/openverse_catalog/dags/simple_backfill_example.py
[2022-07-27 23:15:50,089] {task_command.py:369} INFO - Running <TaskInstance: sample_dag_with_skip.first backfill__2022-07-25T00:00:00+00:00 [queued]> on host 42a81ed08a3d
[2022-07-27 23:15:50,486] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:15:50,487] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:15:50,488] {base_aws.py:206} INFO - Credentials retrieved from login
[2022-07-27 23:15:50,488] {base_aws.py:100} INFO - Retrieving region_name from Connection.extra_config['region_name']
[2022-07-27 23:15:50,525] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:15:55,064] {dagrun.py:647} WARNING - Failed to get task '<TaskInstance: sample_dag_with_skip.skipme backfill__2022-07-25T00:00:00+00:00 [removed]>' for dag 'sample_dag_with_skip'. Marking it as removed.
[2022-07-27 23:15:55,065] {dagrun.py:647} WARNING - Failed to get task '<TaskInstance: sample_dag_with_skip.always_run backfill__2022-07-25T00:00:00+00:00 [removed]>' for dag 'sample_dag_with_skip'. Marking it as removed.
[2022-07-27 23:15:55,067] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 2 | tasks waiting: 7 | succeeded: 1 | running: 0 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 5
[2022-07-27 23:15:55,081] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'subgroup.run_and_skip', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmp1uesref4']
[2022-07-27 23:15:55,170] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'subgroup.run_fine', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmpib__7p5u']
[2022-07-27 23:16:00,058] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'subgroup.run_fine', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmpib__7p5u']
[2022-07-27 23:16:00,058] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'subgroup.run_and_skip', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmp1uesref4']
[2022-07-27 23:16:00,063] {dagrun.py:647} WARNING - Failed to get task '<TaskInstance: sample_dag_with_skip.skipme backfill__2022-07-25T00:00:00+00:00 [removed]>' for dag 'sample_dag_with_skip'. Marking it as removed.
[2022-07-27 23:16:00,064] {dagrun.py:647} WARNING - Failed to get task '<TaskInstance: sample_dag_with_skip.always_run backfill__2022-07-25T00:00:00+00:00 [removed]>' for dag 'sample_dag_with_skip'. Marking it as removed.
[2022-07-27 23:16:00,065] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 2 | tasks waiting: 5 | succeeded: 1 | running: 2 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 3
[2022-07-27 23:16:00,077] {dagbag.py:507} INFO - Filling up the DagBag from /usr/local/***/openverse_catalog/dags/simple_backfill_example.py
[2022-07-27 23:16:00,080] {dagbag.py:507} INFO - Filling up the DagBag from /usr/local/***/openverse_catalog/dags/simple_backfill_example.py
[2022-07-27 23:16:00,096] {task_command.py:369} INFO - Running <TaskInstance: sample_dag_with_skip.subgroup.run_fine backfill__2022-07-25T00:00:00+00:00 [queued]> on host 42a81ed08a3d
[2022-07-27 23:16:00,097] {task_command.py:369} INFO - Running <TaskInstance: sample_dag_with_skip.subgroup.run_and_skip backfill__2022-07-25T00:00:00+00:00 [queued]> on host 42a81ed08a3d
[2022-07-27 23:16:00,504] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:00,505] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:00,506] {base_aws.py:206} INFO - Credentials retrieved from login
[2022-07-27 23:16:00,506] {base_aws.py:100} INFO - Retrieving region_name from Connection.extra_config['region_name']
[2022-07-27 23:16:00,514] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:00,515] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:00,516] {base_aws.py:206} INFO - Credentials retrieved from login
[2022-07-27 23:16:00,516] {base_aws.py:100} INFO - Retrieving region_name from Connection.extra_config['region_name']
[2022-07-27 23:16:00,541] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:00,559] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:05,071] {dagrun.py:647} WARNING - Failed to get task '<TaskInstance: sample_dag_with_skip.skipme backfill__2022-07-25T00:00:00+00:00 [removed]>' for dag 'sample_dag_with_skip'. Marking it as removed.
[2022-07-27 23:16:05,072] {dagrun.py:647} WARNING - Failed to get task '<TaskInstance: sample_dag_with_skip.always_run backfill__2022-07-25T00:00:00+00:00 [removed]>' for dag 'sample_dag_with_skip'. Marking it as removed.
[2022-07-27 23:16:05,073] {dagrun.py:583} ERROR - Deadlock; marking run <DagRun sample_dag_with_skip @ 2022-07-25 00:00:00+00:00: backfill__2022-07-25T00:00:00+00:00, externally triggered: False> failed
[2022-07-27 23:16:05,073] {dagrun.py:607} INFO - DagRun Finished: dag_id=sample_dag_with_skip, execution_date=2022-07-25 00:00:00+00:00, run_id=backfill__2022-07-25T00:00:00+00:00, run_start_date=None, run_end_date=2022-07-27 23:16:05.073628+00:00, run_duration=None, state=failed, external_trigger=False, run_type=backfill, data_interval_start=2022-07-25 00:00:00+00:00, data_interval_end=2022-07-26 00:00:00+00:00, dag_hash=None
[2022-07-27 23:16:05,075] {dagrun.py:799} WARNING - Failed to record duration of <DagRun sample_dag_with_skip @ 2022-07-25 00:00:00+00:00: backfill__2022-07-25T00:00:00+00:00, externally triggered: False>: start_date is not set.
[2022-07-27 23:16:05,075] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 5 | succeeded: 1 | running: 0 | failed: 0 | skipped: 2 | deadlocked: 0 | not ready: 3
[2022-07-27 23:16:05,096] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'subgroup.always_succeed', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmpulo4p958']
[2022-07-27 23:16:10,065] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'subgroup.always_succeed', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmpulo4p958']
[2022-07-27 23:16:10,067] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 3 | succeeded: 1 | running: 1 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 1
[2022-07-27 23:16:10,084] {dagbag.py:507} INFO - Filling up the DagBag from /usr/local/***/openverse_catalog/dags/simple_backfill_example.py
[2022-07-27 23:16:10,104] {task_command.py:369} INFO - Running <TaskInstance: sample_dag_with_skip.subgroup.always_succeed backfill__2022-07-25T00:00:00+00:00 [queued]> on host 42a81ed08a3d
[2022-07-27 23:16:10,500] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:10,501] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:10,502] {base_aws.py:206} INFO - Credentials retrieved from login
[2022-07-27 23:16:10,502] {base_aws.py:100} INFO - Retrieving region_name from Connection.extra_config['region_name']
[2022-07-27 23:16:10,537] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:15,074] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 3 | succeeded: 2 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 1
[2022-07-27 23:16:15,091] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'final', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmp7ifr68s2']
[2022-07-27 23:16:20,073] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'sample_dag_with_skip', 'final', 'backfill__2022-07-25T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/simple_backfill_example.py', '--cfg-path', '/tmp/tmp7ifr68s2']
[2022-07-27 23:16:20,075] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 2 | running: 1 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:16:20,094] {dagbag.py:507} INFO - Filling up the DagBag from /usr/local/***/openverse_catalog/dags/simple_backfill_example.py
[2022-07-27 23:16:20,114] {task_command.py:369} INFO - Running <TaskInstance: sample_dag_with_skip.final backfill__2022-07-25T00:00:00+00:00 [queued]> on host 42a81ed08a3d
[2022-07-27 23:16:20,522] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:20,523] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:20,524] {base_aws.py:206} INFO - Credentials retrieved from login
[2022-07-27 23:16:20,524] {base_aws.py:100} INFO - Retrieving region_name from Connection.extra_config['region_name']
[2022-07-27 23:16:20,561] {base.py:68} INFO - Using connection ID 'aws_default' for task execution.
[2022-07-27 23:16:25,082] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:16:30,083] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:16:35,089] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:16:40,093] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:16:45,099] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:16:50,105] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:16:55,112] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:00,117] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:05,122] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:10,128] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:15,133] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:20,137] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:25,141] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:30,145] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:35,149] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:40,153] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:45,157] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:50,161] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:17:55,166] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:18:00,169] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:18:05,174] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
[2022-07-27 23:18:10,177] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 2 | tasks waiting: 2 | succeeded: 3 | running: 0 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 0
^C[2022-07-27 23:18:11,199] {backfill_job.py:870} WARNING - Backfill terminated by user.
[2022-07-27 23:18:11,199] {local_executor.py:390} INFO - Shutting down LocalExecutor; waiting for running tasks to finish.  Signal again if you don't want to wait.

It's worth noting that I tried to replicate this with a DAG that only had a single skip task, and a DAG with skip -> run -> skip, and both succeeded with the backfill. So my guess would be that there's an odd interaction with the TaskGroup, skipped tasks, trigger rules, and possibly max_active_runs.

Operating System

Debian GNU/Linux 11 (bullseye)

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==3.3.0
apache-airflow-providers-ftp==2.1.2
apache-airflow-providers-http==2.1.2
apache-airflow-providers-imap==2.2.3
apache-airflow-providers-postgres==4.1.0
apache-airflow-providers-sqlite==2.1.3

Deployment

Docker-Compose

Deployment details

This is a custom configured Docker image, but doesn't deviate too much from a standard deployment: https://github.com/WordPress/openverse-catalog/blob/main/docker/airflow/Dockerfile

Anything else

I'll try to see if I can continue to pare down the DAG to see if there are pieces I can throw out and still replicate the error. I don't think I'd be comfortable submitting a PR for this one because my gut says it's probably deep in the bowels of the codebase 😅 If it's something clear or straightforward though, I'd be happy to take a stab at it! I'd just need to be pointed in the right direction within the codebase 🙂

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@AetherUnbound AetherUnbound added area:core kind:bug This is a clearly a bug labels Jul 27, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Jul 27, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@AetherUnbound
Copy link
Contributor Author

Ha, I could have sworn I've submitted other issues in the past! 😄 Looks like it was with my old @madison-ookla account 🙂

@potiuk
Copy link
Member

potiuk commented Jul 28, 2022

[2022-07-27 23:16:05,073] {dagrun.py:583} ERROR - Deadlock; marking run <DagRun sample_dag_with_skip @ 2022-07-25 00:00:00+00:00: backfill__2022-07-25T00:00:00+00:00, externally triggered: False> failed

@ashb - that might be another manifestation of the same deadlock case we saw recently.

@wolfier
Copy link
Contributor

wolfier commented Aug 29, 2022

@potiuk Is there an issue for the deadlock issue?

I made an issue (#26044) in regards to a similar/same behaviour.

@eladkal eladkal added area:backfill Specifically for backfill related and removed area:core labels Aug 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:backfill Specifically for backfill related kind:bug This is a clearly a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants