DatasetOrTimeSchedule causes deadlock #41305

@RobbertDM

Apache Airflow version

2.9.3

If "Other Airflow 2 version" selected, which one?

No response

What happened?

I want to use DatasetOrTimeSchedule as a schedule in my DAG.

After waiting a day for both the time schedule and a dataset update from another DAG to fire, I do see two DAG runs: one triggered by the time schedule and one triggered by the dataset (it shows the dataset icon).
The problem is that the time-triggered run (displayed first) stays queued, while the dataset-triggered run is marked as running but none of its tasks have a status. Both runs stay in this state indefinitely and nothing ever executes.

What's also odd is that the time-triggered run was queued one second after the dataset-triggered one.
Time-triggered run:
Queued at: 2024-08-07, 02:01:45 CEST
Dataset-triggered run:
Queued at: 2024-08-07, 02:01:44 CEST
Started: 2024-08-07, 02:01:44 CEST

I find this odd because, according to its schedule, the time-triggered run should have been queued at 02:00:00 CEST, which is 1 minute 45 seconds earlier than it actually was.

I should note I have max_active_runs=1 and depends_on_past=True.
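
To make the reported state easier to reason about, here is a toy model in plain Python. It is not Airflow's scheduler code, only a sketch of one plausible reading of how max_active_runs=1 and depends_on_past=True could interact. It assumes that the time-triggered run has the earlier logical date (02:00:00) and that depends_on_past makes a task wait for the same task in the run with the immediately preceding logical date. The Run class and the step function are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Run:
    name: str
    logical_date: datetime
    state: str               # "queued", "running", or "success"
    task_done: bool = False  # each run has a single task in this toy model


def step(runs, max_active_runs=1, depends_on_past=True):
    """Advance the toy scheduler once; return True if anything changed."""
    runs = sorted(runs, key=lambda r: r.logical_date)
    progressed = False
    for i, run in enumerate(runs):
        active = sum(r.state == "running" for r in runs)
        # A queued run may only start while there is a free active-run slot.
        if run.state == "queued" and active < max_active_runs:
            run.state = "running"
            progressed = True
        # A task may only run once the previous run's task has finished
        # (assumed depends_on_past behaviour, simplified).
        if run.state == "running" and not run.task_done:
            previous_ok = i == 0 or runs[i - 1].task_done
            if previous_ok or not depends_on_past:
                run.task_done = True
                run.state = "success"
                progressed = True
    return progressed


# State reported above: the dataset run is already running, while the
# time-triggered run (earlier logical date, assumed) is still queued.
runs = [
    Run("scheduled", datetime(2024, 8, 7, 2, 0, 0), "queued"),
    Run("dataset", datetime(2024, 8, 7, 2, 1, 44), "running"),
]
stuck = not step(runs)
print(stuck, [(r.name, r.state) for r in runs])
```

Under these assumptions neither run can progress: the queued run has no free slot, and the running one waits on a task that never starts. Calling step with depends_on_past=False on the same starting state lets the dataset run finish, which would be consistent with the two settings being involved, but it does not prove that this is the actual cause.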

What you think should happen instead?

The Dataset trigger should not influence the time schedule. They should run independently. The time schedule run should be queued and start running at its scheduled time. The tasks should actually run. There should be no deadlock.

How to reproduce

From the docs:

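from airflow.datasets import Dataset
from airflow.decorators import dag
from airflow.timetables.datasets import DatasetOrTimeSchedule
from airflow.timetables.trigger import CronTriggerTimetable

# Placeholder datasets; the URIs are illustrative, not from the original report.
dag1_dataset = Dataset("s3://dag1/output_1.txt")
dag2_dataset = Dataset("s3://dag2/output_1.txt")


@dag(
    schedule=DatasetOrTimeSchedule(
        timetable=CronTriggerTimetable("0 1 * * 3", timezone="UTC"),
        datasets=(dag1_dataset | dag2_dataset),
    ),
    # Additional arguments here, replace this comment with actual arguments
)
def example_dag():
    # DAG tasks go here
    pass


# Calling the decorated function registers the DAG.
example_dag()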

Set max_active_runs=1 and depends_on_past=True on this DAG.
Have a second DAG update dag1_dataset or dag2_dataset so that it triggers this DAG.

Operating System

Our Airflow runs on Kubernetes on EKS

Versions of Apache Airflow Providers

No response

Deployment

Other

Deployment details

We use Conveyor: https://docs.conveyordata.com/technical-reference/airflow/airflow-installation-details

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
