Feature/add max new dagruns to schedule#64294
Feature/add max new dagruns to schedule #64294 — Nataneljpwd wants to merge 12 commits into apache:main from
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a scheduler tuning knob to limit how many new (never-before-examined) running DagRuns are considered per scheduling loop, to reduce starvation/slowdown when large batches of DagRuns are created at once.
Changes:
- Add scheduler.max_new_dagruns_per_loop_to_schedule config (default 0) and plumb it into DagRun selection.
- Update DagRun.get_running_dag_runs_to_examine() to optionally split selection into “previously examined” vs “new” DagRuns.
- Add/adjust unit tests to cover the new selection behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| airflow-core/src/airflow/models/dagrun.py | Adds config-backed limit and changes running DagRun selection logic to optionally fetch “old” and “new” runs separately. |
| airflow-core/src/airflow/config_templates/config.yml | Documents the new scheduler configuration option. |
| airflow-core/tests/unit/models/test_dagrun.py | Adds tests for the new DagRun selection behavior and updates an existing test to handle the new return type. |
| self, session, dag_maker | ||
| ): | ||
|
|
||
| DagRun.DEFAULT_NEW_DAGRUNS_TO_EXAMINE = 0 | ||
|
|
||
| def create_dagruns( | ||
| last_scheduling_decision: datetime.datetime | None = None, | ||
| count: int = 20, | ||
| ): | ||
| dagrun = dag_maker.create_dagrun( | ||
| run_type=DagRunType.SCHEDULED, | ||
| state=State.RUNNING, | ||
| run_after=datetime.datetime(2024, 1, 1), | ||
| ) | ||
| dagrun.last_scheduling_decision = last_scheduling_decision | ||
| session.merge(dagrun) | ||
| for _ in range(count - 1): | ||
| dagrun = dag_maker.create_dagrun_after( | ||
| dagrun, | ||
| run_type=DagRunType.SCHEDULED, | ||
| state=State.RUNNING, | ||
| run_after=datetime.datetime(2024, 1, 1), | ||
| ) | ||
|
|
||
| dagrun.last_scheduling_decision = last_scheduling_decision | ||
| session.merge(dagrun) | ||
|
|
||
| with dag_maker( | ||
| dag_id="dummy_dag", | ||
| schedule=datetime.timedelta(days=1), | ||
| start_date=datetime.datetime(2024, 1, 1), | ||
| session=session, | ||
| ): | ||
| EmptyOperator(task_id="dummy_task") | ||
|
|
||
| create_dagruns(None, 10) | ||
|
|
||
| with dag_maker( | ||
| dag_id="dummy_dag2", | ||
| schedule=datetime.timedelta(days=1), | ||
| start_date=datetime.datetime(2024, 1, 1), | ||
| session=session, | ||
| ): | ||
| EmptyOperator(task_id="dummy_task2") | ||
|
|
||
| create_dagruns(func.now(), 20) | ||
|
|
||
| session.flush() | ||
|
|
||
| dagruns = list(DagRun.get_running_dag_runs_to_examine(session=session)) | ||
|
|
||
| assert len([dagrun for dagrun in dagruns if dagrun.last_scheduling_decision is None]) == 10 | ||
|
|
||
| assert len([dagrun for dagrun in dagruns if dagrun.last_scheduling_decision is not None]) == 10 | ||
|
|
There was a problem hiding this comment.
This test name implies it covers the "< 0" configuration path, but it sets DagRun.DEFAULT_NEW_DAGRUNS_TO_EXAMINE = 0, so the warning/clamping branch is never exercised. Set a negative value here (e.g. -1) and assert the expected warning (via caplog) to actually cover the behavior.
| self, session, dag_maker | |
| ): | |
| DagRun.DEFAULT_NEW_DAGRUNS_TO_EXAMINE = 0 | |
| def create_dagruns( | |
| last_scheduling_decision: datetime.datetime | None = None, | |
| count: int = 20, | |
| ): | |
| dagrun = dag_maker.create_dagrun( | |
| run_type=DagRunType.SCHEDULED, | |
| state=State.RUNNING, | |
| run_after=datetime.datetime(2024, 1, 1), | |
| ) | |
| dagrun.last_scheduling_decision = last_scheduling_decision | |
| session.merge(dagrun) | |
| for _ in range(count - 1): | |
| dagrun = dag_maker.create_dagrun_after( | |
| dagrun, | |
| run_type=DagRunType.SCHEDULED, | |
| state=State.RUNNING, | |
| run_after=datetime.datetime(2024, 1, 1), | |
| ) | |
| dagrun.last_scheduling_decision = last_scheduling_decision | |
| session.merge(dagrun) | |
| with dag_maker( | |
| dag_id="dummy_dag", | |
| schedule=datetime.timedelta(days=1), | |
| start_date=datetime.datetime(2024, 1, 1), | |
| session=session, | |
| ): | |
| EmptyOperator(task_id="dummy_task") | |
| create_dagruns(None, 10) | |
| with dag_maker( | |
| dag_id="dummy_dag2", | |
| schedule=datetime.timedelta(days=1), | |
| start_date=datetime.datetime(2024, 1, 1), | |
| session=session, | |
| ): | |
| EmptyOperator(task_id="dummy_task2") | |
| create_dagruns(func.now(), 20) | |
| session.flush() | |
| dagruns = list(DagRun.get_running_dag_runs_to_examine(session=session)) | |
| assert len([dagrun for dagrun in dagruns if dagrun.last_scheduling_decision is None]) == 10 | |
| assert len([dagrun for dagrun in dagruns if dagrun.last_scheduling_decision is not None]) == 10 | |
| self, session, dag_maker, caplog | |
| ): | |
| original_value = DagRun.DEFAULT_NEW_DAGRUNS_TO_EXAMINE | |
| try: | |
| # Set a negative value to exercise the "< 0" clamping and warning path. | |
| DagRun.DEFAULT_NEW_DAGRUNS_TO_EXAMINE = -1 | |
| # Capture warnings emitted when handling the negative configuration value. | |
| caplog.set_level("WARNING", logger="airflow.models.dagrun") | |
| def create_dagruns( | |
| last_scheduling_decision: datetime.datetime | None = None, | |
| count: int = 20, | |
| ): | |
| dagrun = dag_maker.create_dagrun( | |
| run_type=DagRunType.SCHEDULED, | |
| state=State.RUNNING, | |
| run_after=datetime.datetime(2024, 1, 1), | |
| ) | |
| dagrun.last_scheduling_decision = last_scheduling_decision | |
| session.merge(dagrun) | |
| for _ in range(count - 1): | |
| dagrun = dag_maker.create_dagrun_after( | |
| dagrun, | |
| run_type=DagRunType.SCHEDULED, | |
| state=State.RUNNING, | |
| run_after=datetime.datetime(2024, 1, 1), | |
| ) | |
| dagrun.last_scheduling_decision = last_scheduling_decision | |
| session.merge(dagrun) | |
| with dag_maker( | |
| dag_id="dummy_dag", | |
| schedule=datetime.timedelta(days=1), | |
| start_date=datetime.datetime(2024, 1, 1), | |
| session=session, | |
| ): | |
| EmptyOperator(task_id="dummy_task") | |
| create_dagruns(None, 10) | |
| with dag_maker( | |
| dag_id="dummy_dag2", | |
| schedule=datetime.timedelta(days=1), | |
| start_date=datetime.datetime(2024, 1, 1), | |
| session=session, | |
| ): | |
| EmptyOperator(task_id="dummy_task2") | |
| create_dagruns(func.now(), 20) | |
| session.flush() | |
| dagruns = list(DagRun.get_running_dag_runs_to_examine(session=session)) | |
| # Verify that the negative value was ignored/clamped by checking for the warning. | |
| assert any( | |
| "DEFAULT_NEW_DAGRUNS_TO_EXAMINE" in record.getMessage() | |
| and ("negative" in record.getMessage() or "< 0" in record.getMessage()) | |
| for record in caplog.records | |
| ) | |
| assert len([dagrun for dagrun in dagruns if dagrun.last_scheduling_decision is None]) == 10 | |
| assert len([dagrun for dagrun in dagruns if dagrun.last_scheduling_decision is not None]) == 10 | |
| finally: | |
| DagRun.DEFAULT_NEW_DAGRUNS_TO_EXAMINE = original_value |
| self, session, dag_maker | ||
| ): | ||
|
|
||
| DagRun.DEFAULT_NEW_DAGRUNS_TO_EXAMINE = 0 |
There was a problem hiding this comment.
These tests mutate the class-level DagRun.DEFAULT_NEW_DAGRUNS_TO_EXAMINE but never restore it, which can make later tests order-dependent. Please use monkeypatch.setattr(...) (or save/restore the original value) so the change is scoped to the test.
| self, session, dag_maker | |
| ): | |
| DagRun.DEFAULT_NEW_DAGRUNS_TO_EXAMINE = 0 | |
| self, session, dag_maker, monkeypatch | |
| ): | |
| monkeypatch.setattr(DagRun, "DEFAULT_NEW_DAGRUNS_TO_EXAMINE", 0) |
| def test_get_running_dag_runs_with_max_new_dagruns_to_examine(self, session, dag_maker): | ||
|
|
||
| DagRun.DEFAULT_NEW_DAGRUNS_TO_EXAMINE = 10 |
There was a problem hiding this comment.
Same issue here: DagRun.DEFAULT_NEW_DAGRUNS_TO_EXAMINE is modified without being restored, which can leak state across tests. Please scope this via monkeypatch or restore the previous value in a finally block.
| def test_get_running_dag_runs_with_max_new_dagruns_to_examine(self, session, dag_maker): | |
| DagRun.DEFAULT_NEW_DAGRUNS_TO_EXAMINE = 10 | |
| def test_get_running_dag_runs_with_max_new_dagruns_to_examine(self, session, dag_maker, monkeypatch): | |
| monkeypatch.setattr(DagRun, "DEFAULT_NEW_DAGRUNS_TO_EXAMINE", 10) |
| from sqlalchemy import ( | ||
| JSON, | ||
| Enum, | ||
| ForeignKey, | ||
| ForeignKeyConstraint, | ||
| Index, | ||
| Integer, | ||
| PrimaryKeyConstraint, | ||
| SQLColumnExpression, | ||
| String, | ||
| Text, |
There was a problem hiding this comment.
SQLColumnExpression is only used for typing in _get_dagrun_query, and this file already keeps most SQLAlchemy typing-only imports under TYPE_CHECKING. Consider moving this import under TYPE_CHECKING (or using an already-imported typing like ColumnElement[Any]) to avoid adding an extra runtime dependency/import surface.
| new_dagruns_to_examine = cls.DEFAULT_NEW_DAGRUNS_TO_EXAMINE | ||
| dagruns_to_examine = cls.DEFAULT_DAGRUNS_TO_EXAMINE | ||
|
|
||
| if new_dagruns_to_examine < 0: | ||
| log.warning("'max_new_dagruns_per_loop_to_schedule' is smaller than 0, ignoring configuration") | ||
| new_dagruns_to_examine = 0 |
There was a problem hiding this comment.
If max_new_dagruns_per_loop_to_schedule is configured as a negative value, this warning will be emitted on every scheduler loop, potentially spamming logs. Consider clamping/validating the config once when DEFAULT_NEW_DAGRUNS_TO_EXAMINE is initialized (and logging once), instead of warning on every call.
When new DagRuns are created in bulk (e.g. with TriggerDagRunOperator), the scheduler might struggle with the number created, causing other DagRuns to starve.
This is due to the sort order in get_running_dag_runs_to_examine, which orders by last scheduling decision with NULLs first. This means that if a lot of new DagRuns are created, the scheduler will examine them first, and in situations where the DAGs have a lot of tasks (hundreds to tens of thousands) this can cause the scheduler to stall, as it has to both examine many DagRuns and create new tasks for them.
When we tried to tune max_dagruns_per_loop_to_schedule, we got either starvation of other DagRuns or the scheduler being restarted because it did not return a heartbeat for a long time and failed the readiness probe.
To fix this, a new configuration option, max_new_dagruns_per_loop_to_schedule, is added. It can help when a lot of new DagRuns are created in large batches at the same time, allowing the scheduler to both look at existing DagRuns (not starving them and causing them to time out with no running/scheduled tasks) and create and manage the new DagRuns.
Was generative AI tooling used to co-author this PR?