Support continuous catchup and backfill #25265

@blcksrx

Description

Catchup and backfill provide good flexibility for running DAGs over past intervals. The one concern is that they can produce a large number of DagRuns, which can lead to performance problems such as exhausted slots and long completion times.
Assume there is a DAG with a SqlOperator that runs every 15 minutes and uses this SQL template:

SELECT * FROM table WHERE created_at BETWEEN {{  prev_data_interval_start_success }} AND {{ ts }}

If the scheduler is interrupted, for example for 1 day, the catchup process would create 96 DagRuns.
Alternatively, assume this DAG is run with backfill like this:

airflow dags backfill DAG --start-date=prev_week --end-date=today

That would create 672 DagRuns!
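The two counts follow directly from the 15-minute schedule. A minimal sketch of the arithmetic, using only the standard library (the helper name `run_count` is made up for illustration and is not part of Airflow):

```python
from datetime import timedelta


def run_count(gap: timedelta, schedule: timedelta) -> int:
    """Number of whole schedule intervals that fit in the gap."""
    return gap // schedule


schedule = timedelta(minutes=15)

# One day of scheduler downtime, caught up one interval at a time.
one_day = run_count(timedelta(days=1), schedule)

# One week of backfill, one DagRun per interval.
one_week = run_count(timedelta(weeks=1), schedule)

print(one_day, one_week)
```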

With a simple feature that runs only one DagRun and fills these parameters with the start and the end of the gap, this issue would be solved.
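To make the idea concrete, here is a rough sketch in plain Python that contrasts the current per-interval behaviour with the proposed single run. It does not use Airflow's scheduler code; `per_interval_runs` and `single_continuous_run` are hypothetical names, and the dates are example values only.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


def per_interval_runs(
    gap_start: datetime, gap_end: datetime, schedule: timedelta
) -> List[Interval]:
    """Current behaviour (simplified): one run per schedule interval in the gap."""
    runs = []
    start = gap_start
    while start + schedule <= gap_end:
        runs.append((start, start + schedule))
        start += schedule
    return runs


def single_continuous_run(gap_start: datetime, gap_end: datetime) -> Interval:
    """Proposed behaviour (hypothetical): one run spanning the whole gap."""
    return (gap_start, gap_end)


# Example values: a one-day gap on a 15-minute schedule.
gap_start = datetime(2022, 7, 1)
gap_end = datetime(2022, 7, 2)

many = per_interval_runs(gap_start, gap_end, timedelta(minutes=15))
one = single_continuous_run(gap_start, gap_end)

# Both cover the same time range; only the number of runs differs.
print(len(many), one)
```

In the SQL template above, the single run would render the lower bound as the start of the gap and the upper bound as the end of the gap, so the query covers the same rows in one pass.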

Use case/motivation

  • Reduce the number of DagRuns created by catchup and backfill
  • Decrease the completion time of backfill and catchup

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
