Duplicate dag_id should emit an error #19525

@MatrixManAtYrService

Apache Airflow version

2.2.2rc1 (release candidate)

Operating System

debian (docker)

Versions of Apache Airflow Providers

n/a

Deployment

Astronomer

Deployment details

Started with astro dev start, using this Dockerfile:

FROM quay.io/astronomer/ap-airflow-dev:2.2.2-onbuild-48058

Two dags:

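# one.py
from airflow.decorators import dag, task
from airflow.utils.dates import days_ago

@task
def a():
    print("a")

# No dag_id is passed, so the function name "my_dag" is used as the dag_id.
@dag(schedule_interval=None, start_date=days_ago(2))
def my_dag():
    a()

dag = my_dag()

# two.py
from airflow.decorators import dag, task
from airflow.utils.dates import days_ago

@task
def b():
    print("b")

# Same function name as in one.py, hence the same dag_id.
@dag(schedule_interval=None, start_date=days_ago(2))
def my_dag():
    b()

dag = my_dag()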

Note that they share the same dag_id: my_dag
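
The collision comes from the @dag decorator falling back to the decorated function's name when no dag_id is given. As a rough illustration, here is a stdlib-only sketch of a static check that could catch this before the files reach the scheduler. The helper names (decorated_dag_ids, find_collisions) are hypothetical, not part of Airflow, and the sketch only understands the @dag(...) call form with an optional literal dag_id keyword.

```python
import ast
from collections import defaultdict


def decorated_dag_ids(source):
    """Return the dag_id of every function decorated with @dag(...) in source."""
    ids = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        for deco in node.decorator_list:
            # Only the called form @dag(...) is recognised in this sketch.
            if not (isinstance(deco, ast.Call)
                    and isinstance(deco.func, ast.Name)
                    and deco.func.id == "dag"):
                continue
            dag_id = node.name  # fallback: the function name
            for kw in deco.keywords:
                # An explicit literal dag_id="..." keyword overrides the fallback.
                if kw.arg == "dag_id" and isinstance(kw.value, ast.Constant):
                    dag_id = kw.value.value
            ids.append(dag_id)
    return ids


def find_collisions(sources):
    """Map each dag_id defined in more than one file to its sorted file names.

    sources is a dict of {filename: file contents}.
    """
    owners = defaultdict(set)
    for filename, source in sources.items():
        for dag_id in decorated_dag_ids(source):
            owners[dag_id].add(filename)
    return {d: sorted(f) for d, f in owners.items() if len(f) > 1}
```

Because the files are only parsed, never imported, the check does not need Airflow installed. Passing an explicit dag_id to one of the two decorators would make the collision go away.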

What happened

Only one DAG appeared. Each time I refreshed the tree view, the task it showed switched between:

  • task_id: a
  • task_id: b

seemingly at random.

What you expected to happen

At a minimum, I expected a warning to appear. It seems we used to have one: #15302

I think that in the presence of a dag_id collision, we should either:

  • refuse to run that dag at all until the collision is resolved
  • resolve it deterministically
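
To make the two options concrete, here is a minimal sketch of both policies over a list of (dag_id, file location) pairs. It is not Airflow's implementation: DuplicateDagIdError and the function names are hypothetical, and "lexicographically first file wins" is just one possible deterministic rule.

```python
from collections import defaultdict


class DuplicateDagIdError(Exception):
    """Hypothetical error type for this sketch; not an Airflow exception."""


def group_by_dag_id(found):
    """Turn (dag_id, fileloc) pairs into {dag_id: sorted list of filelocs}."""
    groups = defaultdict(set)
    for dag_id, fileloc in found:
        groups[dag_id].add(fileloc)
    return {dag_id: sorted(files) for dag_id, files in groups.items()}


def refuse_on_collision(found):
    """Option 1: raise if any dag_id is defined in more than one file."""
    groups = group_by_dag_id(found)
    collisions = {d: f for d, f in groups.items() if len(f) > 1}
    if collisions:
        raise DuplicateDagIdError(f"duplicate dag_id(s): {collisions}")
    return {d: f[0] for d, f in groups.items()}


def resolve_deterministically(found):
    """Option 2: keep the lexicographically first file for each dag_id."""
    return {d: f[0] for d, f in group_by_dag_id(found).items()}
```

With the two files from this report, the second policy would always pick one.py regardless of parse order, while the first would surface the collision instead of silently alternating between the two.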

How to reproduce

Add the two DAG files above and open the "DAGs" view. Only one DAG is listed, and it is not clear which of the two files it came from.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
