TriggerDagRunOperator can fail with DagRunAlreadyExists after ambiguous execution API trigger retry #66905

@hkc-8010

Apache Airflow version

main branch

What happened?

TriggerDagRunOperator can fail with DagRunAlreadyExists even though the child Dag run was created successfully.

In the Airflow 3 task-sdk path, DagRunOperations.trigger() sends POST /execution/dag-runs/{dag_id}/{run_id} through the generic execution API retry layer. If the server creates the Dag run but the client sees an ambiguous transport or request error, the retry can POST the same run ID again and receive 409 Conflict.

The task runner then treats that as a real pre-existing run, marks the parent task failed, and does not write the trigger_run_id XCom.
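The failure mode can be illustrated without Airflow at all. The sketch below is a stdlib-only simulation, not the task-sdk code: `FakeExecutionAPI`, `TransportError`, `trigger_with_naive_retry`, and the 204 success code are all hypothetical stand-ins chosen for illustration.

```python
class TransportError(Exception):
    """Hypothetical stand-in for an ambiguous client-side error (e.g. httpx.RequestError)."""


class FakeExecutionAPI:
    """Hypothetical in-memory server that can lose the first response it sends."""

    def __init__(self, drop_first_response=True):
        self.runs = set()
        self.drop_next = drop_first_response

    def post_dag_run(self, dag_id, run_id):
        key = (dag_id, run_id)
        if key in self.runs:
            return 409  # Conflict: the run already exists
        self.runs.add(key)  # the run is committed server-side...
        if self.drop_next:
            self.drop_next = False
            # ...but the client never sees the success response
            raise TransportError("response lost")
        return 204  # placeholder success code (assumption)


def trigger_with_naive_retry(api, dag_id, run_id, attempts=3):
    """Retry the same POST on transport errors and return the final status code."""
    for _ in range(attempts):
        try:
            return api.post_dag_run(dag_id, run_id)
        except TransportError:
            continue  # ambiguous outcome: retry with the same run ID
    raise TransportError("retries exhausted")


api = FakeExecutionAPI()
status = trigger_with_naive_retry(api, "child_dag", "manual__1")
print(status)  # 409
print(("child_dag", "manual__1") in api.runs)  # True
```

The retry ends in 409 even though the only writer of that run ID was the client's own first attempt.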

What you think should happen instead?

A transport-level ambiguity after a trigger POST should not be converted into a duplicate-run failure when the requested Dag run now exists.

How to reproduce

  1. Mock POST /dag-runs/{dag_id}/{run_id} so that the run is created server-side but the client sees httpx.RequestError.
  2. Have GET /dag-runs/{dag_id}/{run_id} return the existing Dag run.
  3. Call the trigger operation. Expected: it treats this as success for that run ID. Actual: it surfaces DAGRUN_ALREADY_EXISTS.
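
The steps above can be sketched as a small mock-based test. This is one possible shape of the desired behavior, not the actual task-sdk implementation: `trigger`, `RequestError`, `Conflict`, and the `client.post` / `client.get` methods are hypothetical names, and the sketch assumes that a 409 seen only after an ambiguous error can be verified with a follow-up GET.

```python
from unittest import mock


class RequestError(Exception):
    """Hypothetical stand-in for httpx.RequestError (ambiguous transport failure)."""


class Conflict(Exception):
    """Hypothetical stand-in for a 409 Conflict response."""


def trigger(client, dag_id, run_id, attempts=3):
    """Treat a 409 that follows an ambiguous error as success if the run exists."""
    ambiguous = False
    for _ in range(attempts):
        try:
            client.post(dag_id, run_id)
            return run_id
        except RequestError:
            ambiguous = True  # the server may or may not have created the run
        except Conflict:
            if ambiguous and client.get(dag_id, run_id) is not None:
                return run_id  # most likely created by our own earlier POST
            raise  # no prior ambiguity: a genuine pre-existing run
    raise RequestError("retries exhausted")


# Steps 1 and 2: the first POST is lost, the retry conflicts, GET finds the run.
client = mock.Mock()
client.post.side_effect = [RequestError("response lost"), Conflict("409")]
client.get.return_value = {"dag_id": "child_dag", "run_id": "manual__1"}

# Step 3: the trigger reports success for that run ID.
result = trigger(client, "child_dag", "manual__1")
print(result)  # manual__1
```

A 409 on the very first attempt still raises, so a genuinely pre-existing run is reported as before. The sketch cannot tell its own lost POST apart from a concurrent writer using the same run ID, so a real fix may need a stronger check.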

Code pointers

task-sdk/src/airflow/sdk/api/client.py
task-sdk/src/airflow/sdk/execution_time/task_runner.py

Are you willing to submit PR?

Yes.
