Emit dagrun.first_task_start_delay separately from first_task_scheduling_delay #66802

@1fanwang

Description

dagrun.first_task_scheduling_delay measures data_interval_end → first_start_date, which conflates two things: (a) scheduler latency to enqueue the first task, and (b) executor latency to pick the task up. Splitting these helps locate where time is being spent when a DAG run starts late.

The executor-pickup portion (queued_at → first_start_date) has no metric today.
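As a rough illustration of the split, here is the arithmetic with made-up timestamps. The values and variable names are hypothetical, and treating the scheduler portion as queued_at - data_interval_end is an assumption inferred from the description above, not something an existing metric reports.

```python
from datetime import datetime, timezone

# Hypothetical timestamps for one DAG run (illustrative values only).
data_interval_end = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
queued_at = datetime(2024, 1, 1, 0, 0, 20, tzinfo=timezone.utc)
first_start_date = datetime(2024, 1, 1, 0, 1, 5, tzinfo=timezone.utc)

# What dagrun.first_task_scheduling_delay covers today.
total_delay = first_start_date - data_interval_end

# Assumed split: time until queueing, then time from queueing to first start.
queueing_delay = queued_at - data_interval_end
pickup_delay = first_start_date - queued_at

print(total_delay.total_seconds())     # 65.0
print(queueing_delay.total_seconds())  # 20.0
print(pickup_delay.total_seconds())    # 45.0
```

In this example the existing metric would report 65 seconds without showing that 45 of them elapsed after queueing.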

Use case / motivation

When the first task of a DAG run starts late, ops want to know: was the scheduler slow to queue it, or was the executor slow to pick it up? One metric per phase answers that directly.

Proposal

Add dagrun.first_task_start_delay, computed as first_start_date - queued_at when the DAG run completes. Tag by dag_id and run_type to match the existing tag shape on first_task_scheduling_delay.
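A minimal sketch of the computation, not the actual Airflow code path. The function names and the `emit` callback are hypothetical stand-ins for whatever stats call the scheduler would use, and skipping missing or non-positive values is an assumption about edge-case handling that the discussion can revisit.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Iterable, Optional


def first_task_start_delay(
    queued_at: Optional[datetime],
    task_start_dates: Iterable[Optional[datetime]],
) -> Optional[timedelta]:
    """Return first_start_date - queued_at, or None if it cannot be computed."""
    starts = [d for d in task_start_dates if d is not None]
    if queued_at is None or not starts:
        return None
    delay = min(starts) - queued_at
    # Assumption: drop zero or negative values instead of emitting them.
    return delay if delay.total_seconds() > 0 else None


def emit_first_task_start_delay(
    emit: Callable[[str, timedelta, Dict[str, str]], None],
    dag_id: str,
    run_type: str,
    queued_at: Optional[datetime],
    task_start_dates: Iterable[Optional[datetime]],
) -> None:
    """Emit the proposed metric through a hypothetical emit(name, value, tags) callback."""
    delay = first_task_start_delay(queued_at, task_start_dates)
    if delay is not None:
        emit(
            "dagrun.first_task_start_delay",
            delay,
            {"dag_id": dag_id, "run_type": run_type},
        )


# Usage with an in-memory recorder and made-up values.
recorded = []
emit_first_task_start_delay(
    lambda name, value, tags: recorded.append((name, value, tags)),
    dag_id="example_dag",
    run_type="scheduled",
    queued_at=datetime(2024, 1, 1, 0, 0, 20, tzinfo=timezone.utc),
    task_start_dates=[
        datetime(2024, 1, 1, 0, 1, 5, tzinfo=timezone.utc),
        None,  # a task that never started
        datetime(2024, 1, 1, 0, 2, 0, tzinfo=timezone.utc),
    ],
)
```

Keeping the computation separate from the emit step means the metric name and tag set can change during the shape discussion without touching the delay logic.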

I expect there will be a "do we want N metrics or one with M dimensions" discussion, so I am filing this as an issue first rather than a direct PR, letting the shape settle before code.

Are you willing to submit a PR?

  • Yes

Code of Conduct

  • I agree to follow this project's Code of Conduct
