Fix dagrun.duration.failed missing run_type tag on dagrun_timeout #64768
mahirhiro wants to merge 2 commits into apache:main
When a dag run fails due to dagrun_timeout, the Stats.timing call for
dagrun.duration.failed used tags={"dag_id": dag_run.dag_id} instead of
dag_run.stats_tags. This meant run_type was always absent from timeout-
caused failures, making it impossible to filter the metric by run_type
in monitoring queries.
Aligns the timeout path with the normal finish path in dagrun.py which
correctly uses stats_tags.
closes: apache#64765
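
The difference between the two tag sets can be sketched as follows. This is a minimal illustration, not Airflow's code: `FakeDagRun`, `RecordingStats`, and the two `emit_*` helpers are hypothetical stand-ins, and `stats_tags` is assumed to return `dag_id` plus `run_type` as described above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class FakeDagRun:
    """Hypothetical stand-in for a dag run; not Airflow's DagRun model."""

    dag_id: str
    run_type: str

    @property
    def stats_tags(self) -> dict:
        # Assumption: mirrors the description above (dag_id plus run_type).
        return {"dag_id": self.dag_id, "run_type": self.run_type}


class RecordingStats:
    """Hypothetical metrics client that only records timing calls."""

    def __init__(self):
        self.calls = []

    def timing(self, name, dt, tags=None):
        self.calls.append((name, dt, dict(tags or {})))


def emit_timeout_failure_before(stats, dag_run, duration):
    # Behavior described as the bug: only dag_id is attached.
    stats.timing("dagrun.duration.failed", duration, tags={"dag_id": dag_run.dag_id})


def emit_timeout_failure_after(stats, dag_run, duration):
    # Behavior described as the fix: reuse the run's full tag set.
    stats.timing("dagrun.duration.failed", duration, tags=dag_run.stats_tags)


stats = RecordingStats()
run = FakeDagRun(dag_id="example_dag", run_type="scheduled")
emit_timeout_failure_before(stats, run, timedelta(seconds=61))
emit_timeout_failure_after(stats, run, timedelta(seconds=61))
```

With the first helper a dashboard grouping by `run_type` has nothing to group on for timeout failures; with the second, the same metric name carries both tags.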
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Pull request overview
Aligns dagrun.duration.failed metric tagging for DagRun timeout failures with the normal DagRun completion path so run_type is consistently included.
Changes:
- Emit dagrun.duration.failed on dagrun_timeout using dag_run.stats_tags (includes dag_id + run_type).
- Add a unit regression test ensuring run_type is present in the emitted metric tags for timeout-caused failures.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| airflow-core/src/airflow/jobs/scheduler_job_runner.py | Uses dag_run.stats_tags when timing dagrun.duration.failed in the timeout failure path to include run_type. |
| airflow-core/tests/unit/jobs/test_scheduler_job.py | Adds a regression test asserting run_type is included in dagrun.duration.failed tags for timeout failures. |

    with dag_maker(
        dag_id="test_scheduler_fail_dagrun_timeout_stats",
        dagrun_timeout=datetime.timedelta(seconds=60),
        schedule="@daily",
        session=session,
    ):
        EmptyOperator(task_id="dummy")

    dr = dag_maker.create_dagrun(start_date=timezone.utcnow() - datetime.timedelta(days=1))

dag_maker.create_dagrun(...) defaults run_type to DagRunType.MANUAL unless explicitly provided (see devel-common/src/tests_common/pytest_plugin.py:1118-1120), so schedule="@daily" here doesn’t make this DagRun “scheduled”. To avoid the test being misleading (and to better match the reported impact), consider either passing run_type=DagRunType.SCHEDULED and asserting the expected value, or dropping the schedule argument.
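
The reviewer's point, that a presence-only assertion cannot tell a manual run from a scheduled one, can be illustrated with a small self-contained sketch. Everything here is hypothetical: `create_run` only imitates the default described in the comment, `emit_failed_duration` stands in for the fixed timeout path, and the strings "manual" and "scheduled" stand in for the DagRunType members.

```python
from unittest import mock


def create_run(dag_id, run_type="manual"):
    """Hypothetical factory that defaults run_type to "manual",
    imitating the default the review attributes to create_dagrun."""
    return {"dag_id": dag_id, "run_type": run_type}


def emit_failed_duration(stats, run, seconds):
    """Hypothetical stand-in for the timeout path after the fix."""
    stats.timing("dagrun.duration.failed", seconds, tags=dict(run))


def tags_from_single_timing_call(stats):
    """Return the tags passed to the one recorded timing call."""
    stats.timing.assert_called_once()
    return stats.timing.call_args.kwargs["tags"]


# Run type left at its default: a presence check still passes.
default_stats = mock.MagicMock()
emit_failed_duration(default_stats, create_run("example_dag"), 61.0)
default_tags = tags_from_single_timing_call(default_stats)

# Run type set explicitly: the value can now be asserted.
scheduled_stats = mock.MagicMock()
emit_failed_duration(scheduled_stats, create_run("example_dag", run_type="scheduled"), 61.0)
scheduled_tags = tags_from_single_timing_call(scheduled_stats)
```

Checking `"run_type" in default_tags` succeeds even though the run is manual, which is why asserting the value against an explicitly chosen run type is the stronger test.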
…SCHEDULED run type

Pass run_type=DagRunType.SCHEDULED explicitly to create_dagrun so the test actually exercises the scheduled run type scenario. Also assert the tag value rather than just its presence, and drop the schedule="@daily" arg which had no effect on run_type.
When a dag run fails due to dagrun_timeout, the DualStatsManager.timing call for dagrun.duration.failed used tags={"dag_id": dag_run.dag_id} instead of dag_run.stats_tags. This meant run_type was always absent from timeout-caused failures, making it impossible to filter the metric by run_type in monitoring queries.

The normal finish path in dagrun.py correctly uses self.stats_tags (which includes both dag_id and run_type). This aligns the timeout path to match.

closes: #64765
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (claude-sonnet-4-6) following the guidelines