Reduce noise in the daily CI duration trend alert by potiuk · Pull Request #69113 · apache/airflow

potiuk · 2026-06-29T06:47:54Z

The daily CI Duration Trend Alert (.github/workflows/ci-duration-monitor.yml + scripts/ci/analyze_ci_job_durations.py) has been firing most nights with a wildly varying set of "jobs that got slower", carrying little real signal.

Root cause: the monitor compared a single nightly canary run against the median of the preceding ~24 runs (LATEST_RUNS defaulted to 1). The two sides were asymmetric — a raw, unsmoothed point against a robust median — so any one unlucky run (slow PyPI, GitHub runner queue pressure, a cold cache) tripped the alert. Because a different run is "latest" each day, a different set of jobs got flagged each day. Network-bound constraint-resolution jobs (Finalize tests / Deps 3.x:constraints, Generate constraints, provider installs), which legitimately swing tens of minutes run-to-run, dominated nearly every alert and only needed to clear a 3-minute absolute floor.
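The asymmetry can be illustrated with a minimal sketch. The durations below are invented for illustration and `flagged` is a hypothetical helper, not a function from scripts/ci/analyze_ci_job_durations.py; it only shows how a window of one run reacts to a single outlier while a median over three runs does not.

```python
from statistics import median

# Hypothetical durations (minutes) for one network-bound job, oldest first.
# The values are invented; one run (48) is an unlucky outlier.
durations = [30, 31, 29, 32, 30, 31, 30, 29, 31, 30, 48, 30, 31]


def flagged(history, latest_runs, min_abs_increase):
    """Return True if the median of the newest `latest_runs` values exceeds
    the median of the older values by at least `min_abs_increase` minutes."""
    baseline = median(history[:-latest_runs])
    latest = median(history[-latest_runs:])
    return latest - baseline >= min_abs_increase


# Replay the check night by night as the history grows.
nights = range(5, len(durations) + 1)
single = [flagged(durations[:n], 1, 3) for n in nights]    # one raw point vs median
smoothed = [flagged(durations[:n], 3, 3) for n in nights]  # median vs median

print("alerts with a window of 1:", single.count(True))
print("alerts with a window of 3:", smoothed.count(True))
```

With a window of one, the night on which the outlier is "latest" alerts; with a window of three, the outlier is absorbed by the median on every night it falls inside the window.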

Fix:

LATEST_RUNS: "3" — compare the median of the last 3 nightly runs against the baseline so the two sides are symmetric and one unlucky run no longer trips it.
JOB_MIN_ABS_INCREASE_MINUTES: "6" — require a larger sustained absolute jump before flagging an individual job, so ordinary network variance on the long constraint jobs stops alerting.
ONLY_SUCCESSFUL: "true" — pin the baseline to successful (green) canary runs. The script already defaulted to this, but a failed/cancelled canary stops partway and its truncated durations would skew the trend downwards and mask regressions; setting it explicitly keeps the green-only guarantee visible and unchangeable-by-accident at the call site.
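How the three knobs interact can be sketched as follows. This is a simplified model under stated assumptions, not the script's actual code: the environment variable names come from this PR, but the `Run` type, `read_knobs`, `slower_jobs`, and the parsing and fallback values are hypothetical.

```python
import os
from dataclasses import dataclass
from statistics import median


@dataclass
class Run:
    conclusion: str    # e.g. "success", "failure", "cancelled"
    job_minutes: dict  # job name -> duration in minutes


def read_knobs(env=os.environ):
    # Variable names are from the PR; parsing and fallbacks are assumptions.
    return {
        "latest_runs": int(env.get("LATEST_RUNS", "1")),
        "min_abs": float(env.get("JOB_MIN_ABS_INCREASE_MINUTES", "3")),
        "only_successful": env.get("ONLY_SUCCESSFUL", "true").lower() == "true",
    }


def slower_jobs(runs, knobs):
    """`runs` is ordered oldest first; returns {job: increase_in_minutes}."""
    if knobs["only_successful"]:
        # Drop failed/cancelled runs, whose truncated durations would
        # pull the medians downwards.
        runs = [r for r in runs if r.conclusion == "success"]
    n = knobs["latest_runs"]
    latest, baseline = runs[-n:], runs[:-n]
    flagged = {}
    for job in latest[-1].job_minutes:
        new = median(r.job_minutes[job] for r in latest if job in r.job_minutes)
        old = [r.job_minutes[job] for r in baseline if job in r.job_minutes]
        if not old:
            continue  # no history for this job, nothing to compare against
        increase = new - median(old)
        if increase >= knobs["min_abs"]:
            flagged[job] = increase
    return flagged
```

Under these assumptions, with `LATEST_RUNS=3` and `JOB_MIN_ABS_INCREASE_MINUTES=6`, one 55-minute run among 30-minute runs flags nothing, while three consecutive 40-minute runs flag the job with a 10-minute increase.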

No script logic changes — the existing env knobs already support this, so the behaviour is config-only and the script's tests are unaffected. Genuine sustained regressions still trip the alert.

Was generative AI tooling used to co-author this PR?

Yes — Claude Code (Opus 4.8)

Generated-by: Claude Code (Opus 4.8) following the guidelines

The duration monitor flagged jobs by comparing a single nightly canary run against the median of the preceding runs, so any one slow run (slow PyPI, runner queue pressure, a cold cache) tripped the alert. Because a different run was "latest" each day, a different set of jobs was flagged each day, and network-bound constraint-resolution jobs that legitimately swing tens of minutes dominated nearly every alert. The result was a near-daily alert whose contents swung wildly and carried little signal.

Compare the median of the last few nightly runs against the baseline so the two sides are symmetric and one unlucky run no longer trips it, and require a larger absolute jump before flagging individual jobs.

Pin the monitor to successful (green) canary runs only. A failed or cancelled canary stops partway, so its truncated wall-clock and per-job durations would skew the baseline downwards and mask real regressions. The script already defaults to this, but the guarantee is now explicit at the call site so it cannot be silently changed.

github-actions · 2026-07-03T13:47:11Z

Backport successfully created: v3-3-test

Note: As of "Merging PRs targeted for Airflow 3.X", the committer who merges the PR is responsible for backporting the PRs that are bug fixes (generally speaking) to the maintenance branches.

If in doubt, please ask in the #release-management Slack channel.

Status	Branch	Result
✅	v3-3-test

(cherry picked from commit e99daee)

boring-cyborg Bot added the area:dev-tools and backport-to-v3-3-test labels Jun 29, 2026

potiuk marked this pull request as ready for review June 29, 2026 06:47

potiuk requested review from amoghrajesh, ashb, bugraoz93, gopidesupavan, jason810496 and jscheffl as code owners June 29, 2026 06:48

potiuk force-pushed the ci-duration-alert-reduce-noise branch from 7252b10 to c58fa46 July 3, 2026 11:41

vatsrahul1001 approved these changes Jul 3, 2026

potiuk merged commit e99daee into apache:main Jul 3, 2026
66 checks passed

potiuk deleted the ci-duration-alert-reduce-noise branch July 3, 2026 13:45

potiuk mentioned this pull request Jul 3, 2026

[v3-3-test] Reduce noise in the daily CI duration trend alert (#69113) #69337

Merged

1 task

potiuk added this to the Airflow 3.3.1 milestone Jul 3, 2026
