Reduce noise in the daily CI duration trend alert#69113
Merged
The duration monitor flagged jobs by comparing a single nightly canary run against the median of the preceding runs, so any one slow run (slow PyPI, runner queue pressure, a cold cache) tripped the alert. Because a different run was "latest" each day, a different set of jobs was flagged each day, and network-bound constraint-resolution jobs that legitimately swing tens of minutes dominated nearly every alert. The result was a near-daily alert whose contents swung wildly and carried little signal.

Compare the median of the last few nightly runs against the baseline so the two sides are symmetric and one unlucky run no longer trips it, and require a larger absolute jump before flagging individual jobs.

Pin the monitor to successful (green) canary runs only. A failed or cancelled canary stops partway, so its truncated wall-clock and per-job durations would skew the baseline downwards and mask real regressions. The script already defaults to this, but the guarantee is now explicit at the call site so it cannot be silently changed.
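To make the green-only point concrete, here is a minimal sketch of how truncated runs pull a median downwards. The run records and the `baseline_minutes` helper are hypothetical illustrations, not taken from `analyze_ci_job_durations.py`.

```python
from statistics import median

# Hypothetical run records: (conclusion, wall-clock minutes).
# Failed and cancelled runs stop partway, so their durations are truncated.
runs = [
    ("success", 62.0),
    ("failure", 18.0),
    ("success", 60.0),
    ("cancelled", 9.0),
    ("failure", 25.0),
    ("success", 61.0),
]


def baseline_minutes(runs, only_successful=True):
    """Median wall-clock time, optionally restricted to successful runs."""
    durations = [
        minutes
        for conclusion, minutes in runs
        if conclusion == "success" or not only_successful
    ]
    return median(durations)


green_only = baseline_minutes(runs, only_successful=True)
all_runs = baseline_minutes(runs, only_successful=False)
print(f"green only: {green_only} min, all runs: {all_runs} min")
```

With these made-up numbers, the green-only median stays near the true run length, while mixing in the truncated runs drags it well below any complete run.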
vatsrahul1001
approved these changes
Jul 3, 2026
Contributor
Backport successfully created: v3-3-test

Note: As of Merging PRs targeted for Airflow 3.X In matter of doubt please ask in #release-management Slack channel.
potiuk
added a commit
that referenced
this pull request
Jul 3, 2026
(cherry picked from commit e99daee)
The daily CI Duration Trend Alert (`.github/workflows/ci-duration-monitor.yml` + `scripts/ci/analyze_ci_job_durations.py`) has been firing most nights with a wildly varying set of "jobs that got slower", carrying little real signal.

Root cause: the monitor compared a single nightly canary run against the median of the preceding ~24 runs (`LATEST_RUNS` defaulted to `1`). The two sides were asymmetric: a raw, unsmoothed point against a robust median. Any one unlucky run (slow PyPI, GitHub runner queue pressure, a cold cache) therefore tripped the alert. Because a different run is "latest" each day, a different set of jobs got flagged each day. Network-bound constraint-resolution jobs (`Finalize tests / Deps 3.x:constraints`, `Generate constraints`, provider installs), which legitimately swing tens of minutes run-to-run, dominated nearly every alert and only needed to clear a 3-minute absolute floor.

Fix:

- `LATEST_RUNS: "3"` compares the median of the last 3 nightly runs against the baseline, so the two sides are symmetric and one unlucky run no longer trips it.
- `JOB_MIN_ABS_INCREASE_MINUTES: "6"` requires a larger sustained absolute jump before flagging an individual job, so ordinary network variance on the long constraint jobs stops alerting.
- `ONLY_SUCCESSFUL: "true"` pins the baseline to successful (green) canary runs. The script already defaulted to this, but a failed/cancelled canary stops partway and its truncated durations would skew the trend downwards and mask regressions; setting it explicitly keeps the green-only guarantee visible and unchangeable-by-accident at the call site.

No script logic changes: the existing env knobs already support this, so the behaviour is config-only and the script's tests are unaffected. Genuine sustained regressions still trip the alert.
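The effect of the two thresholds can be sketched roughly as follows. This is an illustrative model only: `flag_slower_jobs`, the sample durations, and the `Tests / Core` job name are hypothetical, the `>=` comparison is an assumption, and the real script may apply further checks that are not modelled here.

```python
from statistics import median


def flag_slower_jobs(history, latest_runs=3, min_abs_increase_minutes=6.0):
    """Return {job: increase in minutes} for jobs whose recent median rose enough.

    history maps a job name to its durations in minutes, oldest first.
    """
    flagged = {}
    for job, durations in history.items():
        if len(durations) <= latest_runs:
            continue  # not enough runs to form a baseline
        latest = median(durations[-latest_runs:])
        baseline = median(durations[:-latest_runs])
        increase = latest - baseline
        if increase >= min_abs_increase_minutes:
            flagged[job] = round(increase, 1)
    return flagged


history = {
    # One unlucky spike on the most recent run only.
    "Generate constraints": [20.0, 21.0, 19.0, 20.0, 22.0, 20.0, 21.0, 20.0, 21.0, 34.0],
    # A sustained slowdown across the last three runs.
    "Tests / Core": [30.0, 31.0, 30.0, 29.0, 30.0, 31.0, 30.0, 38.0, 39.0, 40.0],
}

new_settings = flag_slower_jobs(history, latest_runs=3, min_abs_increase_minutes=6.0)
old_settings = flag_slower_jobs(history, latest_runs=1, min_abs_increase_minutes=3.0)
print("new:", new_settings)
print("old:", old_settings)
```

Under the old settings, the single spike flags the constraints job. Under the new settings, only the job with the sustained slowdown is flagged.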
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (Opus 4.8) following the guidelines