[Backlog Discovery] feat(backlog): workflow-metrics-regression-monitoring#115
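Summary of Changes
Hello @bestony, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new backlog discovery document that addresses the lack of historical performance metrics and regression monitoring for the self-driven workflows. Today, degradation in workflow performance (run duration, queue time, failure rate) is found mainly by hand, which is inefficient. By establishing automated metric collection and regression alerting, this PR aims to improve the observability of system health and reduce the risk of iterations stalling because of hidden performance degradation.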
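## 1. Requirement Summary
- The repository currently lacks per-workflow historical run metrics and trend comparison. Regressions in run duration, queue time, and failure rate can only be found by hand, so when schedule jobs gradually become slower or less stable, it is hard to locate and correct the problem in time.

## 2. Target Users
- Repository maintainers
- Automation platform owners
- On-call / operations staff

## 3. Core Scenarios
- The run duration of Backlog Discovery or Product Designer spikes over several consecutive runs without anyone noticing
- Rising GitHub Actions queue time delays the whole planning and fix pipeline
- A workflow's failure rate rises markedly within a week, but only single-failure notifications exist and there is no trend alert

## 4. User Problems
- The repository currently lacks per-workflow historical run metrics and trend comparison. Regressions in run duration, queue time, and failure rate can only be found by hand, so when schedule jobs gradually become slower or less stable, it is hard to locate and correct the problem in time.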
[Backlog Discovery]
Reviewer: Product Manager
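- Acceptance criterion 2, "create/update an issue or emit the alert in the Job Summary", leaves the choice unspecified, which makes delivery unverifiable. Suggest stating the default alert channel and trigger conditions and, where needed, a configurable selection rule.
- Persisting metric results "as traceable JSON/CSV result files" lacks a storage location and retention policy, which risks repository bloat or data that nobody can see. Suggest specifying the storage medium (e.g. artifact / dedicated directory / external storage) and the retention period.
- The regression rule only says "7 days vs. a 30-day baseline, threshold configurable" and does not define a minimum sample size or the metric definitions (success rate vs. failure rate, how queue time is computed), which may produce false positives or false negatives. Suggest adding the metric definitions and a sample threshold.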
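To make the sample-threshold point concrete, here is a minimal sketch of a 7-day vs. 30-day comparison that refuses to decide when either window is too small. The function name, the use of medians, and the default values (`min_samples=5`, `threshold=0.3`) are illustrative assumptions, not values taken from the backlog document.

```python
from statistics import median


def detect_regression(recent, baseline, min_samples=5, threshold=0.3):
    """Compare the recent-window median against the baseline median.

    `recent` and `baseline` are lists of a numeric metric (e.g. run
    duration in seconds). Returns "insufficient_data", "regression" or "ok".
    All defaults are hypothetical placeholders.
    """
    # Refuse to decide when either window has too few samples.
    if len(recent) < min_samples or len(baseline) < min_samples:
        return "insufficient_data"
    base = median(baseline)
    if base <= 0:
        # A relative change is undefined against a zero baseline.
        return "insufficient_data"
    change = (median(recent) - base) / base
    return "regression" if change > threshold else "ok"
```

With a gate like this, a workflow that ran only twice in the last week would be reported as lacking data instead of being flagged or silently passed.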
[Reviewer Workflow] Requirement Value Assessment
Value points
Risks and suggestions
[Backlog Discovery]
Reviewer: Product Manager
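- The metric coverage is unclear (whether it includes all workflows and all trigger types, and whether PR and manual runs are included), which may mix baselines and produce noisy alerts. Suggest stating which workflows and event types are included or excluded by default, and providing a configurable allowlist/denylist.
- The failure-rate definition does not say how cancelled/skipped/timeout states are handled, which easily causes false positives or false negatives. Suggest defining the denominator and the status classification rules, and showing the share of each status in the report so the numbers can be checked.
- "Assign to the repository maintainer by default" has no executable definition (who the maintainer is and how it is resolved), so alerts may reach no one. Suggest defining the default owner resolution rule (e.g. CODEOWNERS / repository owner / a config option) and, when none is found, falling back to no assignee but with a mandatory @-mention.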
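One possible reading of the denominator question, sketched below: cancelled and skipped runs are counted and reported but left out of the failure-rate denominator. Treating `timed_out` as a failure is an assumption made here for illustration, and the function name is hypothetical.

```python
from collections import Counter

FAILED = {"failure", "timed_out"}    # assumption: timeouts count as failures
EXCLUDED = {"cancelled", "skipped"}  # reported, but not in the denominator


def failure_rate(conclusions):
    """Return (rate, status_counts) for a list of run conclusion strings.

    rate = failed / (failed + succeeded); it is None when no run ended
    in success or failure, so callers can report "no data" instead of 0.
    """
    counts = Counter(conclusions)
    failed = sum(counts[c] for c in FAILED)
    succeeded = counts["success"]
    denominator = failed + succeeded
    rate = failed / denominator if denominator else None
    # The full breakdown is returned so a report can show each status share.
    return rate, dict(counts)
```

Returning the raw counts alongside the rate lets the report show how many runs fell into the excluded states, which is what makes the chosen denominator checkable.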
[Backlog Discovery]
Reviewer: Product Manager
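- The workflow-level metric definition is incomplete: run and queue duration are currently defined from job time fields, but there is no rule for rolling up multiple jobs or a matrix to the workflow (max/min/avg?), nor a statement that per-run values are computed first and the 7/30-day statistics second. Suggest adding explicit aggregation rules and the statistical path, so that implementations do not each interpret it differently and produce incomparable metrics.
- Success rate and failure rate are used inconsistently: the Summary and scenarios mention "success rate", but the acceptance criteria only give a failure-rate formula that excludes cancelled/skipped. Suggest either using failure rate only, or adding a success-rate formula and the handling of cancelled/skipped, so that alert thresholds mean the same thing everywhere.
- The effect of data gaps on the regression decision is undefined: missing intervals and retry counts must be flagged, but nothing says whether a regression still fires when the 7-day or 30-day window has gaps. Suggest adding a minimum data-completeness gate (e.g. a coverage threshold) and explicit trigger / no-trigger rules to reduce false positives.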
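The aggregation and completeness questions above could be answered in several ways. The sketch below picks one for illustration: per-run wall-clock duration is the span from the earliest job start to the latest job end, queue time is the gap between run creation and the first job start, and a window counts as complete when enough of its days have data. The rollup rule, the `min_coverage=0.8` default, and the function names are all assumptions; the `created_at`, `started_at`, and `completed_at` keys are assumed to hold ISO 8601 timestamps.

```python
from datetime import datetime


def parse(ts):
    """Parse an ISO 8601 timestamp such as '2026-02-21T08:00:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def run_metrics(run_created_at, jobs):
    """Roll job timestamps up to one run (assumed rule: wall-clock span)."""
    starts = [parse(j["started_at"]) for j in jobs]
    ends = [parse(j["completed_at"]) for j in jobs]
    first_start = min(starts)
    return {
        # Time the run waited before any job began executing.
        "queue_seconds": (first_start - parse(run_created_at)).total_seconds(),
        # Earliest job start to latest job end, so parallel jobs overlap.
        "duration_seconds": (max(ends) - first_start).total_seconds(),
    }


def window_is_complete(days_with_data, window_days, min_coverage=0.8):
    """Gate regression decisions on the share of days that have data."""
    return window_days > 0 and days_with_data / window_days >= min_coverage
```

Per-run values produced this way would then feed the 7-day and 30-day statistics, which keeps the "run first, window second" order explicit.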
[Backlog Discovery]
Reviewer: Product Manager
- The scope definition is internally inconsistent: the requirement mentions default coverage for all workflows and event types, but the rollout guidance says initial coverage should be a core/high-frequency allowlist only, which changes expected alert volume and ownership. Suggest explicitly defining default coverage vs. the pilot phase (what is enabled by default, what is gated behind config) and adding a rollout timeline and criteria to the acceptance criteria.
- Low-frequency workflows are called out as a risk, but the acceptance criteria only say to label them "data insufficient", with no defined alternative path or exclusion policy, which leaves owners unclear on the expected behavior. Suggest adding a concrete policy (e.g., exclude below a threshold by default, or extend the window/aggregation rules) so that coverage expectations and alert eligibility are unambiguous.
[Backlog Discovery]
backlog/20260221081835-workflow-metrics-regression-monitoring.md
workflow-metrics-regression-monitoring
[Backlog Discovery]
Update Record 2026-02-21 16:21:41 +08:00
Update summary:
Clarified user problems by defining baseline/alert gaps, ownership/dedup risks, and missing metric definitions.
Expanded solution to compare 7-day windows against 30-day baselines, add sample thresholds, and specify issue + Job Summary alerts with dedup/owner config.
Acceptance criteria now store metrics as artifacts with retention, define metric formulas and minimum sample sizes, and add configurable regression thresholds.
Alerting requirements now include labels, workflow/metric/window deduping, default assignee behavior, and report details for affected workflows.
Status: committed
Commit: 657bbd9dbfa93b572adee6ebeb68c9d605617f36
Trigger: workflow_run.completed
Comment: https://github.com/bestony/self/actions/runs/22253473292
Actor: @github-actions[bot]
Updated At (Asia/Shanghai): 2026-02-21 16:21:41 +08:00
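The workflow/metric/window deduplication described in this record can be sketched as a stable key that is looked up among open alert issues before anything new is created. The key format and both function names are hypothetical; fetching the open issue titles is left to the caller.

```python
def alert_key(workflow, metric, window_days):
    """Build a stable dedup key; the title format here is an assumption."""
    return f"[workflow-regression] {workflow} | {metric} | {window_days}d"


def plan_alert(key, open_issue_titles):
    """Decide whether to update an existing alert issue or create one."""
    return "update" if key in set(open_issue_titles) else "create"
```

Because the key depends only on workflow, metric, and window, repeated runs that detect the same regression would update one issue instead of opening a new one each time.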
[Backlog Discovery]
Update Record 2026-02-21 16:24:22 +08:00
Update summary:
Added gaps/risks for low-frequency workflows and API rate-limit/data gap impacts in the problem statement.
Clarified default coverage: all workflows, event types default to schedule/workflow_dispatch/default-branch push, exclude PR events, with allow/deny lists.
Expanded acceptance criteria for sample insufficiency handling with Job Summary guidance and no issue alert on insufficient data.
Refined metric definitions and reporting to include status breakdowns, adjusted failure rate denominator, and explicitly report cancelled/skipped.
Strengthened alerting/ownership rules and required reporting of API fetch failures, missing windows, and retries.
Status: committed
Commit: 27aa9891cd2f90ff573720c8096537f3cedbe241
Trigger: workflow_run.completed
Comment: https://github.com/bestony/self/actions/runs/22253507894
Actor: @github-actions[bot]
Updated At (Asia/Shanghai): 2026-02-21 16:24:22 +08:00
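The default coverage rule in this record (schedule, workflow_dispatch, and default-branch push included; PR events excluded; allow/deny lists on top) might be expressed as a single filter. This is a sketch under assumptions: the run is a plain dict with `name`, `event`, and `head_branch` keys, `"main"` is only a placeholder default branch, and the deny list is given precedence over the allow list.

```python
DEFAULT_EVENTS = {"schedule", "workflow_dispatch", "push"}


def is_monitored(run, default_branch="main", allow=None, deny=None):
    """Return True when a run should feed the metrics baseline.

    `allow` and `deny` are optional collections of workflow names.
    """
    name = run["name"]
    if deny and name in deny:
        return False
    if allow is not None and name not in allow:
        return False
    event = run["event"]
    # PR events (and anything else outside the defaults) are dropped here.
    if event not in DEFAULT_EVENTS:
        return False
    # Only pushes to the default branch count.
    if event == "push" and run.get("head_branch") != default_branch:
        return False
    return True
```

Filtering before any statistics are computed keeps PR-triggered runs from mixing into the baseline.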
[Backlog Discovery]
Update Record 2026-02-21 16:27:31 +08:00
Update summary:
Added a requirement to capture historical regression examples and impact size to validate alert value.
Refined coverage guidance to use allowlists/denylists and start with core/high-frequency workflows before expanding.
Added an “observation mode” for alerts that only writes Job Summary without creating issues.
Clarified rollout criteria: require at least two historical regression cases and track pilot false positives and response effort before broad rollout.
Status: committed
Commit: 0ded63f6fa03f4096553a05578c0f75881da5562
Trigger: workflow_run.completed
Comment: https://github.com/bestony/self/actions/runs/22253544839
Actor: @github-actions[bot]
Updated At (Asia/Shanghai): 2026-02-21 16:27:31 +08:00
[Backlog Discovery]
Update Record 2026-02-21 16:30:17 +08:00
Update summary:
Clarified metric scope defaults to include schedule, workflow_dispatch, and default-branch push events.
Explicitly excludes PR events from the default monitoring event types.
Kept configurable workflow/event allowlists and blacklists while noting default rollout remains core/high-frequency workflows.
Status: committed
Commit: daeda6477f11d9a94131be4d17e9ff7f41ef553a
Trigger: workflow_run.completed
Comment: https://github.com/bestony/self/actions/runs/22253582823
Actor: @github-actions[bot]
Updated At (Asia/Shanghai): 2026-02-21 16:30:17 +08:00
[Backlog Discovery]
Update Record 2026-02-21 16:33:31 +08:00
Update summary:
Tightened requirement summary wording to focus on workflow-level trend regression alerting and delayed detection risks.
Refined user problems to emphasize actionable-signal gaps, reliability concerns for regression criteria, and low-frequency coverage strategy.
Expanded regression hypothesis to add data completeness thresholds alongside minimum sample sizes.
Reworked acceptance criteria to define workflow-level metric formulas, median/mean aggregation rules, and explicit data completeness gates for regression decisions.
Clarified alerting behavior so issues are created only when regression and completeness thresholds are met, with guidance for data gaps.
Added pilot success thresholds for acceptable false-positive rate and median remediation time.
Status: committed
Commit: 657b2fbfb4bdbdc48ce92b61707b064b452b6715
Trigger: workflow_run.completed
Comment: https://github.com/bestony/self/actions/runs/22253619131
Actor: @github-actions[bot]
Updated At (Asia/Shanghai): 2026-02-21 16:33:31 +08:00
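The alerting behavior in this record, where an issue is created only when both the regression and the completeness thresholds are met, can be read as a small decision table. The sketch below is one interpretation; the action names and the `observation_mode` flag (standing in for the observation mode from the earlier update record) are hypothetical labels.

```python
def choose_alert_action(regressed, enough_samples, complete, observation_mode=False):
    """Map gate results to an alert action.

    - Data gaps or too few samples: only a guidance note in the Job Summary.
    - No regression: nothing to report.
    - Regression with all gates passed: issue plus Job Summary, unless
      observation mode limits output to the Job Summary.
    """
    if not enough_samples or not complete:
        return "summary_note"
    if not regressed:
        return "none"
    return "summary_only" if observation_mode else "issue_and_summary"
```

Checking the data gates before the regression flag means a spike computed from an incomplete window never opens an issue.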
[Backlog Discovery]
Update Record 2026-02-21 16:36:50 +08:00
Update summary:
Clarified rollout guidance to default the initial whitelist to core/high-frequency workflows before broader coverage.
Added a low-frequency strategy to skip regression checks when samples are insufficient, with optional extended windows or observation-only mode.
Status: committed
Commit: d391a30579672adfdc80d2a6ad9e81820c31dcea
Trigger: workflow_run.completed
Comment: https://github.com/bestony/self/actions/runs/22253665367
Actor: @github-actions[bot]
Updated At (Asia/Shanghai): 2026-02-21 16:36:50 +08:00
[Backlog Discovery]
Update Record 2026-02-21 16:39:12 +08:00
Update summary:
Reviewed trusted feedback and current PR content; no document update was needed.
Status: no_change
Commit: N/A
Trigger: workflow_run.completed
Comment: https://github.com/bestony/self/actions/runs/22253707739
Actor: @github-actions[bot]
Updated At (Asia/Shanghai): 2026-02-21 16:39:12 +08:00