This repository was archived by the owner on Apr 30, 2026. It is now read-only.

[Backlog Discovery] feat(backlog): workflow-metrics-regression-monitoring #115

Open
bestony wants to merge 7 commits into main from backlog/20260221081835-workflow-metrics-regression-monitoring-workflow

Conversation

@bestony (Owner) commented Feb 21, 2026

[Backlog Discovery]

  • Requirement title: Establish historical metrics and regression alerts for self-driven workflows
  • Priority: P2
  • Requirement file: backlog/20260221081835-workflow-metrics-regression-monitoring.md
  • Dedupe key: workflow-metrics-regression-monitoring
  • Source run: https://github.com/bestony/self/actions/runs/22253416257

[Backlog Discovery]

Update Record 2026-02-21 16:21:41 +08:00

Update summary:

  • Clarified user problems by defining baseline/alert gaps, ownership/dedup risks, and missing metric definitions.

  • Expanded solution to compare 7-day windows against 30-day baselines, add sample thresholds, and specify issue + Job Summary alerts with dedup/owner config.

  • Acceptance criteria now store metrics as artifacts with retention, define metric formulas and minimum sample sizes, and add configurable regression thresholds.

  • Alerting requirements now include labels, workflow/metric/window deduping, default assignee behavior, and report details for affected workflows.

  • Status: committed

  • Commit: 657bbd9dbfa93b572adee6ebeb68c9d605617f36

  • Trigger: workflow_run.completed

  • Comment: https://github.com/bestony/self/actions/runs/22253473292

  • Actor: @github-actions[bot]

  • Updated At (Asia/Shanghai): 2026-02-21 16:21:41 +08:00
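
The comparison of a 7-day window against a 30-day baseline described in this update could be sketched roughly as follows. This is only an illustration: the function name, the 20% threshold, and the minimum of 5 samples are assumptions, since the update record says only that thresholds and sample sizes are configurable.

```python
from statistics import median


def detect_regression(recent, baseline, threshold=0.2, min_samples=5):
    """Compare the recent-window median against the baseline median.

    recent / baseline: lists of one metric (e.g. run duration in seconds).
    threshold and min_samples are hypothetical defaults, not from the PR.
    """
    # Too few samples on either side: report it instead of alerting.
    if len(recent) < min_samples or len(baseline) < min_samples:
        return "insufficient_data"
    base = median(baseline)
    if base <= 0:
        return "insufficient_data"
    relative_change = (median(recent) - base) / base
    return "regression" if relative_change > threshold else "ok"
```

For example, `detect_regression([130, 140, 150, 135, 145], [100] * 10)` reports a regression because the recent median is 40% above the baseline median.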


[Backlog Discovery]

Update Record 2026-02-21 16:24:22 +08:00

Update summary:

  • Added gaps/risks for low-frequency workflows and API rate-limit/data gap impacts in the problem statement.

  • Clarified default coverage: all workflows, event types default to schedule/workflow_dispatch/default-branch push, exclude PR events, with allow/deny lists.

  • Expanded acceptance criteria for sample insufficiency handling with Job Summary guidance and no issue alert on insufficient data.

  • Refined metric definitions and reporting to include status breakdowns, adjusted failure rate denominator, and explicitly report cancelled/skipped.

  • Strengthened alerting/ownership rules and required reporting of API fetch failures, missing windows, and retries.

  • Status: committed

  • Commit: 27aa9891cd2f90ff573720c8096537f3cedbe241

  • Trigger: workflow_run.completed

  • Comment: https://github.com/bestony/self/actions/runs/22253507894

  • Actor: @github-actions[bot]

  • Updated At (Asia/Shanghai): 2026-02-21 16:24:22 +08:00
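
One possible reading of the adjusted failure rate denominator is sketched below. It assumes that `cancelled` and `skipped` runs are left out of the denominator but still reported, and that `timed_out` counts as a failure; the second point is an assumption, because the update record does not say how timeouts are classified. The function name is hypothetical.

```python
from collections import Counter

# Assumption: conclusions treated as failures (the PR does not list them).
FAILED = {"failure", "timed_out"}


def failure_rate_report(conclusions):
    """Return the failure rate plus a per-status breakdown.

    conclusions: iterable of workflow run conclusion strings.
    cancelled/skipped (and any other status) stay out of the denominator
    but remain visible in the breakdown.
    """
    counts = Counter(conclusions)
    failed = sum(counts[c] for c in FAILED)
    denominator = counts["success"] + failed
    rate = failed / denominator if denominator else None
    return {
        "failure_rate": rate,
        "denominator": denominator,
        "breakdown": dict(counts),
    }
```

Returning `None` when the denominator is zero keeps an empty window from being read as a 0% failure rate.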


[Backlog Discovery]

Update Record 2026-02-21 16:27:31 +08:00

Update summary:

  • Added a requirement to capture historical regression examples and impact size to validate alert value.

  • Refined coverage guidance to use allowlists/denylists and start with core/high-frequency workflows before expanding.

  • Added an “observation mode” for alerts that only writes Job Summary without creating issues.

  • Clarified rollout criteria: require at least two historical regression cases and track pilot false positives and response effort before broad rollout.

  • Status: committed

  • Commit: 0ded63f6fa03f4096553a05578c0f75881da5562

  • Trigger: workflow_run.completed

  • Comment: https://github.com/bestony/self/actions/runs/22253544839

  • Actor: @github-actions[bot]

  • Updated At (Asia/Shanghai): 2026-02-21 16:27:31 +08:00
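
The observation mode from this update, combined with the workflow/metric/window deduplication from the first update record, might be routed along these lines. The function names, the key format, and the action labels are all hypothetical; this is a sketch of the decision only and does not call any GitHub API.

```python
def dedupe_key(workflow, metric, window):
    """Build a key so the same workflow/metric/window maps to one alert."""
    return f"{workflow}|{metric}|{window}"


def plan_alert(workflow, metric, window, open_issue_keys, observation_mode=False):
    """Decide where a detected regression should be reported.

    open_issue_keys: set of dedupe keys that already have an open issue.
    Returns an (action, key) tuple; the action labels are illustrative.
    """
    key = dedupe_key(workflow, metric, window)
    if observation_mode:
        # Observation mode: write to the Job Summary only, never open issues.
        return ("job_summary_only", key)
    if key in open_issue_keys:
        return ("update_issue", key)
    return ("create_issue", key)
```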


[Backlog Discovery]

Update Record 2026-02-21 16:30:17 +08:00

Update summary:

  • Clarified metric scope defaults to include schedule, workflow_dispatch, and default-branch push events.

  • Explicitly excludes PR events from the default monitoring event types.

  • Kept configurable workflow/event allowlists and denylists while noting that the default rollout remains limited to core/high-frequency workflows.

  • Status: committed

  • Commit: daeda6477f11d9a94131be4d17e9ff7f41ef553a

  • Trigger: workflow_run.completed

  • Comment: https://github.com/bestony/self/actions/runs/22253582823

  • Actor: @github-actions[bot]

  • Updated At (Asia/Shanghai): 2026-02-21 16:30:17 +08:00
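
The default scope described here (schedule, workflow_dispatch, and default-branch push, with PR events excluded) could be expressed as a simple filter. The sketch assumes each run is a dict with `name`, `event`, and `head_branch` keys, as in workflow run objects returned by the GitHub REST API; the function name, the `main` default, and the allow/deny parameters are assumptions for illustration.

```python
DEFAULT_EVENTS = {"schedule", "workflow_dispatch", "push"}


def is_monitored(run, default_branch="main", allow=None, deny=None):
    """Return True if a workflow run belongs in the metrics baseline.

    allow / deny: optional sets of workflow names (hypothetical config).
    """
    name = run["name"]
    if deny and name in deny:
        return False
    if allow is not None and name not in allow:
        return False
    event = run["event"]
    # pull_request and any other event type are excluded by default.
    if event not in DEFAULT_EVENTS:
        return False
    # Only pushes to the default branch are counted.
    if event == "push" and run.get("head_branch") != default_branch:
        return False
    return True
```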


[Backlog Discovery]

Update Record 2026-02-21 16:33:31 +08:00

Update summary:

  • Tightened requirement summary wording to focus on workflow-level trend regression alerting and delayed detection risks.

  • Refined user problems to emphasize actionable-signal gaps, reliability concerns for regression criteria, and low-frequency coverage strategy.

  • Expanded regression hypothesis to add data completeness thresholds alongside minimum sample sizes.

  • Reworked acceptance criteria to define workflow-level metric formulas, median/mean aggregation rules, and explicit data completeness gates for regression decisions.

  • Clarified alerting behavior so issues are created only when regression and completeness thresholds are met, with guidance for data gaps.

  • Added pilot success thresholds for acceptable false-positive rate and median remediation time.

  • Status: committed

  • Commit: 657b2fbfb4bdbdc48ce92b61707b064b452b6715

  • Trigger: workflow_run.completed

  • Comment: https://github.com/bestony/self/actions/runs/22253619131

  • Actor: @github-actions[bot]

  • Updated At (Asia/Shanghai): 2026-02-21 16:33:31 +08:00
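
A data completeness gate of the kind mentioned in this update might look like the sketch below. The update record does not define completeness, so this assumes it means the share of days in a window for which run data was fetched successfully; the function names and the 0.9 minimum are likewise assumptions.

```python
from datetime import date, timedelta


def window_coverage(fetched_days, window_end, window_days):
    """Fraction of days in the window that have fetched data.

    fetched_days: iterable of datetime.date values with successful fetches.
    """
    expected = {window_end - timedelta(days=i) for i in range(window_days)}
    return len(expected & set(fetched_days)) / window_days


def should_open_issue(is_regression, coverage_7d, coverage_30d, min_coverage=0.9):
    """Open an issue only when both the regression and completeness gates pass."""
    return bool(
        is_regression
        and coverage_7d >= min_coverage
        and coverage_30d >= min_coverage
    )
```

When the gate fails, the run would fall back to Job Summary guidance about the data gap rather than an issue, in line with the behavior described above.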


[Backlog Discovery]

Update Record 2026-02-21 16:36:50 +08:00

Update summary:

  • Clarified rollout guidance to default the initial allowlist to core/high-frequency workflows before broader coverage.

  • Added a low-frequency strategy to skip regression checks when samples are insufficient, with optional extended windows or observation-only mode.

  • Status: committed

  • Commit: d391a30579672adfdc80d2a6ad9e81820c31dcea

  • Trigger: workflow_run.completed

  • Comment: https://github.com/bestony/self/actions/runs/22253665367

  • Actor: @github-actions[bot]

  • Updated At (Asia/Shanghai): 2026-02-21 16:36:50 +08:00
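
The low-frequency strategy could be sketched as choosing the smallest window that holds enough samples and skipping the regression check otherwise. The candidate windows (7, 14, 28 days), the minimum of 5 samples, and the function name are assumptions; the update record mentions only optional extended windows.

```python
def choose_window(run_ages_days, min_samples=5, windows=(7, 14, 28)):
    """Pick the smallest window (in days) with at least min_samples runs.

    run_ages_days: age of each run in days, counted back from now.
    Returns None when no window qualifies, meaning the check is skipped.
    """
    for days in windows:
        in_window = sum(1 for age in run_ages_days if age < days)
        if in_window >= min_samples:
            return days
    return None
```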


[Backlog Discovery]

Update Record 2026-02-21 16:39:12 +08:00

Update summary:

  • Reviewed trusted feedback and current PR content; no document update was needed.

  • Status: no_change

  • Commit: N/A

  • Trigger: workflow_run.completed

  • Comment: https://github.com/bestony/self/actions/runs/22253707739

  • Actor: @github-actions[bot]

  • Updated At (Asia/Shanghai): 2026-02-21 16:39:12 +08:00

@gemini-code-assist

Summary of Changes

Hello @bestony, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

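This pull request introduces a new backlog discovery document that addresses the lack of historical performance metrics and regression monitoring for self-driven workflows. Today, degradation in workflow performance (such as run duration, queue time, and failure rate) is found mainly by manual inspection, which is inefficient. By establishing automated metric collection and regression alerting, this PR aims to improve the observability of system health and reduce the risk of iteration stalls caused by hidden performance degradation.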

Highlights

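  • New requirements document: Adds a backlog discovery document that details the requirement to establish historical metrics and regression alerts for self-driven workflows.
  • Problem statement: The document describes the current lack of per-workflow historical run metrics and trend comparisons, which makes regressions in run duration, queue time, and failure rate hard to detect and correct in time.
  • Solution: Proposes collecting data through the GitHub Actions REST API, computing workflow-level metrics, and raising alerts when regression thresholds are crossed.

Changelog

  • backlog/20260221081835-workflow-metrics-regression-monitoring.md
    • Added the requirements document on workflow metrics and regression alerts.
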

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

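This change adds a requirements document on workflow metrics regression monitoring. The document is generated automatically by the Backlog Discovery workflow; its structure is clear and its content is thorough.

I found that the "Requirement Summary" and "User Problems" sections are exact duplicates, which may reduce the document's readability. I have suggested a change in a specific review comment so that each section has a distinct responsibility, which improves document quality. Beyond that, there are no other issues.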

Comment on lines +13 to +27
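## 1. Requirement Summary
- The repository currently lacks per-workflow historical run metrics and trend comparisons, so regressions in run duration, queue time, and failure rate can only be found manually; as a result, when schedule tasks gradually become slower or less stable, the problem is hard to locate and correct in time.

## 2. Target Users
- Repository maintainers
- Automation platform owners
- On-call/operations staff

## 3. Core Scenarios
- Backlog Discovery or Product Designer run durations spike several times in a row without anyone noticing
- GitHub Actions queue time rises, delaying the whole planning and fix pipeline
- A workflow's failure rate rises significantly within a week, but there are only single-failure notifications and no trend alert

## 4. User Problems
- The repository currently lacks per-workflow historical run metrics and trend comparisons, so regressions in run duration, queue time, and failure rate can only be found manually; as a result, when schedule tasks gradually become slower or less stable, the problem is hard to locate and correct in time.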

medium

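The "Requirement Summary" and "User Problems" sections are identical, which creates redundancy. Consider condensing "Requirement Summary" into an overview of the whole requirement, while "User Problems" describes in detail the specific pain points users face.

For example, "Requirement Summary" could be changed to:

- To address the problem that performance degradation of schedule tasks is hard to detect in time, establish a mechanism for workflow historical metric monitoring and regression alerting.

This makes the document structure clearer and more focused, and avoids repetition.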

@github-actions github-actions Bot left a comment

[Backlog Discovery]
Reviewer: Product Manager

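  • Acceptance criterion 2, "create/update an issue or output the alert in the Job Summary," leaves the choice unspecified, which makes delivery unverifiable; define the default alert channel and trigger conditions and, if needed, provide configurable selection rules.
  • The metric results are to be "persisted as traceable JSON/CSV result files," but no storage location or retention policy is given, which risks repository bloat or invisible data; specify the storage medium (for example, artifact / dedicated directory / external storage) and the retention period.
  • The regression decision only states "7 days vs. 30-day baseline, configurable thresholds," without defining a minimum sample size or the calculation basis (success rate vs. failure rate, how queue time is computed), which may cause false positives or missed detections; add metric calculation definitions and sample thresholds.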

@github-actions (Bot) commented Feb 21, 2026

[Reviewer Workflow]
Reviewer: Product Manager

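Requirement value assessment

  • Valuable: Yes
  • Priority: P2
  • Reviewer's conclusion: It can improve the self-driven system's early detection of and response to workflow degradation, but the benefit of alerting still needs to be validated with historical cases from this repository.

Value points

  • Establishes quantitative baselines and regression alerts for self-driven workflows, reducing iteration stalls caused by late discovery of performance degradation.
  • Defines alert deduplication and ownership mechanisms, lowering the risk of false positives and unanswered alerts and making operations more actionable.
  • Uses an observation period and sample/completeness thresholds to raise alert credibility and control notification cost.

Risks and recommendations

  • The evidence is mostly generic and lacks degradation cases and impact data from this repository; at least 2 historical regression examples should be added before full rollout.
  • API rate limiting and low-frequency workflows may lead to insufficient data; pilot on core/high-frequency workflows first and track the false-positive rate and handling time.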

@github-actions github-actions Bot left a comment

[Backlog Discovery]
Reviewer: Product Manager

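  • The metric coverage is unclear (whether it includes all workflows and all trigger types, and whether it includes PR/manual runs), which may mix baselines and produce noisy alerts. Define which workflows and event types are included or excluded by default, and provide configurable allowlists/denylists.
  • The failure rate definition does not say how states such as cancelled/skipped/timeout are handled, which can easily cause false positives or missed detections. Define the denominator and the status classification rules, and show the share of each status in the report for verification.
  • "Assign to the repository maintainer by default" lacks an executable definition (who the maintainer is and how to resolve them), so alerts may go unreceived. Define the rule for resolving the default owner (for example, CODEOWNERS / repository owner / a configuration item), and when none is found, fall back to leaving the issue unassigned while still requiring an @ mention.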

@github-actions github-actions Bot left a comment

[Backlog Discovery]
Reviewer: Product Manager

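  • The acceptance criteria do not spell out the specific rule for "default covered event types and exclusion of PR events" (the update record requires schedule/workflow_dispatch/default-branch push and excluding PRs). Add the default event types and exclusions to the "metric coverage" clause to avoid a gap between implementation and expectation.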

@github-actions github-actions Bot left a comment

[Backlog Discovery]
Reviewer: Product Manager

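  • The calculation basis for workflow-level metrics is incomplete: run and queue durations are currently defined using job time fields, but it is not stated how multiple jobs or a matrix are aggregated to the workflow level (max/min/avg?), nor that per-run values should be computed first and the 7/30-day statistics afterwards. Add explicit aggregation rules and the computation path, so that differing interpretations during implementation do not make the metrics incomparable.
  • Success rate and failure rate are used inconsistently: the Summary and scenarios mention "success rate," but the acceptance criteria give only a failure rate formula that excludes cancelled/skipped. Either use failure rate throughout, or add a success rate formula and the handling of cancelled/skipped, so that alert thresholds have a consistent meaning.
  • Handling of data gaps in the regression decision is unclear: missing intervals and retry counts must be flagged, but it is not defined whether a regression is still triggered when the 7-day or 30-day window has gaps. Add a minimum data completeness threshold (for example, a coverage threshold) and explicit trigger/no-trigger rules to reduce the false-positive risk.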
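
One way to answer the aggregation question raised in the first point is to compute per-run values from the job timestamps first, then feed those values into the 7-day and 30-day statistics. The sketch below assumes wall-clock semantics: queue time is the gap between run creation and the first job start, and duration is the span from the first job start to the last job completion. That choice, and the function names, are assumptions rather than rules taken from the requirement document; `started_at`, `completed_at`, and `created_at` are the timestamp fields on GitHub job and run objects.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as 2026-02-21T08:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def run_level_metrics(run_created_at, jobs):
    """Collapse the jobs of one run (including matrix jobs) into run metrics.

    jobs: list of dicts with started_at and completed_at strings.
    Assumption: wall-clock span, not the sum or average of job durations.
    """
    starts = [parse_ts(job["started_at"]) for job in jobs]
    ends = [parse_ts(job["completed_at"]) for job in jobs]
    first_start = min(starts)
    return {
        "queue_seconds": (first_start - parse_ts(run_created_at)).total_seconds(),
        "duration_seconds": (max(ends) - first_start).total_seconds(),
    }
```

With this rule, parallel matrix jobs do not inflate the duration the way a sum would, and a single slow job still shows up in the run-level value.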

@github-actions github-actions Bot left a comment

[Backlog Discovery]
Reviewer: Product Manager

  • The scope definition is internally inconsistent: the requirement mentions default coverage for all workflows and event types, but the rollout guidance says initial coverage should be limited to a core/high-frequency allowlist, which changes the expected alert volume and ownership. Suggest explicitly defining the default coverage versus the pilot phase (what is enabled by default, what is gated behind config) and adding a rollout timeline and criteria to the acceptance criteria.
  • Low-frequency workflows are called out as a risk, but the acceptance criteria only say to label them "data insufficient" without a defined alternative path or exclusion policy, which leaves owners unclear on the expected behavior. Suggest adding a concrete policy (e.g., exclude workflows below the threshold by default, or extend the window or aggregation rules) so that coverage expectations and alert eligibility are unambiguous.
