Make review comment and refactor marker conflicts converge #402

Open
louis4li wants to merge 4 commits into ChronoAIProject:dev from louis4li:fix/pr-review-thread-completion-gate

Conversation

@louis4li louis4li commented Jun 1, 2026

Summary

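  • Make explicit that a fix driven by a PR review comment must reply to and close the original thread, or explicitly escalate.
  • Keep strict provenance under self-doc-comment, and require issue-only refactor markers to be normalized to iter/cluster markers.
  • Route deterministic marker-format conflicts to retry-fix, so they do not stall on human escalation.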

Test plan

  • python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py

⟦AI:AUTO-LOOP⟧

louis4li and others added 2 commits June 1, 2026 15:29
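Add the completion definition for PR review comment fixes to the GitHub State Contract: a fix driven by a review comment must reply to and resolve the original thread, or be explicitly escalated, so that a fix is not treated as done once the code is committed while the GitHub state is left open.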

⟦AI:AUTO-LOOP⟧

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
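Trigger: aevatarAI/aevatar#1624, where the review-gate stalled on repeated rejects over the self-doc marker identity format
Behavior type: tighten codex-refactor-loop prompt routing; keep strict provenance under self-doc-comment and list issue-only marker normalization as fix-codex in-scope
Equivalent semantics: identity conflicts of the same kind over non-canonical Refactor markers no longer escalate to a human; retry-fix carries the exact normalization instruction and keeps converging
Future reuse: downstream users of HOST_REFACTOR_COMMENT_POLICY=self-doc-comment can maintain iter/cluster provenance while avoiding stalls on deterministic text-format conflicts
Failure-trace ownership: test_refactor_comment_policy_prompt_contract.py covers the implement/review-fix/quality/reflector contracts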

⟦AI:AUTO-LOOP⟧

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
louis4li force-pushed the fix/pr-review-thread-completion-gate branch from 4d5aad1 to aaaf4be June 1, 2026 11:02
@loning
Contributor

loning commented Jun 1, 2026

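🤖 Tests review: missing regression test for the review-thread closure contract

TL;DR

  • What this is: an independent review of PR 402 from the tests perspective.
  • Verdict: reject, because the newly added completion contract for PR review comment fixes is not locked by a mechanical test.
  • Next step: add a source-regression test for the GitHub State Contract in SKILL.md; if execution logic already exists, also add or point to a behavior test.

Details

The prompt changes in this PR for normalizing non-canonical refactor markers already have test coverage: test_refactor_comment_policy_prompt_contract.py asserts META_RESOLVED:retry-fix:<exact normalization instruction> and "not human escalation", and it passes when run locally.

The blocker is elsewhere: skills/codex-refactor-loop/SKILL.md:532 adds a GitHub State Contract rule requiring that a fix driven by a PR review comment reply to and resolve the original thread, or explicitly escalate. This is a new process constraint, but it currently appears only in the SKILL.md text, with no source-regression test or behavior test covering it. Under this repository's rule that "behavior constraints must be backed by mechanical verification", this needs to be added before merge.


📎 Full raw codex output (archived for reference)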
---
pr: 402
role: tests
verdict: reject
---

## Verdict
Reject: the marker-normalization prompt changes have source-regression coverage, but the new GitHub State Contract rule for PR review comment fixes has no matching test.

## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds a new completion contract: PR review comment fixes are incomplete until the original review thread is replied to and resolved, or explicitly escalated. `rg "PR review comment fix|review-thread closure|original thread|replied to and resolved"` finds this only in `SKILL.md`, with no matching source-regression or behavior test under `skills/codex-refactor-loop/scripts/`.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:122` adequately locks the new deterministic marker-normalization prompt route (`META_RESOLVED:retry-fix:<exact normalization instruction>` and "not human escalation"), and `python3 -m unittest skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py` passes. The gap is specifically the separate review-thread closure contract added to `SKILL.md`.

## What would change your verdict
Add a focused source-regression test that reads `SKILL.md` and asserts the GitHub State Contract contains the `PR review comment fix` row with the required reply-and-resolve-or-explicitly-escalate completion semantics. If there is existing controller/comment-monitor logic that enforces review-thread closure, add or point to a behavior test for that path as well.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:reject

⟦AI:AUTO-LOOP⟧
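
The source-regression test requested above could look roughly like the sketch below. It is a minimal illustration, not the test that was later added to the PR: the `SAMPLE_SKILL_MD` text, the table column headers, and the `find_contract_row` helper are all assumptions, since the exact wording of the SKILL.md row is not reproduced in this thread.

```python
import unittest
from typing import Optional

# Hypothetical stand-in for the relevant part of SKILL.md. The real row
# wording is an assumption; only the row name comes from the review.
SAMPLE_SKILL_MD = """\
## GitHub State Contract

| State | Completion definition |
| --- | --- |
| PR review comment fix | Reply to and resolve the original thread, or explicitly escalate. |
"""


def find_contract_row(markdown: str, row_name: str) -> Optional[str]:
    """Return the first table row whose first cell equals row_name, else None."""
    for line in markdown.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if cells[0] == row_name:
            return line
    return None


class GitHubStateContractRowTest(unittest.TestCase):
    def test_review_comment_fix_row_requires_closure_or_escalation(self):
        row = find_contract_row(SAMPLE_SKILL_MD, "PR review comment fix")
        self.assertIsNotNone(row, "contract row is missing")
        lowered = row.lower()
        # Literal assertions: the row must name both ways of completing.
        self.assertIn("resolve", lowered)
        self.assertIn("escalate", lowered)
```

In a real repository the test would presumably read the file itself (for example with `pathlib.Path(...).read_text()`) instead of the inline sample, so that deleting or rewording the row fails the test.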

@loning
Contributor

loning commented Jun 1, 2026

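🤖 Quality review passed: no readability or conciseness blockers in PR 402

TL;DR

  • What this is: I re-reviewed the prompt and source-regression test changes in PR 402 from a code quality angle.
  • Verdict: approve; I found no issues with naming, dead code, over-abstraction, complexity, or unrelated changes.
  • Next step: the controller can continue aggregating the other reviewers' verdicts.

Details

This set of changes folds "non-canonical refactor marker identity" into the existing flow: the implement prompt spells out the canonical marker, the review-fix prompt classifies deterministic marker normalization under the existing (A) fixable in-scope path, the meta-reflector retries the fix only when stalled, and reviewer-quality requires the exact expected marker. No new helper, interface, script entry point, or second set of parser rules is added.

On the test side, the PR only extends the source-regression string assertions in test_refactor_comment_policy_prompt_contract.py to lock the prompt contract. From a quality standpoint this is appropriately narrow verification, not an unused public surface.


📎 Full raw codex output (archived for reference)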
    ---
    pr: 402
    role: quality
    verdict: approve
    ---
    
    ## Verdict
    Approve: the PR keeps the change focused on the review-comment/refactor-marker convergence rules, with readable prompt edits and no new complexity, dead code, or unrelated cleanup.
    
    ## Evidence
    - skills/codex-refactor-loop/prompts/implement.md:34: The canonical marker identity rule is stated directly at the existing self-doc policy point, so it avoids a second flow or helper vocabulary.
    - skills/codex-refactor-loop/prompts/review-fix.md:47: The fix-codex rule classifies marker normalization as the existing (A) fixable in-scope path, which is simple and avoids an unnecessary human-decision branch.
    - skills/codex-refactor-loop/prompts/meta-reflector-stalled.md:36: The stalled-route addition is a single deterministic retry rule scoped to non-canonical marker identity, not a broad new escalation mechanism.
    - skills/codex-refactor-loop/prompts/reviewer-quality.md:29: The reviewer instruction asks for the exact expected canonical marker, which keeps feedback concrete and reviewable.
    - skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:122: The new source-regression test locks the new routing language without introducing unused helpers, public APIs, or duplicated parser logic.
    
    ⟦AI:AUTO-LOOP⟧
    REVIEW_DONE:402:quality:approve

⟦AI:AUTO-LOOP⟧
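
To make the "exact expected marker" idea concrete, here is a small sketch of how a non-canonical marker could be turned into a precise normalization instruction. The canonical shape `Refactor (iter${ITERATION}/${CLUSTER_ID})` is the one quoted in the reviews; the issue-only shape `Refactor (#NNN)`, the allowed characters in a cluster id, and every function name here are assumptions made for illustration.

```python
import re
from typing import Optional

# Canonical identity as quoted in the reviews: Refactor (iter<N>/<cluster>).
# The character set allowed for the cluster id is an assumption.
CANONICAL_MARKER = re.compile(r"Refactor \(iter\d+/[A-Za-z0-9_.-]+\)")
# Assumed shape of an issue-only marker, e.g. "Refactor (#1624)".
ISSUE_ONLY_MARKER = re.compile(r"Refactor \(#\d+\)")


def expected_marker(iteration: int, cluster_id: str) -> str:
    """Build the canonical marker for the current iteration and cluster."""
    return f"Refactor (iter{iteration}/{cluster_id})"


def is_canonical(comment: str) -> bool:
    """True if the comment carries a marker in the canonical shape."""
    return CANONICAL_MARKER.search(comment) is not None


def normalization_instruction(
    comment: str, iteration: int, cluster_id: str
) -> Optional[str]:
    """Return an exact rewrite instruction, or None if nothing needs fixing."""
    match = ISSUE_ONLY_MARKER.search(comment)
    if match is None:
        return None
    target = expected_marker(iteration, cluster_id)
    return f"replace '{match.group(0)}' with '{target}'"
```

Because the instruction names both the offending text and its replacement, a fix pass can apply it mechanically, which is what makes this class of conflict deterministic.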

@loning
Contributor

loning commented Jun 1, 2026

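🤖 Architect review blocked: the new GitHub closure contract lacks mechanical verification

TL;DR

  • What this is: an independent review of PR 402 from the architect perspective.
  • Verdict: reject; the PR adds a completion contract for PR review comment fixes, but the tests only cover the marker normalization prompt.
  • Next step: please add a source-regression test for the GitHub State Contract row in SKILL.md; if the rule is meant to actually drive thread reply/resolve, matching behavior coverage is needed too.

Details

skills/codex-refactor-loop/SKILL.md:532 adds a GitHub state completion definition: a fix driven by a review comment must reply to and resolve the original thread, or explicitly escalate. This is a process/behavior contract, not ordinary explanatory text.

The new tests concentrate on the refactor marker normalization wording at skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:48-128 and do not lock PR review comment fix, review-thread closure, or the GitHub State Contract table row. Under the CLAUDE.md / AGENTS.md rules, a behavior change must ship with a behavior test plus a source-regression test, and governance/process rules must enter the repository together with their mechanical verification.

Suggested minimal fix: add a source-regression test that reads skills/codex-refactor-loop/SKILL.md and asserts the PR review comment fix row and the reply/resolve-or-escalate semantics. If this contract is already meant to be enforced by controller/comment handling, add the corresponding behavior test as well.


📎 Full raw codex output (archived for reference)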
---
pr: 402
role: architect
verdict: reject
---

## Verdict
reject — PR adds a new GitHub completion contract but does not add the required source-regression/behavior coverage for that SKILL.md contract change.

## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds a new required visible state contract: `PR review comment fix | Completion includes review-thread closure...`; `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:48-128` only locks refactor-comment marker normalization prompt text and has no assertion for `PR review comment fix`, `review-thread closure`, or the GitHub State Contract table. This regresses CLAUDE.md / AGENTS.md: `A behavior change must ship with a **behavior test** (asserting the behavior itself) + a **source-regression test** (literal assertions on SKILL.md section headings, narrow allowlist literals, authorized source paths, etc.), to prevent "docs changed but implementation not" or the reverse.`
- Same gap also violates the governance coupling rule in CLAUDE.md / AGENTS.md: `Governance first: architectural / process rules enter the repository together with their mechanical verification; if either is missing, the gap is not papered over by wording.`

## What would change your verdict
Add mechanical coverage for the new `GitHub State Contract` row. Minimum acceptable fix: a source-regression test that reads `skills/codex-refactor-loop/SKILL.md` and asserts the `PR review comment fix` row plus the closure/escalation wording. If this is intended to drive actual GitHub thread reply/resolve behavior, add the matching behavior test or wire it into the existing controller/comment handling tests so the new completion definition is executable, not prose-only.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:architect:reject

⟦AI:AUTO-LOOP⟧

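…rule (close the review-gate gap)

Responds to the PR#402 review-gate (architect + tests reject: the new GitHub State Contract completion rule lacks mechanical coverage):
test_refactor_comment_policy_prompt_contract.py gains source-regression coverage that locks the new SKILL.md rule text
+ the prompt changes it requires (a review-comment fix must reply to/close the thread or escalate, marker normalization, format conflicts go to retry-fix).
Adds locking tests only; louis4li's product logic is unchanged.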

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@loning
Contributor

loning commented Jun 1, 2026

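🤖 tests review r2: approve

TL;DR

  • What this is: the second-round review of PR 402 from the tests perspective.
  • Verdict: approve; the prompt-contract changes have matching source-regression coverage.
  • Next step: the controller can include this round's tests verdict in the review-gate summary.

Details

This diff mainly changes  and the contract text of several prompts, with matching extensions to . The new tests cover three key behaviors: a PR review comment fix must close out the original review thread; under , marker identity must keep the  form; and deterministic marker normalization must go through retry-fix rather than human escalation.

I checked the diff for test-quality risks: no new sleep/delay pacing, no  or manual classification, and no weakened existing assertions. Target test run: , result , OK.


📎 Full raw codex output (archived for reference)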
---
pr: 402
role: tests
verdict: approve
---

## Verdict
Test coverage and test quality are adequate for this prompt-contract diff.

## Evidence
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25` adds a source-regression assertion for the new `GitHub State Contract` row requiring PR review-thread closure or explicit escalation.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:63` extends the existing `self-doc-comment` contract test to lock canonical `Refactor (iter${ITERATION}/${CLUSTER_ID})` marker identity and the fix-codex non-human-decision route.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:137` adds focused coverage for deterministic marker normalization routing to `META_RESOLVED:retry-fix`, which covers the new `meta-reflector-stalled.md` behavior text.
- The diff adds no `sleep`/`delay`, no skip/manual-test marker, and no weakened assertions; the changed test remains source-regression style, matching this repository's prompt-contract test convention.
- Verification run: `python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py` passed, `Ran 7 tests in 0.002s`.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:approve

⟦AI:AUTO-LOOP⟧

@loning
Contributor

loning commented Jun 1, 2026

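🤖 tests review r2: approve (corrected summary)

TL;DR

  • What this is: the second-round review of PR 402 from the tests perspective.
  • Verdict: approve; the prompt-contract changes have matching source-regression coverage.
  • What the maintainer needs to do: no action needed; the controller can include this in the review-gate summary.

Details

The raw artifact in the previous comment is complete, but the inline code in its body was eaten by shell expansion; this is a clean version.

This diff mainly changes SKILL.md and the contract text of several prompts, with matching extensions to test_refactor_comment_policy_prompt_contract.py. The new tests cover three key behaviors: a PR review comment fix must close out the original review thread; under self-doc-comment, marker identity must keep the Refactor (iter${ITERATION}/${CLUSTER_ID}) form; and deterministic marker normalization must go through META_RESOLVED:retry-fix, not human escalation.

I checked the diff for test-quality risks: no new sleep/delay pacing, no [Skip] or manual classification, and no weakened existing assertions. Target test run: python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py, result Ran 7 tests in 0.002s, OK.


📎 Full raw codex output (archived for reference)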
---
pr: 402
role: tests
verdict: approve
---

## Verdict
Test coverage and test quality are adequate for this prompt-contract diff.

## Evidence
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25` adds a source-regression assertion for the new `GitHub State Contract` row requiring PR review-thread closure or explicit escalation.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:63` extends the existing `self-doc-comment` contract test to lock canonical `Refactor (iter${ITERATION}/${CLUSTER_ID})` marker identity and the fix-codex non-human-decision route.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:137` adds focused coverage for deterministic marker normalization routing to `META_RESOLVED:retry-fix`, which covers the new `meta-reflector-stalled.md` behavior text.
- The diff adds no `sleep`/`delay`, no skip/manual-test marker, and no weakened assertions; the changed test remains source-regression style, matching this repository's prompt-contract test convention.
- Verification run: `python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py` passed, `Ran 7 tests in 0.002s`.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:approve

⟦AI:AUTO-LOOP⟧
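
The routing rule described in this review (deterministic marker normalization goes to `META_RESOLVED:retry-fix`, not to a human) can be sketched as a tiny decision function. Only the `META_RESOLVED:retry-fix:<exact normalization instruction>` token shape comes from the thread; the `RejectKind` enum, the `ESCALATE_HUMAN` token, and the function name are hypothetical.

```python
from enum import Enum


class RejectKind(Enum):
    """Hypothetical classification of why a review round keeps rejecting."""

    MARKER_FORMAT = "marker-format"    # deterministic text-format conflict
    NEEDS_DECISION = "needs-decision"  # requires a human judgment call


def route_stalled_reject(kind: RejectKind, instruction: str = "") -> str:
    """Pick the next step for a stalled review loop.

    A marker-format conflict has exactly one correct rewrite, so it is sent
    back to the fix step together with that rewrite. Anything else falls
    through to human escalation (token name assumed for this sketch).
    """
    if kind is RejectKind.MARKER_FORMAT:
        if not instruction:
            raise ValueError("retry-fix needs an exact normalization instruction")
        return f"META_RESOLVED:retry-fix:{instruction}"
    return "ESCALATE_HUMAN"
```

Requiring a non-empty instruction keeps the retry from looping on a vague reject: the fix step always receives the precise rewrite it has to apply.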

@loning
Contributor

loning commented Jun 1, 2026

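🤖 Architect review: blocked, the new completion contract lacks a behavior guard

TL;DR

  • What this is: an architect re-review of PR 402, checking the CLAUDE.md/AGENTS.md architecture constraints.
  • Verdict: reject; SKILL.md adds a PR review thread completion rule, but so far it is backed only by text assertions.
  • Next step: the implement codex needs to add actual completion enforcement and a behavior test, while keeping the existing source-regression test.

Details

The new rule is a behavior contract: a fix driven by a review comment cannot count as complete merely because of a local FIX_DONE or because the next reviewer round passes; it must also reply to and resolve the original GitHub review thread, or explicitly escalate. The diff currently only asserts in test_refactor_comment_policy_prompt_contract.py that SKILL.md contains this sentence, which guards against the documentation being lost but does not prove that the controller/fix-completion path actually checks GitHub thread closure.

For this kind of behavior change, CLAUDE.md requires both a behavior test and a source-regression test. The source-regression test is now in place; what is missing is a behavior test that fails when the original thread has been neither replied to and resolved nor given escalation evidence, together with the corresponding execution path.


📎 Full raw codex output (archived for reference)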
---
pr: 402
role: architect
verdict: reject
---

## Verdict
Reject: the PR adds a new review-thread completion contract, but only pins the wording with source-regression assertions and does not add behavior coverage for the new required GitHub-side completion step.

## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds a new required state transition: "PR review comment fix" completion is incomplete until the original thread is replied to and resolved, or explicitly escalated. This is a behavior contract, not just vocabulary. CLAUDE.md requires: "A behavior change must ship with a **behavior test** (asserting the behavior itself) + a **source-regression test** (literal assertions on SKILL.md section headings, narrow allowlist literals, authorized source paths, etc.), to prevent 'docs changed but implementation not' or the reverse."
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25` only adds source-regression checks that the `SKILL.md` sentence exists. I found no behavior test or implementation path that verifies a PR-review-comment-driven fix cannot be considered complete until the original GitHub review thread is replied to and resolved, or explicitly escalated. CLAUDE.md also requires: "**Changes must be verifiable**: behavior constraints must be backed by mechanical verification (behavior test / source-regression test / section lint); a constraint carried only by 'the agent should remember' is treated as not landed."

## What would change your verdict
Add the missing behavior enforcement and behavior test for the new completion contract. Concretely: make the controller/fix-completion path record or check the original PR review thread reply+resolution or explicit escalation before treating a PR review comment fix as complete, and add a behavior test that fails when a fix completes without that GitHub thread closure/escalation evidence. Keep the existing source-regression assertions as the documentation guard.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:architect:reject

⟦AI:AUTO-LOOP⟧
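
A behavior guard of the kind requested here might be shaped like the sketch below: completion is decided from recorded evidence about the original thread, not from the fix step's own signal. This is only an illustration under assumptions. `ReviewThreadEvidence`, its fields, and `review_comment_fix_complete` are hypothetical names and do not describe the controller's real data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewThreadEvidence:
    """Hypothetical record of what happened to the original review thread."""

    replied: bool = False
    resolved: bool = False
    escalated: bool = False


def review_comment_fix_complete(fix_done: bool, evidence: ReviewThreadEvidence) -> bool:
    """A review-comment fix is complete only with closure or escalation evidence.

    A local FIX_DONE signal alone is not enough: the original thread must be
    both replied to and resolved, or the item must be explicitly escalated.
    """
    thread_closed = evidence.replied and evidence.resolved
    return fix_done and (thread_closed or evidence.escalated)
```

A behavior test built on this would assert that `fix_done=True` with empty evidence is rejected, which is the failing case the architect review asks for.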

@loning loning added crnd:lifecycle:managed loop-managed item crnd:phase:fixing fix codex in flight crnd:human:auto auto-advancing, no human needed crnd:phase:reviewing review-gate reviewers in flight and removed crnd:phase:fixing fix codex in flight labels Jun 1, 2026
@loning
Contributor

loning commented Jun 1, 2026

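📊 Current status: review complete, waiting for the author to update (no controller intervention needed)

| Dimension | |
| --- | --- |
| Phase | reviewing (automated review complete) |
| Automated review verdicts | architect: reject (asks for a behavior test / source-regression test for the new GitHub closure completion contract; see the architect review comment above) · tests: approve |
| PR source | fork (louis4li:fix/pr-review-thread-completion-gate), CI all green, base dev |
| Why the loop does not auto-dispatch a fix | The PR head is on a fork branch and auto-loop cannot push fixes to the fork; adding the behavior guard requires the author to update the fork, or a maintainer to decide to take it over onto a loop-owned branch |
| Is human intervention needed | Optional: once the author adds the behavior/source-regression test per the architect feedback and pushes to this branch, re-review runs automatically; or a maintainer directs a takeover |

crnd:lifecycle:managed has been added to bring this PR under comment monitoring. After the author updates the head, the next wakeup round automatically re-dispatches the 3 reviewers.

🤖 controller status banner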

⟦AI:AUTO-LOOP⟧

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
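
The verdict summary in this banner can be derived from the `REVIEW_DONE:<pr>:<role>:<verdict>` trailers that close each review comment. The sketch below shows one way to do that; the trailer format is taken from the comments above, while the "latest trailer per role wins" rule, the "every required role must approve" gate, and the function names are assumptions.

```python
import re
from typing import Dict, Iterable, Tuple

# Trailer format as it appears in the review comments, e.g.
# "REVIEW_DONE:402:architect:reject". Leading indentation is tolerated.
TRAILER = re.compile(
    r"^[ \t]*REVIEW_DONE:(\d+):([a-z]+):(approve|reject)[ \t]*$", re.MULTILINE
)


def latest_verdicts(comments: Iterable[str], pr: int) -> Dict[str, str]:
    """Map each reviewer role to its most recent verdict for the given PR.

    Comments are assumed to be in chronological order, so a later trailer
    for the same role overwrites an earlier one.
    """
    verdicts: Dict[str, str] = {}
    for body in comments:
        for number, role, verdict in TRAILER.findall(body):
            if int(number) == pr:
                verdicts[role] = verdict
    return verdicts


def gate_passes(
    verdicts: Dict[str, str],
    required: Tuple[str, ...] = ("architect", "quality", "tests"),
) -> bool:
    """Assumed gate rule: every required role must have an approve verdict."""
    return all(verdicts.get(role) == "approve" for role in required)
```

Applied to this thread, the tests role ends on approve after its second round while architect ends on reject, so the gate stays closed until the architect re-review.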
@louis4li
Author

louis4li commented Jun 2, 2026

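PR head updated per the latest architect review feedback: cfc25723eb1b2b5cc9a06c878ba40abcf82ed1eb

This update adds the behavior gate for the PR review comment fix completion contract, which no longer relies only on SKILL.md/source-regression text:

  • adds behavior enforcement / guard coverage for review-thread closure;
  • keeps the source-regression coverage alongside it;
  • corresponding commit: cfc25723 fix(skill): add behavior gate for review thread closure

Requesting that auto-loop re-include the architect verdict in the next wakeup/re-review round.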

⟦AI:AUTO-LOOP⟧

Labels

crnd:human:auto auto-advancing, no human needed crnd:lifecycle:managed loop-managed item crnd:phase:reviewing review-gate reviewers in flight
