让 review comment 与 refactor marker 冲突可收敛#402
Conversation
在 GitHub State Contract 中补充 PR review comment fix 的完成定义,明确由 review comment 驱动的修复必须回复并 resolve 原 thread,或显式升级处理,避免只完成代码提交而漏掉 GitHub 状态闭环。 ⟦AI:AUTO-LOOP⟧ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
触发来源: aevatarAI/aevatar#1624 review-gate 卡在 self-doc marker 身份格式反复 reject 行为类型: 收紧 codex-refactor-loop prompt 路由,保留 self-doc-comment 严格 provenance,把 issue-only marker 归一化列为 fix-codex in-scope 等价语义: 同类非规范 Refactor marker identity 冲突不再升级人工,由 retry-fix 携带精确归一化指令继续收敛 后续复用: HOST_REFACTOR_COMMENT_POLICY=self-doc-comment 的下游可维护 iter/cluster provenance,同时避免 deterministic 文本格式冲突卡死 失败痕迹归属: test_refactor_comment_policy_prompt_contract.py 覆盖 implement/review-fix/quality/reflector 契约 ⟦AI:AUTO-LOOP⟧ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
4d5aad1 to
aaaf4be
Compare
🤖 测试审查:缺少 review-thread 闭环契约的回归测试TL;DR
详细说明这次 prompt 中关于非规范 refactor marker 归一化的变更已有测试覆盖: 阻断点在另一处: 📎 完整 codex 原始输出(存档备查)---
pr: 402
role: tests
verdict: reject
---
## Verdict
Reject: the marker-normalization prompt changes have source-regression coverage, but the new GitHub State Contract rule for PR review comment fixes has no matching test.
## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds a new completion contract: PR review comment fixes are incomplete until the original review thread is replied to and resolved, or explicitly escalated. `rg "PR review comment fix|review-thread closure|original thread|replied to and resolved"` finds this only in `SKILL.md`, with no matching source-regression or behavior test under `skills/codex-refactor-loop/scripts/`.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:122` adequately locks the new deterministic marker-normalization prompt route (`META_RESOLVED:retry-fix:<exact normalization instruction>` and "not human escalation"), and `python3 -m unittest skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py` passes. The gap is specifically the separate review-thread closure contract added to `SKILL.md`.
## What would change your verdict
Add a focused source-regression test that reads `SKILL.md` and asserts the GitHub State Contract contains the `PR review comment fix` row with the required reply-and-resolve-or-explicitly-escalate completion semantics. If there is existing controller/comment-monitor logic that enforces review-thread closure, add or point to a behavior test for that path as well.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:reject⟦AI:AUTO-LOOP⟧ |
🤖 质量审查通过: PR 402 可读性与简洁性无阻塞TL;DR
详细说明这组改动把“非 canonical refactor marker identity”收敛到已有流程里: implement prompt 写清 canonical marker, review-fix prompt 把确定性 marker normalization 归到现有 (A) fixable in-scope, meta-reflector 只在 stalled 时重试 fix, reviewer-quality 要求给出 exact expected marker。没有新增 helper、接口、脚本入口或第二套 parser 规则。 测试侧只扩展了 📎 完整 codex 原始输出(存档备查) ---
pr: 402
role: quality
verdict: approve
---
## Verdict
Approve: the PR keeps the change focused on the review-comment/refactor-marker convergence rules, with readable prompt edits and no new complexity, dead code, or unrelated cleanup.
## Evidence
- skills/codex-refactor-loop/prompts/implement.md:34: The canonical marker identity rule is stated directly at the existing self-doc policy point, so it avoids a second flow or helper vocabulary.
- skills/codex-refactor-loop/prompts/review-fix.md:47: The fix-codex rule classifies marker normalization as the existing (A) fixable in-scope path, which is simple and avoids an unnecessary human-decision branch.
- skills/codex-refactor-loop/prompts/meta-reflector-stalled.md:36: The stalled-route addition is a single deterministic retry rule scoped to non-canonical marker identity, not a broad new escalation mechanism.
- skills/codex-refactor-loop/prompts/reviewer-quality.md:29: The reviewer instruction asks for the exact expected canonical marker, which keeps feedback concrete and reviewable.
- skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:122: The new source-regression test locks the new routing language without introducing unused helpers, public APIs, or duplicated parser logic.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:quality:approve⟦AI:AUTO-LOOP⟧ |
🤖 架构审查阻塞:新增 GitHub 闭环契约缺少机械验证TL;DR
详细说明
当前新增测试集中在 建议最小修复:增加一个读取 📎 完整 codex 原始输出(存档备查)---
pr: 402
role: architect
verdict: reject
---
## Verdict
reject — PR adds a new GitHub completion contract but does not add the required source-regression/behavior coverage for that SKILL.md contract change.
## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds a new required visible state contract: `PR review comment fix | Completion includes review-thread closure...`; `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:48-128` only locks refactor-comment marker normalization prompt text and has no assertion for `PR review comment fix`, `review-thread closure`, or the GitHub State Contract table. This regresses CLAUDE.md / AGENTS.md: `行为变更必须配套 **behavior test**(断言行为本身)+ **source-regression test**(对 SKILL.md 段落标题、narrow allowlist 字面、授权来源 path 等做字面断言),防止"改文档没改实现"或反之。`
- Same gap also violates the governance coupling rule in CLAUDE.md / AGENTS.md: `治理前置:架构性 / 流程性规则与对应机械验证手段同时进仓库,缺一不补口径。`
## What would change your verdict
Add mechanical coverage for the new `GitHub State Contract` row. Minimum acceptable fix: a source-regression test that reads `skills/codex-refactor-loop/SKILL.md` and asserts the `PR review comment fix` row plus the closure/escalation wording. If this is intended to drive actual GitHub thread reply/resolve behavior, add the matching behavior test or wire it into the existing controller/comment handling tests so the new completion definition is executable, not prose-only.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:architect:reject⟦AI:AUTO-LOOP⟧ |
…规则(补 review-gate 缺口) 回应 PR#402 review-gate(architect+tests reject:新 GitHub State Contract 完成规则缺机械覆盖): test_refactor_comment_policy_prompt_contract.py 加 source-regression,锁 SKILL.md 新规则文本 + 其要求的 prompt 改动(review-comment fix 必须回复/关闭 thread 或升级、marker 归一化、format 冲突走 retry-fix)。 纯加锁定测试,不改 louis4li 的产品逻辑。 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🤖 tests review r2: approveTL;DR
详细说明这次 diff 主要改 与几个 prompt 的契约文本,配套扩展了 。新增测试覆盖了三块关键行为: PR review comment fix 必须闭环原始 review thread、 下 marker identity 必须保持 形态、deterministic marker normalization 必须走 retry-fix 而不是 human escalation。 我检查了 diff 中的测试质量风险: 没有新增 sleep/delay pacing, 没有 或 manual 分类, 没有削弱已有断言。已跑目标测试: , 结果 , OK。 📎 完整 codex 原始输出(存档备查)---
pr: 402
role: tests
verdict: approve
---
## Verdict
Test coverage and test quality are adequate for this prompt-contract diff.
## Evidence
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25` adds a source-regression assertion for the new `GitHub State Contract` row requiring PR review-thread closure or explicit escalation.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:63` extends the existing `self-doc-comment` contract test to lock canonical `Refactor (iter${ITERATION}/${CLUSTER_ID})` marker identity and the fix-codex non-human-decision route.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:137` adds focused coverage for deterministic marker normalization routing to `META_RESOLVED:retry-fix`, which covers the new `meta-reflector-stalled.md` behavior text.
- The diff adds no `sleep`/`delay`, no skip/manual-test marker, and no weakened assertions; the changed test remains source-regression style, matching this repository's prompt-contract test convention.
- Verification run: `python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py` passed, `Ran 7 tests in 0.002s`.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:approve⟦AI:AUTO-LOOP⟧ |
🤖 tests review r2: approve (corrected summary)TL;DR
详细说明上一条评论的 raw artifact 完整,但正文里的 inline code 被 shell 展开吃掉了;这里补一条干净版。 这次 diff 主要改 我检查了 diff 中的测试质量风险: 没有新增 📎 完整 codex 原始输出(存档备查)---
pr: 402
role: tests
verdict: approve
---
## Verdict
Test coverage and test quality are adequate for this prompt-contract diff.
## Evidence
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25` adds a source-regression assertion for the new `GitHub State Contract` row requiring PR review-thread closure or explicit escalation.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:63` extends the existing `self-doc-comment` contract test to lock canonical `Refactor (iter${ITERATION}/${CLUSTER_ID})` marker identity and the fix-codex non-human-decision route.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:137` adds focused coverage for deterministic marker normalization routing to `META_RESOLVED:retry-fix`, which covers the new `meta-reflector-stalled.md` behavior text.
- The diff adds no `sleep`/`delay`, no skip/manual-test marker, and no weakened assertions; the changed test remains source-regression style, matching this repository's prompt-contract test convention.
- Verification run: `python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py` passed, `Ran 7 tests in 0.002s`.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:approve⟦AI:AUTO-LOOP⟧ |
🤖 Architect review: 阻塞,新增 completion contract 缺 behavior guardTL;DR
详细说明这次新增的规则是行为 contract: review comment 驱动的 fix 不能只靠本地 CLAUDE.md 对这类行为变更的要求是 behavior test + source-regression test 双落地。这里 source-regression 有了,缺的是会在“没有回复/resolve 原 thread 且没有 escalation evidence”时失败的行为测试,以及对应的执行路径。 📎 完整 codex 原始输出(存档备查)---
pr: 402
role: architect
verdict: reject
---
## Verdict
Reject: the PR adds a new review-thread completion contract, but only pins the wording with source-regression assertions and does not add behavior coverage for the new required GitHub-side completion step.
## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds a new required state transition: "PR review comment fix" completion is incomplete until the original thread is replied to and resolved, or explicitly escalated. This is a behavior contract, not just vocabulary. CLAUDE.md requires: "行为变更必须配套 **behavior test**(断言行为本身)+ **source-regression test**(对 SKILL.md 段落标题、narrow allowlist 字面、授权来源 path 等做字面断言),防止\"改文档没改实现\"或反之。"
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25` only adds source-regression checks that the `SKILL.md` sentence exists. I found no behavior test or implementation path that verifies a PR-review-comment-driven fix cannot be considered complete until the original GitHub review thread is replied to and resolved, or explicitly escalated. CLAUDE.md also requires: "**变更必须可验证**:行为约束必须落到机械验证手段(behavior test / source-regression test / 段落 lint);仅靠\"agent 应该记得\"承载的约束视为未落地。"
## What would change your verdict
Add the missing behavior enforcement and behavior test for the new completion contract. Concretely: make the controller/fix-completion path record or check the original PR review thread reply+resolution or explicit escalation before treating a PR review comment fix as complete, and add a behavior test that fails when a fix completes without that GitHub thread closure/escalation evidence. Keep the existing source-regression assertions as the documentation guard.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:architect:reject
⟦AI:AUTO-LOOP⟧ |
📊 当前状态 — review 完成,等作者更新(不需要 controller 介入)
已加 🤖 controller status banner ⟦AI:AUTO-LOOP⟧ |
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
已按最新 architect review 反馈更新 PR head: 本次补齐的是
请 auto-loop 在下一轮 wakeup/re-review 中重新纳入 architect verdict。 ⟦AI:AUTO-LOOP⟧ |
Summary
self-doc-comment严格 provenance,要求 issue-only refactor marker 归一化为iter/clustermarker。retry-fix,避免升级人工卡死。Test plan
python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py⟦AI:AUTO-LOOP⟧