Make review comment and refactor marker conflicts convergent by louis4li · Pull Request #402 · ChronoAIProject/consensus-rnd

louis4li · 2026-06-01T10:56:23Z

Summary

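Clarify that a fix driven by a PR review comment must reply to and close the original thread, or be explicitly escalated.
Keep strict self-doc-comment provenance, and require issue-only refactor markers to be normalized to iter/cluster markers.
Route deterministic marker-format conflicts through retry-fix, so they no longer get stuck waiting on human escalation.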
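The normalization described above is sketched below. The sketch assumes the canonical identity is the `Refactor (iter${ITERATION}/${CLUSTER_ID})` form quoted later in this thread. The issue-only shape `Refactor (#<issue>)`, the allowed cluster-id characters, and the function names are hypothetical. The repository's real prompts and parser are not shown here.

```python
import re

# Canonical self-doc-comment identity: Refactor (iter<ITERATION>/<CLUSTER_ID>).
# The CLUSTER_ID character set is an assumption for this sketch.
CANONICAL_MARKER = re.compile(r"^Refactor \(iter(?P<iteration>\d+)/(?P<cluster>[A-Za-z0-9_-]+)\)$")

# Hypothetical issue-only form such as "Refactor (#1624)"; real shapes may differ.
ISSUE_ONLY_MARKER = re.compile(r"^Refactor \(#(?P<issue>\d+)\)$")


def canonical_marker(iteration: int, cluster_id: str) -> str:
    """Render the canonical iter/cluster marker."""
    return f"Refactor (iter{iteration}/{cluster_id})"


def is_canonical(marker: str) -> bool:
    return CANONICAL_MARKER.match(marker) is not None


def normalize_marker(marker: str, iteration: int, cluster_id: str) -> str:
    """Return a canonical marker, rewriting an issue-only marker if needed.

    Unknown shapes raise ValueError instead of being silently rewritten.
    """
    marker = marker.strip()
    if is_canonical(marker):
        return marker
    if ISSUE_ONLY_MARKER.match(marker):
        # Provenance comes from the current loop iteration/cluster, not the issue number.
        return canonical_marker(iteration, cluster_id)
    raise ValueError(f"unrecognized refactor marker: {marker!r}")
```

Raising on unknown shapes keeps the rewrite deterministic. Only forms with exactly one correct answer are normalized automatically.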

Test plan

python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py

⟦AI:AUTO-LOOP⟧

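Add a completion definition for PR review comment fixes to the GitHub State Contract. A fix driven by a review comment must reply to and resolve the original thread, or be explicitly escalated. This prevents a fix from counting as done once the code is committed while the GitHub-side state is never closed. ⟦AI:AUTO-LOOP⟧ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>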

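Trigger: in aevatarAI/aevatar#1624 the review-gate got stuck repeatedly rejecting the self-doc marker identity format.
Behavior type: tighten codex-refactor-loop prompt routing, keep strict self-doc-comment provenance, and list issue-only marker normalization as in scope for fix-codex.
Equivalent semantics: identity conflicts on the same kind of non-canonical Refactor marker no longer escalate to a human. Instead, retry-fix carries the exact normalization instruction and keeps converging.
Future reuse: downstream users of HOST_REFACTOR_COMMENT_POLICY=self-doc-comment can keep iter/cluster provenance without deadlocking on deterministic text-format conflicts.
Failure-trace ownership: test_refactor_comment_policy_prompt_contract.py covers the implement/review-fix/quality/reflector contracts.
⟦AI:AUTO-LOOP⟧ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>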

loning · 2026-06-01T11:58:15Z

🤖 Tests review: missing regression test for the review-thread closure contract

TL;DR

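What this is: an independent review of PR 402 from the tests perspective.
Verdict: reject, because no mechanical test locks down the new completion contract for PR review comment fixes.
Next step: add a source-regression test for the GitHub State Contract in SKILL.md. If execution logic already exists, also add or point to a behavior test.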

Details

Tests already cover the prompt changes for normalizing non-canonical refactor markers. test_refactor_comment_policy_prompt_contract.py asserts META_RESOLVED:retry-fix:<exact normalization instruction> and "not human escalation", and it passes locally.
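The route this test pins can be pictured as a small function. A deterministic marker mismatch becomes a `META_RESOLVED:retry-fix:<exact normalization instruction>` line instead of a human escalation. Only the `META_RESOLVED:retry-fix:` prefix comes from the thread. The `Finding` type, its fields, the instruction wording, and returning `None` for everything else are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

RETRY_FIX_PREFIX = "META_RESOLVED:retry-fix:"


@dataclass
class Finding:
    # Hypothetical shape of a stalled review finding about a marker.
    found_marker: str
    expected_marker: str
    deterministic: bool  # True when the fix is a pure text rewrite with one right answer


def route_marker_conflict(finding: Finding) -> Optional[str]:
    """Return a retry-fix directive for a deterministic marker mismatch.

    Returns None when the conflict is not deterministic or there is no mismatch,
    leaving the decision to whatever stalled-handling path already exists.
    """
    if not finding.deterministic or finding.found_marker == finding.expected_marker:
        return None
    # The instruction names both markers exactly, so fix-codex has nothing to guess.
    instruction = f"replace marker {finding.found_marker!r} with {finding.expected_marker!r}"
    return RETRY_FIX_PREFIX + instruction
```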

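The blocker is elsewhere. skills/codex-refactor-loop/SKILL.md:532 adds a GitHub State Contract rule: a fix driven by a PR review comment must reply to and resolve the original thread, or be explicitly escalated. This is a new process constraint. It currently exists only as text in SKILL.md, and no source-regression test or behavior test covers it. This repository requires that "behavior constraints must be backed by mechanical verification", so the test must be added before merging.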

📎 Full raw codex output (archived for reference)

---
pr: 402
role: tests
verdict: reject
---

## Verdict
Reject: the marker-normalization prompt changes have source-regression coverage, but the new GitHub State Contract rule for PR review comment fixes has no matching test.

## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds a new completion contract: PR review comment fixes are incomplete until the original review thread is replied to and resolved, or explicitly escalated. `rg "PR review comment fix|review-thread closure|original thread|replied to and resolved"` finds this only in `SKILL.md`, with no matching source-regression or behavior test under `skills/codex-refactor-loop/scripts/`.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:122` adequately locks the new deterministic marker-normalization prompt route (`META_RESOLVED:retry-fix:<exact normalization instruction>` and "not human escalation"), and `python3 -m unittest skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py` passes. The gap is specifically the separate review-thread closure contract added to `SKILL.md`.

## What would change your verdict
Add a focused source-regression test that reads `SKILL.md` and asserts the GitHub State Contract contains the `PR review comment fix` row with the required reply-and-resolve-or-explicitly-escalate completion semantics. If there is existing controller/comment-monitor logic that enforces review-thread closure, add or point to a behavior test for that path as well.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:reject

⟦AI:AUTO-LOOP⟧

loning · 2026-06-01T11:58:21Z

🤖 Quality review passed: no blocking readability or simplicity issues in PR 402

TL;DR

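What this is: a code-quality review of the prompt and source-regression test changes in PR 402.
Verdict: approve. No issues found with naming, dead code, over-abstraction, complexity, or unrelated changes.
Next step: the controller can go on aggregating the other reviewers' verdicts.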
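Below is one way the controller might fold these verdicts together, using the `REVIEW_DONE:<pr>:<role>:<verdict>` lines that end each raw review output. The rule used here is an assumption: the latest verdict per role wins, and tests, quality, and architect must all approve. The thread does not show the controller's actual gate logic.

```python
import re
from typing import Dict, Iterable

# Marker shape seen in this thread, e.g. "REVIEW_DONE:402:tests:reject".
REVIEW_DONE = re.compile(r"^REVIEW_DONE:(?P<pr>\d+):(?P<role>[a-z]+):(?P<verdict>approve|reject)$")

# The three reviewer roles that appear in this PR's review-gate.
ROLES = ("tests", "quality", "architect")


def latest_verdicts(comments: Iterable[str], pr: int) -> Dict[str, str]:
    """Collect the most recent verdict per role, walking comments in order."""
    verdicts: Dict[str, str] = {}
    for body in comments:
        for line in body.splitlines():
            m = REVIEW_DONE.match(line.strip())
            if m and int(m["pr"]) == pr:
                verdicts[m["role"]] = m["verdict"]  # later comments overwrite earlier ones
    return verdicts


def gate_passes(verdicts: Dict[str, str]) -> bool:
    """Hypothetical rule: every role must have reported, and all must approve."""
    return all(verdicts.get(role) == "approve" for role in ROLES)
```

Given this PR's comments in order, the sketch reports tests and quality as approve and architect as reject. That matches the later status banner.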

Details

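These changes fold "non-canonical refactor marker identity" into the existing flow:
- The implement prompt spells out the canonical marker.
- The review-fix prompt classifies deterministic marker normalization under the existing (A) fixable in-scope path.
- The meta-reflector retries fix only when stalled.
- reviewer-quality must state the exact expected marker.

No new helper, interface, script entry point, or second set of parser rules is added.

On the test side, the PR only extends the source-regression string assertions in test_refactor_comment_policy_prompt_contract.py to lock the prompt contract. From a quality standpoint this is an appropriately narrow check, not an unused public surface.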

📎 Full raw codex output (archived for reference)

---
pr: 402
role: quality
verdict: approve
---

## Verdict
Approve: the PR keeps the change focused on the review-comment/refactor-marker convergence rules, with readable prompt edits and no new complexity, dead code, or unrelated cleanup.

## Evidence
- skills/codex-refactor-loop/prompts/implement.md:34: The canonical marker identity rule is stated directly at the existing self-doc policy point, so it avoids a second flow or helper vocabulary.
- skills/codex-refactor-loop/prompts/review-fix.md:47: The fix-codex rule classifies marker normalization as the existing (A) fixable in-scope path, which is simple and avoids an unnecessary human-decision branch.
- skills/codex-refactor-loop/prompts/meta-reflector-stalled.md:36: The stalled-route addition is a single deterministic retry rule scoped to non-canonical marker identity, not a broad new escalation mechanism.
- skills/codex-refactor-loop/prompts/reviewer-quality.md:29: The reviewer instruction asks for the exact expected canonical marker, which keeps feedback concrete and reviewable.
- skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:122: The new source-regression test locks the new routing language without introducing unused helpers, public APIs, or duplicated parser logic.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:quality:approve

⟦AI:AUTO-LOOP⟧

loning · 2026-06-01T12:00:20Z

🤖 Architecture review blocked: the new GitHub closure contract lacks mechanical verification

TL;DR

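What this is: an independent review of PR 402 from the architect perspective.
Verdict: reject. The PR adds a completion contract for PR review comment fixes, but the tests only cover the marker normalization prompt.
Next step: add a source-regression test for the GitHub State Contract row in SKILL.md. If the row is meant to actually drive thread reply/resolve, matching behavior coverage is also needed.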

Details

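skills/codex-refactor-loop/SKILL.md:532 adds a new GitHub state completion definition: a fix driven by a review comment must reply to and resolve the original thread, or be explicitly escalated. This is a process/behavior contract, not ordinary descriptive text.

The new tests in skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:48-128 focus on the refactor marker normalization wording. They do not lock the PR review comment fix, review-thread closure, or the GitHub State Contract table row. Under the CLAUDE.md / AGENTS.md rules, a behavior change must come with both a behavior test and a source-regression test. Governance/process rules must also land in the repository together with their mechanical verification.

Suggested minimal fix: add a source-regression test that reads skills/codex-refactor-loop/SKILL.md and asserts the PR review comment fix row and its reply/resolve-or-escalate semantics. If controller/comment handling is already meant to enforce this contract, also add the corresponding behavior test.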

📎 Full raw codex output (archived for reference)

---
pr: 402
role: architect
verdict: reject
---

## Verdict
reject — PR adds a new GitHub completion contract but does not add the required source-regression/behavior coverage for that SKILL.md contract change.

## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds a new required visible state contract: `PR review comment fix | Completion includes review-thread closure...`. Meanwhile, `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:48-128` only locks refactor-comment marker normalization prompt text. It has no assertion for `PR review comment fix`, `review-thread closure`, or the GitHub State Contract table. This violates CLAUDE.md / AGENTS.md: `Behavior changes must come with a **behavior test** (asserting the behavior itself) + a **source-regression test** (literal assertions on SKILL.md section headings, narrow allowlist literals, authorization-source paths, etc.), to prevent "docs changed but implementation didn't" or vice versa.`
- The same gap also violates the governance coupling rule in CLAUDE.md / AGENTS.md: `Governance first: architectural / process rules land in the repository together with their corresponding mechanical verification; neither may be missing.`

## What would change your verdict
Add mechanical coverage for the new `GitHub State Contract` row. Minimum acceptable fix: a source-regression test that reads `skills/codex-refactor-loop/SKILL.md` and asserts the `PR review comment fix` row plus the closure/escalation wording. If this is intended to drive actual GitHub thread reply/resolve behavior, add the matching behavior test or wire it into the existing controller/comment handling tests so the new completion definition is executable, not prose-only.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:architect:reject

⟦AI:AUTO-LOOP⟧

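…rule (closing the review-gate gap). Responds to the PR#402 review-gate, where architect and tests rejected because the new GitHub State Contract completion rule lacked mechanical coverage. test_refactor_comment_policy_prompt_contract.py gains source-regression tests that lock the new SKILL.md rule text and the prompt changes it requires: review-comment fixes must reply to/close the thread or escalate, markers are normalized, and format conflicts go through retry-fix. Only locking tests are added; louis4li's product logic is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>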

loning · 2026-06-01T12:33:39Z

🤖 tests review r2: approve

TL;DR

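What this is: the second-round review of PR 402 from the tests perspective.
Verdict: approve. The prompt-contract changes now have matching source-regression coverage.
Next step: the controller can include this round's tests verdict in the review-gate summary.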

Details

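This diff mainly changes the contract text in SKILL.md and several prompts, with a matching extension of test_refactor_comment_policy_prompt_contract.py. The new tests cover three key behaviors:
- A PR review comment fix must close the original review thread.
- Under self-doc-comment, marker identity must keep the Refactor (iter${ITERATION}/${CLUSTER_ID}) form.
- Deterministic marker normalization must go through retry-fix rather than human escalation.

I checked the diff for test-quality risks. It adds no sleep/delay pacing and no [Skip] or manual categorization, and it does not weaken any existing assertion. Target test run: python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py, result Ran 7 tests in 0.002s, OK.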

📎 Full raw codex output (archived for reference)

---
pr: 402
role: tests
verdict: approve
---

## Verdict
Test coverage and test quality are adequate for this prompt-contract diff.

## Evidence
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25` adds a source-regression assertion for the new `GitHub State Contract` row requiring PR review-thread closure or explicit escalation.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:63` extends the existing `self-doc-comment` contract test to lock canonical `Refactor (iter${ITERATION}/${CLUSTER_ID})` marker identity and the fix-codex non-human-decision route.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:137` adds focused coverage for deterministic marker normalization routing to `META_RESOLVED:retry-fix`, which covers the new `meta-reflector-stalled.md` behavior text.
- The diff adds no `sleep`/`delay`, no skip/manual-test marker, and no weakened assertions; the changed test remains source-regression style, matching this repository's prompt-contract test convention.
- Verification run: `python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py` passed, `Ran 7 tests in 0.002s`.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:approve

⟦AI:AUTO-LOOP⟧

loning · 2026-06-01T12:34:09Z

🤖 tests review r2: approve (corrected summary)

TL;DR

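What this is: the second-round review of PR 402 from the tests perspective.
Verdict: approve. The prompt-contract changes now have matching source-regression coverage.
What the maintainer needs to do: nothing. The controller can include this verdict in the review-gate summary.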

Details

The raw artifact in the previous comment is complete, but shell expansion swallowed the inline code in its body. This is a clean version.
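The thread does not say how the comment body was posted. One likely cause of the lost inline code is a body embedded in a double-quoted shell string. In that case backticks run as command substitution and `${ITERATION}` expands to an empty string. The sketch below avoids the shell, assuming the GitHub CLI posts comments, with `gh pr comment --body-file -` reading the body from stdin. The helper names are hypothetical.

```python
import shlex
import subprocess


def comment_argv(pr: int) -> list[str]:
    # The body is read from stdin ("-"), so it never passes through a shell.
    return ["gh", "pr", "comment", str(pr), "--body-file", "-"]


def shell_safe_command(pr: int, body: str) -> str:
    """If a shell string is unavoidable, quote the body as one literal word."""
    return " ".join(["gh", "pr", "comment", str(pr), "--body", shlex.quote(body)])


def post_comment(pr: int, body: str) -> None:
    # No shell=True: backticks and ${...} in the body stay literal.
    subprocess.run(comment_argv(pr), input=body, text=True, check=True)
```

Passing an argument list with stdin is the sturdier choice. `shlex.quote` only helps when the command must be a single string for a POSIX shell.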

This diff mainly changes the contract text in SKILL.md and several prompts, with a matching extension of test_refactor_comment_policy_prompt_contract.py. The new tests cover three key behaviors:
- A PR review comment fix must close the original review thread.
- Under self-doc-comment, marker identity must keep the Refactor (iter${ITERATION}/${CLUSTER_ID}) form.
- Deterministic marker normalization must go through META_RESOLVED:retry-fix, not human escalation.

I checked the diff for test-quality risks. It adds no sleep/delay pacing and no [Skip] or manual categorization, and it does not weaken any existing assertion. Target test run: python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py, result Ran 7 tests in 0.002s, OK.


⟦AI:AUTO-LOOP⟧

loning · 2026-06-01T12:34:12Z

🤖 Architect review: blocked, the new completion contract lacks a behavior guard

TL;DR

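What this is: the architect re-review of PR 402, checking the CLAUDE.md/AGENTS.md architectural constraints.
Verdict: reject. SKILL.md adds a PR review thread completion rule, but only text assertions currently back it.
Next step: implement codex needs to add actual completion enforcement and a behavior test, and keep the existing source-regression test.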

Details

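The new rule is a behavior contract. A fix driven by a review comment is not complete just because FIX_DONE was emitted locally or the next round of reviewers passed. The fix must also reply to and resolve the original GitHub review thread, or be explicitly escalated. Right now the diff only asserts, in test_refactor_comment_policy_prompt_contract.py, that SKILL.md contains this sentence. That assertion guards against the documentation disappearing. It does not show that the controller/fix-completion path actually checks GitHub thread closure.

For behavior changes like this, CLAUDE.md requires both a behavior test and a source-regression test. The source-regression test is in place. Two things are still missing: a behavior test that fails when the original thread has not been replied to and resolved and there is no escalation evidence, and the execution path that test would exercise.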
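A sketch of the behavior gate requested here: the controller's "done" transition checks GitHub-side evidence rather than FIX_DONE alone. `ThreadEvidence`, `IncompleteReviewCommentFix`, and both functions are hypothetical names. The sketch reads the contract as requiring `fix_done` in both branches, a point the thread does not settle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThreadEvidence:
    """Hypothetical record of what happened to the originating review thread."""
    replied: bool = False
    resolved: bool = False
    escalation_ref: Optional[str] = None  # e.g. a link to an escalation comment or issue


class IncompleteReviewCommentFix(Exception):
    pass


def review_comment_fix_complete(fix_done: bool, thread: ThreadEvidence) -> bool:
    """Complete only when the fix is done AND the original thread was replied to
    and resolved, or there is explicit escalation evidence."""
    closed = thread.replied and thread.resolved
    escalated = bool(thread.escalation_ref)
    return fix_done and (closed or escalated)


def finish_review_comment_fix(fix_done: bool, thread: ThreadEvidence) -> None:
    # Gate the "done" transition on GitHub-side evidence, not on FIX_DONE alone.
    if not review_comment_fix_complete(fix_done, thread):
        raise IncompleteReviewCommentFix(
            "review thread not replied to and resolved, and no escalation evidence"
        )
```

A behavior test for this gate needs to show two things. `finish_review_comment_fix(True, ThreadEvidence())` must raise. Replied+resolved, or an escalation reference, must let the call pass.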

📎 Full raw codex output (archived for reference)

---
pr: 402
role: architect
verdict: reject
---

## Verdict
Reject: the PR adds a new review-thread completion contract, but only pins the wording with source-regression assertions and does not add behavior coverage for the new required GitHub-side completion step.

## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds a new required state transition: a "PR review comment fix" is incomplete until the original thread is replied to and resolved, or explicitly escalated. This is a behavior contract, not just vocabulary. CLAUDE.md requires: "Behavior changes must come with a **behavior test** (asserting the behavior itself) + a **source-regression test** (literal assertions on SKILL.md section headings, narrow allowlist literals, authorization-source paths, etc.), to prevent 'docs changed but implementation didn't' or vice versa."
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25` only adds source-regression checks that the `SKILL.md` sentence exists. I found no behavior test or implementation path verifying that a PR-review-comment-driven fix cannot count as complete until the original GitHub review thread is replied to and resolved, or explicitly escalated. CLAUDE.md also requires: "**Changes must be verifiable**: behavior constraints must be backed by mechanical verification (behavior test / source-regression test / section lint); constraints carried only by 'the agent should remember' are treated as not in place."

## What would change your verdict
Add the missing behavior enforcement and behavior test for the new completion contract. Concretely: make the controller/fix-completion path record or check the original PR review thread reply+resolution or explicit escalation before treating a PR review comment fix as complete, and add a behavior test that fails when a fix completes without that GitHub thread closure/escalation evidence. Keep the existing source-regression assertions as the documentation guard.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:architect:reject

⟦AI:AUTO-LOOP⟧

loning · 2026-06-01T16:38:44Z

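📊 Current status: review complete, waiting for the author to update (no controller intervention needed)

Dimension	Value
Phase	reviewing (automatic review completed)
Automatic review verdict	architect: reject (requests a behavior test / source-regression test for the new GitHub closure completion contract; see the architect review comment above) · tests: approve
PR source	fork (`louis4li:fix/pr-review-thread-completion-gate`), CI all green, base `dev`
Why the loop does not dispatch a fix automatically	The PR head is on a fork branch, and the auto-loop cannot push fixes to the fork. Adding the behavior guard requires the author to update the fork, or a maintainer to decide to take the PR over onto a loop-owned branch.
Human intervention needed?	Optional. Once the author adds the behavior/source-regression tests per the architect feedback and pushes to this branch, re-review runs automatically. Alternatively, a maintainer can order a takeover.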

Added crnd:lifecycle:managed to bring this PR under comment monitoring. After the author updates the head, the next wakeup automatically re-dispatches the 3 reviewers.
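The banner's reason for not auto-dispatching a fix depends on where the PR head lives. The sketch below shows one way a controller could detect that from the pull-request object returned by GitHub's REST API (`GET /repos/{owner}/{repo}/pulls/{pull_number}`). The dispatch labels it returns are hypothetical, not the controller's real states.

```python
from typing import Any, Mapping


def is_fork_pr(pr: Mapping[str, Any]) -> bool:
    """True when the PR head lives in a different repository than its base."""
    head_repo = pr["head"].get("repo")
    if head_repo is None:
        # GitHub reports a null head repo when the fork was deleted; nothing to push to.
        return True
    return head_repo["full_name"] != pr["base"]["repo"]["full_name"]


def fix_dispatch_decision(pr: Mapping[str, Any]) -> str:
    # Hypothetical routing labels for illustration only.
    if is_fork_pr(pr):
        return "wait-for-author-or-maintainer-takeover"
    return "dispatch-fix"
```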

🤖 controller status banner

⟦AI:AUTO-LOOP⟧

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

louis4li · 2026-06-02T06:56:02Z

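Updated the PR head per the latest architect review feedback: cfc25723eb1b2b5cc9a06c878ba40abcf82ed1eb.

This update adds a behavior gate for the PR review comment fix completion contract, so the contract no longer relies only on SKILL.md/source-regression text:

Added behavior enforcement / guard coverage for review-thread closure;
Kept the source-regression coverage alongside it;
Corresponding commit: cfc25723 fix(skill): add behavior gate for review-thread closure.

Please have the auto-loop include the architect verdict again in the next wakeup/re-review.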

⟦AI:AUTO-LOOP⟧

louis4li and others added 2 commits June 1, 2026 15:29

louis4li force-pushed the fix/pr-review-thread-completion-gate branch from 4d5aad1 to aaaf4be June 1, 2026 11:02

loning added the crnd:lifecycle:managed (loop-managed item), crnd:phase:fixing (fix codex in flight), crnd:human:auto (auto-advancing, no human needed), and crnd:phase:reviewing (review-gate reviewers in flight) labels and removed the crnd:phase:fixing (fix codex in flight) label Jun 1, 2026

fix(skill): add behavior gate for review-thread closure

cfc2572

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

louis4li wants to merge 4 commits into ChronoAIProject:dev from louis4li:fix/pr-review-thread-completion-gate
