refactor(runtime): P7 收敛 before_completion_decision 到 AcceptanceService 单一裁决 (#495) by Cai-Tang-www · Pull Request #546 · 1024XEngineer/neo-code

Cai-Tang-www · 2026-05-03T14:22:59Z

关联 Issue

Closes [P7] 将 Acceptance / Verification 收敛为 internal hook 主链闭环 #495

实现语义（与代码一致）

本 PR 将 final acceptance 收敛到 before_completion_decision 专用编排阶段：
先收集 user/repo hook signals，再由 runtime 内部 AcceptanceService 执行 completion gate + verification gate + decider 聚合，最终输出唯一 AcceptanceDecision。

变更摘要

before_completion_decision 编排收敛

顺序固定：user/repo signals -> ordinary internal observe -> AcceptanceService decision。
注释已明确：AcceptanceDecision 走强类型 runtime 内部路径，不通过通用 HookResult metadata 承载。

P7 主链单路径标识

beforeAcceptFinalLegacy 标注为 Deprecated（rollback/对照用，主链不调用）。
mapDeciderDecisionToAcceptance 标注为 Deprecated（仅 legacy 回滚对照/测试使用）。

completion + verification 双 gate

accepted 仅在 completion_passed && verification_passed 时成立。
completion/verification 任一未通过均不得 accepted。

error_class 兜底统一

AcceptanceService 对关键 stop reason 做稳定 error_class 补齐，避免 TUI/Gateway 落入空/unknown 推断歧义。
覆盖 verification_config_missing / execution_denied / execution_error / required_todo_failed / no_progress_after_final_intercept / verification_failed 相关分支。

测试

新增/增强 internal/runtime/acceptance_service_test.go：

hooks on/off parity（accepted/continue 场景终态一致）
双 gate 约束（两种单侧失败 + 双侧通过）
chat answer（“你好”）不会被 verification 误拦
no-progress 达阈值 -> incomplete + no_progress_after_final_intercept
error_class 覆盖分支

并验证：

go test ./internal/runtime -run "Test(BeforeCompletionDecisionAcceptanceHooksOnOffParity|AcceptanceDecisionRequiresCompletionAndVerification|BeforeCompletionDecisionUserRepoCannotDirectlyTerminal|ChatAnswerAcceptancePassesWithoutHeavyVerification|NoProgressThresholdProducesIncomplete|VerificationFailureProducesStopReasonAndErrorClass|NormalizeAcceptanceErrorClassCoverage)" -count=1
go test ./internal/runtime/decider ./internal/runtime/acceptance ./internal/runtime/verify ./internal/runtime/hooks -count=1

非本 PR 范围

provider 重构
external hook kinds
user/repo async
TUI 大改
FinalDecider 大拆分

…epo-first flow

…ce decision

…onsumption

…string matching

…ingle source

… event trace

…ss fallback

…ass coverage

chatgpt-codex-connector · 2026-05-03T14:23:07Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

codecov · 2026-05-03T14:25:58Z

Codecov Report

❌ Patch coverage is 80.48246% with 89 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
internal/runtime/acceptance_service.go	71.36%	46 Missing and 15 partials ⚠️
internal/runtime/final_acceptance.go	88.11%	10 Missing and 2 partials ⚠️
internal/runtime/decider/decide.go	33.33%	9 Missing and 1 partial ⚠️
internal/runtime/before_completion_orchestrator.go	93.54%	3 Missing and 3 partials ⚠️

📢 Thoughts on this report? Let us know!

fennoai

Found 3 behavioral regressions in the new acceptance path.

… acceptance test

…ing, max-turn signal, and decider projection

pionxe · 2026-05-04T06:36:56Z

@xgopilot

Role: 高级架构级 Code Reviewer & 技术布道师

Context

你现在需要 Review 一份关于核心运行时（Runtime）状态机重构的 GitHub PR。该系统采用了 Clean Architecture 的设计思想，包含 Gateway、TUI 和 Runtime 等层级。

Tasks

请你仔细阅读提供的 PR 描述和代码 Diff，并严格按照以下结构输出你的 Review 意见：

第一部分：通俗易懂的逻辑白话翻译 (Plain English Summary)

不要使用过多的内部黑话。请使用生动、易懂的比喻（例如：把 before_completion_decision 比作“出厂前的最后一道安检”），向非核心业务开发者解释清楚这个 PR 到底改了什么，解决了什么痛点，以及重构后的数据流向是怎样的。

第二部分：深度风险分析 (Deep Risk Analysis)

请切换到资深架构师的视角，针对以下三个核心维度，分析代码中可能存在的风险、漏洞或边界遗漏，并给出排查建议：

1. 契约与架构边界风险 (Architecture & Contracts)

审查重点：强类型 AcceptanceDecision 的引入是否彻底？编排流转（signals -> observe -> decision）的顺序是否绝对安全？
寻找风险：是否存在被提前 return 或 panic 打断导致流转短路（Bypass）的可能？旧的松散契约（如 HookResult metadata）是否还有残留依赖？各层级（特别是 Gateway/TUI）之间的防腐层边界是否被破坏？

2. 核心业务逻辑风险 (Core Logic & Gates)

审查重点：completion 与 verification 双 Gate 的布尔逻辑和执行开销，以及废弃逻辑的隔离。
寻找风险：当 completion 失败时，系统是否还会继续执行高消耗的 verification 逻辑（是否缺少提前短路）？被标记为 Deprecated 的遗留路径（如 beforeAcceptFinalLegacy）是否真的从主干执行链路上被彻底移除了？

3. 异常处理与兜底风险 (Error Handling & Fallbacks)

审查重点：AcceptanceService 对 stop reason 和 error_class 的补齐是否完备。
寻找风险：列出的 6 种 error 分支是否已穷举了当前的已知状态？如果未来新增了一种异常，或者出现未捕获的系统级错误，现在的 switch/case 是否有可靠的 default 兜底（Fallback）机制？TUI 或 Gateway 层是否会因为收到 nil 或未知的 error_class 而发生崩溃或渲染空白？

Format Constraints

语气：专业、客观、具有建设性，不要只有批评，也要肯定优秀的重构点。
格式：使用清晰的 Markdown 排版，重点风险使用加粗标出。

fennoai · 2026-05-04T06:37:06Z

结论先行

我这轮 Review 看到 3 个值得优先处理的风险点：

verification 仍然先于部分可提前判定的 decider 终态执行，存在额外开销和副作用风险。
位置：internal/runtime/acceptance_service.go:49-71，对照 internal/runtime/decider/decide.go:26-46。
现在 acceptanceService.Decide() 一上来就跑 runVerificationGate()，之后才进入 decider.Decide()。这意味着像 required_todo_failed、no_progress_after_final_intercept 这类其实只看 todo/progress 就能直接终止的场景，仍会先跑一轮 verifier。对于 command_success 一类 verifier，这不只是性能问题，还可能引入额外 shell 执行、副作用和观测噪音。建议把“纯状态即可判死刑/判 incomplete”的 precheck 前移，至少先短路 required failed、no progress exceeded 这两类路径。
error_class 的兜底归一化在 verification 失败分支里丢失了事实上下文，部分场景会被错误降级成 unknown。
位置：internal/runtime/acceptance_service.go:233 与 internal/runtime/acceptance_service.go:257-271。
mergeVerificationFailure() 最后调用 normalizeAcceptanceErrorClass(out.ErrorClass, acceptanceServiceInput{}, out)，这里传的是空 input。但 normalizeAcceptanceErrorClass() 在 verification_failed 分支里，明明依赖 input.TaskKind 和 input.Facts.Errors.ToolErrors 去补 subagent_failed、filesystem_write_file 等稳定分类。结果是：一旦 verifier 本身没给出 ErrorClass，这里就再也拿不到运行态 facts，只能回落到 unknown。这和 PR 描述里“统一补齐稳定 error_class”是有偏差的。建议把原始 input 继续传进 mergeVerificationFailure()，或者把需要的 facts 预先拷进中间结构。
user/repo hook guard 目前只有“被记录”，没有真正参与 acceptance 判定，契约语义仍偏弱。
位置：internal/runtime/before_completion_orchestrator.go:71-89、internal/runtime/decider/decide.go:615-631、测试 internal/runtime/acceptance_service_test.go:174-218。
从实现看，before_completion_decision 阶段确实先收集了 user/repo signals，但这些 signals 最终只是在 finalizeDecision() 里追加了一段 hook signals consumed (...) 到 InternalSummary。也就是说，“guard” 现在是观测语义，不是决策语义；甚至测试还明确要求 user guard 失败时依然可以 accepted。如果这是刻意设计，建议把命名从 GuardSignal 再收敛一下，避免后续维护者误以为它具备 veto 能力；如果未来想把 repo/user policy 真正接入 final acceptance，这里会成为一个很隐蔽的假接线点。

第一部分：逻辑白话翻译

这个 PR 做的事，可以把它理解成把“模型准备交卷”之前的最后流程，从“几个保安各自判断、口头传话”改成“一道固定顺序的总安检”。

以前的风险在于：completion、verification、final decider、hooks 这些信号虽然都在参与“到底能不能收尾”，但决策链比较分散，谁先说了算、信息怎么传，有不少隐式约定。现在作者把它们收敛到 before_completion_decision 这个专门阶段，相当于把所有材料先送到安检台：

先收 user/repo hooks 的意见，像是先看“外部观察员有没有提醒或警示”。
再跑 internal hooks，主要做观测记录，不直接抢裁决权。
最后把 completion gate、verification gate、todo/facts/progress、hook signals 一起交给 AcceptanceService，由它产出唯一的 AcceptanceDecision。

所以重构后的数据流更像：
模型说“我做完了” -> before_completion_decision 安检台收材料 -> AcceptanceService 给出唯一 verdict -> Runtime 再决定是 accepted / continue / incomplete / failed。

这个改动解决的核心痛点，是把“最终收尾到底是谁拍板”从散点逻辑收回到一个地方，后面 TUI、Gateway、运行态快照都只需要认这一份结构化结论，而不是自己猜 metadata、猜 stop reason、猜 error_class。

第二部分：深度风险分析

1. 契约与架构边界风险

强类型 AcceptanceDecision 的主干路径基本收拢了，但 hook guard 契约还没有真正“收口到行为”。 现在 AcceptanceDecision 不再经由通用 HookResult metadata 往外飘，这一点是对的；不过 HookGuardSignal 仍只被写入 InternalSummary，没有参与状态机分支。建议明确文档语义：它到底是“注释型 signal”还是“准入型 guard”。
顺序层面，signals -> observe -> decision 在编排上是固定的。 runBeforeCompletionDecisionAcceptance() 先 user/repo，再 internal，最后 beforeAcceptFinal()，这条链路本身没有看到被提前 return 打断的路径。hook panic/timeout 也会在 executor 内部转成 HookResultFailed，不至于把 runtime 直接打穿。
旧松散契约的残影还在，但主链没有继续依赖它。 HookResult.Metadata.GuardSignal 仍然存在，不过它只在 runtime 内部从 hook executor 传到 orchestrator，用来抽取 signal；AcceptanceDecision 本身没有再通过 metadata 透传。这块不算回退，但建议后续继续清理命名和边界，避免“看起来还是 metadata 驱动”。

2. 核心业务逻辑风险

最重要的逻辑风险就是 verification 没有对“明显终态”做提前短路。 这是本轮最实在的问题。当前顺序不是 decider precheck -> verification -> final aggregate，而是 verification -> decider。于是 required todo 已 failed、或者 no-progress 已达阈值时，还是会先跑 verifier。建议把 cheap precheck 提前，不要让高成本 verifier 为注定终止的回合买单。
双 gate 的布尔约束本身是成立的。 accepted 只有在 completionPassed && verificationPassed && deciderAccepted 时才会成立，这部分逻辑清楚，相关测试也覆盖到了。
legacy 路径从主链执行上看已经拿掉了。 run.go 现在走的是 runBeforeCompletionDecisionAcceptance() -> beforeAcceptFinal()，没有再看到主链调用 beforeAcceptFinalLegacy()。这部分隔离是有效的。

3. 异常处理与兜底风险

error_class 的总体设计方向是对的，但 verification 失败分支的上下文丢失会削弱兜底质量。 现在 6 类显式 stop reason 有稳定映射，未知失败也有 unknown 兜底，所以 TUI/Gateway 至少不至于收到空值直接发懵；但一旦 verifier 自身不给分类，原本可以从 runtime facts 推导出来的 subagent_failed / permission_denied 会被吃掉。
未来新增 stop reason 时，当前 default 兜底还能保底“不为空”，但不一定“够准”。 normalizeAcceptanceErrorClass() 最后的 fallback 会把 failed/incomplete 统一打成 unknown，这能防止客户端空白渲染，但会让运维诊断粒度下降。后续如果新增新的 verification stop reason，建议同步补 switch，不要只依赖最后的 unknown。
从客户端健壮性角度看，这次比之前安全。 至少当前实现里，failed/incomplete 大多能拿到非空 error_class；真正需要补的不是“有没有值”，而是“值是否保留了足够多的运行时语义”。

补充

我本轮没有做代码修改，只做了读码和定向验证。补跑通过的定向测试为：
go test ./internal/runtime -run 'Test(BeforeCompletionDecisionAcceptanceHooksOnOffParity|AcceptanceDecisionRequiresCompletionAndVerification|VerificationFailureProducesStopReasonAndErrorClass|NormalizeAcceptanceErrorClassCoverage|NoProgressThresholdProducesIncomplete|BeforeCompletionDecisionUserRepoCannotDirectlyTerminal)' -count=1

View job run

Cai-Tang-www · 2026-05-04T06:54:35Z

这些问题都不是P0问题，我想可以延后，都不是什么吹毛求疵的问题，只是为了后续兜底和兼容，我想我会在p6引入的时候顺手修了

Cai-Tang-www · 2026-05-04T06:55:17Z

GuardSignal 仅观测不参与裁决这些是有意为之，收敛本Pr范围

Cai-Tang-www added 8 commits May 3, 2026 16:36

refactor(runtime): orchestrate before_completion_decision with user/r…

f161e5d

…epo-first flow

refactor(runtime): feed completion hook signals into unified acceptan…

991f887

…ce decision

test(runtime): cover completion-hook orchestration order and signal c…

7573985

…onsumption

refactor(runtime): structure before_completion guard signals without …

f97a656

…string matching

refactor(runtime): route final acceptance through AcceptanceService s…

ed3bd38

…ingle source

test(runtime): cover acceptance orchestration parity and verification…

a8c172f

… event trace

fix(runtime): clarify p7 acceptance orchestration and unify error-cla…

acb9a43

…ss fallback

test(runtime): add chat-answer, hooks parity, and acceptance error-cl…

47efdbc

…ass coverage

fennoai Bot reviewed May 3, 2026

View reviewed changes

Comment thread internal/runtime/acceptance_service.go Outdated

Comment thread internal/runtime/final_acceptance.go Outdated

Comment thread internal/runtime/final_acceptance.go

Cai-Tang-www added 2 commits May 3, 2026 22:29

chore(runtime): finalize p7 comments, legacy markers, and no-progress…

9166ca9

… acceptance test

fix(runtime): address p7 acceptance regressions in verification order…

94afcfb

…ing, max-turn signal, and decider projection

pionxe merged commit 88b3347 into 1024XEngineer:main May 4, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(runtime): P7 收敛 before_completion_decision 到 AcceptanceService 单一裁决 (#495)#546

refactor(runtime): P7 收敛 before_completion_decision 到 AcceptanceService 单一裁决 (#495)#546
pionxe merged 10 commits into1024XEngineer:mainfrom
Cai-Tang-www:feat/hook-p7

Cai-Tang-www commented May 3, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot commented May 3, 2026

Uh oh!

codecov Bot commented May 3, 2026 •

edited

Loading

Uh oh!

fennoai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pionxe commented May 4, 2026

Uh oh!

fennoai Bot commented May 4, 2026 •

edited

Loading

Uh oh!

Cai-Tang-www commented May 4, 2026

Uh oh!

Cai-Tang-www commented May 4, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Cai-Tang-www commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

关联 Issue

实现语义（与代码一致）

变更摘要

测试

非本 PR 范围

Uh oh!

chatgpt-codex-connector Bot commented May 3, 2026

Uh oh!

codecov Bot commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

fennoai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pionxe commented May 4, 2026

Role: 高级架构级 Code Reviewer & 技术布道师

Context

Tasks

第一部分：通俗易懂的逻辑白话翻译 (Plain English Summary)

第二部分：深度风险分析 (Deep Risk Analysis)

Format Constraints

Uh oh!

fennoai Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

结论先行

第一部分：逻辑白话翻译

第二部分：深度风险分析

1. 契约与架构边界风险

2. 核心业务逻辑风险

3. 异常处理与兜底风险

补充

Uh oh!

Cai-Tang-www commented May 4, 2026

Uh oh!

Cai-Tang-www commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Cai-Tang-www commented May 3, 2026 •

edited

Loading

codecov Bot commented May 3, 2026 •

edited

Loading

fennoai Bot commented May 4, 2026 •

edited

Loading

Cai-Tang-www commented May 4, 2026 •

edited

Loading