Skip to content

cluster-024: ChatStreamAsync as the only AI executor#687

Merged
loning merged 5 commits into
auto-refact-devfrom
refactor/iter15-cluster-024-chatasync
May 19, 2026
Merged

cluster-024: ChatStreamAsync as the only AI executor#687
loning merged 5 commits into
auto-refact-devfrom
refactor/iter15-cluster-024-chatasync

Conversation

@loning
Copy link
Copy Markdown
Contributor

@loning loning commented May 19, 2026

Summary

iter15 cluster-024 (high severity, AG-AI-STREAM-01).

  • Old: ChatRuntime/ToolCallLoop/AIGAgentBase/Classifier/Studio Generate/ContextCompressor kept non-streaming ChatAsync paths invoking provider.ChatAsync directly.
  • New: ChatStreamAsync is the only authoritative AI executor; offline text aggregation consumes the stream as an explicit adapter; ILLMProvider.ChatAsync remains provider-boundary only with forbidding-comment for upper layers.

Violated AGENTS.md mandate "AI 对话主链必须流式化".

Scope

12 files changed (+242/-111). Targeted ChatbotClassifier / ContextCompressor / ToolCallLoop tests pass. architecture_guards + test_stability_guards green.

See implement-cluster-024 summary and iter15 audit.

Stacked-PR

Part of iter15 batch A (with #PR-028 and #PR-029). All target the iter1-14 integration branch refactor/2026-05-19_auto-refactor-trial which has rollup PR #678 to dev.

🤖 Auto-loop produced by codex-refactor-loop skill

Refactor (iter15/cluster-024):
- Old pattern: ChatRuntime/ToolCallLoop/AIGAgentBase/Classifier/Studio
  Generate/ContextCompressor kept a non-streaming ChatAsync surface that
  directly invoked provider.ChatAsync. Violated AGENTS.md mandate
  "ChatStreamAsync 必须是 AI 对话主链唯一权威执行入口;不得用于 CLI /
  AGUI / Scope Service / NyxID Chat / Workflow Chat 等任何面向用户的
  实时会话入口".
- New principle: ChatStreamAsync is the only authoritative AI executor;
  offline text aggregation consumes the streaming path as an explicit
  adapter; ILLMProvider.ChatAsync remains provider-boundary only with a
  comment forbidding application/host re-use.

12 scoped files staged. Verified: 4+ ChatbotClassifier tests pass,
architecture_guards + test_stability_guards green.

Co-Authored-By: codex (gpt-5) <noreply@openai.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.48%. Comparing base (601d92b) to head (ee0b88b).
⚠️ Report is 15 commits behind head on auto-refact-dev.

@@                 Coverage Diff                 @@
##           auto-refact-dev     #687      +/-   ##
===================================================
+ Coverage            82.42%   82.48%   +0.06%     
===================================================
  Files                  938      939       +1     
  Lines                59753    59735      -18     
  Branches              7831     7837       +6     
===================================================
+ Hits                 49251    49275      +24     
+ Misses                7128     7086      -42     
  Partials              3374     3374              
Flag Coverage Δ
ci 82.48% <100.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...vatar.AI.Abstractions/LLMProviders/ILLMProvider.cs 100.00% <ø> (ø)
src/Aevatar.AI.Core/AIGAgentBase.cs 84.35% <ø> (+2.46%) ⬆️
src/Aevatar.AI.Core/Chat/ChatRuntime.cs 90.41% <100.00%> (-0.03%) ⬇️
...evatar.AI.Core/Chat/ChatStreamContentAggregator.cs 100.00% <100.00%> (ø)
src/Aevatar.AI.Core/Chat/ContextCompressor.cs 87.50% <100.00%> (+0.15%) ⬆️
src/Aevatar.AI.Core/Tools/ToolCallLoop.cs 89.46% <100.00%> (+0.24%) ⬆️

... and 6 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@loning loning changed the base branch from refactor/2026-05-19_auto-refactor-trial to dev May 19, 2026 07:28
@loning
Copy link
Copy Markdown
Contributor Author

loning commented May 19, 2026

🤖 Multi-codex review (v2, round 1) — Phase 8

Verdicts | 三组独立 codex 评审结论

Reviewer Verdict
Architect (CLAUDE/AGENTS clause compliance) comment
Tests (coverage + quality) reject
Quality (readability + simplicity) comment

Not unanimous approve → enter fix-retry loop (per Phase 8 policy, max 3 rounds before human escalation).


English

The 3 reviewer codex ran independently against origin/dev...origin/refactor/iter15-cluster-024-chatasync (three-dot diff = symmetric difference from merge-base, matching what GitHub PR shows).

  • Architect (comment): no CLAUDE/AGENTS clause violation; cluster ChatStreamAsync refactor is compliant. Comments are non-blocking observations.
  • Quality (comment): refactor self-doc comments present; suggests extracting the repeated StringBuilder + await foreach ChatStreamAsync aggregation into a small named helper (now appears in ≥5 production sites). Not blocking.
  • Tests (reject): ScriptGenerateGAgent.GenerateAsync and WorkflowGenerateGAgent.GenerateAsync changed from ChatAsync(...) to GenerateWithReasoningAsync(...) but no matching ScriptGenerateGAgentTests.cs or WorkflowGenerateGAgentTests.cs exercises the new path. Demand: provider test double whose ChatAsync throws and whose ChatStreamAsync yields split chunks; assert concatenated stream result.

Next action (automatic): fix codex round 1 dispatched to add the missing tests on a worktree off this PR's HEAD. Round result will be posted as a follow-up comment.

Full local artifacts (logs, reviewer outputs, fix report): .refactor-loop/runs/review-pr687-*.md, .refactor-loop/logs/v2-review-pr687-*.log.


中文

3 个独立 reviewer codex 在 origin/dev...origin/refactor/iter15-cluster-024-chatasync 三点 diff (= 从 merge-base 起的对称差,与 GitHub PR 视图一致) 上跑出结论。

  • Architect (comment): 没有 CLAUDE/AGENTS 条款违反,cluster ChatStreamAsync 重构符合规范。所有 comment 都是非阻塞观察。
  • Quality (comment): refactor 自文档注释齐全;建议把重复的 StringBuilder + await foreach ChatStreamAsync 聚合代码 (现已在生产代码 ≥5 处出现) 抽成小的命名 helper。不阻塞合并。
  • Tests (reject): ScriptGenerateGAgent.GenerateAsyncWorkflowGenerateGAgent.GenerateAsyncChatAsync(...) 改为 GenerateWithReasoningAsync(...),但没有对应的 ScriptGenerateGAgentTests.csWorkflowGenerateGAgentTests.cs 覆盖新路径。要求:用 provider test double,其 ChatAsync 抛异常、ChatStreamAsync 按片段 yield,然后断言返回内容是拼接结果。

下一步 (自动): fix codex round 1 已 dispatch 到本 PR HEAD 的 worktree 上,补缺失测试。round 结果会作为后续评论 post。

本地完整产物(日志、reviewer 输出、fix 报告): .refactor-loop/runs/review-pr687-*.md, .refactor-loop/logs/v2-review-pr687-*.log

@loning loning added the auto-loop-fixing Phase 8 fix codex round in flight (AI iterating to consensus) label May 19, 2026
Applied 4 fixes (FIX_DONE:687:round-1:applied-4:rejected-0:blocked-0):
- (A) architect: replace one-line note on ILLMProvider.ChatAsync with the
  required Refactor (iter15/cluster-024) Old/New comment, clarifying
  provider ChatAsync is boundary compatibility only; formal entrypoints
  call ChatStreamAsync
- (A) quality: add cluster rationale to spawn-codex.sh strict-mode
  empty-array guard (why this script touch belongs in cluster-024)
- (B) quality SCOPE_EXTEND: extract repeated StringBuilder+ChatStreamAsync
  aggregation into new ChatStreamContentAggregator helper (Aevatar.AI.Core).
  Route classifier, ChatRuntime, ContextCompressor, ToolCallLoop, and
  Studio GenerateAsync through it. Reasoning-callback loops preserved
  as exceptional path
- (B) tests SCOPE_EXTEND: add StudioGenerateGAgentStreamingTests proving
  ScriptGenerateGAgent + WorkflowGenerateGAgent GenerateAsync concatenate
  split ChatStreamAsync chunks, never call provider-boundary ChatAsync,
  return empty string on empty streams

Build pass; targeted tests pass (3 Studio + 91 AI Core stream-touching);
test_stability_guards pass; diff --check pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@loning
Copy link
Copy Markdown
Contributor Author

loning commented May 19, 2026

🤖 Fix codex round 1 — FIX_DONE:687:round-1:applied-4:rejected-0:blocked-0

Pushed: 0ceb4812

English

Fix codex addressed reviewer demands AND took quality reviewer's helper-extraction suggestion (B SCOPE_EXTEND):

  • (A) Architect comment: replaced one-line note on ILLMProvider.ChatAsync with the required // Refactor (iter15/cluster-024): Old/New comment, clarifying provider ChatAsync is boundary compatibility only; formal entrypoints call ChatStreamAsync.
  • (A) Quality comment: added cluster rationale to spawn-codex.sh strict-mode empty-array guard (explains why this script touch is in cluster-024's scope).
  • (B SCOPE_EXTEND) Quality comment: extracted repeated StringBuilder + await foreach ChatStreamAsync aggregation into new Aevatar.AI.Core.Chat.ChatStreamContentAggregator helper. Routed classifier, ChatRuntime, ContextCompressor, ToolCallLoop, and Studio GenerateAsync through it. Reasoning-callback loops preserved as exceptional path.
  • (B SCOPE_EXTEND) Tests reject: added test/Aevatar.Studio.Tests/StudioGenerateGAgentStreamingTests.cs proving ScriptGenerateGAgent + WorkflowGenerateGAgent GenerateAsync concatenate split ChatStreamAsync chunks, never call provider-boundary ChatAsync, and return empty string on empty streams.

Validation: build pass; targeted tests pass (3 Studio + 91 ChatRuntime/ToolCallLoop/ContextCompressor/Classifier); test_stability_guards.sh pass; git diff --check pass.

Diff impact: net −43 lines (28 inserted / 71 removed) — refactor consolidation reduced overall line count even with new helper + test file.

Next action (automatic): re-dispatch 3 reviewer codex against new HEAD 0ceb4812. Round 2 verdicts coming.

Full local artifact: .refactor-loop/runs/fix-pr687-round-1.md.


中文

fix codex 处理掉 reviewer 需求,并采纳 quality reviewer 关于抽取 helper 的建议(B SCOPE_EXTEND):

  • (A) Architect comment: 把 ILLMProvider.ChatAsync 上的一行说明替换为完整 // Refactor (iter15/cluster-024): 旧/新注释,明确 provider ChatAsync 只是边界兼容;正式入口走 ChatStreamAsync
  • (A) Quality comment: 给 spawn-codex.sh 严格模式空数组保护加 cluster 解释(说明为何这处脚本改动属于 cluster-024 的范围)。
  • (B SCOPE_EXTEND) Quality comment: 把重复的 StringBuilder + await foreach ChatStreamAsync 聚合代码抽出为新的 Aevatar.AI.Core.Chat.ChatStreamContentAggregator helper,让 classifier、ChatRuntime、ContextCompressor、ToolCallLoop、Studio GenerateAsync 共用。reasoning 回调流程作为例外路径保留。
  • (B SCOPE_EXTEND) Tests reject: 新增 test/Aevatar.Studio.Tests/StudioGenerateGAgentStreamingTests.cs,证明 ScriptGenerateGAgentWorkflowGenerateGAgentGenerateAsync 会拼接分片 ChatStreamAsync 输出、从不调用 provider 边界 ChatAsync、空流时返回空串。

验证: 构建通过; targeted 测试通过(3 个 Studio + 91 个 ChatRuntime/ToolCallLoop/ContextCompressor/Classifier); test_stability_guards.sh 通过; git diff --check 通过。

diff 影响: 净 −43 行(新增 28 / 删除 71)— 重构整合让总行数下降,即便加了新 helper 与测试文件。

下一步 (自动): 重新派 3 个 reviewer codex 对新 HEAD 0ceb4812 复评。Round 2 结论会作为后续评论 post。

本地完整产物: .refactor-loop/runs/fix-pr687-round-1.md

@loning loning added auto-loop-reviewing Phase 8 reviewer codex round in flight and removed auto-loop-fixing Phase 8 fix codex round in flight (AI iterating to consensus) labels May 19, 2026
@loning
Copy link
Copy Markdown
Contributor Author

loning commented May 19, 2026

🔄 Multi-codex review round 2 (v3) — fix codex round 2 dispatched

Verdicts | 复评结论

Reviewer Round 1 (v2) Round 2 (v3) Change
Architect comment comment unchanged (non-blocking)
Tests reject reject NEW demand surfaced
Quality comment comment unchanged (non-blocking)

Not unanimous → enter fix-retry round 2 (max 3 rounds per Phase 8 policy).


English

Round 1 reject was addressed

v2 tests reviewer demanded ScriptGenerateGAgentTests + WorkflowGenerateGAgentTests. Fix codex round 1 added StudioGenerateGAgentStreamingTests.cs. v3 tests reviewer confirms: "Added tests in ChatRuntimeStreamingBufferTests.cs and StudioGenerateGAgentStreamingTests.cs do assert real outcomes." — round 1 demand resolved.

Round 2 NEW demand (not anti-spiral — different evidence)

v3 tests reviewer identified a new gap exposed by the ChatStreamContentAggregator helper extraction:

  • ToolCallLoop.cs:292 now sets LLMCallContext.IsStreaming = true, but no ToolCallLoopTests.cs assertion verifies middleware observes this flag. The line could regress to false and existing tests would still pass because the test provider's ChatAsync path remains functional.
  • Demand: install an ILLMCallMiddleware, run ExecuteAsync, assert middleware.observed.IsStreaming.Should().BeTrue() while provider response aggregates from ChatStreamAsync. Also make test provider's ChatAsync throw to lock the boundary.

This is healthy iteration (fix round 1 extracted a helper which exposed a new middleware contract that needs its own test). Fix codex round 2 dispatched to add the missing ToolCallLoop middleware test.


中文

Round 1 reject 已解决

v2 tests reviewer 要求 ScriptGenerateGAgentTests + WorkflowGenerateGAgentTests。fix codex round 1 加上 StudioGenerateGAgentStreamingTests.cs。v3 tests reviewer 确认: "ChatRuntimeStreamingBufferTests.csStudioGenerateGAgentStreamingTests.cs 里加的测试都断言真实结果。" — round 1 需求已解决。

Round 2 NEW demand (非死循环 — 是不同的证据)

v3 tests reviewer 识别出一个被 ChatStreamContentAggregator helper 抽取暴露出来的新缺口:

  • ToolCallLoop.cs:292 现在设 LLMCallContext.IsStreaming = true,但 ToolCallLoopTests.cs 里没有断言验证 middleware 观察到这个 flag。这一行如果回退到 false,现有测试仍会通过(因为 test provider 的 ChatAsync 路径仍可用)。
  • 要求: 装一个 ILLMCallMiddleware,跑 ExecuteAsync,断言 middleware.observed.IsStreaming.Should().BeTrue() 且 provider 响应来自 ChatStreamAsync 聚合。同时让 test provider 的 ChatAsync 抛异常,锁死边界。

这是健康的迭代(round 1 抽取的 helper 暴露了新的 middleware contract,需要单独测试覆盖)。fix codex round 2 已 dispatch 补这个 ToolCallLoop middleware 测试。

@loning loning added auto-loop-fixing Phase 8 fix codex round in flight (AI iterating to consensus) and removed auto-loop-reviewing Phase 8 reviewer codex round in flight labels May 19, 2026
Applied 3 fixes (FIX_DONE:687:round-2:applied-3:rejected-0:blocked-0):
- (A) architect: add Refactor (iter15/cluster-024) Old/New marker on
  ChatStreamContentAggregator.AggregateResponseAsync, documenting
  stream-derived LLMResponse aggregation as the replacement for direct
  provider ChatAsync response execution
- (A) tests v3 NEW demand: add ToolCallLoop regression test installing
  ILLMCallMiddleware, asserting context.IsStreaming == true and that
  response is aggregated from ChatStreamAsync. Test provider's ChatAsync
  throws to lock the boundary (catches regression if line 292's
  IsStreaming=true is reverted)
- (A) quality v2 carried-over: revert unrelated spawn-codex.sh strict-mode
  change out of this PR (round 1 added explanation; round 2 actually
  reverts per quality reviewer's intent — the fix lives on auto-refact-dev)

Build pass; Aevatar.AI.Tests project full pass; test_stability_guards pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@loning
Copy link
Copy Markdown
Contributor Author

loning commented May 19, 2026

🤖 Fix codex round 2 — FIX_DONE:687:round-2:applied-3:rejected-0:blocked-0

Pushed: 911d38b2

English

Round 2 addressed all v3 (round-1-review) demands:

  • (A) Tests v3 reject (NEW demand from round 1 helper extraction): added ToolCallLoopTests regression test with ILLMCallMiddleware asserting context.IsStreaming.Should().BeTrue() AND that response aggregates from ChatStreamAsync. Test provider's ChatAsync throws so the boundary is locked — if ToolCallLoop.cs:292 ever regresses IsStreaming = truefalse, this test fails immediately.
  • (A) Architect v3 comment: added // Refactor (iter15/cluster-024): Old/New marker on ChatStreamContentAggregator.AggregateResponseAsync (the new helper extracted in round 1 needed its own self-doc, which round 1 missed).
  • (A) Quality v2 carried-over: actually reverted unrelated spawn-codex.sh strict-mode change out of this PR. Round 1 had added an explanation comment instead; quality reviewer's intent was a clean cluster scope, so round 2 reverts the file to origin/dev. The fix continues to live on auto-refact-dev where it belongs.

Validation: build pass; full Aevatar.AI.Tests project pass; test_stability_guards.sh pass.

Next action (automatic): re-dispatch 3 reviewer codex against new HEAD 911d38b2. Round 3 verdicts coming.


中文

Round 2 处理了所有 v3 (round-1-review) 需求:

  • (A) Tests v3 reject (round 1 抽取 helper 暴露的新需求): 在 ToolCallLoopTests 加回归测试,装 ILLMCallMiddleware、断言 context.IsStreaming.Should().BeTrue()、且响应来自 ChatStreamAsync 聚合。test provider 的 ChatAsync 抛异常以锁死边界 — 如果 ToolCallLoop.cs:292IsStreaming = true 回退成 false,这个测试立刻挂。
  • (A) Architect v3 comment: 在 ChatStreamContentAggregator.AggregateResponseAsync 上补 // Refactor (iter15/cluster-024): 旧/新注释 (round 1 抽出的新 helper 缺自文档,round 1 没补)。
  • (A) Quality v2 遗留项: 真正把无关的 spawn-codex.sh 严格模式改动回退出本 PR。round 1 是加了解释注释,但 quality reviewer 的本意是 cluster scope 干净,所以 round 2 把文件回退到 origin/dev。修复本身继续在 auto-refact-dev 上,属其所归。

验证: 构建通过; Aevatar.AI.Tests 项目全过; test_stability_guards.sh 通过。

下一步 (自动): 重新派 3 个 reviewer codex 对新 HEAD 911d38b2 复评。Round 3 结论稍后。

@loning loning added auto-loop-reviewing Phase 8 reviewer codex round in flight and removed auto-loop-fixing Phase 8 fix codex round in flight (AI iterating to consensus) labels May 19, 2026
@loning
Copy link
Copy Markdown
Contributor Author

loning commented May 19, 2026

🔄 Multi-codex review round 3 (v4) — fix codex round 3 dispatched (LAST round per max budget)

Verdicts | 复评结论

Reviewer v2 (round 1) v3 (round 2) v4 (round 3) Trend
Architect comment comment comment refined: now nits on Refactor comment accuracy
Tests reject reject comment reject → comment ✅ ; new ask: extend source-regression scan
Quality comment comment approve ChatStreamContentAggregator extraction landed

1 approve + 2 actionable comments → enter fix-retry round 3 (last round per max_fix_rounds=3). If round 3 doesn't produce unanimous approve → escalate human per Phase 8 anti-spiral safeguards.


English

Round 3 demands (small, concrete, in-scope)

  1. Architect — Refactor comments accuracy:
    • AIGAgentBase.cs: protected ChatAsync helper overloads were deleted but no nearby // Refactor (iter15/cluster-024): comment documents the deleted surface. Add one.
    • ScriptGenerateGAgent.cs:57 and WorkflowGenerateGAgent.cs:57: existing Refactor comment says "old pattern was non-streaming ChatAsync directly called provider.ChatAsync" — but GenerateWithReasoningAsync was already streaming on origin/dev. Either remove the comment or rewrite it to describe the real refactor (cluster-024 made it route via shared aggregator, not via direct ChatAsync).
  2. Tests — source-regression scan widening:
    • ChatRuntimeStreamingBufferTests.cs:390 source-regression assertion only scans src/Aevatar.AI.Core, but the cluster also removed formal-entrypoint ChatAsync use from src/Aevatar.Studio.Hosting and agents/Aevatar.GAgents.ChatbotClassifier. A future regression in those paths wouldn't be caught.
    • Extend scan to all three paths; keep provider boundary abstraction excluded.

Quality already approves (specifically calls out the ChatStreamContentAggregator extraction landed cleanly + size/naming all good).

Fix codex round 3 dispatched to address both. If round 3 succeeds → unanimous approve → auto-merge. If round 3 fails to flip the comments to approve → controller will escalate via needs-human-review label per Phase 8 anti-spiral safeguards.


中文

Round 3 需求(都小、具体、in-scope)

  1. Architect — Refactor 注释准确性:
    • AIGAgentBase.cs: protected ChatAsync 重载被删,但附近没有 // Refactor (iter15/cluster-024): 注释记录已删的 surface。补一个。
    • ScriptGenerateGAgent.cs:57 / WorkflowGenerateGAgent.cs:57: 现有 Refactor 注释说"老模式是非流式 ChatAsync 直接调 provider.ChatAsync" — 但 GenerateWithReasoningAsyncorigin/dev 上本来就是流式的。要么删该注释,要么改写为描述真实重构 (cluster-024 让它走共享 aggregator,而不是直接走 ChatAsync)。
  2. Tests — source-regression 扫描范围扩大:
    • ChatRuntimeStreamingBufferTests.cs:390 的 source-regression 断言只扫 src/Aevatar.AI.Core,但本 cluster 也从 src/Aevatar.Studio.Hostingagents/Aevatar.GAgents.ChatbotClassifier 移除了正式入口对 ChatAsync 的调用。这些路径未来若回退,扫描捕不到。
    • 扫描扩展到这三处;保持 provider 边界抽象豁免。

Quality 已 approve(明确提到 ChatStreamContentAggregator 抽取干净,大小/命名都好)。

fix codex round 3 已派出处理上述两项。round 3 成功 → unanimous approve → auto-merge。失败 → controller 按 Phase 8 anti-spiral 升级 needs-human-review label。

@loning loning added auto-loop-fixing Phase 8 fix codex round in flight (AI iterating to consensus) and removed auto-loop-reviewing Phase 8 reviewer codex round in flight labels May 19, 2026
…— last round

Applied 4 fixes (FIX_DONE:687:round-3:applied-4:rejected-0:blocked-0):
- (A) architect: add missing Refactor (iter15/cluster-024) anchor on
  AIGAgentBase.cs:202 documenting the deleted protected ChatAsync helper
  surface (subclasses now use ChatStreamAsync + local aggregation only
  when needed)
- (A) architect: remove inaccurate Refactor comment from
  ScriptGenerateGAgent.GenerateWithReasoningAsync — that method was
  already streaming on origin/dev. Only GenerateAsync keeps the
  direct-ChatAsync refactor comment
- (A) architect: same removal on WorkflowGenerateGAgent.GenerateWithReasoningAsync
- (A) tests: widen ChatRuntimeStreamingBufferTests source-regression
  scan from src/Aevatar.AI.Core only to: src/Aevatar.AI.Core +
  src/Aevatar.Studio.Hosting + agents/Aevatar.GAgents.ChatbotClassifier,
  ignoring provider-boundary abstractions and comment-only lines

Build pass; Aevatar.AI.Tests 558 pass; Aevatar.Studio.Tests 521 pass;
test_stability_guards pass; git diff --check pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@loning
Copy link
Copy Markdown
Contributor Author

loning commented May 19, 2026

🤖 Fix codex round 3 — FIX_DONE:687:round-3:applied-4:rejected-0:blocked-0 (LAST round)

Pushed: 7acfda4b

English

Round 3 addressed all v4 (round-2-review) actionable comments. Diff is tiny: 4 files, +12 / -9.

  • Architect comment Refactor/project namespace #1: added missing // Refactor (iter15/cluster-024): anchor on AIGAgentBase.cs:202 documenting that the removed protected ChatAsync helpers used to allow GAgents to use the non-streaming executor as a formal conversation path; subclasses now use ChatStreamAsync with explicit local aggregation only when needed.
  • Architect comment Feature/cqrs projection suite #2 (×2): removed inaccurate Old/New comments from ScriptGenerateGAgent.GenerateWithReasoningAsync and WorkflowGenerateGAgent.GenerateWithReasoningAsync — those methods were already streaming on origin/dev. Only the GenerateAsync methods (which previously called direct ChatAsync) keep the refactor comment.
  • Tests comment: widened the source-regression assertion in ChatRuntimeStreamingBufferTests.cs:390 from scanning only src/Aevatar.AI.Core to scanning all three user-facing AI executor surfaces touched by this cluster: src/Aevatar.AI.Core + src/Aevatar.Studio.Hosting + agents/Aevatar.GAgents.ChatbotClassifier. Provider-boundary abstractions stay excluded; comment-only lines ignored.

Validation: build pass; Aevatar.AI.Tests 558/558 pass; Aevatar.Studio.Tests 521/521 pass; test_stability_guards.sh pass; git diff --check pass.

Next action (automatic): re-dispatch 3 reviewer codex against new HEAD 7acfda4b. Round 4 verdicts coming. If unanimous → auto-merge. If still <unanimous → controller will escalate via needs-human-review label per Phase 8 anti-spiral safeguards (this is round 3 of 3 max).


中文

Round 3 处理了所有 v4 (round-2-review) 的可操作 comment。diff 极小:4 个文件,+12 / -9。

  • Architect comment Refactor/project namespace #1: 在 AIGAgentBase.cs:202 加上缺失的 // Refactor (iter15/cluster-024): 锚点,说明已删的 protected ChatAsync helper 过去允许 GAgent 把非流式 executor 当作正式对话入口;subclass 现在只在需要时用 ChatStreamAsync + 局部聚合。
  • Architect comment Feature/cqrs projection suite #2 (×2): 把 ScriptGenerateGAgent.GenerateWithReasoningAsyncWorkflowGenerateGAgent.GenerateWithReasoningAsync 上不准确的 Old/New 注释删掉 — 那两个方法在 origin/dev 上本来就是流式的。只有 GenerateAsync (之前直接调 ChatAsync) 留 Refactor 注释。
  • Tests comment: 把 ChatRuntimeStreamingBufferTests.cs:390 的 source-regression 断言从只扫 src/Aevatar.AI.Core 扩展到本 cluster 涉及的三个 user-facing AI executor surface: src/Aevatar.AI.Core + src/Aevatar.Studio.Hosting + agents/Aevatar.GAgents.ChatbotClassifier。provider 边界抽象豁免;注释行忽略。

验证: 构建通过; Aevatar.AI.Tests 558/558 通过; Aevatar.Studio.Tests 521/521 通过; test_stability_guards.sh 通过; git diff --check 通过。

下一步 (自动): 重新派 3 个 reviewer codex 对新 HEAD 7acfda4b 复评。Round 4 结论稍后。unanimous → auto-merge。仍 <unanimous → 按 Phase 8 anti-spiral 升级 needs-human-review label (本轮是 3 max 中的 round 3)。

@loning loning added auto-loop-reviewing Phase 8 reviewer codex round in flight and removed auto-loop-fixing Phase 8 fix codex round in flight (AI iterating to consensus) labels May 19, 2026
@loning
Copy link
Copy Markdown
Contributor Author

loning commented May 19, 2026

✅ Multi-codex consensus reached (round 4) — auto-merging after CI + dev rebase

Verdicts | 4 轮复评结论

Reviewer v2 (r1) v3 (r2) v4 (r3) v5 (r4)
Architect comment comment comment approve
Tests reject reject comment approve
Quality reject comment approve approve

Unanimous approve after 3 fix rounds + 4 review rounds. This is the maximum fix-round budget per Phase 8 policy and consensus landed within it.


English

Round-by-round summary

  • Fix round 1 (4 fixes): addressed quality reviewer's ChatStreamContentAggregator extraction suggestion (5 production sites now share one helper); added StudioGenerateGAgentStreamingTests for the tests-reviewer reject demand
  • Fix round 2 (3 fixes): addressed v3 tests reject (new demand surfaced by round-1 helper extraction): added ToolCallLoop middleware test asserting IsStreaming == true + reverted unrelated spawn-codex.sh drive-by
  • Fix round 3 (4 fixes): addressed v4 comments: added missing Refactor anchor on AIGAgentBase.cs deleted ChatAsync helpers; removed inaccurate Refactor comments on GenerateWithReasoningAsync (was already streaming on dev); widened source-regression scan from src/Aevatar.AI.Core only to all three touched paths (+Studio.Hosting + Classifier)

Final cluster shape

  • ChatStreamAsync is the only authoritative AI executor at all formal entrypoints
  • Direct provider.ChatAsync calls only allowed at provider boundary abstraction (excluded from regression guard)
  • Shared ChatStreamContentAggregator helper for offline aggregation use cases
  • Source-regression assertion locks the no-regression rule across src/Aevatar.AI.Core + src/Aevatar.Studio.Hosting + agents/Aevatar.GAgents.ChatbotClassifier

Next action

PR is currently BEHIND origin/dev by 6 commits + CI still pending on HEAD 7acfda4b. Controller will:

  1. Merge origin/dev into the cluster branch (clean — same path as 688/689 took)
  2. Run local guards
  3. Push (re-trigger CI)
  4. Wait for CI green → gh pr merge --merge to dev

Cluster-024 will then move to clusters_done.


中文

各轮要点

  • Fix round 1 (4 项): 处理 quality reviewer 关于抽取 ChatStreamContentAggregator 的建议(生产代码 5 处共享一个 helper);为 tests reviewer 的 reject 补 StudioGenerateGAgentStreamingTests
  • Fix round 2 (3 项): 处理 v3 tests reject (round 1 抽取 helper 暴露的新需求):加 ToolCallLoop middleware 测试断言 IsStreaming == true + 回退无关的 spawn-codex.sh drive-by
  • Fix round 3 (4 项): 处理 v4 各项 comment: 在 AIGAgentBase.cs 被删的 ChatAsync helper 处补缺失的 Refactor 锚点; 删除 GenerateWithReasoningAsync 上不准确的 Refactor 注释 (那俩在 dev 上本来就流式); 把 source-regression 扫描从只看 src/Aevatar.AI.Core 扩展到三个 touched 路径 (+Studio.Hosting + Classifier)

最终 cluster 形态

  • ChatStreamAsync 是所有正式入口唯一权威 AI executor
  • 直接调 provider.ChatAsync 只在 provider boundary 抽象层允许 (regression guard 豁免)
  • 共享 ChatStreamContentAggregator helper 服务离线聚合场景
  • source-regression 断言把"不回退"规则锁在 src/Aevatar.AI.Core + src/Aevatar.Studio.Hosting + agents/Aevatar.GAgents.ChatbotClassifier 三处

下一步

PR 当前落后 origin/dev 6 commits + HEAD 7acfda4b CI 还在跑。controller 将:

  1. origin/dev merge 进 cluster 分支 (clean,和 688/689 走过的路径一致)
  2. 跑本地 guards
  3. push (重启 CI)
  4. CI 全过 → gh pr merge --merge 进 dev

cluster-024 随后进入 clusters_done

@loning loning added auto-loop Created by codex-refactor-loop skill and removed auto-loop-reviewing Phase 8 reviewer codex round in flight labels May 19, 2026
@loning loning changed the base branch from dev to auto-refact-dev May 19, 2026 09:09
@loning loning merged commit a461989 into auto-refact-dev May 19, 2026
13 checks passed
@loning
Copy link
Copy Markdown
Contributor Author

loning commented May 19, 2026

✅ Auto-merged to auto-refact-dev after Phase 8 consensus

Commit: a4619897

English

PR #687 merged after 3 fix-codex rounds reaching 3/3 unanimous approve. Cluster-024 (ChatStreamAsync as the only AI executor) is now in the integration branch auto-refact-dev. The rollup PR auto-refact-dev → dev will open when iter15 clusters {025, 026, 027} also land (PR 688/689 = cluster 028/029 already in dev via controller's earlier targeting mistake — corrected for cluster-024 onward).

中文

PR #687 在 3 轮 fix-codex 后达成 3/3 一致 approve 并 merge。cluster-024 (ChatStreamAsync 作为唯一 AI executor) 已进 integration 分支 auto-refact-dev。等 iter15 剩余 cluster {025, 026, 027} 也落地后,会开 rollup PR auto-refact-dev → dev(PR 688/689 = cluster 028/029 已经在 dev 里,是 controller 早先的 target 错误 — 从 cluster-024 起已纠正)。

loning added a commit that referenced this pull request May 19, 2026
…PR revert)

cluster-024 fix codex round 3 reverted this file out of PR #687 scope
per quality reviewer's "unrelated drive-by" feedback. That left dev /
auto-refact-dev WITHOUT the fix, and Phase 6 sync pulled the unfixed
version back into auto-refact-dev.

Result: when no --add-dir args passed, the `ADD_DIRS[@]` expansion under
`set -u` throws "unbound variable" → spawn-codex.sh exits 1 → 3 Phase 9
solvers for #701 round 3 all failed immediately.

This is a SKILL-level fix (not cluster work), so it goes on
auto-refact-dev directly (not a cluster PR) and lives there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@loning loning mentioned this pull request May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-loop Created by codex-refactor-loop skill

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant