Refactor iter1: clean 3 architectural violations + add codex-refactor-loop skill#678
Merged
Merged
Conversation
Refactor (iter1/cluster-002): - Old pattern: enable/disable captured readmodel version then Task.Delay loop until projection version+status matched, violating "ACK 诚实" and "no query-time replay/priming". - New principle: lifecycle commands return accepted only; readmodel freshness is observed by separate follow-up queries or push events, never by polling inside the command handler. Deleted WaitForAgentStatusAsync and authoring-side wait-budget knobs. 8 AgentBuilderTool tests rewritten to deterministic assertions, no Task.Delay. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
iter1 batch A first cluster. Verification: 8/8 tests pass, architecture_guards + test_stability_guards green, scope-honest 2 files. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
Refactor (iter1/cluster-005): - Old pattern: StreamingToolExecutor used a lock-protected mutable list/status scheduler while background tasks mutated execution state; ToolApprovalMiddleware kept denial counts in an instance dictionary that crossed tool calls without an actor-owned fact. Violated "单线程事实源" and "回调只发信号". - New principle: streaming tool execution state advances through a single serialized async coordinator loop fed by Channel<T>; middleware denial counters are request-scoped via context Items, not hidden service state. Background tool tasks now only post signals; coordinator loop owns scheduler mutation. 21 tests (11 executor + 10 middleware) rewritten to deterministic gates, no Task.Delay/Sleep. Removed StreamingToolExecutorTests.cs from test_polling_allowlist.txt. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
iter1 batch A. Verification: 21 tests pass (11 executor + 10 middleware), public API stable, scope-honest 5 files, no Task.Delay/Thread.Sleep in tests. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
Refactor (iter1/cluster-003): - Old pattern: ScopeServiceEndpoints started a script run, pumped raw EventEnvelope from a local channel through a TaskCompletionSource, mapped EventEnvelope→AGUI in the endpoint, and synthesized completion. Violated "统一投影链路", "EventEnvelope 是唯一投影传输壳", and "Host 只做协议与组合". - New principle: host attaches a typed AGUI projection session stream and writes only SSE protocol frames; EventEnvelope→AGUI mapping and terminal classification live in the unified projection pipeline. Introduced 5 new projection types (port, lease, context, runtime lease, session projector) — judged appropriate pattern application by verify codex, each with class-level XML documentation. EventEnvelope remains the single projection transport. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…eline iter1 batch A last cluster. Verification: 15+88=103 integration tests pass, architecture_guards + query_projection_priming_guard + test_stability_guards green, scope-honest 10 files (5 new projection types each with XML class-level docs). Co-Authored-By: codex (gpt-5) <noreply@openai.com>
The skill packages the controller pattern used to drive parallel codex
subprocesses (analyze → implement → verify) through isolated git worktrees,
with /loop dynamic wakeups as the pacing primitive.
Structure:
- SKILL.md — bootstrap, phase 1-4 workflow, hard rules
- REFERENCE.md — state schema, batching heuristics, recovery playbook
- prompts/{audit,implement,verify}.md — codex prompt templates
- scripts/spawn-codex.sh — standardized codex exec wrapper
Complements the existing refactor-team skill: refactor-team uses Claude
Agent subagents in-process; codex-refactor-loop uses OS-level worktrees
and codex CLI for true multi-hour unattended parallel runs.
Also ignores .refactor-loop/ runtime state directory.
Refactor (iter1/cluster-001): - Old pattern: SkillRunnerGAgent dispatched UserAgentCatalogExecutionUpdateCommand to a well-known catalog actor; catalog held both membership and per-runner execution summaries, forming a hotspot funnel with cross-actor state writes. Violated "Actor 即业务实体", "单线程 actor 不做热点共享服务", and "聚合必须 actor 化". - New principle: catalog actor owns membership only; runners commit execution events; UserAgentCatalogProjector materializes execution fields from runner committed SkillRunnerState into the catalog readmodel. Concrete changes: - Removed Status/LastRunAt/NextRunAt/ErrorCount/LastError writes from UserAgentCatalogGAgent (catalog no longer owns execution facts). - Deleted UserAgentCatalogExecutionUpdateCommand and UserAgentCatalogExecutionUpdatedEvent proto messages with field number reservations; removed UserAgentCatalogLegacyAliases partials referencing them. - Removed execution fields from UserAgentCatalogEntry / UserAgentCatalogUpsertCommand proto with reserved field number markers. - Introduced UserAgentCatalogReadModelEntry as the read-side DTO so runner-owned execution facts never reappear in catalog state. - UserAgentCatalogProjector now consumes committed SkillRunnerState as the single source for execution fields. - Updated docs/canon/daily-command-pipeline.md to match. - 69 ChannelRuntime tests pass; 3 CI guards green (architecture, test_stability, projection_state_version). Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…-004) Refactor (iter1/cluster-004): - Old pattern: NyxID chat SSE subscribed to raw EventEnvelope and waited on TaskCompletionSource with 120s timeout; StreamingProxy endpoints called actor.HandleEventAsync inline and mapped projection frames in endpoint code. Violated "统一投影链路", "禁止 stream request-reply 冒充 RPC", and "投递语义 runtime-neutral". - New principle: agent SSE endpoints dispatch commands through IActorDispatchPort and observe typed projection session events; endpoints never infer terminal state from raw actor envelopes. Concrete changes: - NyxID chat SSE uses INyxIdChatSessionProjectionPort (typed AGUI session) instead of raw EventEnvelope subscription. - New NyxIdChatProjectionSession.cs (200 lines, six independent responsibilities) following the StreamingProxy session projection shape. - StreamingProxy endpoints dispatch via IActorDispatchPort. - 89 AI test coverage tests rewritten; source-regression assertions added to both NyxID and StreamingProxy test files to block future direct actor.HandleEventAsync or raw EventEnvelope subscription regressions. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
iter1 batch B. Verification: 69 tests pass, architecture+test_stability+projection_state_version guards green. Removed proto execution messages with field reservations, introduced UserAgentCatalogReadModelEntry, docs/canon updated. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
iter1 batch B. Verification: 89 tests pass (NyxID coverage + StreamingProxy coverage), architecture+test_stability guards green. New NyxIdChatProjectionSession.cs (200 lines, 6 responsibilities mirroring StreamingProxy session shape). Co-Authored-By: codex (gpt-5) <noreply@openai.com>
Refactor (iter2/cluster-006): - Old pattern: ScopeBindingCommandApplicationService.UpsertAsync dispatched six service lifecycle commands then polled service catalog and serving-set query ports until both readmodels became visible, so the HTTP ACK meant "readmodel observed". Violated "ACK 诚实", "读写分离", and "禁止 query-time replay/priming". Same anti-pattern family as cluster-002 but in a different service. - New principle: scope binding upsert returns an accepted lifecycle receipt with stable service/revision/deployment ids; readiness for immediate invoke is observed through an explicit projection/session/query path outside the command application service. Concrete changes: - Deleted ReadModelVisibilityTimeout, ReadModelVisibilityPollInterval, WaitForBindingVisibleAsync, and the post-dispatch wait call. - Removed IServiceServingQueryPort dependency from the command application service. - Added accepted/propagating fields to ScopeBindingUpsertResult. - 33 unit + 88 integration tests pass; query_projection_priming_guard, architecture_guards, test_stability_guards green. - Tests now assert UpsertAsync does not poll readmodels. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
… path iter2 batch C. Verification: 121 tests pass, 3 CI guards green, scope-honest 4 files (net -24 lines, deletion-heavy). Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…r-007) Refactor (iter2/cluster-007): - Old pattern: HouseholdEntityTool resolved/created the actor via IActorRuntime, called actor.HandleEventAsync directly, then cast actor.Agent to IAgent<HouseholdEntityState> and returned fields from internal actor state. HouseholdEntity.RunReasoningAsync used ChatAsync instead of the streaming chain. Violated "投递语义 runtime-neutral", "查询始终走 readmodel", and "AI 对话主链必须流式化". - New principle: household tool dispatches HouseholdChatEvent through IActorDispatchPort and returns an accepted receipt with actor_id + propagation hint instead of live actor state; reasoning streams through ChatStreamAsync and only the accumulated text becomes the committed ReasoningCompletedEvent payload. Concrete changes: - HouseholdEntityTool: IActorDispatchPort dispatch; accepted response shape. - HouseholdEntity.RunReasoningAsync: ChatStreamAsync with incremental delta accumulation (no ToListAsync, no early break). - HouseholdEntityToolSource (SCOPE_EXTEND: factory must pass IActorDispatchPort). - Source-regression test assertions added: no HandleEventAsync(, no actor.Agent, no IAgent<HouseholdEntityState> cast in tool; no production ChatAsync( in entity. - 41 household tests pass; architecture + test_stability guards green. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…m reasoning iter2 batch C. Verification: 41 tests pass, architecture+test_stability guards green, scope-honest 5 files (4 in audit scope + 1 declared SCOPE_EXTEND for the tool factory). Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…008) Refactor (iter4/cluster-008): - Old pattern: cluster-004 routed StreamingProxy SSE endpoints through IActorDispatchPort, but Application/Rooms/StreamingProxyRoomCommandService and StreamingProxyNyxParticipantCoordinator still called actor.HandleEventAsync(envelope) directly, leaving production dispatch outside the runtime boundary. Violated "投递语义 runtime-neutral". - New principle: all StreamingProxy production dispatch sites (endpoints, application services, coordinators) go through IActorDispatchPort; only runtime/internal infrastructure may call HandleEventAsync directly. Concrete changes: - StreamingProxyRoomCommandService + StreamingProxyNyxParticipantCoordinator now inject IActorDispatchPort and dispatch via DispatchAsync. - Source-regression assertions added in StreamingProxyCoverageTests for both production files (no actor.HandleEventAsync, no .HandleEventAsync(). - 52 StreamingProxy tests pass. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
iter4 batch D. Closes the tail of cluster-004: room command service and Nyx participant coordinator now also dispatch via IActorDispatchPort. 52 tests pass. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
Refactor (iter4/cluster-009): - Old pattern: UserAgentCatalogCommandPort polled the projection document store inside command paths to return CatalogCommandOutcome.Observed when materialization caught up within a wait budget. Same anti-pattern family as cluster-002 (AgentBuilderTool) and cluster-006 (ScopeBinding) — command path waiting on readmodel. Violated "ACK 诚实". - New principle: catalog command port returns Accepted only; callers needing Observed semantics observe through an explicit query/projection path. Concrete changes: - Removed IProjectionDocumentReader injection + projection polling loop + wait budget from UserAgentCatalogCommandPort. - CatalogCommandOutcome simplified to Accepted-only (Observed/NotFound deleted). - AgentBuilderTool + AgentDeliveryTargetTool callers updated to return accepted/propagating copy instead of mapping Observed → "deleted"/"created". - AgentDeliveryTargetTool delete relies on caller-scoped pre-check before accepted-only tombstone dispatch. - 39 targeted tests pass (CommandPort + AgentBuilderTool + AgentDeliveryTargetTool). - 3 CI guards green (architecture, query_projection_priming, test_stability). - Net -206 lines (deletion-heavy cleanup). Co-Authored-By: codex (gpt-5) <noreply@openai.com>
iter4 batch D. Completes the accepted-only command-path pattern across all 3 known violations (cluster-002, cluster-006, cluster-009). 39 tests pass, net -206 lines. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
Refactor (iter5/cluster-012): - Old pattern: cluster-009 correctly removed Observed but left CatalogCommandOutcome as a one-value enum + UserAgentCatalogUpsertResult / UserAgentCatalogTombstoneResult records that only wrapped Accepted. Production callers ignored the result; tests/stubs manufactured new UserAgentCatalog*Result(CatalogCommandOutcome.Accepted) just to satisfy the obsolete shape. Violated "删除优先". - New principle: accepted-only command ports express success by completing the task; if no caller consumes a receipt, return Task and keep accepted/propagating wording at user-facing tool boundary. Concrete changes: - IUserAgentCatalogCommandPort.UpsertAsync/TombstoneAsync return Task. - Deleted CatalogCommandOutcome enum + UserAgentCatalog*Result records. - AgentBuilderTool + AgentDeliveryTargetTool callers await Task and keep their accepted JSON copy unchanged. - Tests stub Task.CompletedTask; new source-regression assertion that CatalogCommandOutcome no longer exists. - 40 targeted tests pass; architecture + test_stability guards green. Controller note: verify codex returned FAIL on a literal scope_paths mismatch — audit-iter-5.md listed the file as `agents/Aevatar.GAgents.ChannelRuntime/AgentDeliveryTargetTool.cs` but the real path is `src/Aevatar.AI.ToolProviders.AgentCatalog/AgentDeliveryTargetTool.cs`. The audit had a wrong path; implement codex correctly modified the actual file. All substantive checks pass. Controller overrides to pass. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…ction iter5 batch E. Completes cluster-009 cleanup by removing the leftover one-value enum + result record wrappers. Ports now return Task; 40 tests pass; net deletion of dead abstraction. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…ter-010) Refactor (iter5/cluster-010): - Old pattern: cluster-003 moved draft-run streaming to the projection pipeline, but ScopeGAgentEndpoints still exposed endpoint-local TryMapEnvelopeToAguiEvent(EventEnvelope) and BuildToolApprovalStruct(Any) wrappers, and ScopeGAgentEndpointsTests reflected into those Host methods to assert raw EventEnvelope -> AGUIEvent mapping. Violated "Host 只做协议 与组合" and locked the old API alive as test-locked residual. - New principle: host endpoint tests verify protocol/interaction behavior only; mapper behavior is tested directly on ScopeGAgentAguiEventMapper. Concrete changes: - Deleted TryMapEnvelopeToAguiEvent and BuildToolApprovalStruct from ScopeGAgentEndpoints (no production callers remain). - Created ScopeGAgentAguiEventMapperTests.cs calling the mapper directly (no reflection) for mapper + invalid-approval-payload cases. - Added source-regression assertion in ScopeGAgentEndpointsTests. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…nt reflection iter5 batch E. Removes a test-locked Host residual that cluster-003 left in place; mapper tests now target ScopeGAgentAguiEventMapper directly. Net deletion of endpoint reflection scaffolding. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…ster-011) Refactor (iter5/cluster-011): - Old pattern: docs/canon/daily-command-pipeline.md still described the pre-refactor path — direct actor.HandleEventAsync, WaitForCreatedAgentAsync projection polling, GetStateVersionForCallerAsync query-port, and UpdateRegistryExecutionAsync execution-update commands. After iter1+2+4 rewrote these paths, the canon misled implementers and QA toward the anti-patterns we just removed. Violated "变更必须可验证". - New principle: canon describes post-iter1/2/4 semantics — dispatch via command/dispatch ports, accepted-only command ACKs, explicit follow-up query/push observation, runner-owned execution facts projected into catalog readmodel. Concrete changes (docs/canon/daily-command-pipeline.md only): - Stage table rewritten around current code (command ports + accepted receipts; no public actor.HandleEventAsync path). - Replaced WaitForCreatedAgentAsync / GetStateVersionForCallerAsync wording with "accepted now, observe via follow-up query after projection catches up". - SkillRunner section updated: execution facts committed by SkillRunnerGAgent and projected into UserAgentCatalogDocument, not pushed back through catalog execution-update commands. - QA bullets updated to remove WaitForCreatedAgentAsync expectations. Verification: stale-keyword grep returned no hits; docs lint passed; 42 canon files lint clean. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…/2/4 semantics iter5 batch E. Pure docs update: docs/canon/daily-command-pipeline.md now matches accepted-only command paths, dispatch ports, and runner-owned execution facts. Net -49 lines. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…(cluster-015) Refactor (iter6/cluster-015): - Old pattern: docs/canon/llm-streaming.md + docs/canon/architecture.md used retired names WorkflowExecutionAGUIEventProjector, EventEnvelopeToAGUIEventMapper, and ProjectionSessionEventHub<WorkflowRunEvent>. Current code is WorkflowExecutionRunEventProjector + EventEnvelopeToWorkflowRunEventMapper + ProjectionSessionEventHub<WorkflowRunEventEnvelope>. Docs taught a pre-rename AGUI-specific mental model. Violated "变更必须可验证". - New principle: canon describes the current run-event projection branch; CQRS/readmodel and realtime output share the same projection input, output branch maps EventEnvelope to WorkflowRunEventEnvelope via the workflow run-event projector. Concrete changes (docs-only, 3 files): - Replaced retired component names in canon component tables, topology diagrams, anchors, and runtime semantics bullets. - Updated src/workflow/Aevatar.Workflow.Presentation.AGUIAdapter/README.md consistent with the canon update. - Stale-name grep returns no hits; docs lint passes. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
iter7 batch G. Closes the last canon drift from the cluster-015 rename. 1 line. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…guards (cluster-016) Refactor (iter7/cluster-016): - Old pattern: iter1-6 repeatedly fixed the same anti-pattern family (direct actor.HandleEventAsync, raw SubscribeAsync<EventEnvelope>, host-side EventEnvelope mapping) and then protected each capability with a file-local source-regression assertion in tests. A new Host/Application endpoint could reintroduce the pattern and only a capability-specific test would catch it. Violated "治理前置 — 架构规则必须可自动化验证". - New principle: architecture_guards.sh owns the broad source-regression bans with a short exact runtime allowlist; capability tests keep behavior assertions but cross-cutting dispatch/projection boundary rules fail centrally in CI. Concrete changes: - tools/ci/architecture_guards.sh: new guard block scanning src/, agents/, tools/ for actor.HandleEventAsync, .HandleEventAsync(, and raw SubscribeAsync<EventEnvelope>. Allowlist limited to 3 exact runtime transport files (LocalActorDispatchPort, LocalActor, RuntimeActorGrain). Filters out interface declarations and comments. - tools/ci/README.md: documents the new guard. Verification (probe-tested by implementer): - bash tools/ci/architecture_guards.sh passes clean on trunk. - Temporary probe `actor.HandleEventAsync(envelope, ct);` in a production non-runtime file made the guard fail (confirming it actually works). - Probe removed; guard passes again. This terminates the "same anti-pattern in another file → new cluster" cycle that filled clusters 008/014; future regressions are caught by CI rather than spawning per-file refactor clusters. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…ks to architecture guards iter7 batch G. Structural improvement: replaces per-file source-regression assertions with a single CI guard. Allowlist limited to 3 runtime transport files. Probe-tested. Terminates the 'same pattern, new file, new cluster' cycle. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…018) Refactor (iter8/cluster-018): - Old pattern: AGENTS.md explicitly forbids ports 5000/5050 (system conflict), but README/LOCAL_DEV_SETUP/ChronoStorageToolOptions + 2 test files still documented localhost:5000 as the local API endpoint, and CI had no guard. Future docs/SDK defaults/tests could trivially reintroduce the rule violation. Violated "治理前置 — 架构规则必须可自动化验证". - New principle: Web/API examples, defaults, and tests use 5100 (the AGENTS.md-approved default); a focused CI guard fails on the precise forbidden URL/default-port tokens without matching generic numeric 5000 used as timeout/page-size/histogram bucket values. Concrete changes (8 files): - 6 source/doc/test files: localhost:5000 → localhost:5100 (and defaultPort: 5000 → 5100 in ChronoStorageToolOptions). - tools/ci/architecture_guards.sh: new forbidden-Web-API-port guard scanning README*/LOCAL_DEV_SETUP/src/test/tools/docs/demos/apps with precise token patterns (localhost:5000, localhost:5050, 127.0.0.1:5000, 127.0.0.1:5050, defaultPort: 5000/5050, defaultPort = 5000/5050). - tools/ci/README.md: documents the new guard. Verification (probe-tested by implementer): - bash tools/ci/architecture_guards.sh passes clean on trunk after cleanup. - Temporary localhost:5000 probe makes the guard fail; probe removed; guard passes again. - Targeted tests pass: CliAppConfigStoreTests + ConfigurationCoverageTests. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…residuals iter8 batch H. Honors AGENTS.md hard rule with a probe-tested CI guard plus residual cleanup. 8 files. Same enforcement pattern as cluster-016: prevents future regressions instead of one-off cleanups. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
Refactor (iter10/cluster-019): - Old pattern: production singleton tool-provider clients (NyxIdApiClient, WebApiClient, ChronoStorageApiClient, NyxIdSpecCatalog, ConnectedServiceSpecCache) created raw HttpClient with inconsistent dispose semantics — NyxIdApiClient didn't implement IDisposable, NyxIdSpecCatalog Dispose only released the timer not the http client, and ConnectedServiceSpecCache disposed _http unconditionally making injected-vs-owned ownership ambiguous. Singleton lifetime + raw HttpClient meant DNS pools never refreshed and handler lifetime was unmanaged. Violated async/resource hygiene and was inconsistent with the repo's existing IHttpClientFactory usage in NyxID broker/Studio storage paths. - New principle: production-facing long-lived API clients are typed/named IHttpClientFactory clients; handler lifetime is framework-managed; ownership is single — either fully injected (not disposed by client) or fully owned via explicit fallback path with disposal comment. Concrete changes (11 files): - NyxIdApiClient, WebApiClient, ChronoStorageApiClient: typed clients via AddHttpClient<T>(), eliminating raw HttpClient construction. - NyxIdSpecCatalog: kept singleton for in-memory cache state but moved HTTP to named IHttpClientFactory client; replaced legacy Timer with cancellable PeriodicTimer + async loop, Dispose async-aware. - ConnectedServiceSpecCache: clarified ownership — injected client is never disposed; owned-fallback path disposed exactly once with comment. - New ToolProviderHttpClientRegistrationTests: DI regression test asserting each provider registers via AddHttpClient. - 209 targeted NyxId/Web/ChronoStorage/registration tests pass. - architecture_guards + test_stability_guards green. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…actory iter10 batch I. Closes async/resource hygiene gap — singleton tool clients now use framework-managed handler lifetime with single-owner semantics. 11 files, 209 tests pass, new DI regression test added. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
Refactor (iter11/cluster-021): - Old pattern: ForEach/MapReduce/ParallelFanOut enqueued one entry per input item into BackpressureQueueState.Queue (protobuf repeated), then BackpressureHelper.TryDrainOne called Queue.RemoveAt(0) on every child completion. RemoveAt(0) on a protobuf repeated field shifts all remaining entries, so draining N queued entries paid cumulative O(N²) element movement inside actor turns. Violated algorithmic complexity hygiene on a production fanout/backpressure hot path; turned the safety mechanism into a scalability ceiling. - New principle: persist a head_index cursor on BackpressureQueueState; TryDrainOne advances the cursor in O(1) and skips RemoveAt(0); compact the queue only when the consumed prefix exceeds half its length. FIFO order preserved; old persisted state with head_index absent (= 0) reads back identically to current semantics. Concrete changes (6 files): - workflow_state.proto: + int32 head_index = 4 (optional, backward-compat). - BackpressureHelper.cs: cursor-based TryDrainOne + occasional compaction. - ForEachModule / MapReduceModule / ParallelFanOutModule: cursor-aware queued-count reporting. - BackpressureTests: 11 tests covering FIFO, active-worker accounting, empty queue, old-state compatibility, compaction trigger, and source regression against RemoveAt(0). Co-Authored-By: codex (gpt-5) <noreply@openai.com>
iter11 batch J. Removes the O(N²) RemoveAt(0) drain from the production fanout safety mechanism. Backward-compatible proto change (head_index=0 reads old state). 11 tests pass. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
Refactor (iter11/cluster-020): - Old pattern: GAgentBase.HandleEventAsync routed every actor message through StaticHandlerAdapter.HandleAsync, which then called MethodInfo.Invoke(_agent, [arg]) for every handled message. Each call allocated an object[] argument array, wrapped handler exceptions in TargetInvocationException, and paid reflection dispatch overhead per message — even though the pipeline and payload unpacker were already cached/expression-compiled. Violated hot-path hygiene on the unified actor message dispatch entry point. - New principle: handler invocation is an expression-compiled Func<IAgent, object, Task> built once at StaticHandlerAdapter construction; HandleAsync only invokes the cached delegate. Sync exceptions propagate directly (no TargetInvocationException wrapping). Sync void handlers wrap with Task.CompletedTask; Task handlers return the direct task. Concrete changes (2 files): - StaticHandlerAdapter.cs: + CompileHandler() at construction time; + cached Func<IAgent, object, Task> _handler; HandleAsync no longer calls MethodInfo.Invoke. - EventPipelineTests.cs: + 6 new tests covering sync exception propagation (no TargetInvocationException), source regression against Invoke( in HandleAsync, and allocation sanity for repeated dispatch. - 233 Foundation.Core tests pass. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…apter iter11 batch J. Removes per-message MethodInfo.Invoke from the actor dispatch hot path. Expression-compiled delegate at construction; sync exceptions propagate directly. 233 tests pass. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
…-022)
Refactor (iter12/cluster-022):
- Old pattern: workflow capability endpoints /api/agents and
/api/actors/{actorId}/{...} could be mapped without .RequireAuthorization()
or any object-level scope check. The standalone Aevatar.Workflow.Host.Api
registers AddAevatarPlatform() but not AddAevatarAuthentication(), so by
default anyone reaching the host can enumerate actor ids and read
readmodel snapshot/timeline/graph data for any actorId. CI had no rule
preventing future hosts from re-introducing the unauthenticated mapping.
Violated "治理前置 — 架构规则必须可自动化验证".
- New principle: a CI guard enforces that every workflow actor query
endpoint mapping in CapabilityApi must call .RequireAuthorization() OR
carry an explicit per-endpoint "security-allowlist: <reason>" comment.
Current mappings carry the allowlist comment explaining they are
dev-only on the workflow standalone host; production hosts must add
authentication and remove the allowlist before exposing these endpoints.
Concrete changes (3 files):
- tools/ci/architecture_guards.sh: +90 lines, probe-tested workflow actor
query endpoint guard. Scans ChatQueryEndpoints.cs MapGet/MapPost calls;
flags any actor-query mapping missing .RequireAuthorization() or
security-allowlist comment.
- src/workflow/Aevatar.Workflow.Infrastructure/CapabilityApi/ChatQueryEndpoints.cs:
+ per-endpoint "security-allowlist: workflow standalone host is dev-only;
production hosts must add .RequireAuthorization() — see cluster-022" comments.
- tools/ci/README.md: documents the new guard.
OUT OF SCOPE (documented as security review TODO):
- Adding a caller-scope contract on workflow query ports.
- Threading caller scope through AI tools (ActorInspectTool,
WorkflowStatusTool, EventQueryTool).
- Tenant/owner filtering on readmodel reads.
These require a security feature PR with proper threat-model review, not
an autonomous refactor loop.
Co-Authored-By: codex (gpt-5) <noreply@openai.com>
… authz iter12 batch K. Narrow-scope governance fix: CI guard surfaces the workflow standalone host's unauthenticated actor query mapping and prevents future regressions; per-endpoint security-allowlist comments make current dev-only status explicit. Full security fix (caller-scope contract, AI tool scope threading, tenant filtering) remains an out-of-scope security review TODO. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
Refactor (iter13/cluster-023): - Old pattern: Directory.Packages.props pinned all OpenTelemetry packages to the single OpenTelemetryVersion=1.15.0 property. Production hosts and runtime projects emitted NU1902 vulnerable-package warnings for - OpenTelemetry.Api → GHSA-g94r-2vxg-569j - OpenTelemetry.Exporter.OpenTelemetryProtocol → GHSA-4625-4j76-fww9, GHSA-mr8r-92fq-pj8p, GHSA-q834-8qmm-v933 CI had no rule enforcing the pin advance and the warnings were silently carried. Violated build hygiene on deployable projects. - New principle: OpenTelemetry packages do not all ship the same patch line; pin per-package to the latest patched version available on the line. Vulnerable packages → 1.15.3 (clears the 4 advisories above). AspNetCore/Http instrumentation → 1.15.0 (no higher non-rc available). Instrumentation.Runtime → $(OpenTelemetryVersion)=1.15.1 (max available). Concrete change (1 file): - Directory.Packages.props: bumped OpenTelemetryVersion property from 1.15.0 to 1.15.1 and split per-package PackageVersion pins so that core (.Api), Exporter.Console, Exporter.OpenTelemetryProtocol, and Extensions.Hosting go to 1.15.3, while Instrumentation.Runtime stays at the property value (1.15.1) since no higher patch is published. AspNetCore/Http instrumentation explicitly pinned to 1.15.0 to match the property's prior effective version; advisories on those packages are not in the cleared set. Verification: - dotnet build aevatar.slnx --nologo passes with 0 errors. - dotnet list aevatar.slnx package --vulnerable --include-transitive shows no remaining hits for GHSA-g94r-2vxg-569j, GHSA-4625-4j76-fww9, GHSA-mr8r-92fq-pj8p, or GHSA-q834-8qmm-v933. - architecture_guards.sh + test_stability_guards.sh pass. Co-Authored-By: codex (gpt-5) <noreply@openai.com>
14-iteration unattended refactor loop summary: - 23 clusters merged - 3 CI guards installed (dispatch/projection regression, forbidden Web/API ports, workflow actor query authz governance) - 0 features added, 0 external repo changes - 4 reworks landed cleanly - iter14 must-fix audit returned 0 (loop saturated) Includes cluster ledger with net-line stats per cluster, CI guard probe-test notes, out-of-scope follow-up TODOs (notably cluster-022 full security fix), representative test counts, and recommended review/merge actions for backend reviewers + suggested /cso review for the security follow-up. This is the loop's wrap-up artifact; not a refactor itself. Aligns with CLAUDE.md docs/audit-scorecard/ placement convention. Co-Authored-By: codex-refactor-loop <noreply@aevatar>
…loop
iter14 wrap-up uncovered two skill gaps observed during the 14-iteration run:
1. Local CI passing is necessary but not sufficient — remote CI runs kafka
integration, projection provider e2e, host composition smoke, codecov,
and other jobs that don't fit on the controller. The original SKILL.md
stopped at "git push" and silently assumed remote CI would pass.
2. Several merges across the run printed "Already up to date." because cwd
leaked from a worktree-scoped command into the trunk merge call (harness
persists Bash cwd across invocations).
Skill changes:
- SKILL.md: new Phase 5 ("Remote CI watch") that fires after every push to
trunk when an open PR is present. Arms a Monitor polling gh pr checks
every 60s, classifies each failure (real / flaky / infra / preexisting /
info-only), and dispatches a focused remote-ci-fix codex per real
failure. Caps fix attempts at 2 per check, stops + escalates on stuck.
- SKILL.md Phase 4: added explicit cwd-discipline note ("merge MUST run
from $REPO_ROOT, never from a worktree") with detection heuristic
("Already up to date." after merge = cwd leak).
- prompts/remote-ci-fix.md: new template for the Phase 5 fix codex. Does
diagnose → local reproduce → minimal fix → local verify → commit-staged;
refuses to fix if not locally reproducible (infra/env gap).
- REFERENCE.md: extended state schema with remote_ci section
(pr_number, last_watched_sha, monitor_task_id, per-check attempt
tracking + classification). Documented cwd-leak diagnosis + Phase 5
stuck/long-running playbooks.
No production code changes. Skill is documentation + templates.
Co-Authored-By: codex-refactor-loop <noreply@aevatar>
3 failing checks (fast-gates, event-sourcing-regression, host-composition-smoke):
1. fast-gates + event-sourcing-regression: cluster-018 forbidden-port guard
fired on docs/audit-scorecard/2026-05-19-auto-refactor-loop-summary.md
line 69-70, where the scorecard *describes* the very tokens the guard
blocks (localhost:5000, 127.0.0.1:5000, defaultPort: 5000). False positive.
Fix: exclude docs/audit-scorecard/, docs/history/, and tools/ci/README.md
from the guard scan — these are documentation surfaces that legitimately
describe forbidden tokens for governance/audit purposes.
2. host-composition-smoke: cluster-019 left ambiguous constructors on
NyxIdSpecCatalog and ConnectedServiceSpecCache. Both classes had two
public ctors — a manual fallback (NyxIdToolOptions, HttpClient?=null,
ILogger? = null) and a DI-preferred (NyxIdToolOptions, IHttpClientFactory,
ILogger? = null). Microsoft.Extensions.DependencyInjection's
ValidateService (used by ASP.NET Core BuildServiceProvider with
ValidateOnBuild) does NOT honor [ActivatorUtilitiesConstructor] at
validation time; it raises InvalidOperationException "ambiguous
constructors". Failing test:
Aevatar.Hosting.Tests.MainnetHostCompositionTests
.AddAevatarMainnetHost_WithInMemoryDependencies_ShouldBuildAndStartFullComposition.
Fix: make the manual fallback ctor internal on both classes (only the
IHttpClientFactory ctor remains public; DI picks it unambiguously).
Tests continue to call the manual ctor through existing
InternalsVisibleTo("Aevatar.AI.Tests") declared in the csproj — no test
changes needed.
Verification:
- bash tools/ci/architecture_guards.sh passes (no false positive).
- dotnet build aevatar.slnx --nologo passes with 0 errors.
- Aevatar.Hosting.Tests MainnetHostCompositionTests: 3/3 pass.
- NyxIdSpecCatalogTests + ConnectedServiceSpecCacheTests: 15/15 pass.
Co-Authored-By: codex-refactor-loop <noreply@aevatar>
Single-PR mode produced PR #678 with 47 commits and made review painful because reviewers cannot independently ack individual cluster fixes. Stacked-PR mode addresses both pains: each cluster opens its own small focused PR (typically 1-15 files), reviewer can land them independently, and the integration branch opens a single rollup PR to dev only after all cluster PRs land. Skill changes: - SKILL.md Phase 0: introduces pr_mode in state.json. Default flipped from "single" to "stacked"; "single" reserved for ≤2-cluster loops or explicit user opt-out. Bootstrap surfaces the mode in PushNotification. - SKILL.md Phase 4: split into 4a (single) + 4b (stacked). Stacked path: - chooses PR base per cluster.dependencies[] from audit (empty → base = integration_branch; non-empty → base = first prerequisite cluster branch); - opens PR via gh pr create with cluster summary as body; - stack-rebases downstream on upstream merge (force-with-lease); - waits for all cluster PRs to merge into integration_branch, then opens exactly one rollup PR to review_base_branch with scorecard; - hard cap stack depth at 5, collapse stack to integration on cap with PushNotification. - SKILL.md Phase 4: added cwd discipline guard banner to cover the new gh pr create call alongside merge/push (same harness cwd-leak issue). - REFERENCE.md state schema: added integration_branch, review_base_branch, pr_mode top-level fields; cluster.dependencies[] in planned; pr_number + pr_base_branch in active; pr_number + merged_into in done; rollup_pr struct for the final integration→dev PR. - REFERENCE.md playbook: added Phase 4 stacked-PR rebase storm section (mitigations, conflict handling, bundle rework) and Phase 4 PR creation idempotency section (gh pr list before gh pr create to avoid dup PRs on rerun). No production code changes. Skill is documentation + templates. Co-Authored-By: codex-refactor-loop <noreply@aevatar>
…t findings)
A meta-audit of the 14-iteration loop run found that audit codex
shallowed out: iter3/9/14 returned total_clusters: 0 but 4 verifiable
CLAUDE-clause violations remained unfixed (VoicePresence host-side
session lock+state, OpenAIRealtimeProvider background-loop epoch dict,
ChatRuntime.ChatAsync non-streaming surface + Studio Generate residue,
Studio Infrastructure JSON catalog/draft serializers). Root causes:
1. Prompt rewarded "0" with phrases like "strongly prefer 0" and
"healthy signal".
2. Anchoring on prior reject lists.
3. rg-hit categorization stood in for actually opening files.
4. CI guards (string-level) treated as semantic proof.
5. "Current entry doesn't call it" mistakenly used to reject design
violations on public APIs.
Skill changes:
- prompts/audit.md rewritten as a mechanical 6-step procedure:
- Step 1 coverage manifest with >= 60 opened production files
(src 30, agents 15, workflow 10, tools/ci 3) per CLAUDE rule_id.
- Step 2 fixed analyzer pack of 6 rg commands controller validates
(ChatAsync / JSON serializers / dict+lock / Task.Run+Timer /
EnsureProjection+Replay / actorId-StartsWith+HandleEventAsync).
- Step 3 audit-iter-N-candidates.ndjson with >= 25 candidates,
separate from cluster file.
- Step 4 clusters formed only from accept verdicts; requires_design
flag for deep violations >30 files instead of auto-reject.
- Step 5 reject must cite CLAUDE clause + per-candidate evidence
(guard cover claim requires guard path + line + probe).
- Step 6 0-cluster output requires manifest + candidates + per-reject
evidence + second-pass; otherwise AUDIT_INCOMPLETE not AUDIT_DONE.
- Anti-anchoring red lines forbid "prefer 0" / "healthy signal" /
"covered by guard without scope check" / "current endpoint" excuses.
- SKILL.md Phase 1 controller validation step (a-g) verifies the
manifest, candidate file, analyzer pack presence, reject-reason
evidence sampling, and opened-file count before accepting an audit.
AUDIT_INCOMPLETE triggers re-dispatch with explicit missing-piece
call-out. Audit timeout raised 1800 -> 2400s to accommodate.
- docs/audit-scorecard/2026-05-19-meta-audit-loop-depth-diagnosis.md
captures the codex meta-audit verbatim: 4 missed violations with
file:line evidence, 5 mechanical improvements (now implemented in
the prompt), and codex's own self-critique of audit bias.
No production code changes; skill + retrospective doc only. The 4
missed violations are documented as iter15 candidates for future
auto-refactor runs against the new prompt.
Co-Authored-By: codex (gpt-5) <noreply@openai.com>
Co-Authored-By: codex-refactor-loop <noreply@aevatar>
…ov/patch as real failure
Two skill-hardening fixes combined into one commit (background commit
attempts hung at architecture_guards.sh and were stopped + redone):
1. 3600s codex timeout floor (user mandate):
- spawn-codex.sh hard-rejects --timeout < 3600 (exit 2) with pointer
to CLAUDE.md.
- SKILL.md Phase 1 audit dispatch bumped 2400s → 3600s.
- CLAUDE.md new "Codex CLI 调用规范" section (between CI gates and
编码风格) documenting the invocation interface (codex exec, stdin
via -, --dangerously-bypass-approvals-and-sandbox,
--skip-git-repo-check, -C cwd, --add-dir for worktree), the 3600s
floor with per-phase recommended ranges (audit 3600-7200,
implement 5400-7200, verify 3600-5400, rework 3600 even for
one-liners), the standard wrapper script, background scheduling
(Bash run_in_background + task notifications + ScheduleWakeup
fallback), prompt hard rules (no commit/push/install/test-skip/
Task.Delay, terminator marker required, SCOPE_EXTEND), and
anti-patterns (bare codex, short timeout, argv prompt, missing -C,
codex doing git topology, unvalidated output).
2. codecov/patch treated as real failure (not info-only):
- SKILL.md Phase 5 step 3 rewritten: pull codecov API per-file
patch.misses/partials → map to cluster → dispatch test-add codex
per cluster → re-push, codecov re-evaluates.
- Single explicit exception: project coverage moved <0.5pp AND
cluster declared deletion-heavy. Even then surface via
PushNotification, not silent dismissal.
- New prompts/test-add.md: scope locked to test/**/*.cs; refuses to
modify production code (prints TEST_BLOCKED on missing testability
hooks); coverage target = behavior, not raw line counts; respects
test_stability_guards.sh (no Task.Delay pacing); forbids loosening
existing tests or mock-everything pseudo-coverage.
- SKILL.md Files index added test-add.md.
No production code changes; skill + docs only.
Co-Authored-By: codex-refactor-loop <noreply@aevatar>
PR #678 codecov/patch reported 78.23% (291 hits / 66 misses / 15 partials of 372 patch lines). The 14-iteration refactor loop shipped new production code without sufficient tests; this commit closes the gap on the three highest-exposure surfaces. The other four PR clusters (001/007/020/021) were already covered by tests their own implement codex added — no new tests needed. New test files: - test/Aevatar.GAgentService.Tests/Projection/ScriptServiceAguiProjectionPortTests.cs (+197) Covers cluster-003's 5 new projection types: activation request shape, runtime lease identity, live sink attach/detach, release, and disabled-projection short-circuit. - test/Aevatar.AI.Tests/NyxIdChatProjectionSessionTests.cs (+268) Covers cluster-004's NyxIdChatProjectionSession (200 lines, 6 types): session lifecycle, runtime lease context identity, AGUI codec validation, projector default message ids, terminal RunFinished, invalid context and unmapped envelope filtering. - test/Aevatar.AI.Tests/ToolProviderHttpClientOwnershipTests.cs (+177) Covers cluster-019's IHttpClientFactory migration: typed/injected ownership for NyxID/Web/ChronoStorage clients, named factory use, and token-leak prevention plus cache reuse for ConnectedServiceSpecCache. All tests: - xUnit + FluentAssertions, no Task.Delay/Thread.Sleep - Assert behavior (not just line execution); refuse "mock everything for coverage" pattern per prompts/test-add.md - Carry the // Test-add (test-coverage/pr-678/cluster-NNN): comment block linking to the cluster intent Verification: - 11 new tests pass (2 GAgentService + 9 AI) - dotnet build aevatar.slnx --nologo: 0 errors - architecture_guards.sh: pass - test_stability_guards.sh: pass - Local cobertura shows NyxIdChatProjectionSession.cs at line-rate=1.0 No production code changes. This commit validates the new Phase 5 codecov-driven test-add path in codex-refactor-loop. Co-Authored-By: codex (gpt-5) <noreply@openai.com> Co-Authored-By: codex-refactor-loop <noreply@aevatar>
Phase 5 codecov/patch fix per the new test-add codex path. 3 new test files (+642 lines), 11 tests, no production changes. Co-Authored-By: codex-refactor-loop <noreply@aevatar>
7 tasks
loning
added a commit
that referenced
this pull request
May 19, 2026
…ill updates after PR #678 merge)
eanzhao
added a commit
that referenced
this pull request
May 19, 2026
…view) M4 — IProjectionSessionEventCodec<AGUIEvent> DI collision: NyxId chat, GAgent draft-run and script-service AGUI all projected AGUIEvent through one shared IProjectionSessionEventHub<AGUIEvent>; TryAddSingleton silently dropped whichever module's codec registered second, so a pipeline could run on the wrong codec/channel. Each pipeline now builds its own channel-scoped hub via a factory registration; adds a dedicated ScriptServiceAguiSessionEventCodec (it previously rode on the draft-run codec through the shared registration). M5 — UserAgentCatalogProjector StateVersion: the projector merges two authoritative sources (catalog + runner) and synthesized StateVersion as their sum. Add per-source monotonic guards so a stale/replayed event for one source cannot roll its version back or re-write older fields over newer ones; with the guards the sum is strictly monotonic per document and a valid overwrite watermark. The honest per-source versions are kept; a vector-typed StateVersion would need a proto schema change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
eanzhao
added a commit
that referenced
this pull request
May 19, 2026
Three *ProjectionInfrastructureTests asserted `ImplementationType == typeof(<Projector>)` for projection materializers. AddProjectionArtifactMaterializer / AddCurrentStateProjectionMaterializer register materializers wrapped in the Observed*Materializer<TContext, TProjector> observability decorator, so ImplementationType is the decorator type — not the bare projector. These tests had been red on dev independently of PR #678 and of this branch's fixes. Assert the decorator wraps the expected projector via ImplementationType.GenericTypeArguments instead (the Observed* decorators are internal to Aevatar.CQRS.Projection.Core and cannot be named from this test assembly). 10 assertions across 3 files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
14-iteration unattended refactor loop. 23 clusters merged, 3 CI guards installed, 0 new features, 0 external repo changes.
→ Full ledger: docs/audit-scorecard/2026-05-19-auto-refactor-loop-summary.md
Headline
Cluster ledger (condensed)
Self-documentation
Every refactored type carries a 3-5 line
// Refactor (iterN/cluster-XXX):comment naming old pattern and new principle, per CLAUDE.md self-documentation requirement.CI guards (probe-tested)
actor.HandleEventAsyncor rawSubscribeAsync<EventEnvelope>. 3-file runtime allowlist.localhost:5000/5050/127.0.0.1:5000/5050/defaultPort: 5000/5050per AGENTS.md hard rule..RequireAuthorization()or per-endpointsecurity-allowlist: <reason>comment onChatQueryEndpointsmappings.Skill
.claude/skills/codex-refactor-loop/packages the running pattern: controller spawnscodex execsubprocesses per phase in isolated worktrees, uses /loop dynamic wakeups. Complementsrefactor-team(Claude Agent subagents) by enabling true OS-level multi-hour unattended parallel runs..refactor-loop/runtime state added to.gitignore.Out-of-scope (security follow-up needed)
/csoor manual threat-model review in a separate PR.Test plan
dotnet build aevatar.slnx --nologopasses on trunk after each merge.tools/ci/architecture_guards.shpasses (3 new guards inside).tools/ci/test_stability_guards.shpasses.tools/ci/query_projection_priming_guard.shpasses.tools/ci/projection_state_version_guard.shpasses.dotnet list aevatar.slnx package --vulnerableclean for OpenTelemetry GHSA-g94r-2vxg-569j / 4625 / mr8r / q834.Notes