[codex] Refactor LLM reply continuation into agent runs#597
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. @@ Coverage Diff @@
## feature/lark-bot #597 +/- ##
====================================================
- Coverage 71.95% 71.92% -0.04%
====================================================
Files 1258 1258
Lines 91512 91512
Branches 11998 11998
====================================================
- Hits 65851 65818 -33
- Misses 20926 20961 +35
+ Partials 4735 4733 -2
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 7 files with indirect coverage changes 🚀 New features to boost your workflow:
|
| }, | ||
| }; | ||
|
|
||
| await _actorDispatchPort.DispatchAsync(actor.Id, envelope, ct); |
There was a problem hiding this comment.
[codex] severity=blocker, category=concurrency
This uses synchronous IActorDispatchPort.DispatchAsync to start the run actor. ConversationGAgent awaits this while handling the inbound turn, and AgentRunGAgent later dispatches LlmReplyReadyEvent back to that same conversation actor, so the real Local/Orleans dispatch implementations can form an A -> B -> A wait cycle and also block the webhook for the full LLM run. This violates CLAUDE.md: cross-actor waits must be continuationized. Start the run through a one-way enqueue/publish mechanism that returns after acceptance, then let the run actor deliver the ready/drop signal independently.
| .Match(current, evt) | ||
| .On<AgentRunStartedEvent>(ApplyStarted) | ||
| .On<AgentRunReplyProducedEvent>(ApplyReplyProduced) | ||
| .On<AgentRunDroppedEvent>(ApplyDropped) |
There was a problem hiding this comment.
[glm-5.1] severity=major, category=arch
AgentRunGAgent reaches terminal states (ReplyProduced, Dropped, Failed) but has no self-deactivation or cleanup mechanism. Per AGENTS.md: "默认优先 run/session/task-scoped actor" and "长期 actor 只保留给事实拥有者". Each LLM reply creates a persistent actor that accumulates indefinitely. Consider persisting the terminal event then scheduling a self-deactivation (e.g. IActorRuntime.DeactivateAsync or a delayed cleanup self-message) so the actor does not remain alive after its single-shot purpose is served.
| } | ||
|
|
||
| internal const long MaxInboxRequestAgeMs = 5 * 60 * 1000; | ||
| if (State.Status is AgentRunStatus.ReplyProduced or AgentRunStatus.Dropped or AgentRunStatus.Failed) |
There was a problem hiding this comment.
[Consensus: 2 models] severity=major, category=bug
The terminal-state dedup guard at this line silently drops undelivered ready/drop notifications when conversation dispatch fails. The actor persists AgentRunReplyProducedEvent / AgentRunFailedEvent / AgentRunDroppedEvent before dispatching to the conversation actor; if that dispatch throws, a retry of the start command will hit this guard and return without re-emitting the signal — ConversationGAgent.PendingLlmReplyRequests stays stuck. The guard also has zero test coverage. Either move the guard to after a confirmed-delivered phase (dispatch retry as a persisted phase), or only mark the run terminal once the conversation acknowledges. Add a test covering both the legitimate-duplicate case and the failed-dispatch retry case.
Per-model verbatim
- kimi:
HandleStartAsyncintroduces a new terminal-state dedup guard (ignoring duplicate start commands when the run is alreadyReplyProduced,Dropped, orFailed). This idempotency behavior is not covered by any test inAgentRunGAgentTests.cs. Add a test that sends a secondAgentRunStartRequestedafter the run reaches a terminal state and asserts it is ignored. - codex: This terminal duplicate guard can drop undelivered results. The run persists
AgentRunReplyProducedEvent/AgentRunFailedEventbeforeDispatchReadyEventAsync, and persistsAgentRunDroppedEventbefore the drop notification; if that conversation dispatch fails, a retry of the start command sees terminal state here and returns without re-emitting the ready/drop signal, leavingConversationGAgent.PendingLlmReplyRequestsstuck. Make delivery a retryable persisted phase, or only mark the run terminal after the conversation has accepted/acknowledged the notification.
|
|
||
| _logger.LogInformation("Started channel LLM reply inbox on {StreamId}", InboxStreamId); | ||
| ArgumentException.ThrowIfNullOrWhiteSpace(correlationId); | ||
| return ActorIdPrefix + correlationId.Trim(); |
There was a problem hiding this comment.
[glm-5.1] severity=minor, category=arch
HandleStartAsync calls PersistDomainEventAsync(AgentRunStartedEvent) then synchronously calls ProcessAsync which runs the full LLM generation (up to 300s). If PersistDomainEventAsync for the started event succeeds but ProcessAsync throws an unguarded exception (e.g. from EnsureTargetActorAsync or a subsequent PersistDomainEventAsync call), the actor is left in Started state with no terminal event and no retry path. Wrap the ProcessAsync call in a try/catch that persists AgentRunFailedEvent on any unexpected exception to guarantee terminal state is always reached.
| @@ -712,6 +755,8 @@ private sealed class DispatchingActorRuntime(params (string Id, IActor Actor)[] | |||
| static pair => pair.Actor, | |||
There was a problem hiding this comment.
[glm-5.1] severity=minor, category=test
NoOpEventSourcing.TransitionState returns current unchanged, so AgentRunGAgent.State never advances past AGENT_RUN_STATUS_UNSPECIFIED in any test. This means the duplicate-terminal-start guard at HandleStartAsync line ~78 (State.Status is AgentRunStatus.ReplyProduced or AgentRunStatus.Dropped or AgentRunStatus.Failed) is never exercised. Either wire a real IEventSourcingBehavior that calls TransitionState, or add a dedicated test that pre-populates the state to a terminal status and asserts the guard fires.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c108ef4ab4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // The run has not entered a terminal state yet. Leaving it Started lets the | ||
| // durable dispatcher retry the start command and re-emit the ready/drop signal. | ||
| _logger.LogWarning( | ||
| ex, | ||
| "Agent run output notification was not accepted; run remains retryable: runId={RunId} correlation={CorrelationId}", | ||
| runId, | ||
| request.CorrelationId); | ||
| } |
There was a problem hiding this comment.
Re-queue failed output notifications
When the run actor generates a reply/drop but SendToAsync fails, this catch logs and returns successfully, so the consumed AgentRunStartRequested is acknowledged and nothing schedules another attempt. ConversationGAgent.DispatchPendingLlmReplyAsync only schedules its durable retry when the initial dispatcher call throws, while this path leaves state merely Started and does not persist the request or enqueue a retry, so a transient target dispatch failure can leave the pending LLM reply stuck until some unrelated reactivation/stale cleanup instead of notifying the user.
Useful? React with 👍 / 👎.
Multi-model fix-check on PR #597 (head
|
| # | comment | location | verdict |
|---|---|---|---|
| 1 | codex blocker concurrency (sync cross-actor wait) | AgentRunDispatcher.cs:58 |
✅ 4/5 resolved* |
| 2 | glm-5.1 minor arch (try/catch around ProcessAsync) | AgentRunGAgent.cs:78 |
✅ 5/5 resolved |
| 3 | glm-5.1 major arch (run-actor cleanup) | AgentRunGAgent.cs:86 |
✅ 5/5 resolved |
| 4 | consensus major bug (terminal guard / dispatch ordering) | AgentRunGAgent.cs:104 |
✅ 3/5 resolved* |
| 5 | codex bot P2 (re-queue failed output) | AgentRunGAgent.cs:136 |
✅ 5/5 resolved |
| 6 | glm-5.1 minor test (NoOpEventSourcing) | AgentRunGAgentTests.cs:1028 |
✅ 5/5 resolved |
*Divided rows verified against current source:
- Refactor/project namespace #1:
AgentRunDispatcher.DispatchAsyncnow uses_streamProvider.GetStream(actor.Id).ProduceAsync(envelope, ct)— one-way enqueue. The singleunresolveddissent claimed the file still usesIActorDispatchPort.DispatchAsync, which is factually incorrect at the current head. - Aevatar context database #4:
ProcessAsynclines 316–317 invokeDispatchReadyEventAsyncbeforePersistReplyProducedAsync, so a failed dispatch leaves the run inStartedandTryHandleOutputDispatchFailureAsyncschedules a durable retry. The 2partialdissents either misread the order or raise a future-refactoring concern, not a current defect.
Fixes landed in c108ef4ab4 ("Address agent run review comments") and 2e4a1d1bcb ("Harden agent run output retry handling").
Reviewer note: gemini failed both runs due to interactive auth prompt; remaining 5 reviewers were unanimous on 4/6 and majority-correct on 2/6.
Generated by /opencode-pr-fix-check. Threads not auto-resolved — please mark each thread resolved manually if you agree.
Summary
AgentRunGAgentcontinuation actors.IChannelLlmReplyRunDispatcherand the NyxidChat dispatcher that creates/getsAgentRunGAgentbycorrelationIdand dispatches typedAgentRunStartRequestedcommands.ConversationGAgentand tests to use the run dispatcher path and retires the old inbox interface/hosted service path.Validation
dotnet build test/Aevatar.GAgents.ChannelRuntime.Tests/Aevatar.GAgents.ChannelRuntime.Tests.csproj --nologodotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/Aevatar.GAgents.ChannelRuntime.Tests.csproj --nologodotnet test test/Aevatar.GAgents.Channel.Protocol.Tests/Aevatar.GAgents.Channel.Protocol.Tests.csproj --nologobash tools/ci/test_stability_guards.shbash tools/ci/architecture_guards.shRefs #596