
Progressive Lark reply via NyxID relay edit-message (#352) #374

Merged
eanzhao merged 7 commits into dev from feat/2026-04-24_lark-streaming-reply on Apr 24, 2026

Conversation

Contributor

@eanzhao eanzhao commented Apr 24, 2026

Closes #352.

Consumes the new POST /api/v1/channel-relay/reply/update endpoint shipped in ChronoAIProject/NyxID#480 (merged as ChronoAIProject/NyxID#483) so Lark bot replies render progressively instead of leaving the user in a 5–30s silent wait.

Redesigned after the PR #368 merge: #368 moved the reply token from OutboundDeliveryContext (serializable) into ConversationGAgent._nyxRelayReplyTokens (actor-owned, in-memory). That invalidated the earlier design, in which the inbox runtime called the outbound port directly: the inbox runtime is a stream subscriber and has no access to the actor-scoped token. The current design flips the responsibility: the inbox runtime only signals deltas, and the actor is the sole caller of the outbound port.

Flow

Inbox runtime (async LLM, no outbound port access):
  LLM delta → TurnStreamingReplySink.OnDeltaAsync (throttle 750ms)
    → actor.HandleEventAsync(LlmReplyStreamChunkEvent)
  LLM stream ends → TurnStreamingReplySink.FinalizeAsync (bypass throttle)
    → actor.HandleEventAsync(LlmReplyStreamChunkEvent) final flush
    → actor.HandleEventAsync(LlmReplyReadyEvent)

ConversationGAgent (single-threaded turn, owns reply token + streaming state):
  HandleLlmReplyStreamChunkAsync:
    - resolve runtimeContext with reply token (same path as HandleLlmReplyReadyAsync)
    - read per-correlation streaming state (PlatformMessageId, Disabled, EditCount)
    - first chunk → runner.RunStreamChunkAsync(chunk, null) → placeholder send
    - subsequent → runner.RunStreamChunkAsync(chunk, PMID) → edit
    - any failure → mark state.Disabled, drop further chunks
  HandleLlmReplyReadyAsync:
    - if streaming state healthy and LLM completed:
        force final update if text changed since last flush,
        persist ConversationTurnCompletedEvent directly (no re-send)
    - otherwise: existing RunLlmReplyAsync fallback path
    - cleanup streaming state + reply token in both paths
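For concreteness, a minimal sketch of the sink side of the flow above, assuming .NET 8 implicit usings. The dispatch delegates and field names are illustrative stand-ins, not the production IStreamingReplySink surface; only the throttle/finalize behavior mirrors the description.

```csharp
// Sketch only: _dispatchChunk / _dispatchReady stand in for wrapping the
// actor's HandleEventAsync with LlmReplyStreamChunkEvent / LlmReplyReadyEvent.
public sealed class ThrottledReplySinkSketch
{
    private readonly Func<string, Task> _dispatchChunk;
    private readonly Func<Task> _dispatchReady;
    private readonly TimeProvider _time;
    private readonly TimeSpan _flushInterval; // 750ms per NyxIdRelayOptions.StreamingFlushIntervalMs

    private DateTimeOffset _lastFlush = DateTimeOffset.MinValue;
    private string _accumulated = string.Empty;
    private string _lastEmitted = string.Empty;

    public ThrottledReplySinkSketch(
        Func<string, Task> dispatchChunk, Func<Task> dispatchReady,
        TimeProvider time, TimeSpan flushInterval)
    {
        _dispatchChunk = dispatchChunk;
        _dispatchReady = dispatchReady;
        _time = time;
        _flushInterval = flushInterval;
    }

    public async Task OnDeltaAsync(string delta, CancellationToken ct)
    {
        _accumulated += delta;
        var now = _time.GetUtcNow();
        if (now - _lastFlush < _flushInterval)
            return; // throttled: keep accumulating, no actor signal this tick

        _lastFlush = now;
        _lastEmitted = _accumulated;
        await _dispatchChunk(_accumulated); // -> LlmReplyStreamChunkEvent
    }

    public async Task FinalizeAsync(CancellationToken ct)
    {
        // Stream ended: bypass the throttle, flush anything unsent, then signal ready.
        if (_accumulated != _lastEmitted)
            await _dispatchChunk(_accumulated); // final flush
        await _dispatchReady();                 // -> LlmReplyReadyEvent
    }
}
```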

Architectural rules respected (CLAUDE.md)

| Rule | How |
| --- | --- |
| Reply token must stay in actor | Inbox runtime can't call the port; it only dispatches EventEnvelopes. The actor's _nyxRelayReplyTokens stays the only authoritative store. |
| No middle-layer entity/run/session → state map | Sink state (throttle timestamp, last emitted text) lives as instance fields on a per-invocation sink; the actor's _nyxRelayStreamingStates is actor-owned runtime state (same lifecycle as _nyxRelayReplyTokens). |
| No generic actor query/reply | Chunk dispatch is fire-and-(await-ordered) signaling; no back-channel read. |
| Strong-typed event evolution | LlmReplyStreamChunkEvent is a new event instead of overloading LlmReplyReadyEvent with a bool; each event keeps a single-purpose semantic. |
| Runtime-only event | LlmReplyStreamChunkEvent is documented as never-persist; HandleEventAsync dispatches to the handler without PersistDomainEventAsync. |

Key changes

  • proto EmitResult.platform_message_id: surfaces om_xxx from the relay /reply response so the actor can thread it into subsequent edits.
  • proto LlmReplyStreamChunkEvent: runtime-only event from the inbox runtime to the actor.
  • NyxIdApiClient.UpdateChannelRelayReplyAsync + text wrapper, with distinct detection of 501 edit_unsupported via NyxIdChannelRelayReplyResult.EditUnsupported.
  • NyxIdRelayOutboundPort.UpdateAsync mirroring the SendAsync shape (takes replyToken explicitly, matching the contract from PR #368, Fix NyxID relay callback authentication).
  • IConversationTurnRunner.RunStreamChunkAsync + ConversationStreamChunkResult: a unified "send or update based on currentPlatformMessageId" primitive, only invoked by the actor (see the sketch after this list).
  • IStreamingReplySink + TurnStreamingReplySink: throttled chunk event dispatcher with per-invocation state and no outbound port dependency.
  • ChannelLlmReplyInboxRuntime: resolves the actor reference once, builds the sink, passes it to the generator; finalizes on stream end.
  • ConversationGAgent: new _nyxRelayStreamingStates dict, HandleLlmReplyStreamChunkAsync handler, streaming short-circuit in HandleLlmReplyReadyAsync, combined cleanup in RemoveNyxRelayReplyToken.
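A hedged sketch of that send-or-update primitive; the result and port shapes here are assumptions for illustration, not the exact signatures in the diff.

```csharp
// Assumed shapes: the real ConversationStreamChunkResult / outbound port live in the PR.
public sealed record StreamChunkResultSketch(
    bool Succeeded,
    string? PlatformMessageId, // om_xxx surfaced from the relay /reply response
    bool EditUnsupported);     // maps the 501 edit_unsupported detection

public interface IRelayOutboundPortSketch
{
    Task<StreamChunkResultSketch> SendAsync(string replyToken, string text, CancellationToken ct);
    Task<StreamChunkResultSketch> UpdateAsync(string replyToken, string platformMessageId, string text, CancellationToken ct);
}

public static class TurnRunnerSketch
{
    // No currentPlatformMessageId yet -> first chunk -> /reply placeholder send;
    // otherwise -> /reply/update edit of the already-sent message.
    public static Task<StreamChunkResultSketch> RunStreamChunkAsync(
        IRelayOutboundPortSketch port, string replyToken, string text,
        string? currentPlatformMessageId, CancellationToken ct)
        => currentPlatformMessageId is null
            ? port.SendAsync(replyToken, text, ct)
            : port.UpdateAsync(replyToken, currentPlatformMessageId, text, ct);
}
```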

Degradation policy (v1)

Any placeholder or update failure permanently disables streaming for the turn and falls back to the legacy single-shot reply path. A stale partial placeholder may remain visible to the user; the final reply then arrives as a second message. Transient-failure retry classification is tracked separately as backlog in #371.

Feature flag

  • NyxIdRelayOptions.StreamingRepliesEnabled = true (default ON)
  • NyxIdRelayOptions.StreamingFlushIntervalMs = 750 (per issue spec; both options sketched below)
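A minimal sketch of the flag surface, assuming standard .NET options binding; the configuration section name below is an assumption.

```csharp
public sealed class NyxIdRelayOptionsSketch
{
    public bool StreamingRepliesEnabled { get; set; } = true; // flip off for single-shot fallback
    public int StreamingFlushIntervalMs { get; set; } = 750;  // sink throttle window per issue spec
}

// Typical binding (hypothetical section name):
// services.Configure<NyxIdRelayOptionsSketch>(configuration.GetSection("NyxIdRelay"));
```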

Test plan

  • dotnet build aevatar.slnx builds clean
  • dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests — 298 passed (+17 new: sink throttle/dispatch, inbox runtime event selection, outbound port UpdateAsync + SendAsync platform_message_id surfacing)
  • dotnet test test/Aevatar.AI.Tests — 469 passed (+5 new: UpdateChannelRelayReplyAsync happy / 501 / generic error / empty text)
  • dotnet test test/Aevatar.GAgents.Channel.Protocol.Tests — 108 passed (+6 new: actor chunk handler first/subsequent, disable on failure, reply-ready short-circuit vs fallback)
  • bash tools/ci/architecture_guards.sh
  • bash tools/ci/test_stability_guards.sh
  • Manual smoke on Lark after merge: send "写一段 300 字的介绍" ("write a 300-character introduction"); confirm the placeholder within ~1s, text growing at ~750ms cadence, and a final reply matching the non-streamed one. Then flip the flag off and confirm the single-shot fallback.

Follow-ups

🤖 Generated with Claude Code


codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 87.75510% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.91%. Comparing base (1074d7e) to head (339debc).
⚠️ Report is 8 commits behind head on dev.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...c/Aevatar.AI.ToolProviders.NyxId/NyxIdApiClient.cs | 87.75% | 4 Missing and 2 partials ⚠️ |
@@           Coverage Diff           @@
##              dev     #374   +/-   ##
=======================================
  Coverage   69.91%   69.91%           
=======================================
  Files        1172     1172           
  Lines       83456    83504   +48     
  Branches    10969    10976    +7     
=======================================
+ Hits        58346    58385   +39     
- Misses      20898    20905    +7     
- Partials     4212     4214    +2     
| Flag | Coverage Δ |
| --- | --- |
| ci | 69.91% <87.75%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...c/Aevatar.AI.ToolProviders.NyxId/NyxIdApiClient.cs | 66.85% <87.75%> (+3.28%) ⬆️ |

... and 4 files with indirect coverage changes


Drives streaming edit-in-place of an LLM reply so Lark users see a
placeholder within ~1s and incremental updates instead of a 5-30s
silent wait. Built on top of the NyxID relay edit endpoint shipped in
ChronoAIProject/NyxID#480 / #483.

This redesign keeps the reply token inside the conversation actor (per
PR #368's security boundary): the inbox runtime only accumulates deltas
and signals them to the actor; the actor is the sole caller of the
outbound port, holding both the reply token and the placeholder
platform_message_id in in-memory runtime state.

## Flow

    Inbox runtime (async LLM, no outbound port access):
        LLM stream delta -> TurnStreamingReplySink.OnDeltaAsync (throttle 750ms)
            -> actor.HandleEventAsync(LlmReplyStreamChunkEvent)
        LLM stream ends -> TurnStreamingReplySink.FinalizeAsync (bypass throttle)
            -> actor.HandleEventAsync(LlmReplyStreamChunkEvent) final flush
            -> actor.HandleEventAsync(LlmReplyReadyEvent)

    ConversationGAgent (single-threaded turn, owns reply token + streaming state):
        HandleLlmReplyStreamChunkAsync:
            - resolve runtimeContext with reply token (same path as HandleLlmReplyReadyAsync)
            - read per-correlation streaming state (PlatformMessageId, Disabled, EditCount)
            - if first chunk: runner.RunStreamChunkAsync(chunk, null PMID) -> placeholder send
            - if subsequent: runner.RunStreamChunkAsync(chunk, PMID) -> edit
            - on failure: mark state.Disabled and drop further chunks for this turn
        HandleLlmReplyReadyAsync:
            - if streaming state is healthy and LLM completed:
                force final update if final text differs from last flushed,
                then persist ConversationTurnCompletedEvent directly (no re-send)
            - otherwise: existing RunLlmReplyAsync fallback path
            - cleanup streaming state + reply token in both paths

## Architectural rules respected

- Reply token never leaves the actor. Inbox runtime can't call the
  outbound port; it can only signal via EventEnvelope. Actor's
  _nyxRelayReplyTokens dict remains the single authoritative store.
- No middle-layer ID map. Sink per-turn state (throttle timestamp,
  last emitted text) lives in instance fields on a per-invocation
  sink; the actor's _nyxRelayStreamingStates dict is actor-owned
  runtime state, same lifecycle as _nyxRelayReplyTokens.
- No generic actor query/reply. Chunk dispatch is fire-and-(awaited)-
  order-preserving signaling; no back-channel read.
- New proto event LlmReplyStreamChunkEvent is a runtime-only signal
  documented as never-persist; HandleEventAsync dispatches to handler
  without PersistDomainEventAsync, matching existing event flow.
- Strong-typed event evolution: new event instead of overloading
  LlmReplyReadyEvent with a boolean, keeping semantics single-purpose.

## Degradation policy (v1)

Any placeholder or update failure permanently disables streaming for
the turn and falls back to the legacy single-shot reply path. Partial
stale placeholder may remain visible to the user; the final send
arrives as a second message. Transient-failure retry is tracked
separately in #371 as backlog.

## Feature flag

NyxIdRelayOptions.StreamingRepliesEnabled defaults to true;
StreamingFlushIntervalMs defaults to 750ms per the issue spec.

## Verification

- dotnet build aevatar.slnx
- dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests (298 passed, +17 new)
- dotnet test test/Aevatar.AI.Tests (469 passed, +5 new)
- dotnet test test/Aevatar.GAgents.Channel.Protocol.Tests (108 passed, +6 new)
- bash tools/ci/architecture_guards.sh
- bash tools/ci/test_stability_guards.sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eanzhao eanzhao force-pushed the feat/2026-04-24_lark-streaming-reply branch from e9116b1 to e5c3324 on April 24, 2026 07:45
eanzhao and others added 4 commits April 24, 2026 16:47
…treaming-reply

# Conflicts:
#	test/Aevatar.GAgents.Channel.Protocol.Tests/ConversationGAgentDedupTests.cs
…treaming-reply

# Conflicts:
#	agents/Aevatar.GAgents.Channel.Runtime/Conversation/ConversationGAgent.cs
#	agents/Aevatar.GAgents.ChannelRuntime/ChannelLlmReplyInboxRuntime.cs
…treaming-reply

Resolve conflicts in ChannelLlmReplyInboxRuntime by combining streaming reply
sink (TimeProvider-driven TurnStreamingReplySink) with dev's bot-owner LLM
config resolution (INyxIdRelayScopeResolver + IUserConfigQueryPort feeding
BuildEffectiveMetadataAsync). Tests on both sides retained; ambiguous
NyxIdRelayOptions references in streaming tests qualified to the Channel.NyxIdRelay
type.
Contributor Author

@eanzhao eanzhao left a comment


I went through the flow on both sides, aevatar #374 and NyxID #483. The happy path is headed in the right direction, but the two points below affect whether this feature actually works as intended, especially the failure-degradation semantics.

    {
        Logger.LogInformation(
            "Streaming chunk delivery failed; disabling streaming for turn. correlation={CorrelationId}, code={Code}, editUnsupported={EditUnsupported}",
            evt.CorrelationId,
Contributor Author


[P1] Once streaming is marked Disabled here, the eventual LlmReplyReady falls back to RunLlmReplyAsync and sends the complete reply again. But as soon as the first chunk has successfully gone through /reply, the NyxID reply token is already consumed, and NyxID's /reply/update explicitly requires a JTI that /reply has already consumed. So the "single-shot reply compensation" after any subsequent edit failure, final-flush failure, or terminal LLM failure will try to reuse the consumed token and will most likely get a 401; the user only ever sees the stale partial, never the final answer. Suggest redesigning the failure degradation: for example, keep attempting the final edit, treat the partial as the terminal failure state, or introduce an explicit second-send / API-key compensation path.

Contributor Author


Fixed at 0d3a91a.

The streaming state now has two failure branches, with the semantics decided by whether the reply token has already been consumed:

  • Disabled = failure before the first send (no token in the runtime context, or the first /reply itself failed). The token is still usable, so the LlmReplyReady fallback to RunLlmReplyAsync is safe (old behavior preserved).
  • SuppressInterim = the first chunk already went through /reply successfully (PlatformMessageId is non-empty) and a later /reply/update failed. The JTI is dead at this point, so it never falls back to RunLlmReplyAsync. Subsequent interim chunks are dropped outright, but TryCompleteStreamedReplyAsync still attempts the final edit (which goes through /reply/update anyway).

When the final edit also fails, the new code does not fall back to /reply either. As you suggested, the partial becomes the terminal failure state: a ConversationTurnCompletedEvent is persisted directly with the text from the last successful flush, the chain stops, and the token is cleaned up. The user sees a stale partial, but there is no 401 storm.

Three new regression tests cover the three branches:

  • HandleLlmReplyStreamChunkAsync_InterimEditFailureAfterTokenConsumed_SuppressesSubsequentChunksWithoutDisablingFinalEdit
  • HandleLlmReplyReadyAsync_WhenTokenAlreadyConsumedAndInterimEditFailed_RetriesFinalEditInsteadOfReusingToken
  • HandleLlmReplyReadyAsync_WhenTokenConsumedAndFinalEditAlsoFails_PersistsLastFlushedPartialAsTerminalWithoutReusingToken

On the "second-send / API-key compensation path": not added for now. The NyxID relay side has no second-send protocol, and bypassing the token to send directly with the bot API key would punch through the actor boundary (the token is actor-owned runtime state), which is too large a change. For now, honestly treating the partial as terminal is more robust than faking a fallback. If a separate second-send contract is defined later, it can be wired in then.
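For reference, a hedged sketch of the branch split described above; the state shape and method names are illustrative, not the production code in ConversationGAgent.

```csharp
public static class StreamingDegradationSketch
{
    public enum FailureMode { None, Disabled, SuppressInterim }

    public sealed class StreamingState
    {
        public string? PlatformMessageId;             // null until the first /reply succeeds
        public FailureMode Failure = FailureMode.None;
        public string LastFlushedText = string.Empty; // terminal text if the final edit also fails
    }

    // The branch hinges on whether the reply token (JTI) was already consumed:
    // - PlatformMessageId null => the first /reply never succeeded, token still fresh,
    //   so LlmReplyReady may safely fall back to RunLlmReplyAsync (a fresh /reply).
    // - PlatformMessageId set  => /reply consumed the token; only /reply/update is
    //   still legal, so drop interim edits but keep the final edit attempt alive.
    public static void OnDeliveryFailure(StreamingState state) =>
        state.Failure = state.PlatformMessageId is null
            ? FailureMode.Disabled
            : FailureMode.SuppressInterim;
}
```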

    if (string.IsNullOrEmpty(chunk.DeltaContent))
        continue;

    output.Append(chunk.DeltaContent);
Contributor Author


[P2] The current streaming sink is only invoked after ChatStreamAsync produces DeltaContent, so the first Lark message is not sent when the LLM call starts; it waits for the model's first delta. This implementation moves "first visible" forward from full-reply completion to the first delta, but it cannot guarantee the ≤1s placeholder the issue specifies: if model cold start, routing, tool calls, or the provider's first token is slow, the user still waits in silence. If the target is a stable ≤1s, a fixed placeholder has to be sent before starting the LLM and then overwritten by the deltas.

Contributor Author


Fixed at 0d3a91a.

In NyxIdConversationReplyGenerator.GenerateReplyAsync, streamingSink.OnDeltaAsync(placeholder, ct) now runs once before entering ChatStreamAsync. The first Lark message therefore goes out before the LLM produces any delta, so time-to-first-visible equals the Lark outbound RTT, independent of model cold start, routing, or tool-call preambles. The first real delta overwrites the placeholder via edit-in-place.

The placeholder is controlled by NyxIdRelayOptions.StreamingPlaceholderText, defaulting to "…" (short, language-neutral, little emoji rendering variance). Setting it to empty/whitespace turns it off, reverting to your original "wait for the first delta" semantics at the cost of losing the ≤1s guarantee on cold-start paths.

Note two failure couplings:

  • Placeholder send fails → PlatformMessageId stays empty → takes the Disabled branch from P1 → LlmReplyReady can safely fall back to RunLlmReplyAsync.
  • Placeholder succeeds + the LLM then throws / times out / returns an empty reply → the final edit in FinalizeAsync → HandleLlmReplyReadyAsync is responsible for overwriting the placeholder with the classified error text or the final text (via /reply/update, never the dead token).

Three new tests:

  • GenerateReplyAsync_WithStreamingSinkAndPlaceholderConfigured_EmitsPlaceholderBeforeFirstDelta
  • GenerateReplyAsync_WithStreamingSinkButEmptyPlaceholderOption_SkipsPlaceholderEmit
  • GenerateReplyAsync_WithoutStreamingSink_SkipsPlaceholderEmit
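A hedged sketch of the placeholder-first ordering described above, with delegate stand-ins for the generator and sink (all names here are illustrative):

```csharp
using System.Text;

public static class PlaceholderFirstSketch
{
    // chatStream stands in for the ChatStreamAsync deltas; onDelta for streamingSink.OnDeltaAsync.
    public static async Task<string> GenerateAsync(
        Func<CancellationToken, IAsyncEnumerable<string>> chatStream,
        Func<string, CancellationToken, Task> onDelta,
        string placeholderText, // NyxIdRelayOptions.StreamingPlaceholderText, default "…"
        CancellationToken ct)
    {
        // Emit before awaiting the model: time-to-first-visible becomes the
        // outbound RTT, independent of cold start or tool-call preambles.
        // Empty/whitespace disables this and reverts to wait-for-first-delta.
        if (!string.IsNullOrWhiteSpace(placeholderText))
            await onDelta(placeholderText, ct);

        var output = new StringBuilder();
        await foreach (var delta in chatStream(ct))
        {
            if (string.IsNullOrEmpty(delta))
                continue;
            output.Append(delta);
            // The first real delta overwrites the placeholder via edit-in-place.
            await onDelta(output.ToString(), ct);
        }
        return output.ToString();
    }
}
```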

eanzhao and others added 2 commits April 24, 2026 22:48
…eview)

P1: Once the first streaming chunk consumed the NyxID /reply token, a
subsequent interim /reply/update failure marked the turn Disabled, and
LlmReplyReady then fell through to RunLlmReplyAsync which issued a fresh
/reply with the already-consumed JTI (401). The user was left staring at
a stale partial. The fix splits the in-memory streaming state into two
failure modes: Disabled (pre-send, reply token still fresh, fallback to
/reply is safe) vs SuppressInterim (post-send, token dead, skip interim
edits but still try the final edit). If the final edit also fails, the
actor now persists the last flushed partial as the terminal completion
rather than spinning on a guaranteed-401 fallback.

P2: The streaming sink only fired on the first non-empty LLM delta, so
cold-start, router handoff, or tool-call-before-first-token pushed the
time-to-first-visible above the ≤1s target. Introduce an opt-in
StreamingPlaceholderText (default "…") that the reply generator emits
as the very first streaming chunk, before awaiting ChatStreamAsync. The
first real delta overwrites the placeholder via edit-in-place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eanzhao eanzhao merged commit d750c9f into dev Apr 24, 2026
12 checks passed


Development

Successfully merging this pull request may close these issues.

Lark bot progressive streaming reply (edit-message via Nyx relay)
