Skip to content

Implement issue #328 unified channel inbound backbone#346

Merged
eanzhao merged 5 commits intodevfrom
refactor/2026-04-23_issue-328-unified-channel-inbound
Apr 23, 2026
Merged

Implement issue #328 unified channel inbound backbone#346
eanzhao merged 5 commits intodevfrom
refactor/2026-04-23_issue-328-unified-channel-inbound

Conversation

@eanzhao
Copy link
Copy Markdown
Contributor

@eanzhao eanzhao commented Apr 23, 2026

Summary

Closes #328.

This PR completes the Aevatar-side implementation for the unified channel inbound backbone and removes the old Nyx relay side path that was bypassing ConversationGAgent.

What changed

  • rename Aevatar.GAgents.Channel.Lark to Aevatar.GAgents.Platform.Lark and split solution filters into aevatar.channels.slnf and aevatar.platforms.slnf
  • add Aevatar.GAgents.Channel.NyxIdRelay as the channel-neutral Nyx relay transport and outbound port
  • extend ChatActivity with typed TransportExtras and thread outbound reply data through OutboundDeliveryContext
  • add Nyx identity lookup on ChannelBotRegistration query paths so relay traffic resolves registrations without depending on activity.Bot.Value
  • generalize LarkConversationTurnRunner into ChannelConversationTurnRunner
  • move relay ingress back onto the single inbound backbone: endpoint auth + parse + dispatch ChatActivity + 202, with business orchestration handled by ConversationGAgent
  • replace the old endpoint-scoped reply accumulation path with the event-driven LLM reply flow (NeedsLlmReplyEvent / LlmReplyReadyEvent + inbox runtime)
  • preserve the current slash-command short-circuit behavior, including unknown-slash fallback handling from the latest dev
  • remove the legacy direct Lark ingress/egress code that no longer belongs in the platform module
  • update ADR/canonical docs and add CI guards to prevent webhook code from directly creating NyxIdChatGAgent

Impact

  • Nyx relay is now a transport adapter instead of a second inbound business path
  • ConversationGAgent is again the single inbound fact source for relay traffic
  • relay-originated messages now carry typed conversation scope, Nyx identifiers, and outbound delivery context through the standard channel pipeline
  • platform-specific rendering stays under Platform.Lark; relay transport concerns live under Channel.NyxIdRelay

Validation

Passed locally:

dotnet build aevatar.slnx --nologo --no-restore -m:1
bash tools/ci/architecture_guards.sh

dotnet test test/Aevatar.GAgents.Channel.Protocol.Tests/Aevatar.GAgents.Channel.Protocol.Tests.csproj --nologo --no-build
dotnet test test/Aevatar.AI.Tests/Aevatar.AI.Tests.csproj --nologo --no-build --filter "FullyQualifiedName~NyxIdRelayAuthValidatorTests|FullyQualifiedName~NyxIdChatEndpointsCoverageTests"
dotnet test test/Aevatar.GAgents.Platform.Lark.Tests/Aevatar.GAgents.Platform.Lark.Tests.csproj --nologo --no-build
dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/Aevatar.GAgents.ChannelRuntime.Tests.csproj --nologo --filter "FullyQualifiedName~ChannelConversationTurnRunnerTests|FullyQualifiedName~NyxRelayAgentBuilderFlowTests|FullyQualifiedName~ServiceCollectionExtensionsTests|FullyQualifiedName~NyxIdRelayTransportTests|FullyQualifiedName~NyxIdRelayOutboundPortTests|FullyQualifiedName~NyxIdRelayConversationTypeMapTests"

Notes

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 906698b160

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread agents/Aevatar.GAgents.ChannelRuntime/ChannelContextMiddleware.cs
Comment thread agents/Aevatar.GAgents.NyxidChat/ServiceCollectionExtensions.cs Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 90.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.72%. Comparing base (4f970ed) to head (87933e3).
⚠️ Report is 6 commits behind head on dev.

Files with missing lines Patch % Lines
...tar.AI.ToolProviders.NyxId/Tools/NyxIdProxyTool.cs 89.47% 1 Missing and 1 partial ⚠️
@@            Coverage Diff             @@
##              dev     #346      +/-   ##
==========================================
- Coverage   69.73%   69.72%   -0.02%     
==========================================
  Files        1139     1139              
  Lines       81202    81215      +13     
  Branches    10629    10634       +5     
==========================================
- Hits        56630    56628       -2     
- Misses      20470    20482      +12     
- Partials     4102     4105       +3     
Flag Coverage Δ
ci 69.72% <90.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...c/Aevatar.AI.ToolProviders.NyxId/NyxIdApiClient.cs 61.72% <100.00%> (+2.06%) ⬆️
...dServices/ActorDispatchUserConfigCommandService.cs 90.19% <ø> (-0.55%) ⬇️
...tar.AI.ToolProviders.NyxId/Tools/NyxIdProxyTool.cs 15.42% <89.47%> (+6.70%) ⬆️

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@eanzhao eanzhao force-pushed the refactor/2026-04-23_issue-328-unified-channel-inbound branch from 906698b to 614647a Compare April 23, 2026 10:42
Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this fully closes #328 on the latest head.

Two issue-level commitments are still missing:

  1. relay reply credentials are not actually threaded through the typed backbone; they now live in a process-local messageId -> token cache, which violates the repo's ban on middle-layer in-memory fact maps and makes deferred replies non-durable across restart/scale-out;
  2. the relay auth boundary still does not implement the promised replay guard, and its HMAC verification does not match NyxID's current callback contract.

There are also still unresolved runtime issues in the existing Codex comments (DI option registration and prompt injection), so I don't think Closes #328 is accurate yet.

Comment thread agents/Aevatar.GAgents.ChannelRuntime/ChannelCallbackCredentialCache.cs Outdated
Comment thread agents/channels/Aevatar.GAgents.Channel.NyxIdRelay/NyxIdRelayAuthValidator.cs Outdated
Comment thread agents/channels/Aevatar.GAgents.Channel.NyxIdRelay/NyxIdRelayAuthValidator.cs Outdated
Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more blocking issue on the latest head: the deferred-reply side effect now happens before the request event is committed, so the flow is still not durable/ordered in the way #328 described.

Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

评审:#346 vs #328 实施完成度 + 架构合规

读完 diff、protos 和关键路径后,主线(统一 inbound、ConversationGAgent 单一事实源、ChatActivity 唯一契约、Lark legacy 退出)已经达成。下面是承诺核对 + 几个真实的架构问题。

#328 已落地

  • Phase 0Channel.Lark → Platform.Larkagents/platforms/aevatar.channels.slnf / aevatar.platforms.slnf 分流
  • Phase 1 契约
    • ChatActivity.TransportExtras(nyx_message_id / nyx_agent_api_key_id / nyx_platform / nyx_conversation_id)
    • OutboundDeliveryContext(reply_message_id / reply_access_token)
    • NeedsLlmReplyEvent / LlmReplyReadyEvent
    • ChannelBotRegistrationnyx_channel_bot_id / nyx_agent_api_key_id / nyx_conversation_route_id + IChannelBotRegistrationQueryByNyxIdentityPort
  • Phase 2Aevatar.GAgents.Channel.NyxIdRelay project(Transport / OutboundPort / CallbackPayload / ConversationTypeMap / AuthValidator with JWT+HMAC+message-id+OIDC/JWKS cache & key-miss refresh)
  • Phase 3:runner 改名 ChannelConversationTurnRunnerResolveConversationChatType exhaustive 覆盖 4 个 ConversationScope
  • Phase 4HandleRelayWebhookAsync 改造成 auth → parse → dispatch → 202
  • Phase 5LarkChannelAdapter.cs (-804) / LarkStreamingHandle.cs (-117) / LarkChannelServiceCollectionExtensions.cs (-32) / LarkWebhookRequest/Response.cs 删除
  • CI 门禁channel_relay_nyx_chat_direct_create_guard.sh 已挂入 architecture_guards.sh
  • ADR-0013 + canon 更新

⚠️ 架构 / 合规问题

1. 【严重】中间层进程内字典作为事实源 — 直接违反 CLAUDE.md "中间层状态约束(强制)"

agents/Aevatar.GAgents.ChannelRuntime/ChannelCallbackCredentialCache.cs:6IMemoryCache 维护 correlationId → (UserAccessToken, ReplyToken) 映射,被 runner(ChannelConversationTurnRunner.cs:233 ResolveReplyToken)和 LLM inbox(ChannelLlmReplyInboxRuntime.cs:154 Peek)当 outbound 凭证事实源消费。一旦 cache miss outbound 直接 hard-fail(NyxIdRelayOutboundPort.cs:35-40 missing_reply_access_token)。

CLAUDE.md 明文:

禁止中间层维护 entity/actor/workflow-run/session 等 ID → 上下文/事实状态的进程内映射(Dictionary<>/ConcurrentDictionary<>/HashSet<>/Queue<>)。
InMemory 实现仅限开发/测试,不外溢到中间层业务语义。

NyxID #469 已经协商好"零持久化 + per-process cache"的安全 trade-off,但本仓 CLAUDE.md / ADR-0013 没有显式 carve-out。建议二选一:

  • ADR-0013 里把 "短寿命 reply credential 走 process-local cache、跨节点 best-effort、进程重启=丢失 in-flight reply" 写明(含 multi-replica 时 callback 与 LLM 异步发生在不同节点的失败模型);
  • 或在 CLAUDE.md "中间层状态约束" 节加显式例外("短寿命凭证、不进 event log / state / DB 时允许 process-local cache,TTL ≤ token exp")。

否则 arch-audit / refactor-team 后面会一直 flag 这块。

2. 【高】Cache TTL (30 min) 与 reply-token JWT exp (15 min) 不对齐 + 失败路径不消费

ChannelCallbackCredentialCache.cs:8 DefaultTtl = TimeSpan.FromMinutes(30),NyxID #469 reply-token exp = 15 min。中间 15 min 缓存里的 token 已过期但仍会被 outbound 取出投递,制造 401 噪音。

ConsumeCallbackCredentials 只在 outbound 成功路径被调(ChannelConversationTurnRunner.cs:245, 591),任意失败 entry 卡到自然过期。NyxID #469 明文 "Single-use consume(dispatcher 用完 immediately remove)"。

建议:TTL 默认对齐 token exp 且可配置;Take 时机改为 "发送动作完成后" 而非仅成功路径。

3. 【中】Phase 3 实现偏差:承诺 LlmReplyGeneratorActor,落地的是 IHostedService + 静态 stream id 订阅

ChannelLlmReplyInboxRuntime.cs:13 是单例 IHostedService,订阅一个静态字符串 "channel-runtime:llm-reply:inbox"不是 actor。issue 原话:

LlmReplyGeneratorActor 订阅 NeedsLlmReplyEvent,内部调 NyxIdChatGAgent,产出 LlmReplyReadyEvent

本仓事实模式(LarkConversationInboxRuntime 同款),所以不是本 PR 独创,但要 doc 一下:actor 边界 / 单线程不变量 / event-sourced 重放语义在这条路径上都不成立。IStreamProvider 默认实现是 InMemoryStreamProvidersrc/Aevatar.Foundation.Runtime/Streaming/InMemoryStreamProvider.cs:15)— 多节点 / 主备切换 / host 重启都会丢 in-flight reply。

4. 【中】ConversationGAgent.HandleInboundActivityAsync 双写非原子

agents/Aevatar.GAgents.Channel.Runtime/Conversation/ConversationGAgent.cs:75-77

await PersistDomainEventAsync(result.LlmReplyRequest);    // → event store
await EnqueueDeferredLlmReplyAsync(result.LlmReplyRequest); // → inbox stream

两步不在一个事务里。actor crash 在中间或 enqueue 抛异常被 EnqueueDeferredLlmReplyAsync 静默 catch + log + return(ConversationGAgent.cs:287-294),committed 事件不会重放进 inbox 触发 reply。CLAUDE.md 强制:

committed domain event 必须可观察:write-side 完成 committed event 后必须送入 projection 主链

正确形状应是:persist NeedsLlmReplyEvent → projection / event-stream subscriber 消费 committed event → 派发 LLM 调用 → 产出 LlmReplyReadyEvent。否则进程崩溃 / 重启 / 网络抖动都会卡死一次 LLM reply。

最低限度:(a) enqueue 失败时 emit ConversationContinueFailedEvent + retry policy;(b) 启动时扫描未完成 NeedsLlmReplyEvent → 重新 enqueue。

5. 【低】agents/Aevatar.GAgents.NyxidChat/NyxIdRelayOptions.cs:1-7 是空 back-compat 壳 — 违反 "删除优先"

public sealed class NyxIdRelayOptions : Aevatar.GAgents.Channel.NyxIdRelay.NyxIdRelayOptions;

注释直接写 "Back-compat alias"。CLAUDE.md "空转发、重复抽象、无业务价值代码直接删除,不保留兼容空壳"。consumer 改 using 即可。

6. 【低】issue §B "relay 相关代码完全搬离 NyxidChat project" 未完成

NyxidChat 仍持有:NyxIdChatEndpoints.Relay.cs (218 lines), NyxIdRelayOptions.cs, NyxIdRelayModels.cs(含 RelayMessage 旧契约), NyxIdRelayPromptConfiguration.cs, NyxIdRelayReplies.cs, NyxIdRelayWorkflowCards.cs, Relay/INyxRelayBridgeIdempotencyGuard.cs

endpoint 留 NyxidChat 合理(route 注册位置是 host 化选择),但 NyxIdChatEndpoints.Relay.cs:50-51 同时反序列化 NyxIdRelayCallbackPayload(新)和 RelayMessage(旧)只为 workflow card action 短路用 + BuildRelayMessage (line 179-215) 反向转换;应在 Channel.NyxIdRelay 里给 workflow card action 一条独立 typed contract,把 RelayMessage 类族删掉。

7. 【低】HTTP 层直接 IActorRuntime.CreateAsync<ConversationGAgent> + actor.HandleEventAsync,未走 IActorDispatchPort

agents/Aevatar.GAgents.NyxidChat/NyxIdChatEndpoints.Relay.cs:134, 146。CLAUDE.md "传输载体可替换:上层依赖投递契约(IActorDispatchPort),不依赖具体载体"。Studio 端已经走 dispatch port,channel relay 这条入口反而绕过;建议对齐。

8. 【低】NyxIdRelayAuthValidator 自称 "JWT + HMAC + replay 三重校验",实际 replay 只查 message-id header 存在 + 与 body 匹配,真正 replay-dedup 在 ConversationGAgent.ProcessedMessageIds

架构上单一 dedup 点是对的,但命名 / 注释要诚实:boundary 不做 replay 拦截,靠 actor 串行 + processed-id 去重。doc 写清楚就行,免得后人误读。


总结

ConversationGAgent 一统主干、ChatActivity 唯一契约、Lark legacy 退出 — 这三条核心目标是稳的。但 #1(cache 作事实源)+ #4(双写非原子)是真实的多节点 / crash-recovery 隐患,#3 偏差 + #2 TTL 漂移会放大 #1 的影响面。建议:

  • 至少把 #1 / #3 写进 ADR-0013 "已知偏差 / 后续工作",明确 multi-replica 部署的失败模型;
  • #4 在合并前修(很小,让 enqueue 失败 → emit ConversationContinueFailedEvent 走 actor 的失败路径);
  • #2 / #5 / #6 / #7 / #8 起 follow-up issue 跟踪。

#1 不修也至少要让 arch-audit 显式 ack,避免下次审计 noise。

@eanzhao eanzhao force-pushed the refactor/2026-04-23_issue-328-unified-channel-inbound branch from 614647a to 7749126 Compare April 23, 2026 11:27
Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the earlier blocking issues look fixed on the latest head. I found one new credential-boundary regression that I still consider blocking, so I’m leaving it inline.

Comment thread agents/Aevatar.GAgents.ChannelRuntime/channel_runtime_messages.proto Outdated
Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up 评审:fix 复盘

读了 force-push 之后的两个 commit(1954c6fc + 77491265),分门别类讲一下。

✅ 已修:高/严重项

  • #1 ChannelCallbackCredentialCache 删了IChannelCallbackCredentialCache.cs / ChannelCallbackCredentialCache.cs 删除,凭证流改走 TransportExtras.NyxUserAccessToken + OutboundDeliveryContext.ReplyAccessTokenchat_activity.proto:278, 264)。中间层进程内字典作为事实源这个 CLAUDE.md 直接违规消除。

  • #4 双写非原子修得很漂亮 — 真正按 actor 模型重写:

    • 新 state 字段 pending_llm_reply_requestsconversation_state.proto:16)持有未完成 LLM 请求;
    • 新事件 DeferredLlmReplyDispatchRequestedEventconversation_events.proto:51-54)作为 self-durable timeout payload;
    • HandleInboundActivityAsync:85-89 persist + ScheduleSelfDurableTimeoutAsync 入 inbox;
    • HandleDeferredLlmReplyDispatchRequestedAsync:134-176 消费 timeout,inbox 不可用 / enqueue 失败时自动 reschedule(DeferredLlmDispatchRetryDelay = 5s);
    • OnActivateAsync:40-44 + SchedulePendingLlmReplyDispatchesAsync:360-369 在 actor 重激活时扫 pending 重排;
    • HandleLlmReplyReadyAsync 失败且可重试时也走 ScheduleDeferredLlmReplyDispatchAsync 重排(ConversationGAgent.cs:232-242);
    • State 在 ApplyTurnCompleted / ApplyContinueFailed 终态时 RemovePendingLlmReplyRequest 清理(lines 402, 463)。

    这是 CLAUDE.md "self continuation 事件化" + "single-thread fact source" 的标准形状。多节点 / 进程崩溃 / 重启都能 cover。

  • #8 replay guard 真做了 — 新 INyxIdRelayReplayGuard + NyxIdRelayReplayGuardagents/channels/Aevatar.GAgents.Channel.NyxIdRelay/NyxIdRelayReplayGuard.cs:10-51),ConcurrentDictionary + sweep + window;validator ValidateAsync 做了 JWT → registration credential resolve → HMAC(per-registration secret,不再用全局 _relayOptions.HmacSecret 兜底优先级正确)→ timestamp window → replay claim 五重校验,命名和实现一致。HMAC secret 走 INyxIdRelayRegistrationCredentialResolver 从 projection 拿 NyxAgentApiKeyHash,把"全局共享 secret"那个隐患也顺手收掉。

  • #2 TTL 漂移:cache 没了,议题自然 moot。

⚠️ 新增问题:凭证持久化与 NyxID #469 Aevatar 侧承诺冲突

Cache 改成 TransportExtras 字段流转之后,凭证现在落进了 event store 和 actor state

  • TransportExtras.NyxUserAccessToken(用户 JWT)→ ChatActivityNeedsLlmReplyEvent.activity(持久化领域事件)→ ConversationGAgentState.PendingLlmReplyRequests(持久化 actor state)。即使 turn 完成后 state 里清掉了,event store 里的 NeedsLlmReplyEvent 永久保留 JWT。
  • OutboundDeliveryContext.ReplyAccessToken(reply_token)→ 同链路;额外ChannelConversationTurnRunner.cs:243-247 进入 ConversationTurnCompletedEvent.outbound_deliveryconversation_events.proto:21)→ projection → 可能进 Elasticsearch。

NyxID #469 提案里 Aevatar 侧的明示承诺:

零持久化:token 不进任何 grain state / event log / DB / secret store / 文件 / config
Orleans event envelope 只含 correlation_id,不带 token,token 放 process-local dict,用 correlation_id 配对

当前实现直接违反这两条。前一版 IMemoryCache 方案是为这两条承诺存在的,现在为了换"多节点可工作 + survive restart"把承诺扔了。

reply_token 短寿(15 min)+ 单用,影响窗口可控;但 user JWT 是用户身份 token,不是 per-callback bound token,进入事件存储就是潜在 audit / 数据泄露面。

建议(按代价排序):

  1. 最低成本ConversationTurnCompletedEvent.outbound_delivery.reply_access_token 不应该进完成事件 — completed 事件只需要 reply_message_id(审计用),token 是"为了发出 reply 的临时凭证",发完就没用,不该进 audit log / projection。在 ConversationGAgent 里持久化 completed 事件前 strip 一下 ReplyAccessToken(或者给 completed 事件用一个不含 token 的 OutboundReceipt 类型)。这条几行改动就能少一份 token 流到 projection。
  2. 中等成本NeedsLlmReplyEvent 是为了 actor 重激活后能重排 LLM 调用。但 LLM 调用真正需要的是 NyxId access token;reply 是后续步骤,token 放在 NeedsLlmReplyEvent 里给 LLM 用,事实上是在 actor 状态里持有用户 JWT 直到 LLM 调用完成。可以考虑把 NyxIdAccessToken 从 NeedsLlmReplyEvent.activity.transport_extras 里拆出来,单独放 NeedsLlmReplyEvent.short_lived_credentials(不进 ApplyLlmReplyRequested 的 state copy,仅用于一次 inbox dispatch)。
  3. 最大成本:完全回到 NyxID #469 的"零持久化"形态:actor 只存 correlation_id + 业务事实,凭证另存 process-local(接受 multi-replica / restart 失 reply 的代价)。

至少要在 ADR-0013 里把这条 trade-off 显式写明("Aevatar 选择把短寿 callback 凭证持久化到 event store / actor state,换取 multi-replica 可达性,代价是和 NyxID #469 的零持久化承诺出入")。

❌ 未修:低优项(你可能就当 follow-up 了)

下面这几条在我上一轮 review 里都列了,本次 fix 没动:

  • #5 agents/Aevatar.GAgents.NyxidChat/NyxIdRelayOptions.cs 仍是空 back-compat 壳,注释 "Back-compat alias" 一字未改。CLAUDE.md "删除优先" 仍违反。
  • #6 NyxidChat 里 NyxIdChatEndpoints.Relay.csNyxIdRelayModels.cs(含 RelayMessage)、NyxIdRelayPromptConfiguration.csNyxIdRelayReplies.csNyxIdRelayWorkflowCards.csRelay/ 还在;endpoint NyxIdChatEndpoints.Relay.cs:49-50 仍同时反序列化 NyxIdRelayCallbackPayload + RelayMessageBuildRelayMessage 反向转换还在。
  • #7 endpoint NyxIdChatEndpoints.Relay.cs:129, 141actorRuntime.CreateAsync<ConversationGAgent> + actor.HandleEventAsync 直调,未走 IActorDispatchPort
  • #3 LLM reply 仍是 IHostedService (ChannelLlmReplyInboxRuntime),不是 LlmReplyGeneratorActor:现在 actor 侧 pending state + durable timeout 已经接住"事实源唯一" + "重激活恢复",hosted service 退化为无状态 worker,工程上可以接受。可以在 ADR-0013 里说明这个 reduce-to-stateless-worker 的设计意图,免得后人疑惑。

总结

合并门槛已经清掉:#1 / #4 / #8 真修了,#2 议题消除。

唯一一条建议合并前再处理ConversationTurnCompletedEvent.outbound_delivery 不要带 reply_access_token 进 projection — 这是个几行改的 net-positive,能立即缩小 token blast radius。其他凭证持久化细节起一个 follow-up issue 跟踪 + ADR-0013 加一段就够。

#5 / #6 / #7 / #3 是低优 follow-up。

Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 23, 2026

Addressed the latest follow-up review in ad238685.

  • Moved the Nyx relay callback HMAC material out of ChannelBotRegistration proto/state/readmodel/query payloads and into credential_ref + IAevatarSecretsStore; resolver now does late secret lookup via the stored ref.
  • Narrowed ConversationTurnCompletedEvent.outbound_delivery to a non-secret OutboundDeliveryReceipt so reply_access_token no longer flows into committed events / projection / audit surfaces.
  • Added targeted coverage for provisioning, registration projection/query mapping, resolver late lookup, and completed-event receipt persistence.

Validation run:

  • dotnet build aevatar.slnx --nologo --no-restore -m:1
  • dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/Aevatar.GAgents.ChannelRuntime.Tests.csproj --nologo --no-restore --filter "FullyQualifiedName~NyxLarkProvisioningServiceTests|FullyQualifiedName~NyxIdRelayRegistrationCredentialResolverTests|FullyQualifiedName~ChannelBotRegistrationProjectorTests|FullyQualifiedName~ChannelBotRegistrationGAgentTests|FullyQualifiedName~RegistrationQueryPortTests|FullyQualifiedName~ServiceCollectionExtensionsTests"
  • dotnet test test/Aevatar.GAgents.Channel.Protocol.Tests/Aevatar.GAgents.Channel.Protocol.Tests.csproj --nologo --no-build --filter "FullyQualifiedName~ConversationGAgentDedupTests|FullyQualifiedName~ChannelRuntimeProtoTests"
  • bash tools/ci/test_stability_guards.sh
  • bash tools/ci/query_projection_priming_guard.sh
  • bash tools/ci/projection_state_version_guard.sh
  • bash tools/ci/projection_state_mirror_current_state_guard.sh
  • bash tools/ci/architecture_guards.sh

@eanzhao
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 23, 2026

Review: PR 落地度 + 架构违规

整体 Phase 0–5 承诺 基本都落地了(契约 / 三层 auth / 事件驱动 LLM reply / endpoint shim / CI 门禁 / ADR + supersede 都在),但有 3 条违反 Issue #328 Phase 0 premise 和 CLAUDE.md 架构规则 的问题必须处理,其中第 1 条是硬违规。


🔴 1. Channel.NyxIdRelay 编译期依赖 Platform.Lark,outbound 硬编码只走 Lark(blocker)

证据

agents/channels/Aevatar.GAgents.Channel.NyxIdRelay/Aevatar.GAgents.Channel.NyxIdRelay.csproj:17

```xml
<ProjectReference Include="....\platforms\Aevatar.GAgents.Platform.Lark\Aevatar.GAgents.Platform.Lark.csproj" />
```

agents/channels/Aevatar.GAgents.Channel.NyxIdRelay/NyxIdRelayOutboundPort.cs:3,11,77-96

```csharp
using Aevatar.GAgents.Platform.Lark;
// …
private readonly LarkMessageComposer? _larkComposer;
// …
private string ComposeReplyText(string platform, ConversationReference conversation, MessageContent content)
{
var normalizedPlatform = …;
if (normalizedPlatform == "lark" && _larkComposer is not null)
{
var composed = _larkComposer.Compose(content, new ComposeContext { … });
return composed.PlainText ?? string.Empty;
}
return content.Text ?? string.Empty;
}
```

违反的承诺/规则

  1. Issue 架构重构:统一 channel inbound 主干,Nyx relay 降级为 transport adapter #328 Phase 0 直接 premise:"Channel(传输)" vs "Platform(平台)" 两个正交 concern,目录和命名空间必须 self-evident。现在 channels/Channel.NyxIdRelay 在编译期 reverse-dependplatforms/Platform.Lark,恰好把 Phase 0 辛辛苦苦拆出来的方向反向拉回去。
  2. Issue 架构重构:统一 channel inbound 主干,Nyx relay 降级为 transport adapter #328 §A Phase 2 spec 原文:"NyxIdRelayOutboundPort.csIChannelOutboundPortSendAsyncPOST /api/v1/channel-relay/reply { message_id, content }内部 dispatch 到 per-platform IMessageComposer(按 ChatActivity.Platform"。实现没走 IMessageComposer,直接取了具体类型 LarkMessageComposer,写成 if == \"lark\"
  3. CLAUDE.md "依赖反转:上层依赖抽象,禁止跨层反向依赖和对具体实现的直接耦合"NyxIdRelayOutboundPort 依赖具体 LarkMessageComposer,不是抽象。
  4. CLAUDE.md "项目名=命名空间=目录语义"Channel.* 名字承载的是 transport 不绑平台的语义,一旦 csproj 里引用 Platform.Lark,语义就碎了。
  5. Issue 架构重构:统一 channel inbound 主干,Nyx relay 降级为 transport adapter #328 结论原文:"Aevatar 侧的 relay transport 必须是 channel-neutral 的,不能和 Lark 绑死"——这条就是立 issue 的核心动机。

现在的运行时后果

  • 任何非 lark 的 payload.platform(Telegram / Slack / Discord / Feishu / WeChat)走 SendAsync 时静默掉进 return content.Text ?? string.Empty——没有平台 composer,没有 streaming/mentions/markup,也没错出来。
  • 新增平台(即 Issue 架构重构:统一 channel inbound 主干,Nyx relay 降级为 transport adapter #328 明确列出的 Telegram/Slack/Discord)必须改 Channel.NyxIdRelay.csproj、改 ctor、改 if 链——Phase 0 的 "复制 platforms/Platform.Lark/ 改成 Telegram 就对了" 根本做不到。

修正方向(二选一,我偏向 A)

  • A. 走 IMessageComposer(issue 原 spec)NyxIdRelayOutboundPort 注入 IEnumerable<IMessageComposer>(或 IReadOnlyDictionary<string, IMessageComposer>),按 platform lookup;Platform.Lark 注册 LarkMessageComposer : IMessageComposer。csproj 去掉 Platform.Lark 引用。仓库里 IMessageComposer 抽象已经在 agents/Aevatar.GAgents.Channel.Abstractions/Composition/IMessageComposer.cs 存在,不用新建。
  • B. 走 IPlatformAdapter:已有 IEnumerable<IPlatformAdapter> 抽象和 LarkPlatformAdapter 注册(见下方 issue 2),扩它支持 compose 或新增一个方法。

无论走哪条,csproj 里 Channel.NyxIdRelay → Platform.Lark 的 ProjectReference 必须删掉,这是门禁级约束。

建议顺手加一条 CI 守卫:agents/channels/*.csproj 不得 ProjectReference 到 agents/platforms/*


🟡 2. LarkPlatformAdapter 是空壳 compatibility shell,但没删

证据agents/Aevatar.GAgents.ChannelRuntime/Adapters/LarkPlatformAdapter.cs:10-14

```csharp
///


/// Legacy direct Lark adapter kept only as a compatibility shell.
/// ChannelRuntime no longer accepts direct Lark callbacks or sends direct
/// platform replies; supported production traffic goes through Nyx relay.
///

public sealed class LarkPlatformAdapter : IPlatformAdapter
```

  • TryHandleVerificationAsync → 永远返回 410 Gone
  • ParseInboundAsync → 永远返回 null
  • SendReplyAsync → 永远返回 Succeeded=false, \"lark_direct_platform_reply_retired\", Permanent

ServiceCollectionExtensions.cs:172 仍注册:
```csharp
services.TryAddEnumerable(ServiceDescriptor.Singleton<IPlatformAdapter, LarkPlatformAdapter>());
```

违反

  • CLAUDE.md "删除优先:空转发、重复抽象、无业务价值代码直接删除,不保留兼容空壳"
  • CLAUDE.md "单一主干,插件扩展":现在实际的平台 dispatch 有两条半成品路径——dead 的 IPlatformAdapter.SendReplyAsync + 新的 NyxIdRelayOutboundPort.ComposeReplyText 里的 hardcoded lark。恰好应该合成一条。

建议:删掉 LarkPlatformAdapter 这个 adapter 和它的注册;如果 IPlatformAdapter 除此以外没别的实现,把抽象也删掉(或者改用 issue 1 的修复方案把它改成真正的 compose 接口,让 NyxIdRelayOutboundPort 使用它,替代 if == \"lark\")。


🟡 3. HandleRegisterAsync 仍硬编码 "只允许 lark"

证据agents/Aevatar.GAgents.ChannelRuntime/ChannelCallbackEndpoints.cs:78-85

```csharp
var platformNormalized = request.Platform.Trim().ToLowerInvariant();
if (!string.Equals(platformNormalized, "lark", StringComparison.OrdinalIgnoreCase))
{
return Results.Conflict(new
{
error = $"Platform '{platformNormalized}' is not in the supported production contract. ChannelRuntime currently provisions only Lark via Nyx relay ingress.",
});
}
```

违反:Issue #328 Phase 0 结论 "Aevatar 侧的 relay transport 必须是 channel-neutral 的" + 设计表格列出的 5 个平台(Lark / Telegram / Slack / Discord / WeChat)。现在 inbound 主干解耦了但 registration 这扇前门仍只让 Lark 进,Phase 0 的解耦对新平台来说只完成了一半。

建议:把白名单改成 NyxID relay 支持的平台集合(从 NyxIdRelayConversationTypeMap 支持的 platform 集合派生,或留给 INyxLarkProvisioningService 泛化成 INyxChannelBotProvisioningService),或者至少把拒绝文案从 hardcoded 改成配置驱动。


✅ 其它都对得上 Issue 承诺

  • ChatActivity.TransportExtras / OutboundDeliveryContext proto 齐全(chat_activity.proto:260-265, 268-279, 306-308
  • ChannelBotRegistration 加了 nyx_channel_bot_id / nyx_agent_api_key_id + IChannelBotRegistrationQueryByNyxIdentityPort,projector + QueryPort 都接上
  • NeedsLlmReplyEvent / LlmReplyReadyEvent proto + 事件驱动三段式:ConversationGAgent.PersistDomainEventAsync(result.LlmReplyRequest) (line 85) → self-continuation DeferredLlmReplyDispatchRequestedEventChannelLlmReplyInboxRuntime.ProcessAsyncactor.HandleEventAsync(LlmReplyReadyEvent)HandleLlmReplyReadyAsyncProcessedCommandIds 幂等。这条链不是 "ambient TCS 搬家",是正经的 committed-event + 幂等 handler 模式 —— 过。
  • NyxIdRelayAuthValidator 三层都在:JWT (ValidateJwtAsync) + HMAC (ValidateSignatureX-NyxID-Signature) + Replay (_replayGuard.TryClaimX-NyxID-Message-Id)
  • ChannelConversationTurnRunner.ResolveConversationChatType exhaustive 覆盖 DirectMessage/Group/Channel/Thread,不再把非 Group 都压成 p2p
  • NyxRelayAgentBuilderFlow.TryResolve 正确处理 slash 短路(非 / 开头 fall through,未知 /xxx 短路 help)
  • AgentBuilderTool.ResolveDailyReportGithubUsernameAsync 三层 fallback 都在(explicit / UserConfig / NyxID GET /user),credentials_required 路径清晰
  • CI 守卫 channel_relay_nyx_chat_direct_create_guard.sh 已接入 architecture_guards.sh
  • ADR-0013 + ADR-0011 supersede + canon doc 更新到位

建议

先处理 issue 1(csproj 反向依赖 + 硬编码 lark)再合。issue 2/3 可以合并处理或跟进 PR 追上——但 issue 1 的 Channel.NyxIdRelay → Platform.Lark 是门禁级结构违规,合进去之后再拆会拖很大。

@eanzhao eanzhao merged commit 565f150 into dev Apr 23, 2026
21 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

架构重构:统一 channel inbound 主干,Nyx relay 降级为 transport adapter

1 participant