[Channel RFC] Implement Lark channel adapter#288

Merged
eanzhao merged 29 commits into dev from
feat/2026-04-21_issue-261-lark-channel-adapter
Apr 22, 2026

@eanzhao eanzhao commented Apr 21, 2026

Summary

Implements the Lark adapter baseline from #261, then carries that branch forward through the Lark Nyx relay migration and typed-tool work from #296 Phase 0-2 and #297.

Closes #261.
Closes #296.
Closes #297.
Closes #298.
Closes #299.
Closes #300.
Closes #301.
Closes #302.
Closes #303.
Closes #304.

Follow-up: #308.

Problem

#261 landed the new Lark adapter surface, but the repository still had a split Lark runtime contract:

  • direct Lark -> Aevatar callback ingress still existed
  • Lark turn replies still depended on older Nyx proxy / persisted-session assumptions
  • channel registration state/readmodels still carried Lark credential-bearing fields
  • shipped Lark card-action flows still depended on card.action.trigger
  • Aevatar lacked a typed Nyx-backed Lark tool surface for proactive messaging/business actions

That left the codebase between the old direct-callback runtime and the Nyx-relay target defined later in #296.

Solution

This PR now lands the full Lark production contract through #296 Phase 0-2:

  • lock Lark production ingress to Lark -> NyxID -> Aevatar
  • add ADR/runbook coverage for the Nyx relay webhook contract
  • add Nyx-backed Lark provisioning that persists only non-secret Nyx handles locally
  • harden /api/webhooks/nyxid-relay with Nyx OIDC/JWKS validation and durable 202 ingress
  • switch inbound-triggered Lark turn replies to Nyx channel-relay/reply
  • remove credential-bearing fields from the Lark public registration readmodel/runtime contract
  • retire the direct Aevatar-side Lark callback path as a supported runtime contract
  • migrate shipped Lark card-action production flows to text / deep-link / open_url patterns
  • add Nyx-backed typed Lark tools for proactive send, chat lookup, sheets append, and approval actions

What Changed

Lark runtime and cutover

  • added the new Aevatar.GAgents.Channel.Lark adapter package and tests from #261
  • added Nyx relay ingress handling at /api/webhooks/nyxid-relay
  • added Nyx relay JWT validation via OIDC discovery / JWKS
  • moved Lark turn replies to Nyx channel-relay/reply
  • retired direct Aevatar-side Lark callback handling and removed rollback-window semantics
  • updated cutover ADR/runbook to treat direct Lark callback as retired, not fallback

Provisioning and registration model

  • added Nyx-backed Lark provisioning flow that creates Nyx bot / API key / route and mirrors only non-secret handles locally
  • cleaned the public Lark registration readmodel so it carries only non-secret identifiers/status fields
  • split direct-callback secret material into runtime-only direct-callback binding documents and stopped projecting that path for Lark
  • kept Telegram/direct-callback follow-up cleanup out of this PR and tracked it in #308

Lark interaction behavior

  • migrated social_media / approval-style Lark flows away from card.action.trigger
  • changed the supported Lark interaction model on the Nyx relay path to text / open_url / deep-link patterns

Typed Lark tools

  • added Aevatar.AI.ToolProviders.Lark
  • added Nyx-backed typed tools for:
    • lark_messages_send
    • lark_chats_lookup
    • lark_sheets_append_rows
    • lark_approvals_list
    • lark_approvals_act
  • updated prompt/tool-selection paths to prefer typed Lark tools over generic proxy execution where relevant

Naming / cleanup

  • renamed shared legacy direct-binding code to neutral direct-callback naming because Telegram still uses that shape
  • removed Lark-specific legacy / rollback wording from code and docs

Impact

  • Lark production topology is now Nyx relay based instead of direct Aevatar callback based
  • Aevatar no longer owns Lark credential persistence for the production path
  • public channel registration readmodels no longer expose Lark secret-bearing fields
  • direct Aevatar-side Lark callback is no longer part of the supported production contract
  • shipped Lark production flows no longer rely on card.action.trigger
  • Aevatar now has a typed Nyx-backed Lark tool surface for proactive/business operations beyond the turn-reply path

Validation

  • dotnet build agents/channels/Aevatar.GAgents.Channel.Lark/Aevatar.GAgents.Channel.Lark.csproj --nologo
  • dotnet test test/Aevatar.GAgents.Channel.Lark.Tests/Aevatar.GAgents.Channel.Lark.Tests.csproj --nologo
  • dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/Aevatar.GAgents.ChannelRuntime.Tests.csproj --nologo
  • dotnet test test/Aevatar.AI.Tests/Aevatar.AI.Tests.csproj --nologo --filter "NyxIdChatEndpointsCoverageTests|NyxRelayJwtValidatorTests|NyxIdRelayJwtTests|NyxIdRelayAndPairingTests"
  • dotnet build src/Aevatar.AI.ToolProviders.Lark/Aevatar.AI.ToolProviders.Lark.csproj --nologo
  • dotnet test test/Aevatar.AI.ToolProviders.Lark.Tests/Aevatar.AI.ToolProviders.Lark.Tests.csproj --nologo
  • bash tools/ci/architecture_guards.sh
  • bash tools/ci/query_projection_priming_guard.sh
  • bash tools/ci/test_stability_guards.sh

Notes

  • This PR intentionally closes the Lark track through #296 Phase 0-2 and #297, but does not remove the remaining non-Lark channel-runtime credential ownership. That broader follow-up is tracked in #308.
  • Telegram migration and direct-callback credential ownership cleanup remain separate from this Lark-focused cutover.

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 94.54390% with 64 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.75%. Comparing base (1a1be7f) to head (f99ae84).
⚠️ Report is 30 commits behind head on dev.

Files with missing lines | Patch % | Lines
...r.AI.ToolProviders.Lark/LarkProxyResponseParser.cs | 92.75% | 0 Missing and 15 partials ⚠️
...c/Aevatar.AI.ToolProviders.NyxId/NyxIdApiClient.cs | 87.80% | 8 Missing and 7 partials ⚠️
...I.ToolProviders.Lark/Tools/LarkMessagesSendTool.cs | 89.00% | 4 Missing and 7 partials ⚠️
....ToolProviders.Lark/Tools/LarkApprovalsListTool.cs | 95.39% | 3 Missing and 4 partials ⚠️
src/Aevatar.Mainnet.Host.Api/Program.cs | 0.00% | 4 Missing ⚠️
...olProviders.Lark/Tools/LarkSheetsAppendRowsTool.cs | 96.20% | 0 Missing and 3 partials ⚠️
...tar.AI.ToolProviders.Lark/LarkSheetsRangeHelper.cs | 96.66% | 0 Missing and 2 partials ⚠️
...I.ToolProviders.Lark/Tools/LarkApprovalsActTool.cs | 97.87% | 0 Missing and 2 partials ⚠️
....Implementations.Local/Actors/LocalActorRuntime.cs | 88.23% | 1 Missing and 1 partial ⚠️
src/Aevatar.AI.ToolProviders.Lark/LarkNyxClient.cs | 99.29% | 0 Missing and 1 partial ⚠️
... and 2 more
@@            Coverage Diff             @@
##              dev     #288      +/-   ##
==========================================
+ Coverage   69.34%   69.75%   +0.40%     
==========================================
  Files        1126     1139      +13     
  Lines       79998    81170    +1172     
  Branches    10464    10619     +155     
==========================================
+ Hits        55476    56620    +1144     
+ Misses      20470    20458      -12     
- Partials     4052     4092      +40     
Flag Coverage Δ
ci 69.75% <94.54%> (+0.40%) ⬆️

Files with missing lines | Coverage Δ
...rc/Aevatar.AI.ToolProviders.Lark/ILarkNyxClient.cs | 100.00% <100.00%> (ø)
...vatar.AI.ToolProviders.Lark/LarkAgentToolSource.cs | 100.00% <100.00%> (ø)
...c/Aevatar.AI.ToolProviders.Lark/LarkToolOptions.cs | 100.00% <100.00%> (ø)
....ToolProviders.Lark/ServiceCollectionExtensions.cs | 100.00% <100.00%> (ø)
...vatar.Configuration/ServiceCollectionExtensions.cs | 100.00% <100.00%> (ø)
src/Aevatar.AI.ToolProviders.Lark/LarkNyxClient.cs | 99.29% <99.29%> (ø)
...AI.ToolProviders.Lark/Tools/LarkChatsLookupTool.cs | 99.00% <99.00%> (ø)
...ar.Configuration/SecretsStoreCredentialProvider.cs | 90.00% <90.00%> (ø)
...tar.AI.ToolProviders.Lark/LarkSheetsRangeHelper.cs | 96.66% <96.66%> (ø)
...I.ToolProviders.Lark/Tools/LarkApprovalsActTool.cs | 97.87% <97.87%> (ø)
... and 7 more

... and 3 files with indirect coverage changes

eanzhao commented Apr 21, 2026

Review — Lark Channel Adapter

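Overall, the layering follows the IChannelTransport / IChannelOutboundPort / IMessageComposer<T> contracts cleanly, and test coverage is solid (conformance, fault, webhook, composer). After going through the details, though, a few issues need to be resolved before the draft is merged.

Blocking (recommend fixing before moving to Ready)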

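1. The card-action webhook stuffs the raw event JSON into Content.Text (LarkChannelAdapter.cs:339-359)

Content = new MessageContent { Text = eventObject.GetRawText(), ... }

This violates the "single semantic per API field" rule in CLAUDE.md: Text means human-readable text, not a JSON payload. Downstream consumers will treat a blob of JSON as something the user said and run intent recognition on it or display it. Card actions should land in structured fields (for example ActionId + Value, or a typed sub-message on the activity). It is worth checking whether ChatActivity in the abstractions already has a card-action field; if not, add one to the proto.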

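2. Card-action conversation scope is hard-coded to Group (LarkChannelAdapter.cs:347)

Conversation = ConversationReference.Create(..., ConversationScope.Group, null, "group", chatId)

Button callbacks from p2p conversations get mislabeled as Group, the canonical key prefix is wrong too, and downstream dedup / routing will be misaligned. The scope should be inferred from context or event fields (at minimum by tracing back the conversation for open_message_id), or the adapter should fail explicitly when there is not enough signal, rather than silently labeling it Group.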

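3. Streaming replies are concatenated in arrival order, not sequence order (LarkStreamingHandle.cs:27-35)

_deltas.Add(chunk.Delta ?? string.Empty);
await _adapter.UpdateAsync(..., BuildMessage(string.Concat(_deltas)), ...);

The HashSet<long> only guarantees that duplicate seqs are swallowed; it does not guarantee ordering. The contract says "monotonic chunk emitted by caller", but the implementation hangs this invariant on the caller, so any reordering caused by retries / buffering / concurrency produces misplaced text. Switching to SortedDictionary<long,string>, or sorting by seq before concatenating, is safer and also matches the semantics of the AppendIdempotentBySequenceNumberAsync probe better.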
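
A minimal sketch of the suggested SortedDictionary approach, to make the ordering invariant concrete. SequencedTextBuffer, TryAppend, and Render are hypothetical names for illustration only; this is not the actual LarkStreamingHandle code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the streaming buffer; not the real LarkStreamingHandle.
public sealed class SequencedTextBuffer
{
    // Keyed by sequence number, so enumeration is always in ascending seq order.
    private readonly SortedDictionary<long, string> _deltas =
        new SortedDictionary<long, string>();

    // Returns false when the sequence number was already seen (idempotent append).
    public bool TryAppend(long sequenceNumber, string delta)
    {
        if (_deltas.ContainsKey(sequenceNumber))
        {
            return false;
        }

        _deltas[sequenceNumber] = delta ?? string.Empty;
        return true;
    }

    // Concatenates deltas by sequence number, regardless of arrival order.
    public string Render() => string.Concat(_deltas.Values);
}
```

With this shape, appending seq=2 ("B") and then seq=1 ("A") renders "AB", and a repeated seq=2 is ignored.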

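4. Webhook signature has no timestamp-window check (LarkChannelAdapter.cs:502-516)
VerifySignature only compares the sha256 and does not check the freshness of X-Lark-Request-Timestamp. Any leaked signed body can be replayed indefinitely. Add a tolerance window (typically ±5min) and validate against DateTimeOffset.UtcNow.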
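
A rough sketch of such a freshness check, under stated assumptions: the header carries a Unix timestamp in seconds or milliseconds, and the 5-minute window and the seconds/milliseconds heuristic are illustrative choices. WebhookTimestampCheck and IsFresh are hypothetical names, not the adapter's API.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; the real adapter may structure this differently.
public static class WebhookTimestampCheck
{
    // Assumed tolerance, matching the ±5min suggested in the review.
    public static readonly TimeSpan ValidityWindow = TimeSpan.FromMinutes(5);

    public static bool IsFresh(string headerValue, DateTimeOffset now)
    {
        // Digits only; anything else (including null or empty) fails closed.
        if (!long.TryParse(headerValue, NumberStyles.None, CultureInfo.InvariantCulture, out var raw))
        {
            return false;
        }

        DateTimeOffset timestamp;
        try
        {
            // Heuristic (assumption): values of 10^11 or more are milliseconds.
            timestamp = raw >= 100_000_000_000L
                ? DateTimeOffset.FromUnixTimeMilliseconds(raw)
                : DateTimeOffset.FromUnixTimeSeconds(raw);
        }
        catch (ArgumentOutOfRangeException)
        {
            return false;
        }

        // Duration() takes the absolute value, so timestamps from the future
        // are rejected as well as stale ones.
        return (now - timestamp).Duration() <= ValidityWindow;
    }
}
```

Passing the current time in as a parameter keeps the check testable; a caller would typically pass DateTimeOffset.UtcNow.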

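Non-blocking but worth attention

5. HandleWebhookAsync only verifies the signature on the encrypted branch (LarkChannelAdapter.cs:187-198)
Plaintext events go through the TokenMatches(header, "token") branch, which is semantically OK, but consider treating "encrypt_key is configured but an unencrypted body arrives" as an illegal case and returning 401 directly, so the fallback path cannot be abused as a downgrade channel.

6. Inbound only handles message_type == "text" (LarkChannelAdapter.cs:518-537)
Images / files / stickers / rich text are all silently dropped. If they are not supported for now, please state that explicitly in Capabilities and the draft description, so upper layers do not assume they are covered.

7. MessageDisposition.Ephemeral is silently downgraded (LarkChannelAdapter.cs:383-384 + LarkMessageComposer.cs:105-106)
Evaluate returns Degraded, but SendCoreAsync simply rewrites the disposition to Normal and sends it as Normal. A caller that receives EmitResult.Sent(..., capability: Degraded) cannot tell that "you wanted ephemeral, and I broadcast it as Normal instead". Either return EmitResult.Failed("ephemeral_unsupported", ...), or mark the downgrade mode more explicitly on capability.

8. LarkPayloadRedactor replaces content / form_value / value wholesale with [redacted] (LarkPayloadRedactor.cs:62-67)
The blob you get when debugging is almost useless; also, value is a very generic field name, so a lot of non-sensitive data gets wiped as well. Consider a finer-grained whitelist / path-based redaction, or at least keep the structure and replace only string leaves.

9. DI wiring for HttpClient / BaseAddress (LarkChannelAdapter.cs:47-50 + LarkChannelServiceCollectionExtensions.cs)

  • By default it calls new HttpClient() directly, without going through IHttpClientFactory; long-running processes will hit the familiar DNS refresh / socket leak problems. Suggest services.AddHttpClient<LarkChannelAdapter>(c => c.BaseAddress = ...).
  • https://open.feishu.cn is hard-coded, while the global Lark edition uses open.larksuite.com; cross-region support requires making it configurable.

10. _botCredential is a snapshot taken at Initialize (LarkChannelAdapter.cs:31, 74-79)
The Lark tenant access token expires. There is currently no refresh path, so after expiry every send will return 401. At minimum, re-resolution should be triggered when ResolveBotCredentialAsync fails or when a response carries a code such as 99991663; otherwise production will intermittently break across the board.

11. (?:>) in MentionRegex is a dead group (LarkChannelAdapter.cs:16-18)
(?:>) is just >, wrapped in a redundant non-capturing group; it looks like a leftover placeholder for something like (?:\s+[^>]*)?>. Either drop the (?:) to keep the regex clean, or write out the full lenient match (attribute order, extra attrs).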

12. Default truncation cuts by char (LarkMessageComposer.cs:152-158)
It breaks on emoji / partial surrogate pairs. Switch to StringInfo, or at least check for surrogate pairs.
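
A small sketch of StringInfo-based truncation, assuming the limit is counted in text elements rather than UTF-16 chars (the composer's real limit semantics may differ). TextElementTruncation and Truncate are hypothetical names.

```csharp
using System;
using System.Globalization;

// Hypothetical helper illustrating text-element truncation; not the composer's code.
public static class TextElementTruncation
{
    public static string Truncate(string text, int maxTextElements)
    {
        if (string.IsNullOrEmpty(text) || maxTextElements <= 0)
        {
            return string.Empty;
        }

        var info = new StringInfo(text);

        // A surrogate pair counts as one text element, so it is never split.
        return info.LengthInTextElements <= maxTextElements
            ? text
            : info.SubstringByTextElements(0, maxTextElements);
    }
}
```

For example, truncating "A🙂B" to 2 text elements yields "A🙂", whereas a char-based Substring(0, 2) would return "A" plus a lone high surrogate.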

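13. Capabilities sharing (LarkChannelAdapter.cs:21)
private static readonly ChannelCapabilities LarkCapabilities = LarkMessageComposer.DefaultCapabilities.Clone(); All adapter instances share the same cloned object. If capabilities are later adjusted dynamically per bot subscription / feature flag, instances will pollute each other. ComposeContextFor calls Clone() every time, which is OK, but the top-level field should still be per-instance.

Unit tests & scaffolding

  • The happy cases of the streaming probe AppendIdempotentBySequenceNumberAsync all arrive in increasing seq order, so they cannot expose the out-of-order bug in blocking item 3 above. Suggest adding a "seq=2 arrives first, seq=1 arrives later" case to pin down the ordering invariant.
  • HandleWebhookAsync_WhenRedactorThrows_FailsClosed in LarkChannelAdapterWebhookTests is great; adding a negative case for "signature with an expired timestamp" would close off blocking item 4.
  • Card action is currently only implemented in ParseCardAction and has no dedicated test coverage (conformance does not run it, and the webhook tests do not cover it). Suggest adding at least one direct-parse unit test.

Architecture / docs

  • The draft PR says "not yet switch the legacy Aevatar.GAgents.ChannelRuntime host callback pipeline over to this new adapter path". Suggest linking the follow-up issue / PR for that switch in the description, so this adapter does not dangle in the repo long-term.
  • The new agents/channels/ subdirectory has no corresponding README, and other modules under agents/ default to a flat layout. Suggest syncing the path change into the component topology diagram in docs/canon/ (if there is one).

Overall the foundation is solid; the main thing is to work through the semantic / security issues in the four blocking items above first. Seeing these fixes before the draft moves to Ready would be great.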

eanzhao commented Apr 21, 2026

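Updated per this round of review feedback; the changes are in e11c42f7.

This pass closes out the blocking items first:

  • card.action.trigger no longer stuffs raw JSON into Content.Text; a typed CardActionSubmission was added to the channel abstractions
  • the card-action conversation is no longer hard-coded to Group; p2p/group is now inferred from the callback's context.chat_type, and when there is not enough signal the callback is dropped outright instead of being silently mislabeled
  • streaming now reassembles by SequenceNumber, with a new regression case for out-of-order seq=2 -> seq=1
  • webhook signature verification now has a time window (±5min), plus a negative test for expired timestamps

A few low-risk items were closed along the way:

  • ChannelCapabilities is now per-instance and no longer a shared static object
  • the mention regex drops the dead group
  • truncation now cuts by StringInfo text element, so surrogate pairs / emoji are not split
  • the payload redactor no longer wipes the generic value/content fields wholesale and only keeps the fail-safe redaction for form_value
  • AddLarkChannel now goes through IHttpClientFactory

Tests added:

  • direct card-action parse / typed payload coverage
  • expired timestamp negative case
  • out-of-order streaming case
  • surrogate pair truncation case
  • proto roundtrip coverage for the new typed card-action fields

Re-verified locally: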

  • dotnet test test/Aevatar.GAgents.Channel.Lark.Tests/Aevatar.GAgents.Channel.Lark.Tests.csproj --nologo --tl:off
  • dotnet test test/Aevatar.GAgents.Channel.Protocol.Tests/Aevatar.GAgents.Channel.Protocol.Tests.csproj --nologo --tl:off
  • dotnet build aevatar.foundation.slnf --nologo --tl:off
  • dotnet test aevatar.foundation.slnf --nologo --tl:off --no-build
  • bash tools/ci/test_stability_guards.sh

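Items not yet expanded in this commit: whether to force-reject unencrypted bodies when encrypt_key is configured, making the cross-region base address configurable, a bot credential refresh path, and whether to tighten the ephemeral downgrade semantics further. If you'd like these handled in the current draft as well, I can keep following up.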

eanzhao commented Apr 21, 2026

Review follow-up — e11c42f7 Fix Lark review feedback

origin/dev 合并下来后在本地 build + 跑了 Aevatar.GAgents.Channel.Lark.Tests(69/69 通过),对着先前评论逐条核对。

Blocking — 全部修复 ✅

# Issue Fix
1 CardAction 把 event JSON 塞进 Content.Text 新增 CardActionSubmission proto(action_id / submitted_value / arguments / form_fields / source_message_id),Content.Text 不再被污染;新增 HandleWebhookAsync_CardAction_UsesTypedPayloadAndDirectMessageScope 覆盖
2 CardAction scope 硬编码 Group TryBuildCardActionConversationcontext.chat_type / open_chat_type / conversation_type 推断;p2p→DM,group→Group,缺失→带 warn log drop;新增 HandleWebhookAsync_CardActionWithoutChatType_DropsCallback 钉死行为
3 流式按到达顺序拼接 HashSet<long> + List<string>SortedDictionary<long,string>,idempotent check 改成 ContainsKeyAppendAsync_OutOfOrderSequence_ReassemblesBySequenceNumber 用 seq=2,1 顺序验证 "AB"
4 签名没时间戳窗口 5min SignatureValidityWindow + TryParseSignatureTimestamp 兼容秒/毫秒;HandleWebhookAsync_EncryptedPayloadWithExpiredTimestamp_Returns401 覆盖;fixture 的固定 timestamp 也跟着改成 UTC now

Non-blocking — 大部分修复

# Issue Status
8 Redactor 把 content / value 全砸 [redacted] ✅ 收窄到仅 form_valuecontent / value 保留;token / encrypt / email / phone / avatar_url 仍 remove,OK
9a new HttpClient() 不走 factory AddHttpClient(LarkChannelDefaults.HttpClientName) + IHttpClientFactory.CreateClient(...);同时把默认 BaseAddress 抽到 LarkChannelDefaults.DefaultBaseAddress
11 MentionRegex(?:>) 死组 ✅ 直接改成 >
12 Truncation 按 char 切 ✅ 改走 StringInfo.SubstringByTextElements;新 test Compose_WhenTextContainsSurrogatePair_DoesNotSplitTextElement("A🙂B" maxLen=2 → "A🙂")
13 Capabilities 静态共享 ✅ 改成 per-instance _capabilities,getter 再 .Clone() 一份出去

还没处理的(可选,不阻塞 Ready)

  1. 明文事件仅查 token,不过签名Refactor workflow execution lifecycle and align MessageId semantics #5)—— 仍是当前行为,按 Lark "只有加密模式才签" 的官方契约是 OK 的。如果将来支持 "encrypted 开关可关但仍要求签名" 的变种部署,再加一条 fail-closed 策略即可。
  2. message_type != \"text\" 的入站静默丢弃Refactor/core cqrs parallel subsystems #6)—— ExtractTextContent 仍只认 text,图/文件/post/sticker/rich text 不会冒泡。至少建议在 Capabilities.SupportsFiles = false 的 XML comment 或 README 里显式写明,避免上层误认为已覆盖。
  3. Ephemeral 静默降级到 NormalMore primitives #7)—— SendCoreAsync 仍然把 MessageDisposition.Ephemeral 改写为 Normal 继续发;EmitResult 只给 capability: Exact(Evaluate 的 Degraded 也没透传)。调用方感知不到 "我想发 ephemeral,你改广播了"。要么 EmitResult 返回 capability: Degraded + error code,要么直接 Failed(\"ephemeral_unsupported\", ...)
  4. Region/domain 不可配置(#9b)—— DefaultBaseAddress 硬编码 https://open.feishu.cn;全球版 open.larksuite.com 需要对应 AddHttpClient 的 options 或 binding 层面的 feature flag。当前至少 HttpClient 名字是可覆盖的,但没有公开 API 让调用方改 base address。
  5. Bot token 过期无刷新路径Add Redis Persistence Support for Orleans Integration #10)—— _botCredential 仍是 InitializeAsync 一次性快照;真实 Lark tenant_access_token 约 2h 过期。这条如果不在本 PR 解,建议追一个 issue 绑到 issue [Channel RFC] Lark adapter migration (LarkPlatformAdapter → LarkChannelAdapter + group chat + streaming) #261 的 follow-up 队列。

其他观察

  • ChannelAbstractionsProtoTests 已加了 CardActionSubmission 序列化 round-trip 校验,proto 加字段这一层 surface test 是闭合的,不会出现加字段忘记露出的情况。
  • SignatureValidityWindowage.Duration()(绝对值)是对的 —— 同时挡了 "时间戳来自未来" 的攻击,细节到位。
  • TryBuildCardActionConversationchat_type 读 3 个字段(chat_type / open_chat_type / conversation_type)兜底 Lark 不同事件版本的 key 差异,挺稳。
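
To illustrate the three-key fallback described above, here is a minimal sketch assuming the callback context is a JSON object and that "p2p" and "group" are the only recognized values. ScopeSketch, CardActionScope, and TryInfer are hypothetical names, not the adapter's TryBuildCardActionConversation.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for ConversationScope.
public enum ScopeSketch { DirectMessage, Group }

public static class CardActionScope
{
    // Keys tried in order, mirroring the fields named in the review.
    private static readonly string[] CandidateKeys =
        { "chat_type", "open_chat_type", "conversation_type" };

    // Returns false when no usable signal exists, so the caller can drop the callback.
    public static bool TryInfer(JsonElement context, out ScopeSketch scope)
    {
        scope = default;
        if (context.ValueKind != JsonValueKind.Object)
        {
            return false;
        }

        foreach (var key in CandidateKeys)
        {
            if (!context.TryGetProperty(key, out var value) ||
                value.ValueKind != JsonValueKind.String)
            {
                continue;
            }

            switch (value.GetString())
            {
                case "p2p":
                    scope = ScopeSketch.DirectMessage;
                    return true;
                case "group":
                    scope = ScopeSketch.Group;
                    return true;
            }
        }

        return false;
    }
}
```

Returning false instead of defaulting to Group is the point of the fix: an unrecognized or missing chat type is surfaced to the caller rather than silently mislabeled.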

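All 4 blocking items and 5 key non-blocking items have landed, and I found no regressions in this batch of fixes. The remaining 5 optional items can be picked off by priority; I have no objection to moving the draft to Ready.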

@eanzhao eanzhao marked this pull request as ready for review April 21, 2026 12:39

eanzhao commented Apr 21, 2026

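Filled in the remaining acceptance criteria for issue #261, in commit 8d5242f5.

This pass completes the missing main path and its validation together:

  • the host callback now goes through ConversationGAgent; Lark ingress no longer falls back to the legacy ChannelUserGAgent
  • added a durable inbox: webhook activities are first written to channel-runtime:lark:durable-inbox, then a subscriber drives middleware / actor processing
  • durable message-id dedup is authoritatively backstopped by the persisted ProcessedMessageIds in ConversationGAgent, so a duplicate webhook does not produce duplicate outbound messages
  • added end-to-end tests for the Day One flow: launch parity, submit -> create agent
  • added group chat end-to-end tests
  • also fixed the chat type semantics of the card action callback: card_action event routing is kept, while the real conversation type is passed through to the Day One tool, so a private-chat form submission is not misjudged as non-p2p

New / key coverage: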

  • test/Aevatar.GAgents.ChannelRuntime.Tests/LarkConversationHostCutoverTests.cs
  • Ingress_DailyReportIntent_ShouldRouteThroughConversationGAgent
  • Ingress_DailyReportSubmit_ShouldExecuteAgentBuilderAfterRenderedCard
  • Ingress_GroupChat_ShouldReplyThroughConversationActor
  • Ingress_DuplicateWebhook_ShouldDeduplicateByProcessedMessageId

Local validation re-run:

  • dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/Aevatar.GAgents.ChannelRuntime.Tests.csproj --nologo --tl:off --no-restore
  • dotnet test test/Aevatar.GAgents.Channel.Lark.Tests/Aevatar.GAgents.Channel.Lark.Tests.csproj --nologo --tl:off --no-restore
  • bash tools/ci/test_stability_guards.sh

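With the current implementation, the Day One regression parity, group chat end-to-end, and durable dedup items called out in issue #261 now all have a corresponding main-path implementation and regression tests.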

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f6c64d2841

Comment thread agents/channels/Aevatar.GAgents.Channel.Lark/LarkChannelAdapter.cs
Comment thread agents/Aevatar.GAgents.ChannelRuntime/LarkConversationTurnRunner.cs
eanzhao and others added 6 commits April 22, 2026 12:05
When EncryptKey is configured, url_verification requests were
exempt from signature verification. An attacker could forge
url_verification requests without a valid signature, and if
VerificationToken was also empty, the request would be accepted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@eanzhao eanzhao left a comment

Review — 3 concrete concerns

Heavy PR; JWT validator, redactor, and signature path all look solid. Three issues worth addressing before merge.

1. [MEDIUM] Lark webhook signature can be fully bypassed when a registration has neither encryptKey nor verificationToken

agents/Aevatar.GAgents.ChannelRuntime/Adapters/LarkPlatformAdapter.cs:151-188

  • L151: if (!string.IsNullOrEmpty(encryptKey)) → signature required only when encryptKey is set.
  • L179: if (string.IsNullOrEmpty(encryptKey) && !string.IsNullOrWhiteSpace(registration.GetVerificationToken())) → token check only when token is set.
  • If both are empty (misconfigured or partially‑migrated registration), ParseInboundAsync falls through to dispatch with zero authentication. Anyone who knows the public /channel/callback/{id} URL + the registration id can inject im.message.receive_v1 events into the inbox.

The adapter is still DI‑registered (ServiceCollectionExtensions.cs:179), so this path is live for any legacy / non‑migrated Lark registration. Suggest a hard guard at the top of ParseInboundAsync: if both encryptKey and verificationToken are empty, log warning and return null. Same shape needed in LarkChannelAdapter.cs. Also worth a unit test in LarkPlatformAdapterTests.cs — the scenario is currently uncovered.
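
One way to picture the suggested hard guard is as an explicit mode selection, sketched below. The two conditions mirror the checks quoted at L151 and L179; InboundAuthMode and InboundAuthPolicy are hypothetical names, not types in the repository.

```csharp
using System;

// Hypothetical enum describing which authentication applies to an inbound callback.
public enum InboundAuthMode { Reject, Signature, VerificationToken }

public static class InboundAuthPolicy
{
    public static InboundAuthMode Select(string encryptKey, string verificationToken)
    {
        // Signature verification applies whenever an encrypt key is configured.
        if (!string.IsNullOrEmpty(encryptKey))
        {
            return InboundAuthMode.Signature;
        }

        // Otherwise fall back to the verification token, if one is set.
        if (!string.IsNullOrWhiteSpace(verificationToken))
        {
            return InboundAuthMode.VerificationToken;
        }

        // Neither secret configured: fail closed instead of dispatching unauthenticated.
        return InboundAuthMode.Reject;
    }
}
```

Making Reject an explicit result means the "both empty" case cannot fall through to dispatch by accident, and it gives a unit test a single value to assert on.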

2. [MEDIUM] No replay‑window check on Lark webhook timestamp

agents/Aevatar.GAgents.ChannelRuntime/Adapters/LarkPlatformAdapter.cs:160-176, same in LarkChannelAdapter.cs

Code verifies SHA256(timestamp + nonce + encrypt_key + body) but never validates that X-Lark-Request-Timestamp is recent. A captured signed payload can be replayed indefinitely.

Actor‑level dedup in ChannelUserGAgent._processedMessageIds is capped at 200 entries (MaxProcessedMessageIds = 200, ChannelUserGAgent.cs:50) and per‑actor‑activation only — it rolls over quickly on busy chats and doesn't help card actions (no message_id). Suggest rejecting requests where |now - timestamp| > 5min, in line with Lark's own guidance.
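
For reference, the signature computation described above can be sketched as follows. The lowercase-hex encoding and the fixed-time comparison are assumptions of this sketch, and LarkSignatureSketch is a hypothetical name; a replay-window check on the timestamp would sit alongside it.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper; illustrates SHA256(timestamp + nonce + encrypt_key + body).
public static class LarkSignatureSketch
{
    public static string Compute(string timestamp, string nonce, string encryptKey, string rawBody)
    {
        var input = Encoding.UTF8.GetBytes(timestamp + nonce + encryptKey + rawBody);
        using (var sha = SHA256.Create())
        {
            var hash = sha.ComputeHash(input);
            var hex = new StringBuilder(hash.Length * 2);
            foreach (var b in hash)
            {
                hex.Append(b.ToString("x2")); // lowercase hex (assumed encoding)
            }

            return hex.ToString();
        }
    }

    // Compares in fixed time so the check does not leak how many characters matched.
    public static bool Matches(string expectedHex, string providedHex)
    {
        if (string.IsNullOrEmpty(expectedHex) || string.IsNullOrEmpty(providedHex))
        {
            return false;
        }

        var expected = Encoding.UTF8.GetBytes(expectedHex);
        var provided = Encoding.UTF8.GetBytes(providedHex.Trim().ToLowerInvariant());
        return CryptographicOperations.FixedTimeEquals(expected, provided);
    }
}
```

The signature must be computed over the raw request body exactly as received; re-serializing the JSON first would change the bytes and break verification.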

3. [LOW] TokenValidationParameters.ValidAlgorithms not constrained

agents/Aevatar.GAgents.NyxidChat/NyxRelayJwtValidator.cs:70-81

Current JWKS key types (RSA/EC) incidentally prevent alg confusion, but explicitly pinning e.g. ValidAlgorithms = new[] { "RS256", "ES256" } is cheap defense‑in‑depth against future JWKS changes and is standard hardening for IdentityModel validators.
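
As a configuration fragment, pinning the algorithms might look like the sketch below. It assumes the Microsoft.IdentityModel.Tokens package is referenced; the other validation settings used by NyxRelayJwtValidator (issuer, audience, signing keys) are omitted here and are not known from this thread.

```csharp
using Microsoft.IdentityModel.Tokens;

// Sketch only: the real validator sets issuer, audience, and signing keys as well.
var parameters = new TokenValidationParameters
{
    ValidateIssuerSigningKey = true,

    // Accept only RS256 and ES256; tokens signed with any other alg are rejected.
    ValidAlgorithms = new[]
    {
        SecurityAlgorithms.RsaSha256,   // "RS256"
        SecurityAlgorithms.EcdsaSha256, // "ES256"
    },
};
```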


Non‑issues (worth noting so future reviewers don't repeat the hunt)

  • Credentials in ChannelBotDirectCallbackBindingDocument — not a regression. ChannelBotDirectCallbackBindingProjector.cs:32-37 tombstones Lark entries (Platform == "lark" → Tombstone), so Lark credentials are not projected. Telegram still uses this runtime‑only doc by design. The PR description's "runtime‑only direct‑callback binding documents and stopped projecting that path for Lark" is accurate.
  • tcs.Task.Result after Task.WhenAny in NyxIdChatEndpoints.cs:219,434 — safe; access is inside the completedTask == tcs.Task branch and after an IsFaulted check.
  • No CancellationToken on DispatchToUserActorAsync in ChannelCallbackEndpoints.cs:199-237 — intentional per the inline comment; inbox enqueue should not be tied to webhook request lifetime.

eanzhao commented Apr 22, 2026

Addressed the security feedback from #288 (review) in d9401558.

Changes:

  • LarkPlatformAdapter: fail closed when neither encrypt_key nor verification_token is configured; added replay-window enforcement for signed callbacks; kept signature verification on the original encrypted body.
  • NyxRelayJwtValidator: pinned accepted signing algorithms to RS256 / ES256.
  • Added/updated regression coverage in runtime/channel adapter and JWT validator tests, and refreshed the runtime signature tests to use live timestamps so the new replay-window check exercises the intended paths.

Validation:

  • dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/Aevatar.GAgents.ChannelRuntime.Tests.csproj --nologo --tl:off --filter "FullyQualifiedName~LarkPlatformAdapterTests"
  • dotnet test test/Aevatar.GAgents.Channel.Lark.Tests/Aevatar.GAgents.Channel.Lark.Tests.csproj --nologo --tl:off --filter "FullyQualifiedName~LarkChannelAdapterWebhookTests"
  • dotnet test test/Aevatar.AI.Tests/Aevatar.AI.Tests.csproj --nologo --tl:off --filter "FullyQualifiedName~NyxRelayJwtValidatorTests"
  • bash tools/ci/test_stability_guards.sh

@eanzhao eanzhao merged commit 5d89865 into dev Apr 22, 2026
11 checks passed