Skip to content

Replace LLM hallucination with deterministic slash-command flow on Lark daily-report path#323

Merged
eanzhao merged 3 commits intodevfrom
fix/2026-04-23_lark-daily-report-bot-flow
Apr 23, 2026
Merged

Replace LLM hallucination with deterministic slash-command flow on Lark daily-report path#323
eanzhao merged 3 commits intodevfrom
fix/2026-04-23_lark-daily-report-bot-flow

Conversation

@eanzhao
Copy link
Copy Markdown
Contributor

@eanzhao eanzhao commented Apr 23, 2026

Summary

修复 PR #323 第一轮 review 反馈,并更新 title 反映实际范围。

实际效果(诚实版)

之前的 title "Make Lark slash commands work without relay API key" 过强 —— 即便去掉 relayMode 闸门,AgentBuilderTool.ExecuteAsync 仍会因为 RegistrationTokenLarkConversationTurnRunner.ToInboundEvent 里被硬编码为 string.Empty(post-#308 credential boundary 的产物)而立刻返回 "No NyxID access token available. User must be authenticated."

这个 PR 真正交付的:把 /daily-report alice 在没有 relay key 的 bot 上的反馈,从 LLM 自由发挥的中文幻觉("还差一个必要条件: 当前没有可用的私聊会话 ID...")替换成确定性的英文 auth 错误。还不能完整跑通 daily-report 创建 —— 那需要单独修通 relay 路径上的 access token 注入。

改动清单

  • LarkConversationTurnRunner.cs:104 — 拿掉 relayMode 闸门,所有 bot 都先尝试 NyxRelayAgentBuilderFlow(slash 命令不再依赖 outbound API key)。
  • NyxRelayAgentBuilderFlow.cs/daily-report / /social-media 接受首个不含 = 的位置参数作为 github_username / topic;并把 evt.ConversationId 直接打进 tool args(不再单纯依赖 AgentToolRequestContext metadata)。Review 第一轮指出的 separator 边界 bug(=foo / key= 被当成位置参数)已收紧 —— 现在只有完全不含 = 的 token 才算位置参数。
  • NyxLarkProvisioningService.cs — relay API key platform 改为 generic(NyxID 把 lark 当成 channel-bot 平台标识,不是 relay key 的平台名)。
  • docs/operations/2026-04-22-lark-nyx-cutover-runbook.md — 加 "Backfill Notes" section,提示 pre-Replace LLM hallucination with deterministic slash-command flow on Lark daily-report path #323platform=lark 注册过的 relay key 需要 rotate。

测试

  • NyxRelayAgentBuilderFlowTests 加 3 个用例:位置参数 + conversation_id 透传(daily-report 和 social-media 各一个)+ 恶意 = token 边界(theory 两组数据)。
  • LarkConversationTurnRunnerTests 加 1 个端到端用例:registration 没有 NyxAgentApiKeyId 时,/daily-report alice 走确定性 slash 分支,返回的 reply 包含 "No NyxID access token available" 而 replyGenerator.GeneratedActivities 为空(LLM 兜底没被触发)。这一条用例就是为了把 "slash 分支必须命中、不准悄悄回滚到 LLM" 钉住。

dotnet testAevatar.GAgents.ChannelRuntime.Tests 里运行:与本 PR 相关的 15 个用例(NyxRelayAgentBuilderFlow + LarkConversationTurnRunner)全绿。仓库 dev 上有 2 个无关的 ServiceCollectionExtensionsTests 在我 rebase 前就已经在 fail,与本 PR 改动无关(已通过在干净 dev tree 上复现确认)。

还没解决但 flag 出来的问题

Test plan

  • dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/Aevatar.GAgents.ChannelRuntime.Tests.csproj --filter \"FullyQualifiedName~NyxRelayAgentBuilderFlow|FullyQualifiedName~LarkConversationTurnRunner\" — 15/15 通过
  • dotnet build agents/Aevatar.GAgents.ChannelRuntime/Aevatar.GAgents.ChannelRuntime.csproj — 0 errors
  • 在私聊里发 /daily-report eanzhao 验证 reply 文本是 "Create daily report agent failed: No NyxID access token available..." 而不是中文 LLM 幻觉
  • 在私聊里发 /daily-report github_username= 验证返回 help text 而不是 silently 把 github_username= 当成 username

🤖 Generated with Claude Code

eanzhao and others added 2 commits April 23, 2026 11:59
The Lark relay API key registers with NyxID as a `generic` platform key
because NyxID treats `lark` as the channel-bot platform identifier, not a
relay platform name. Asserting the platform field in the test pins down
the contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The agent builder slash flow was gated on `registration.NyxAgentApiKeyId`
being populated, so bots without a relay key fell through to the LLM and
the LLM hallucinated about missing private-chat conversation IDs. Now
the slash flow runs unconditionally, the daily-report and social-media
commands accept a positional first argument (e.g. `/daily-report alice`),
and the inbound conversation_id is passed into the tool call so the
downstream agent always has the chat id even when metadata propagation
is brittle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3bba76bbb1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +202 to +203
if (separator <= 0 || separator == token.Length - 1)
return token.Trim();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Ignore malformed named args when resolving positional input

FirstPositionalArgument currently treats tokens like github_username= or topic= as valid positional arguments because separator == token.Length - 1 falls into the return path. In those cases the required-field checks are bypassed and we dispatch tool calls with invalid values (for example /daily-report github_username= becomes username github_username=), whereas malformed named args should be rejected and the help/validation response should be returned.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one concrete regression in the new positional-argument fallback and left it inline.

if (string.IsNullOrWhiteSpace(token))
continue;
var separator = token.IndexOf('=');
if (separator <= 0 || separator == token.Length - 1)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This positional fallback now treats malformed named args like github_username= or schedule_time= as the positional argument, because separator == token.Length - 1 also returns true here. That changes validation behavior in a bad way: /daily-report github_username= will create an agent with the literal username github_username=, and /daily-report schedule_time= eanzhao will turn the broken schedule_time= token into the username while silently defaulting the schedule. We should only fall back to tokens that have no = at all, and keep empty key= tokens invalid. Please add a regression test for the malformed-arg case too.

Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

读完改动,主要改动方向我是赞同的:platform=generic/位置参数/透传 conversation_id 都是把链路钉住,确定性优先于 LLM 幻觉的方向没问题。下面是几个我想 flag 的点:

1. FirstPositionalArgument 的 separator 判定偏宽 (NyxRelayAgentBuilderFlow.cs:201-203)

var separator = token.IndexOf('=');
if (separator <= 0 || separator == token.Length - 1)
    return token.Trim();

这个条件把两类"显然是 key=value 写坏了"的 token 也当作位置参数:

  • =foo (以 = 开头): separator == 0 <= 0 → 直接返回 "=foo" 作为 github_username
  • key= (空值): separator == token.Length - 1 → 返回 "key=" 整个字符串作为 github_username。这个路径更容易被真实用户踩到 —— 用户如果输入 /daily-report github_username= 以为是清空,结果 ParseNamedArgumentsgithub_username 是空值 → GetOptional normalize 掉后返回 null → 走 positional fallback → 拿到 "github_username=" 作为 username 塞进 tool args,下游报一个很诡异的错。

更保守的判定应该只接受"完全没 ="的 token:

if (token.IndexOf('=') < 0)
    return token.Trim();

位置参数本来就是给 /daily-report alice 这种干净形式兜底的,形态怪的 key=value 宁可走 help text 也不要猜。

2. 去掉 relayMode 闸门之后,没有 relay key 的 bot 会走到哪里?

LarkConversationTurnRunner.cs:104 去掉 relayMode 之后,所有私聊的 slash 命令都会进入 AgentBuilderTool.ExecuteAsync。但 AgentBuilderTool.cs:101-103 一上来就校验 NyxIdAccessToken:

var token = AgentToolRequestContext.TryGet(LLMRequestMetadataKeys.NyxIdAccessToken);
if (string.IsNullOrWhiteSpace(token))
    return """{"error":"No NyxID access token available. User must be authenticated."}""";

而 metadata 里这个 token 来自 inboundEvent.RegistrationToken —— 在 LarkConversationTurnRunner.cs:349 里被硬编码成 string.Empty。搜了一下,整个仓库只有这一处对它赋值。

结论是:即便改完这个 PR,/daily-report eanzhao 在私聊里走到 AgentBuilderTool 也会直接返回 "No NyxID access token available. User must be authenticated."。确实比 LLM 中文幻觉好(至少是确定性的),但 PR title "Make Lark slash commands work without relay API key" 是过强的 —— 实际效果是"把幻觉替换成一句确定的 auth 错误"。

这也是为什么 test plan 里两条手工验证还没 check —— 建议至少把第一条 在私聊里发 /daily-report eanzhao 实际跑一下,确认下游错误文案是否可接受、是否对用户有可操作的下一步指引。如果只是想消除幻觉,这改动够;如果想真正让没 relay key 的 bot 能用 slash 命令管理 agent,RegistrationToken/metadata 注入那侧还有一块缺口。

3. platform=generic 对已有 relay key 不会自动回迁

NyxLarkProvisioningService.CreateRelayApiKeyAsync 只影响新建的 relay key。环境里已经用 platform=lark 注册过的 relay key 不会因这次改动而失效或修复。如果 NyxID 对 platform=lark 的 API key 有特殊行为(比如拒绝作为 relay 使用),需要同步一个一次性的清理步骤(rotate / delete + 重新 provision)。至少 runbook 里应该加一行说明 —— 我看 git status 里 docs/operations/2026-04-22-lark-nyx-cutover-runbook.md 是 modified 但不在这个 PR,可能你已经在别的分支改了,顺手对齐一下就好。

4. 测试覆盖只到 TryResolve,没覆盖端到端

NyxRelayAgentBuilderFlowTests 新增的两个用例验证了"命令解析正确"。但 LarkConversationTurnRunner.TryHandleAgentBuilderAsync 这条链路 —— 尤其是把 relayMode 闸门去掉之后的行为 —— 在 LarkConversationTurnRunnerTests.cs 里没有对应用例。建议加一条:没有 NyxAgentApiKeyId 的 registration,收到 /daily-report alice 时,turn runner 的结果/外发 payload 是什么。这样(2)里提的那个"实际返回 auth 错误"才会被测试钉住,以后别人改 metadata 注入逻辑时不会悄悄回滚行为。


1 是真 bug(边界解析);2 是文案跟实际效果对不上 + 需要人肉验证;3 是迁移/运维补齐;4 是测试补强。

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.74%. Comparing base (14e6dd7) to head (b655181).
⚠️ Report is 4 commits behind head on dev.

@@           Coverage Diff           @@
##              dev     #323   +/-   ##
=======================================
  Coverage   69.73%   69.74%           
=======================================
  Files        1139     1139           
  Lines       81170    81170           
  Branches    10619    10619           
=======================================
+ Hits        56605    56610    +5     
+ Misses      20469    20465    -4     
+ Partials     4096     4095    -1     
Flag Coverage Δ
ci 69.74% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

- Tighten FirstPositionalArgument: only accept tokens with no '=' so a
  user typo like `/daily-report github_username=` no longer leaks the
  literal `github_username=` string into the tool call.
- Add an end-to-end LarkConversationTurnRunner test that pins the new
  behavior: a registration without NyxAgentApiKeyId still routes
  /daily-report through the deterministic slash flow and returns the
  AgentBuilderTool's auth error rather than falling through to the LLM.
- Document the relay-key platform change in the cutover runbook so
  pre-existing relay keys get rotated explicitly when NyxID enforces the
  contract.

Note on scope: AgentBuilderTool still returns "No NyxID access token
available" because RegistrationToken is hard-coded to empty in
LarkConversationTurnRunner.ToInboundEvent (post-#308 credential
boundary). Wiring the access token through the relay path is out of
scope for this PR; the test pins the deterministic-error contract so
that the LLM fallback regression cannot re-emerge silently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eanzhao eanzhao changed the title Fix Lark /daily-report bot flow: relay platform + slash command without API key Replace LLM hallucination with deterministic slash-command flow on Lark daily-report path Apr 23, 2026
@eanzhao
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 23, 2026

Pushed b655181f addressing the four points:

  1. Separator bug — Fixed. FirstPositionalArgument now ignores any token containing =, including =foo and key=. Theory test added covering both edge cases — both fall through to the help text.

  2. Auth gap / title overstated — Conceded. Updated PR title to "Replace LLM hallucination with deterministic slash-command flow on Lark daily-report path" and rewrote the description to be honest about scope: this PR converts the user-visible failure mode from LLM Chinese hallucination into a deterministic English No NyxID access token available error. Wiring a real access token through the relay path (relay JWT → inboundEvent.RegistrationToken) is separate channel-runtime credential boundary work and explicitly out of scope. Pinned the deterministic-error contract with the new turn-runner test below so future metadata refactors can't silently re-enable the LLM fallback.

  3. Runbook backfill note — Added a "Backfill Notes" section to docs/operations/2026-04-22-lark-nyx-cutover-runbook.md covering pre-Replace LLM hallucination with deterministic slash-command flow on Lark daily-report path #323 relay keys with platform=lark and the rotation step.

  4. End-to-end test — Added RunInboundAsync_ShouldRouteSlashCommand_WhenRegistrationHasNoRelayApiKey in LarkConversationTurnRunnerTests. Asserts: (a) the registration without NyxAgentApiKeyId still routes /daily-report alice through the deterministic path; (b) the adapter receives a reply containing No NyxID access token available; (c) replyGenerator.GeneratedActivities is empty — LLM fallback must NOT fire. This is the test that pins the contract you flagged in (2).

Note: there were 2 pre-existing ServiceCollectionExtensionsTests failures unrelated to this PR (verified by reproducing on a clean dev checkout).

@eanzhao eanzhao merged commit dc7cb89 into dev Apr 23, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant