Replace LLM hallucination with deterministic slash-command flow on Lark daily-report path by eanzhao · Pull Request #323 · aevatarAI/aevatar

eanzhao · 2026-04-23T04:02:55Z

Summary

修复 PR #323 第一轮 review 反馈,并更新 title 反映实际范围。

实际效果(诚实版)

之前的 title "Make Lark slash commands work without relay API key" 过强 —— 即便去掉 relayMode 闸门,AgentBuilderTool.ExecuteAsync 仍会因为 RegistrationToken 在 LarkConversationTurnRunner.ToInboundEvent 里被硬编码为 string.Empty(post-#308 credential boundary 的产物)而立刻返回 "No NyxID access token available. User must be authenticated."

这个 PR 真正交付的:把 /daily-report alice 在没有 relay key 的 bot 上的反馈,从 LLM 自由发挥的中文幻觉("还差一个必要条件: 当前没有可用的私聊会话 ID...")替换成确定性的英文 auth 错误。还不能完整跑通 daily-report 创建 —— 那需要单独修通 relay 路径上的 access token 注入。

改动清单

LarkConversationTurnRunner.cs:104 — 拿掉 relayMode 闸门,所有 bot 都先尝试 NyxRelayAgentBuilderFlow(slash 命令不再依赖 outbound API key)。
NyxRelayAgentBuilderFlow.cs — /daily-report / /social-media 接受首个不含 = 的位置参数作为 github_username / topic;并把 evt.ConversationId 直接打进 tool args(不再单纯依赖 AgentToolRequestContext metadata)。Review 第一轮指出的 separator 边界 bug(=foo / key= 被当成位置参数)已收紧 —— 现在只有完全不含 = 的 token 才算位置参数。
NyxLarkProvisioningService.cs — relay API key platform 改为 generic(NyxID 把 lark 当成 channel-bot 平台标识,不是 relay key 的平台名)。
docs/operations/2026-04-22-lark-nyx-cutover-runbook.md — 加 "Backfill Notes" section,提示 pre-Replace LLM hallucination with deterministic slash-command flow on Lark daily-report path #323 用 platform=lark 注册过的 relay key 需要 rotate。

测试

NyxRelayAgentBuilderFlowTests 加 3 个用例:位置参数 + conversation_id 透传(daily-report 和 social-media 各一个)+ 恶意 = token 边界(theory 两组数据)。
LarkConversationTurnRunnerTests 加 1 个端到端用例:registration 没有 NyxAgentApiKeyId 时,/daily-report alice 走确定性 slash 分支,返回的 reply 包含 "No NyxID access token available" 而 replyGenerator.GeneratedActivities 为空(LLM 兜底没被触发)。这一条用例就是为了把 "slash 分支必须命中、不准悄悄回滚到 LLM" 钉住。

dotnet test 在 Aevatar.GAgents.ChannelRuntime.Tests 里运行:与本 PR 相关的 15 个用例(NyxRelayAgentBuilderFlow + LarkConversationTurnRunner)全绿。仓库 dev 上有 2 个无关的 ServiceCollectionExtensionsTests 在我 rebase 前就已经在 fail,与本 PR 改动无关(已通过在干净 dev tree 上复现确认)。

还没解决但 flag 出来的问题

Feature/cqrs projection suite #2 access token 注入缺口: RegistrationToken = string.Empty 是 [Channel RFC] Remove channel-runtime credential storage from Aevatar #308 "Remove channel runtime credential ownership" 的有意决策。要让 daily-report 在 relay bot 上端到端跑通,需要在 relay 路径(可能是 NyxRelayJwtValidator 或 inbound 流的某个 hook)把用户的 NyxID access token 注回 inboundEvent.RegistrationToken。这是单独的 channel-runtime credential boundary 工作,不在本 PR 范围。
MAF-Inspired Framework Improvements #3 已有 relay key 不会自动回迁: 已通过 runbook 说明覆盖。

Test plan

dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/Aevatar.GAgents.ChannelRuntime.Tests.csproj --filter \"FullyQualifiedName~NyxRelayAgentBuilderFlow|FullyQualifiedName~LarkConversationTurnRunner\" — 15/15 通过
dotnet build agents/Aevatar.GAgents.ChannelRuntime/Aevatar.GAgents.ChannelRuntime.csproj — 0 errors
在私聊里发 /daily-report eanzhao 验证 reply 文本是 "Create daily report agent failed: No NyxID access token available..." 而不是中文 LLM 幻觉
在私聊里发 /daily-report github_username= 验证返回 help text 而不是 silently 把 github_username= 当成 username

🤖 Generated with Claude Code

The Lark relay API key registers with NyxID as a `generic` platform key because NyxID treats `lark` as the channel-bot platform identifier, not a relay platform name. Asserting the platform field in the test pins down the contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The agent builder slash flow was gated on `registration.NyxAgentApiKeyId` being populated, so bots without a relay key fell through to the LLM and the LLM hallucinated about missing private-chat conversation IDs. Now the slash flow runs unconditionally, the daily-report and social-media commands accept a positional first argument (e.g. `/daily-report alice`), and the inbound conversation_id is passed into the tool call so the downstream agent always has the chat id even when metadata propagation is brittle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3bba76bbb1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-23T04:06:34Z

+            if (separator <= 0 || separator == token.Length - 1)
+                return token.Trim();


Ignore malformed named args when resolving positional input

FirstPositionalArgument currently treats tokens like github_username= or topic= as valid positional arguments because separator == token.Length - 1 falls into the return path. In those cases the required-field checks are bypassed and we dispatch tool calls with invalid values (for example /daily-report github_username= becomes username github_username=), whereas malformed named args should be rejected and the help/validation response should be returned.

Useful? React with 👍 / 👎.

eanzhao

I found one concrete regression in the new positional-argument fallback and left it inline.

eanzhao · 2026-04-23T04:08:21Z

+            if (string.IsNullOrWhiteSpace(token))
+                continue;
+            var separator = token.IndexOf('=');
+            if (separator <= 0 || separator == token.Length - 1)


This positional fallback now treats malformed named args like github_username= or schedule_time= as the positional argument, because separator == token.Length - 1 also returns true here. That changes validation behavior in a bad way: /daily-report github_username= will create an agent with the literal username github_username=, and /daily-report schedule_time= eanzhao will turn the broken schedule_time= token into the username while silently defaulting the schedule. We should only fall back to tokens that have no = at all, and keep empty key= tokens invalid. Please add a regression test for the malformed-arg case too.

eanzhao

读完改动,主要改动方向我是赞同的:platform=generic/位置参数/透传 conversation_id 都是把链路钉住,确定性优先于 LLM 幻觉的方向没问题。下面是几个我想 flag 的点:

1. `FirstPositionalArgument` 的 separator 判定偏宽 (`NyxRelayAgentBuilderFlow.cs:201-203`)

var separator = token.IndexOf('=');
if (separator <= 0 || separator == token.Length - 1)
    return token.Trim();

这个条件把两类"显然是 key=value 写坏了"的 token 也当作位置参数:

=foo (以 = 开头): separator == 0 <= 0 → 直接返回 "=foo" 作为 github_username。
key= (空值): separator == token.Length - 1 → 返回 "key=" 整个字符串作为 github_username。这个路径更容易被真实用户踩到 —— 用户如果输入 /daily-report github_username= 以为是清空,结果 ParseNamedArguments 里 github_username 是空值 → GetOptional normalize 掉后返回 null → 走 positional fallback → 拿到 "github_username=" 作为 username 塞进 tool args,下游报一个很诡异的错。

更保守的判定应该只接受"完全没 ="的 token:

if (token.IndexOf('=') < 0)
    return token.Trim();

位置参数本来就是给 /daily-report alice 这种干净形式兜底的,形态怪的 key=value 宁可走 help text 也不要猜。

2. 去掉 `relayMode` 闸门之后,没有 relay key 的 bot 会走到哪里?

LarkConversationTurnRunner.cs:104 去掉 relayMode 之后,所有私聊的 slash 命令都会进入 AgentBuilderTool.ExecuteAsync。但 AgentBuilderTool.cs:101-103 一上来就校验 NyxIdAccessToken:

var token = AgentToolRequestContext.TryGet(LLMRequestMetadataKeys.NyxIdAccessToken);
if (string.IsNullOrWhiteSpace(token))
    return """{"error":"No NyxID access token available. User must be authenticated."}""";

而 metadata 里这个 token 来自 inboundEvent.RegistrationToken —— 在 LarkConversationTurnRunner.cs:349 里被硬编码成 string.Empty。搜了一下,整个仓库只有这一处对它赋值。

结论是:即便改完这个 PR,/daily-report eanzhao 在私聊里走到 AgentBuilderTool 也会直接返回 "No NyxID access token available. User must be authenticated."。确实比 LLM 中文幻觉好(至少是确定性的),但 PR title "Make Lark slash commands work without relay API key" 是过强的 —— 实际效果是"把幻觉替换成一句确定的 auth 错误"。

这也是为什么 test plan 里两条手工验证还没 check —— 建议至少把第一条 在私聊里发 /daily-report eanzhao 实际跑一下,确认下游错误文案是否可接受、是否对用户有可操作的下一步指引。如果只是想消除幻觉,这改动够;如果想真正让没 relay key 的 bot 能用 slash 命令管理 agent,RegistrationToken/metadata 注入那侧还有一块缺口。

3. `platform=generic` 对已有 relay key 不会自动回迁

NyxLarkProvisioningService.CreateRelayApiKeyAsync 只影响新建的 relay key。环境里已经用 platform=lark 注册过的 relay key 不会因这次改动而失效或修复。如果 NyxID 对 platform=lark 的 API key 有特殊行为(比如拒绝作为 relay 使用),需要同步一个一次性的清理步骤(rotate / delete + 重新 provision)。至少 runbook 里应该加一行说明 —— 我看 git status 里 docs/operations/2026-04-22-lark-nyx-cutover-runbook.md 是 modified 但不在这个 PR,可能你已经在别的分支改了,顺手对齐一下就好。

4. 测试覆盖只到 `TryResolve`,没覆盖端到端

NyxRelayAgentBuilderFlowTests 新增的两个用例验证了"命令解析正确"。但 LarkConversationTurnRunner.TryHandleAgentBuilderAsync 这条链路 —— 尤其是把 relayMode 闸门去掉之后的行为 —— 在 LarkConversationTurnRunnerTests.cs 里没有对应用例。建议加一条:没有 NyxAgentApiKeyId 的 registration,收到 /daily-report alice 时,turn runner 的结果/外发 payload 是什么。这样(2)里提的那个"实际返回 auth 错误"才会被测试钉住,以后别人改 metadata 注入逻辑时不会悄悄回滚行为。

1 是真 bug(边界解析);2 是文案跟实际效果对不上 + 需要人肉验证;3 是迁移/运维补齐;4 是测试补强。

codecov · 2026-04-23T04:19:07Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.74%. Comparing base (14e6dd7) to head (b655181).
⚠️ Report is 4 commits behind head on dev.

@@           Coverage Diff           @@
##              dev     #323   +/-   ##
=======================================
  Coverage   69.73%   69.74%           
=======================================
  Files        1139     1139           
  Lines       81170    81170           
  Branches    10619    10619           
=======================================
+ Hits        56605    56610    +5     
+ Misses      20469    20465    -4     
+ Partials     4096     4095    -1

Flag	Coverage Δ
ci	`69.74% <ø> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 1 file with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

- Tighten FirstPositionalArgument: only accept tokens with no '=' so a user typo like `/daily-report github_username=` no longer leaks the literal `github_username=` string into the tool call. - Add an end-to-end LarkConversationTurnRunner test that pins the new behavior: a registration without NyxAgentApiKeyId still routes /daily-report through the deterministic slash flow and returns the AgentBuilderTool's auth error rather than falling through to the LLM. - Document the relay-key platform change in the cutover runbook so pre-existing relay keys get rotated explicitly when NyxID enforces the contract. Note on scope: AgentBuilderTool still returns "No NyxID access token available" because RegistrationToken is hard-coded to empty in LarkConversationTurnRunner.ToInboundEvent (post-#308 credential boundary). Wiring the access token through the relay path is out of scope for this PR; the test pins the deterministic-error contract so that the LLM fallback regression cannot re-emerge silently. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

eanzhao · 2026-04-23T04:35:24Z

Pushed b655181f addressing the four points:

Separator bug — Fixed. FirstPositionalArgument now ignores any token containing =, including =foo and key=. Theory test added covering both edge cases — both fall through to the help text.
Auth gap / title overstated — Conceded. Updated PR title to "Replace LLM hallucination with deterministic slash-command flow on Lark daily-report path" and rewrote the description to be honest about scope: this PR converts the user-visible failure mode from LLM Chinese hallucination into a deterministic English No NyxID access token available error. Wiring a real access token through the relay path (relay JWT → inboundEvent.RegistrationToken) is separate channel-runtime credential boundary work and explicitly out of scope. Pinned the deterministic-error contract with the new turn-runner test below so future metadata refactors can't silently re-enable the LLM fallback.
Runbook backfill note — Added a "Backfill Notes" section to docs/operations/2026-04-22-lark-nyx-cutover-runbook.md covering pre-Replace LLM hallucination with deterministic slash-command flow on Lark daily-report path #323 relay keys with platform=lark and the rotation step.
End-to-end test — Added RunInboundAsync_ShouldRouteSlashCommand_WhenRegistrationHasNoRelayApiKey in LarkConversationTurnRunnerTests. Asserts: (a) the registration without NyxAgentApiKeyId still routes /daily-report alice through the deterministic path; (b) the adapter receives a reply containing No NyxID access token available; (c) replyGenerator.GeneratedActivities is empty — LLM fallback must NOT fire. This is the test that pins the contract you flagged in (2).

Note: there were 2 pre-existing ServiceCollectionExtensionsTests failures unrelated to this PR (verified by reproducing on a clean dev checkout).

eanzhao and others added 2 commits April 23, 2026 11:59

chatgpt-codex-connector Bot reviewed Apr 23, 2026

View reviewed changes

eanzhao commented Apr 23, 2026

View reviewed changes

eanzhao changed the title ~~Fix Lark /daily-report bot flow: relay platform + slash command without API key~~ Replace LLM hallucination with deterministic slash-command flow on Lark daily-report path Apr 23, 2026

eanzhao merged commit dc7cb89 into dev Apr 23, 2026
12 checks passed

eanzhao mentioned this pull request Apr 23, 2026

架构重构：统一 channel inbound 主干，Nyx relay 降级为 transport adapter #328

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace LLM hallucination with deterministic slash-command flow on Lark daily-report path#323

Replace LLM hallucination with deterministic slash-command flow on Lark daily-report path#323
eanzhao merged 3 commits intodevfrom
fix/2026-04-23_lark-daily-report-bot-flow

eanzhao commented Apr 23, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 23, 2026

Uh oh!

eanzhao left a comment

Uh oh!

eanzhao Apr 23, 2026

Uh oh!

eanzhao left a comment

Uh oh!

codecov Bot commented Apr 23, 2026 •

edited

Loading

Uh oh!

eanzhao commented Apr 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		if (separator <= 0 \|\| separator == token.Length - 1)
		return token.Trim();

Conversation

eanzhao commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

实际效果(诚实版)

改动清单

测试

还没解决但 flag 出来的问题

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

eanzhao left a comment

Choose a reason for hiding this comment

Uh oh!

eanzhao Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

eanzhao left a comment

Choose a reason for hiding this comment

1. FirstPositionalArgument 的 separator 判定偏宽 (NyxRelayAgentBuilderFlow.cs:201-203)

2. 去掉 relayMode 闸门之后,没有 relay key 的 bot 会走到哪里?

3. platform=generic 对已有 relay key 不会自动回迁

4. 测试覆盖只到 TryResolve,没覆盖端到端

Uh oh!

codecov Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

eanzhao commented Apr 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

eanzhao commented Apr 23, 2026 •

edited

Loading

1. `FirstPositionalArgument` 的 separator 判定偏宽 (`NyxRelayAgentBuilderFlow.cs:201-203`)

2. 去掉 `relayMode` 闸门之后,没有 relay key 的 bot 会走到哪里?

3. `platform=generic` 对已有 relay key 不会自动回迁

4. 测试覆盖只到 `TryResolve`,没覆盖端到端

codecov Bot commented Apr 23, 2026 •

edited

Loading