Conversation
SkillRunnerGAgent.SendOutputAsync and FeishuCardHumanInteractionPort both hard-coded receive_id_type=chat_id when posting to Lark IM. In DM flows the OutboundConfig.ConversationId is a user open_id (ou_*), not a chat_id, which made the Lark API reject every outbound message with HTTP 400. Introduce LarkConversationTargets.ResolveReceiveIdType to derive the correct receive_id_type from the conversation_id prefix (ou_ -> open_id, on_ -> union_id, otherwise chat_id) and reuse it in both call sites. ChannelConversationTurnRunner's immediate Lark reaction is best-effort acknowledgment; failure cases (most commonly Lark code 231002 missing reaction permission) recur on every inbound message until ops adjusts the app scope. Demote the proxy-error path from Warning to Debug to keep the signal discoverable without spamming the run log on every turn. The exception path stays at Warning because thrown exceptions indicate genuine network or code-level issues. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. @@ Coverage Diff @@
## dev #403 +/- ##
=======================================
Coverage 70.30% 70.30%
=======================================
Files 1174 1174
Lines 84061 84061
Branches 11049 11049
=======================================
Hits 59095 59095
Misses 20698 20698
Partials 4268 4268
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
eanzhao
left a comment
There was a problem hiding this comment.
Two concerns and a minor nit on the test data. Fix is correct for the immediate /daily outage, posting these as inline so the file context is preserved.
| // tenant-level config issue that recurs on every inbound message | ||
| // until ops fixes the app scope. Log at Debug so it stays | ||
| // discoverable when the channel is opted into verbose logging | ||
| // without spamming Warnings on every turn. |
There was a problem hiding this comment.
Demotion is broader than the PR describes. The PR rationale targets the recurring 231002 The operator has no permission to react noise, but TryGetProxyError (line 987-1034) returns true for any non-zero code or any error field. Demoting the whole proxy-error branch to LogDebug means a future Lark API regression with a new error code (rate limit, bot kicked, message archived, NyxID proxy 5xx packaged into the same envelope, etc.) also drops to Debug and we lose visibility.
Suggest gating the demotion on the actual error code:
if (TryGetProxyError(response, out var code, out var detail))
{
if (code == 231002)
_logger.LogDebug(/* known recurring config gap */ ...);
else
_logger.LogWarning(/* anything else is a real signal */ ...);
}Or keep the current shape but extract the code first and only demote the known-noisy allowlist. The exception path is already correctly kept at Warning.
There was a problem hiding this comment.
Done in 5da48f1. Extracted LarkProxyResponse.TryGetError so it returns the Lark business code, then gated the Debug demotion on code == LarkBotErrorCodes.NoPermissionToReact (231002) only. Anything else — Nyx envelope errors, unrecognized Lark codes — stays at LogWarning and now also logs the larkCode. The exception path remains at LogWarning.
| if (trimmed.StartsWith("on_", StringComparison.Ordinal)) | ||
| return "union_id"; | ||
|
|
||
| return DefaultReceiveIdType; |
There was a problem hiding this comment.
Silent fallback to chat_id for unknown prefixes re-creates the same failure mode this PR is fixing. The /daily 400 outage was "receive_id_type=chat_id was assumed for an ou_*". After this fix we still default to chat_id for any non-ou_/non-on_ input. If NyxID relay later emits Lark email, user_id, or any new shape, outbound 400s come back silently — same symptom, same lack of log signal.
Two options, in order of preference:
-
Quick: emit a
LogDebug(or warn-once-per-process) when the fallback branch is taken, so we get a breadcrumb the next time format drift hits prod. Requires threadingILoggerthrough the helper or returning a(type, fellThrough)tuple. -
Root fix (follow-up issue): lift
receive_id_typeto a typed field on the inboundConversationReference/OutboundConfig, populated byNyxIdRelayTransport.ParseInboundAsyncfrom whatever the relay actually claims, instead of inferring from a string prefix at the outbound site. CLAUDE.md "核心语义强类型...禁止塞入通用 bag / 禁止从前缀字符串猜语义" applies —conversation_idhere is silently a (id, type) pair encoded into a single string.
Not blocking — the prefix mapping covers today's known relay shapes. Worth a TODO/issue.
There was a problem hiding this comment.
Took the root-fix path you suggested. Added typed lark_receive_id + lark_receive_id_type fields to SkillRunnerOutboundConfig, WorkflowAgentState/Init/Initialized, UserAgentCatalogEntry, and UserAgentCatalogUpsertCommand. AgentBuilderTool now populates them at delivery-target creation time via LarkConversationTargets.BuildFromInbound(chatType, conversationId, senderId), so the p2p flow pins the user ou_* open_id straight from SenderId instead of guessing from the conversation_id prefix. Outbound senders prefer the typed pair when both fields are present.
Kept the prefix heuristic (ResolveReceiveIdType) as a fallback only for legacy state persisted before these typed fields existed, and added option-1's breadcrumb on top of it: LarkConversationTargets.Resolve(...) returns a FellBackToPrefixInference flag and the call sites (SkillRunnerGAgent.SendOutputAsync, FeishuCardHumanInteractionPort.SendMessageAsync) emit LogDebug when that path runs. Format drift will surface there before it silently rejects in Lark.
| [InlineData("on_union_user_token", "union_id")] | ||
| [InlineData("oc_group_chat_1", "chat_id")] | ||
| [InlineData("oc_dm_thread", "chat_id")] | ||
| public void ResolveReceiveIdType_ShouldMapKnownPrefixes(string conversationId, string expected) |
There was a problem hiding this comment.
Nit: [InlineData("oc_dm_thread", "chat_id")] is technically correct (Lark DMs do have an underlying oc_* chat_id) but doesn't reflect this codebase's relay shape — NyxIdRelayTransport for Lark DMs puts the sender ou_* into Conversation.Id, never the DM thread's oc_*. Worth either renaming the data label (oc_group_chat_2 works) or adding a one-line comment so future readers don't infer that DMs flow as oc_* here.
There was a problem hiding this comment.
Renamed oc_dm_thread → oc_group_chat_2 in LarkConversationTargetsTests so the data label matches what this codebase's relay actually does (DMs flow as ou_* here, not oc_*). The new BuildFromInbound tests cover the real p2p case explicitly: chat_type=p2p with senderId="ou_user_1" produces (ReceiveId="ou_user_1", ReceiveIdType="open_id").
eanzhao
left a comment
There was a problem hiding this comment.
这版方向是对的,但我看到两个会影响实际修复效果的问题,先留 comment。
| if (string.IsNullOrEmpty(trimmed)) | ||
| return DefaultReceiveIdType; | ||
|
|
||
| if (trimmed.StartsWith("ou_", StringComparison.Ordinal)) |
There was a problem hiding this comment.
这里还是在发送时从 conversation_id 前缀推断 Lark 的 receive_id_type,但创建链路并没有保证这个字段一定是 Lark 的 receive id。/daily、/social-media 仍把 ChannelInboundEvent.ConversationId 写进 OutboundConfig/UserAgentCatalog;NyxID relay payload 里真实用户 open_id 是单独的 SenderId,conversation id 可能是 route id 或 oc_* chat id,而且现有 p2p 测试也还断言 oc_* 会被保存。这样只要 DM agent 没预先存成 ou_*,这里仍会回落到 chat_id,原来的 400 还会继续出现。建议在创建 delivery target 时写入明确的 receive_id + receive_id_type(p2p 用 SenderId/open_id),不要在发送时从 conversation_id 猜。
There was a problem hiding this comment.
是的——这版改成在创建 delivery target 时直接写 lark_receive_id + lark_receive_id_type。AgentBuilderTool 现在用 LarkConversationTargets.BuildFromInbound(chatType, conversationId, senderId):p2p 取 SenderId/open_id,群/频道继续用 oc_*/chat_id。
字段沿着 SkillRunnerOutboundConfig / WorkflowAgentState / WorkflowAgentInitializedEvent / UserAgentCatalogUpsertCommand / UserAgentCatalogEntry 一路落到 UserAgentCatalogGAgent.HandleUpsertAsync(用 MergeNonEmpty),然后 SkillRunnerGAgent.SendOutputAsync 与 FeishuCardHumanInteractionPort.SendMessageAsync 优先读这对 typed 字段。原 OutboundConfig.ConversationId 仍然保留作为 LLM metadata 输入路径,但发送侧不再依赖它推 receive_id_type。
旧的前缀推断路径只为兼容已经持久化的旧 state 保留,并且 fallback 时会 LogDebug 一次,方便观察 format drift。p2p 现存测试断言 OutboundConfig.ConversationId == "oc_chat_1" 没有改动;新测试断言同时多一对 LarkReceiveId == "ou_user_1" / LarkReceiveIdType == "open_id",这正是 production p2p DM 的真实形态。
| @@ -278,7 +279,7 @@ private async Task SendOutputAsync(string output, CancellationToken ct) | |||
| await client.ProxyRequestAsync( | |||
There was a problem hiding this comment.
ProxyRequestAsync 不会把下游 Lark 错误自动变成异常:HTTP 非 2xx 会返回 {"error": true, ...} 字符串,Lark 常见应用错误也可能是 {"code": ..., "msg": ...}。这里忽略返回值会让实际发送失败的 runner 继续落 SkillRunnerExecutionCompletedEvent,TrySendFailureAsync 也会认为失败通知已经发出。这个 PR 正在修 rejected outbound DM,建议把结果解析成“Nyx error envelope 或 Lark code != 0 都失败”,并给 ou_* 成功路径和错误响应各补一个 call-site 测试。
There was a problem hiding this comment.
说得对,这个 PR 确实就在修 outbound DM 被拒,结果还把返回值丢了。
提取了 LarkProxyResponse.TryGetError 共享 helper,同时识别 Nyx envelope ({"error":...}) 和 Lark business code ({"code":<non-zero>,"msg":...}),两类失败都让 SkillRunnerGAgent.SendOutputAsync 抛 InvalidOperationException。这样 HandleTriggerAsync 不会在 silent rejection 上落 SkillRunnerExecutionCompletedEvent,retry / TrySendFailureAsync 都会按真实失败走。FeishuCardHumanInteractionPort.SendMessageAsync 同步切到这个 helper(之前的 result.Contains("\"error\"") 也漏 code != 0 的情况)。
新增四个 call-site 测试,全在 SkillRunnerGAgentTests:
SendOutputAsync_ShouldUseTypedReceiveTarget_WhenLarkReceiveIdIsPopulated— typedou_user_1/open_id成功路径,断言 URL 与 body。SendOutputAsync_ShouldFallBackToConversationIdPrefixInference_ForLegacyState— 旧 state 兜底路径仍然能跑通。SendOutputAsync_ShouldThrow_WhenLarkBusinessCodeIsNonZero— Larkcode=230002抛异常,消息里带code=230002与invalid receive_id。SendOutputAsync_ShouldThrow_WhenNyxProxyEnvelopeReportsError— Nyx 包裹的{"error":true,"message":"upstream timeout"}抛异常。
Five inline review comments from #403 turned into one consistent change: * Lift the Lark `receive_id` / `receive_id_type` pair to typed proto fields on `SkillRunnerOutboundConfig`, `WorkflowAgentState/Init/Initialized`, `UserAgentCatalogEntry`, and `UserAgentCatalogUpsertCommand`. The fields are populated at delivery-target creation time in `AgentBuilderTool` (`/daily_report` and `/social-media` flows) using the inbound `chat_type` + `sender_id`, so p2p DMs now pin the user `ou_*` open_id rather than relying on whatever the relay happens to put into `conversation_id`. `SkillRunnerGAgent.SendOutputAsync` and `FeishuCardHumanInteractionPort.SendMessageAsync` use the typed pair verbatim when both fields are present, falling back to the prefix heuristic only for legacy state and emitting a `LogDebug` breadcrumb on the fallback path. * Make `LarkConversationTargets.ResolveReceiveIdType` legacy-only and add `Resolve(typedId, typedType, legacyConversationId)` plus `BuildFromInbound(chatType, conversationId, senderId)`. * Extract a shared `LarkProxyResponse.TryGetError` helper that recognises both Nyx error envelopes (`{"error":...}`) and Lark business errors (`{"code":<non-zero>,"msg":...}`), returning the Lark code so callers can branch on it. `ChannelConversationTurnRunner.TrySendImmediateLark ReactionAsync` now demotes to `LogDebug` only for the documented recurring-config-gap code 231002 (named in `LarkBotErrorCodes`) and keeps `LogWarning` for any other proxy / Lark error so future regressions stay visible. * `SkillRunnerGAgent.SendOutputAsync` and `FeishuCardHumanInteractionPort.SendMessageAsync` now parse the proxy response and throw on Nyx envelope errors or Lark `code != 0`. The earlier code dropped both classes of failure on the floor, letting `HandleTriggerAsync` persist `SkillRunnerExecutionCompletedEvent` on silently-rejected outbound DMs. * Test data nit: rename the misleading `oc_dm_thread` fixture to `oc_group_chat_2` (this codebase's relay pins DMs as `ou_*`, never `oc_*`). * New / updated tests: - `LarkConversationTargetsTests`: cover `Resolve` typed-vs-legacy paths and `BuildFromInbound` p2p / group / missing-sender cases. - `SkillRunnerGAgentTests`: typed-`open_id` happy path, legacy fallback, Lark `code != 0` rejection, Nyx envelope rejection. - `AgentBuilderToolTests`: assert the typed `LarkReceiveId` / `LarkReceiveIdType` are populated for the p2p `/daily_report` flow. Verification: ChannelRuntime suite 382 passed, Channel.Protocol suite 119 passed, Foundation.Runtime.Hosting 151 passed (15 infra-skipped), `tools/ci/test_stability_guards.sh` and `tools/ci/architecture_guards.sh` pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
eanzhao
left a comment
There was a problem hiding this comment.
复看 fix 后还有一个阻断点:typed Lark target 在 catalog 投影链路里被丢掉了。
| // both fields are present, outbound senders use them verbatim instead of | ||
| // inferring receive_id_type from the conversation_id prefix. For p2p the | ||
| // creator stores the user open_id (`ou_*`) here, not the DM thread chat_id. | ||
| string lark_receive_id = 22; |
There was a problem hiding this comment.
这里新增了 UserAgentCatalogEntry.lark_receive_id / lark_receive_id_type,但同一个 PR 没有把这两个字段加到 UserAgentCatalogDocument,UserAgentCatalogProjector.Materialize 也没有写,UserAgentCatalogQueryPort.ToEntry 也没有读回。FeishuCardHumanInteractionPort 的 target 是通过 UserAgentCatalogRuntimeQueryPort 从 projection document 重建出来的,所以 workflow/social-media 的 human-interaction 发送路径会把这两个 typed 字段丢掉,然后在 LarkConversationTargets.Resolve(...) 里继续回退到 ConversationId 前缀推断。也就是说 SkillRunner 自己发送修好了,但 catalog-backed human interaction 仍然可能用旧的 conversation_id/chat_id 路径。建议给 UserAgentCatalogDocument 增加同样的字段,并补齐 projector + query port 映射和投影测试。
There was a problem hiding this comment.
你说得对——这正是我漏掉的地方。在 1d92c53 把 typed 字段补到 catalog projection 链路:
UserAgentCatalogDocument新增lark_receive_id(= 24) 与lark_receive_id_type(= 25)。UserAgentCatalogProjector.Materialize把 entry 上的两个字段写进 document。UserAgentCatalogQueryPort.ToEntry把它们读回 entry,所以FeishuCardHumanInteractionPort通过UserAgentCatalogRuntimeQueryPort拿到的 entry 就带着 typed pair,不再回退到conversation_id推断。
补了两个测试:UserAgentCatalogProjectorTests.ProjectAsync_WithValidCommittedEvent_UpsertsDocument 现在断言 typed 字段被投影到 document;新增 ToEntry_ShouldRoundTripTypedLarkReceiveTarget_FromDocumentToEntry 锁住反向映射。
eanzhao
left a comment
There was a problem hiding this comment.
Re-reviewed 5da48f17. Prior three comments are addressed cleanly — typed lark_receive_id/lark_receive_id_type lifted into the protos and threaded through AgentBuilder → Skill/Workflow → Catalog → Feishu sender, prefix inference is now legacy-only with a FellBackToPrefixInference breadcrumb, reaction log gated on LarkBotErrorCodes.NoPermissionToReact, proxy response parsed via shared LarkProxyResponse.TryGetError so silent rejections turn into InvalidOperationException and trigger the retry/failure path. Three new findings + one nit; none block the merge.
| return new LarkReceiveTarget(trimmedConversation, DefaultReceiveIdType, FellBackToPrefixInference: false); | ||
| } | ||
|
|
||
| private static bool IsDirectMessage(string? chatType) |
There was a problem hiding this comment.
p2p + missing senderId silently regresses to the original bug. BuildFromInbound("p2p", "ou_user_1", senderId: "") falls through IsDirectMessage(...) && !IsNullOrEmpty(trimmedSender) to the else-branch and returns ("ou_user_1", "chat_id", FellBackToPrefixInference: false) — exactly the original /daily outage shape, plus the breadcrumb is suppressed (FellBack=false).
In practice the relay always emits Sender.PlatformId so this is unlikely, but it's the same silent-default anti-pattern the rest of the PR is aimed at. Two cheap options:
if (IsDirectMessage(chatType))
{
if (!string.IsNullOrEmpty(trimmedSender))
return new LarkReceiveTarget(trimmedSender, OpenIdReceiveIdType, FellBackToPrefixInference: false);
// Confused inbound: chat_type=p2p but no sender. Don't write a typed target — let the
// outbound resolver fall back to legacy + breadcrumb instead of pinning a wrong type.
return new LarkReceiveTarget(string.Empty, string.Empty, FellBackToPrefixInference: false);
}…then Resolve(typedId="", typedType="", legacyConversationId="ou_user_1") flips FellBack=true, callers LogDebug, and the prefix heuristic recovers the right open_id. Or set FellBack=true directly here. Either way the pathological case stays observable.
There was a problem hiding this comment.
Agreed — this was the silent-default anti-pattern I was supposed to be designing away from. Fixed in 1d92c53:
if (IsDirectMessage(chatType))
{
return string.IsNullOrEmpty(trimmedSender)
? new LarkReceiveTarget(string.Empty, string.Empty, FellBackToPrefixInference: true)
: new LarkReceiveTarget(trimmedSender, OpenIdReceiveIdType, FellBackToPrefixInference: false);
}BuildFromInbound("p2p", "ou_user_1", senderId: "") now returns ("", "", FellBack=true). Resolve(typedId="", typedType="", legacyConversationId="ou_user_1") flips to the prefix path, recovers open_id for ou_*, and call sites emit the Debug breadcrumb — pathological case stays observable.
Two paired tests:
BuildFromInbound_ShouldReturnEmptyTypedPairWithFellBack_WhenP2pAndSenderIsMissing— pins the new return shape.Resolve_ShouldRecoverOpenIdForP2pConfusedInbound_ViaPrefixInference— pins the recovery path so a future refactor that re-types the empty pair aschat_idimmediately fails.
| // Authoritative Lark outbound delivery target captured at create time. When | ||
| // both fields are present, SkillRunnerGAgent.SendOutputAsync uses them | ||
| // verbatim; conversation_id stays for LLM metadata propagation only. | ||
| string lark_receive_id = 7; |
There was a problem hiding this comment.
Lark-named fields living on platform-agnostic state. UserAgentCatalogEntry, WorkflowAgentState, and SkillRunnerOutboundConfig all carry a platform field and are now permanently coupled to a Lark-shaped (lark_receive_id, lark_receive_id_type) pair. This works for Day One — only api-lark-bot reaches these protos — but the moment Telegram/Discord land we either bolt on telegram_chat_id, discord_channel_id, … or refactor.
For the next iteration, consider lifting to a typed OutboundTarget sub-message keyed by platform, e.g.:
message OutboundTarget {
oneof target {
LarkReceiveTarget lark = 1;
TelegramChatTarget telegram = 2;
// ...
}
}Matches CLAUDE.md "核心语义强类型 / 命名语义优先" better than a per-platform field-explosion. Not blocking — file naming honestly says these are Lark-only — but worth a follow-up issue so the next platform doesn't accrete fields.
There was a problem hiding this comment.
Agreed — this is the right shape but explicitly non-blocking, and putting it in this PR would conflate the bugfix with a multi-platform refactor. Opened as aevatarAI/aevatar#408 so the next platform actually triggers the cleanup; the issue captures the proposed OutboundTarget oneof shape, the migration considerations (round-tripping through projection, deprecating the flat fields, reconciling with the existing platform discriminator), and "do this when a second platform's actual delivery code lands, not speculatively."
| @@ -196,7 +197,12 @@ await skillRunnerActor.Received(1).HandleEventAsync( | |||
| e.Payload.Unpack<InitializeSkillRunnerCommand>().OutboundConfig.NyxProviderSlug == "api-lark-bot" && | |||
There was a problem hiding this comment.
Test asymmetry: /daily_report p2p typed-field assertion but no parallel for /social-media. CreateSocialMediaAgentAsync (AgentBuilderTool.cs:387) calls the same BuildFromInbound(...) and writes the same typed fields onto InitializeWorkflowAgentCommand. If someone refactors that path later, only the daily-report test pins the contract. Worth a near-clone of this test for the social_media flow asserting LarkReceiveId == "ou_user_1" / LarkReceiveIdType == "open_id" lands on InitializeWorkflowAgentCommand. Cheap; same fixture infrastructure.
There was a problem hiding this comment.
Done in 1d92c53. Added [ChannelMetadataKeys.SenderId] = "ou_user_1" to the social_media test fixture and extended the InitializeWorkflowAgentCommand assertion with:
e.Payload.Unpack<InitializeWorkflowAgentCommand>().LarkReceiveId == "ou_user_1" &&
e.Payload.Unpack<InitializeWorkflowAgentCommand>().LarkReceiveIdType == "open_id"Now a refactor of CreateSocialMediaAgentAsync that drops the BuildFromInbound(...) call (or accidentally points it at ConversationId instead of SenderId) fails this test alongside the daily_report mirror.
| string.Equals(normalized, "dm", StringComparison.OrdinalIgnoreCase); | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Nit: IsDirectMessage accepts p2p, direct_message, directmessage, dm. The only emitter in this repo is ChannelConversationTurnRunner.ResolveConversationChatType, which outputs exactly "p2p". The other three are unreachable defensive aliases — either drop them, or add a one-line comment naming where they're expected to land (Telegram bridge?) so future readers know they're not dead. (Pure cleanup, not blocking.)
There was a problem hiding this comment.
Dropped in 1d92c53. IsDirectMessage is now a one-liner ordinal "p2p" check, with a comment naming ChannelConversationTurnRunner.ResolveConversationChatType as the single emitter so future readers don't reinstate the aliases as defensive padding:
// Only "p2p" is emitted by ChannelConversationTurnRunner.ResolveConversationChatType today,
// which is the single source for ChannelMetadataKeys.ChatType in this repo. Keep the check
// narrow until a second emitter (e.g. a Telegram bridge) actually lands.
private static bool IsDirectMessage(string? chatType) =>
string.Equals(chatType?.Trim(), "p2p", StringComparison.Ordinal);The previous [Theory] data rows for direct_message / directmessage / dm (treated as DM) moved into BuildFromInbound_ShouldUseConversationChatId_ForNonP2pChatTypes (treated as non-DM), so the new contract is locked in by tests on both sides.
Five new inline review comments after 5da48f1. Four addressed here, one deferred to a follow-up issue. * Catalog projection gap (r3141450544): the prior pass added `lark_receive_id` / `lark_receive_id_type` to `UserAgentCatalogEntry` but not to `UserAgentCatalogDocument`, so the projection round-trip through `UserAgentCatalogProjector.Materialize` and `UserAgentCatalogQueryPort.ToEntry` dropped them. Workflow / social- media human-interaction sends (which read through the projection) silently fell back to `conversation_id` prefix inference even after the SkillRunner direct-send path was fixed. Added the typed fields to the document, materializer, and `ToEntry`, with both projector and round-trip tests asserting they survive end-to-end. * p2p + missing senderId silent regression (r3141451870): `BuildFromInbound("p2p", "ou_user_1", senderId: "")` previously returned `("ou_user_1", "chat_id", FellBack=false)` — exactly the original `/daily` outage shape, with the breadcrumb suppressed. Now returns an empty typed pair with `FellBack=true` so `Resolve(...)` flips to the legacy prefix path, recovers `open_id`, and emits the Debug breadcrumb. New paired tests cover the empty-typed-pair return and the `Resolve` recovery. * `IsDirectMessage` dead aliases (r3141451876): the only emitter in this repo is `ChannelConversationTurnRunner.ResolveConversationChat Type`, which produces exactly `"p2p"`. Dropped the unreachable `direct_message` / `directmessage` / `dm` aliases (and the case- insensitive match) to a strict ordinal `"p2p"` check, with a comment naming the single emitter so future readers don't reinstate them as defensive padding. * /social-media test asymmetry (r3141451874): added typed-field assertions to the `/social-media` `AgentBuilderTool` test mirroring the existing daily_report pin, so a refactor of `CreateSocialMediaAgentAsync` cannot drop the typed pair on `InitializeWorkflowAgentCommand` without a test catching it. Out of scope: the OutboundTarget oneof refactor proposed in r3141451872 is explicitly non-blocking per the reviewer; opened as issue #408 for when a second channel platform lands. Verification: ChannelRuntime suite 383 passed (was 382), `tools/ci/test_stability_guards.sh` and `tools/ci/architecture_guards.sh` pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
eanzhao
left a comment
There was a problem hiding this comment.
Re-reviewed 1d92c53d. All four addressable prior comments landed cleanly; #408 captures the deferred OutboundTarget oneof with the right "do this when a second platform's actual delivery code lands" trigger. Verified end-to-end:
-
Catalog projection gap (r3141450544) — typed fields now mirror through
UserAgentCatalogDocument(proto fields 24/25),UserAgentCatalogProjector.Materialize, andUserAgentCatalogQueryPort.ToEntry; both directions pinned byUserAgentCatalogProjectorTests(Materializeround-trip + standaloneToEntrytest).FeishuCardHumanInteractionPortreads through the projection so workflow/social-media DM delivery now uses the typed pair, matching the SkillRunner direct-send path. Good catch — I missed this in my second pass; only checked the actor-state write, didn't follow into the read model. -
p2p + missing senderId (r3141451870) —
BuildFromInboundnow returns(empty, empty, FellBack=true)when chat_type is p2p but senderId is missing, exactly the suggested shape.Resolve(...)recoversopen_idvia the legacy prefix path andSkillRunnerGAgent/FeishuCardHumanInteractionPortfire the Debug breadcrumb. Paired test (Resolve_ShouldRecoverOpenIdForP2pConfusedInbound_ViaPrefixInference) pins the recovery contract end-to-end. -
IsDirectMessagealiases (r3141451876) — narrowed to strict-ordinal"p2p"with a comment namingChannelConversationTurnRunner.ResolveConversationChatTypeas the single emitter. Test data updated to assertP2P/direct_message/dmare now non-DM (route viaoc_*/chat_id), so any future emitter that introduces a non-"p2p"casing will trip the test rather than silently land in the DM branch. -
/social-media test asymmetry (r3141451874) — typed-field assertion mirrored onto the social_media
AgentBuilderTooltest againstInitializeWorkflowAgentCommand. Both AgentBuilder entry points are now pinned. -
OutboundTarget oneof (r3141451872) — deferred to #408, which captures the proto sketch, migration considerations (proto field deprecation, projection round-trip, reconciling with existing
platformdiscriminator), and the right trigger (next platform's delivery code, not speculative).
CI: fast-gates, projection-provider-e2e, changes green; coverage-quality/console-web/kafka-transport-integration/event-sourcing-regression still in progress. Nothing in the diff blocks merge from my side.
Problem & Approach
Two independent issues surfaced in the same
/dailyfailure trace (see #402 (comment) for the original log triage). Both are scoped to Lark channel runtime and unrelated to the GPT-5 temperature fix in #402.1. SkillRunner / human-interaction outbound DM rejected
SkillRunnerGAgent.SendOutputAsyncandFeishuCardHumanInteractionPort.SendMessageAsyncboth hard-codedreceive_id_type=chat_idwhen posting toopen-apis/im/v1/messages. In DM flows, theOutboundConfig.ConversationIdis the useropen_id(ou_*), not achat_id(oc_*), so every outbound message returned HTTP 400 from Lark.Fix: Introduce
LarkConversationTargets.ResolveReceiveIdTypeto derivereceive_id_typefrom the conversation_id prefix:ou_*→open_idon_*→union_idoc_*and anything else →chat_id(default)Reuse from both call sites.
2. Reaction acknowledgment spams Warnings
ChannelConversationTurnRunner.TrySendImmediateLarkReactionAsyncis a best-effort UX sugar that adds an "OK" emoji to the user's message as a delivery acknowledgment. When the Lark app lacks reaction permission (most common cause: Lark error code 231002The operator has no permission to react on the specific message), it fails on every inbound message until ops fixes the app scope. The currentLogWarningfloods the run log with the same line.Fix: Demote the proxy-error path to
LogDebugso the signal stays discoverable when the channel is opted into verbose logging without spamming Warnings on every turn. The exception path stays atLogWarningbecause thrown exceptions indicate genuine network or code-level issues, not a recurring config gap.Affected Paths
agents/Aevatar.GAgents.ChannelRuntime/LarkConversationTargets.cs(new helper)agents/Aevatar.GAgents.ChannelRuntime/SkillRunnerGAgent.cs(SendOutputAsync)agents/Aevatar.GAgents.ChannelRuntime/FeishuCardHumanInteractionPort.cs(SendMessageAsync)agents/Aevatar.GAgents.ChannelRuntime/ChannelConversationTurnRunner.cs(reaction failure log level)test/Aevatar.GAgents.ChannelRuntime.Tests/LarkConversationTargetsTests.cs(new helper unit tests)Verification
Results:
LarkConversationTargetsTests).Out of Scope
temperaturerejection is being fixed in Fix skill runner GPT-5 temperature handling #402.🤖 Generated with Claude Code