[WIP] Lark bot enhancement #562

Open

eanzhao wants to merge 79 commits into dev from feature/lark-bot

Conversation


eanzhao (Contributor) commented May 4, 2026

No description provided.

eanzhao and others added 3 commits May 3, 2026 18:47
Two production failure modes observed on 2026-05-03 leave a Lark user
visibly stuck even though the cluster is healthy. Both are self-healed
locally, with no NyxID changes:

(1) /llm /route /model says "binding 已失效, 请发送 /init" ("binding has
expired, please send /init") while /init says "已绑定, 请先 /unbind"
("already bound, please /unbind first"). After PR #558 re-DCR'd the
cluster's OAuth client to add the `proxy` scope, the user's existing
NyxID binding (issued for the previous client_id) is rejected at broker
token-exchange. ModelChannelSlashCommandHandler caught the rejection
and pointed the user at /init — which then refused because the local
readmodel still holds the (now-dead) binding_id, looping the user
forever.

Self-heal: when the broker throws BindingRevokedException /
BindingNotFoundException / BindingScopeMismatchException, dispatch
RevokeBindingCommand to the local ExternalIdentityBindingGAgent
(reason="auto_self_heal_*") so the readmodel flips to revoked, and
tell the user the binding was cleared and /init will work now.
Mirrors the dispatch shape UnbindChannelSlashCommandHandler uses
for explicit /unbind, but skips the NyxID-side revoke (NyxID is the
one that just told us the binding is gone).
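
A minimal sketch of that catch-and-dispatch flow, for readers skimming
the diff. The exception and command names come from this description;
IBindingDispatch, the actor/binding parameters, and the user-facing
reply text are hypothetical stand-ins rather than the project's real
API:

```csharp
// Sketch only: rejection from NyxID is treated as proof the binding is dead,
// so only the local readmodel is cleaned and the user is pointed back at /init.
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class BindingRevokedException : Exception { }
public sealed class BindingNotFoundException : Exception { }
public sealed class BindingScopeMismatchException : Exception { }

public sealed record RevokeBindingCommand(string BindingId, string Reason);

// Hypothetical stand-in for however the revoke envelope reaches the binding actor.
public interface IBindingDispatch
{
    Task DispatchAsync(string actorId, RevokeBindingCommand command, CancellationToken ct);
}

public sealed class ModelCommandSelfHealSketch
{
    private readonly IBindingDispatch _dispatch;
    public ModelCommandSelfHealSketch(IBindingDispatch dispatch) => _dispatch = dispatch;

    public async Task<string> RunAsync(
        Func<Task<string>> brokerCall, string actorId, string bindingId, CancellationToken ct)
    {
        try
        {
            // Happy path: broker token-exchange succeeds and /model proceeds normally.
            return await brokerCall();
        }
        catch (Exception ex) when (ex is BindingRevokedException
                                   or BindingNotFoundException
                                   or BindingScopeMismatchException)
        {
            // NyxID just told us the binding is gone; clean the local readmodel only.
            var reason = "auto_self_heal_" + ex.GetType().Name;
            await _dispatch.DispatchAsync(actorId, new RevokeBindingCommand(bindingId, reason), ct);
            return "Stale binding cleared locally; please run /init to re-bind.";
        }
    }
}
```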

(2) The bot replies as "..." forever when the LLM call fails. The
streaming sink fires the first chunk via channel-relay/reply,
consuming the single-use reply token and creating a placeholder
message. If the LLM then fails (e.g. upstream 429), the pre-fix
runtime fell through to RunLlmReplyAsync, which issued a fresh /reply
against the dead token and got `401 Reply token already used`,
leaving the user staring at "..." with no error explanation.

Self-heal: ConversationGAgent.TryCompleteStreamedReplyAsync now
takes the Failed branch when streaming has already committed the
placeholder. Edits the existing message in place via
RunStreamChunkAsync (channel-relay/reply/update — no reply token
needed) with the classified failure text, then persists
ConversationTurnCompletedEvent so the runtime envelope retry loop
does not refire and consume the dead token again. If the edit also
fails (rare: Lark may refuse stale-message edits), persist the
last flushed partial as terminal — same defence-in-depth pattern
the existing PR #374 fix uses for the Completed path.
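
A compressed sketch of that Failed branch, assuming stand-in delegates
for the relay edit and the terminal-event persistence (the method and
event names follow this description; the parameter shapes are
illustrative only):

```csharp
// Sketch only: when streaming already committed the placeholder, edit it in
// place and persist the terminal event so the dead reply token is never reused.
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class StreamedReplyFailureSketch
{
    // Edits the already-created placeholder via channel-relay/reply/update.
    private readonly Func<string, CancellationToken, Task> _runStreamChunkAsync;
    // Persists ConversationTurnCompletedEvent so the envelope retry loop does not refire.
    private readonly Func<string, CancellationToken, Task> _persistTurnCompletedAsync;

    public StreamedReplyFailureSketch(
        Func<string, CancellationToken, Task> runStreamChunkAsync,
        Func<string, CancellationToken, Task> persistTurnCompletedAsync)
    {
        _runStreamChunkAsync = runStreamChunkAsync;
        _persistTurnCompletedAsync = persistTurnCompletedAsync;
    }

    public async Task CompleteFailedAsync(
        bool placeholderCommitted, string classifiedFailureText,
        string lastFlushedPartial, CancellationToken ct)
    {
        if (!placeholderCommitted)
        {
            return; // no placeholder yet: the normal non-streamed failure path applies
        }

        var terminalText = classifiedFailureText;
        try
        {
            // Edit the "..." placeholder in place; updates need no reply token.
            await _runStreamChunkAsync(classifiedFailureText, ct);
        }
        catch
        {
            // Rare: the edit itself failed (e.g. a stale-message edit refused).
            // Keep whatever partial text already reached the user as terminal.
            terminalText = lastFlushedPartial;
        }

        // Mark the turn terminal either way so the retry loop never reuses the token.
        await _persistTurnCompletedAsync(terminalText, ct);
    }
}
```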

Tests:
  - 4 new ModelSlashCommandHandlerTests pinning the binding self-heal
    for each rejection shape + degraded-path when IActorRuntime is
    missing.
  - 2 new ConversationGAgentDedupTests pinning that the streaming-Failed
    branch edits the placeholder + falls through to "persist last
    flushed partial" when the edit also fails.

Verification:
  dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests --no-build (851/851)
  dotnet test test/Aevatar.GAgents.Channel.Protocol.Tests --no-build (36/36 in dedup suite)
  dotnet test test/Aevatar.Foundation.Core.Tests             (230/230)
  bash tools/ci/test_stability_guards.sh                     (passed)
  bash tools/ci/query_projection_priming_guard.sh            (passed)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… path

Both reviewers caught the same bug: pre-fix, the self-heal returned the
"本地已自动清理" ("cleaned up locally") message even when IActorRuntime
was unregistered or when CreateAsync / HandleEventAsync threw — the
local readmodel was NOT actually cleaned in those paths, so a user who
follows the message to /init still hits the stale binding and is
refused, recreating the exact loop this PR exists to break.

Split the self-heal API into cleanedMessage + degradedMessage. The
cleaned message is returned ONLY when the local revoke envelope was
actually dispatched to the binding actor; otherwise the degraded
message points the user at /unbind explicitly.

Also add a single retry on the dispatch path (mirroring
UnbindChannelSlashCommandHandler's PR #521 v4-pro review fix) so a
one-off Orleans hiccup doesn't downgrade an otherwise self-healable
binding to manual /unbind guidance.
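
The retry-once contract the tests below pin (AttemptCount == 2) roughly
looks like this; the helper shape is illustrative, not the handler's
actual code:

```csharp
// Sketch only: one immediate retry absorbs a transient hiccup; a second failure
// surfaces so the caller falls back to the degraded /unbind guidance.
using System;
using System.Threading.Tasks;

public static class RetryOnce
{
    public static async Task<bool> TryDispatchAsync(Func<Task> dispatch, Action<Exception> logWarning)
    {
        for (var attempt = 1; attempt <= 2; attempt++)
        {
            try
            {
                await dispatch();
                return true; // cleanedMessage path: the revoke envelope really was dispatched
            }
            catch (Exception ex) when (attempt == 1)
            {
                logWarning(ex); // first failure: retry exactly once
            }
            catch
            {
                return false; // second failure: degradedMessage path (manual /unbind)
            }
        }
        return false;
    }
}
```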

Tests:
  - List_DegradesToUnbindGuidance_WhenSelfHealActorRuntimeMissing now
    asserts the reply contains "/unbind" and DOES NOT contain
    "已自动清理" — pinning the no-runtime degraded path.
  - List_DegradesToUnbindGuidance_WhenSelfHealDispatchKeepsThrowing
    is a new test exercising ThrowingActorRuntime (every CreateAsync
    throws) and asserts AttemptCount == 2 (retry-once contract) plus
    the degraded reply shape.

Verification:
  dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests --no-build (852/852)
  bash tools/ci/test_stability_guards.sh                     (passed)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ly-self-heal

Self-heal binding rejections + streaming-failed reply-token loop
eanzhao changed the title from "Lark bot enhancement" to "[WIP] Lark bot enhancement" on May 4, 2026

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 80.84633% with 172 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.93%. Comparing base (65e4859) to head (c2f1969).

Files with missing lines Patch % Lines
...on/Studio/Services/NyxIdLlmServiceCatalogParser.cs 79.82% 23 Missing and 23 partials ⚠️
...rc/Aevatar.AI.LLMProviders.MEAI/MEAILLMProvider.cs 67.07% 21 Missing and 6 partials ⚠️
....ToolProviders.NyxId/Tools/NyxIdCodeExecuteTool.cs 42.30% 10 Missing and 5 partials ⚠️
....Studio.Hosting/NyxId/NyxIdLlmCatalogHttpClient.cs 46.15% 13 Missing and 1 partial ⚠️
...Aevatar.AI.ToolProviders.Lark/LarkCardKitClient.cs 87.96% 9 Missing and 4 partials ⚠️
...c/Aevatar.AI.ToolProviders.Skills/SkillRegistry.cs 64.70% 10 Missing and 2 partials ⚠️
...tar.AI.ToolProviders.NyxId/NyxIdAgentToolSource.cs 0.00% 10 Missing ⚠️
...atar.AI.ToolProviders.Ornn/OrnnSearchSkillsTool.cs 0.00% 7 Missing and 2 partials ⚠️
...c/Aevatar.AI.ToolProviders.Ornn/OrnnSkillClient.cs 87.09% 4 Missing and 4 partials ⚠️
src/Aevatar.AI.Core/Chat/ChatRuntime.cs 84.21% 4 Missing and 2 partials ⚠️
... and 6 more
@@            Coverage Diff             @@
##              dev     #562      +/-   ##
==========================================
- Coverage   72.02%   71.93%   -0.10%     
==========================================
  Files        1255     1258       +3     
  Lines       90723    91512     +789     
  Branches    11877    11998     +121     
==========================================
+ Hits        65343    65825     +482     
- Misses      20699    20956     +257     
- Partials     4681     4731      +50     
Flag Coverage Δ
ci 71.93% <80.84%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...Aevatar.AI.Abstractions/LLMProviders/LLMRequest.cs 71.92% <100.00%> (+0.50%) ⬆️
...evatar.AI.Abstractions/LLMProviders/LLMResponse.cs 100.00% <100.00%> (ø)
src/Aevatar.AI.Core/Chat/ChatHistory.cs 81.91% <100.00%> (+0.59%) ⬆️
...atar.AI.LLMProviders.Tornado/TornadoLLMProvider.cs 61.19% <100.00%> (+13.19%) ⬆️
...evatar.AI.ToolProviders.Lark/ILarkCardKitClient.cs 100.00% <100.00%> (ø)
....ToolProviders.Lark/ServiceCollectionExtensions.cs 100.00% <100.00%> (ø)
...c/Aevatar.AI.ToolProviders.NyxId/NyxIdApiClient.cs 69.50% <100.00%> (-0.20%) ⬇️
...Aevatar.AI.ToolProviders.NyxId/NyxIdToolOptions.cs 100.00% <100.00%> (ø)
src/Aevatar.AI.ToolProviders.Ornn/OrnnOptions.cs 100.00% <100.00%> (ø)
....ToolProviders.Ornn/ServiceCollectionExtensions.cs 100.00% <100.00%> (ø)
... and 19 more

... and 42 files with indirect coverage changes


eanzhao and others added 14 commits May 4, 2026 16:58
DeepSeek v4-pro with thinking mode rejects requests when reasoning_content
from prior assistant turns is not echoed back. This change:

- Adds ReasoningContent to ChatMessage, LLMResponse, and LLMStreamChunk
- Propagates reasoning content through ChatRuntime streaming rounds
- Appends reasoning_content to conversation history for multi-turn
- Implements ExtractReasoningContent in MEAILLMProvider
- Wires reasoning content into non-streaming ConvertResponse

Closes #563
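
A minimal data-shape sketch of that propagation, with simplified
stand-in types (only the ReasoningContent property name is taken from
this change):

```csharp
// Sketch only: reasoning_content from a prior assistant turn is carried on the
// message and appended to history so a thinking-mode model sees it echoed back.
using System.Collections.Generic;

public sealed record ChatMessage(string Role, string Content, string? ReasoningContent = null);

public static class ReasoningHistorySketch
{
    // Append the assistant turn together with its reasoning content for the next round.
    public static void AppendAssistantTurn(List<ChatMessage> history, string content, string? reasoningContent)
        => history.Add(new ChatMessage("assistant", content, reasoningContent));
}
```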
Channel-conversation LLM runs execute inside Orleans actors with no
HTTP auth context. The scope resolver returns null, causing
ActorBackedUserMemoryStore to throw InvalidOperationException on every
turn and log a warning.

- Add TryResolveScopeId/TryResolveWriteActorId that return null instead
  of throwing
- ReadProjectedStateAsync returns null when no scope is available
- GetAsync returns UserMemoryDocument.Empty, BuildPromptSectionAsync
  returns empty string — no warning logged
- Write operations (Save/Add/Remove) still throw since they only run
  from Studio API with auth context

Closes #564
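
A rough sketch of the read/write split described above, using stand-in
resolver and document types (TryResolveScopeId is reduced to a
nullable-returning delegate; UserMemoryDocument.Empty mirrors the name
in this message):

```csharp
// Sketch only: reads degrade quietly when no scope exists; writes still throw,
// because they only run from the Studio API with an auth context.
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed record UserMemoryDocument(string Text)
{
    public static readonly UserMemoryDocument Empty = new(string.Empty);
}

public sealed class UserMemoryStoreSketch
{
    private readonly Func<string?> _resolveScopeId; // null when there is no HTTP auth context

    public UserMemoryStoreSketch(Func<string?> resolveScopeId) => _resolveScopeId = resolveScopeId;

    public Task<UserMemoryDocument> GetAsync(CancellationToken ct = default)
    {
        var scopeId = _resolveScopeId();
        if (scopeId is null)
            return Task.FromResult(UserMemoryDocument.Empty); // no warning, no exception

        return ReadProjectedStateAsync(scopeId, ct);
    }

    public Task SaveAsync(string text, CancellationToken ct = default)
    {
        var scopeId = _resolveScopeId()
            ?? throw new InvalidOperationException("No memory scope available for write.");
        return WriteAsync(scopeId, text, ct);
    }

    private static Task<UserMemoryDocument> ReadProjectedStateAsync(string scopeId, CancellationToken ct)
        => Task.FromResult(new UserMemoryDocument($"memory for {scopeId}"));

    private static Task WriteAsync(string scopeId, string text, CancellationToken ct)
        => Task.CompletedTask;
}
```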
Gracefully handle missing scope in UserMemoryStore (fixes #564)
…ing-content

Propagate reasoning_content through LLM pipeline (fixes #563)
"/model encountered NyxID-side binding rejection ({Reason}) but IProjectionReadinessPort is not registered; cannot verify local readmodel cleanup. actor={ActorId}, subject={Platform}:{Tenant}:{User}",
reason,
actorId,
context.Subject.Platform, context.Subject.Tenant, context.Subject.ExternalUserId);

[Consensus: 3 models] severity=blocker, category=arch

The new TryDispatchLocalBindingRevokeAsync runs EnsureProjectionForActorAsync + WaitForBindingStateAsync inside the slash-command request path. Three reviewers flag this as a CLAUDE.md violation: query-time projection priming is forbidden — the application/query layer must not activate a projection scope and synchronously wait for read-model materialization on the request hot path. IProjectionReadinessPort is a write-side completion port, not a turn/query dependency. Refactor self-heal so the revocation is fire-and-forget (publish a continuation event, return early to the user with a 'binding cleanup in progress' message), or perform the projection wait off the request path. CLAUDE.md ref: 「禁止 query-time replay/priming」 / 「QueryPort/QueryService/ApplicationService 不得在请求路径...同步补投影」.

Per-model verbatim
  • codex: This makes the /model read/list path activate a projection scope and then synchronously wait for read-model cleanup. That violates CLAUDE.md: 禁止 query-time replay/priming and the ADR note that IProjectionReadinessPort is a write-side completion port, not a turn/query dependency. Move this repair behind an explicit write-side command/port that owns projection activation, and keep /model honest if the cleanup cannot already be observed.

  • v4-pro: TryDispatchLocalBindingRevokeAsync calls EnsureProjectionForActorAsync + WaitForBindingStateAsync synchronously in the slash-command handler path. This is projection priming in the request path: activating a projection and waiting for the readmodel to refresh within the request execution. Violates CLAUDE.md:

禁止 query-time priming:查询前若需要先'确保投影存在/刷新 read model',该动作必须在显式 activation、lease、binder 或后台物化流程中完成;禁止在 query 方法内同步补投影。

Move projection activation/binder setup before the slash-command path (e.g. upstream in the conversation pipeline when the inbound message arrives), or make the self-heal fire-and-forget (dispatch revoke + don't synchronously wait for readmodel). The degraded-message fallback already handles cases where the readmodel isn't cleaned within the request.

  • kimi: TryDispatchLocalBindingRevokeAsync calls _bindingProjectionPort.EnsureProjectionForActorAsync(actorId, ct) synchronously inside the slash command handler request path. This is projection priming in the application layer. CLAUDE.md forbids query-time priming: "查询前若需要先'确保投影存在/刷新 read model',该动作必须在显式 activation、lease、binder 或后台物化流程中完成;禁止在 query 方法内同步补投影。" While the OAuth endpoint does similar priming, endpoints are Host layer; a per-user-message slash handler is not. Consider activating binding projection scopes eagerly at startup (e.g. a hosted service that primes all known binding actors) instead of per-request.

var actor = await _actorRuntime
.CreateAsync<ExternalIdentityBindingGAgent>(actorId, ct)
.ConfigureAwait(false);
var envelope = new EventEnvelope

[Consensus: 4 models] severity=blocker, category=arch

actor.HandleEventAsync(envelope, ct) dispatches RevokeBindingCommand directly through IActorRuntime, bypassing IActorDispatchPort and the actor's inbox / serialization guarantees. Four reviewers flag this as a CLAUDE.md violation of Runtime 与 Dispatch 分责: IActorRuntime may resolve lifecycle/topology, but command delivery must go through the runtime-neutral publisher (publish/send). Inline HandleEventAsync reduces to direct method invocation on the target actor — exactly the pattern CLAUDE.md forbids: 「禁止在基类、业务层或中间适配层绕过标准 publisher 直操底层传输对象」. Replace with the runtime-neutral dispatch port (IActorDispatchPort.PublishAsync(...) or the equivalent) and let the actor's inbox process the revoke command.

Per-model verbatim
  • codex: Calling actor.HandleEventAsync directly bypasses IActorDispatchPort and manually inlines delivery into the target actor. This violates CLAUDE.md: Runtime 与 Dispatch 分责 / 投递语义必须 runtime-neutral; IActorRuntime should only resolve lifecycle/topology, while command delivery should go through IActorDispatchPort.DispatchAsync with EnvelopeRouteSemantics.CreateDirect(...) or the existing command-dispatch helper.

  • v4-pro: actor.HandleEventAsync(envelope, ct) dispatches the RevokeBindingCommand directly to the actor via inline method call, bypassing the actor's inbox queue and serialization guarantees. Violates CLAUDE.md:

投递语义必须 runtime-neutral:publish/send 统一表示'进入目标 actor inbox 等待处理',不得因目标是 self 或底层 runtime 差异而退化为 inline dispatch。

The comment at L126 says this mirrors UnbindChannelSlashCommandHandler — but that handler also uses the same pattern and should be fixed too. Use IActorDispatchPort (or equivalent standard dispatch mechanism) instead of CreateAsync + HandleEventAsync.

  • kimi: The handler directly calls actor.HandleEventAsync(envelope, ct) via IActorRuntime, bypassing the standard IActorDispatchPort. CLAUDE.md requires runtime-neutral dispatch: "禁止在基类、业务层或中间适配层绕过标准 publisher 直操底层传输对象。" While UnbindChannelSlashCommandHandler uses the same pattern, the new self-heal path amplifies the violation by coupling it with projection priming and retry logic. Refactor to use IActorDispatchPort or a dedicated binding command port so the slash handler does not depend on runtime internals.

  • mimo-v2.5-pro: The slash-command handler now directly orchestrates actor dispatch via IActorRuntime, EventEnvelope, and RevokeBindingCommand. Per CLAUDE.md, slash handlers are channel-routed application services — they should not construct and dispatch actor envelopes. Extract SelfHealRevokedBindingAsync / TryDispatchLocalBindingRevokeAsync into a dedicated IBindingSelfHealService (application layer) and inject that instead. This also simplifies testing: the current test file has to mock IActorRuntime, ExternalIdentityBindingProjectionPort, AND IProjectionReadinessPort just to cover one catch branch.

private readonly ILogger<ModelChannelSlashCommandHandler> _logger;

public ModelChannelSlashCommandHandler(
ILogger<ModelChannelSlashCommandHandler> logger,

[Consensus: 2 models] severity=major, category=di

Constructor injects the concrete ExternalIdentityBindingProjectionPort (and accepts nullable IActorRuntime + IProjectionReadinessPort). Two issues: (a) DI: CLAUDE.md 依赖反转 requires depending on abstractions — define IExternalIdentityBindingProjectionPort and inject the interface (current registration in IdentityServiceCollectionExtensions.cs is services.TryAddSingleton<ExternalIdentityBindingProjectionPort>() against the concrete type). (b) silent degradation: nullable self-heal collaborators mean if any registration is missing in production DI the self-heal — the user-facing fix this PR delivers — silently disappears with no error. Either make these required (non-nullable), or add a startup validation that logs a warning when self-heal will be inactive.

Per-model verbatim
  • glm-5.1: IActorRuntime, ExternalIdentityBindingProjectionPort, and IProjectionReadinessPort are all nullable and degrade gracefully, but the self-heal feature is the primary user-facing fix in this PR. If any of these aren't registered in the production DI container, the handler silently falls back to the degraded message and the binding loop is unbroken. Verify registrations exist in the host composition root — if they're not yet wired, this should be a blocker.

  • kimi: Constructor injects the concrete ExternalIdentityBindingProjectionPort instead of an abstraction. CLAUDE.md: "依赖反转:上层依赖抽象,禁止跨层反向依赖和对具体实现的直接耦合。" The port is registered as a singleton concrete class (IdentityServiceCollectionExtensions.cs:86, services.TryAddSingleton<ExternalIdentityBindingProjectionPort>()). Define IExternalIdentityBindingProjectionPort, register against it, and inject the interface.
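
For reference, the interface-plus-forwarder registration the reviewers
suggest is a standard Microsoft.Extensions.DependencyInjection shape;
the sketch below is illustrative and not the repo's
IdentityServiceCollectionExtensions code:

```csharp
// Sketch only: depend on an abstraction while keeping the concrete type
// resolvable, and forward the interface to the same singleton instance.
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

public interface IExternalIdentityBindingProjectionPort { }

public sealed class ExternalIdentityBindingProjectionPort : IExternalIdentityBindingProjectionPort { }

public static class IdentityRegistrationSketch
{
    public static IServiceCollection AddBindingProjectionPort(this IServiceCollection services)
    {
        // Concrete registration stays so existing tests holding the class keep working.
        services.TryAddSingleton<ExternalIdentityBindingProjectionPort>();

        // Interface forwards to the same singleton, so handlers can depend on the
        // abstraction without a second instance being created.
        services.TryAddSingleton<IExternalIdentityBindingProjectionPort>(
            sp => sp.GetRequiredService<ExternalIdentityBindingProjectionPort>());

        return services;
    }
}
```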


public async Task<bool> RemoveEntryAsync(string id, CancellationToken ct = default)
{
var actor = await EnsureWriteActorAsync(ct);

[Consensus: 3 models] severity=major, category=concurrency

RemoveEntryAsync was reordered so EnsureWriteActorAsync runs before ReadProjectedStateAsync. Three reviewers flag this side-effect change: previously, calling RemoveEntryAsync("missing") was a no-op that returned false without materializing any actor; now it always activates the user-memory actor first (state hydration, reminder registration, storage I/O), even when the entry does not exist (returns false at the existence check). This violates CLAUDE.md 默认路径须定义资源语义 — a no-op delete now implicitly creates a resource, which has cost (cluster activation, cache pressure) and changes resource lifecycle. Move actor resolution after the existence check; only resolve the scope/actor when there is real work to do.

Per-model verbatim
  • codex: RemoveEntryAsync("missing") now creates or materializes the user-memory actor before checking whether the entry exists. That turns a no-op delete into implicit resource creation, violating CLAUDE.md: 默认路径须定义资源语义; keep the projected read first, and if the no-scope case must throw, resolve the scope ID explicitly without calling EnsureWriteActorAsync until a matching entry is found.

  • kimi: RemoveEntryAsync was reordered to call EnsureWriteActorAsync before ReadProjectedStateAsync. This causes unnecessary actor activation when the entry does not exist. Actor creation has side effects (state hydration, reminder registration, storage reads). Move actor resolution back after the existence check, or document why eager activation is required.

  • mimo-v2.5-pro: The reorder of EnsureWriteActorAsync before ReadProjectedStateAsync in RemoveEntryAsync changes the side-effect ordering. Previously the actor was only created when the entry existed. Now every RemoveEntryAsync call creates the actor even when the entry is missing (returns false at line 133). The actor creation is idempotent, but it's an unnecessary round-trip in the common "entry not found" case. Consider moving EnsureWriteActorAsync back after the existence check, or documenting that the actor must exist before the read (if there's a race-condition fix reason for the reorder).
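
A sketch of the ordering the reviewers ask for, with stand-in delegates
for the projected read and the actor resolution (method names follow
the comment; everything else is illustrative):

```csharp
// Sketch only: existence check first, so a no-op delete never materializes the actor.
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public interface IWriteActor
{
    Task RemoveAsync(string id, CancellationToken ct);
}

public sealed class RemoveEntrySketch
{
    private readonly Func<CancellationToken, Task<string[]?>> _readProjectedStateAsync;
    private readonly Func<CancellationToken, Task<IWriteActor>> _ensureWriteActorAsync;

    public RemoveEntrySketch(
        Func<CancellationToken, Task<string[]?>> readProjectedStateAsync,
        Func<CancellationToken, Task<IWriteActor>> ensureWriteActorAsync)
    {
        _readProjectedStateAsync = readProjectedStateAsync;
        _ensureWriteActorAsync = ensureWriteActorAsync;
    }

    public async Task<bool> RemoveEntryAsync(string id, CancellationToken ct = default)
    {
        // Read the projection first: missing entry means no actor activation at all.
        var entries = await _readProjectedStateAsync(ct);
        if (entries is null || !entries.Contains(id))
            return false;

        // Only now pay for activation (state hydration, reminders, storage I/O).
        var actor = await _ensureWriteActorAsync(ct);
        await actor.RemoveAsync(id, ct);
        return true;
    }
}
```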

}

public async Task<string> BuildPromptSectionAsync(int maxChars = 2000, CancellationToken ct = default)
{

[codex] severity=major, category=bug

Removing the previous try/catch means a transient projection/read-model failure in BuildPromptSectionAsync will now fail the whole chat/prompt construction path instead of degrading by omitting memory. Restore the best-effort behavior here: catch non-cancellation exceptions from GetAsync, log the warning, and return string.Empty.
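
A sketch of the restored best-effort shape, assuming a stand-in memory
lookup delegate; only the method name and the degrade-to-empty behavior
come from the comment:

```csharp
// Sketch only: memory lookup failures cost the memory section, not the whole turn.
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

public sealed class PromptSectionSketch
{
    private readonly Func<CancellationToken, Task<string>> _getMemoryAsync;
    private readonly ILogger _logger;

    public PromptSectionSketch(Func<CancellationToken, Task<string>> getMemoryAsync, ILogger logger)
    {
        _getMemoryAsync = getMemoryAsync;
        _logger = logger;
    }

    public async Task<string> BuildPromptSectionAsync(int maxChars = 2000, CancellationToken ct = default)
    {
        try
        {
            var memory = await _getMemoryAsync(ct);
            return memory.Length <= maxChars ? memory : memory[..maxChars];
        }
        catch (OperationCanceledException)
        {
            throw; // cancellation still propagates
        }
        catch (Exception ex)
        {
            // Transient projection/read-model failures degrade by omitting memory.
            _logger.LogWarning(ex, "User memory unavailable; omitting memory prompt section.");
            return string.Empty;
        }
    }
}
```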

runtime.Actors.Should().NotContainKey("user-memory-user-1",
"no actor should be created when entry is missing");
var actor = runtime.Actors["user-memory-user-1"];
actor.ReceivedEnvelopes.Should().BeEmpty("no remove command should be dispatched when entry is missing");

[v4-pro] severity=major, category=test

RemoveEntryAsync_MissingEntry_ReturnsFalse changed from asserting runtime.Actors.Should().NotContainKey(...) (actor NOT created) to asserting runtime.Actors["user-memory-user-1"] EXISTS with empty envelopes. But the new RemoveEntryAsync reads state first and returns false before calling EnsureWriteActorAsync when the entry is missing — the actor should not be created at all. Either the test is wrong or the test infrastructure (EmptyReader, FakeActorRuntime) triggers an unexpected actor creation. Verify the test passes against the actual code change.

AttachOpenAIRawRepresentationForReasoning(meaiMsg, msg);
result.Add(meaiMsg);
}


[Consensus: 3 models] severity=major, category=arch

AttachOpenAIRawRepresentationForReasoning injects reasoning_content into the OpenAI SDK's serialized JSON via rawMessage.Patch.Set("$.reasoning_content"u8, ...) under #pragma warning disable SCME0001. Three reviewers flag the same fragility: Patch is an SDK-internal experimental surface (SCME0001), reasoning_content is not a documented field on AssistantChatMessage, and the JSONPath relies on the SDK's current serialization shape. Any SDK update that changes the patch model, the serializer, or the message structure will silently break or drop reasoning content propagation — there is no compile-time guarantee. Mitigations: (a) pin the OpenAI SDK version, (b) add an integration test that asserts the serialized JSON contains reasoning_content, and (c) file a feature request upstream for first-class reasoning content support so the workaround can be retired.

Per-model verbatim
  • glm-5.1: rawMessage.Patch.Set("$.reasoning_content"u8, ...) relies on the OpenAI SDK's internal Patch/SCME0001 surface to inject reasoning_content into the serialized JSON. This is fragile — any SDK update that changes the patch model or serialization pipeline will silently drop reasoning content or throw at runtime. Consider isolating this behind a version-checked helper (or a feature flag) so a future SDK bump doesn't cause a silent regression.

  • v4-pro: AttachOpenAIRawRepresentationForReasoning injects reasoning_content into the OpenAI SDK's AssistantChatMessage via rawMessage.Patch.Set("$.reasoning_content"u8, ...). This hardcoded JSON-path depends on the SDK's internal serialization structure (AssistantChatMessage.ModelReaderWriter.Write output) which could change between versions and silently break reasoning propagation. Consider a custom AssistantChatMessage subclass or an explicit BinaryContent wrapper if the SDK doesn't natively support reasoning_content.

  • mimo-v2.5-pro: AttachOpenAIRawRepresentationForReasoning uses rawMessage.Patch.Set("$.reasoning_content"u8, ...) with #pragma warning disable SCME0001. This is a fragile internal API — the OpenAI SDK doesn't document reasoning_content as a supported field on AssistantChatMessage, and the Patch property is implementation-detail. If the SDK changes wire format or removes the Patch escape hatch, this will silently break reasoning round-tripping. Add a fallback: if the patch throws (or if RawRepresentation doesn't survive a serialize-then-deserialize cycle), log a warning and degrade gracefully rather than crashing the entire chat call.

CancellationToken ct)
{
try
{

[v4-pro] severity=minor, category=perf

MergeProxyRouteCandidatesAsync calls _nyxClient.DiscoverProxyServicesAsync synchronously within GetServicesAsync. If the /api/v1/proxy/services endpoint is slow or unreachable (and the catch swallows the exception with a warning), the catalog load time doubles on every /model or /route invocation. Consider making the proxy-service discovery happen on a background refresh, or caching with a TTL, rather than per-request.
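
One possible shape of the TTL-cache suggestion, using IMemoryCache so
the catalog is fetched at most once per window instead of per /model
invocation; the 30-second lifetime, cache key, and entry type here are
assumptions for illustration:

```csharp
// Sketch only: host-owned IMemoryCache with an absolute TTL in front of the
// proxy-service discovery call.
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public sealed class ProxyServiceCatalogCache
{
    private readonly IMemoryCache _cache;
    private readonly Func<CancellationToken, Task<IReadOnlyList<string>>> _discoverProxyServicesAsync;

    public ProxyServiceCatalogCache(
        IMemoryCache cache,
        Func<CancellationToken, Task<IReadOnlyList<string>>> discoverProxyServicesAsync)
    {
        _cache = cache;
        _discoverProxyServicesAsync = discoverProxyServicesAsync;
    }

    public async Task<IReadOnlyList<string>> GetServicesAsync(CancellationToken ct = default)
    {
        var services = await _cache.GetOrCreateAsync("nyxid:proxy-services", async entry =>
        {
            // One slow or failing discovery call per TTL window, not per request.
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(30);
            return await _discoverProxyServicesAsync(ct);
        });
        return services ?? Array.Empty<string>();
    }
}
```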

NyxIdLlmServicesResult result,
string bearerToken,
CancellationToken ct)
{

[mimo-v2.5-pro] severity=minor, category=design

The proxy endpoint path /api/v1/proxy/services?per_page=100 is hardcoded here AND in NyxIdLlmServiceCatalogClient.cs:63. Two independent HTTP clients call the same endpoint. Extract the path into a shared constant (or better, into NyxIdLlmProviderSource / a config key) so the two call-sites stay in sync if the path changes.

/// 注册 Ornn 技能工具系统。ornn_search_skills 工具始终注册到 LLM;远程技能按需获取通过
/// IRemoteSkillFetcher 集成到统一的 use_skill 工具。所有 Ornn API 调用通过 NyxID 的
/// proxy 路由,因此调用方必须先注册 NyxIdApiClient(一般通过 AddNyxIdTools)。
/// </summary>

[kimi] severity=minor, category=di

AddOrnnSkills registers OrnnSkillClient which now requires NyxIdApiClient, but the extension does not register or assert that dependency. Hosts that call AddOrnnSkills without also calling AddNyxIdTools will hit a runtime DI resolution failure. Add an explicit services.TryAddSingleton<NyxIdApiClient>() guard here, or document that callers must register NyxID tools first and add a runtime guard that throws a clear InvalidOperationException if the client is missing.

configure?.Invoke(options);
services.TryAddSingleton(options);
services.TryAddSingleton<OrnnSkillClient>();
services.TryAddSingleton<IRemoteSkillFetcher, OrnnRemoteSkillFetcher>();

[v4-pro] severity=minor, category=di

DI contract change: OrnnSkillClient now REQUIRES NyxIdApiClient (was optional HttpClient). services.TryAddSingleton<OrnnSkillClient>() at line 23 will fail at resolution time if NyxIdApiClient is not already registered. The comment at line 12-13 documents the requirement but there's no runtime guard. If EnableOrnnSkills=true but AddNyxIdTools() was not called (unlikely in production, possible in test harnesses), the DI container throws InvalidOperationException. Consider adding a services.AddOrnnSkills guard that checks whether NyxIdApiClient is already registered.

@@ -401,6 +414,18 @@ private async Task DispatchOutputChunksAsync(
/// </summary>

[mimo] severity=nit, category=other

The comment references "PR #569 review, codex P1 on SkillRunnerGAgent.cs:351" — the line number will drift as the file evolves. Prefer referencing the commit hash or a stable anchor rather than a line number.

<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
<IsPackable>false</IsPackable>
<IsTestProject>true</IsTestProject>

[v4-pro] severity=nit, category=test

New test project lacks Aevatar.AI.ToolProviders.NyxId project reference. OrnnSkillClientTests.cs instantiates NyxIdApiClient directly (line: new Aevatar.AI.ToolProviders.NyxId.NyxIdApiClient(...)) via the using statement in OrnnSkillClientTests.cs:4, but the .csproj doesn't have a <ProjectReference> to Aevatar.AI.ToolProviders.NyxId. If the project builds, it's because Aevatar.AI.ToolProviders.Ornn transitively references it (line 11: <ProjectReference Include="..\Aevatar.AI.ToolProviders.NyxId\..." />). This works but is fragile — make the reference explicit.

eanzhao and others added 8 commits May 8, 2026 10:03
Codecov flagged 0% coverage on LarkCardKitClient (81 lines) and
ILarkCardKitClient (17 lines). Mirror the LarkNyxClient
HTTP-recording pattern from LarkCoverageTests so the four CardKit
methods exercise the real transport: a fake HttpMessageHandler
captures the request, the test asserts URL + method + body shape.

Coverage:
- CreateCardAsync: POST /cards, body embeds DataJson as a nested
  JSON object (load-bearing — Lark rejects double-encoded payloads).
- StreamElementContentAsync: PUT to .../elements/{element_id}/
  content, sequence + uuid wired through.
- StreamElementContentAsync omits `uuid` when the IdempotencyKey is
  blank (Lark rejects empty uuids on some endpoints).
- StreamElementContentAsync runs ids through Uri.EscapeDataString
  (verified via AbsoluteUri so percent-encoded chars survive — the
  default Uri.ToString() decodes %20 back to a literal space; see the
  sketch after this list).
- SetCardSettingsAsync: PATCH on settings, settings JSON inline.
- UpdateCardAsync: PUT card-as-object, sequence carried, no uuid.
- ParseJsonObject rejects blank/malformed JSON at the boundary
  rather than letting it surface as a 400 from Lark.
- AddLarkTools registers ILarkCardKitClient as a singleton.
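
The Uri.EscapeDataString bullet above leans on a .NET behavior that is
easy to trip over; a stand-alone check (with a made-up URL) of why the
assertion must go through AbsoluteUri:

```csharp
// Uri.ToString() gives the unescaped display form, while AbsoluteUri keeps the
// percent-encoding, so assertions on escaped ids must use AbsoluteUri.
using System;

public static class UriEscapingCheck
{
    public static void Main()
    {
        var elementId = Uri.EscapeDataString("element id"); // "element%20id"
        var uri = new Uri($"https://example.com/cards/c1/elements/{elementId}/content");

        Console.WriteLine(uri.ToString());   // path shows "element id" (space restored)
        Console.WriteLine(uri.AbsoluteUri);  // path keeps "element%20id" (escaping survives)
    }
}
```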

54/54 Aevatar.AI.ToolProviders.Lark.Tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review batch on PR #562 (10 inline comments). All are in files I
have recent ownership of, and none require architectural shifts:

- #16 (blocker, security): ssh_exec is now opt-in via NyxIdToolOptions.
  EnableSshExecTool. Hosts that haven't wired the approval middleware no
  longer see the tool by default. Mainnet host opts in (Lark bot needs it).
- #21 (major, bug): code_execute keeps the modern /execute + {language,
  script} contract, but on a NyxID-proxy upstream 404 it retries the legacy
  /run + {language, code} contract so deployments still pinned to old
  chrono-sandbox-service builds keep working.
- #22 (major, bug): SkillRegistry.IsFresh now exempts SkillSource != Remote
  from TTL — local skills are baked in at registration and don't need
  expiring; prior behavior dropped them from use_skill after the first 5min.
- #18 (major, bug): TurnRunner.TryResolveSenderBindingAsync narrows the
  catch to transient infra errors (Http/Timeout/IO/JSON) and surfaces
  non-transient (logic, NRE, serialization) at Error level so ops can
  distinguish "sender unbound" from "binding store broken".
- #19 (major, bug): ConversationReplyGenerator narrows the
  sender-route-fallback catch to transient errors via
  IsRetryableSenderRouteFailure. Programmer errors no longer cost an LLM
  round on retry.
- #29 + #30 (minor): inbox runtime gives metadata enrichment its own 15s
  budget separate from the LLM run, surfacing
  errorCode=llm_reply_metadata_timeout when scope/UserConfig lookup is
  slow. ResolveFallbackTimeout treats ResponseTimeoutSeconds<=0 as "no
  timeout" rather than silently snapping back to 120s.
- #12 (minor): ConversationGAgent's stream-chunk and final-stream-chunk
  edits run under a 10s CTS now; the failure path already uses one. A hung
  relay can no longer pin the actor turn forever.
- #27 (minor, security): ConstantTimeEquals docstring tightened — removed
  the "for future callers" line and added a SCOPE comment that this helper
  is rebuild-admin-only and shouldn't be promoted to internal/public
  without replacing its length-leak with a length-padding scheme.
- #23 (major, bug): CLI ornn skills slug default → ornn-api (matches the
  registered slug; bare "ornn" is the SPA frontend that returns HTML).

Build clean (NyxId / Skills / NyxidChat / Mainnet hosts), 30 AI tests +
15 inbox runtime tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- #15 (major, di): AddOrnnSkills now self-registers NyxIdApiClient and
  NyxIdToolOptions via TryAdd safety nets so workflow hosts that enable
  Ornn without AddNyxIdTools get a clean 404 path on missing BaseUrl
  rather than a confusing DI resolution failure.
- #3 (major, di): Introduce IExternalIdentityBindingProjectionPort and
  inject the interface in IdentityOAuthEndpoints. The concrete class
  remains for runtime composition; the registration also publishes the
  interface as a forwarder so existing tests that hold the concrete type
  keep working.
- #14 + #17 (major, concurrency): TurnStreamingReplySink TOCTOU concern
  was based on a misread — drainSignal is captured inside the same lock
  acquisition as _dispatchInProgress=false in the cap branch, and the
  throttle gate's nextIsFinal=false invariant makes _drainTcs guaranteed
  null on that path. Document the invariants so future readers don't
  re-flag this.
- #20 (major, arch): Replace the singleton Dictionary cache in
  NyxIdLlmServiceCatalogClient with IMemoryCache. Per CLAUDE.md
  "中间层状态约束", per-caller state lives in a host-owned cache, not
  a service field. AbsoluteExpiration policy preserved (30s).
- #9 (minor): /api/v1/proxy/services?per_page=100 was already extracted
  into NyxIdLlmCatalogRoutes.ProxyServicesPath — both call sites already
  use the constant.
- #10 (minor): LooksLikeLlmRouteCandidate already has negative-signal
  filtering plus a two-distinct-weak-signals threshold. No change.
- #11 (minor): ExternalIdentityBindingProjector already logs a Warning
  with DocumentId/EventId/Version on the empty-binding delete path.

Build clean (Ornn / Identity / NyxidChat / Channel.Runtime), updated
catalog client test passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- #13 (major, arch): /api/oauth/aevatar-client/rebuild now dispatches
  ProvisionAevatarOAuthClientCommand via IActorDispatchPort.DispatchAsync,
  matching /unbind. The inline actor.HandleEventAsync was a known
  CLAUDE.md "投递语义必须 runtime-neutral" violation; aligning the two
  endpoints removes the inconsistency that any future inbox middleware
  would silently bypass on rebuild.
- #24 (minor, design): callback endpoint accepts ?format=json on the URL
  to opt back into the {status:"bound", already_bound, display_name}
  envelope that programmatic CLI/SDK consumers used pre-HTML-render.
  Default stays HTML for browser callbacks.
- #26 (minor, arch): /rebuild now sits behind a RebuildAuthEndpointFilter
  that enforces the admin-token check before model binding and per-request
  DI activation kick in. The filter + the inline check in the handler are
  redundant by design (defense in depth) — the filter rejects unauth
  posts before deserialization runs, and the handler still validates so
  hand-rolled tests/integration scenarios cannot bypass.
- #28 (minor, design): document the readmodel-deletion contract in the
  ExternalIdentityBindingProjector header — empty BindingId deletes the
  document instead of upserting an inactive record; downstream audit
  consumers must read the committed-event log directly.
- #1 + #2 (blocker, arch): no change needed. Earlier commits in this PR
  already moved /model self-heal to IActorDispatchPort.DispatchAsync and
  removed the EnsureProjectionForActorAsync call from the slash-command
  request path. Verified by reading the current handler.
- #25 (minor, test): documented in the rebuild handler comment — concurrent
  /rebuild calls would race on the same actor, but this is operator-grade
  break-glass and de-duping concurrent rebuilds is out of scope.

Build clean (Identity), 34 OAuth-path tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- #7 (major, arch): wrap MEAILLMProvider's Patch.Set("$.reasoning_content")
  in try/catch so a future OpenAI SDK contract break (Patch surface change,
  serializer rewrite, field rename) degrades to "no reasoning replay"
  instead of crashing the chat call. The OpenAI package is already pinned
  to 2.9.1 in Directory.Packages.props, and the existing
  AIComponentCoverageTests already pin the serialized JSON shape so any
  drift fails the build the moment Patch stops landing in the payload.
- Test fix for #22 (skill registry TTL): the old
  TryGet_BeyondTtl_ReturnsFalseSoCallerCanRefetch test relied on TTL
  expiring SkillSource.Local entries — that's the bug #22 flagged. Updated
  the stale-entry tests to use SkillSource.Remote (via remoteId) which is
  the realistic stale scenario. Added a new
  TryGet_LocalSkillBeyondTtl_StillFresh test pinning the new behaviour.

538 AI tests + 897 ChannelRuntime tests + 16 Ornn tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov flagged 1 missing + 2 partial branches after the previous
coverage commit (4c537ad). Add three targeted tests:

* CreateCardAsync_RejectsLiteralNullJson: ParseJsonObject's
  `?? throw` branch only fires when JsonNode.Parse returns null
  without throwing — and the only input that does that is the
  literal JSON `"null"`. Without the throw, the request would
  serialize as `"data": null` and Lark would reject it as a missing
  field; the boundary check turns the silent null into a clear
  ArgumentException at the caller's call site (see the sketch after
  this list).

* SetCardSettingsAsync_OmitsUuid_WhenIdempotencyKeyIsBlank: covers
  the false branch of `if (!string.IsNullOrWhiteSpace(...))` for
  the settings endpoint. Mirrors the existing StreamElementContent
  variant.

* UpdateCardAsync_PassesIdempotencyKey_WhenProvided: covers the
  true branch of the same conditional for UpdateCardAsync, plus
  asserts the idempotency key is .Trim()'d before emission so
  accidental whitespace does not defeat Lark's dedup.
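
The `?? throw` branch the first test pins can be reproduced in
isolation; the sketch below uses hypothetical error messages and is not
the real ParseJsonObject:

```csharp
// JsonNode.Parse returns null without throwing only for the literal JSON "null";
// the boundary check turns that silent null into a clear ArgumentException.
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

public static class ParseJsonObjectSketch
{
    public static JsonNode ParseJsonObject(string json)
    {
        if (string.IsNullOrWhiteSpace(json))
            throw new ArgumentException("Card data JSON must not be blank.", nameof(json));

        // Malformed JSON throws JsonException here; literal "null" parses to null.
        return JsonNode.Parse(json)
               ?? throw new ArgumentException("Card data JSON must not be the literal null.", nameof(json));
    }

    public static void Main()
    {
        Console.WriteLine(ParseJsonObject("{\"title\":\"hi\"}").ToJsonString()); // ok
        try { ParseJsonObject("null"); } catch (ArgumentException e) { Console.WriteLine(e.Message); }
        try { ParseJsonObject("{"); } catch (JsonException e) { Console.WriteLine(e.GetType().Name); }
    }
}
```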

13/13 LarkCardKitClient tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-model fix-check verdict (3/5 partial, 2/5 unresolved on
discussion_r3201406794) flagged the residual structural risk: the
runtime-only `card_mode` boolean still lived on
LlmReplyStreamChunkEvent — a domain-event-shaped envelope — so a
misbehaving persistence layer could replay it and silently re-route
to the card sink. Comment-only mitigation was acceptable to three
reviewers but not to glm-5.1 / codex.

Eliminate the risk by construction:

* New proto LlmReplyCardStreamChunkEvent in conversation_events.proto
  carries the same fields as LlmReplyStreamChunkEvent, dispatched by
  the card sink path. The proto type itself signals routing — there
  is no boolean to flip.
* LlmReplyStreamChunkEvent.card_mode is removed and the field number
  is `reserved` along with its name so any stale serializer or
  accidental reuse fails loudly instead of silently re-enabling
  card mode.
* TurnStreamingReplySink.DispatchOneAsync now constructs the correct
  proto type based on the existing `cardMode` constructor flag.
* ConversationGAgent grows a second [EventHandler]
  HandleLlmReplyCardStreamChunkAsync. The legacy
  HandleLlmReplyStreamChunkAsync delegates to a private
  HandleNyxRelayStreamingChunkCoreAsync helper; the card handler runs
  the card-streaming machine and, on CreationFailed, synthesizes an
  edit-message chunk and falls through to the same helper so the
  user still gets a reply when card creation fails.
* IConversationCardTurnRunner / ChannelCardConversationTurnRunner /
  HandleLarkCardStreamingChunkCoreAsync take
  LlmReplyCardStreamChunkEvent. Tests update RecordingCardTurnRunner
  factories and CreateCardStreamChunk helper accordingly; tests now
  exercise HandleLlmReplyCardStreamChunkAsync instead of relying on
  a CardMode flag on the legacy event.

Spot-checks for the two manual-eyeball items in the verdict:
* `git grep ConfigureAwait agents/Aevatar.GAgents.Channel.Runtime/
   Conversation/ConversationGAgent*.cs` finds only a comment mention
  ("no ConfigureAwait(false)") — no actual usage. mimo's dissent on
  this point was a misread.
* The three classifier helpers
  (ClassifyCreateFailure / ClassifyPostSendFailure /
  ClassifyStreamFailure) consistently map 11310 → IsTableLimitExceeded
  and 230099 / 230100 → IsCardUnavailable. No 230099 lands in any
  IsTableLimitExceeded path. codex's "neither classifier sets it"
  read is the accurate one.

Tests: 133 protocol + 36 ChannelRuntime streaming + 57 Lark tools,
all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

eanzhao commented May 8, 2026

🔁 /opencode-pr-fix-check verdict

Re-ran the same 5 reviewers (deepseek-v4-pro · glm-5.1 · mimo-v2.5-pro · kimi · codex; gemini skipped — needs interactive auth) against HEAD after the fix commits. Verdict per inline comment below.

Score: 31 ✅ resolved · 1 ⚠️ partial · 5 ❌ unresolved · 0 ❓ cannot_determine (out of 37 top-level inline comments)

Tiebreak is conservative: unresolved > partial > resolved > cannot_determine. Where reviewers disagreed I keep the more pessimistic verdict and surface the dissent below.

Items still needing attention (6)

unresolved (2/5) — agents/Aevatar.GAgents.Channel.Runtime/TurnStreamingReplySink.cs:303 (dissent: v4-pro=resolved, glm-5.1=resolved, mimo=partial)

Why still flagged: The _chunksEmitted >= _maxInterimChunks exit path in DispatchLoopAsync still sets _dispatchInProgress = false, captures drainSignal = _drainTcs, clears _drainTcs = null, and breaks. If FinalizeAsync races in after the lock is released, the newly created _drainTcs is never signalled. The reviewer requested capturing drainSignal and clearing _dispatchInProgress atomically while ensuring _pendingText is drained before exit, which was not implemented.

Per-reviewer reasoning
  • v4-pro · resolved (high): The maxInterimChunks gate now captures drainSignal = _drainTcs and nulls _drainTcs atomically inside the same lock acquisition, eliminating the TOCTOU race. The invariant (nextIsFinal is false when reaching the cap) is documented in the comment.
  • glm-5.1 · resolved (medium): The dispatch loop was restructured: the maxInterimChunks gate captures drainSignal atomically under lock, and the throttle gate uses armTimerDelay to arm the timer outside the lock. Concurrent OnDeltaAsync correctly picks up pending text via the !_dispatchInProgress path.
  • mimo · partial (medium): The _chunksEmitted >= _maxInterimChunks path still sets _dispatchInProgress = false and captures drainSignal (_drainTcs) under the lock, then sets _drainTcs = null immediately after. A concurrent FinalizeAsync between these two lines could still set a new _drainTcs that the exiting loop doesn't see. The throttle gate (line 313 area) has a similar TOCTOU where _dispatchInProgress is cleared before the deferred timer is armed. Both were flagged as remaining concurrency concerns.
  • kimi · unresolved (high): The _chunksEmitted >= _maxInterimChunks exit path in DispatchLoopAsync still sets _dispatchInProgress = false, captures drainSignal = _drainTcs, clears _drainTcs = null, and breaks. If FinalizeAsync races in after the lock is released, the newly created _drainTcs is never signalled. The reviewer requested capturing drainSignal and clearing _dispatchInProgress atomically while ensuring _pendingText is drained before exit, which was not implemented.
  • codex · unresolved (high): The maxInterimChunks early-exit path still sets _dispatchInProgress = false, captures _drainTcs, clears it, and breaks while pending text remains. The race described in the comment is not structurally fixed; the current code mainly adds explanatory comments.

unresolved (2/5) — agents/Aevatar.GAgents.Channel.Runtime/TurnStreamingReplySink.cs:319 (dissent: v4-pro=partial, glm-5.1=resolved, mimo=partial)

Why still flagged: The throttle gate at the equivalent of line 313 still sets _dispatchInProgress = false inside the lock while _pendingText is not consumed, then breaks. Between this lock release and the timer-arm lock below, a concurrent OnDeltaAsync can overwrite _pendingText and start a new dispatch loop before the deferred timer is armed. The reviewer specifically requested not releasing _dispatchInProgress until after the timer is armed, which was not done.

Per-reviewer reasoning
  • v4-pro · partial (medium): The TOCTOU gap in the throttle gate (dispatchInProgress set to false before timer arm) still exists in principle. However, data loss is mitigated because OnDeltaAsync always passes the full accumulated text (a superset of any previously-stashed partial), so overwriting _pendingText with a newer accumulated value preserves correctness. No explicit lock refactoring was done for this gate.
  • glm-5.1 · resolved (medium): Same restructuring as comment 3205773725. The throttle gate's _dispatchInProgress = false and armTimerDelay approach means concurrent OnDeltaAsync will start its own dispatch loop to pick up stashed text, preventing loss.
  • mimo · partial (medium): The throttle gate sets _dispatchInProgress = false inside the lock but arms the timer outside the lock. Between the lock release and the timer arm, a concurrent OnDeltaAsync / FlushAsync can overwrite _pendingText because _dispatchInProgress is already false. This is the exact TOCTOU the comment flagged at line 313. The fix mentioned in the comment ('don't release _dispatchInProgress until after the timer is armed' or 'record deferred text into a separate field') was not applied.
  • kimi · unresolved (high): The throttle gate at the equivalent of line 313 still sets _dispatchInProgress = false inside the lock while _pendingText is not consumed, then breaks. Between this lock release and the timer-arm lock below, a concurrent OnDeltaAsync can overwrite _pendingText and start a new dispatch loop before the deferred timer is armed. The reviewer specifically requested not releasing _dispatchInProgress until after the timer is armed, which was not done.
  • codex · unresolved (high): The throttle gate still releases _dispatchInProgress while _pendingText is not consumed and arms the timer after leaving the lock. There is no separate deferred text field or atomic timer-arm handoff to eliminate the TOCTOU window.

unresolved (3/5) — agents/Aevatar.GAgents.NyxidChat/ConversationReplyGenerator.cs:120 (dissent: v4-pro=partial, kimi=resolved)

Why still flagged: No warning log was added when IRemoteSkillFetcher is null but SkillRegistry is not null. The concern was minor and the system prompt still references use_skill.

Per-reviewer reasoning
  • v4-pro · partial (medium): The suggestion to add a construction-time warning when _remoteSkillFetcher is null but _skillRegistry is not null was not implemented. The logging suggestion is minor and the system functions correctly either way, but the suggestion was not acted upon.
  • glm-5.1 · unresolved (high): No warning log was added when IRemoteSkillFetcher is null but SkillRegistry is not null. The concern was minor and the system prompt still references use_skill.
  • mimo · unresolved (medium): The constructor still silently accepts null for IRemoteSkillFetcher without logging. When _skillRegistry is not null but _remoteSkillFetcher is null, the use_skill tool will be registered (from the skill registry) but cannot fetch remote skills — a confusing silent degradation. No warning log was added.
  • kimi · resolved (medium): The null fetcher case is handled gracefully: BuildTurnToolsAsync registers UseSkillTool with a fallback new SkillRegistry() when _remoteSkillFetcher is null. While no construction-time warning was added, the functional gap is closed because the system no longer fails silently when the fetcher is missing.
  • codex · unresolved (medium): NyxIdConversationReplyGenerator still accepts nullable IRemoteSkillFetcher and does not log at construction when it is missing while a SkillRegistry exists. The requested warning is not present in the diff.

unresolved (4/5) — agents/Aevatar.GAgents.Scheduled/SkillRunnerGAgent.cs:414 (dissent: v4-pro=partial)

Why still flagged: The comment still references 'SkillRunnerGAgent.cs:351' (a line number that will drift). No change was made.

Per-reviewer reasoning
  • v4-pro · partial (medium): The comment still references 'PR fix(skill-runner): treat zero-tool-call runs as failure for fetch-and-summarize skills (#439) #569 review, codex P1 on SkillRunnerGAgent.cs:351' with a line number that will drift as the file evolves. The suggestion to use a commit hash or stable anchor was not implemented. Nit-level concern.
  • glm-5.1 · unresolved (high): The comment still references 'SkillRunnerGAgent.cs:351' (a line number that will drift). No change was made.
  • mimo · unresolved (low): The comment is a nit about referencing a line number in a comment that will drift. The line-number reference (SkillRunnerGAgent.cs:351) is still present in the current code. This is a minor documentation nit with no functional impact.
  • kimi · unresolved (high): The comment in SkillRunnerGAgent.cs still references 'PR fix(skill-runner): treat zero-tool-call runs as failure for fetch-and-summarize skills (#439) #569 review, codex P1 on SkillRunnerGAgent.cs:351'. The line number was not replaced with a commit hash or stable anchor, so it will drift as the file evolves.
  • codex · unresolved (high): The SkillRunnerGAgent comment still references a drifting source line number: 'codex P1 on SkillRunnerGAgent.cs:351'. No stable anchor or commit reference was substituted.

unresolved (2/5) — test/Aevatar.AI.ToolProviders.Ornn.Tests/Aevatar.AI.ToolProviders.Ornn.Tests.csproj:7 (dissent: v4-pro=partial, mimo=resolved, kimi=partial)

Why still flagged: The test .csproj still only has a transitive reference to Aevatar.AI.ToolProviders.NyxId through Aevatar.AI.ToolProviders.Ornn. It builds correctly but the reference is fragile.

Per-reviewer reasoning
  • v4-pro · partial (medium): The test project still relies on transitive dependency (Aevatar.AI.ToolProviders.Ornn references NyxId, which the test project references). No explicit ProjectReference to Aevatar.AI.ToolProviders.NyxId was added. The project builds correctly via transitivity but the reference is not explicit. Nit-level concern.
  • glm-5.1 · unresolved (medium): The test .csproj still only has a transitive reference to Aevatar.AI.ToolProviders.NyxId through Aevatar.AI.ToolProviders.Ornn. It builds correctly but the reference is fragile.
  • mimo · resolved (medium): The test project builds successfully because Aevatar.AI.ToolProviders.Ornn transitively references Aevatar.AI.ToolProviders.NyxId. While making the reference explicit would be more robust, the comment's concern (that the project wouldn't build) is not valid — transitive references are a supported .NET feature.
  • kimi · partial (medium): The test project builds because Aevatar.AI.ToolProviders.Ornn.csproj transitively references Aevatar.AI.ToolProviders.NyxId. However, the reviewer requested making the reference explicit in the test .csproj for robustness, which was not done.
  • codex · unresolved (high): The new Ornn test project still only references Aevatar.AI.ToolProviders.Ornn. It does not explicitly reference Aevatar.AI.ToolProviders.NyxId despite tests instantiating NyxIdApiClient directly.

⚠️ partial (5/5) — src/Aevatar.AI.LLMProviders.MEAI/MEAILLMProvider.cs:282

Why still flagged: A try/catch around the Patch.Set call provides graceful degradation on SDK break, and integration tests (MEAILLMProvider_ConvertMessages_ShouldIncludeReasoningContent) pin serialization output. However, the underlying fragility of depending on an undocumented SDK field (SCME0001) remains, and no upstream feature request is visible in the diff.

Per-reviewer reasoning
  • v4-pro · partial (high): A try/catch around the Patch.Set call provides graceful degradation on SDK break, and integration tests (MEAILLMProvider_ConvertMessages_ShouldIncludeReasoningContent) pin serialization output. However, the underlying fragility of depending on an undocumented SDK field (SCME0001) remains, and no upstream feature request is visible in the diff.
  • glm-5.1 · partial (medium): An integration test (MEAILLMProvider_ConvertMessages_ShouldIncludeReasoningContent) now asserts the serialized JSON contains reasoning_content. However, the Patch.Set approach with SCME0001 pragma remains unchanged, and there's no evidence of an upstream feature request or SDK version pinning from the diff alone.
  • mimo · partial (high): The try/catch fallback was added so an SDK break degrades to 'no reasoning replay' instead of crashing. However, the underlying fragility of rawMessage.Patch.Set("$.reasoning_content"u8, ...) using the undocumented SCME0001 surface remains — there is no compile-time guard or integration test that asserts the serialized JSON contains reasoning_content after an SDK bump. The comment asked for (b) an integration test asserting the JSON shape, which is not present in this diff.
  • kimi · partial (medium): Integration tests were added in AIComponentCoverageTests.cs that assert reasoning_content survives serialization via ModelReaderWriter. The code also wraps the Patch call in try/catch for graceful degradation. However, it still relies on the experimental SCME0001 Patch surface with no visible OpenAI SDK version pinning or upstream feature request to retire the workaround.
  • codex · partial (medium): The MEAI workaround is now isolated with try/catch and tests assert serialized OpenAI JSON contains reasoning_content. The diff does not show an SDK version pin or tracked upstream feature request, so the full mitigation set requested by the comment is not evident.

Resolved threads not listed above. Raw verdicts and reviewer outputs in $WORK/ for re-runs.

Generated by /opencode-pr-fix-check. The skill does not auto-resolve review threads — please mark resolved manually as appropriate.

eanzhao and others added 17 commits May 8, 2026 13:38
…-phase-machine

Lark streaming: phase machine refactor (#405) + CardKit streaming (#589)
Resolve the items the /opencode-pr-fix-check verdict flagged after the
first review pass:

- TurnStreamingReplySink #14 (cap branch): clear `_pendingText` when the
  cap is hit so the reviewer-asked invariant "pending text is never left
  behind when we release `_dispatchInProgress`" actually holds. The
  earlier "preserve _pendingText for FinalizeAsync" comment was wrong:
  FinalizeAsync uses its own `text` parameter, not the stash, so clearing
  is harmless.
- TurnStreamingReplySink #17 (throttle gate): arm the deferred-flush
  timer inside the same `lock` acquisition that releases
  `_dispatchInProgress = false`. A concurrent OnDeltaAsync can no longer
  observe the (no-timer + not-dispatching) gap. The trailing
  outside-lock `armTimerDelay` block is gone.
- ConversationReplyGenerator: log a Warning at construction when
  SkillRegistry is wired but IRemoteSkillFetcher is missing — that
  config silently advertises use_skill yet can't pull remote skills.
  Single grepable line for ops.
- SkillRunnerGAgent: replace the drifting `SkillRunnerGAgent.cs:351`
  reference in the streaming-sink comment with the stable anchor
  `EnsureToolStatusAllowsCompletion`.
- Ornn.Tests.csproj: explicit ProjectReference to
  Aevatar.AI.ToolProviders.NyxId — tests instantiate NyxIdApiClient
  directly so depending on transitive flow-through from Ornn was
  fragile.
- MEAILLMProvider Patch.Set: documentation upgrade — call out the
  three layered mitigations (SDK pin in Directory.Packages.props,
  AIComponentCoverageTests JSON-shape integration assertion,
  try/catch graceful degradation) and the upstream tracking path
  (file an openai-dotnet issue + retire the SCME0001 hack when a
  typed reasoning_content lands).

Build clean (Channel.Runtime, NyxidChat, Scheduled, MEAI, Ornn.Tests),
30 streaming-sink tests + 16 Ornn tests + 18 MEAI/coverage tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "safety net" I added in PR #562 review #15 (commit 673809a) was
itself the cause of a production outage on 2026-05-08:

  System.InvalidOperationException: NyxID base URL is not configured.
     at NyxIdRelayAuthValidator.ResolveDiscoveryUrl()
     at NyxIdChatEndpoints.HandleRelayWebhookAsync ...

`AddAevatarAIFeatures` runs `RegisterOrnnSkills` BEFORE the mainnet host
calls `AddNyxIdTools`, so the safety-net `services.TryAddSingleton<
NyxIdToolOptions>()` registered an empty default first; the host's real
`AddNyxIdTools(o => { o.BaseUrl = ... })` then no-op'd because
`TryAddSingleton(options)` saw the empty instance already registered.
Every NyxID call (relay JWT validation, channel-relay/reply, proxy)
exploded on null BaseUrl, and Lark stopped replying entirely.
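
A compressed repro of the ordering pitfall; the extension names match the message above, but the bodies are reduced to the single line that matters:

  using System;
  using Microsoft.Extensions.DependencyInjection;
  using Microsoft.Extensions.DependencyInjection.Extensions;

  public sealed class NyxIdToolOptions { public string? BaseUrl { get; set; } }

  public static class OrderingPitfallSketch
  {
      // Ran first (inside AddAevatarAIFeatures): the removed "safety net"
      // registered an EMPTY options instance.
      public static void RegisterOrnnSkills(IServiceCollection services)
          => services.TryAddSingleton(new NyxIdToolOptions());

      // Ran second (in the host): TryAddSingleton sees an existing
      // NyxIdToolOptions registration and silently no-ops, so BaseUrl
      // stays null in the container.
      public static void AddNyxIdTools(IServiceCollection services,
                                       Action<NyxIdToolOptions> configure)
      {
          var options = new NyxIdToolOptions();
          configure(options);
          services.TryAddSingleton(options);
      }
  }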

The right behaviour: hosts that enable Ornn skills MUST call
`AddNyxIdTools` themselves. If they don't, DI resolution fails fast at
startup with a clear "no service for NyxIdApiClient" — which is exactly
the signal that needs to surface, not be papered over with empty
placeholders. Updated the docstring with a remarks block so the next
person tempted to add this safety net sees the production-incident
breadcrumb first.

The reviewer concern that triggered #15 (workflow host enables Ornn
without NyxId) is real but theoretical for this repo today; the cost
of the wrong fix was concrete and severe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ault

PR #590 shipped CardKit streaming behind a flag (default false) and the
production cluster's ConfigMap doesn't carry an Aevatar:NyxId:Relay
override (verified earlier — the deployed appsettings.Distributed.json
has no Relay node), so flipping the flag in repo configuration was a
no-op. Move the default to true in the C# field initializer so the
aevatar Lark bot uses the card path out of the box.

Risk: low. PR #590's failure-routing already handles missing CardKit
scopes — card.create returns the scope-error / rate-limit envelope,
ConversationGAgent.HandleLarkCardStreamingChunkCoreAsync transitions to
CreationFailed, and the turn falls through to the legacy edit-message
sink for the rest of the chunks. Worst case for a deployment without
cardkit:card:read + cardkit:card:write granted is one extra POST per
turn before the fallback kicks in.

Deployments that explicitly want the legacy path (no Lark bot, or that
want to skip the card.create round-trip) can opt out via
Aevatar:NyxId:Relay:StreamingCardKitEnabled=false.
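
A sketch of what the default flip amounts to; the property name and the config path are from this change, the rest of the options class is omitted:

  public sealed class NyxIdRelayOptions
  {
      // Default ON: the Lark bot takes the CardKit streaming path out of
      // the box. Hosts opt out via
      // Aevatar:NyxId:Relay:StreamingCardKitEnabled=false in configuration.
      public bool StreamingCardKitEnabled { get; set; } = true;

      // ...other relay options elided...
  }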

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lt test

Last commit flipped StreamingCardKitEnabled default to true, which made
ProcessAsync_StreamingEnabled_DispatchesChunkEventAndReadyEvent fail
because the inbox now stamps card-mode chunks with the structurally
distinct LlmReplyCardStreamChunkEvent proto type (PR #590 routing
scheme), and the test asserts the legacy LlmReplyStreamChunkEvent
descriptor.

Two changes:
- Existing test sets StreamingCardKitEnabled=false explicitly; comment
  notes that it deliberately exercises the legacy text-edit chunk shape.
- New test ProcessAsync_StreamingEnabledWithDefaultCardMode_DispatchesCardChunkEvent
  pins the new default behaviour: omit the flag, observe
  LlmReplyCardStreamChunkEvent on the wire.

898 ChannelRuntime tests + 94 NyxId AI tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lark CardKit 2.0's open-apis/cardkit/v1/cards endpoints want
type-tagged `data` / `settings` / `card.data` payloads as JSON-encoded
STRINGS, not inline objects. The merged PR #590 implementation always
embedded those fields as inline JsonNode trees, which Lark rejects with
code 9499 "Invalid parameter type in json: Data | Settings | Card".

Production rollout (commit 36b8b5c flipped StreamingCardKitEnabled
default to true) made every Lark turn run card.create, hit the 400, log
"Card create failed; falling back to text-edit", and silently spend the
rest of the turn on the legacy edit-message path — so users still saw
text-edit replies and concluded card mode was off (it was on, but the
client was sending the wrong shape).

Verified each shape against the live endpoint via `nyxid proxy request`:

  POST /cards   data: STRING when type=card_json|card_id
                data: INLINE OBJECT when type=template
  PATCH /cards/{id}/settings   settings: STRING
  PUT   /cards/{id}            card: INLINE { type, data }
                                where data follows the same string rule
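
A minimal sketch of the string-vs-inline rule above using System.Text.Json; the helper name and anything beyond the type/data fields are illustrative:

  using System.Text.Json.Nodes;

  static JsonObject BuildCardCreateBody(string type, JsonNode cardData)
  {
      var body = new JsonObject { ["type"] = type };
      body["data"] = type is "card_json" or "card_id"
          ? JsonValue.Create(cardData.ToJsonString())   // JSON-encoded STRING
          : cardData.DeepClone();                       // template: inline object
      return body;
  }

UpdateCardAsync (below) applies the same string rule when it wraps
CardJson for PUT /cards/{id}.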

Tests:
- Updated existing pinning tests that asserted the wrong shape (the
  inline-object assertion was the contract bug).
- Added explicit cases for card_id (string) and template (inline object).
- Round-trip card_json string back through JsonDocument.Parse so the
  test still pins the inner schema/streaming_mode fields.
- UpdateCardAsync now validates CardJson parses (defensive client-side)
  and wraps it in `card.{type:"card_json", data:<string>}` automatically.
- All four CardKit methods reject blank / null DataJson upfront.

59 Lark CardKit tests + 37 ChannelRuntime card tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Production incident 2026-05-08: a Lark turn issued an `ssh_exec` tool
call to sg-office-network. NyxID accepted the POST to
`/api/v1/ssh/{catalog_id}/exec` at 07:17:47 UTC and never returned a
response. Direct curl reproduces the call in ~2s (NyxID side
duration_ms:1666), so the hang is something between aevatar's HttpClient
and NyxID's SSH gateway / NodeAgent — most likely the per-user single-
session SSH lock with a stale prior session. With no client-side cap,
the call sat past the inbox runtime's 120s timeout, the user got the
generic `llm_reply_timeout` fallback ("Sorry, this took too long..."),
and the actual SSH attempt was never observable to the model.

Two layered fixes:

1. NyxIdSshExecTool now wraps the SshExecAsync call in a linked CTS that
   cancels at `timeout_secs + 15s` (the +15s gives NyxID's own server-
   side timeout a margin to respond with `timed_out: true` before we
   give up). On timeout the tool returns
     {"error":"ssh_timeout","detail":"NyxID did not return ... within Ns"}
   so the LLM sees a real tool error and can either retry against a
   different host or summarize "couldn't reach the SSH gateway" instead
   of dragging out the whole run (a sketch of the linked-CTS pattern
   follows below).

2. NyxIdApiClient.SendAsync now lets `OperationCanceledException`
   propagate. Previously the catch-all wrapped it as
     {"error":true,"message":"A task was canceled."}
   which silently shadowed any per-call CTS the caller installed on top
   of the LLM run's CT (the test for this fix caught it: tool's catch
   never fired). Cancellation is a control-flow signal, not an HTTP
   error envelope. Other callers of NyxIdApiClient already work with
   either path because the LLM run's CT cancellation also unwinds the
   inbox runtime via OperationCanceledException.
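
A minimal sketch of fix (1); the real NyxIdSshExecTool plumbing and the SshExecAsync signature are assumed, only the linked-CTS pattern is the point:

  using System;
  using System.Threading;
  using System.Threading.Tasks;

  static async Task<string> ExecWithHardCapAsync(
      Func<CancellationToken, Task<string>> sshExecAsync, // stands in for NyxIdApiClient.SshExecAsync
      int timeoutSecs,
      CancellationToken runCt)                            // the LLM run's token
  {
      using var cts = CancellationTokenSource.CreateLinkedTokenSource(runCt);
      cts.CancelAfter(TimeSpan.FromSeconds(timeoutSecs + 15)); // +15s server-side margin

      try
      {
          return await sshExecAsync(cts.Token);
      }
      catch (OperationCanceledException) when (!runCt.IsCancellationRequested)
      {
          // Only the per-call cap fired: surface a tool error the LLM can
          // act on instead of letting the turn hit the runtime watchdog.
          return "{\"error\":\"ssh_timeout\",\"detail\":\"NyxID did not return within "
              + (timeoutSecs + 15) + "s\"}";
      }
      // If runCt itself is cancelled, the OperationCanceledException
      // propagates, matching fix (2): cancellation is control flow, not
      // an HTTP error envelope.
  }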

Tests:
- New ExecuteAsync_HardTimesOut_WhenNyxIdHangsOnSshPost exercises the
  full failure mode via a `MapHanging` route on the test PathHandler that
  awaits Timeout.Infinite until the cancellation token fires.
- Existing 44 SSH + ApiClient tests still pass; 539-test AI suite green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Production observation 2026-05-08: a "拉一下 sg office 设备 IP" ("pull
the sg office device IPs") turn
generated a perfectly-shaped streaming-card reply but ran out of LLM
turn budget mid-flush — the user saw the generic "took too long"
fallback instead of the real answer. The bot did the work; we just
cut it off too early.

Multi-step Lark turns realistically need:
  - ornn skill-search / fetch SKILL.md       ~10s
  - ssh_exec to a remote host                ~30-60s
                                             (45s is the new client-side
                                              hard cap on a stuck NyxID
                                              SSH gateway)
  - LLM thinking + final streaming dispatch  ~30-90s
  - per-tool jitter + connection setup       ~10-20s
                                             ──────
                                             ~ 80-180s typical

120s left no margin once anything but the happy path happened. 300s
buys headroom without giving up the watchdog: a true infinite hang
still resolves to the "took too long" fallback so the user is not
stuck on a "..." reaction forever.

Bumped both knobs to keep them in sync:
  - NyxIdRelayOptions.ResponseTimeoutSeconds default 120 → 300
  - ChannelLlmReplyInboxRuntime.FallbackTimeoutSecondsDefault 120 → 300

Deployments that want a tighter / looser cap still set
Aevatar:NyxId:Relay:ResponseTimeoutSeconds in config; 0 / negative
still means "no cap" (host has its own watchdog) per the existing
ResolveFallbackTimeout contract.
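
One plausible reading of that contract (the actual ResolveFallbackTimeout implementation may differ):

  static TimeSpan? ResolveFallbackTimeout(int? configuredSeconds)
  {
      const int FallbackTimeoutSecondsDefault = 300;   // was 120 before this change
      var seconds = configuredSeconds ?? FallbackTimeoutSecondsDefault;
      return seconds <= 0
          ? null                                       // 0 / negative: no cap, host watchdog owns it
          : TimeSpan.FromSeconds(seconds);
  }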

Existing tests that pin specific timeout values
(e.g. ResponseTimeoutSeconds=1 in the timeout-fallback test) are
unaffected; tests that didn't set the option get 300s by default,
which is even less likely to fire during fast tests. 16 inbox runtime +
24 SSH tool tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The /daily Lark slash command and the agent template behind it were
named daily_report internally — six different casings (daily_report,
DailyReport, dailyReport, daily-report, plus compound identifiers like
daily_report_form, BuildDailyReportForm, CreateDailyReportAgentAsync).
The user-facing command has always been /daily; the internal _report
suffix added no signal and made it harder to talk about the skill in
docs and Ornn.

Strategy:
- Sed-driven global rename across .cs / .json / .md / .proto / .yaml /
  csproj. 281 occurrences across 26 files. Order: kebab → camel →
  Pascal → snake (no shared substrings, so order is defensive).
- Renamed compound identifiers (daily_report_form → daily_form,
  BuildDailyReportForm → BuildDailyForm, etc.). All callers updated by
  the same pass.
- Three test inputs that intentionally used /daily_report as a fake
  unknown-command literal (testing "this is not a real command, return
  unknown usage") now collide with the real /daily command after the
  rename. Dropped those three [InlineData] entries — the same code path
  is still covered by the /foobar and / cases that remain.

Persistence note: any scheduled agent records persisted with
template="daily_report" before this commit will not match the new
template="daily" lookup. The current production /daily flow is broken
anyway (always returns eanzhao's data, hangs past the 120s budget per
this morning's incident review), so a clean cutover is acceptable —
re-run /daily and the rebuilt agent will land under the new name.

This commit is naming-only. The next commit (separate) replaces the
template body to consume the Ornn-hosted `daily` skill via use_skill,
the way sg-office-network is reached via ssh_exec.

Build clean across affected projects (Authoring.Lark, Scheduled,
NyxidChat). 895 ChannelRuntime + 17 Lark Platform tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`/daily` and `/social-media` are no longer in-tree commands. Recipes for
new persistent agents now live as Ornn skills — the LLM searches via
`ornn_search_skills` (or matches a slash command name to a skill query)
and follows the SKILL.md verbatim. `agent_builder` keeps only the
lifecycle surface (`list_agents`, `agent_status`, `run_agent`,
`disable_agent`, `enable_agent`, `delete_agent`).

The `social_media` template, the `WorkflowAgentGAgent` actor, the
`twitter_publish` workflow module, the `IScopeWorkflowCommandPort` LLM
plumbing in `agent_builder`, and all related card-flow surfaces are
deleted with no replacement; the missing scope-scoped workflow tools
needed to bring social-media back as an Ornn skill are tracked in
issue #598.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…continuation-phase-a

[codex] Refactor LLM reply continuation into agent runs