Lark streaming reply: migrate to CardKit 2.0 to remove edit-count cap #589

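## TL;DR

Lark streaming replies currently work by repeatedly editing a single message through `/channel-relay/reply` + `/reply/update` (PR #374 + commit `31972630`). Lark caps edits on a single message at roughly 15–20 (error code `230072`), so we have to hold the edit count down with `StreamingMaxInterimChunks=15` plus a 750ms throttle. As a result, a long reply "freezes at the last interim edit and only shows the full content once the final flush lands", which is poor UX.

CardKit 2.0 "streaming cards" are the form Lark officially designed for token-by-token streaming output: the card is addressed by `card_id`, updates go through the dedicated `cardkit/v1/cardElement.content` call, there is **no edit-count cap**, the throttle can drop to the 200ms range, and the UX lines up with today's mainstream AI bots.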

## Background

PR #374 (`feat/2026-04-24_lark-streaming-reply`) wires streaming through NyxID's channel-relay outbound surface:

- `POST /api/v1/channel-relay/reply` — first send, consumes the per-callback `reply_token`.
- `POST /api/v1/channel-relay/reply/update` — subsequent edits, internally translated to `PUT/PATCH /open-apis/im/v1/messages/{id}`.

Commit `31972630` ("Cap and throttle streaming Lark edits to avoid 230072") landed two band-aids after mainnet logs surfaced `230072`:

1. **Loop throttle gate** in `TurnStreamingReplySink.DispatchLoopAsync`. Without it, in-flight dispatch + concurrent `OnDeltaAsync` produced one Lark edit per token.
2. **Hard interim cap** `NyxIdRelayOptions.StreamingMaxInterimChunks` (default 15). Once hit, interim flushes stash text but skip dispatch; the final flush bypasses the cap so it always lands.
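
The two mitigations can be read as one gate in front of dispatch. The following is a minimal TypeScript sketch, not the C# code from commit `31972630`: `InterimGate`, `shouldDispatch`, and `countInterimEdits` are hypothetical names, the clock is passed in as a plain number, and letting the final flush skip the throttle as well as the cap is an assumption of this sketch.

```typescript
// Hypothetical sketch of the throttle gate plus the interim cap.
// The real logic lives in TurnStreamingReplySink (C#); names here are illustrative.
type FlushKind = "interim" | "final";

class InterimGate {
  private readonly throttleMs: number;
  private readonly maxInterimChunks: number;
  private lastDispatchMs = Number.NEGATIVE_INFINITY;
  private interimSent = 0;

  constructor(throttleMs: number = 750, maxInterimChunks: number = 15) {
    this.throttleMs = throttleMs;
    this.maxInterimChunks = maxInterimChunks;
  }

  // Returns true when this flush should become a Lark message edit.
  shouldDispatch(kind: FlushKind, nowMs: number): boolean {
    // The final flush bypasses the cap (and, in this sketch, the throttle).
    if (kind === "final") return true;
    // Cap reached: the caller keeps stashing text but skips dispatch.
    if (this.interimSent >= this.maxInterimChunks) return false;
    // Throttle: at most one interim edit per window.
    if (nowMs - this.lastDispatchMs < this.throttleMs) return false;
    this.lastDispatchMs = nowMs;
    this.interimSent += 1;
    return true;
  }
}

// Simulates one token every 100 ms for 20 s and counts the resulting edits.
function countInterimEdits(gate: InterimGate): number {
  let edits = 0;
  for (let t = 0; t < 20000; t += 100) {
    if (gate.shouldDispatch("interim", t)) edits += 1;
  }
  return edits;
}
```

With the defaults, the simulated stream dispatches on every eighth token until the cap of 15 is reached at around 11 seconds; every later token waits for the final flush.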

This works, but the underlying constraint is structural: **Lark refuses message edits once the per-message edit count is exhausted, full stop.** Any future feature that exposes more streaming detail (reasoning blocks, tool-call status, longer replies) hits the same wall. It also hurts today: once the interim cap is reached, a sufficiently long answer stops updating, and users see the last 30% or more of it only when the final flush lands.

## Why CardKit 2.0

Lark's CardKit 2.0 streaming-cards surface is purpose-built for the LLM-token-streaming use case:

- A card is allocated once via `cardkit/v1/card.create` → returns `card_id`.
- The card is sent to a chat once via `im/v1/messages` (or `messages/{id}/reply`) with `msg_type=interactive` referencing `card_id`.
- Streaming text into a specific element happens via `cardkit/v1/cardElement.content` (per-element, sequence-controlled).
- Streaming mode is opened/closed via `cardkit/v1/card.settings`; final content (links, tool blocks, citations) lands via `cardkit/v1/card.update`.
- All updates use a monotonically increasing `sequence` field — Lark rejects stale writes deterministically.
- **No per-card edit-count cap.** The cap that produces `230072` is on `im/v1/messages` edits, not CardKit element updates.
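
The sequence rule in the list above can be illustrated with an in-memory stand-in. This is a sketch only: `FakeStreamingElement` is a hypothetical class, not the Lark API, and it assumes that a write counts as stale when its `sequence` is not greater than the last accepted one and that each write carries the full text accumulated so far.

```typescript
// In-memory stand-in for one streaming element on a card. This is NOT the Lark API.
// Assumption: a write is stale when its sequence is not greater than the last accepted one.
class FakeStreamingElement {
  private lastSequence = 0;
  private text = "";

  // Models a per-element content write; returns false for a stale sequence.
  writeContent(sequence: number, fullText: string): boolean {
    if (sequence <= this.lastSequence) return false;
    this.lastSequence = sequence;
    this.text = fullText;
    return true;
  }

  currentText(): string {
    return this.text;
  }
}

const demoElement = new FakeStreamingElement();
const acceptedFirst = demoElement.writeContent(1, "Hel");
const acceptedSecond = demoElement.writeContent(2, "Hello");
// A delayed retry of the first write arrives late and is rejected.
const acceptedStale = demoElement.writeContent(1, "Hel");
```

The late retry does not overwrite newer text, which is the property that lets the sink send updates without a per-message edit budget.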

`ColinLu50/openclaw-lark-stream` is a working reference impl with the exact same shape we want: phase state machine, sequence counter, 200ms CardKit throttle, fallback to IM patch on `230020` (rate limit) / `230099`/`11310` (card table size).

## Discovery (load-bearing — verify before scoping)

### NyxID proxy reachability — ✅ no NyxID changes required

`backend/src/services/proxy_service.rs` exposes `/api/v1/proxy/s/{slug}/{*path}` as a wildcard pass-through. Anything under `open-apis/*` is forwarded transparently with the bot's tenant access token injected. CardKit endpoints (`/open-apis/cardkit/v1/...`) are reachable via the same `LarkNyxClient.ProxyRequestAsync` mechanism aevatar already uses for `messages.SearchMessagesAsync`, `BatchGetMessagesAsync`, etc. This keeps the plan inside CLAUDE.md's "外部仓库无改动权" rule (no right to modify external repositories).
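
As an illustration of the wildcard route, a CardKit path maps onto the proxy as shown below. `buildProxyPath` is a hypothetical helper and `lark-bot` is a made-up slug; the only part taken from the text above is the `/api/v1/proxy/s/{slug}/{*path}` route shape.

```typescript
// Hypothetical helper: joins the NyxID wildcard proxy route with an upstream Lark path.
function buildProxyPath(slug: string, upstreamPath: string): string {
  // Drop leading slashes so the joined path has exactly one separator.
  const trimmed = upstreamPath.replace(/^\/+/, "");
  return "/api/v1/proxy/s/" + encodeURIComponent(slug) + "/" + trimmed;
}

// "lark-bot" is an illustrative slug, not a real configuration value.
const cardCreateProxyPath = buildProxyPath("lark-bot", "/open-apis/cardkit/v1/cards");
```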

### Lark scope — ⚠️ external configuration, no code change

aevatar's Lark bot app must enable CardKit-related scopes in the Feishu/Lark Developer Console (e.g. `im:card`, `im:card:send`, `cardkit:card.read.write`; exact list to be confirmed during impl). NyxID's `services/channel_adapters/lark.rs` only enumerates `im:message` + `im:message:send_as_bot` for its own permission-setup deep-link, but it does **not** block other scopes — the proxy passes whatever the upstream tenant token has.

### Streaming sink today goes through channel-relay, not the API-key proxy — ⚠️ architectural fork

- `NyxIdRelayOutboundPort.SendAsync` / `UpdateAsync` → `POST /api/v1/channel-relay/reply{,/update}`, authed by per-callback `reply_token` (RS256 JWT, `aud="channel-relay/reply"`).
- `LarkNyxClient.ProxyRequestAsync` → `/api/v1/proxy/s/{slug}/...`, authed by aevatar's NyxID API key.

CardKit calls **must** go through the API-key proxy path because the `reply_token` is bound to `aud="channel-relay/reply"` (per `skills/nyxid/references/channels.md` lines 296–307). This means `TurnStreamingReplySink`'s outbound channel forks: first send + final terminal send may stay on channel-relay (so NyxID retains audit metadata), but every interim CardKit element update must use the proxy path. **Open question:** whether to keep the hybrid or move the entire Lark outbound to the proxy path (and accept that NyxID's metadata audit no longer covers Lark interim writes — bodies are never stored anyway per ADR-013).
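
The fork can be written down as a small routing function. This is a sketch of the two candidate policies only; `routeOutbound` and the type names are hypothetical, and the choice between `hybrid` and `pureDirect` remains the open question stated above.

```typescript
type LarkOutboundOp = "firstSend" | "interimCardUpdate" | "finalSend";
type OutboundPath = "channelRelay" | "apiKeyProxy";
type RoutingPolicy = "hybrid" | "pureDirect";

function routeOutbound(op: LarkOutboundOp, policy: RoutingPolicy): OutboundPath {
  // CardKit element updates cannot use reply_token: its aud is bound to channel-relay/reply.
  if (op === "interimCardUpdate") return "apiKeyProxy";
  // Hybrid keeps first and final sends on channel-relay so NyxID retains audit metadata.
  return policy === "hybrid" ? "channelRelay" : "apiKeyProxy";
}
```

Under either policy the interim updates take the proxy path; the policies differ only in where the first and final sends go.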

## Scope

### A. `LarkCardKitClient` (new wrapper, `src/Aevatar.AI.ToolProviders.Lark/`)

Mirror the existing `LarkNyxClient` shape. Add typed methods for:

- `CreateCardAsync(token, cardJson, ct) → card_id` → `POST open-apis/cardkit/v1/cards`
- `SendCardAsync(token, receiveIdType, receiveId, cardId, ct)` → `POST open-apis/im/v1/messages` with `msg_type=interactive`, `content={"type":"card","data":{"card_id":...}}`
- `StreamElementContentAsync(token, cardId, elementId, sequence, content, ct)` → `POST open-apis/cardkit/v1/cards/{card_id}/elements/{element_id}/content`
- `SetSettingsAsync(token, cardId, sequence, settings, ct)` → `PATCH open-apis/cardkit/v1/cards/{card_id}/settings`
- `UpdateCardAsync(token, cardId, sequence, cardJson, ct)` → `PUT open-apis/cardkit/v1/cards/{card_id}`

Exact path strings to be verified against Lark's open-API reference during impl; the `cardkit.v1.*` SDK names in OpenClaw map to these REST paths.
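
A possible request shape for `SendCardAsync` is sketched below. `ProxyRequest` and `buildSendCardRequest` are hypothetical names, and two details are assumptions to confirm against Lark's reference during impl: that `receive_id_type` travels as a query parameter and that `content` is sent as a JSON-encoded string.

```typescript
// Hypothetical request shape handed to a proxy client; not an existing aevatar type.
interface ProxyRequest {
  method: "POST";
  path: string;
  body: { receive_id: string; msg_type: "interactive"; content: string };
}

function buildSendCardRequest(receiveIdType: string, receiveId: string, cardId: string): ProxyRequest {
  return {
    method: "POST",
    // Assumption: receive_id_type is a query parameter on the message-send call.
    path: "open-apis/im/v1/messages?receive_id_type=" + encodeURIComponent(receiveIdType),
    body: {
      receive_id: receiveId,
      msg_type: "interactive",
      // Assumption: content is a JSON string that references the card by id.
      content: JSON.stringify({ type: "card", data: { card_id: cardId } }),
    },
  };
}
```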

### B. `TurnStreamingCardSink` + `CardPhase` state machine

A new sink alongside (not replacing) `TurnStreamingReplySink`. Per-turn runtime state on `ConversationGAgent` carries:

```
CardPhase phase
string? cardId
string streamingElementId          // single canonical element id we stream into
long sequence                      // monotonic; pre-increment per outbound call
string? cardMessageId              // platform message id of the sent card
string? originalCardKitCardId      // preserved for terminal update if mid-stream fallback fires
```

Phases (extending the phase set from #405):

```
Idle              → no chunk attempted yet
Creating          → card.create + send in flight
Streaming         → element-content updates flowing
Completed         → terminal, success
Aborted           → terminal, intentional cancel
Terminated        → terminal, upstream message recall / unavailable
CreationFailed    → terminal, fall back to text-edit sink
```

Constraints (mirror #405):
- Must remain in-memory on the actor (per CLAUDE.md "中间层状态约束 / 运行态在 actor 内": middle-layer state constraint, runtime state lives inside the actor).
- `PhaseTransitions` table; reject illegal transitions with warn log, do **not** throw.
- `TerminalReason` recorded on entry to any terminal phase.
- All read sites use phase-level helpers (`AllowsStreamingUpdate`, `AllowsFinalUpdate`, etc.).
- `sequence` is pre-incremented before every outbound call; sink owns it.
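
The constraints above can be sketched as follows. The transition table here is an assumption made for illustration (the real table is to be defined together with #405), `CardTurnState` is a hypothetical name, and the `warnings` array stands in for a warn log.

```typescript
type CardPhase =
  | "Idle" | "Creating" | "Streaming"
  | "Completed" | "Aborted" | "Terminated" | "CreationFailed";

// Assumed transition table; terminal phases have no outgoing transitions.
const phaseTransitions: Record<CardPhase, readonly CardPhase[]> = {
  Idle: ["Creating", "Aborted"],
  Creating: ["Streaming", "CreationFailed", "Aborted", "Terminated"],
  Streaming: ["Completed", "Aborted", "Terminated"],
  Completed: [],
  Aborted: [],
  Terminated: [],
  CreationFailed: [],
};

class CardTurnState {
  phase: CardPhase = "Idle";
  terminalReason: string | null = null;
  readonly warnings: string[] = [];
  private sequence = 0;

  // Applies a transition; illegal ones are logged and ignored, never thrown.
  transition(to: CardPhase, reason?: string): boolean {
    if (phaseTransitions[this.phase].indexOf(to) === -1) {
      this.warnings.push("illegal transition " + this.phase + " -> " + to);
      return false;
    }
    this.phase = to;
    // Record the reason on entry to any terminal phase.
    if (phaseTransitions[to].length === 0) {
      this.terminalReason = reason ?? "unspecified";
    }
    return true;
  }

  // Phase-level helper so read sites never compare phases directly.
  allowsStreamingUpdate(): boolean {
    return this.phase === "Streaming";
  }

  // Pre-increments the sequence before every outbound call.
  nextSequence(): number {
    this.sequence += 1;
    return this.sequence;
  }
}
```

Keeping the table as data makes the "every illegal transition logs and no-ops" test a loop over all phase pairs rather than a hand-written case list.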

### C. Outbound port + runner contract

Extend `IConversationTurnRunner.RunStreamChunkAsync` (or add a sibling `RunCardStreamChunkAsync`) so the sink can return card-shaped progress (`cardId`, `cardMessageId`, `sequence`, `phase`) instead of the current text-edit-shaped `ConversationStreamChunkResult { PlatformMessageId, EditUnsupported }`.

Decide between (1) extending the existing record with optional card fields (one runner, branched by mode flag) or (2) splitting the runner into two implementations selected by `NyxIdRelayOptions.OutboundMode`. Recommend (2) — cleaner phase-machine separation; the text-edit runner stays as the fallback target for Scope D.
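
Option (2) could look roughly like this. It is a shape sketch in TypeScript rather than the C# contract: `TurnRunner`, `selectRunner`, and both stub runners are hypothetical, and the stubs return canned values without making any calls.

```typescript
type OutboundMode = "TextEdit" | "CardKit";

interface TextEditChunkResult {
  kind: "textEdit";
  platformMessageId: string | null;
  editUnsupported: boolean;
}

interface CardChunkResult {
  kind: "card";
  cardId: string;
  cardMessageId: string;
  sequence: number;
  phase: string;
}

type StreamChunkResult = TextEditChunkResult | CardChunkResult;

interface TurnRunner {
  runStreamChunk(text: string): StreamChunkResult;
}

// Stub runners: they return canned values and make no network calls.
const textEditRunner: TurnRunner = {
  runStreamChunk: () => ({ kind: "textEdit", platformMessageId: "om_stub", editUnsupported: false }),
};

const cardRunner: TurnRunner = {
  runStreamChunk: () => ({
    kind: "card", cardId: "card_stub", cardMessageId: "om_stub", sequence: 1, phase: "Streaming",
  }),
};

// One runner per mode; the text-edit runner stays available as the fallback target.
function selectRunner(mode: OutboundMode): TurnRunner {
  return mode === "CardKit" ? cardRunner : textEditRunner;
}
```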

### D. Fallback to text-edit sink

When `cardkit/v1/card.create` or the initial `messages.create` with `card_id` fails (e.g. scope not granted, Lark down, `230020` rate limit on the create), drop to the existing `TurnStreamingReplySink` for this turn. Phase transitions to `CreationFailed`; subsequent chunks are routed to the text-edit pathway with the same `correlation_id`.

Mid-stream fallback (`230099`/`11310` table limit during element-content updates) follows OpenClaw: clear `cardId`, preserve `originalCardKitCardId`, route remaining interim writes to text-edit on the same upstream message id, and post a terminal `card.update` to the original card id at finalization. Scope this **only if** real traffic shows we hit it; default impl can stop at full-turn fallback.
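
The fallback rules in this section, together with the rate-limit behavior listed under Scope E, reduce to a small classification. `classifyCardError` and its action names are hypothetical, and the `surfaceError` branch is an assumption because this issue does not say what happens for other codes.

```typescript
type CardStage = "create" | "stream";
type CardErrorAction = "fallbackWholeTurn" | "skipFrame" | "fallbackMidStream" | "surfaceError";

function classifyCardError(stage: CardStage, larkCode: number): CardErrorAction {
  // Any failure while creating or first sending the card drops the whole turn to text-edit.
  if (stage === "create") return "fallbackWholeTurn";
  // Rate limit mid-stream: skip this frame and stay in Streaming.
  if (larkCode === 230020) return "skipFrame";
  // Card table limit mid-stream: the optional OpenClaw-style fallback.
  if (larkCode === 230099 || larkCode === 11310) return "fallbackMidStream";
  // Assumption: anything else is surfaced to the caller.
  return "surfaceError";
}
```

A default implementation that stops at full-turn fallback would simply treat `fallbackMidStream` the same way as `surfaceError` until real traffic justifies the extra path.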

### E. Tests

- Mirror `TurnStreamingReplySinkTests` for the card sink: sequence ordering under burst, throttle gate, finalization, abort.
- `PhaseTransitions` table tests (every illegal transition logs + no-ops).
- Fallback test: `card.create` returns 4xx → `CreationFailed` + text-edit sink picks up.
- Rate-limit test: `230020` mid-stream → frame skipped, phase stays `Streaming`.
- All new tests must be deterministic and free of wall-clock waits: no `Task.Delay` outside the existing `tools/ci/test_polling_allowlist.txt`.

### F. Config + docs

- `NyxIdRelayOptions`: `OutboundMode` (`TextEdit` | `CardKit`, default `TextEdit` until Scope E is green), `CardKitThrottleMs` (default 200), `CardKitFallbackToTextEdit` (default true), `StreamingElementId` (default `streaming_main`).
- New ADR under `docs/adr/`: "Lark CardKit streaming as the canonical outbound for streaming replies", capturing the channel-relay-vs-proxy fork decision.
- Update `skills/nyxid` ref docs (locally — not pushed to NyxID) on which path aevatar uses, since channel-relay is no longer the only Lark outbound.
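
The proposed options and defaults can be summarized as below. This mirrors the list above in TypeScript for illustration only; `RelayStreamingOptions` and `resolveRelayStreamingOptions` are hypothetical names, and the positive-throttle check is an added assumption, not a stated requirement.

```typescript
interface RelayStreamingOptions {
  outboundMode: "TextEdit" | "CardKit";
  cardKitThrottleMs: number;
  cardKitFallbackToTextEdit: boolean;
  streamingElementId: string;
}

// Defaults as proposed in this issue.
const defaultRelayStreamingOptions: RelayStreamingOptions = {
  outboundMode: "TextEdit",
  cardKitThrottleMs: 200,
  cardKitFallbackToTextEdit: true,
  streamingElementId: "streaming_main",
};

function resolveRelayStreamingOptions(overrides: Partial<RelayStreamingOptions>): RelayStreamingOptions {
  const merged = { ...defaultRelayStreamingOptions, ...overrides };
  // Assumption: a non-positive throttle is treated as a configuration error.
  if (merged.cardKitThrottleMs <= 0) {
    throw new Error("cardKitThrottleMs must be positive");
  }
  return merged;
}
```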

## Out of scope

- **Issue #405 (text-edit phase state machine + unavailable guard).** Should land **first** as a strictly smaller refactor on the existing text-edit sink; the card sink reuses the same `UnavailableGuard` and a superset of the phase machine. Do not delete the text-edit codepath after CardKit lands — it remains the fallback per Scope D.
- **Reasoning blocks / tool-call visualization on cards.** CardKit supports rich elements, but our current LLM output is plain text. File a separate issue once we have actual reasoning/tool-call payloads to visualize.
- **CardKit interactive callbacks** (button clicks, form submits). aevatar's bot does not currently consume `card.action.trigger` — no scope creep here.
- **NyxID changes.** Per CLAUDE.md "外部仓库无改动权" (no right to modify external repositories), nothing in this plan requires NyxID code changes. Lark scope additions are configuration in the Feishu Developer Console.

## Open questions / pre-conditions

1. **Lark bot scope grant** — confirm the exact CardKit scope keys aevatar's bot needs and that the ops owner can enable them in the Feishu Developer Console before the card sink ships. Without this, every PR will fail integration on a real tenant.
2. **Outbound auth path** — confirm `LarkNyxClient`'s API key is available at the streaming sink's call site (today the sink uses `reply_token`). Likely yes (the bot already uses `LarkNyxClient` for tools); cheap to verify before Scope A.
3. **Channel-relay first-send vs direct send** — decide hybrid vs pure-direct. Recommend pure-direct (one auth path, one outbound surface for Lark streaming). Document in the new ADR.
4. **Emoji/typing indicator** — `LarkNyxClient.UpdateMessageReactionAsync` (Typing/DONE swap) currently runs alongside text-edit streaming. Decide whether CardKit's own `card.settings` streaming-mode toggle replaces it or runs in parallel.

## Effort estimate

| Slice | Days |
|---|---|
| Scope A (`LarkCardKitClient` + DTOs) | 0.5–1 |
| Scope B (`CardPhase` machine + `TurnStreamingCardSink`) | 2–3 |
| Scope C (runner contract split) | 0.5–1 |
| Scope D (fallback wiring) | 0.5 |
| Scope E (tests) | 1.5–2 |
| Scope F (config + ADR) | 0.5 |
| Real-tenant e2e + scope grant coordination | 0.5 |
| **Total** | **6–9 working days (~1.5–2 weeks)** |

Recommended sequencing: land #405 first (1–2 days, scoped, helpful regardless), then this issue as 2 PRs — Scope A+F preliminary, then Scope B–E main.

## References

- Issue #405 — phase state machine + unavailable guard for the text-edit sink (lands first; this issue extends its phase machine for cards)
- PR #374 — original streaming reply implementation
- Commit `31972630` — current 230072 mitigation (interim cap + throttle gate); the band-aid this issue replaces
- OpenClaw reference impl: `ColinLu50/openclaw-lark-stream` — `src/card/cardkit.ts`, `src/card/streaming-card-controller.ts`, `src/card/flush-controller.ts`, `src/card/unavailable-guard.ts`
- Lark CardKit 2.0 docs: `open.feishu.cn/document/uAjLw4CM/ukTMukTMukTM/feishu-cards/streaming-update-of-card-content/streaming-cards-overview`
- NyxID reachability: `backend/src/services/proxy_service.rs` (wildcard `/api/v1/proxy/s/{slug}/{*path}`), `skills/nyxid/references/channels.md` lines 270–307 (channel-relay vs reply token semantics)
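- CLAUDE.md sections: 单线程事实源 (single-threaded source of truth), 中间层状态约束（运行态在 actor 内，不持久化） (middle-layer state constraint: runtime state lives inside the actor and is not persisted), 外部仓库无改动权 (no right to modify external repositories)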
