harden(webhook): nyxid-relay handler must accept-fast and persist to inbox before processing #449

> Architectural follow-up surfaced in [docs/audit-scorecard/2026-04-27-daily-pipeline-architecture-review.md](../blob/docs/2026-04-27_daily-pipeline-test-reference/docs/audit-scorecard/2026-04-27-daily-pipeline-architecture-review.md) §A2. This is aevatar-side hardening against the fact that NyxID's callback delivery is fire-and-forget: there is no outbox, retry, or DLQ on the NyxID side (see `~/Code/NyxID/backend/src/services/channel_relay_service.rs:284-397`).

## Symptom

`NyxIdChatEndpoints.HandleRelayWebhookAsync` ([NyxIdChatEndpoints.Relay.cs:28](../blob/dev/agents/Aevatar.GAgents.NyxidChat/NyxIdChatEndpoints.Relay.cs)) does the following inline before returning:

1. Read full request body
2. Parse via `NyxIdRelayTransport.Parse`
3. Validate JWT via `NyxIdRelayAuthValidator.ValidateAsync`
4. Resolve canonical scope id (`ResolveRelayScopeIdAsync`)
5. Normalize activity (`Clone()` etc.)
6. Publish to `ConversationGAgent` inbox

Any exception in steps 2-5 returns 4xx to NyxID. NyxID records `channel_messages.callback_status='failed'` and **never retries**. The inbound message is permanently lost.

Issue #398 is a direct symptom ("Lark relay callbacks never reach aevatar — no POST /api/webhooks/nyxid-relay on inbound messages"), though that one is mostly NyxID-side configuration. The aevatar-side issue is that even when NyxID does deliver, any blip in steps 2-5 of our handler is terminal.

## Architectural violations

- CLAUDE.md "single source of truth" (事实源唯一) and "committed events must be observable" (committed event 必须可观察): today the only persistence of an inbound message in aevatar happens once we publish to the ConversationGAgent inbox. Anything that fails before that point cannot be replayed.

## Proposed direction

Two-phase webhook:

**Phase 1 — accept**: persist raw bytes + minimal metadata (`message_id`, headers) to a `RelayInboundInboxGAgent` (or an append-only document store) in O(1) write, then return 202. No parsing, no JWT validation, no normalization. Idempotent on `message_id` (NyxID supplies it in `X-NyxID-Message-Id`).
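A minimal, language-neutral sketch of the accept path, written in Python rather than the handler's C#. `InboxRow`, `InMemoryInbox`, and `accept` are hypothetical names used only for illustration, and the content-hash fallback for a missing `X-NyxID-Message-Id` header is an assumption, not something this issue specifies.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass
class InboxRow:
    message_id: str          # dedupe key
    raw_body: bytes          # untouched request bytes
    headers: Dict[str, str]  # kept so phase 2 can validate the JWT later
    received_at: float
    status: str = "pending"


class InMemoryInbox:
    """Hypothetical stand-in for RelayInboundInboxGAgent or a document store."""

    def __init__(self) -> None:
        self._rows: Dict[str, InboxRow] = {}

    def try_append(self, row: InboxRow) -> bool:
        """Insert-if-absent on message_id. Returns False for a duplicate."""
        if row.message_id in self._rows:
            return False
        self._rows[row.message_id] = row
        return True

    def get(self, message_id: str) -> Optional[InboxRow]:
        return self._rows.get(message_id)

    def __len__(self) -> int:
        return len(self._rows)


def accept(inbox: InMemoryInbox, headers: Mapping[str, str], body: bytes) -> int:
    """Phase 1: persist and ack. No parsing, JWT validation, or normalization."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    # Assumption: if the header is missing, key on a content hash instead of
    # rejecting, so the message is still captured.
    message_id = lowered.get("x-nyxid-message-id") or (
        "sha256:" + hashlib.sha256(body).hexdigest()
    )
    inbox.try_append(InboxRow(message_id, body, lowered, time.time()))
    # Duplicates are acknowledged too, so re-delivery is harmless.
    return 202
```

In a real store the insert-if-absent would need to be atomic (for example a unique key on `message_id`); the dictionary check here only models the behavior.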

**Phase 2 — process**: async worker (Orleans grain timer / dedicated consumer actor) picks up rows from the inbox and runs the existing parse → JWT validate → scope resolve → normalize → publish-to-ConversationGAgent pipeline. Failures stay in the inbox (with attempt count + last error), can be replayed manually or dead-lettered.
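The retry bookkeeping described above could look roughly like the following Python sketch. It is an illustration under assumptions, not the worker design: `process_pending`, the status names, and the `MAX_ATTEMPTS` value are all hypothetical, and `pipeline` stands in for the existing parse → JWT validate → scope resolve → normalize → publish chain.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

MAX_ATTEMPTS = 5  # assumption: the issue does not fix a retry budget


@dataclass
class InboxRow:
    message_id: str
    raw_body: bytes
    status: str = "pending"  # pending | failed | done | dead_letter (hypothetical names)
    attempts: int = 0
    last_error: str = ""


def process_pending(
    rows: Iterable[InboxRow],
    pipeline: Callable[[InboxRow], None],
    max_attempts: int = MAX_ATTEMPTS,
) -> None:
    """Run the pipeline over retryable rows; failures stay in the inbox."""
    for row in rows:
        if row.status not in ("pending", "failed"):
            continue  # done and dead-lettered rows are left alone
        row.attempts += 1
        try:
            pipeline(row)
        except Exception as exc:
            # Keep the row, record why it failed, and dead-letter once the
            # retry budget is spent so it can be inspected or replayed by hand.
            row.last_error = f"{type(exc).__name__}: {exc}"
            row.status = "dead_letter" if row.attempts >= max_attempts else "failed"
        else:
            row.status = "done"
            row.last_error = ""
```

Because the row is updated in place rather than removed, a failed message remains visible with its attempt count and last error until someone replays or purges it.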

Knock-on benefits:
- Authentication failures still leave an audit trail (currently 401 + log line, payload discarded).
- Operationally inspectable: "why didn't /daily work" → look in the inbox, not in pod stdout.
- Forward-compatible with eventual NyxID-side retry: phase-1 dedupe by `message_id` makes re-delivery harmless.
- Defensive against multi-pod ingress confusion (issue #398 hypothesis 3): the inbox is shared state, not pod-local.

What this does NOT solve:
- NyxID never delivering at all (issue #398 hypotheses 1/2: stale `callback_url`, broken Lark→NyxID subscription). Those are NyxID-side / config-side and out of scope.

## Acceptance

- [ ] Webhook returns 202 in under 25 ms at p99 (persist + ack only).
- [ ] No exception in parse / JWT / scope-resolve causes message loss.
- [ ] Replaying a single inbox row produces identical downstream effects (idempotent by `message_id`).
- [ ] Test: inject a parse failure mid-handler; verify the message is in the inbox with `status=parse_failed`, retryable.
- [ ] Inbox storage retention policy defined (e.g. 7 days successful, 30 days failed).
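The parse-failure and replay criteria above can be sketched as a small Python scenario. Everything here is a toy under stated assumptions: `process_row` is a hypothetical two-stage pipeline (parse, then publish), the inbox row is a plain dict, the payload shape with a `text` field is invented, and downstream dedupe is modeled with a `seen` set of `message_id` values.

```python
import json
from typing import Callable, Dict, List, Set


def process_row(
    row: Dict,
    parse: Callable[[bytes], dict],
    published: List[str],
    seen: Set[str],
) -> None:
    """Toy phase-2 step: parse the raw bytes, then publish at most once."""
    try:
        payload = parse(row["raw_body"])
    except Exception as exc:
        # The row is kept and marked retryable instead of being dropped.
        row["status"] = "parse_failed"
        row["last_error"] = f"{type(exc).__name__}: {exc}"
        return
    # Downstream dedupe on message_id makes a replay a no-op.
    if row["message_id"] not in seen:
        seen.add(row["message_id"])
        published.append(payload["text"])
    row["status"] = "done"


def broken_parse(raw: bytes) -> dict:
    """Injected fault standing in for a parser bug or transient blip."""
    raise ValueError("injected parse failure")


row = {"message_id": "m-1", "raw_body": b'{"text": "/daily"}', "status": "pending"}
published: List[str] = []
seen: Set[str] = set()

process_row(row, broken_parse, published, seen)  # row stays in the inbox as parse_failed
process_row(row, json.loads, published, seen)    # replay once the fault clears
process_row(row, json.loads, published, seen)    # a second replay publishes nothing new
```

The point of the scenario is that the raw bytes never change between attempts; only the processing side is fixed and the row is replayed.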

## Affected files

- `agents/Aevatar.GAgents.NyxidChat/NyxIdChatEndpoints.Relay.cs`: phase-1 only
- new: `agents/Aevatar.GAgents.NyxidChat/RelayInboundInboxGAgent.cs` (or non-actor inbox store)
- new: phase-2 consumer (worker grain or dedicated dispatcher)
- `channel_runtime_messages.proto`: inbox row contract

## Related

- #398 — bug-level, mostly NyxID/Lark/config side. This issue is the aevatar-side resilience.
