fix(kiloclaw) webhook agent ingest chat delivery #3140

Merged
St0rmz1 merged 5 commits into main from fix/webhook-agent-ingest-chat-delivery
May 8, 2026

Conversation

@St0rmz1 St0rmz1 commented May 8, 2026

Summary

Webhook deliveries that target a KiloClaw chat thread have been failing in production with "Authentication required" errors since the Stream Chat backend was removed. The dead path was a POST from the webhook agent ingest queue consumer to a kiloclaw worker route (/api/platform/send-chat-message) that no longer exists. Captured webhook requests reach the trigger requests page in the cloud dashboard, attempt delivery, and end up in the failed state.

This change replaces the dead HTTP fetch with a service binding RPC into the kilo-chat worker. A new KiloChatService.postMessageAsUser method resolves the conversation between the user and the bot, optionally creates one on first delivery, and posts the rendered prompt via the existing createMessageFor path. The rest of the pipeline (capture, queue, trigger config, token minting, the cloud agent delivery target) is unchanged.
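As a rough illustration of the flow described above, the following is a minimal sketch and not the code from this PR. The parameter and result shapes, the `Deps` interface, and the numeric limit are assumptions made for the example; only the names postMessageAsUser, createMessageFor, userOwnsSandbox, autoCreateConversation, sandboxId, source, and the three error codes come from this description. The guard order follows the guarantees listed under "Reliability and validation guarantees".

```typescript
// Hypothetical shapes: the PR description does not show the real signatures.
type PostMessageAsUserParams = {
  userId: string;
  sandboxId: string;
  text: string;
  source: string; // free-form label used only for log attribution
  autoCreateConversation: boolean;
};

type PostMessageAsUserResult =
  | { ok: true; conversationId: string; messageId: string }
  | { ok: false; error: "invalid_request" | "forbidden" | "no_conversation" };

// Placeholder value: the real MESSAGE_TEXT_MAX_CHARS is not stated in the PR.
const MESSAGE_TEXT_MAX_CHARS = 8000;

// Hypothetical dependencies standing in for the ownership check,
// conversation storage, and the existing createMessageFor path.
interface Deps {
  userOwnsSandbox(userId: string, sandboxId: string): Promise<boolean>;
  findConversation(userId: string, sandboxId: string): Promise<string | null>;
  createConversation(userId: string, sandboxId: string): Promise<string>;
  createMessageFor(conversationId: string, userId: string, text: string): Promise<string>;
}

async function postMessageAsUser(
  deps: Deps,
  p: PostMessageAsUserParams,
): Promise<PostMessageAsUserResult> {
  // 1. Validate content before any conversation lookup or creation.
  //    (A simple length check stands in for the shared textBlockSchema.)
  if (p.text.length === 0 || p.text.length > MESSAGE_TEXT_MAX_CHARS) {
    return { ok: false, error: "invalid_request" };
  }
  // 2. Check current sandbox ownership on every call,
  //    including when a conversation already exists.
  if (!(await deps.userOwnsSandbox(p.userId, p.sandboxId))) {
    return { ok: false, error: "forbidden" };
  }
  // 3. Resolve the user/bot conversation, creating it only if allowed.
  let conversationId = await deps.findConversation(p.userId, p.sandboxId);
  if (conversationId === null) {
    if (!p.autoCreateConversation) {
      return { ok: false, error: "no_conversation" };
    }
    conversationId = await deps.createConversation(p.userId, p.sandboxId);
  }
  // 4. Post the rendered prompt as the user.
  const messageId = await deps.createMessageFor(conversationId, p.userId, p.text);
  console.log(`postMessageAsUser source=${p.source} conversation=${conversationId}`);
  return { ok: true, conversationId, messageId };
}
```

Injecting the dependencies keeps the sketch testable with in-memory fakes; the real service presumably reaches its storage and ownership check directly.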

The new RPC is intentionally generic. The source parameter is a free-form string used for log attribution, so future internal flows that want to post user messages into the chat thread (for example, an onboarding flow that warms the bot before the user opens chat for the first time) can call the same primitive without a contract change.

Reliability and validation guarantees:

  • The RPC validates message content against the same shared textBlockSchema used by the public createMessage HTTP route, returning invalid_request for empty or oversized bodies before any conversation lookup or creation. Webhook payloads can be up to 256KB, so without this the persisted chat content could exceed MESSAGE_TEXT_MAX_CHARS.
  • The webhook consumer wraps the RPC call in a try/catch so a thrown service binding error explicitly fails the request rather than leaving it stuck in inprogress. Without the catch, the outer queue retry guard would silently ack the next attempt and leave the request stuck.
  • The RPC enforces userOwnsSandbox unconditionally, including the path where a conversation already exists. The previous behaviour trusted the ConversationDO membership snapshot, which is not equivalent to current sandbox ownership.
  • The kiloclaw_instances lookup in webhook agent ingest is now scoped by user id. A stale instance id, or one that belongs to another tenant, returns a clean "instance not found" instead of resolving to a different user's sandbox.
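The consumer-side try/catch from the list above could look roughly like the sketch below. It is an illustration under assumptions, not the consumer code from this PR: `WebhookRequest`, `RpcResult`, `deliverToChat`, and the set of status values are hypothetical; only the inprogress and failed states and the "KiloClaw Chat delivery failed" message prefix appear in this description.

```typescript
// Hypothetical status values; the PR only mentions "inprogress", failed, and success.
type RequestStatus = "inprogress" | "success" | "failed";

// Hypothetical record for a captured webhook request.
interface WebhookRequest {
  id: string;
  status: RequestStatus;
  errorMessage?: string;
}

// Hypothetical, simplified RPC result.
type RpcResult = { ok: true } | { ok: false; error: string };

async function deliverToChat(
  request: WebhookRequest,
  callRpc: () => Promise<RpcResult>,
): Promise<WebhookRequest> {
  try {
    const result = await callRpc();
    if (result.ok) {
      return { ...request, status: "success" };
    }
    // The RPC answered, but with an error code: record it as a failure.
    return {
      ...request,
      status: "failed",
      errorMessage: `KiloClaw Chat delivery failed: ${result.error}`,
    };
  } catch (err) {
    // A thrown service binding error must fail the request explicitly;
    // otherwise the request would stay in "inprogress".
    const message = err instanceof Error ? err.message : String(err);
    return {
      ...request,
      status: "failed",
      errorMessage: `KiloClaw Chat delivery failed: ${message}`,
    };
  }
}
```

Either way the request reaches a terminal state, so a later queue retry has nothing left to silently acknowledge.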

The cross-service RPC contract types live in the shared @kilocode/kilo-chat package so producer (kilo-chat) and consumer (webhook agent ingest) share one source of truth. Drift in either direction now fails the build instead of passing wrong data at runtime.
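To make the "drift fails the build" point concrete, here is a small sketch of how a shared contract type can be enforced on both sides. The type and interface names, the result shape, and the failure descriptions are assumptions for illustration; the actual exports of @kilocode/kilo-chat are not shown in this PR description.

```typescript
// --- Shared contract (the role @kilocode/kilo-chat plays in this PR) ---
type PostMessageAsUserError = "invalid_request" | "forbidden" | "no_conversation";

type PostMessageAsUserResult =
  | { ok: true; messageId: string }
  | { ok: false; error: PostMessageAsUserError };

interface KiloChatRpc {
  postMessageAsUser(params: { userId: string; sandboxId: string; text: string }): Promise<PostMessageAsUserResult>;
}

// --- Producer side: `implements` makes a signature mismatch a compile error ---
class FakeKiloChatService implements KiloChatRpc {
  async postMessageAsUser(params: {
    userId: string;
    sandboxId: string;
    text: string;
  }): Promise<PostMessageAsUserResult> {
    return params.text.length > 0
      ? { ok: true, messageId: "m1" }
      : { ok: false, error: "invalid_request" };
  }
}

// --- Consumer side: the `never` check makes an unhandled new error code a compile error ---
function describeFailure(error: PostMessageAsUserError): string {
  switch (error) {
    case "invalid_request":
      return "message was empty or too long";
    case "forbidden":
      return "user does not own the sandbox";
    case "no_conversation":
      return "no conversation exists and auto-create is off";
    default: {
      const unreachable: never = error;
      return unreachable;
    }
  }
}
```

If a fourth error code were added to the shared union, `describeFailure` would stop compiling until the consumer handled it.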

Verification

  • Reproduced the regression by inspecting the trigger requests page in the cloud dashboard for a real production trigger. Before the fix, the request row reads "Failed: KiloClaw Chat delivery failed: Authentication required". After applying the change in a local test, the same flow resolves the conversation and persists a chat message.
  • Direct unit coverage on postMessageAsUser: auto-creates a conversation on first delivery, reuses an existing one on subsequent deliveries, returns no_conversation when autoCreateConversation: false and none exists, returns forbidden when the user does not own the sandbox, returns forbidden when ownership is revoked between conversation creation and a later post (covers the existing-conversation bypass), and returns invalid_request for empty or oversized message bodies.
  • All existing kilo-chat suites (326 tests) and webhook agent ingest suites (55 tests) pass.
  • pnpm run lint, pnpm run typecheck, and pnpm run format:check all clean across both services and the shared package.
  • After deploy, send a fresh POST to a live trigger URL. The request row in the cloud dashboard should show "Success" and the rendered prompt should appear in the user's chat thread.

Visual Changes

N/A

Reviewer Notes

The new RPC accepts sandboxId rather than kiloclawInstanceId. The webhook consumer does the kiloclaw_instances translation on its side, which keeps kilo-chat free of any kiloclaw-specific schema knowledge. The consumer already has database access for upstream trigger lookups, so the translation adds no new dependency there.
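The consumer-side translation, combined with the user-scoped lookup mentioned in the summary, can be sketched as follows. This is an in-memory stand-in rather than the real database query; the row shape, column names, and `resolveSandboxId` are hypothetical.

```typescript
// Hypothetical row shape for kiloclaw_instances; the real columns are not shown in the PR.
interface KiloclawInstanceRow {
  id: string;
  userId: string;
  sandboxId: string;
}

// Translate kiloclawInstanceId to sandboxId, matching on BOTH instance id and user id.
// An instance id owned by someone else is treated exactly like a missing one.
function resolveSandboxId(
  rows: readonly KiloclawInstanceRow[],
  userId: string,
  kiloclawInstanceId: string,
): string | null {
  const row = rows.find((r) => r.id === kiloclawInstanceId && r.userId === userId);
  return row ? row.sandboxId : null;
}

// Usage: a null result maps to the "instance not found" failure.
const rows: KiloclawInstanceRow[] = [
  { id: "inst-1", userId: "alice", sandboxId: "sbx-a" },
  { id: "inst-2", userId: "bob", sandboxId: "sbx-b" },
];
const own = resolveSandboxId(rows, "alice", "inst-1"); // "sbx-a"
const foreign = resolveSandboxId(rows, "alice", "inst-2"); // null
console.log(own, foreign);
```

In a real query the same idea would be a WHERE clause on both columns rather than a filter after fetching by instance id alone.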

The getKiloChat(env) cast helper in services/webhook-agent-ingest/src/kilo-chat-binding.ts exists because wrangler types only emits a generic Service for service bindings. Centralising the cast keeps call sites clean and gives one place to update if the binding shape ever moves.
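A cast helper of this kind might look like the sketch below. It is not the contents of kilo-chat-binding.ts: `GenericService` is a local stand-in for the generic type that wrangler types emits, and the `KILO_CHAT` binding name and the `KiloChatBinding` shape are assumptions.

```typescript
// Local stand-in for the generic service binding type emitted by `wrangler types`.
type GenericService = { fetch(input: string): Promise<unknown> };

// Hypothetical typed view of the RPC surface the consumer needs.
interface KiloChatBinding {
  postMessageAsUser(params: {
    userId: string;
    sandboxId: string;
    text: string;
    source: string;
  }): Promise<{ ok: boolean }>;
}

// Hypothetical Env; the real binding name is not given in the PR description.
interface Env {
  KILO_CHAT: GenericService;
}

// Single place where the untyped binding is narrowed to the RPC interface.
// If the binding shape changes, only this function needs updating.
function getKiloChat(env: Env): KiloChatBinding {
  return env.KILO_CHAT as unknown as KiloChatBinding;
}
```

Call sites then read `getKiloChat(env).postMessageAsUser(...)` without repeating the cast. The cast itself is unchecked, so it is only as trustworthy as the shared contract types behind it.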

kilo-code-bot Bot commented May 8, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (4 files)
  • services/kilo-chat/src/index.ts
  • services/kilo-chat/src/services/post-message-as-user.ts
  • services/webhook-agent-ingest/src/queue-consumer.test.ts
  • services/webhook-agent-ingest/src/queue-consumer.ts

Reviewed by gpt-5.5-20260423 · 516,496 tokens

@St0rmz1 St0rmz1 changed the title from "Fix/webhook agent ingest chat delivery" to "fix(kiloclaw) webhook agent ingest chat delivery" May 8, 2026
@St0rmz1 St0rmz1 merged commit 67e24ea into main May 8, 2026
43 checks passed
@St0rmz1 St0rmz1 deleted the fix/webhook-agent-ingest-chat-delivery branch May 8, 2026 16:50