Skip to content

ai-partner: Cloudflare Worker AI proxy stand-up #1450

@CraigBuckmaster

Description

@CraigBuckmaster

Parent epic: #1446 (Amicus — AI Study Partner v1)
Phase: 1 · Size: M

Stand up the Cloudflare Worker that sits between the mobile client and Anthropic. Handles auth (RevenueCat receipt validation), rate limiting, zero-retention flag enforcement, streaming response pass-through, and gap-signal detection (feeds #1471).


Repository location

Create a new top-level directory ai-proxy/ in the ScriptureDeepDive repo (sibling to app/ and _tools/). Not a separate repo — keeps infra co-located with the code that talks to it.

ai-proxy/
├── src/
│   ├── index.ts              — Worker entry point + router
│   ├── auth.ts               — RevenueCat receipt validation
│   ├── rateLimit.ts          — KV + Durable Object rate limiting
│   ├── anthropic.ts          — Anthropic API client (streaming)
│   ├── gapDetection.ts       — parses gap_signal JSON from stream, writes to D1 (stub; full logic in #1471)
│   └── types.ts              — shared types
├── test/
│   ├── auth.test.ts
│   ├── rateLimit.test.ts
│   └── integration.test.ts   — hits a local miniflare instance
├── wrangler.toml             — Worker config (prod + staging)
├── package.json
├── tsconfig.json
└── README.md                 — local dev + deploy instructions

Stack

  • Cloudflare Workers (TypeScript)
  • Wrangler CLI for deploys
  • Miniflare for local testing
  • Workers KV for rate-limit counters (simple path)
  • Durable Objects only if KV proves insufficient (defer)
  • Vitest for tests

Endpoints

POST /ai/chat

Authenticated, streaming, rate-limited. Primary entry point for Amicus conversations.

Request body:

{
  query: string;
  retrieved_chunks: Array<{
    chunk_id: string;
    source_type: string;
    text: string;
    metadata: Record<string, unknown>;
  }>;
  profile_summary: string;
  current_chapter_ref: string | null;
  model_tier: "haiku" | "sonnet";   // client hints; proxy may override for entitlement
  conversation_history: Array<{ role: "user" | "assistant"; content: string }>;
}

Headers:

  • Authorization: Bearer <revenuecat_receipt>
  • X-Amicus-Client-Version: <semver> (for future compat handling)

Response: Server-Sent Events (SSE) streaming Anthropic's response. Worker pipes the upstream stream directly; no buffering.

Gap detection hook: As the stream passes through, accumulate assistant tokens. When [DONE] arrives, parse for the structured gap_signal JSON envelope. If gap: true, fire-and-forget a write to Cloudflare D1 (full implementation in #1471; this issue provides the hook point with a stub writer).

POST /ai/embed

Authenticated, non-streaming. Embeds a user query so the client can run vector search.

Request: { text: string }
Response: { vector: number[] } (1536 floats)

Uses OpenAI text-embedding-3-small. Same model as build-time embedding so client-side vectors match corpus vectors.

GET /ai/health

Unauthenticated. Returns { status: "ok", version: "<commit_sha>" }.


Auth (auth.ts)

Validate the bearer token as a RevenueCat receipt. On the first request per user per 5 minutes:

  1. Extract receipt from Authorization: Bearer <receipt>
  2. POST to https://api.revenuecat.com/v1/subscribers/<app_user_id> with RevenueCat secret key
  3. Check the response subscriber.entitlements for active entitlement premium OR partner_plus
  4. Cache the result in Workers KV for 5 minutes keyed on receipt hash
  5. Attach entitlement: "premium" | "partner_plus" | null to the request context

Return 402 Payment Required if no valid entitlement. Return 401 Unauthorized if receipt is malformed.

Secrets (Wrangler env bindings):

  • REVENUECAT_SECRET_KEY
  • ANTHROPIC_API_KEY
  • OPENAI_API_KEY

All secrets must be set via wrangler secret put — never in wrangler.toml. README.md documents required secrets.

Rate limiting (rateLimit.ts)

Per-user limits by entitlement:

Entitlement Monthly Daily 10-min burst
premium 300 10
partner_plus 1,500 30
none 0 (already rejected by auth)

Store counters in Workers KV keyed by rate:{user_id}:{YYYY-MM} and burst:{user_id}:{timestamp_10min_bucket}. TTL auto-expires.

On limit exceeded: return 429 Too Many Requests with JSON body { error: "rate_limit_exceeded", retry_after: <seconds>, upgrade_url?: string }. Include upgrade_url if entitlement is premium so client can surface the Amicus+ upgrade path.

Anthropic client (anthropic.ts)

  • Model mapping:
    • model_tier: "haiku"claude-haiku-4-5-20251001
    • model_tier: "sonnet"claude-opus-4-7 (or current Sonnet/Opus — verify at implementation time against /mnt/skills/public/product-self-knowledge)
  • For partner_plus entitlement, always use Sonnet regardless of client hint
  • Set zero-retention: include header anthropic-no-retention: true on every request (verify exact header name against Anthropic API docs at implementation)
  • System prompt assembled server-side from template + profile_summary + retrieved_chunks (never trust client-supplied system prompts)
  • Stream responses via SSE; pipe through to client

Logging posture

  • Log: user_id (hashed), timestamp, endpoint, status_code, input_token_count, output_token_count, entitlement, latency_ms
  • Do NOT log: query text, response text, retrieved_chunks, profile_summary, any user content
  • Structured JSON logs to stdout; Cloudflare observability picks them up

wrangler.toml structure

name = "amicus-proxy"
main = "src/index.ts"
compatibility_date = "2026-04-01"

[env.production]
route = "ai.contentcompanionstudy.com/*"
kv_namespaces = [{ binding = "RATE_LIMITS", id = "..." }]

[env.staging]
route = "ai-staging.contentcompanionstudy.com/*"
kv_namespaces = [{ binding = "RATE_LIMITS", id = "..." }]

README.md must document

  • Local dev setup (npm install, wrangler dev)
  • How to run tests
  • How to set secrets via wrangler secret put
  • Deploy commands for staging vs production
  • Rollback procedure

Acceptance criteria

  • wrangler dev runs locally; all three endpoints respond
  • /ai/health returns 200 unauthenticated
  • /ai/chat returns 401 with malformed bearer token
  • /ai/chat returns 402 with bearer token for non-premium RevenueCat subscriber
  • /ai/chat with valid premium entitlement streams Anthropic response end-to-end
  • Rate limit 429 fires correctly when premium user exceeds 10/10min
  • Zero-retention header confirmed set on outbound Anthropic requests (assert in integration test)
  • No request body content appears in Worker logs (verify in integration test)
  • /ai/embed returns 1536-dim vector for a sample query
  • Tests: auth, rate limit, and integration all passing under Vitest + Miniflare
  • README covers local dev + deploy + secrets
  • Staging deploy works; production deploy documented but not yet executed (defer until rest of Phase 1 is ready to wire through)

Out of scope

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions