ai-partner: Cloudflare Worker AI proxy stand-up

**Parent epic:** #1446 (Amicus — AI Study Partner v1)
**Phase:** 1 · **Size:** M

Stand up the Cloudflare Worker that sits between the mobile client and Anthropic. Handles auth (RevenueCat receipt validation), rate limiting, zero-retention flag enforcement, streaming response pass-through, and gap-signal detection (feeds #1471).

---

## Repository location

Create a new top-level directory `ai-proxy/` in the ScriptureDeepDive repo (sibling to `app/` and `_tools/`). Not a separate repo — keeps infra co-located with the code that talks to it.

```
ai-proxy/
├── src/
│   ├── index.ts              — Worker entry point + router
│   ├── auth.ts               — RevenueCat receipt validation
│   ├── rateLimit.ts          — KV + Durable Object rate limiting
│   ├── anthropic.ts          — Anthropic API client (streaming)
│   ├── gapDetection.ts       — parses gap_signal JSON from stream, writes to D1 (stub; full logic in #1471)
│   └── types.ts              — shared types
├── test/
│   ├── auth.test.ts
│   ├── rateLimit.test.ts
│   └── integration.test.ts   — hits a local miniflare instance
├── wrangler.toml             — Worker config (prod + staging)
├── package.json
├── tsconfig.json
└── README.md                 — local dev + deploy instructions
```

## Stack

- Cloudflare Workers (TypeScript)
- Wrangler CLI for deploys
- Miniflare for local testing
- Workers KV for rate-limit counters (simple path)
- Durable Objects only if KV proves insufficient (defer)
- Vitest for tests

---

## Endpoints

### `POST /ai/chat`

Authenticated, streaming, rate-limited. Primary entry point for Amicus conversations.

**Request body:**
```ts
{
  query: string;
  retrieved_chunks: Array<{
    chunk_id: string;
    source_type: string;
    text: string;
    metadata: Record<string, unknown>;
  }>;
  profile_summary: string;
  current_chapter_ref: string | null;
  model_tier: "haiku" | "sonnet";   // client hints; proxy may override for entitlement
  conversation_history: Array<{ role: "user" | "assistant"; content: string }>;
}
```

**Headers:**
- `Authorization: Bearer <revenuecat_receipt>`
- `X-Amicus-Client-Version: <semver>` (for future compat handling)

**Response:** Server-Sent Events (SSE) streaming Anthropic's response. Worker pipes the upstream stream directly; no buffering.

**Gap detection hook:** As the stream passes through, accumulate assistant tokens. When `[DONE]` arrives, parse for the structured gap_signal JSON envelope. If `gap: true`, fire-and-forget a write to Cloudflare D1 (full implementation in #1471; this issue provides the hook point with a stub writer).

### `POST /ai/embed`

Authenticated, non-streaming. Embeds a user query so the client can run vector search.

**Request:** `{ text: string }`
**Response:** `{ vector: number[] }` (1536 floats)

Uses OpenAI `text-embedding-3-small`. Same model as build-time embedding so client-side vectors match corpus vectors.

### `GET /ai/health`

Unauthenticated. Returns `{ status: "ok", version: "<commit_sha>" }`.

---

## Auth (`auth.ts`)

Validate the bearer token as a RevenueCat receipt. On the first request per user per 5 minutes:

1. Extract receipt from `Authorization: Bearer <receipt>`
2. POST to `https://api.revenuecat.com/v1/subscribers/<app_user_id>` with RevenueCat secret key
3. Check the response `subscriber.entitlements` for active entitlement `premium` OR `partner_plus`
4. Cache the result in Workers KV for 5 minutes keyed on receipt hash
5. Attach `entitlement: "premium" | "partner_plus" | null` to the request context

Return `402 Payment Required` if no valid entitlement. Return `401 Unauthorized` if receipt is malformed.

**Secrets (Wrangler env bindings):**
- `REVENUECAT_SECRET_KEY`
- `ANTHROPIC_API_KEY`
- `OPENAI_API_KEY`

All secrets must be set via `wrangler secret put` — never in `wrangler.toml`. README.md documents required secrets.

## Rate limiting (`rateLimit.ts`)

Per-user limits by entitlement:

| Entitlement | Monthly | Daily | 10-min burst |
|---|---|---|---|
| `premium` | 300 | — | 10 |
| `partner_plus` | 1,500 | — | 30 |
| none | 0 (already rejected by auth) | — | — |

Store counters in Workers KV keyed by `rate:{user_id}:{YYYY-MM}` and `burst:{user_id}:{timestamp_10min_bucket}`. TTL auto-expires.

On limit exceeded: return `429 Too Many Requests` with JSON body `{ error: "rate_limit_exceeded", retry_after: <seconds>, upgrade_url?: string }`. Include `upgrade_url` if entitlement is `premium` so client can surface the Amicus+ upgrade path.

## Anthropic client (`anthropic.ts`)

- Model mapping:
  - `model_tier: "haiku"` → `claude-haiku-4-5-20251001`
  - `model_tier: "sonnet"` → `claude-opus-4-7` (or current Sonnet/Opus — verify at implementation time against `/mnt/skills/public/product-self-knowledge`)
- For `partner_plus` entitlement, always use Sonnet regardless of client hint
- Set zero-retention: include header `anthropic-no-retention: true` on every request (verify exact header name against Anthropic API docs at implementation)
- System prompt assembled server-side from template + `profile_summary` + `retrieved_chunks` (never trust client-supplied system prompts)
- Stream responses via SSE; pipe through to client

---

## Logging posture

- **Log:** user_id (hashed), timestamp, endpoint, status_code, input_token_count, output_token_count, entitlement, latency_ms
- **Do NOT log:** query text, response text, retrieved_chunks, profile_summary, any user content
- Structured JSON logs to stdout; Cloudflare observability picks them up

## wrangler.toml structure

```
name = "amicus-proxy"
main = "src/index.ts"
compatibility_date = "2026-04-01"

[env.production]
route = "ai.contentcompanionstudy.com/*"
kv_namespaces = [{ binding = "RATE_LIMITS", id = "..." }]

[env.staging]
route = "ai-staging.contentcompanionstudy.com/*"
kv_namespaces = [{ binding = "RATE_LIMITS", id = "..." }]
```

## README.md must document

- Local dev setup (`npm install`, `wrangler dev`)
- How to run tests
- How to set secrets via `wrangler secret put`
- Deploy commands for staging vs production
- Rollback procedure

---

## Acceptance criteria

- [ ] `wrangler dev` runs locally; all three endpoints respond
- [ ] `/ai/health` returns 200 unauthenticated
- [ ] `/ai/chat` returns 401 with malformed bearer token
- [ ] `/ai/chat` returns 402 with bearer token for non-premium RevenueCat subscriber
- [ ] `/ai/chat` with valid premium entitlement streams Anthropic response end-to-end
- [ ] Rate limit 429 fires correctly when premium user exceeds 10/10min
- [ ] Zero-retention header confirmed set on outbound Anthropic requests (assert in integration test)
- [ ] No request body content appears in Worker logs (verify in integration test)
- [ ] `/ai/embed` returns 1536-dim vector for a sample query
- [ ] Tests: auth, rate limit, and integration all passing under Vitest + Miniflare
- [ ] README covers local dev + deploy + secrets
- [ ] Staging deploy works; production deploy documented but not yet executed (defer until rest of Phase 1 is ready to wire through)

## Out of scope

- Gap signal parsing → write-to-D1: stub here (emit structured log only); full D1 integration in #1471
- Client-side integration → that's #1451
- Supabase sync of conversations → deferred to v2 (Supabase Phase 14)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ai-partner: Cloudflare Worker AI proxy stand-up #1450

Repository location

Stack

Endpoints

`POST /ai/chat`

`POST /ai/embed`

`GET /ai/health`

Auth (`auth.ts`)

Rate limiting (`rateLimit.ts`)

Anthropic client (`anthropic.ts`)

Logging posture

wrangler.toml structure

README.md must document

Acceptance criteria

Out of scope

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Entitlement	Monthly	Daily	10-min burst
`premium`	300	—	10
`partner_plus`	1,500	—	30
none	0 (already rejected by auth)	—	—

ai-partner: Cloudflare Worker AI proxy stand-up #1450

Description

Repository location

Stack

Endpoints

POST /ai/chat

POST /ai/embed

GET /ai/health

Auth (auth.ts)

Rate limiting (rateLimit.ts)

Anthropic client (anthropic.ts)

Logging posture

wrangler.toml structure

README.md must document

Acceptance criteria

Out of scope

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

`POST /ai/chat`

`POST /ai/embed`

`GET /ai/health`

Auth (`auth.ts`)

Rate limiting (`rateLimit.ts`)

Anthropic client (`anthropic.ts`)