Parent epic: #1446 (Amicus — AI Study Partner v1)
Phase: 1 · Size: M
Stand up the Cloudflare Worker that sits between the mobile client and Anthropic. Handles auth (RevenueCat receipt validation), rate limiting, zero-retention flag enforcement, streaming response pass-through, and gap-signal detection (feeds #1471).
Repository location
Create a new top-level directory ai-proxy/ in the ScriptureDeepDive repo (sibling to app/ and _tools/). Not a separate repo — keeps infra co-located with the code that talks to it.
ai-proxy/
├── src/
│ ├── index.ts — Worker entry point + router
│ ├── auth.ts — RevenueCat receipt validation
│ ├── rateLimit.ts — KV + Durable Object rate limiting
│ ├── anthropic.ts — Anthropic API client (streaming)
│ ├── gapDetection.ts — parses gap_signal JSON from stream, writes to D1 (stub; full logic in #1471)
│ └── types.ts — shared types
├── test/
│ ├── auth.test.ts
│ ├── rateLimit.test.ts
│ └── integration.test.ts — hits a local miniflare instance
├── wrangler.toml — Worker config (prod + staging)
├── package.json
├── tsconfig.json
└── README.md — local dev + deploy instructions
Stack
- Cloudflare Workers (TypeScript)
- Wrangler CLI for deploys
- Miniflare for local testing
- Workers KV for rate-limit counters (simple path)
- Durable Objects only if KV proves insufficient (defer)
- Vitest for tests
Endpoints
POST /ai/chat
Authenticated, streaming, rate-limited. Primary entry point for Amicus conversations.
Request body:
{
query: string;
retrieved_chunks: Array<{
chunk_id: string;
source_type: string;
text: string;
metadata: Record<string, unknown>;
}>;
profile_summary: string;
current_chapter_ref: string | null;
model_tier: "haiku" | "sonnet"; // client hints; proxy may override for entitlement
conversation_history: Array<{ role: "user" | "assistant"; content: string }>;
}
Headers:
Authorization: Bearer <revenuecat_receipt>
X-Amicus-Client-Version: <semver> (for future compat handling)
Response: Server-Sent Events (SSE) streaming Anthropic's response. Worker pipes the upstream stream directly; no buffering.
Gap detection hook: As the stream passes through, accumulate assistant tokens. When [DONE] arrives, parse for the structured gap_signal JSON envelope. If gap: true, fire-and-forget a write to Cloudflare D1 (full implementation in #1471; this issue provides the hook point with a stub writer).
POST /ai/embed
Authenticated, non-streaming. Embeds a user query so the client can run vector search.
Request: { text: string }
Response: { vector: number[] } (1536 floats)
Uses OpenAI text-embedding-3-small. Same model as build-time embedding so client-side vectors match corpus vectors.
GET /ai/health
Unauthenticated. Returns { status: "ok", version: "<commit_sha>" }.
Auth (auth.ts)
Validate the bearer token as a RevenueCat receipt. On the first request per user per 5 minutes:
- Extract receipt from
Authorization: Bearer <receipt>
- POST to
https://api.revenuecat.com/v1/subscribers/<app_user_id> with RevenueCat secret key
- Check the response
subscriber.entitlements for active entitlement premium OR partner_plus
- Cache the result in Workers KV for 5 minutes keyed on receipt hash
- Attach
entitlement: "premium" | "partner_plus" | null to the request context
Return 402 Payment Required if no valid entitlement. Return 401 Unauthorized if receipt is malformed.
Secrets (Wrangler env bindings):
REVENUECAT_SECRET_KEY
ANTHROPIC_API_KEY
OPENAI_API_KEY
All secrets must be set via wrangler secret put — never in wrangler.toml. README.md documents required secrets.
Rate limiting (rateLimit.ts)
Per-user limits by entitlement:
| Entitlement |
Monthly |
Daily |
10-min burst |
premium |
300 |
— |
10 |
partner_plus |
1,500 |
— |
30 |
| none |
0 (already rejected by auth) |
— |
— |
Store counters in Workers KV keyed by rate:{user_id}:{YYYY-MM} and burst:{user_id}:{timestamp_10min_bucket}. TTL auto-expires.
On limit exceeded: return 429 Too Many Requests with JSON body { error: "rate_limit_exceeded", retry_after: <seconds>, upgrade_url?: string }. Include upgrade_url if entitlement is premium so client can surface the Amicus+ upgrade path.
Anthropic client (anthropic.ts)
- Model mapping:
model_tier: "haiku" → claude-haiku-4-5-20251001
model_tier: "sonnet" → claude-opus-4-7 (or current Sonnet/Opus — verify at implementation time against /mnt/skills/public/product-self-knowledge)
- For
partner_plus entitlement, always use Sonnet regardless of client hint
- Set zero-retention: include header
anthropic-no-retention: true on every request (verify exact header name against Anthropic API docs at implementation)
- System prompt assembled server-side from template +
profile_summary + retrieved_chunks (never trust client-supplied system prompts)
- Stream responses via SSE; pipe through to client
Logging posture
- Log: user_id (hashed), timestamp, endpoint, status_code, input_token_count, output_token_count, entitlement, latency_ms
- Do NOT log: query text, response text, retrieved_chunks, profile_summary, any user content
- Structured JSON logs to stdout; Cloudflare observability picks them up
wrangler.toml structure
name = "amicus-proxy"
main = "src/index.ts"
compatibility_date = "2026-04-01"
[env.production]
route = "ai.contentcompanionstudy.com/*"
kv_namespaces = [{ binding = "RATE_LIMITS", id = "..." }]
[env.staging]
route = "ai-staging.contentcompanionstudy.com/*"
kv_namespaces = [{ binding = "RATE_LIMITS", id = "..." }]
README.md must document
- Local dev setup (
npm install, wrangler dev)
- How to run tests
- How to set secrets via
wrangler secret put
- Deploy commands for staging vs production
- Rollback procedure
Acceptance criteria
Out of scope
Parent epic: #1446 (Amicus — AI Study Partner v1)
Phase: 1 · Size: M
Stand up the Cloudflare Worker that sits between the mobile client and Anthropic. Handles auth (RevenueCat receipt validation), rate limiting, zero-retention flag enforcement, streaming response pass-through, and gap-signal detection (feeds #1471).
Repository location
Create a new top-level directory
ai-proxy/in the ScriptureDeepDive repo (sibling toapp/and_tools/). Not a separate repo — keeps infra co-located with the code that talks to it.Stack
Endpoints
POST /ai/chatAuthenticated, streaming, rate-limited. Primary entry point for Amicus conversations.
Request body:
Headers:
Authorization: Bearer <revenuecat_receipt>X-Amicus-Client-Version: <semver>(for future compat handling)Response: Server-Sent Events (SSE) streaming Anthropic's response. Worker pipes the upstream stream directly; no buffering.
Gap detection hook: As the stream passes through, accumulate assistant tokens. When
[DONE]arrives, parse for the structured gap_signal JSON envelope. Ifgap: true, fire-and-forget a write to Cloudflare D1 (full implementation in #1471; this issue provides the hook point with a stub writer).POST /ai/embedAuthenticated, non-streaming. Embeds a user query so the client can run vector search.
Request:
{ text: string }Response:
{ vector: number[] }(1536 floats)Uses OpenAI
text-embedding-3-small. Same model as build-time embedding so client-side vectors match corpus vectors.GET /ai/healthUnauthenticated. Returns
{ status: "ok", version: "<commit_sha>" }.Auth (
auth.ts)Validate the bearer token as a RevenueCat receipt. On the first request per user per 5 minutes:
Authorization: Bearer <receipt>https://api.revenuecat.com/v1/subscribers/<app_user_id>with RevenueCat secret keysubscriber.entitlementsfor active entitlementpremiumORpartner_plusentitlement: "premium" | "partner_plus" | nullto the request contextReturn
402 Payment Requiredif no valid entitlement. Return401 Unauthorizedif receipt is malformed.Secrets (Wrangler env bindings):
REVENUECAT_SECRET_KEYANTHROPIC_API_KEYOPENAI_API_KEYAll secrets must be set via
wrangler secret put— never inwrangler.toml. README.md documents required secrets.Rate limiting (
rateLimit.ts)Per-user limits by entitlement:
premiumpartner_plusStore counters in Workers KV keyed by
rate:{user_id}:{YYYY-MM}andburst:{user_id}:{timestamp_10min_bucket}. TTL auto-expires.On limit exceeded: return
429 Too Many Requestswith JSON body{ error: "rate_limit_exceeded", retry_after: <seconds>, upgrade_url?: string }. Includeupgrade_urlif entitlement ispremiumso client can surface the Amicus+ upgrade path.Anthropic client (
anthropic.ts)model_tier: "haiku"→claude-haiku-4-5-20251001model_tier: "sonnet"→claude-opus-4-7(or current Sonnet/Opus — verify at implementation time against/mnt/skills/public/product-self-knowledge)partner_plusentitlement, always use Sonnet regardless of client hintanthropic-no-retention: trueon every request (verify exact header name against Anthropic API docs at implementation)profile_summary+retrieved_chunks(never trust client-supplied system prompts)Logging posture
wrangler.toml structure
README.md must document
npm install,wrangler dev)wrangler secret putAcceptance criteria
wrangler devruns locally; all three endpoints respond/ai/healthreturns 200 unauthenticated/ai/chatreturns 401 with malformed bearer token/ai/chatreturns 402 with bearer token for non-premium RevenueCat subscriber/ai/chatwith valid premium entitlement streams Anthropic response end-to-end/ai/embedreturns 1536-dim vector for a sample queryOut of scope