feat(ai-partner): Phase 1 E2E smoke test behind feature flag (#1453)#1481

Merged
CraigBuckmaster merged 4 commits into master from claude/issue-1453-e2e-smoke-test on Apr 17, 2026

Conversation

@CraigBuckmaster
Owner

Closes #1453. Depends on #1447, #1448, #1450, #1451, #1452.

Summary

  • New dev-only harness for the Phase 1 end-to-end smoke test.
  • 10 canned queries span multi-source synthesis, lexicon lookup, debate topic, cross-testament typology, journey stops, corpus gap (q06, expected to fire gap_signal.gap=true), meta-FAQ, character study, archaeology, and textual transmission.
  • Hidden in production: featureFlags.AMICUS_SMOKE_TEST is true only when __DEV__ is set and EXPO_PUBLIC_AMICUS_SMOKE=true.
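
As a rough illustration of the gating rule above, the flag could be evaluated as in the sketch below. Only AMICUS_SMOKE_TEST, __DEV__ and EXPO_PUBLIC_AMICUS_SMOKE come from this PR; `buildFeatureFlags` and the `FlagEnv` shape are hypothetical, introduced so the rule can be exercised without a React Native runtime. It is not the contents of featureFlags.ts.

```typescript
// Hypothetical stand-in for the runtime inputs: `isDev` plays the role of
// __DEV__, `vars` plays the role of the EXPO_PUBLIC_* environment variables.
type FlagEnv = { isDev: boolean; vars: Record<string, string | undefined> };

// Builds the flag registry from explicit inputs so it can be tested in isolation.
function buildFeatureFlags(env: FlagEnv) {
  return {
    // On only in a dev build AND when the variable is exactly the string "true".
    AMICUS_SMOKE_TEST: env.isDev && env.vars.EXPO_PUBLIC_AMICUS_SMOKE === 'true',
  } as const;
}

// A production build stays off even if the variable leaks into the environment.
const prodFlags = buildFeatureFlags({ isDev: false, vars: { EXPO_PUBLIC_AMICUS_SMOKE: 'true' } });
// A dev build with the variable set turns the harness on.
const devFlags = buildFeatureFlags({ isDev: true, vars: { EXPO_PUBLIC_AMICUS_SMOKE: 'true' } });
```

Requiring both conditions means a stray environment variable alone cannot surface the screen in a release build.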

New modules

  • app/src/constants/featureFlags.ts — the flag registry (first occupant is AMICUS_SMOKE_TEST).
  • app/src/services/amicus/__smoke__/canned_queries.json — 10 queries w/ expected citations + source types.
  • app/src/services/amicus/__smoke__/runSmokeTest.ts — harness: retrieves, streams chat, extracts citations, parses gap_signal, builds SmokeReport with p50/p95 latency.
  • app/src/screens/dev/AmicusSmokeScreen.tsx — minimal utility UI: Run button, expandable per-query rows, Copy-JSON-to-clipboard.
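
The gap_signal parsing mentioned for runSmokeTest.ts might look roughly like the sketch below. The wire format is an assumption: the PR does not show it, so this assumes the streamed text ends with a JSON object such as `{"gap_signal":{"gap":true}}`. The function name `parseTrailingGapSignal` is hypothetical.

```typescript
interface GapSignal { gap: boolean }

// Assumed format: prose followed by one final JSON object carrying gap_signal.
// Returns the prose without the trailing JSON, plus the parsed signal (or null).
function parseTrailingGapSignal(text: string): { body: string; gapSignal: GapSignal | null } {
  const trimmed = text.trimEnd();
  if (!trimmed.endsWith('}')) return { body: text, gapSignal: null };
  // Try each '{' from the right until the suffix parses into the expected shape.
  let start = trimmed.lastIndexOf('{');
  while (start !== -1) {
    try {
      const parsed: unknown = JSON.parse(trimmed.slice(start));
      const signal = (parsed as { gap_signal?: { gap?: unknown } } | null)?.gap_signal;
      if (signal && typeof signal.gap === 'boolean') {
        return { body: trimmed.slice(0, start).trimEnd(), gapSignal: { gap: signal.gap } };
      }
    } catch {
      // The suffix from this position is not valid JSON; keep scanning leftwards.
    }
    start = start === 0 ? -1 : trimmed.lastIndexOf('{', start - 1);
  }
  return { body: text, gapSignal: null };
}

const parsedGap = parseTrailingGapSignal('No sources cover this.\n{"gap_signal":{"gap":true}}');
const parsedPlain = parseTrailingGapSignal('Romans 5 develops the Adam typology.');
```

Scanning from the right keeps any braces inside the prose from being mistaken for the signal object.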

Wiring

  • app/src/navigation/types.ts — adds AmicusSmoke to MoreStackParamList.
  • app/src/navigation/MoreStack.tsx — lazy-registered only when the flag is on.

Test plan

  • npx tsc --noEmit clean
  • npx jest — 3,239 tests pass (10 new: canned-query shape, citation extraction, trailing-JSON parsing)
  • Feature flag invisible in production builds (flag evaluates false)
  • Report JSON structure round-trips through clipboard
  • Reviewer runs the harness in a dev build against the staging proxy; target is ≥9/10 queries passing and p95 latency <5s (Phase 1 exit criterion). A full run requires a live RevenueCat receipt and the staging Worker; instructions are under "How to run locally" below.
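
To make the exit criterion concrete, here is a minimal sketch of how a report summary could check it. The `QueryResult` shape and both function names are hypothetical, and the nearest-rank percentile method is an assumption; the PR does not say how the harness computes p50/p95.

```typescript
// Hypothetical per-query outcome; the real SmokeReport fields are not shown in this PR.
interface QueryResult { id: string; passed: boolean; latencyMs: number }

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Applies the stated target: at least 9 passing queries and p95 under 5 seconds.
function summarize(results: QueryResult[]) {
  const latencies = results.map((r) => r.latencyMs);
  const passCount = results.filter((r) => r.passed).length;
  const p50 = percentile(latencies, 50);
  const p95 = percentile(latencies, 95);
  return { passCount, total: results.length, p50, p95, meetsExit: passCount >= 9 && p95 < 5000 };
}

// Ten synthetic results with latencies 100..1000 ms and one failing query.
const sampleResults: QueryResult[] = Array.from({ length: 10 }, (_, i) => ({
  id: `q${String(i + 1).padStart(2, '0')}`,
  passed: i !== 3,
  latencyMs: (i + 1) * 100,
}));
const sampleSummary = summarize(sampleResults);
```

With only 10 samples, nearest-rank p95 is simply the slowest query, so a single slow response decides the latency half of the criterion.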

How to run locally

  1. cd app && EXPO_PUBLIC_AMICUS_SMOKE=true EXPO_PUBLIC_AMICUS_DEV_TOKEN=<receipt> EXPO_PUBLIC_AMICUS_PROXY_URL=https://ai-staging.contentcompanionstudy.com npm run dev
  2. Navigate to More → AmicusSmoke
  3. Tap "Run all 10 queries"
  4. Copy the JSON report and paste it into a comment on this PR when done

Out of scope

https://claude.ai/code/session_01Pht3kzgdvkn81DDfL9SnFe

claude added 4 commits April 17, 2026 06:40
Implements the Amicus retrieval service layer that embeds a user query,
runs sqlite-vec MATCH against scripture.db, and re-ranks with profile +
context boosts.

New modules under app/src/services/amicus/:
- types.ts       — shared types (RetrievedChunk, RetrievalContext,
                   CompressedProfile, AmicusError + AmicusErrorCode)
- embed.ts       — calls proxy /ai/embed; retries once on 5xx/network;
                   translates proxy 401/402 to PROXY_UNAUTHORIZED,
                   network failure to OFFLINE
- vectorSearch.ts — packVector (little-endian float32), distance→similarity,
                    searchByVector executing the MATCH join on
                    embeddings/chunk_text/chunk_metadata. Throws
                    EXTENSION_NOT_LOADED if sqlite-vec isn't loaded.
- rerank.ts      — pure functions: current-chapter ×1.5, preferred
                   scholars ×1.1, traditions ×1.05, 2-per-scholar
                   diversity cap, top-10
- retrieval.ts   — top-level orchestrator with latency timing
- index.ts       — tiny public barrel
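
The re-ranking rules listed for rerank.ts can be sketched as a pure function. This is an illustration under assumptions, not the module itself: the `Chunk` and `Ctx` field names are hypothetical, and it assumes the boosts multiply together when more than one applies, which the commit message does not state.

```typescript
// Hypothetical shapes; the real RetrievedChunk / RetrievalContext types are not shown here.
interface Chunk { id: string; similarity: number; scholar: string; tradition: string; chapter: string }
interface Ctx { currentChapter?: string; preferredScholars: string[]; preferredTraditions: string[] }

function rerank(chunks: Chunk[], ctx: Ctx, limit = 10, perScholar = 2): Chunk[] {
  // Apply the multiplicative boosts (assumed to stack).
  const scored = chunks.map((c) => {
    let score = c.similarity;
    if (ctx.currentChapter !== undefined && c.chapter === ctx.currentChapter) score *= 1.5;
    if (ctx.preferredScholars.includes(c.scholar)) score *= 1.1;
    if (ctx.preferredTraditions.includes(c.tradition)) score *= 1.05;
    return { chunk: c, score };
  });
  scored.sort((a, b) => b.score - a.score);

  // Walk in score order, keeping at most `perScholar` chunks per scholar, up to `limit`.
  const seen = new Map<string, number>();
  const out: Chunk[] = [];
  for (const { chunk } of scored) {
    const count = seen.get(chunk.scholar) ?? 0;
    if (count >= perScholar) continue;
    seen.set(chunk.scholar, count + 1);
    out.push(chunk);
    if (out.length === limit) break;
  }
  return out;
}

const reranked = rerank(
  [
    { id: 'a1', similarity: 0.9, scholar: 'A', tradition: 't1', chapter: 'gen-1' },
    { id: 'a2', similarity: 0.8, scholar: 'A', tradition: 't1', chapter: 'gen-1' },
    { id: 'a3', similarity: 0.7, scholar: 'A', tradition: 't1', chapter: 'gen-1' },
    { id: 'b1', similarity: 0.5, scholar: 'B', tradition: 't2', chapter: 'gen-2' },
  ],
  { currentChapter: 'gen-2', preferredScholars: [], preferredTraditions: [] },
);
```

In this example the third chunk from scholar A is dropped by the diversity cap, and the current-chapter boost lifts b1 from 0.5 to 0.75.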

30 Jest tests (4 suites) across the pipeline:
- rerank: boosts, diversity, ordering
- vectorSearch: packing, similarity, error on extension missing
- embed: shape validation, retry, 401/402, offline, timeout
- retrieval: end-to-end with fake fetch + mock DB
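
For readers unfamiliar with the little-endian float32 packing that the vectorSearch tests exercise, a minimal sketch follows. The name `packVector` comes from the commit message, but the signature and body here are assumptions; `unpackVector` is a hypothetical helper added only to show the round trip.

```typescript
// Serializes an embedding as consecutive little-endian float32 values (4 bytes each).
function packVector(values: number[]): Uint8Array {
  const buffer = new ArrayBuffer(values.length * 4);
  const view = new DataView(buffer);
  // The third argument `true` selects little-endian byte order.
  values.forEach((v, i) => view.setFloat32(i * 4, v, true));
  return new Uint8Array(buffer);
}

// Hypothetical inverse, useful for checking that packing round-trips.
function unpackVector(bytes: Uint8Array): number[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out: number[] = [];
  for (let offset = 0; offset < bytes.byteLength; offset += 4) {
    out.push(view.getFloat32(offset, true));
  }
  return out;
}

const packedOne = packVector([1]);            // bytes 0x00 0x00 0x80 0x3f
const roundTrip = unpackVector(packVector([0.5, -2, 4]));
```

Using DataView with an explicit endianness flag avoids depending on the byte order of the host platform, which a plain Float32Array would inherit.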

All 3213 app tests still pass; tsc --noEmit + eslint 0 warnings.

https://claude.ai/code/session_01Pht3kzgdvkn81DDfL9SnFe
Deterministic, on-device generator that reads engagement signals from
user.db and produces a 100-200-token prose summary. Sent to Amicus on
every request; user-inspectable in Settings (#1459).

Migration v15 adds partner_profile_cache (singleton, id=1) to user.db.
The generator SHA-256-hashes the raw signals and skips regeneration
when the hash matches a <7-day-old cached row.

New modules under app/src/services/amicus/profile/:
- types.ts     — RawSignals, CompressedProfile, ProfileForInspection
- signals.ts   — queries for total_chapters_read, last_30_day, top
                 scholars, tradition + genre distribution, journey
                 state, recent chapters, current focus. Tables that
                 don't exist yet (scholar_opens, journey_progress)
                 degrade gracefully to empty.
- templates.ts — prose assembly with six optional sections; thin
                 profiles get a deliberately minimal "new to study"
                 phrasing so Amicus doesn't over-personalize.
- generator.ts — generateProfile(force), getProfileForInspection(),
                 clearProfile(), hashRawSignals(). Stable sorted-key
                 canonicalization so identical signals always hash
                 identically.
- index.ts     — public barrel
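
The sorted-key canonicalization and cache check described for generator.ts could be sketched as below. This is an assumption-laden illustration: `canonicalize`, `hashSignals` and `canReuseCache` are hypothetical names, the cached-row shape is invented, and Node's `crypto` module stands in for whatever SHA-256 implementation the app uses on device.

```typescript
import { createHash } from 'node:crypto';

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serializes a value with object keys sorted at every level, so that
// key insertion order never changes the output.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) return '[' + value.map(canonicalize).join(',') + ']';
  const keys = Object.keys(value).sort();
  return '{' + keys.map((k) => JSON.stringify(k) + ':' + canonicalize(value[k])).join(',') + '}';
}

// SHA-256 over the canonical form, hex-encoded.
function hashSignals(signals: Json): string {
  return createHash('sha256').update(canonicalize(signals)).digest('hex');
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Regeneration can be skipped when the hash matches and the cached row is under 7 days old.
function canReuseCache(
  cached: { hash: string; generatedAt: number } | null,
  hash: string,
  now: number,
): boolean {
  return cached !== null && cached.hash === hash && now - cached.generatedAt < SEVEN_DAYS_MS;
}

const hashA = hashSignals({ total_chapters_read: 12, top_scholars: ['x', 'y'] });
const hashB = hashSignals({ top_scholars: ['x', 'y'], total_chapters_read: 12 });
```

Array order is deliberately preserved, since a ranked list such as top scholars carries meaning in its order while object key order does not.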

16 Jest tests: template section inclusion rules, thick/thin profiles,
truncation, 24-hour reading window, cache hit/miss/expire/force,
hash determinism and key-order independence, clearProfile DELETE.

Also bumps __tests__/{db,unit}/userDatabase.test.ts migration counts
from 14 → 15 to match. All 3,199 app tests pass; tsc clean.

https://claude.ai/code/session_01Pht3kzgdvkn81DDfL9SnFe
Depends on #1447, #1448, #1450, #1451, #1452 (merge commits stacked).

New modules:
- app/src/constants/featureFlags.ts — new registry; AMICUS_SMOKE_TEST
  flag (__DEV__ && EXPO_PUBLIC_AMICUS_SMOKE=true)
- app/src/services/amicus/__smoke__/canned_queries.json — 10 queries
  covering synthesis, lexicon, debate, cross-testament typology,
  journey stops, deliberate corpus gap (q06), meta-FAQ lookup, character
  study, archaeology, textual transmission
- app/src/services/amicus/__smoke__/runSmokeTest.ts — harness driving
  retrieve() + /ai/chat streaming; extracts citations, parses the
  trailing gap_signal, produces a structured SmokeReport with p50/p95
  latency, pass counts, and per-query diagnostics
- app/src/screens/dev/AmicusSmokeScreen.tsx — internal utility UI:
  header, Run button, per-query rows that expand to show failures,
  citations, and response preview; Copy-JSON-to-clipboard

Wired:
- app/src/navigation/types.ts — new AmicusSmoke route on MoreStackParamList
- app/src/navigation/MoreStack.tsx — lazy-registered only when the flag
  is on, so production builds never surface the screen

Tests: 10 unit tests covering canned-query shape, citation extraction,
and trailing JSON parsing. Full run (3,239 tests) stays green; tsc clean.

Phase 1 exit criterion: this harness passing against staging proxy.

https://claude.ai/code/session_01Pht3kzgdvkn81DDfL9SnFe
@github-actions

Test Results

✅ All tests passed

         Passed     Failed   Total
Tests    ✅ 3239    ❌ 0     3239
Suites   ✅ 433     ❌ 0     433

Coverage

Statements Branches Functions Lines

⏱️ Duration: 73.3s

@CraigBuckmaster CraigBuckmaster merged commit 56e6aa6 into master Apr 17, 2026
6 checks passed
@CraigBuckmaster CraigBuckmaster deleted the claude/issue-1453-e2e-smoke-test branch April 17, 2026 11:19

Development

Successfully merging this pull request may close these issues.

ai-partner: E2E smoke test behind feature flag
