End-to-end smoke test for the Phase 1 foundations. It validates the whole pipe (retrieval → proxy → Anthropic → streamed response with citations) and runs against a dev build behind a feature flag, not behind premium gating yet. Passing it is the exit criterion for "Phase 1 is done."
Files to create
app/src/services/amicus/__smoke__/canned_queries.json — 10 canned test queries with expected citation targets
app/src/services/amicus/__smoke__/runSmokeTest.ts — programmatic harness that iterates the canned set and writes a report
app/src/screens/dev/AmicusSmokeScreen.tsx — dev-only screen behind feature flag, runs the smoke test and renders results
app/src/constants/featureFlags.ts — add AMICUS_SMOKE_TEST flag (default off; on only for internal build profile)
Files to modify
app/src/navigation/MoreStack.tsx — conditionally register AmicusSmokeScreen route when flag is on
Feature flag pattern
Follow existing flag conventions if there's a pattern in the codebase; otherwise use:
Corpus-gap question (deliberately out-of-scope, expects gap: true response — e.g. "What do Coptic Orthodox scholars say about Romans 9?" if corpus has no Coptic Orthodox coverage)
Translation question
Character-focused question
Archaeology / historical context question
Simple definition question
Craig (or the implementer) writes the 10 queries + expected citations based on actual corpus content. Do NOT guess — validate expected_citations actually exist in the corpus before committing.
Call proxy /ai/chat via the integrated service layer — stream response
Parse final response for: (a) the prose answer, (b) citation markers [chunk_id], (c) structured gap_signal JSON
Assert against expected_citations:
At least one chunk_id from must_include_any_of appears in response citations
Every must_include_source_types has at least one citation of that type
No must_not_include citations appear
For gap-test queries: assert gap_signal.gap === true
Measure: end-to-end latency, tokens in, tokens out, chunks retrieved
Output: JSON report { query_id, passed: bool, citations: [], latency_ms, failures: [] } for each query, plus an aggregate pass/fail count.
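The parse-and-assert steps above can be sketched as follows. This is a minimal illustration, not the harness itself. It assumes citation markers take the form [source_type:slug], as the example chunk ids suggest, and that the source type is the text before the first colon. The names extractCitations, evaluateQuery, and the isGapTest parameter are hypothetical.

```typescript
// Expected-citation rules for one canned query (field names from canned_queries.json).
interface ExpectedCitations {
  must_include_any_of: string[];
  must_include_source_types: string[];
  must_not_include?: string[];
}

// One row of the JSON report described above.
interface QueryReport {
  query_id: string;
  passed: boolean;
  citations: string[];
  latency_ms: number;
  failures: string[];
}

// ASSUMPTION: markers look like [source_type:slug], e.g. [word_study:chesed].
function extractCitations(answer: string): string[] {
  const pattern = /\[([a-z_]+:[^\]\s]+)\]/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    if (found.indexOf(match[1]) === -1) found.push(match[1]); // de-duplicate, keep order
  }
  return found;
}

// ASSUMPTION: `isGapTest` is a hypothetical way to mark gap-test queries.
function evaluateQuery(
  queryId: string,
  answer: string,
  gapSignal: { gap: boolean } | null,
  expected: ExpectedCitations,
  isGapTest: boolean,
  latencyMs: number
): QueryReport {
  const citations = extractCitations(answer);
  const failures: string[] = [];
  if (isGapTest) {
    // Gap tests pass on the gap signal alone, not on citations.
    if (gapSignal === null || gapSignal.gap !== true) {
      failures.push("expected gap_signal.gap === true");
    }
  } else {
    const sourceTypes = citations.map((id) => id.split(":")[0]);
    if (!expected.must_include_any_of.some((id) => citations.indexOf(id) !== -1)) {
      failures.push("no chunk_id from must_include_any_of was cited");
    }
    for (const sourceType of expected.must_include_source_types) {
      if (sourceTypes.indexOf(sourceType) === -1) {
        failures.push("no citation of source type " + sourceType);
      }
    }
    for (const id of expected.must_not_include ?? []) {
      if (citations.indexOf(id) !== -1) failures.push("forbidden citation " + id);
    }
  }
  return { query_id: queryId, passed: failures.length === 0, citations, latency_ms: latencyMs, failures };
}
```

Collecting every failure, rather than stopping at the first, keeps the failures array useful in the expand-for-details view.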
Dev screen UI (AmicusSmokeScreen.tsx)
Minimal utility UI — not polished. For internal use only.
Header: "Amicus Smoke Test (Dev Only)"
Button: "Run all 10 queries"
Progress indicator during run
Result list: green check / red X per query + expand-for-details
Aggregate: "9 / 10 passed — avg latency 1.4s"
Export button: copies full JSON report to clipboard
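The aggregate line and the export payload could be derived from the per-query report as in the sketch below. The function names are hypothetical, and the report shape is taken from the harness output described earlier. The clipboard call itself is left out, since the clipboard library in use is not stated.

```typescript
// Per-query report row, as described for the harness output.
interface QueryReport {
  query_id: string;
  passed: boolean;
  citations: string[];
  latency_ms: number;
  failures: string[];
}

// Same dash as the example aggregate string in the UI list.
const SEPARATOR = " \u2014 ";

// Builds the aggregate line, e.g. the "N / M passed ... avg latency X.Xs" text.
function formatAggregate(results: QueryReport[]): string {
  const passed = results.filter((r) => r.passed).length;
  const totalMs = results.reduce((sum, r) => sum + r.latency_ms, 0);
  const avgSeconds = results.length === 0 ? 0 : totalMs / results.length / 1000;
  return passed + " / " + results.length + " passed" + SEPARATOR + "avg latency " + avgSeconds.toFixed(1) + "s";
}

// Serializes the full report for the export button (pretty-printed for pasting).
function toExportJson(results: QueryReport[]): string {
  return JSON.stringify(results, null, 2);
}
```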
Acceptance criteria
Feature flag AMICUS_SMOKE_TEST gates screen visibility correctly (invisible in prod builds)
All 10 canned queries written with verified expected citations
Harness runs all 10 queries end-to-end against staging proxy
At least 9 / 10 pass on first run (the 10th — the gap test — should produce gap: true, not a "pass" in the citation sense; adjust harness to account for this)
Latency p95 across all queries < 5 seconds
No hallucinated scholar attributions detected in responses (manual spot-check)
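The two numeric criteria above (at least 9 / 10 passing, p95 latency under 5 seconds) can be checked mechanically. The sketch below assumes the nearest-rank percentile method, which the criteria do not specify, and uses hypothetical names throughout.

```typescript
interface LatencyResult {
  passed: boolean;
  latency_ms: number;
}

// Nearest-rank percentile (an assumed method; the criteria do not name one).
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of empty list");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Defaults mirror the stated bar: 9 passes and p95 below 5000 ms.
function meetsPhase1Bar(results: LatencyResult[], minPassed = 9, maxP95Ms = 5000) {
  const passedCount = results.filter((r) => r.passed).length;
  const p95Ms = percentile(results.map((r) => r.latency_ms), 95);
  return { passedCount, p95Ms, ok: passedCount >= minPassed && p95Ms < maxP95Ms };
}
```

With only 10 samples, nearest-rank p95 is simply the slowest query, so a single slow response is enough to miss the latency bar.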
Parent epic: #1446 (Amicus — AI Study Partner v1)
Phase: 1 · Size: S · Depends on: #1447, #1448, #1450, #1451, #1452
End-to-end smoke test for the Phase 1 foundations. It validates the whole pipe (retrieval → proxy → Anthropic → streamed response with citations) and runs against a dev build behind a feature flag, not behind premium gating yet. Passing it is the exit criterion for "Phase 1 is done."
Files to create
app/src/services/amicus/__smoke__/canned_queries.json: 10 canned test queries with expected citation targets
app/src/services/amicus/__smoke__/runSmokeTest.ts: programmatic harness that iterates the canned set and writes a report
app/src/screens/dev/AmicusSmokeScreen.tsx: dev-only screen behind feature flag, runs the smoke test and renders results
app/src/constants/featureFlags.ts: add AMICUS_SMOKE_TEST flag (default off; on only for internal build profile)
Files to modify
app/src/navigation/MoreStack.tsx: conditionally register AmicusSmokeScreen route when flag is on
Feature flag pattern
Follow existing flag conventions if there's a pattern in the codebase; otherwise use:
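As one possible fallback shape (a hypothetical sketch, not the codebase's actual flag API), the flag could be resolved from an Expo public env var and forced off in production builds. The names FeatureFlags and resolveFeatureFlags, and the choice to pass the env and build type in as arguments, are assumptions made for illustration.

```typescript
// HYPOTHETICAL: illustrative flag shape, not the project's existing convention.
interface FeatureFlags {
  AMICUS_SMOKE_TEST: boolean;
}

type Env = Record<string, string | undefined>;

// Default off. The flag turns on only when the env var is exactly "true"
// AND the build is not a production build.
function resolveFeatureFlags(env: Env, isProductionBuild: boolean): FeatureFlags {
  const requested = env["EXPO_PUBLIC_AMICUS_SMOKE"] === "true";
  return { AMICUS_SMOKE_TEST: requested && !isProductionBuild };
}
```

Keeping the resolver a pure function makes the "never on in production" rule testable without a device build. How the app decides that a build is a production build is left open here.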
The developer sets EXPO_PUBLIC_AMICUS_SMOKE=true in .env.local to enable the screen in dev; the flag is never on in production builds.
Canned queries (canned_queries.json)
Ten queries covering the breadth of corpus types. Each entry:
    {
      "id": "q01",
      "query": "Why do Reformed and Jewish scholars read election theology differently?",
      "current_chapter_ref": { "book_id": "romans", "chapter_num": 9 },
      "expected_citations": {
        "must_include_any_of": ["section_panel:romans-9-s1-calvin", "section_panel:romans-9-s2-wright"],
        "must_include_source_types": ["section_panel", "debate_topic"],
        "must_not_include": []
      },
      "expected_behavior": "multi-source synthesis with citations"
    },
    {
      "id": "q02",
      "query": "What does the Hebrew word chesed mean?",
      "current_chapter_ref": null,
      "expected_citations": {
        "must_include_any_of": ["word_study:chesed", "lexicon_entry:heb-H2617"],
        "must_include_source_types": ["word_study", "lexicon_entry"]
      },
      "expected_behavior": "single-source lexicon lookup"
    },
    ...
Coverage must span:
Corpus-gap question (deliberately out-of-scope, expects gap: true response, e.g. "What do Coptic Orthodox scholars say about Romans 9?" if corpus has no Coptic Orthodox coverage)
Translation question
Character-focused question
Archaeology / historical context question
Simple definition question
Craig (or the implementer) writes the 10 queries + expected citations based on actual corpus content. Do NOT guess: validate that every expected_citations entry actually exists in the corpus before committing.
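The "validate before committing" rule can be automated with a small check. The sketch below infers the entry shape from the two example entries. The function findMissingExpectedCitations and its corpusChunkIds input are hypothetical, and it assumes the full set of corpus chunk ids can be loaded up front.

```typescript
// Shape of one canned_queries.json entry, inferred from the example entries.
interface ChapterRef {
  book_id: string;
  chapter_num: number;
}

interface ExpectedCitations {
  must_include_any_of: string[];
  must_include_source_types: string[];
  must_not_include?: string[]; // absent in the q02 example, so treated as optional
}

interface CannedQuery {
  id: string;
  query: string;
  current_chapter_ref: ChapterRef | null;
  expected_citations: ExpectedCitations;
  expected_behavior: string;
}

// Returns, per query id, the expected chunk ids that are NOT in the corpus.
// An empty map means every must_include_any_of target exists.
function findMissingExpectedCitations(
  queries: CannedQuery[],
  corpusChunkIds: Set<string> // HYPOTHETICAL input: all chunk ids in the corpus
): Map<string, string[]> {
  const missing = new Map<string, string[]>();
  for (const q of queries) {
    const absent = q.expected_citations.must_include_any_of.filter(
      (id) => !corpusChunkIds.has(id)
    );
    if (absent.length > 0) missing.set(q.id, absent);
  }
  return missing;
}
```

Running this as a pre-commit or CI step would catch a guessed chunk id before it shows up as a false smoke-test failure.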
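Harness behavior (runSmokeTest.ts)
For each canned query:
Call retrieve(ctx) from #1451 (ai-partner: client-side retrieval with sqlite-vec) and log the retrieved chunk_ids
Call proxy /ai/chat via the integrated service layer and stream the response
Parse the final response for: (a) the prose answer, (b) citation markers [chunk_id], (c) structured gap_signal JSON
Assert against expected_citations:
  At least one chunk_id from must_include_any_of appears in the response citations
  Every entry in must_include_source_types has at least one citation of that type
  No must_not_include citations appear
For gap-test queries: assert gap_signal.gap === true
Measure: end-to-end latency, tokens in, tokens out, chunks retrieved
Output: JSON report { query_id, passed: bool, citations: [], latency_ms, failures: [] } for each query, plus an aggregate pass/fail count.
Dev screen UI (AmicusSmokeScreen.tsx)
Minimal utility UI, not polished. For internal use only.
Header: "Amicus Smoke Test (Dev Only)"
Button: "Run all 10 queries"
Progress indicator during run
Result list: green check / red X per query + expand-for-details
Aggregate: "9 / 10 passed — avg latency 1.4s"
Export button: copies full JSON report to clipboard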
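Acceptance criteria
Feature flag AMICUS_SMOKE_TEST gates screen visibility correctly (invisible in prod builds)
All 10 canned queries written with verified expected citations
Harness runs all 10 queries end-to-end against staging proxy
At least 9 / 10 pass on first run (the 10th, the gap test, should produce gap: true, not a "pass" in the citation sense; adjust harness to account for this)
Latency p95 across all queries < 5 seconds
No hallucinated scholar attributions detected in responses (manual spot-check)
No any types; strict TypeScript passes
Out of scope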
Phase 1 exit criteria
When this smoke test passes at the bar above, Phase 1 is complete and Phase 2 (Amicus tab UI) can begin.