ai-partner: per-chapter chip pool generator #1461

@CraigBuckmaster

Parent epic: #1446 (Amicus — AI Study Partner v1)
Phase: 3 · Size: M · Depends on: #1447 (embeddings), #1450 (proxy)

Build-time pipeline that pre-generates the chip pool for the FAB peek (#1462). Each chapter gets 3 chip prompts for each of the 6 profile variants, generated in batches with Claude Haiku during the content build. Output lives in scripture.db, so the inline FAB peek experience adds ~$0 runtime cost.


Files to create

  • _tools/build_prompts.py — orchestrator (matches the build_embeddings.py structure from #1447)
  • _tools/build_prompts_variants.py — profile variant definitions + variant → prompt-seed template
  • _tools/build_prompts_entity.py — entity chips for people/places/debates (lightweight, templated)
  • _tools/prompts_manifest.json — chunk hash tracking for incremental (gitignored)
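
The manifest schema is not specified in this issue. A minimal sketch, assuming the manifest is a flat JSON map of chapter id to a SHA-256 hash of the chapter's source text, and that a chapter needs regeneration when its hash is missing or differs. The helper names (`content_hash`, `load_manifest`, `stale_chapters`, `save_manifest`) are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def content_hash(text: str) -> str:
    """Stable SHA-256 hex digest of a chapter's source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_manifest(path: Path) -> dict:
    """Return the stored {chapter_id: hash} map, or {} on the first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def stale_chapters(chapters: dict, manifest: dict) -> list:
    """Chapter ids whose current hash is missing from or differs from the manifest."""
    return sorted(
        cid for cid, text in chapters.items()
        if manifest.get(cid) != content_hash(text)
    )


def save_manifest(path: Path, chapters: dict) -> None:
    """Record the current hash of every chapter after a successful run."""
    data = {cid: content_hash(text) for cid, text in chapters.items()}
    path.write_text(json.dumps(data, indent=2, sort_keys=True), encoding="utf-8")
```

With this shape, `--incremental` would only need to iterate over `stale_chapters(...)` and call `save_manifest(...)` once the batch succeeds.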

Files to modify

  • _tools/build_sqlite_schema.py — add precached_prompts table DDL
  • _tools/build_sqlite_loaders.py — add populate_precached_prompts(conn) loader
  • _tools/build_sqlite.py — call the new loader in sequence
  • _tools/content_writer.py — save_chapter() flags the chapter in prompts_manifest.json for re-generation
  • .gitignore — add _tools/prompts_manifest.json, prompts.db

Conventions to follow

  • Match orchestrator + loader separation from build_sqlite.py / build_embeddings.py
  • UTF-8 stdout preamble; ROOT = Path(__file__).resolve().parent.parent
  • [OK] print markers
  • Windows: python not python3; no Unix path assumptions

Profile variants (6 total, fixed)

PROFILE_VARIANTS = [
    "generic_balanced",       # default fallback
    "reformed_narrative",     # Calvin/Wright leaning, OT narrative
    "reformed_prophets",      # Calvin/Wright leaning, prophets
    "jewish_pentateuch",      # Sarna/Alter leaning, Torah
    "jewish_prophets",        # Sarna/Alter leaning, prophets
    "catholic_gospels",       # Catholic tradition, NT gospels
]

Each variant has a system-prompt seed that biases chip generation toward the scholars and themes that variant's profile would engage with. generic_balanced is the fallback for users who don't fit any lean.
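
The seed wording is not given in this issue. As a rough sketch of what build_prompts_variants.py might hold, the block below pairs each variant with a placeholder seed paraphrased from the comments on PROFILE_VARIANTS and resolves unknown profiles to generic_balanced. `VARIANT_SEEDS`, `FALLBACK_VARIANT`, and `resolve_variant` are hypothetical names, and the seed strings are illustrative only.

```python
PROFILE_VARIANTS = [
    "generic_balanced",
    "reformed_narrative",
    "reformed_prophets",
    "jewish_pentateuch",
    "jewish_prophets",
    "catholic_gospels",
]

FALLBACK_VARIANT = "generic_balanced"

# Placeholder seed text (assumption): paraphrases the variant comments above,
# not the real system-prompt seeds.
VARIANT_SEEDS = {
    "generic_balanced": "Balance perspectives across traditions.",
    "reformed_narrative": "Lean toward Calvin and Wright; focus on OT narrative.",
    "reformed_prophets": "Lean toward Calvin and Wright; focus on the prophets.",
    "jewish_pentateuch": "Lean toward Sarna and Alter; focus on the Torah.",
    "jewish_prophets": "Lean toward Sarna and Alter; focus on the prophets.",
    "catholic_gospels": "Draw on Catholic tradition; focus on the NT gospels.",
}


def resolve_variant(requested):
    """Return the requested variant if it is known, else the fallback."""
    return requested if requested in VARIANT_SEEDS else FALLBACK_VARIANT
```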

Chapter chip generation

For each (chapter × variant), call Claude Haiku with:

  • System prompt: "Generate 3 concise prompt chips that a scholarly Bible-study user (variant: {X}) might tap when opening {book} {chapter}. Each chip is 6-10 words, phrased as a question or prompt. Return JSON array."
  • Context: chapter title, subtitle, top 3 retrieved chunks from embeddings (via vector search on chapter summary)
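
The request assembly described above can be sketched without touching the API. The template text is taken from the system prompt in this issue; the context layout and the helper names (`build_system_prompt`, `build_context`, `parse_chips`) are assumptions, and the actual Claude call is deliberately left out.

```python
import json

SYSTEM_TEMPLATE = (
    "Generate 3 concise prompt chips that a scholarly Bible-study user "
    "(variant: {variant}) might tap when opening {book} {chapter}. "
    "Each chip is 6-10 words, phrased as a question or prompt. "
    "Return JSON array."
)


def build_system_prompt(variant, book, chapter):
    """Fill the system-prompt template for one (chapter x variant) call."""
    return SYSTEM_TEMPLATE.format(variant=variant, book=book, chapter=chapter)


def build_context(title, subtitle, chunks):
    """Assemble the context: title, subtitle, and at most the top 3 chunks."""
    lines = [f"Title: {title}", f"Subtitle: {subtitle}"]
    lines += [f"Chunk {i}: {c}" for i, c in enumerate(chunks[:3], start=1)]
    return "\n".join(lines)


def parse_chips(raw):
    """Parse the model reply; raise ValueError unless it is a JSON array of 3 items."""
    chips = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(chips, list) or len(chips) != 3:
        raise ValueError("expected a JSON array of exactly 3 chips")
    return chips
```

Rejecting malformed replies at parse time keeps bad rows out of scripture.db and gives the orchestrator a single place to decide whether to retry.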

Target per-chip cost: $0.001 with Haiku + prompt caching. Total one-time cost: 1,189 chapters × 6 variants × 3 chips = 21,402 chips × $0.001 ≈ $21 (well under the $42 ceiling called out in plan §10).
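
The arithmetic behind that figure, written out as the kind of estimate `--dry-run` could print. The per-chip price is the target stated in this issue, not a measured value, and `estimate` is a hypothetical helper.

```python
CHAPTERS = 1_189
VARIANTS = 6
CHIPS_PER_CALL = 3
COST_PER_CHIP_USD = 0.001  # target from this issue, not a measured price


def estimate(chapters=CHAPTERS, variants=VARIANTS):
    """Return API call count, chip count, and estimated cost in USD."""
    calls = chapters * variants          # one call per (chapter x variant)
    chips = calls * CHIPS_PER_CALL       # three chips returned per call
    return {
        "calls": calls,
        "chips": chips,
        "cost_usd": round(chips * COST_PER_CHIP_USD, 2),
    }
```

For the full corpus this works out to 7,134 calls and 21,402 chips, or about $21.40 at the target price.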

Entity chip generation (lightweight templated)

For entity screens (people, places, debates), chips are templated — not LLM-generated. Format:

  • Person: "Who was {name}?" / "Where does {name} appear in scripture?" / "Scholars on {name}"
  • Place: "What happened at {place}?" / "Scholars on {place}" / "Related people and events"
  • Debate topic: "Different views on this" / "Strongest arguments for each side"

Zero LLM cost. _tools/build_prompts_entity.py produces these from meta files.
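
A minimal sketch of the templating, using the formats listed above. For simplicity it uses a single `{name}` placeholder for every entity type (the place templates above use `{place}`); `ENTITY_TEMPLATES` and `entity_chips` are hypothetical names, and reading names from the meta files is left out.

```python
ENTITY_TEMPLATES = {
    "person": [
        "Who was {name}?",
        "Where does {name} appear in scripture?",
        "Scholars on {name}",
    ],
    "place": [
        "What happened at {name}?",
        "Scholars on {name}",
        "Related people and events",
    ],
    "debate_topic": [
        "Different views on this",
        "Strongest arguments for each side",
    ],
}


def entity_chips(entity_type, name):
    """Render the chip labels for one entity; raises KeyError on an unknown type."""
    return [template.format(name=name) for template in ENTITY_TEMPLATES[entity_type]]
```

For example, `entity_chips("person", "David")` yields the three person labels with "David" substituted.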

Output DB table (scripture.db)

CREATE TABLE precached_prompts (
  entity_type TEXT NOT NULL,        -- 'chapter' | 'person' | 'place' | 'debate_topic'
  entity_id TEXT NOT NULL,          -- e.g. 'romans-9', 'david', 'jerusalem', 'election-predestination'
  profile_variant TEXT NOT NULL,    -- one of the 6 variants (or 'default' for entities)
  chips_json TEXT NOT NULL,         -- JSON: [{label, seed_query, expected_source_types[]}]
  generated_at TEXT NOT NULL,
  PRIMARY KEY (entity_type, entity_id, profile_variant)
);

CREATE INDEX idx_precached_prompts_entity ON precached_prompts(entity_type, entity_id);
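
A sketch of how a loader might write rows into this table and how a reader might look them up, using the standard-library sqlite3 module. The `upsert_chips` and `get_chips` helpers are hypothetical, and the read-side fallback to generic_balanced is an assumption about how the FAB peek (#1462) would query the table.

```python
import json
import sqlite3
from datetime import datetime, timezone

DDL = """
CREATE TABLE precached_prompts (
  entity_type TEXT NOT NULL,
  entity_id TEXT NOT NULL,
  profile_variant TEXT NOT NULL,
  chips_json TEXT NOT NULL,
  generated_at TEXT NOT NULL,
  PRIMARY KEY (entity_type, entity_id, profile_variant)
);
CREATE INDEX idx_precached_prompts_entity ON precached_prompts(entity_type, entity_id);
"""


def upsert_chips(conn, entity_type, entity_id, variant, chips):
    """Insert a row, replacing any existing row with the same primary key."""
    conn.execute(
        "INSERT OR REPLACE INTO precached_prompts "
        "(entity_type, entity_id, profile_variant, chips_json, generated_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (entity_type, entity_id, variant, json.dumps(chips),
         datetime.now(timezone.utc).isoformat()),
    )


def get_chips(conn, entity_type, entity_id, variant):
    """Fetch chips for a variant, falling back to generic_balanced (assumption)."""
    for candidate in (variant, "generic_balanced"):
        row = conn.execute(
            "SELECT chips_json FROM precached_prompts "
            "WHERE entity_type = ? AND entity_id = ? AND profile_variant = ?",
            (entity_type, entity_id, candidate),
        ).fetchone()
        if row:
            return json.loads(row[0])
    return []
```

Because the primary key covers (entity_type, entity_id, profile_variant), re-running the loader overwrites rows rather than duplicating them.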

CLI interface

python _tools/build_prompts.py                     # full rebuild
python _tools/build_prompts.py --incremental       # re-gen only for chapters in manifest
python _tools/build_prompts.py --dry-run           # cost estimate + count
python _tools/build_prompts.py --entity-only       # regen entity chips only (zero LLM cost)
python _tools/build_prompts.py --chapter-only      # regen chapter chips only
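
One way these flags could be declared with argparse. Treating --entity-only and --chapter-only as mutually exclusive is an assumption, not something this issue states, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Declare the build_prompts.py flags listed above."""
    parser = argparse.ArgumentParser(
        prog="build_prompts.py",
        description="Pre-generate the chip pool for the FAB peek.",
    )
    parser.add_argument("--incremental", action="store_true",
                        help="re-gen only for chapters in manifest")
    parser.add_argument("--dry-run", action="store_true",
                        help="print cost estimate + count, make no API calls")
    # Assumption: the two scope flags cannot be combined.
    scope = parser.add_mutually_exclusive_group()
    scope.add_argument("--entity-only", action="store_true",
                       help="regen entity chips only (zero LLM cost)")
    scope.add_argument("--chapter-only", action="store_true",
                       help="regen chapter chips only")
    return parser
```

argparse converts the dashes to underscores, so the parsed values are read as `args.dry_run`, `args.entity_only`, and `args.chapter_only`.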

Cost guardrails

  • --dry-run prints total API calls, estimated tokens, estimated cost
  • Full rebuild without dry-run prompts Continue? [y/N] if cost > $1.00
  • ANTHROPIC_API_KEY env required; fail early if missing
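
The guardrails above could look roughly like this. The helper names (`require_api_key`, `confirm_cost`) are hypothetical; the `env` and `ask` parameters exist only so the functions can be exercised without a real environment or terminal.

```python
import os

COST_CONFIRM_THRESHOLD_USD = 1.00


def require_api_key(env=os.environ):
    """Return the API key, or exit before any API call if it is missing."""
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise SystemExit("ANTHROPIC_API_KEY is not set; aborting before any API call.")
    return key


def confirm_cost(estimated_usd, ask=input):
    """Return True if the run may proceed; prompt only above the threshold."""
    if estimated_usd <= COST_CONFIRM_THRESHOLD_USD:
        return True
    answer = ask(f"Estimated cost ${estimated_usd:.2f}. Continue? [y/N] ")
    # Default is "No": anything other than y/yes declines.
    return answer.strip().lower() in ("y", "yes")
```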

Acceptance criteria

  • Full rebuild generates 6 variants × 1,189 chapters = 7,134 chapter rows
  • Entity chips generated for all people (313), places (373), debate topics (308)
  • Incremental mode regenerates only chapters flagged in manifest
  • Chips JSON validates (each has label, seed_query, expected_source_types)
  • Chip labels are 6-10 words (measured; fail build if outside range)
  • Resume from checkpoint works (Ctrl-C mid-batch → re-run without duplicate API calls)
  • python _tools/validate_sqlite.py passes with new section verifying chip coverage
  • Works on Windows
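
The JSON-shape and label-length checks could be sketched as below. This assumes the 6-10 word rule applies to LLM-generated chapter chips only, since templated entity labels such as "Who was {name}?" are shorter; `validate_chip` is a hypothetical helper and counts words by splitting on whitespace.

```python
REQUIRED_KEYS = ("label", "seed_query", "expected_source_types")
MIN_WORDS, MAX_WORDS = 6, 10


def validate_chip(chip):
    """Return a list of problems for one chapter chip; empty means it passes."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in chip]
    words = len(str(chip.get("label", "")).split())
    if not MIN_WORDS <= words <= MAX_WORDS:
        problems.append(
            f"label has {words} words, expected {MIN_WORDS}-{MAX_WORDS}"
        )
    if not isinstance(chip.get("expected_source_types", []), list):
        problems.append("expected_source_types must be a list")
    return problems
```

A build step could collect the problems across all rows and fail once at the end, so a single run reports every offending chip.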

Out of scope
