Parent epic: #1446 (Amicus — AI Study Partner v1)
Phase: 1 · Size: S · Depends on: #1447
Integrate the sqlite-vec extension into the scripture.db build pipeline so vector search runs on-device with no extra setup for the user. Merge embeddings.db (output of #1447) into scripture.db as a vec0 virtual table.
Files to modify
_tools/build_sqlite.py — add new phase after schema creation that loads sqlite-vec and populates the embeddings vec0 virtual table from embeddings.db
_tools/build_sqlite_schema.py — add the vec0 table DDL and the companion chunk_text + chunk_metadata tables
_tools/build_sqlite_loaders.py — add populate_embeddings(conn) loader function matching the existing loader signature pattern
_tools/validate_sqlite.py — add integrity checks (see acceptance)
app/package.json — add native deps needed for sqlite-vec (see "Client-side loading")
Files to create
_tools/sqlite_vec_loader.py — small helper that wraps sqlite_vec.loadable_path() and handles cross-platform loading for the build machine
Schema additions (in build_sqlite_schema.py)
Add these to the SCHEMA string. The vec0 table is declared as a virtual table; companion tables hold text + metadata keyed by chunk_id.
-- Vector embeddings for Amicus retrieval
CREATE VIRTUAL TABLE embeddings USING vec0(
embedding FLOAT[1536]
);
CREATE TABLE chunk_text (
chunk_id TEXT PRIMARY KEY,
text TEXT NOT NULL
);
CREATE TABLE chunk_metadata (
chunk_id TEXT PRIMARY KEY,
source_type TEXT NOT NULL,
source_id TEXT NOT NULL,
scholar_id TEXT,
tradition TEXT,
book_id TEXT,
chapter_num INTEGER,
verse_start INTEGER,
verse_end INTEGER,
panel_type TEXT
);
CREATE INDEX idx_chunk_metadata_source ON chunk_metadata(source_type, source_id);
CREATE INDEX idx_chunk_metadata_chapter ON chunk_metadata(book_id, chapter_num);
The vec0 virtual table has an implicit rowid that maps to chunk_id via the order of insertion. populate_embeddings() inserts into vec0 and chunk_text/chunk_metadata in matching order so rowid ↔ chunk_id is consistent.
Build-time loader (populate_embeddings)
Match the signature + style of existing loaders (populate_books, populate_verses, etc.). Pseudocode:
def populate_embeddings(conn: sqlite3.Connection) -> None:
embeddings_db = ROOT / 'embeddings.db'
if not embeddings_db.exists():
print(" [WARN] embeddings.db not found — skipping embeddings table")
print(" Run: python _tools/build_embeddings.py")
return
# Load sqlite-vec extension into the target conn
conn.enable_load_extension(True)
import sqlite_vec
sqlite_vec.load(conn)
conn.enable_load_extension(False)
# Attach embeddings.db read-only; copy rows into scripture.db target tables
src = sqlite3.connect(f"file:{embeddings_db}?mode=ro", uri=True)
cur = conn.cursor()
rows = src.execute("SELECT chunk_id, source_type, source_id, text, metadata_json, embedding FROM embedding_chunks").fetchall()
for chunk_id, src_type, src_id, text, meta_json, emb_blob in rows:
meta = json.loads(meta_json)
cur.execute("INSERT INTO embeddings(embedding) VALUES (?)", (emb_blob,))
cur.execute("INSERT INTO chunk_text VALUES (?, ?)", (chunk_id, text))
cur.execute("""INSERT INTO chunk_metadata VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(chunk_id, src_type, src_id, meta.get('scholar_id'), meta.get('tradition'),
meta.get('book_id'), meta.get('chapter_num'), meta.get('verse_start'),
meta.get('verse_end'), meta.get('panel_type')))
conn.commit()
src.close()
print(f" [OK] Embeddings: {len(rows)} chunks populated")
Client-side loading (app/)
React Native needs a native module for sqlite-vec. Add to app/package.json:
"expo-sqlite": "(latest matching Expo SDK 54)",
"@op-engineering/op-sqlite": "(if needed for extension loading, eval at implementation time)"
Verify at implementation which sqlite-vec approach works with Expo SDK 54. Document the chosen path in a code comment in app/src/db/index.ts where the scripture.db connection is opened.
Validation (_tools/validate_sqlite.py)
Add a new section 6. EMBEDDINGS that verifies:
embeddings virtual table exists and has rows
- Row counts match across
embeddings, chunk_text, chunk_metadata
- No orphan chunk_ids (every chunk_text row has matching chunk_metadata)
- Sample vector search returns results (smoke test with a canned query vector)
- Total DB size under 150MB (hard cap before we need further optimization)
Acceptance criteria
Out of scope
Parent epic: #1446 (Amicus — AI Study Partner v1)
Phase: 1 · Size: S · Depends on: #1447
Integrate the
sqlite-vecextension into the scripture.db build pipeline so vector search runs on-device with no extra setup for the user. Mergeembeddings.db(output of #1447) intoscripture.dbas a vec0 virtual table.Files to modify
_tools/build_sqlite.py— add new phase after schema creation that loads sqlite-vec and populates theembeddingsvec0 virtual table fromembeddings.db_tools/build_sqlite_schema.py— add the vec0 table DDL and the companionchunk_text+chunk_metadatatables_tools/build_sqlite_loaders.py— addpopulate_embeddings(conn)loader function matching the existing loader signature pattern_tools/validate_sqlite.py— add integrity checks (see acceptance)app/package.json— add native deps needed for sqlite-vec (see "Client-side loading")Files to create
_tools/sqlite_vec_loader.py— small helper that wrapssqlite_vec.loadable_path()and handles cross-platform loading for the build machineSchema additions (in
build_sqlite_schema.py)Add these to the SCHEMA string. The vec0 table is declared as a virtual table; companion tables hold text + metadata keyed by chunk_id.
The vec0 virtual table has an implicit
rowidthat maps to chunk_id via the order of insertion.populate_embeddings()inserts into vec0 and chunk_text/chunk_metadata in matching order so rowid ↔ chunk_id is consistent.Build-time loader (
populate_embeddings)Match the signature + style of existing loaders (
populate_books,populate_verses, etc.). Pseudocode:Client-side loading (
app/)React Native needs a native module for sqlite-vec. Add to
app/package.json:Verify at implementation which sqlite-vec approach works with Expo SDK 54. Document the chosen path in a code comment in
app/src/db/index.tswhere the scripture.db connection is opened.Validation (
_tools/validate_sqlite.py)Add a new section
6. EMBEDDINGSthat verifies:embeddingsvirtual table exists and has rowsembeddings,chunk_text,chunk_metadataAcceptance criteria
python _tools/build_sqlite.pycompletes successfully afterbuild_embeddings.pyhas runscripture.dbcontains populatedembeddings,chunk_text,chunk_metadatatablespython _tools/validate_sqlite.pypasses with new section 6 EMBEDDINGS greenSELECT rowid, distance FROM embeddings WHERE embedding MATCH ? ORDER BY distance LIMIT 5returns resultsembeddings.dbabsent (doesn't break existing build)app/src/db/index.tsOut of scope