feat(ai-partner): integrate sqlite-vec into scripture.db build (#1448) #1476
Merged
Adds the Amicus vector-retrieval tables to scripture.db. The loader is a graceful no-op when embeddings.db is missing or sqlite-vec isn't installed, so existing CI stays green without needing Amicus tooling.

- _tools/build_sqlite_schema.py: add chunk_text + chunk_metadata tables + indexes (the vec0 virtual table is created inside the loader because it requires the extension)
- _tools/build_sqlite_loaders.py: add populate_embeddings(), which creates the vec0 table and copies rows from embeddings.db
- _tools/build_sqlite.py: call populate_embeddings() after the FTS build
- _tools/sqlite_vec_loader.py: new cross-platform helper wrapping sqlite_vec.load() + enable_load_extension()
- _tools/validate_sqlite.py: new section 6 EMBEDDINGS + 250MB size cap
- app/src/db/index.ts: document the client-side sqlite-vec loading path

Verified locally:
- Build + validate pass with no embeddings.db (CI-like scenario): warns
- Build + validate pass with a seeded embeddings.db + sqlite-vec: all OK
- Smoke vec search `MATCH ? ORDER BY distance LIMIT 5` returns results

https://claude.ai/code/session_01Pht3kzgdvkn81DDfL9SnFe
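The graceful no-op described above could look roughly like the sketch below. It is not the contents of _tools/sqlite_vec_loader.py: the function name `try_load_sqlite_vec` and the `(loaded, reason)` return shape are assumptions made for illustration. Only `sqlite_vec.load()` and `enable_load_extension()` come from the PR text.

```python
import sqlite3
from typing import Tuple


def try_load_sqlite_vec(conn: sqlite3.Connection) -> Tuple[bool, str]:
    """Try to load sqlite-vec into conn; return (loaded, reason).

    Hypothetical helper: never raises for the expected "not available"
    cases, so callers can warn and skip instead of failing the build.
    """
    try:
        import sqlite_vec  # optional dependency: pip install sqlite-vec
    except ImportError:
        return False, "sqlite-vec is not installed"

    if not hasattr(conn, "enable_load_extension"):
        # Some Python builds compile sqlite3 without extension loading.
        return False, "this Python build cannot load SQLite extensions"

    try:
        conn.enable_load_extension(True)
        try:
            sqlite_vec.load(conn)
        finally:
            # Turn extension loading back off once we are done.
            conn.enable_load_extension(False)
    except sqlite3.Error as exc:
        return False, f"could not load sqlite-vec: {exc}"
    return True, "sqlite-vec loaded"


conn = sqlite3.connect(":memory:")
loaded, reason = try_load_sqlite_vec(conn)
if not loaded:
    print(f"WARN: skipping embeddings ({reason})")
```

Returning a reason string instead of raising keeps the caller's skip path to a single branch, which is one way to get the "warn, don't fail" behavior in CI.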
Content Pipeline Results: ✅ All pipeline checks passed
Test Results: ✅ All tests passed
⏱️ Duration: 76.0s
Closes #1448. Depends on #1447.
Summary
- `chunk_text` + `chunk_metadata` tables (regular, always created) and the `embeddings` vec0 virtual table (created inside the loader only when the extension is loadable).
- `populate_embeddings()` merges `embeddings.db` into `scripture.db`, preserving rowid ↔ chunk_id order.
- `_tools/sqlite_vec_loader.py` wraps `enable_load_extension()` + `sqlite_vec.load()` with typed skip messages.
- `validate_sqlite.py` gets section 6 EMBEDDINGS: table presence, row-count parity, orphan check, DB size cap (250MB).
- Client-side sqlite-vec loading path documented in `app/src/db/index.ts`.

Graceful skip behavior
The content-pipeline CI doesn't have `sqlite_vec` installed, and no `embeddings.db` is produced at PR time. The loader and validator both skip with a warning rather than failing.

Test plan
- `python3 _tools/build_sqlite.py` passes with no `embeddings.db` (CI scenario)
- `python3 _tools/validate_sqlite.py` warns but passes in the same scenario
- With a seeded `embeddings.db` + `pip install sqlite-vec`, build populates the vec0 table and validate section 6 goes fully green
- `SELECT rowid, distance FROM embeddings WHERE embedding MATCH ? ORDER BY distance LIMIT 5` returns results

Out of scope
- `save_chapter` dirty-marker in ai-partner: build embeddings pipeline script #1447

https://claude.ai/code/session_01Pht3kzgdvkn81DDfL9SnFe
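For reviewers who want a feel for the section 6 checks named in the summary (table presence, row-count parity, orphan check, size cap), here is a minimal sketch. It is not the code in validate_sqlite.py. The `chunk_id` join column, the choice to compare `chunk_text` against `chunk_metadata`, and reading 250MB as 250 * 1024 * 1024 bytes are all assumptions. The `embeddings` vec0 table is left out because querying it requires the extension.

```python
import os
import sqlite3
from typing import List, Optional

# Assumed byte value for the PR's "250MB" cap.
MAX_DB_BYTES = 250 * 1024 * 1024


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def check_embeddings(
    conn: sqlite3.Connection, db_path: Optional[str] = None
) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = [
        f"missing table: {t}"
        for t in ("chunk_text", "chunk_metadata")
        if not table_exists(conn, t)
    ]
    if problems:
        return problems  # later checks need both tables

    # Row-count parity between the two regular tables.
    n_text = conn.execute("SELECT COUNT(*) FROM chunk_text").fetchone()[0]
    n_meta = conn.execute("SELECT COUNT(*) FROM chunk_metadata").fetchone()[0]
    if n_text != n_meta:
        problems.append(
            f"row-count mismatch: chunk_text={n_text}, chunk_metadata={n_meta}"
        )

    # Orphans: metadata rows whose chunk_id has no matching text row.
    orphans = conn.execute(
        "SELECT COUNT(*) FROM chunk_metadata m "
        "LEFT JOIN chunk_text t ON t.chunk_id = m.chunk_id "
        "WHERE t.chunk_id IS NULL"
    ).fetchone()[0]
    if orphans:
        problems.append(f"{orphans} orphan chunk_metadata row(s)")

    # Size cap applies only to on-disk databases.
    if db_path and os.path.getsize(db_path) > MAX_DB_BYTES:
        problems.append(f"database exceeds {MAX_DB_BYTES} bytes")
    return problems
```

Collecting problems in a list rather than raising on the first one lets a validator report every failed check in a single run.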