# Bible Presenter — Stacked Embeddings Generator

This notebook generates `all_versions_embeddings.npy` and its companion `verse_index.json` for the Bible Presenter app.

**Before running:**
1. In the top menu go to **Runtime → Change runtime type → T4 GPU**
2. Then click **Runtime → Run all**
3. When the run finishes, downloads of `all_versions_embeddings.npy` and `verse_index.json` start automatically

Versions embedded: `KJV, AMP, NIV, ESV, NKJV, NASB`  
Expected output size: ~287 MB  
Expected time with T4 GPU: ~5–10 minutes
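
The size estimate follows from the shape of the stacked array. As a rough check, assuming about 31,102 verses per version (the KJV count; other translations may differ slightly) and the 384-dimensional float32 vectors that `all-MiniLM-L6-v2` produces:

```python
# Back-of-envelope size of the stacked array.
# ASSUMPTION: ~31,102 verses per version (the KJV count); the actual
# count per translation in super_bible.db may differ slightly.
N_VERSIONS = 6
VERSES_PER_VERSION = 31_102
EMBEDDING_DIM = 384        # output dimension of all-MiniLM-L6-v2
BYTES_PER_FLOAT32 = 4

total_rows = N_VERSIONS * VERSES_PER_VERSION
size_bytes = total_rows * EMBEDDING_DIM * BYTES_PER_FLOAT32
print(f'{total_rows} rows x {EMBEDDING_DIM} dims = {size_bytes / 1e6:.1f} MB')
# prints: 186612 rows x 384 dims = 286.6 MB
```

If Cell 4 reports a noticeably smaller array, one or more versions were probably missing from the database and skipped.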

In [None]:
# ── Cell 1: Check GPU ────────────────────────────────────────────────────────
import subprocess
result = subprocess.run(['nvidia-smi', '--query-gpu=name,memory.total', '--format=csv,noheader'],
                        capture_output=True, text=True)
if result.returncode == 0:
    print('GPU detected:', result.stdout.strip())
else:
    print('WARNING: No GPU detected. Go to Runtime → Change runtime type → T4 GPU')
    print('Continuing on CPU will take ~1-2 hours instead of ~5-10 minutes.')

In [None]:
# ── Cell 2: Install dependencies ─────────────────────────────────────────────
!pip install -q sentence-transformers numpy

In [None]:
# ── Cell 3: Download super_bible.db ──────────────────────────────────────────
import urllib.request
import os

DB_URL = 'https://raw.githubusercontent.com/alshival/super_bible/main/SUPER_BIBLE/super_bible.db'
DB_PATH = 'super_bible.db'

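if not os.path.exists(DB_PATH):
    print('Downloading super_bible.db (~59 MB)...')
    # Download to a temporary name first so an interrupted transfer
    # is not mistaken for a complete database on the next run.
    tmp_path = DB_PATH + '.part'
    urllib.request.urlretrieve(DB_URL, tmp_path)
    os.replace(tmp_path, DB_PATH)
    print(f'Done. Size: {os.path.getsize(DB_PATH) / 1e6:.1f} MB')
else:
    print(f'Already exists. Size: {os.path.getsize(DB_PATH) / 1e6:.1f} MB')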

# Quick sanity check
import sqlite3
db = sqlite3.connect(DB_PATH)
versions = [r[0] for r in db.execute("SELECT DISTINCT version FROM super_bible WHERE language='EN' ORDER BY version").fetchall()]
print('Available EN versions:', versions)
db.close()

In [None]:
# ── Cell 4: Generate stacked embeddings ──────────────────────────────────────
import sqlite3
import numpy as np
from sentence_transformers import SentenceTransformer

# Versions to embed — must match EMBEDDED_VERSIONS in src-tauri/src/store/mod.rs
VERSIONS = ['KJV', 'AMP', 'NIV', 'ESV', 'NKJV', 'NASB']
OUT_NPY  = 'all_versions_embeddings.npy'
OUT_IDX  = 'verse_index.json'

print('Loading model all-MiniLM-L6-v2...')
model = SentenceTransformer('all-MiniLM-L6-v2')
print('Model ready.')

db = sqlite3.connect(DB_PATH)

available = {r[0] for r in db.execute("SELECT DISTINCT version FROM super_bible WHERE language='EN'")}
versions_to_use = [v for v in VERSIONS if v in available]
missing = [v for v in VERSIONS if v not in available]
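if missing:
    print(f'WARNING: versions not in DB, skipping: {missing}')
if not versions_to_use:
    # Fail early with a clear message instead of an opaque np.vstack error later.
    raise RuntimeError('None of the requested versions are in the DB; nothing to embed.')
print(f'Embedding versions: {versions_to_use}')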

all_embeddings = []
verse_index    = []

for version in versions_to_use:
    print(f'\n── {version} ──')
    rows = db.execute(
        'SELECT title, chapter, verse, text FROM super_bible '
        'WHERE version = ? AND language = ? ORDER BY book, chapter, verse',
        (version, 'EN')
    ).fetchall()
    print(f'  {len(rows)} verses')

    texts = [r[3] for r in rows]
    embs  = model.encode(
        texts,
        normalize_embeddings=True,
        batch_size=512,
        show_progress_bar=True,
        convert_to_numpy=True,
        device='cuda' if __import__('torch').cuda.is_available() else 'cpu',
    )
    all_embeddings.append(embs.astype(np.float32))
    for r in rows:
        verse_index.append({'book': r[0], 'chapter': int(r[1]), 'verse': int(r[2]), 'version': version})

db.close()

stacked = np.vstack(all_embeddings)
print(f'\nStacked shape: {stacked.shape}  ({stacked.nbytes / 1e6:.1f} MB)')
np.save(OUT_NPY, stacked)
print(f'Saved → {OUT_NPY}')

import json
with open(OUT_IDX, 'w') as f:
    json.dump(verse_index, f, separators=(',', ':'))
print(f'Saved → {OUT_IDX}')
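
Before downloading, it can be worth confirming that the two outputs line up: Cell 4 appends one index entry per embedded row, and `normalize_embeddings=True` should leave every row with unit length. A minimal sketch of such a check, where `check_outputs` is a hypothetical helper and not part of the app:

```python
import numpy as np

def check_outputs(embeddings, index, expected_dim=384, tol=1e-3):
    """Return a list of problems found; an empty list means the pair looks consistent."""
    if embeddings.ndim != 2:
        return [f'expected a 2-D array, got shape {embeddings.shape}']
    problems = []
    if embeddings.shape[1] != expected_dim:
        problems.append(f'embedding dim is {embeddings.shape[1]}, expected {expected_dim}')
    if embeddings.shape[0] != len(index):
        problems.append(f'{embeddings.shape[0]} rows vs {len(index)} index entries')
    if embeddings.dtype != np.float32:
        problems.append(f'dtype is {embeddings.dtype}, expected float32')
    # normalize_embeddings=True should give every row unit L2 norm.
    norms = np.linalg.norm(embeddings, axis=1)
    if not np.allclose(norms, 1.0, atol=tol):
        problems.append('some rows are not unit-length')
    return problems

# In the notebook, after Cell 4 has run:
#   import json
#   with open('verse_index.json') as f:
#       index = json.load(f)
#   print(check_outputs(np.load('all_versions_embeddings.npy'), index) or 'OK')
```

The tolerance `tol` is an assumption chosen to absorb float32 rounding; tighten or loosen it as needed.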

In [None]:
# ── Cell 5: Download the files ────────────────────────────────────────────────
from google.colab import files
import os

npy_size = os.path.getsize('all_versions_embeddings.npy') / 1e6
idx_size = os.path.getsize('verse_index.json') / 1e6
print(f'all_versions_embeddings.npy  {npy_size:.1f} MB')
print(f'verse_index.json             {idx_size:.1f} MB')
print('\nStarting download...')

files.download('all_versions_embeddings.npy')
files.download('verse_index.json')

## After downloading

Place both files in your project:

```
desktop-rs/src-tauri/bible_data/all_versions_embeddings.npy
desktop-rs/src-tauri/bible_data/verse_index.json
```

Then upload `all_versions_embeddings.npy` to a **GitHub Release**. At ~287 MB it exceeds GitHub's 100 MB per-file limit for regular commits:

1. Go to your repo → **Releases → Draft a new release**
2. Tag: `v1.0-models` (or similar)
3. Attach `all_versions_embeddings.npy` as a release asset
4. Publish

The `verse_index.json` file is much smaller (on the order of 10 MB for six versions; Cell 5 prints the exact size) and can be committed normally.
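
The two files are meant to be used together: row `i` of `all_versions_embeddings.npy` corresponds to entry `i` of `verse_index.json`, and because the vectors are L2-normalized, cosine similarity reduces to a dot product. The app's own lookup lives in the Rust code and is not shown here. As a minimal Python sketch of the idea, with `top_k` a hypothetical helper and the query vector assumed to come from the same model with `normalize_embeddings=True`:

```python
import numpy as np

def top_k(query_vec, embeddings, index, k=3):
    """Return the k index entries whose embeddings score highest against query_vec."""
    scores = embeddings @ query_vec          # cosine similarity for unit vectors
    k = min(k, len(scores))
    best = np.argsort(-scores)[:k]           # highest scores first
    return [(index[i], float(scores[i])) for i in best]

# Tiny synthetic demo (3-dimensional vectors instead of 384).
demo_embeddings = np.array([[1.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0],
                            [0.6, 0.8, 0.0]], dtype=np.float32)
demo_index = [
    {'book': 'Genesis', 'chapter': 1, 'verse': 1, 'version': 'KJV'},
    {'book': 'Genesis', 'chapter': 1, 'verse': 2, 'version': 'KJV'},
    {'book': 'Genesis', 'chapter': 1, 'verse': 3, 'version': 'KJV'},
]
query = np.array([0.0, 1.0, 0.0], dtype=np.float32)
for entry, score in top_k(query, demo_embeddings, demo_index, k=2):
    print(entry['book'], entry['chapter'], entry['verse'], round(score, 2))
```

A full sort is fine at this scale; for repeated queries over the whole array, `np.argpartition` would avoid sorting every score.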

You're done!