A high-performance Go HTTP API for semantic Bible search, mimicking the functionality of the scripture frontend JS application with persistent in-memory indices for optimal performance.
- Semantic Search: Uses the EmbeddingGemma-300m model at 128 dimensions (Matryoshka truncation)
- Pre-computed Embeddings: Leverages pre-computed embeddings from Arweave for fast initialization
- In-Memory Indices: Keeps vector indices in memory for sub-millisecond search latency
- Quantization Support: Memory-efficient storage using int8 quantization
- Granularity Options: Search at verse or chapter level
- Filtering: Support for book, chapter, and verse filters
- RESTful API: Echo framework for high-performance HTTP handling
The API is designed to mirror the semantic search capabilities of the JavaScript frontend while providing:
- Persistent in-memory storage (indices stay loaded between requests)
- Better resource utilization through Go's efficient memory management
- Support for advanced quantization techniques
- Horizontal scalability potential
GET /health
Returns the health status of the API.
GET /status
Returns detailed status information including loaded indices and memory usage.
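For example, with the server running locally on the default port:
# Check liveness
curl http://localhost:8080/health
# Inspect loaded indices and memory usage
curl http://localhost:8080/status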
GET Request (Recommended):
GET /search?q=God%20loved%20the%20world&k=10&book=John&chapter=3
POST Request (Also supported):
POST /search
Content-Type: application/json
{
  "query": "God loved the world",
  "options": {
    "book": "John",
    "chapter": "3",
    "granularity": "verse",
    "k": 10
  }
}
GET Query Parameters:
- q or query - Search query text (required)
- k - Number of results (default: 10)
- book - Filter by Bible book
- chapter - Filter by chapter number
- verse - Filter by verse number
- granularity - Search granularity: "verse" or "chapter" (default: "verse")
Alternatively, filters can be embedded in the query text:
GET /search?q=love%20your%20enemies%20book:Matthew%20chapter:5
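These requests can also be issued with curl (assuming a local server on the default port 8080):
# GET with explicit filter parameters
curl "http://localhost:8080/search?q=God%20loved%20the%20world&k=10&book=John&chapter=3"
# GET with filters embedded in the query text
curl "http://localhost:8080/search?q=love%20your%20enemies%20book:Matthew%20chapter:5"
# Equivalent POST request
curl -X POST http://localhost:8080/search \
  -H "Content-Type: application/json" \
  -d '{"query": "God loved the world", "options": {"book": "John", "chapter": "3", "k": 10}}'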
Response:
{
  "query": "God loved the world",
  "results": [
    {
      "book": "John",
      "chapter": 3,
      "verseNum": 16,
      "text": "For God so loved the world...",
      "_searchMeta": {
        "similarity": 0.95,
        "score": 0.95,
        "reference": "John 3:16"
      }
    }
  ],
  "count": 10,
  "status": "success"
}

POST /embed
Content-Type: application/json
{
  "text": "sample text",
  "type": "query"
}
This endpoint generates embeddings directly using the integrated ONNX runtime.
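For example, with a local server:
curl -X POST http://localhost:8080/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "sample text", "type": "query"}'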
ONNX Runtime Installation (Required for EmbeddingGemma)
The server now uses the real EmbeddingGemma-300m ONNX model instead of pre-computed embeddings. ONNX Runtime is required:
Option 1: System Package (Recommended)
# Fedora/RHEL/CentOS
sudo dnf install onnxruntime onnxruntime-devel
# Ubuntu/Debian
sudo apt-get install libonnxruntime-dev
# macOS
brew install onnxruntime

Option 2: Manual Installation
# Download ONNX Runtime (Linux x64 example)
curl -L -O "https://github.com/microsoft/onnxruntime/releases/download/v1.19.2/onnxruntime-linux-x64-1.19.2.tgz"
tar -xzf onnxruntime-linux-x64-1.19.2.tgz
# Copy libraries to system or local directory
sudo cp onnxruntime-linux-x64-1.19.2/lib/libonnxruntime.so* /usr/local/lib/
sudo ldconfig
# Or copy to project directory (see Running section below)

Standard Installation (with system ONNX Runtime):
# Install dependencies
go mod download
# Build the server
go build -o goscriptureapi
# Run the server
./goscriptureapi -port 8080 -debug

Local ONNX Runtime Installation:
# If ONNX Runtime is not system-installed, copy libraries to project directory
cp path/to/onnxruntime-linux-x64-1.19.2/lib/libonnxruntime.so* .
ln -sf libonnxruntime.so onnxruntime.so
# Run with library path
LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./goscriptureapi -port 8080 -debug

Easy Startup Script:
# Use the included startup script for automatic configuration
./start_server.sh 8080 # Normal mode
./start_server.sh 8080 -debug # Debug mode
# The script automatically:
# - Builds the binary if needed
# - Detects ONNX Runtime installation
# - Sets up library paths and symlinks
# - Provides helpful status messages

The server will automatically download required model files on first run:
- model.onnx (~480KB) - Model architecture
- model.onnx_data (~1.2GB) - EmbeddingGemma-300m weights
- tokenizer.model (~4.6MB) - SentencePiece tokenizer
These files are cached in data/models/ for subsequent runs.
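To verify the cache after the first run:
# List the downloaded model files and their sizes
ls -lh data/models/
# Expected: model.onnx, model.onnx_data, tokenizer.model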
Expected startup logs:
INFO Initializing EmbeddingGemma service...
INFO Downloading file... path=data/models/model.onnx_data url=https://huggingface.co/...
INFO File downloaded successfully path=data/models/model.onnx_data
INFO ONNX model loaded successfully
INFO SentencePiece tokenizer loaded successfully
INFO Real ONNX EmbeddingGemma model initialized successfully
- -port: Port to listen on (default: 8080)
- -data: Directory to store cached data and models (default: ./data)
- -debug: Enable debug logging
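For example, to run on a different port with a custom data directory (the path here is illustrative) and verbose logging:
./goscriptureapi -port 9090 -data /var/cache/goscripture -debug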
"onnxruntime.so: cannot open shared object file"
# Option 1: Install system-wide
sudo dnf install onnxruntime # Fedora
sudo apt install libonnxruntime-dev # Ubuntu
# Option 2: Use local libraries
LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./goscriptureapi

"model.onnx_data: No such file or directory"
# Create symlinks in working directory
ln -sf data/models/model.onnx .
ln -sf data/models/model.onnx_data .

Model download failures
- Check internet connectivity to huggingface.co
- Ensure sufficient disk space (~1.3GB for model files)
- Verify write permissions in data/models directory
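A few quick diagnostic commands (suggestions, using standard tools):
# Confirm huggingface.co is reachable
curl -sI https://huggingface.co | head -n 1
# Confirm roughly 1.3GB of free disk space is available
df -h data/
# Confirm the models directory is writable
touch data/models/.write_test && rm data/models/.write_test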
# Build the image
docker build -t goscriptureapi .
# Run the container
docker run -p 8080:8080 goscriptureapi
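To keep downloaded model files across container restarts, the data directory can be mounted as a volume. The in-container path /app/data below is an assumption; adjust it to match the image's actual data directory:
# Mount the host data/ directory into the container
# (/app/data is an assumed path; check your Dockerfile)
docker run -p 8080:8080 -v "$(pwd)/data:/app/data" goscriptureapi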
.
├── main.go # Entry point
├── internal/
│ ├── api/ # HTTP handlers
│ ├── config/ # Configuration
│ ├── embeddings/ # Embedding generation service
│ └── search/ # Search service and vector index
├── data/ # Cached data directory
└── go.mod # Go module definition
- Embedding Model: Uses the real EmbeddingGemma-300m ONNX model to generate embeddings on the fly, providing superior semantic understanding compared to pre-computed embeddings.
- Vector Index: Implements both standard float32 and quantized int8 indices for memory efficiency.
- Hybrid Architecture: Falls back to SimpleEmbeddingService with pre-computed embeddings if the ONNX model fails to initialize (for development/debugging).
- Caching:
  - Model files (1.3GB total) cached locally in data/models/
  - Pre-computed embeddings from Arweave for fallback mode
  - Generated embeddings are computed on demand (not cached)
- Concurrency: Indices and model initialization run in background goroutines for fast startup.
- Memory Usage:
  - Model weights: ~1.2GB (loaded once)
  - ONNX Runtime overhead: ~200MB
  - Verse indices: ~16MB unquantized, ~4MB with int8 quantization
  - Total: ~1.4GB RAM
- Search Latency:
  - Embedding generation: ~400-600ms (EmbeddingGemma inference)
  - Vector search: <1ms (in-memory operations)
  - Total query time: ~500ms
- Initialization:
  - First run: downloads 1.3GB of model files from Hugging Face
  - Subsequent runs: ~30 seconds for model loading
  - Pre-computed indices: downloads ~20MB from Arweave
In fallback mode (pre-computed embeddings only):
- Memory Usage: ~20MB total
- Search Latency: <100ms end-to-end
- Initialization: Downloads ~20MB from Arweave only
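End-to-end latency can be measured from the shell against a running server, e.g.:
# Time a full request: embedding generation plus vector search
time curl -s "http://localhost:8080/search?q=God%20loved%20the%20world&k=10" > /dev/null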
- Minimum: 2GB RAM, 2GB disk space
- Recommended: 4GB+ RAM, SSD storage
- CPU: Any modern x86_64 processor (no GPU required)
- ONNX Runtime Integration: ✅ Complete - Full EmbeddingGemma-300m ONNX model integration
- GPU Acceleration: Optional CUDA/ROCm support for faster embedding generation
- Advanced Quantization: Product quantization and other techniques for even better memory efficiency
- Embedding Caching: Cache generated embeddings to improve repeated query performance
- Distributed Index: Support for sharding across multiple instances
- Batch Processing: Batch multiple embeddings in single ONNX inference for better throughput
- Model Variants: Support for different EmbeddingGemma sizes (768D full model)
- Incremental Updates: Support for adding new texts without full reindexing
This API is designed to be a drop-in replacement for the JavaScript semantic search, maintaining the same:
- Query syntax and filtering options
- Result format and scoring
- Embedding model and dimensions (EmbeddingGemma at 128D)
MIT