
v1.0.0: Intelligence — candle runtime, orchestrator, reranker, query expansion #10

Merged
devwhodevs merged 13 commits into main from feat/v1.0-intelligence on Mar 25, 2026

Conversation

@devwhodevs
Owner

Summary

  • Runtime migration: Replaced ONNX (ort) with candle (pure Rust ML framework). GGUF model format with Metal acceleration on macOS. Drops ort and ndarray dependencies entirely.
  • 3 GGUF models: embeddinggemma-300M (embeddings, mandatory ~300MB), qwen3-reranker-0.6B (cross-encoder reranking, optional ~640MB), qwen3-0.6B (orchestration + query expansion, optional ~640MB)
  • Research orchestrator: Single LLM call classifies query intent (exact/conceptual/relationship/exploratory) and generates 2-4 query expansions. Adaptive lane weights per intent. Heuristic fallback when intelligence is disabled.
  • Reranker as 4th RRF lane: Two-pass fusion — 3-lane retrieval → RRF pass 1 → cross-encoder reranker scores top 30 → RRF pass 2 (4-lane)
  • Intelligence is opt-in: Users choose during engraph init or engraph configure --enable-intelligence. Search degrades gracefully to v0.7 quality when disabled.
  • Custom bidirectional transformer: CandleEmbed loads embeddinggemma GGUF via raw candle tensors with bidirectional attention (not the autoregressive quantized_gemma3 module)
  • engraph configure implemented: --enable-intelligence, --disable-intelligence, --model embed|rerank|expand <uri>
  • Dimension migration: Auto-detects embedding dimension change (384→256) and triggers re-index
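The two-pass fusion above hinges on Reciprocal Rank Fusion. As a minimal sketch (the `k = 60` constant and function shape are assumptions, not the PR's actual implementation), each lane contributes `1 / (k + rank)` per document, and pass 2 simply reruns the same fusion with the reranker's ordering added as a fourth lane:

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum over lanes of 1 / (k + rank),
/// where rank is 1-based within each lane. k = 60 is a common default.
fn rrf_fuse(lanes: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for lane in lanes {
        for (rank, doc) in lane.iter().enumerate() {
            *scores.entry((*doc).to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    // Sort descending by fused score.
    let mut out: Vec<(String, f64)> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}
```

Pass 1 would fuse the three retrieval lanes; pass 2 would call the same function again with the reranker-ordered top 30 appended as a fourth lane.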

Stats

  • 18 files changed, +3675 / -945 lines
  • 271 tests (up from 225), all passing
  • ort, ndarray removed; candle-core, candle-nn, candle-transformers added

Test plan

  • cargo fmt --check — clean
  • cargo clippy -- -D warnings — clean
  • cargo test --lib — 271/271 pass
  • cargo build --release — compiles
  • CI passes (fmt + clippy + test on macOS + Ubuntu)
  • Smoke test with real vault after model download

🤖 Generated with Claude Code

devwhodevs and others added 13 commits March 25, 2026 18:08
Introduces three Send traits (EmbedModel, RerankModel, OrchestratorModel),
supporting types (QueryIntent, OrchestrationResult, LaneWeights), and a
deterministic MockLlm backed by SHA-256 hashes — no model files needed in
tests. Foundation for v1.0 intelligence layer; old embedder/model modules
kept intact until Task 8.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
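The trait and type names above come from the commit message; their exact signatures are not shown, so the following is a guessed sketch of what the `OrchestratorModel` surface might look like, with a trivially deterministic stand-in in the spirit of `MockLlm`:

```rust
// Shapes inferred from the commit message; real signatures may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum QueryIntent {
    Exact,
    Conceptual,
    Relationship,
    Exploratory,
}

struct OrchestrationResult {
    intent: QueryIntent,
    expansions: Vec<String>,
}

trait OrchestratorModel: Send {
    fn orchestrate(&self, query: &str) -> OrchestrationResult;
}

// A deterministic mock in the spirit of MockLlm: no model files needed.
struct FixedOrchestrator;

impl OrchestratorModel for FixedOrchestrator {
    fn orchestrate(&self, query: &str) -> OrchestrationResult {
        OrchestrationResult {
            intent: QueryIntent::Conceptual,
            expansions: vec![format!("{query} overview")],
        }
    }
}
```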
…uery test

- `h2.update(&hash)` → `h2.update(hash)` to satisfy clippy's needless_borrows_for_generic_args lint
- Remove unreachable `if union == 0` guard in `rerank_score` (already covered by the `q_set.is_empty() && d_set.is_empty()` early return)
- Add `test_mock_rerank_empty_query` to assert empty query scores 0.0; brings llm test count to 8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds `ModelConfig` (embed/rerank/expand URI overrides) and `intelligence: Option<bool>` to `Config`, plus an `intelligence_enabled()` helper. Three new tests cover TOML parsing, defaults, and the disabled path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
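A plausible shape for that config surface, assuming `None` means "not yet opted in" (field names follow the commit message; the struct layout is a guess):

```rust
// Per-model URI overrides; None falls back to the canonical defaults.
#[derive(Default)]
struct ModelConfig {
    embed: Option<String>,
    rerank: Option<String>,
    expand: Option<String>,
}

struct Config {
    // Option<bool> so an absent key is distinguishable from an explicit false.
    intelligence: Option<bool>,
    models: ModelConfig,
}

impl Config {
    // Intelligence is opt-in: absent counts as disabled.
    fn intelligence_enabled(&self) -> bool {
        self.intelligence.unwrap_or(false)
    }
}
```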
Adds an llm_cache table to the SQLite schema with set/get methods for
caching LLM orchestration results by query hash. Includes 4 new tests
covering roundtrip, cache miss, overwrite, and embedding_dim meta.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
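The commit does not show the DDL, so this is only a guessed schema for an `llm_cache` keyed by query hash (column names are assumptions):

```rust
// Hypothetical schema sketch; the PR's actual DDL may differ.
const LLM_CACHE_DDL: &str = "\
CREATE TABLE IF NOT EXISTS llm_cache (
    query_hash TEXT PRIMARY KEY,
    result     TEXT NOT NULL
);";
```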
Replaces the hardcoded float[384] in the vec0 CREATE TABLE statement
with a format-string using a caller-supplied `dim: usize`. All existing
callers pass 384 (no behaviour change); adds test_init_vec_table_custom_dim
to verify 256-dim tables create and round-trip correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
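The parameterized DDL might look like this (table and column names here are illustrative, not taken from the PR):

```rust
// Build the vec0 virtual-table DDL with a caller-supplied dimension.
// Table/column names are assumptions; only the float[{dim}] idea is from the commit.
fn vec_table_ddl(dim: usize) -> String {
    format!("CREATE VIRTUAL TABLE IF NOT EXISTS vec_chunks USING vec0(embedding float[{dim}])")
}
```

Existing callers passing 384 produce the same statement as the old hardcoded string, which is why the commit is behavior-preserving.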
Add HfModelUri parsing for hf:org/repo/filename.gguf URIs, download_model
with progress bar and SHA256 verification, ensure_model for cache-aware
fetching, and ModelDefaults with canonical GGUF model URIs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
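A minimal sketch of that URI parsing, assuming the three-part `hf:org/repo/filename.gguf` shape described above (the real `HfModelUri` fields and error handling may differ):

```rust
#[derive(Debug, PartialEq)]
struct HfModelUri {
    org: String,
    repo: String,
    filename: String,
}

// Parse "hf:<org>/<repo>/<filename>.gguf"; returns None on any malformed input.
fn parse_hf_uri(uri: &str) -> Option<HfModelUri> {
    let rest = uri.strip_prefix("hf:")?;
    let mut parts = rest.splitn(3, '/');
    let (org, repo, filename) = (parts.next()?, parts.next()?, parts.next()?);
    if org.is_empty() || repo.is_empty() || !filename.ends_with(".gguf") {
        return None;
    }
    Some(HfModelUri {
        org: org.to_string(),
        repo: repo.to_string(),
        filename: filename.to_string(),
    })
}
```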
When the stored embedding dimension in meta does not match the loaded
model's dimension, reset_for_reindex drops and recreates the vec table,
clears the chunks table, and forces a full rebuild so all vectors are
regenerated at the new dimension.

Adds has_dimension_mismatch / reset_for_reindex to Store, migration
logic at the top of run_index, and two unit tests covering mismatch
detection and the unset-key early-out.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
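The mismatch check itself reduces to a small decision, sketched here with an `Option` for the stored meta value (the real `has_dimension_mismatch` reads from the Store and may be shaped differently):

```rust
// stored: embedding dim recorded in the meta table, None if never set.
// model_dim: dimension reported by the loaded embedding model.
fn has_dimension_mismatch(stored: Option<usize>, model_dim: usize) -> bool {
    match stored {
        // Unset-key early-out: a fresh index has nothing to migrate.
        None => false,
        Some(d) => d != model_dim,
    }
}
```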
- Add orchestration_cache_key() function in search.rs
- Uses SHA256 hash of query string for deterministic cache keys
- Add test_cache_key_deterministic() to verify determinism and uniqueness
- All tests pass, clippy clean
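The key idea is that equal queries must hash to equal keys. A dependency-free sketch (the PR uses SHA-256; `DefaultHasher` stands in here and is not a cryptographic substitute):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Deterministic cache key for a query string. The real implementation
// hashes with SHA-256; DefaultHasher keeps this sketch stdlib-only.
fn orchestration_cache_key(query: &str) -> String {
    let mut h = DefaultHasher::new();
    query.hash(&mut h);
    format!("{:016x}", h.finish())
}
```
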
Add orchestration JSON parsing with fallback for extracting structured
intent + expansions from LLM output, and CandleOrchestrator struct that
loads a quantized Qwen3 GGUF model for autoregressive text generation.
Falls back to heuristic_orchestrate when generation or parsing fails.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add format_reranker_input() and CandleRerank struct that loads a Qwen3-Reranker
GGUF model and scores (query, document) pairs via single forward pass Yes/No
logit softmax. Reuses the same download and tokenizer loading patterns as
CandleOrchestrator but does NOT do autoregressive generation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
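The final scoring step described above — softmax over the Yes/No logits from one forward pass — reduces to a two-way softmax. A sketch of just that step (the logit extraction from the model is omitted; inputs here are arbitrary numbers):

```rust
// Relevance score = P(Yes) under a two-way softmax over the Yes/No logits.
fn yes_no_score(yes_logit: f32, no_logit: f32) -> f32 {
    let m = yes_logit.max(no_logit); // subtract max for numerical stability
    let ey = (yes_logit - m).exp();
    let en = (no_logit - m).exp();
    ey / (ey + en)
}
```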
Add search_with_intelligence() implementing the full intelligence search
pipeline: orchestration, multi-query 3-lane retrieval, RRF Pass 1, optional
reranker scoring (4th lane), and RRF Pass 2. Refactor search_internal to
delegate to this new function without intelligence models, preserving
existing behavior. Add SearchConfig, SearchOutput.intent, dedup_by_file,
and merge_seeds helpers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
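Of the helpers named above, `dedup_by_file` is the most self-contained. A guessed sketch, assuming hits arrive sorted by score so the first occurrence per file is the best one (the real signature and hit type surely differ):

```rust
use std::collections::HashSet;

// Keep only the highest-ranked hit per file; assumes `hits` is already
// sorted descending by score.
fn dedup_by_file(hits: Vec<(&str, f64)>) -> Vec<(&str, f64)> {
    let mut seen = HashSet::new();
    hits.into_iter()
        .filter(|(file, _)| seen.insert(file.to_string()))
        .collect()
}
```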
- run_search: prepend "Intent: <variant>" before the explain lane breakdown when --explain is used
- run_status / format_status: load Config, derive intelligence enabled/disabled, include in both human-readable and JSON status output
- Updated format_status signature to accept `intelligence: &str`; updated both call sites and both unit tests (added assertions for the new field)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… query expansion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
devwhodevs merged commit 7a239d5 into main on Mar 25, 2026
3 checks passed
