
v1.0.0: Intelligence — candle runtime, orchestrator, reranker, query expansion #10

Merged
devwhodevs merged 13 commits into main from feat/v1.0-intelligence on Mar 25, 2026

Conversation

@devwhodevs
Owner

Summary

  • Runtime migration: Replaced ONNX (ort) with candle (pure Rust ML framework). GGUF model format with Metal acceleration on macOS. Drops ort and ndarray dependencies entirely.
  • 3 GGUF models: embeddinggemma-300M (embeddings, mandatory ~300MB), qwen3-reranker-0.6B (cross-encoder reranking, optional ~640MB), qwen3-0.6B (orchestration + query expansion, optional ~640MB)
  • Research orchestrator: Single LLM call classifies query intent (exact/conceptual/relationship/exploratory) and generates 2-4 query expansions. Adaptive lane weights per intent. Heuristic fallback when intelligence is disabled.
  • Reranker as 4th RRF lane: Two-pass fusion — 3-lane retrieval → RRF pass 1 → cross-encoder reranker scores top 30 → RRF pass 2 (4-lane)
  • Intelligence is opt-in: Users choose during engraph init or engraph configure --enable-intelligence. Search degrades gracefully to v0.7 quality when disabled.
  • Custom bidirectional transformer: CandleEmbed loads embeddinggemma GGUF via raw candle tensors with bidirectional attention (not the autoregressive quantized_gemma3 module)
  • engraph configure implemented: --enable-intelligence, --disable-intelligence, --model embed|rerank|expand <uri>
  • Dimension migration: Auto-detects embedding dimension change (384→256) and triggers re-index
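The two-pass fusion above hinges on Reciprocal Rank Fusion. As a minimal sketch (the `k = 60` constant and function shape are assumptions, not the PR's actual implementation), each lane contributes `1 / (k + rank)` per document, and pass 2 simply reruns the same fusion with the reranker's ordering added as a fourth lane:

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum over lanes of 1 / (k + rank),
/// where rank is 1-based within each lane. k = 60 is a common default.
fn rrf_fuse(lanes: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for lane in lanes {
        for (rank, doc) in lane.iter().enumerate() {
            *scores.entry((*doc).to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    // Sort descending by fused score.
    let mut out: Vec<(String, f64)> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}
```

Pass 1 would fuse the three retrieval lanes; pass 2 would call the same function again with the reranker-ordered top 30 appended as a fourth lane.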

Stats

  • 18 files changed, +3675 / -945 lines
  • 271 tests (up from 225), all passing
  • ort, ndarray removed; candle-core, candle-nn, candle-transformers added

Test plan

  • cargo fmt --check — clean
  • cargo clippy -- -D warnings — clean
  • cargo test --lib — 271/271 pass
  • cargo build --release — compiles
  • CI passes (fmt + clippy + test on macOS + Ubuntu)
  • Smoke test with real vault after model download

🤖 Generated with Claude Code

devwhodevs and others added 13 commits March 25, 2026 18:08
Introduces three Send traits (EmbedModel, RerankModel, OrchestratorModel),
supporting types (QueryIntent, OrchestrationResult, LaneWeights), and a
deterministic MockLlm backed by SHA-256 hashes — no model files needed in
tests. Foundation for v1.0 intelligence layer; old embedder/model modules
kept intact until Task 8.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
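The trait and type names above come from the commit message; their exact signatures are not shown, so the following is a guessed sketch of what the `OrchestratorModel` surface might look like, with a trivially deterministic stand-in in the spirit of `MockLlm`:

```rust
// Shapes inferred from the commit message; real signatures may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum QueryIntent {
    Exact,
    Conceptual,
    Relationship,
    Exploratory,
}

struct OrchestrationResult {
    intent: QueryIntent,
    expansions: Vec<String>,
}

trait OrchestratorModel: Send {
    fn orchestrate(&self, query: &str) -> OrchestrationResult;
}

// A deterministic mock in the spirit of MockLlm: no model files needed.
struct FixedOrchestrator;

impl OrchestratorModel for FixedOrchestrator {
    fn orchestrate(&self, query: &str) -> OrchestrationResult {
        OrchestrationResult {
            intent: QueryIntent::Conceptual,
            expansions: vec![format!("{query} overview")],
        }
    }
}
```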
…uery test

- `h2.update(&hash)` → `h2.update(hash)` to satisfy clippy's needless_borrows_for_generic_args lint
- Remove unreachable `if union == 0` guard in `rerank_score` (already covered by the `q_set.is_empty() && d_set.is_empty()` early return)
- Add `test_mock_rerank_empty_query` to assert empty query scores 0.0; brings llm test count to 8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds `ModelConfig` (embed/rerank/expand URI overrides) and `intelligence: Option<bool>` to `Config`, plus an `intelligence_enabled()` helper. Three new tests cover TOML parsing, defaults, and the disabled path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
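A plausible shape for that config surface, assuming `None` means "not yet opted in" (field names follow the commit message; the struct layout is a guess):

```rust
// Per-model URI overrides; None falls back to the canonical defaults.
#[derive(Default)]
struct ModelConfig {
    embed: Option<String>,
    rerank: Option<String>,
    expand: Option<String>,
}

struct Config {
    // Option<bool> so an absent key is distinguishable from an explicit false.
    intelligence: Option<bool>,
    models: ModelConfig,
}

impl Config {
    // Intelligence is opt-in: absent counts as disabled.
    fn intelligence_enabled(&self) -> bool {
        self.intelligence.unwrap_or(false)
    }
}
```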
Adds an llm_cache table to the SQLite schema with set/get methods for
caching LLM orchestration results by query hash. Includes 4 new tests
covering roundtrip, cache miss, overwrite, and embedding_dim meta.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
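The commit does not show the DDL, so this is only a guessed schema for an `llm_cache` keyed by query hash (column names are assumptions):

```rust
// Hypothetical schema sketch; the PR's actual DDL may differ.
const LLM_CACHE_DDL: &str = "\
CREATE TABLE IF NOT EXISTS llm_cache (
    query_hash TEXT PRIMARY KEY,
    result     TEXT NOT NULL
);";
```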
Replaces the hardcoded float[384] in the vec0 CREATE TABLE statement
with a format-string using a caller-supplied `dim: usize`. All existing
callers pass 384 (no behaviour change); adds test_init_vec_table_custom_dim
to verify 256-dim tables create and round-trip correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
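The parameterized DDL might look like this (table and column names here are illustrative, not taken from the PR):

```rust
// Build the vec0 virtual-table DDL with a caller-supplied dimension.
// Table/column names are assumptions; only the float[{dim}] idea is from the commit.
fn vec_table_ddl(dim: usize) -> String {
    format!("CREATE VIRTUAL TABLE IF NOT EXISTS vec_chunks USING vec0(embedding float[{dim}])")
}
```

Existing callers passing 384 produce the same statement as the old hardcoded string, which is why the commit is behavior-preserving.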
Add HfModelUri parsing for hf:org/repo/filename.gguf URIs, download_model
with progress bar and SHA256 verification, ensure_model for cache-aware
fetching, and ModelDefaults with canonical GGUF model URIs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
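A minimal sketch of that URI parsing, assuming the three-part `hf:org/repo/filename.gguf` shape described above (the real `HfModelUri` fields and error handling may differ):

```rust
#[derive(Debug, PartialEq)]
struct HfModelUri {
    org: String,
    repo: String,
    filename: String,
}

// Parse "hf:<org>/<repo>/<filename>.gguf"; returns None on any malformed input.
fn parse_hf_uri(uri: &str) -> Option<HfModelUri> {
    let rest = uri.strip_prefix("hf:")?;
    let mut parts = rest.splitn(3, '/');
    let (org, repo, filename) = (parts.next()?, parts.next()?, parts.next()?);
    if org.is_empty() || repo.is_empty() || !filename.ends_with(".gguf") {
        return None;
    }
    Some(HfModelUri {
        org: org.to_string(),
        repo: repo.to_string(),
        filename: filename.to_string(),
    })
}
```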
When the stored embedding dimension in meta does not match the loaded
model's dimension, reset_for_reindex drops and recreates the vec table,
clears the chunks table, and forces a full rebuild so all vectors are
regenerated at the new dimension.

Adds has_dimension_mismatch / reset_for_reindex to Store, migration
logic at the top of run_index, and two unit tests covering mismatch
detection and the unset-key early-out.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
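The mismatch check itself reduces to a small decision, sketched here with an `Option` for the stored meta value (the real `has_dimension_mismatch` reads from the Store and may be shaped differently):

```rust
// stored: embedding dim recorded in the meta table, None if never set.
// model_dim: dimension reported by the loaded embedding model.
fn has_dimension_mismatch(stored: Option<usize>, model_dim: usize) -> bool {
    match stored {
        // Unset-key early-out: a fresh index has nothing to migrate.
        None => false,
        Some(d) => d != model_dim,
    }
}
```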
- Add orchestration_cache_key() function in search.rs
- Uses SHA256 hash of query string for deterministic cache keys
- Add test_cache_key_deterministic() to verify determinism and uniqueness
- All tests pass, clippy clean
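The key idea is that equal queries must hash to equal keys. A dependency-free sketch (the PR uses SHA-256; `DefaultHasher` stands in here and is not a cryptographic substitute):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Deterministic cache key for a query string. The real implementation
// hashes with SHA-256; DefaultHasher keeps this sketch stdlib-only.
fn orchestration_cache_key(query: &str) -> String {
    let mut h = DefaultHasher::new();
    query.hash(&mut h);
    format!("{:016x}", h.finish())
}
```
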
Add orchestration JSON parsing with fallback for extracting structured
intent + expansions from LLM output, and CandleOrchestrator struct that
loads a quantized Qwen3 GGUF model for autoregressive text generation.
Falls back to heuristic_orchestrate when generation or parsing fails.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add format_reranker_input() and CandleRerank struct that loads a Qwen3-Reranker
GGUF model and scores (query, document) pairs via single forward pass Yes/No
logit softmax. Reuses the same download and tokenizer loading patterns as
CandleOrchestrator but does NOT do autoregressive generation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
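The final scoring step described above — softmax over the Yes/No logits from one forward pass — reduces to a two-way softmax. A sketch of just that step (the logit extraction from the model is omitted; inputs here are arbitrary numbers):

```rust
// Relevance score = P(Yes) under a two-way softmax over the Yes/No logits.
fn yes_no_score(yes_logit: f32, no_logit: f32) -> f32 {
    let m = yes_logit.max(no_logit); // subtract max for numerical stability
    let ey = (yes_logit - m).exp();
    let en = (no_logit - m).exp();
    ey / (ey + en)
}
```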
Add search_with_intelligence() implementing the full intelligence search
pipeline: orchestration, multi-query 3-lane retrieval, RRF Pass 1, optional
reranker scoring (4th lane), and RRF Pass 2. Refactor search_internal to
delegate to this new function without intelligence models, preserving
existing behavior. Add SearchConfig, SearchOutput.intent, dedup_by_file,
and merge_seeds helpers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
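Of the helpers named above, `dedup_by_file` is the most self-contained. A guessed sketch, assuming hits arrive sorted by score so the first occurrence per file is the best one (the real signature and hit type surely differ):

```rust
use std::collections::HashSet;

// Keep only the highest-ranked hit per file; assumes `hits` is already
// sorted descending by score.
fn dedup_by_file(hits: Vec<(&str, f64)>) -> Vec<(&str, f64)> {
    let mut seen = HashSet::new();
    hits.into_iter()
        .filter(|(file, _)| seen.insert(file.to_string()))
        .collect()
}
```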
- run_search: prepend "Intent: <variant>" before the explain lane breakdown when --explain is used
- run_status / format_status: load Config, derive intelligence enabled/disabled, include in both human-readable and JSON status output
- Updated format_status signature to accept `intelligence: &str`; updated both call sites and both unit tests (added assertions for the new field)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… query expansion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
devwhodevs merged commit 7a239d5 into main on Mar 25, 2026
3 checks passed
