v1.0.0: Intelligence — candle runtime, orchestrator, reranker, query expansion #10
Merged
devwhodevs merged 13 commits into main · Mar 25, 2026
Conversation
Introduces three Send traits (EmbedModel, RerankModel, OrchestratorModel), supporting types (QueryIntent, OrchestrationResult, LaneWeights), and a deterministic MockLlm backed by SHA-256 hashes — no model files needed in tests. Foundation for v1.0 intelligence layer; old embedder/model modules kept intact until Task 8. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
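The traits and mock described above can be sketched roughly as follows. The trait names come from the commit, but the method signatures are guesses, and std's `DefaultHasher` stands in for the PR's SHA-256 so the sketch needs no extra crates:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical trait shapes: names follow the commit, signatures do not
// necessarily match the PR's actual API.
pub trait EmbedModel: Send {
    fn embed(&self, text: &str) -> Vec<f32>;
}
pub trait RerankModel: Send {
    fn rerank_score(&self, query: &str, doc: &str) -> f32;
}
pub trait OrchestratorModel: Send {
    fn orchestrate(&self, query: &str) -> Vec<String>;
}

// Deterministic mock: same input always yields the same vector, so tests
// need no model files on disk.
pub struct MockLlm {
    pub dim: usize,
}

impl EmbedModel for MockLlm {
    fn embed(&self, text: &str) -> Vec<f32> {
        (0..self.dim)
            .map(|i| {
                let mut h = DefaultHasher::new();
                (text, i).hash(&mut h);
                // Map the hash into [0, 1) so values look embedding-like.
                (h.finish() % 1_000) as f32 / 1_000.0
            })
            .collect()
    }
}
```

The point of the hash-backed mock is that determinism makes assertions exact: two calls with the same query must return identical vectors.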
…uery test
- `h2.update(&hash)` → `h2.update(hash)` to satisfy clippy's needless_borrows_for_generic_args lint
- Remove the unreachable `if union == 0` guard in `rerank_score` (already covered by the `q_set.is_empty() && d_set.is_empty()` early return)
- Add `test_mock_rerank_empty_query` to assert an empty query scores 0.0; brings the llm test count to 8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
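A minimal sketch of why the `union == 0` guard was unreachable, assuming the mock's score is token-set overlap (the PR's exact tokenization is unknown; whitespace splitting is an assumption here):

```rust
use std::collections::HashSet;

// Hedged sketch of the mock reranker's overlap score (Jaccard-style).
fn rerank_score(query: &str, doc: &str) -> f32 {
    let q_set: HashSet<&str> = query.split_whitespace().collect();
    let d_set: HashSet<&str> = doc.split_whitespace().collect();
    // union can only be 0 when both sets are empty, so this early return
    // already covers the case a separate `if union == 0` guard would catch.
    if q_set.is_empty() && d_set.is_empty() {
        return 0.0;
    }
    let inter = q_set.intersection(&d_set).count();
    let union = q_set.union(&d_set).count();
    inter as f32 / union as f32
}
```

An empty query against a non-empty document also yields 0.0 (zero intersection over a positive union), which is what the new `test_mock_rerank_empty_query` asserts.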
Adds `ModelConfig` (embed/rerank/expand URI overrides) and `intelligence: Option<bool>` to `Config`, plus an `intelligence_enabled()` helper. Three new tests cover TOML parsing, defaults, and the disabled path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
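A hypothetical shape for the config additions; the field names follow the commit message, but the default (unset meaning disabled) is an assumption:

```rust
// Sketch of the config structs; not the PR's literal definitions.
#[derive(Default)]
pub struct ModelConfig {
    pub embed: Option<String>,  // URI overrides, e.g. an hf: URI
    pub rerank: Option<String>,
    pub expand: Option<String>,
}

#[derive(Default)]
pub struct Config {
    pub intelligence: Option<bool>,
    pub models: ModelConfig,
}

impl Config {
    // Assumed behavior: an unset flag counts as disabled.
    pub fn intelligence_enabled(&self) -> bool {
        self.intelligence.unwrap_or(false)
    }
}
```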
Adds an llm_cache table to the SQLite schema with set/get methods for caching LLM orchestration results by query hash. Includes 4 new tests covering roundtrip, cache miss, overwrite, and embedding_dim meta. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
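The cache table might look something like this; the column names beyond a query-hash key and a result payload are guesses, and the upsert gives the overwrite semantics the tests cover:

```rust
// Hypothetical DDL and statements for the llm_cache table.
pub const CREATE_LLM_CACHE: &str = "
CREATE TABLE IF NOT EXISTS llm_cache (
    query_hash TEXT PRIMARY KEY,
    result     TEXT NOT NULL
);";

// set: insert-or-overwrite keyed by the query hash.
pub const SET_CACHE: &str = "
INSERT INTO llm_cache (query_hash, result) VALUES (?1, ?2)
ON CONFLICT(query_hash) DO UPDATE SET result = excluded.result;";

// get: a miss simply returns no rows.
pub const GET_CACHE: &str = "SELECT result FROM llm_cache WHERE query_hash = ?1;";
```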
Replaces the hardcoded float[384] in the vec0 CREATE TABLE statement with a format-string using a caller-supplied `dim: usize`. All existing callers pass 384 (no behaviour change); adds test_init_vec_table_custom_dim to verify 256-dim tables create and round-trip correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
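The parameterized DDL amounts to a one-line format string; table and column names here are assumptions, not the PR's literal schema:

```rust
// Sketch of the dim-parameterized vec0 table creation.
fn vec_table_ddl(dim: usize) -> String {
    format!("CREATE VIRTUAL TABLE vec_chunks USING vec0(embedding float[{dim}])")
}
```

All existing callers passing 384 produce byte-identical DDL to the old hardcoded statement, which is why there is no behavior change.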
Add HfModelUri parsing for hf:org/repo/filename.gguf URIs, download_model with progress bar and SHA256 verification, ensure_model for cache-aware fetching, and ModelDefaults with canonical GGUF model URIs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
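A hedged sketch of the `hf:org/repo/filename.gguf` parsing; the field names are guesses, and the real parser likely validates more (for instance the `.gguf` suffix):

```rust
#[derive(Debug, PartialEq)]
pub struct HfModelUri {
    pub org: String,
    pub repo: String,
    pub filename: String,
}

// Parse "hf:org/repo/filename.gguf" into its three components; anything
// that does not match returns None.
pub fn parse_hf_uri(uri: &str) -> Option<HfModelUri> {
    let rest = uri.strip_prefix("hf:")?;
    let mut parts = rest.splitn(3, '/');
    let (org, repo, filename) = (parts.next()?, parts.next()?, parts.next()?);
    if org.is_empty() || repo.is_empty() || filename.is_empty() {
        return None;
    }
    Some(HfModelUri {
        org: org.to_string(),
        repo: repo.to_string(),
        filename: filename.to_string(),
    })
}
```

Using `splitn(3, '/')` means a filename containing further slashes (a subpath inside the repo) survives intact in the third component.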
When the stored embedding dimension in meta does not match the loaded model's dimension, reset_for_reindex drops and recreates the vec table, clears the chunks table, and forces a full rebuild so all vectors are regenerated at the new dimension. Adds has_dimension_mismatch / reset_for_reindex to Store, migration logic at the top of run_index, and two unit tests covering mismatch detection and the unset-key early-out. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
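The mismatch check reduces to comparing an optional stored value against the loaded model; this sketch models the meta lookup as an `Option`, matching the unset-key early-out the commit's tests cover:

```rust
// Hedged sketch of has_dimension_mismatch; not the PR's Store method.
fn has_dimension_mismatch(stored_dim: Option<usize>, model_dim: usize) -> bool {
    match stored_dim {
        None => false,             // unset key: fresh index, nothing to migrate
        Some(d) => d != model_dim, // stored vectors were built at another dim
    }
}
```

Only a `true` result triggers the destructive path (drop vec table, clear chunks, full rebuild), since vectors of different dimensions cannot coexist in one vec0 table.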
- Add orchestration_cache_key() function in search.rs
- Uses SHA256 hash of the query string for deterministic cache keys
- Add test_cache_key_deterministic() to verify determinism and uniqueness
- All tests pass; clippy clean
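The key derivation can be sketched as below. The PR hashes with SHA-256; std's `DefaultHasher` stands in here so the sketch stays dependency-free, and the property under test is the same: equal queries produce equal keys, distinct queries (almost surely) produce distinct keys.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hedged stand-in for orchestration_cache_key (real code uses SHA-256).
fn orchestration_cache_key(query: &str) -> String {
    let mut h = DefaultHasher::new();
    query.hash(&mut h);
    format!("{:016x}", h.finish())
}
```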
Add orchestration JSON parsing with fallback for extracting structured intent + expansions from LLM output, and CandleOrchestrator struct that loads a quantized Qwen3 GGUF model for autoregressive text generation. Falls back to heuristic_orchestrate when generation or parsing fails. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
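One common shape for the "parse LLM output, fall back on failure" step is to first trim the response to its outermost braces, since LLMs often wrap JSON in prose. A minimal sketch of that trimming step (the PR's actual parser and its heuristic fallback are more involved):

```rust
// Find the outermost {...} span in raw LLM output, if any.
fn extract_json_object(raw: &str) -> Option<&str> {
    let start = raw.find('{')?;
    let end = raw.rfind('}')?;
    (end > start).then(|| &raw[start..=end])
}
```

When this returns `None` (or the extracted span fails to parse), the pipeline falls back to `heuristic_orchestrate`, so a misbehaving model degrades quality rather than breaking search.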
Add format_reranker_input() and CandleRerank struct that loads a Qwen3-Reranker GGUF model and scores (query, document) pairs via single forward pass Yes/No logit softmax. Reuses the same download and tokenizer loading patterns as CandleOrchestrator but does NOT do autoregressive generation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
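The final scoring step described above — softmax over the Yes and No token logits from a single forward pass — can be sketched as:

```rust
// Two-way softmax over the Yes/No logits; mathematically this is a
// sigmoid of the logit difference. Max-subtraction keeps exp() stable.
fn yes_no_score(yes_logit: f32, no_logit: f32) -> f32 {
    let m = yes_logit.max(no_logit);
    let e_yes = (yes_logit - m).exp();
    let e_no = (no_logit - m).exp();
    e_yes / (e_yes + e_no)
}
```

Because only one forward pass per (query, document) pair is needed — no autoregressive decoding — reranking stays cheap relative to the orchestrator's generation step.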
Add search_with_intelligence() implementing the full intelligence search pipeline: orchestration, multi-query 3-lane retrieval, RRF Pass 1, optional reranker scoring (4th lane), and RRF Pass 2. Refactor search_internal to delegate to this new function without intelligence models, preserving existing behavior. Add SearchConfig, SearchOutput.intent, dedup_by_file, and merge_seeds helpers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
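The RRF fusion used in both passes can be sketched as follows; `k = 60` is the constant commonly used with RRF, not necessarily the PR's value, and lane contents are simplified to document ids:

```rust
use std::collections::HashMap;

// Reciprocal Rank Fusion: each lane contributes 1 / (k + rank) per doc,
// where rank is 1-based position within that lane. Docs ranked well in
// several lanes accumulate the highest fused score.
fn rrf_fuse(lanes: &[Vec<&str>], k: f32) -> Vec<(String, f32)> {
    let mut scores: HashMap<String, f32> = HashMap::new();
    for lane in lanes {
        for (i, id) in lane.iter().enumerate() {
            *scores.entry((*id).to_string()).or_insert(0.0) +=
                1.0 / (k + (i as f32 + 1.0));
        }
    }
    let mut out: Vec<_> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}
```

In the pipeline above, Pass 1 fuses the three retrieval lanes across the expanded queries; when the reranker is available its scores form a fourth lane, and Pass 2 fuses again over the combined set.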
- run_search: prepend "Intent: <variant>" before the explain lane breakdown when --explain is used
- run_status / format_status: load Config, derive intelligence enabled/disabled, include it in both the human-readable and JSON status output
- Updated the format_status signature to accept `intelligence: &str`; updated both call sites and both unit tests (added assertions for the new field)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… query expansion Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Replaces ONNX Runtime (`ort`) with `candle` (pure Rust ML framework). GGUF model format with Metal acceleration on macOS. Drops the `ort` and `ndarray` dependencies entirely. Intelligence is opt-in via `engraph init` or `engraph configure --enable-intelligence`; search degrades gracefully to v0.7 quality when disabled. `CandleEmbed` loads the embeddinggemma GGUF via raw candle tensors with bidirectional attention (not the autoregressive `quantized_gemma3` module). `engraph configure` implemented: `--enable-intelligence`, `--disable-intelligence`, `--model embed|rerank|expand <uri>`.

Stats
- `ort`, `ndarray` removed; `candle-core`, `candle-nn`, `candle-transformers` added

Test plan
- `cargo fmt --check` — clean
- `cargo clippy -- -D warnings` — clean
- `cargo test --lib` — 271/271 pass
- `cargo build --release` — compiles

🤖 Generated with Claude Code