
KS75: Runtime-configurable embedding model #14

Merged
Liorrr merged 13 commits into master from feat/ks75-model-config on Apr 8, 2026

Conversation

@Liorrr (Contributor) commented Apr 8, 2026

Summary

  • Add EmbeddingProvider trait with FastembedProvider (10 local ONNX models) and OpenAIProvider (API) implementations
  • Embedding model, provider, and API URL configurable via config.toml, config_set CLI/API, or env vars (SHRIMPK_EMBEDDING_PROVIDER/MODEL/API_URL)
  • API keys read from env var only (SHRIMPK_EMBEDDING_API_KEY) — never stored in config
  • Auto-infer embedding_dim from model name (no manual dimension config needed)
  • Dimension mismatch detection on startup — hard error if existing data uses different dim
  • Model name sidecar file warns on same-dim model swaps
  • Default behavior unchanged: fastembed / BGE-small-EN-v1.5 / 384-dim
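The trait surface is small. A minimal sketch, assuming these exact signatures (the commit notes below name embed, embed_batch, dimension, and name, with a Send bound and a default embed_batch; the error type here is a stand-in):

```rust
// Hypothetical sketch of the EmbeddingProvider trait described in this PR;
// exact signatures and the crate's error type are assumptions.
type EmbedResult<T> = Result<T, String>;

pub trait EmbeddingProvider: Send {
    /// Embed a single text into a fixed-length vector.
    fn embed(&mut self, text: &str) -> EmbedResult<Vec<f32>>;

    /// Output dimension (e.g. 384 for BGE-small-EN-v1.5).
    fn dimension(&self) -> usize;

    /// Provider/model name for the sidecar file and logs.
    fn name(&self) -> &str;

    /// Default batch implementation: one embed() call per input.
    fn embed_batch(&mut self, texts: &[&str]) -> EmbedResult<Vec<Vec<f32>>> {
        texts.iter().map(|t| self.embed(t)).collect()
    }
}
```

A provider only has to implement the three required methods; batch-capable backends (like the OpenAI path) can override embed_batch to send one request for many inputs.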

Files changed (11)

  • shrimpk-core: EmbeddingProvider trait, EmbeddingBackend enum, config fields, infer_embedding_dim()
  • shrimpk-memory: embedding_provider.rs (new), MultiEmbedder refactor, EchoEngine dim check
  • shrimpk-daemon: config_show/set for 3 new fields
  • shrimpk-mcp: config_show/set for 3 new fields
  • cli: config show/set

Test plan

  • cargo test --workspace — 628 passed, 0 failed, 141 ignored
  • cargo clippy --workspace — clean
  • Manual: daemon shows embedding fields in GET /api/config
  • Manual: PUT /api/config writes embedding settings to TOML
  • Manual: default behavior unchanged (fastembed/BGE-small/384)
  • Manual: switch to bge-large-en-v1.5, restart, verify 1024-dim vectors
  • Manual: set provider=openai with API key, verify API embedding works
  • Manual: dim mismatch on existing data → clear error message
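One hypothetical way to exercise the OpenAI-compatible path from this test plan: the env-var names come from the PR summary; the model name, URL, and key below are placeholder examples, not values from this repo.

```shell
# Env vars override file config (variable names per the PR summary; values are examples).
export SHRIMPK_EMBEDDING_PROVIDER=openai
export SHRIMPK_EMBEDDING_MODEL=text-embedding-3-small                  # example model name
export SHRIMPK_EMBEDDING_API_URL=http://localhost:11434/v1/embeddings  # e.g. a local Ollama endpoint
export SHRIMPK_EMBEDDING_API_KEY=example-key                           # read from env only, never written to config.toml
```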

🤖 Generated with Claude Code

Liorrr and others added 5 commits April 8, 2026 20:03
- Add EmbeddingProvider enum (Fastembed | OpenAI) with Display/FromStr/Serde
- Add embedding_provider, embedding_model, embedding_api_url to EchoConfig
- Add infer_embedding_dim() helper mapping known model names to dimensions
- Wire into FileConfig, resolve_config(), env vars (SHRIMPK_EMBEDDING_*)
- Auto-infer embedding_dim from model name unless explicitly overridden
- Re-export EmbeddingProvider from shrimpk-core root
- 10 new tests: parse roundtrip, display, infer_dim known/fallback, serde, TOML

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
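A plausible shape for the infer_embedding_dim() helper, using dimensions that appear elsewhere in this PR (384 for BGE-small, 1024 for bge-large-en-v1.5 / bge-m3 / gte-large, 768 for gte-base); the substring-matching strategy and the fallback value are assumptions:

```rust
// Hypothetical sketch of infer_embedding_dim(); the real matching logic may differ.
fn infer_embedding_dim(model: &str) -> usize {
    let m = model.to_ascii_lowercase();
    if m.contains("bge-small") {
        384
    } else if m.contains("bge-large") || m.contains("bge-m3") || m.contains("gte-large") {
        1024
    } else if m.contains("gte-base") {
        768
    } else {
        // Unknown model name: fall back to the crate default of 384.
        384
    }
}
```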
…ckend

- Add EmbeddingProvider trait in traits.rs (embed, embed_batch, dimension, name)
- Rename config enum EmbeddingProvider -> EmbeddingBackend (avoids trait name collision)
- Re-export EmbeddingProvider trait from shrimpk-core root

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New file: crates/shrimpk-memory/src/embedding_provider.rs
- FastembedProvider: wraps fastembed TextEmbedding with runtime model selection
  (10 supported models: BGE, MiniLM, Nomic, MxBai, GTE)
- OpenAIProvider: ureq HTTP client for /v1/embeddings (OpenAI, Ollama, LiteLLM)
  with audit logging, batch support, dimension validation
- from_config() factory: selects provider based on EchoConfig.embedding_provider
- API key from SHRIMPK_EMBEDDING_API_KEY env var only (never in config file)
- 5 unit tests (4 pass, 1 ignored for model download)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…smatch detection

- MultiEmbedder::new() -> MultiEmbedder::new(config: &EchoConfig)
- Text channel delegates to Box<dyn EmbeddingProvider> from embedding_provider::from_config()
- text_dimension() returns provider.dimension() (was hardcoded 384)
- Added text_provider_name() for sidecar/logging
- EchoEngine::load() checks dimension mismatch: hard error if stored != config dim
- Model name sidecar (embedding_model.txt): written on persist(), warn on load() if changed
- Updated all MultiEmbedder::new() call sites (embedder.rs tests, echo_precision_tuning.rs,
  echo_token_efficiency.rs) to pass &EchoConfig::default()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
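The sidecar mechanism above can be sketched roughly like this (the file name embedding_model.txt comes from the commit message; the function names and return shapes are invented for illustration):

```rust
use std::{fs, path::Path};

// Hypothetical sketch of the embedding_model.txt sidecar described above.
// Returns the previously recorded model name if it differs from the current
// one, so the caller can warn on load() about a same-dim model swap.
fn sidecar_model_changed(dir: &Path, current_model: &str) -> Option<String> {
    let prev = fs::read_to_string(dir.join("embedding_model.txt")).ok()?;
    let prev = prev.trim();
    (prev != current_model).then(|| prev.to_string())
}

// Written on persist() so the next load() can compare.
fn write_sidecar(dir: &Path, current_model: &str) -> std::io::Result<()> {
    fs::write(dir.join("embedding_model.txt"), current_model)
}
```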
- Daemon config_show: add embedding_provider, embedding_model, embedding_api_url
- Daemon config_set: support setting all 3 embedding fields with validation
- MCP config_show: add Embedding Provider section with source tracking
- MCP config_set: add embedding_provider, embedding_model, embedding_api_url
- CLI config show: add Embedding Provider section
- CLI config set: add embedding_provider, embedding_model, embedding_api_url

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@greptile-apps (Bot) commented Apr 8, 2026

Greptile Summary

This PR adds a pluggable embedding backend system (KS75), replacing the hard-wired fastembed call with an EmbeddingProvider trait that supports both local ONNX (fastembed, 10 models) and any OpenAI-compatible HTTP API. Configuration is exposed through all three surfaces — config.toml, CLI/daemon/MCP config_set, and env vars — with sensible priority ordering (env > file > auto-infer).

Key design decisions worth noting:

  • embed_blocking() helper correctly gates both the Mutex::lock() and the inference call inside tokio::task::block_in_place on multi-thread runtimes, and falls back to a direct call on current_thread (tests/CLI) to avoid the block_in_place panic. OpenAIProvider::call_api applies the same pattern as defense-in-depth.
  • EchoEngine::new and EchoEngine::load use embedder.text_dimension() (the provider's actual reported dimension) rather than config.embedding_dim for LSH construction, eliminating any stale-config risk.
  • Dimension mismatch detection on startup (hard error) and model-swap warning (sidecar file) are both wired into EchoEngine::load.
  • infer_embedding_dim() now covers all 10 fastembed models plus common OpenAI and Ollama models, matching resolve_fastembed_model exactly.
  • resolve_config() correctly tracks dim_set_by_file and applies the env-var dimension value (not just its presence).

One remaining bug: truncate_json in embedding_provider.rs slices the JSON string at byte 200, which panics if a multi-byte UTF-8 character straddles that boundary. While OpenAI's own error bodies are ASCII-only, third-party OpenAI-compatible endpoints (Ollama, vLLM, LiteLLM) can return non-ASCII text in error messages.

Confidence Score: 4/5

Safe to merge after one targeted one-liner fix in truncate_json; all previously flagged async-correctness and config-resolution bugs are resolved.

All five issues from the prior review rounds have been addressed with verified fixes (dim env var value applied, dim_set_by_file tracking, missing fastembed models added, block_in_place runtime-flavor detection, Mutex lock moved inside block_in_place). The only remaining bug is the UTF-8 byte-boundary panic in truncate_json — a one-liner fix (.chars().take(200).collect()). This only triggers on non-ASCII error responses from third-party API endpoints, so it won't affect the default fastembed path or standard OpenAI responses. Score is 4 rather than 5 because of the confirmed panic path.

crates/shrimpk-memory/src/embedding_provider.rs — truncate_json byte-boundary slice (line 321)
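The flagged panic and the suggested one-line fix look like this (the function name comes from the review; the byte-slice original is reconstructed for illustration, not quoted from the source file):

```rust
// Byte slicing panics if index 200 falls inside a multi-byte UTF-8 character:
//   let short = &body[..200];   // can panic on non-ASCII error bodies
// Char-based truncation, as the review suggests, is always boundary-safe:
fn truncate_json(body: &str) -> String {
    body.chars().take(200).collect()
}
```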

Vulnerabilities

  • API key (SHRIMPK_EMBEDDING_API_KEY) is read exclusively from the environment variable and never written to config.toml or logged — correct secret hygiene.
  • The Authorization header is set per-request and never surfaces in audit logs; the audit log only records data_bytes, batch_size, and the endpoint URL.
  • No SQL/command injection surfaces; the HTTP call is a structured JSON POST via ureq.
  • No SSRF mitigations on embedding_api_url — the URL is user-supplied and configurable, but this is intentional (Ollama/LiteLLM use cases). Acceptable for a local daemon.

Important Files Changed

  • crates/shrimpk-memory/src/embedding_provider.rs: New file implementing FastembedProvider and OpenAIProvider; contains a panic-path bug in truncate_json (byte-boundary slice on non-ASCII UTF-8); blocking/async correctness is well handled via block_in_place
  • crates/shrimpk-core/src/config.rs: Adds the EmbeddingBackend enum and three new EchoConfig fields; resolve_config() correctly applies env-var overrides and the dim_set_by_file flag to prevent stale auto-infer; infer_embedding_dim covers all 10 fastembed models plus common OpenAI/Ollama models
  • crates/shrimpk-memory/src/embedder.rs: MultiEmbedder refactored to delegate text embedding to a pluggable EmbeddingProvider; vision/speech channels unchanged; clean abstraction
  • crates/shrimpk-memory/src/echo.rs: embed_blocking() correctly handles multi-thread vs current-thread Tokio runtime flavors; both the Mutex lock and inference run inside block_in_place; EchoEngine::new/load use the actual embedder dimension for LSH instead of config.embedding_dim
  • crates/shrimpk-core/src/traits.rs: EmbeddingProvider trait added with a correct Send (not Sync) bound and a default embed_batch implementation
  • crates/shrimpk-daemon/src/routes.rs: config_show and config_set correctly expose the three new embedding fields; input parsing for embedding_provider uses FromStr with proper error propagation
  • crates/shrimpk-mcp/src/tools.rs: MCP config_show/set correctly mirrors the daemon embedding field additions
  • cli/src/main.rs: CLI config show/set adds embedding_provider, embedding_model, and embedding_api_url with source attribution

Sequence Diagram

sequenceDiagram
    participant Caller as async caller (store/echo)
    participant Engine as EchoEngine
    participant EB as embed_blocking()
    participant BIP as block_in_place (multi-thread RT)
    participant Mutex as Mutex<MultiEmbedder>
    participant FE as FastembedProvider (ONNX)
    participant OAI as OpenAIProvider (ureq)

    Caller->>Engine: store(text) / echo(query)
    Engine->>EB: embed_blocking(|e| e.embed_text(...))
    EB->>EB: check runtime flavor
    alt multi-thread runtime (daemon)
        EB->>BIP: block_in_place(closure)
        BIP->>Mutex: lock()
        Mutex-->>BIP: &mut MultiEmbedder
        alt Fastembed backend
            BIP->>FE: embed(text)
            FE-->>BIP: Vec<f32>
        else OpenAI backend
            BIP->>OAI: call_api(texts)
            OAI->>OAI: block_in_place (inner, no-op)
            OAI-->>BIP: Vec<Vec<f32>>
        end
        BIP-->>EB: Result<Vec<f32>>
    else current_thread / no runtime
        EB->>Mutex: lock()
        Mutex-->>EB: &mut MultiEmbedder
        EB->>FE: embed(text)
        FE-->>EB: Vec<f32>
    end
    EB-->>Engine: Result<Vec<f32>>
    Engine-->>Caller: Ok(..)

Reviews (10). Last reviewed commit: "KS75: fix rustdoc private link in OpenAI..."

Comment thread crates/shrimpk-memory/src/embedding_provider.rs Outdated
Comment on lines +200 to +204
let mut resp = req.send_json(&body).map_err(|e| {
ShrimPKError::Embedding(format!("OpenAI embedding API error at {endpoint}: {e}"))
})?;

let json: serde_json::Value = resp.body_mut().read_json().map_err(|e| {

P1: Blocking HTTP call inside Tokio async worker threads

OpenAIProvider::call_api uses ureq, which is a fully synchronous/blocking HTTP library:

let mut resp = req.send_json(&body).map_err(|e| { /* … */ })?;
let json: serde_json::Value = resp.body_mut().read_json().map_err(|e| { /* … */ })?;

This function is called from embed and embed_batch, which are called from MultiEmbedder::embed_text. All callers in EchoEngine are async fn (store, echo, echo_with_mode, persist, etc.), so this network-blocking call runs on a Tokio worker thread. With a 30-second timeout_global, a slow or unavailable API endpoint will block that worker thread for 30 s, starving all other tasks sharing that thread — including heartbeats and other user requests.

The fastembed path has always been CPU-blocking in these same spots, which is already non-ideal, but CPU inference is typically O(ms). Network I/O with a 30-second hard timeout is a qualitatively different risk.

The lowest-footprint fix is tokio::task::block_in_place at all call sites in echo.rs that hold the embedder lock:

// Example — EchoEngine::store (and all other async callers):
let embedding = {
    let mut embedder = self.embedder.lock().map_err(|e| {
        ShrimPKError::Embedding(format!("MultiEmbedder lock poisoned: {e}"))
    })?;
    tokio::task::block_in_place(|| embedder.embed_text(embed_text))?
};

block_in_place informs the multi-threaded Tokio scheduler that the current thread will block, allowing it to move other tasks to remaining threads rather than stalling them. It requires the multi_thread runtime (which the daemon uses). This change applies equally well to both fastembed and the new OpenAI path without requiring any trait changes.

@Liorrr (Contributor, Author) commented Apr 8, 2026

Fixed in bda744f — added embed_blocking() helper using tokio::task::block_in_place(). All async embedder call sites (store, echo, vision, audio, entity ranking) now use it. Prevents worker-thread starvation for API-based providers. @greptile review again please

…ings

- Fix SHRIMPK_EMBEDDING_DIM env var: apply value, not just check existence
- Track file-config embedding_dim override to prevent infer_embedding_dim()
  from silently overwriting user-set dimensions
- Add embed_blocking() helper using tokio::task::block_in_place to prevent
  worker-thread starvation with API-based embedding providers (30s timeout)
- Convert all async embedder call sites (store, echo, vision, audio, entity)
  to use embed_blocking()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread crates/shrimpk-core/src/config.rs
Liorrr and others added 2 commits April 8, 2026 22:17
…dding_dim models

- Rewrite infer_embedding_dim_known_models and infer_embedding_dim_unknown_falls_back
  tests to use struct-init syntax with ..Default::default() instead of field reassignment,
  fixing clippy field_reassign_with_default lint
- Add bge-m3 (1024), gte-large (1024), gte-base (768) branches to infer_embedding_dim()
  to match resolve_fastembed_model in shrimpk-memory
- Expand test coverage to include the three new model entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…bedding_dim

- EchoEngine::new(): capture text_dim, vision_dimension(), speech_dimension()
  from embedder before it is moved into Mutex
- EchoEngine::load(): same fix for all three LSH index rebuilds
- Prevents dimension mismatch when config.embedding_dim is stale or
  auto-inference failed but the embedder knows its real output dimension
- All 6 CosineHash::new calls in echo.rs now use embedder methods

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread crates/shrimpk-memory/src/echo.rs
…current_thread

- Replace unconditional `tokio::task::block_in_place` in `embed_blocking()`
  with runtime-flavor detection via `Handle::try_current()`
- Multi-thread runtime (daemon): still uses block_in_place to prevent
  worker starvation
- Current-thread runtime (#[tokio::test]) or non-Tokio context: falls
  back to direct call, avoiding the panic
- Updated doc comment explaining the three-branch behavior

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread crates/shrimpk-memory/src/echo.rs
Liorrr and others added 4 commits April 9, 2026 00:29
…starvation

- embed_blocking() previously acquired the MultiEmbedder Mutex before
  entering block_in_place, so a second task blocking on lock() would
  silently starve a Tokio worker thread for up to 30s.
- Move both lock acquisition and inference into the block_in_place
  closure so Tokio can schedule other tasks while waiting for the lock.
- Non-multi-thread fallback path unchanged (tests, CLI, single-thread).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Document that OpenAIProvider uses synchronous ureq HTTP on both
  the struct and call_api method, with explicit guidance that async
  callers must go through EchoEngine::embed_blocking().
- Satisfies Greptile flag on lines 200-204 by making the blocking
  contract visible in the provider file itself.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move tokio::task::block_in_place wrapping into OpenAIProvider::call_api
  so the provider handles its own blocking concern (30s ureq timeout)
- Uses same runtime-detection pattern as EchoEngine::embed_blocking:
  block_in_place on multi-thread runtime, direct call otherwise
- Defense-in-depth: inner wrap covers HTTP concern, outer wrap in
  echo.rs covers mutex-lock concern
- Resolves Greptile P1 on lines 200-204 of embedding_provider.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace intra-doc link to private method
crate::echo::EchoEngine::embed_blocking with backtick-quoted
plain text. Fixes CI failure from RUSTDOCFLAGS="-D warnings".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Liorrr Liorrr merged commit 78ce39c into master Apr 8, 2026
7 checks passed
@Liorrr Liorrr deleted the feat/ks75-model-config branch April 8, 2026 22:24