KS75: Runtime-configurable embedding model (#14)
- Add EmbeddingProvider enum (Fastembed | OpenAI) with Display/FromStr/Serde
- Add embedding_provider, embedding_model, embedding_api_url to EchoConfig
- Add infer_embedding_dim() helper mapping known model names to dimensions
- Wire into FileConfig, resolve_config(), env vars (SHRIMPK_EMBEDDING_*)
- Auto-infer embedding_dim from model name unless explicitly overridden
- Re-export EmbeddingProvider from shrimpk-core root
- 10 new tests: parse roundtrip, display, infer_dim known/fallback, serde, TOML

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
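The auto-inference above can be sketched as a name-to-dimension lookup. This is an illustrative reconstruction, not the PR's actual function body; the dimensions for the named model families are their published sizes, and the exact signature is an assumption:

```rust
/// Sketch of the infer_embedding_dim() idea: map a known embedding model
/// name to its output dimension, returning None for unknown models so the
/// caller keeps the explicitly configured embedding_dim.
fn infer_embedding_dim(model: &str) -> Option<usize> {
    match model {
        m if m.contains("bge-m3") => Some(1024),
        m if m.contains("bge-small") => Some(384),
        m if m.contains("bge-base") => Some(768),
        m if m.contains("bge-large") => Some(1024),
        m if m.contains("MiniLM") || m.contains("minilm") => Some(384),
        m if m.contains("nomic-embed-text") => Some(768),
        m if m.contains("mxbai-embed-large") => Some(1024),
        m if m.contains("gte-large") => Some(1024),
        m if m.contains("gte-base") => Some(768),
        m if m.contains("text-embedding-3-small") => Some(1536),
        m if m.contains("text-embedding-3-large") => Some(3072),
        _ => None, // unknown model: no inference, use configured value
    }
}
```

Note the substring matching lets both bare names (`bge-m3`) and namespaced ones (`BAAI/bge-m3`) resolve; the `bge-m3` arm must precede the other `bge-*` arms so it is not shadowed.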
…ckend

- Add EmbeddingProvider trait in traits.rs (embed, embed_batch, dimension, name)
- Rename config enum EmbeddingProvider -> EmbeddingBackend (avoids trait name collision)
- Re-export EmbeddingProvider trait from shrimpk-core root

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New file: crates/shrimpk-memory/src/embedding_provider.rs
- FastembedProvider: wraps fastembed TextEmbedding with runtime model selection (10 supported models: BGE, MiniLM, Nomic, MxBai, GTE)
- OpenAIProvider: ureq HTTP client for /v1/embeddings (OpenAI, Ollama, LiteLLM) with audit logging, batch support, dimension validation
- from_config() factory: selects provider based on EchoConfig.embedding_provider
- API key from SHRIMPK_EMBEDDING_API_KEY env var only (never in config file)
- 5 unit tests (4 pass, 1 ignored for model download)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…smatch detection

- MultiEmbedder::new() -> MultiEmbedder::new(config: &EchoConfig)
- Text channel delegates to Box<dyn EmbeddingProvider> from embedding_provider::from_config()
- text_dimension() returns provider.dimension() (was hardcoded 384)
- Added text_provider_name() for sidecar/logging
- EchoEngine::load() checks dimension mismatch: hard error if stored != config dim
- Model name sidecar (embedding_model.txt): written on persist(), warn on load() if changed
- Updated all MultiEmbedder::new() call sites (embedder.rs tests, echo_precision_tuning.rs, echo_token_efficiency.rs) to pass &EchoConfig::default()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
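The embedding_model.txt sidecar mechanism can be sketched with plain std::fs. This is an illustrative reconstruction under stated assumptions: the file name matches the commit message, but the helper names and the exact load-time handling here are hypothetical:

```rust
use std::fs;
use std::path::Path;

/// On persist(): record the active model name next to the persisted index
/// (sketch of the embedding_model.txt sidecar described above).
fn write_model_sidecar(dir: &Path, model: &str) -> std::io::Result<()> {
    fs::write(dir.join("embedding_model.txt"), model)
}

/// On load(): return Some(stored_model) when the stored model name differs
/// from the configured one, so the caller can emit a warning. A missing
/// sidecar (pre-sidecar index) is not treated as a mismatch.
fn check_model_sidecar(dir: &Path, model: &str) -> std::io::Result<Option<String>> {
    let path = dir.join("embedding_model.txt");
    if !path.exists() {
        return Ok(None);
    }
    let stored = fs::read_to_string(path)?;
    let stored = stored.trim();
    Ok(if stored != model {
        Some(stored.to_string())
    } else {
        None
    })
}
```

The model-name check is advisory (warn) while the dimension check is a hard error, which matches the commit's split: a changed model with the same dimension still loads, but incompatible vector sizes cannot.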
- Daemon config_show: add embedding_provider, embedding_model, embedding_api_url
- Daemon config_set: support setting all 3 embedding fields with validation
- MCP config_show: add Embedding Provider section with source tracking
- MCP config_set: add embedding_provider, embedding_model, embedding_api_url
- CLI config show: add Embedding Provider section
- CLI config set: add embedding_provider, embedding_model, embedding_api_url

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Greptile Summary

This PR adds a pluggable embedding backend system (KS75), replacing the hard-wired fastembed call with an EmbeddingProvider trait abstraction. Key design decisions worth noting:
Confidence Score: 4/5

Safe to merge after one targeted one-liner fix in truncate_json; all previously flagged async-correctness and config-resolution bugs are resolved. All five issues from the prior review rounds have been addressed with verified fixes (dim env var value applied, dim_set_by_file tracking, missing fastembed models added, block_in_place runtime-flavor detection, Mutex lock moved inside block_in_place). The only remaining bug is the UTF-8 byte-boundary panic in truncate_json — a one-liner fix (.chars().take(200).collect()). This only triggers on non-ASCII error responses from third-party API endpoints, so it won't affect the default fastembed path or standard OpenAI responses. Score is 4 rather than 5 because of the confirmed panic path.

One remaining bug: crates/shrimpk-memory/src/embedding_provider.rs — truncate_json byte-boundary slice (line 321)
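The flagged panic comes from Rust's rule that byte-indexed string slices must land on a char boundary. A minimal sketch of both the buggy shape and the review's one-liner fix (the real truncate_json in the PR may differ in signature):

```rust
/// Buggy shape: byte-indexed slice. `&s[..200]` panics when byte 200 falls
/// inside a multi-byte UTF-8 character (e.g. a non-ASCII API error body).
#[allow(dead_code)]
fn truncate_json_buggy(s: &str) -> &str {
    if s.len() > 200 { &s[..200] } else { s }
}

/// Char-boundary-safe version, the fix suggested in the review: count
/// characters instead of bytes.
fn truncate_json(s: &str) -> String {
    s.chars().take(200).collect()
}
```

The fixed version allocates a new String and may keep up to 200 chars (more than 200 bytes for non-ASCII input), which is fine for a log-truncation helper.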
| Filename | Overview |
|---|---|
| crates/shrimpk-memory/src/embedding_provider.rs | New file implementing FastembedProvider and OpenAIProvider; contains a panic-path bug in truncate_json (byte-boundary slice on non-ASCII UTF-8); blocking/async correctness is well-handled with block_in_place logic |
| crates/shrimpk-core/src/config.rs | Adds EmbeddingBackend enum and three new EchoConfig fields; resolve_config() correctly applies env-var overrides and dim_set_by_file flag to prevent stale auto-infer; infer_embedding_dim covers all 10 fastembed models plus common OpenAI/Ollama models |
| crates/shrimpk-memory/src/embedder.rs | MultiEmbedder refactored to delegate text embedding to a pluggable EmbeddingProvider; vision/speech channels unchanged; clean abstraction |
| crates/shrimpk-memory/src/echo.rs | embed_blocking() correctly handles multi-thread vs current-thread Tokio runtime flavors; both Mutex lock and inference run inside block_in_place; EchoEngine::new/load use actual embedder dimension for LSH instead of config.embedding_dim |
| crates/shrimpk-core/src/traits.rs | EmbeddingProvider trait added with correct Send (not Sync) bound and default embed_batch implementation |
| crates/shrimpk-daemon/src/routes.rs | config_show and config_set correctly expose the three new embedding fields; input parsing for embedding_provider uses FromStr with proper error propagation |
| crates/shrimpk-mcp/src/tools.rs | MCP config_show/set correctly mirrors daemon embedding field additions |
| cli/src/main.rs | CLI config show/set correctly adds embedding_provider, embedding_model, and embedding_api_url with source attribution |
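The trait shape called out in the table row for traits.rs (embed, embed_batch, dimension, name; Send but not Sync; default embed_batch) can be sketched as follows. Signatures are assumptions; the error type is simplified to String here, where the real code presumably uses ShrimPKError:

```rust
/// Sketch of the EmbeddingProvider trait described above. Send (not Sync)
/// suffices because the provider lives behind Mutex<MultiEmbedder>.
trait EmbeddingProvider: Send {
    fn embed(&mut self, text: &str) -> Result<Vec<f32>, String>;
    fn dimension(&self) -> usize;
    fn name(&self) -> &str;

    /// Default batch implementation: one embed() per input. Providers with
    /// a native batch API (e.g. /v1/embeddings) override this.
    fn embed_batch(&mut self, texts: &[String]) -> Result<Vec<Vec<f32>>, String> {
        texts.iter().map(|t| self.embed(t)).collect()
    }
}

/// Minimal test double returning zero vectors of a fixed dimension.
struct DummyProvider {
    dim: usize,
}

impl EmbeddingProvider for DummyProvider {
    fn embed(&mut self, _text: &str) -> Result<Vec<f32>, String> {
        Ok(vec![0.0; self.dim])
    }
    fn dimension(&self) -> usize {
        self.dim
    }
    fn name(&self) -> &str {
        "dummy"
    }
}
```

The default embed_batch is what makes the trait cheap to implement for local backends while still letting OpenAIProvider send one HTTP request for a whole batch.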
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller as async caller (store/echo)
    participant Engine as EchoEngine
    participant EB as embed_blocking()
    participant BIP as block_in_place (multi-thread RT)
    participant Mutex as Mutex<MultiEmbedder>
    participant FE as FastembedProvider (ONNX)
    participant OAI as OpenAIProvider (ureq)
    Caller->>Engine: store(text) / echo(query)
    Engine->>EB: embed_blocking(|e| e.embed_text(...))
    EB->>EB: check runtime flavor
    alt multi-thread runtime (daemon)
        EB->>BIP: block_in_place(closure)
        BIP->>Mutex: lock()
        Mutex-->>BIP: &mut MultiEmbedder
        alt Fastembed backend
            BIP->>FE: embed(text)
            FE-->>BIP: Vec<f32>
        else OpenAI backend
            BIP->>OAI: call_api(texts)
            OAI->>OAI: block_in_place (inner, no-op)
            OAI-->>BIP: Vec<Vec<f32>>
        end
        BIP-->>EB: Result<Vec<f32>>
    else current_thread / no runtime
        EB->>Mutex: lock()
        Mutex-->>EB: &mut MultiEmbedder
        EB->>FE: embed(text)
        FE-->>EB: Vec<f32>
    end
    EB-->>Engine: Result<Vec<f32>>
    Engine-->>Caller: Ok(..)
```
Reviews (10). Last reviewed commit: "KS75: fix rustdoc private link in OpenAI..."
```rust
let mut resp = req.send_json(&body).map_err(|e| {
    ShrimPKError::Embedding(format!("OpenAI embedding API error at {endpoint}: {e}"))
})?;

let json: serde_json::Value = resp.body_mut().read_json().map_err(|e| {
```
Blocking HTTP call inside Tokio async worker threads
OpenAIProvider::call_api uses ureq, which is a fully synchronous/blocking HTTP library:

```rust
let mut resp = req.send_json(&body).map_err(|e| { … })?;
let json: serde_json::Value = resp.body_mut().read_json().map_err(|e| { … })?;
```

This function is called from embed and embed_batch, which are called from MultiEmbedder::embed_text. All callers in EchoEngine are async fn (store, echo, echo_with_mode, persist, etc.), so this network-blocking call runs on a Tokio worker thread. With a 30-second timeout_global, a slow or unavailable API endpoint will block that worker thread for 30 s, starving all other tasks sharing that thread — including heartbeats and other user requests.
The fastembed path has always been CPU-blocking in these same spots, which is already non-ideal, but CPU inference is typically O(ms). Network I/O with a 30-second hard timeout is a qualitatively different risk.
The lowest-footprint fix is tokio::task::block_in_place at all call sites in echo.rs that hold the embedder lock:

```rust
// Example — EchoEngine::store (and all other async callers):
let embedding = {
    let mut embedder = self.embedder.lock().map_err(|e| {
        ShrimPKError::Embedding(format!("MultiEmbedder lock poisoned: {e}"))
    })?;
    tokio::task::block_in_place(|| embedder.embed_text(embed_text))?
};
```

block_in_place informs the multi-threaded Tokio scheduler that the current thread will block, allowing it to move other tasks to remaining threads rather than stalling them. It requires the multi_thread runtime (which the daemon uses). This change applies equally well to both fastembed and the new OpenAI path without requiring any trait changes.
Fixed in bda744f — added embed_blocking() helper using tokio::task::block_in_place(). All async embedder call sites (store, echo, vision, audio, entity ranking) now use it. Prevents worker-thread starvation for API-based providers. @greptile review again please
…ings

- Fix SHRIMPK_EMBEDDING_DIM env var: apply value, not just check existence
- Track file-config embedding_dim override to prevent infer_embedding_dim() from silently overwriting user-set dimensions
- Add embed_blocking() helper using tokio::task::block_in_place to prevent worker-thread starvation with API-based embedding providers (30s timeout)
- Convert all async embedder call sites (store, echo, vision, audio, entity) to use embed_blocking()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
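The env-var bug fixed here (checking that SHRIMPK_EMBEDDING_DIM is set without applying its value) reduces to a small precedence rule. A hypothetical sketch with the env value passed in as a parameter, since the real resolve_config() reads std::env::var("SHRIMPK_EMBEDDING_DIM") directly and handles more fields:

```rust
/// Resolve the effective embedding dimension: env var wins when set and
/// parseable, then a file-set value (the dim_set_by_file tracking), then
/// the dimension auto-inferred from the model name. Function name and
/// shape are illustrative, not the PR's actual code.
fn resolve_embedding_dim(env_val: Option<&str>, file_dim: Option<usize>, inferred: usize) -> usize {
    if let Some(v) = env_val {
        // The fixed behavior: parse and apply the value, not just detect it.
        if let Ok(dim) = v.parse::<usize>() {
            return dim;
        }
    }
    // A file-set dimension must not be silently overwritten by inference.
    file_dim.unwrap_or(inferred)
}
```

In the daemon the first argument would come from `std::env::var("SHRIMPK_EMBEDDING_DIM").ok()`; separating it out here just keeps the precedence logic testable.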
…dding_dim models

- Rewrite infer_embedding_dim_known_models and infer_embedding_dim_unknown_falls_back tests to use struct-init syntax with ..Default::default() instead of field reassignment, fixing clippy field_reassign_with_default lint
- Add bge-m3 (1024), gte-large (1024), gte-base (768) branches to infer_embedding_dim() to match resolve_fastembed_model in shrimpk-memory
- Expand test coverage to include the three new model entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…bedding_dim

- EchoEngine::new(): capture text_dim, vision_dimension(), speech_dimension() from embedder before it is moved into Mutex
- EchoEngine::load(): same fix for all three LSH index rebuilds
- Prevents dimension mismatch when config.embedding_dim is stale or auto-inference failed but the embedder knows its real output dimension
- All 6 CosineHash::new calls in echo.rs now use embedder methods

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…current_thread

- Replace unconditional `tokio::task::block_in_place` in `embed_blocking()` with runtime-flavor detection via `Handle::try_current()`
- Multi-thread runtime (daemon): still uses block_in_place to prevent worker starvation
- Current-thread runtime (#[tokio::test]) or non-Tokio context: falls back to direct call, avoiding the panic
- Updated doc comment explaining the three-branch behavior

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…starvation

- embed_blocking() previously acquired the MultiEmbedder Mutex before entering block_in_place, so a second task blocking on lock() would silently starve a Tokio worker thread for up to 30s.
- Move both lock acquisition and inference into the block_in_place closure so Tokio can schedule other tasks while waiting for the lock.
- Non-multi-thread fallback path unchanged (tests, CLI, single-thread).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Document that OpenAIProvider uses synchronous ureq HTTP on both the struct and call_api method, with explicit guidance that async callers must go through EchoEngine::embed_blocking().
- Satisfies Greptile flag on lines 200-204 by making the blocking contract visible in the provider file itself.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move tokio::task::block_in_place wrapping into OpenAIProvider::call_api so the provider handles its own blocking concern (30s ureq timeout)
- Uses same runtime-detection pattern as EchoEngine::embed_blocking: block_in_place on multi-thread runtime, direct call otherwise
- Defense-in-depth: inner wrap covers HTTP concern, outer wrap in echo.rs covers mutex-lock concern
- Resolves Greptile P1 on lines 200-204 of embedding_provider.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace intra-doc link to private method crate::echo::EchoEngine::embed_blocking with backtick-quoted plain text. Fixes CI failure from RUSTDOCFLAGS="-D warnings". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- `EmbeddingProvider` trait with `FastembedProvider` (10 local ONNX models) and `OpenAIProvider` (API) implementations
- Runtime-selectable via `config.toml`, `config_set` CLI/API, or env vars (`SHRIMPK_EMBEDDING_PROVIDER`/`MODEL`/`API_URL`)
- API key via env var only (`SHRIMPK_EMBEDDING_API_KEY`) — never stored in config
- Auto-infers `embedding_dim` from model name (no manual dimension config needed)

Files changed (11)
- `shrimpk-core`: `EmbeddingProvider` trait, `EmbeddingBackend` enum, config fields, `infer_embedding_dim()`
- `shrimpk-memory`: `embedding_provider.rs` (new), `MultiEmbedder` refactor, `EchoEngine` dim check
- `shrimpk-daemon`: config_show/set for 3 new fields
- `shrimpk-mcp`: config_show/set for 3 new fields
- `cli`: config show/set

Test plan
- `cargo test --workspace` — 628 passed, 0 failed, 141 ignored
- `cargo clippy --workspace` — clean
- `GET /api/config`
- `PUT /api/config` writes embedding settings to TOML

🤖 Generated with Claude Code