
KS75: Runtime-configurable embedding model #14

Merged
Liorrr merged 13 commits into master from feat/ks75-model-config on Apr 8, 2026

Conversation

@Liorrr (Contributor) commented Apr 8, 2026

Summary

  • Add EmbeddingProvider trait with FastembedProvider (10 local ONNX models) and OpenAIProvider (API) implementations
  • Embedding model, provider, and API URL configurable via config.toml, config_set CLI/API, or env vars (SHRIMPK_EMBEDDING_PROVIDER/MODEL/API_URL)
  • API keys read from env var only (SHRIMPK_EMBEDDING_API_KEY) — never stored in config
  • Auto-infer embedding_dim from model name (no manual dimension config needed)
  • Dimension mismatch detection on startup — hard error if existing data uses different dim
  • Model name sidecar file warns on same-dim model swaps
  • Default behavior unchanged: fastembed / BGE-small-EN-v1.5 / 384-dim
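The trait surface is small. A minimal sketch, assuming these exact signatures (the commit notes below name embed, embed_batch, dimension, and name, with a Send bound and a default embed_batch; the error type here is a stand-in):

```rust
// Hypothetical sketch of the EmbeddingProvider trait described in this PR;
// exact signatures and the crate's error type are assumptions.
type EmbedResult<T> = Result<T, String>;

pub trait EmbeddingProvider: Send {
    /// Embed a single text into a fixed-length vector.
    fn embed(&mut self, text: &str) -> EmbedResult<Vec<f32>>;

    /// Output dimension (e.g. 384 for BGE-small-EN-v1.5).
    fn dimension(&self) -> usize;

    /// Provider/model name for the sidecar file and logs.
    fn name(&self) -> &str;

    /// Default batch implementation: one embed() call per input.
    fn embed_batch(&mut self, texts: &[&str]) -> EmbedResult<Vec<Vec<f32>>> {
        texts.iter().map(|t| self.embed(t)).collect()
    }
}
```

A provider only has to implement the three required methods; batch-capable backends (like the OpenAI path) can override embed_batch to send one request for many inputs.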

Files changed (11)

  • shrimpk-core: EmbeddingProvider trait, EmbeddingBackend enum, config fields, infer_embedding_dim()
  • shrimpk-memory: embedding_provider.rs (new), MultiEmbedder refactor, EchoEngine dim check
  • shrimpk-daemon: config_show/set for 3 new fields
  • shrimpk-mcp: config_show/set for 3 new fields
  • cli: config show/set

Test plan

  • cargo test --workspace — 628 passed, 0 failed, 141 ignored
  • cargo clippy --workspace — clean
  • Manual: daemon shows embedding fields in GET /api/config
  • Manual: PUT /api/config writes embedding settings to TOML
  • Manual: default behavior unchanged (fastembed/BGE-small/384)
  • Manual: switch to bge-large-en-v1.5, restart, verify 1024-dim vectors
  • Manual: set provider=openai with API key, verify API embedding works
  • Manual: dim mismatch on existing data → clear error message
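One hypothetical way to exercise the OpenAI-compatible path from this test plan: the env-var names come from the PR summary; the model name, URL, and key below are placeholder examples, not values from this repo.

```shell
# Env vars override file config (variable names per the PR summary; values are examples).
export SHRIMPK_EMBEDDING_PROVIDER=openai
export SHRIMPK_EMBEDDING_MODEL=text-embedding-3-small                  # example model name
export SHRIMPK_EMBEDDING_API_URL=http://localhost:11434/v1/embeddings  # e.g. a local Ollama endpoint
export SHRIMPK_EMBEDDING_API_KEY=example-key                           # read from env only, never written to config.toml
```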

🤖 Generated with Claude Code

Liorrr and others added 5 commits April 8, 2026 20:03
- Add EmbeddingProvider enum (Fastembed | OpenAI) with Display/FromStr/Serde
- Add embedding_provider, embedding_model, embedding_api_url to EchoConfig
- Add infer_embedding_dim() helper mapping known model names to dimensions
- Wire into FileConfig, resolve_config(), env vars (SHRIMPK_EMBEDDING_*)
- Auto-infer embedding_dim from model name unless explicitly overridden
- Re-export EmbeddingProvider from shrimpk-core root
- 10 new tests: parse roundtrip, display, infer_dim known/fallback, serde, TOML

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
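A plausible shape for the infer_embedding_dim() helper, using dimensions that appear elsewhere in this PR (384 for BGE-small, 1024 for bge-large-en-v1.5 / bge-m3 / gte-large, 768 for gte-base); the substring-matching strategy and the fallback value are assumptions:

```rust
// Hypothetical sketch of infer_embedding_dim(); the real matching logic may differ.
fn infer_embedding_dim(model: &str) -> usize {
    let m = model.to_ascii_lowercase();
    if m.contains("bge-small") {
        384
    } else if m.contains("bge-large") || m.contains("bge-m3") || m.contains("gte-large") {
        1024
    } else if m.contains("gte-base") {
        768
    } else {
        // Unknown model name: fall back to the crate default of 384.
        384
    }
}
```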
…ckend

- Add EmbeddingProvider trait in traits.rs (embed, embed_batch, dimension, name)
- Rename config enum EmbeddingProvider -> EmbeddingBackend (avoids trait name collision)
- Re-export EmbeddingProvider trait from shrimpk-core root

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New file: crates/shrimpk-memory/src/embedding_provider.rs
- FastembedProvider: wraps fastembed TextEmbedding with runtime model selection
  (10 supported models: BGE, MiniLM, Nomic, MxBai, GTE)
- OpenAIProvider: ureq HTTP client for /v1/embeddings (OpenAI, Ollama, LiteLLM)
  with audit logging, batch support, dimension validation
- from_config() factory: selects provider based on EchoConfig.embedding_provider
- API key from SHRIMPK_EMBEDDING_API_KEY env var only (never in config file)
- 5 unit tests (4 pass, 1 ignored for model download)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…smatch detection

- MultiEmbedder::new() -> MultiEmbedder::new(config: &EchoConfig)
- Text channel delegates to Box<dyn EmbeddingProvider> from embedding_provider::from_config()
- text_dimension() returns provider.dimension() (was hardcoded 384)
- Added text_provider_name() for sidecar/logging
- EchoEngine::load() checks dimension mismatch: hard error if stored != config dim
- Model name sidecar (embedding_model.txt): written on persist(), warn on load() if changed
- Updated all MultiEmbedder::new() call sites (embedder.rs tests, echo_precision_tuning.rs,
  echo_token_efficiency.rs) to pass &EchoConfig::default()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
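The sidecar mechanism above can be sketched roughly like this (the file name embedding_model.txt comes from the commit message; the function names and return shapes are invented for illustration):

```rust
use std::{fs, path::Path};

// Hypothetical sketch of the embedding_model.txt sidecar described above.
// Returns the previously recorded model name if it differs from the current
// one, so the caller can warn on load() about a same-dim model swap.
fn sidecar_model_changed(dir: &Path, current_model: &str) -> Option<String> {
    let prev = fs::read_to_string(dir.join("embedding_model.txt")).ok()?;
    let prev = prev.trim();
    (prev != current_model).then(|| prev.to_string())
}

// Written on persist() so the next load() can compare.
fn write_sidecar(dir: &Path, current_model: &str) -> std::io::Result<()> {
    fs::write(dir.join("embedding_model.txt"), current_model)
}
```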
- Daemon config_show: add embedding_provider, embedding_model, embedding_api_url
- Daemon config_set: support setting all 3 embedding fields with validation
- MCP config_show: add Embedding Provider section with source tracking
- MCP config_set: add embedding_provider, embedding_model, embedding_api_url
- CLI config show: add Embedding Provider section
- CLI config set: add embedding_provider, embedding_model, embedding_api_url

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@greptile-apps (Bot) commented Apr 8, 2026

Greptile Summary

This PR adds a pluggable embedding backend system (KS75), replacing the hard-wired fastembed call with an EmbeddingProvider trait that supports both local ONNX (fastembed, 10 models) and any OpenAI-compatible HTTP API. Configuration is exposed through all three surfaces — config.toml, CLI/daemon/MCP config_set, and env vars — with sensible priority ordering (env > file > auto-infer).

Key design decisions worth noting:

  • embed_blocking() helper correctly gates both the Mutex::lock() and the inference call inside tokio::task::block_in_place on multi-thread runtimes, and falls back to a direct call on current_thread (tests/CLI) to avoid the block_in_place panic. OpenAIProvider::call_api applies the same pattern as defense-in-depth.
  • EchoEngine::new and EchoEngine::load use embedder.text_dimension() (the provider's actual reported dimension) rather than config.embedding_dim for LSH construction, eliminating any stale-config risk.
  • Dimension mismatch detection on startup (hard error) and model-swap warning (sidecar file) are both wired into EchoEngine::load.
  • infer_embedding_dim() now covers all 10 fastembed models plus common OpenAI and Ollama models, matching resolve_fastembed_model exactly.
  • resolve_config() correctly tracks dim_set_by_file and applies the env-var dimension value (not just its presence).

One remaining bug: truncate_json in embedding_provider.rs slices the JSON string at byte 200, which panics if a multi-byte UTF-8 character straddles that boundary. While OpenAI's own error bodies are ASCII-only, third-party OpenAI-compatible endpoints (Ollama, vLLM, LiteLLM) can return non-ASCII text in error messages.

Confidence Score: 4/5

Safe to merge after one targeted one-liner fix in truncate_json; all previously flagged async-correctness and config-resolution bugs are resolved.

All five issues from the prior review rounds have been addressed with verified fixes (dim env var value applied, dim_set_by_file tracking, missing fastembed models added, block_in_place runtime-flavor detection, Mutex lock moved inside block_in_place). The only remaining bug is the UTF-8 byte-boundary panic in truncate_json — a one-liner fix (.chars().take(200).collect()). This only triggers on non-ASCII error responses from third-party API endpoints, so it won't affect the default fastembed path or standard OpenAI responses. Score is 4 rather than 5 because of the confirmed panic path.

crates/shrimpk-memory/src/embedding_provider.rs — truncate_json byte-boundary slice (line 321)
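The flagged panic and the suggested one-line fix look like this (the function name comes from the review; the byte-slice original is reconstructed for illustration, not quoted from the source file):

```rust
// Byte slicing panics if index 200 falls inside a multi-byte UTF-8 character:
//   let short = &body[..200];   // can panic on non-ASCII error bodies
// Char-based truncation, as the review suggests, is always boundary-safe:
fn truncate_json(body: &str) -> String {
    body.chars().take(200).collect()
}
```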

Vulnerabilities

  • API key (SHRIMPK_EMBEDDING_API_KEY) is read exclusively from the environment variable and never written to config.toml or logged — correct secret hygiene.
  • The Authorization header is set per-request and never surfaces in audit logs; the audit log only records data_bytes, batch_size, and the endpoint URL.
  • No SQL/command injection surfaces; the HTTP call is a structured JSON POST via ureq.
  • No SSRF mitigations on embedding_api_url — the URL is user-supplied and configurable, but this is intentional (Ollama/LiteLLM use cases). Acceptable for a local daemon.

Important Files Changed

  • crates/shrimpk-memory/src/embedding_provider.rs: New file implementing FastembedProvider and OpenAIProvider; contains a panic-path bug in truncate_json (byte-boundary slice on non-ASCII UTF-8); blocking/async correctness is well handled via block_in_place
  • crates/shrimpk-core/src/config.rs: Adds the EmbeddingBackend enum and three new EchoConfig fields; resolve_config() correctly applies env-var overrides and the dim_set_by_file flag to prevent stale auto-infer; infer_embedding_dim covers all 10 fastembed models plus common OpenAI/Ollama models
  • crates/shrimpk-memory/src/embedder.rs: MultiEmbedder refactored to delegate text embedding to a pluggable EmbeddingProvider; vision/speech channels unchanged; clean abstraction
  • crates/shrimpk-memory/src/echo.rs: embed_blocking() correctly handles multi-thread vs current-thread Tokio runtime flavors; both the Mutex lock and inference run inside block_in_place; EchoEngine::new/load use the actual embedder dimension for LSH instead of config.embedding_dim
  • crates/shrimpk-core/src/traits.rs: EmbeddingProvider trait added with a correct Send (not Sync) bound and a default embed_batch implementation
  • crates/shrimpk-daemon/src/routes.rs: config_show and config_set correctly expose the three new embedding fields; input parsing for embedding_provider uses FromStr with proper error propagation
  • crates/shrimpk-mcp/src/tools.rs: MCP config_show/set correctly mirrors the daemon embedding field additions
  • cli/src/main.rs: CLI config show/set adds embedding_provider, embedding_model, and embedding_api_url with source attribution

Sequence Diagram

sequenceDiagram
    participant Caller as async caller (store/echo)
    participant Engine as EchoEngine
    participant EB as embed_blocking()
    participant BIP as block_in_place (multi-thread RT)
    participant Mutex as Mutex<MultiEmbedder>
    participant FE as FastembedProvider (ONNX)
    participant OAI as OpenAIProvider (ureq)

    Caller->>Engine: store(text) / echo(query)
    Engine->>EB: embed_blocking(|e| e.embed_text(...))
    EB->>EB: check runtime flavor
    alt multi-thread runtime (daemon)
        EB->>BIP: block_in_place(closure)
        BIP->>Mutex: lock()
        Mutex-->>BIP: &mut MultiEmbedder
        alt Fastembed backend
            BIP->>FE: embed(text)
            FE-->>BIP: Vec<f32>
        else OpenAI backend
            BIP->>OAI: call_api(texts)
            OAI->>OAI: block_in_place (inner, no-op)
            OAI-->>BIP: Vec<Vec<f32>>
        end
        BIP-->>EB: Result<Vec<f32>>
    else current_thread / no runtime
        EB->>Mutex: lock()
        Mutex-->>EB: &mut MultiEmbedder
        EB->>FE: embed(text)
        FE-->>EB: Vec<f32>
    end
    EB-->>Engine: Result<Vec<f32>>
    Engine-->>Caller: Ok(..)

Reviews (10). Last reviewed commit: "KS75: fix rustdoc private link in OpenAI..."

Comment thread crates/shrimpk-memory/src/embedding_provider.rs Outdated
Comment on lines +200 to +204
let mut resp = req.send_json(&body).map_err(|e| {
ShrimPKError::Embedding(format!("OpenAI embedding API error at {endpoint}: {e}"))
})?;

let json: serde_json::Value = resp.body_mut().read_json().map_err(|e| {

P1: Blocking HTTP call inside Tokio async worker threads

OpenAIProvider::call_api uses ureq, which is a fully synchronous/blocking HTTP library:

let mut resp = req.send_json(&body).map_err(|e| { /* … */ })?;
let json: serde_json::Value = resp.body_mut().read_json().map_err(|e| { /* … */ })?;

This function is called from embed and embed_batch, which are called from MultiEmbedder::embed_text. All callers in EchoEngine are async fn (store, echo, echo_with_mode, persist, etc.), so this network-blocking call runs on a Tokio worker thread. With a 30-second timeout_global, a slow or unavailable API endpoint will block that worker thread for 30 s, starving all other tasks sharing that thread — including heartbeats and other user requests.

The fastembed path has always been CPU-blocking in these same spots, which is already non-ideal, but CPU inference is typically O(ms). Network I/O with a 30-second hard timeout is a qualitatively different risk.

The lowest-footprint fix is tokio::task::block_in_place at all call sites in echo.rs that hold the embedder lock:

// Example — EchoEngine::store (and all other async callers):
let embedding = {
    let mut embedder = self.embedder.lock().map_err(|e| {
        ShrimPKError::Embedding(format!("MultiEmbedder lock poisoned: {e}"))
    })?;
    tokio::task::block_in_place(|| embedder.embed_text(embed_text))?
};

block_in_place informs the multi-threaded Tokio scheduler that the current thread will block, allowing it to move other tasks to remaining threads rather than stalling them. It requires the multi_thread runtime (which the daemon uses). This change applies equally well to both fastembed and the new OpenAI path without requiring any trait changes.

@Liorrr (Contributor, Author) commented Apr 8, 2026

Fixed in bda744f — added embed_blocking() helper using tokio::task::block_in_place(). All async embedder call sites (store, echo, vision, audio, entity ranking) now use it. Prevents worker-thread starvation for API-based providers. @greptile review again please

…ings

- Fix SHRIMPK_EMBEDDING_DIM env var: apply value, not just check existence
- Track file-config embedding_dim override to prevent infer_embedding_dim()
  from silently overwriting user-set dimensions
- Add embed_blocking() helper using tokio::task::block_in_place to prevent
  worker-thread starvation with API-based embedding providers (30s timeout)
- Convert all async embedder call sites (store, echo, vision, audio, entity)
  to use embed_blocking()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread crates/shrimpk-core/src/config.rs
Liorrr and others added 2 commits April 8, 2026 22:17
…dding_dim models

- Rewrite infer_embedding_dim_known_models and infer_embedding_dim_unknown_falls_back
  tests to use struct-init syntax with ..Default::default() instead of field reassignment,
  fixing clippy field_reassign_with_default lint
- Add bge-m3 (1024), gte-large (1024), gte-base (768) branches to infer_embedding_dim()
  to match resolve_fastembed_model in shrimpk-memory
- Expand test coverage to include the three new model entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…bedding_dim

- EchoEngine::new(): capture text_dim, vision_dimension(), speech_dimension()
  from embedder before it is moved into Mutex
- EchoEngine::load(): same fix for all three LSH index rebuilds
- Prevents dimension mismatch when config.embedding_dim is stale or
  auto-inference failed but the embedder knows its real output dimension
- All 6 CosineHash::new calls in echo.rs now use embedder methods

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread crates/shrimpk-memory/src/echo.rs
…current_thread

- Replace unconditional `tokio::task::block_in_place` in `embed_blocking()`
  with runtime-flavor detection via `Handle::try_current()`
- Multi-thread runtime (daemon): still uses block_in_place to prevent
  worker starvation
- Current-thread runtime (#[tokio::test]) or non-Tokio context: falls
  back to direct call, avoiding the panic
- Updated doc comment explaining the three-branch behavior

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread crates/shrimpk-memory/src/echo.rs
Liorrr and others added 4 commits April 9, 2026 00:29
…starvation

- embed_blocking() previously acquired the MultiEmbedder Mutex before
  entering block_in_place, so a second task blocking on lock() would
  silently starve a Tokio worker thread for up to 30s.
- Move both lock acquisition and inference into the block_in_place
  closure so Tokio can schedule other tasks while waiting for the lock.
- Non-multi-thread fallback path unchanged (tests, CLI, single-thread).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Document that OpenAIProvider uses synchronous ureq HTTP on both
  the struct and call_api method, with explicit guidance that async
  callers must go through EchoEngine::embed_blocking().
- Satisfies Greptile flag on lines 200-204 by making the blocking
  contract visible in the provider file itself.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move tokio::task::block_in_place wrapping into OpenAIProvider::call_api
  so the provider handles its own blocking concern (30s ureq timeout)
- Uses same runtime-detection pattern as EchoEngine::embed_blocking:
  block_in_place on multi-thread runtime, direct call otherwise
- Defense-in-depth: inner wrap covers HTTP concern, outer wrap in
  echo.rs covers mutex-lock concern
- Resolves Greptile P1 on lines 200-204 of embedding_provider.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace intra-doc link to private method
crate::echo::EchoEngine::embed_blocking with backtick-quoted
plain text. Fixes CI failure from RUSTDOCFLAGS="-D warnings".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Liorrr Liorrr merged commit 78ce39c into master Apr 8, 2026
7 checks passed
@Liorrr Liorrr deleted the feat/ks75-model-config branch April 8, 2026 22:24