From 332ee800a85fd35ded4e37adabecbfdd6221d31b Mon Sep 17 00:00:00 2001 From: Tomasz Szuster Date: Mon, 4 May 2026 13:47:26 +0200 Subject: [PATCH 1/2] feat(embeddings): add LM Studio as a first-class embedding provider (#42) LM Studio's Local Server speaks the OpenAI-compatible /v1/embeddings protocol, so users running it as their model host (chat plus embedding in one desktop app, GGUF model management) had no clean integration path. Changes: - src/services/provider-lmstudio.ts: new LMStudioEmbeddingProvider wrapping the OpenAI SDK with a custom baseURL (default http://localhost:1234/v1). Sends a placeholder API key to satisfy the OpenAI SDK while LM Studio's Local Server runs without auth by default. Skips the dimensions parameter because LM Studio models have no Matryoshka projection. Forces encoding_format=float to defeat the OpenAI SDK 6.x base64 default, which would otherwise mangle LM Studio's plain-array responses into 1024 zeros. - src/services/embedding-config.ts: extends the EmbeddingProvider union, reads LMSTUDIO_URL and LMSTUDIO_API_KEY, fail-fast validation when EMBEDDING_PROVIDER=lmstudio without EMBEDDING_MODEL or EMBEDDING_DIMENSIONS. - src/services/embedding-provider.ts: factory case for lmstudio with a dynamic import to avoid loading the OpenAI SDK at startup for ollama users. - ensureReady distinguishes "LM Studio unreachable" from "reachable but embedding model not loaded" so the operator knows whether to start the Local Server or load the configured model. - src/services/qdrant.ts: minor refactor to extract the hybrid-search query payload to a local const for readability. - README.md: dedicated LM Studio section, MCP host config example, env-var table entries. - tests/unit/embedding-config.test.ts: 8 new cases (required-env validation, URL default and override, optional API key, context-length override). 
- tests/unit/embedding-provider.test.ts: 4 new cases (factory wiring, ensureReady error format against a closed port, healthCheck unreachable output, provider construction without LMSTUDIO_API_KEY). Backward compatible. The lmstudio provider is opt-in via EMBEDDING_PROVIDER=lmstudio. Existing ollama, openai, and google paths are untouched. --- README.md | 47 +++++- src/services/embedding-config.ts | 68 ++++++-- src/services/embedding-provider.ts | 14 +- src/services/provider-lmstudio.ts | 220 ++++++++++++++++++++++++++ src/services/qdrant.ts | 31 ++-- tests/unit/embedding-config.test.ts | 83 +++++++++- tests/unit/embedding-provider.test.ts | 68 ++++++++ 7 files changed, 497 insertions(+), 34 deletions(-) create mode 100644 src/services/provider-lmstudio.ts diff --git a/README.md b/README.md index 89048a2..0580cee 100644 --- a/README.md +++ b/README.md @@ -240,7 +240,7 @@ On VS Code's 2.45M‑line codebase, SocratiCode answers architectural questions - **Hybrid code search** — Built on Qdrant, a purpose-built vector database with HNSW indexing, concurrent read/write, and payload filtering. Each chunk stores both a dense vector and a BM25 sparse vector; the Query API runs both sub-queries in a single round-trip and fuses results with Reciprocal Rank Fusion (RRF). Semantic search handles conceptual queries like "authentication middleware" even when those exact words don't appear in the code. BM25 handles exact identifier and keyword lookups. You get the best of both in every query with no tuning required. - **Configurable Qdrant** — Use the built-in Docker Qdrant (default, zero config) or connect to your own instance (self-hosted, remote server, or Qdrant Cloud). Configure via `QDRANT_MODE`, `QDRANT_URL`, and `QDRANT_API_KEY` environment variables. - **Configurable Ollama** — Use the built-in Docker Ollama (default, zero config) or point to your own Ollama instance (native install -GPU access-, remote server, etc.). 
Configure via `OLLAMA_MODE`, `OLLAMA_URL`, `EMBEDDING_MODEL` and `EMBEDDING_DIMENSIONS` environment variables. -- **Multi-provider embeddings** — Switch between Local Ollama (private, GPU access), Docker Ollama (zero-config), OpenAI (`text-embedding-3-small`, fastest), or Google Gemini (`gemini-embedding-001`, free tier) with a single environment variable. No provider-specific configuration files. +- **Multi-provider embeddings** — Switch between Local Ollama (private, GPU access), Docker Ollama (zero-config), OpenAI (`text-embedding-3-small`, fastest), Google Gemini (`gemini-embedding-001`, free tier), or LM Studio (local OpenAI-compatible server) with a single environment variable. No provider-specific configuration files. - **Private & secure** — Everything runs on your machine — your code never leaves your network. The default Docker setup includes Ollama (embeddings) and Qdrant (vector storage) with no external API calls. No API costs, no token limits. Suitable for air-gapped and on-premises environments. Optional cloud providers (OpenAI, Google Gemini, Qdrant Cloud) are available but never required. - **AST-aware chunking** — Files are split at function/class boundaries using AST parsing (ast-grep), not arbitrary line counts. This produces higher-quality search results. Falls back to line-based chunking for unsupported languages. - **Polyglot code dependency graph** — Static analysis of import/require/use/include statements using ast-grep for 18+ languages. No external tools like dependency-cruiser required. Detects circular dependencies and generates visual Mermaid diagrams. @@ -685,6 +685,36 @@ Use Google's Gemini embedding API. Requires an [API key](https://aistudio.google > Defaults: `EMBEDDING_MODEL=gemini-embedding-001`, `EMBEDDING_DIMENSIONS=3072`. +#### LM Studio (local, OpenAI-compatible) + +[LM Studio](https://lmstudio.ai/) ships with a Local Server that exposes an OpenAI-compatible +API on `http://localhost:1234/v1`. 
Use this provider when you want to host embedding models +in LM Studio (e.g. when LM Studio is your single source for both chat and embedding models, +or when you want a Mac/Windows-friendly desktop UI for managing GGUF models). + +```json +{ + "mcpServers": { + "socraticode": { + "command": "node", + "args": ["/absolute/path/to/socraticode/dist/index.js"], + "env": { + "EMBEDDING_PROVIDER": "lmstudio", + "EMBEDDING_MODEL": "nomic-embed-text-v1.5", + "EMBEDDING_DIMENSIONS": "768" + } + } + } +} +``` + +> **No defaults — `EMBEDDING_MODEL` and `EMBEDDING_DIMENSIONS` are required.** LM Studio has +> no out-of-the-box embedding model; you load one yourself in the Local Server tab. SocratiCode +> fails fast if either is missing. +> +> Optional: `LMSTUDIO_URL` (default `http://localhost:1234/v1`) for non-default ports; +> `LMSTUDIO_API_KEY` if you've enabled API key auth in LM Studio. + ### Git Worktrees (shared index across directories) If you use [git worktrees](https://git-scm.com/docs/git-worktree) — or any workflow where the same repository lives in multiple directories — each path would normally get its own Qdrant index. This means redundant embedding and storage for what is essentially the same codebase. @@ -1072,10 +1102,10 @@ The rest of this section documents the variables themselves. Pass them using whi | Variable | Default | Description | |----------|---------|-------------| -| `EMBEDDING_PROVIDER` | `ollama` | Embedding backend: `ollama` (local, default), `openai`, or `google` | -| `EMBEDDING_MODEL` | *(per provider)* | Model name. Defaults: `nomic-embed-text` (ollama), `text-embedding-3-small` (openai), `gemini-embedding-001` (google) | -| `EMBEDDING_DIMENSIONS` | *(per provider)* | Vector dimensions. Defaults: `768` (ollama), `1536` (openai), `3072` (google) | -| `EMBEDDING_CONTEXT_LENGTH` | *(auto-detected)* | Model context window in tokens. Auto-detected for known models. Set manually for custom models. 
| +| `EMBEDDING_PROVIDER` | `ollama` | Embedding backend: `ollama` (local, default), `openai`, `google`, or `lmstudio` | +| `EMBEDDING_MODEL` | *(per provider)* | Model name. Defaults: `nomic-embed-text` (ollama), `text-embedding-3-small` (openai), `gemini-embedding-001` (google). **Required** for `lmstudio` (no default). | +| `EMBEDDING_DIMENSIONS` | *(per provider)* | Vector dimensions. Defaults: `768` (ollama), `1536` (openai), `3072` (google). **Required** for `lmstudio` (no default; varies per loaded model). | +| `EMBEDDING_CONTEXT_LENGTH` | *(auto-detected)* | Model context window in tokens. Auto-detected for known models. Set manually for custom or LM Studio models. | ### Ollama Configuration (when `EMBEDDING_PROVIDER=ollama`) @@ -1094,6 +1124,13 @@ The rest of this section documents the variables themselves. Pass them using whi | `OPENAI_API_KEY` | *(none)* | Required when `EMBEDDING_PROVIDER=openai`. Get from [platform.openai.com](https://platform.openai.com/api-keys) | | `GOOGLE_API_KEY` | *(none)* | Required when `EMBEDDING_PROVIDER=google`. Get from [aistudio.google.com](https://aistudio.google.com/apikey) | +### LM Studio Configuration (when `EMBEDDING_PROVIDER=lmstudio`) + +| Variable | Default | Description | +|----------|---------|-------------| +| `LMSTUDIO_URL` | `http://localhost:1234/v1` | Full base URL of LM Studio's OpenAI-compatible Local Server. Override when the server runs on a non-default port or a remote machine (e.g. `http://gpu-rig.local:5678/v1`). Must include the `/v1` suffix. | +| `LMSTUDIO_API_KEY` | *(none)* | Optional. LM Studio's Local Server has no auth by default; set this only if you've enabled API key auth in the LM Studio UI. 
| + ### Qdrant Configuration | Variable | Default | Description | diff --git a/src/services/embedding-config.ts b/src/services/embedding-config.ts index 7554df5..10c9be8 100644 --- a/src/services/embedding-config.ts +++ b/src/services/embedding-config.ts @@ -7,6 +7,8 @@ * - "ollama" (default): Use Ollama for embeddings (Docker or external). * - "openai": Use OpenAI Embeddings API. Requires OPENAI_API_KEY. * - "google": Use Google Generative AI Embedding API. Requires GOOGLE_API_KEY. + * - "lmstudio": Use a local LM Studio server (OpenAI-compatible). Requires + * EMBEDDING_MODEL and EMBEDDING_DIMENSIONS to be set explicitly. * * Ollama-specific: * OLLAMA_MODE: @@ -26,9 +28,16 @@ * OPENAI_API_KEY: Required for openai provider. * GOOGLE_API_KEY: Required for google provider. * + * LM Studio-specific: + * LMSTUDIO_URL: OpenAI-compatible base URL for LM Studio's local server. + * Default: http://localhost:1234/v1 + * LMSTUDIO_API_KEY: Optional API key. LM Studio's Local Server has no auth by default; + * set this only if you've enabled an API key in LM Studio. + * * Shared: - * EMBEDDING_MODEL: Model name (default depends on provider). - * EMBEDDING_DIMENSIONS: Vector dimensions — must match the model (default depends on provider). + * EMBEDDING_MODEL: Model name (default depends on provider; required for lmstudio). + * EMBEDDING_DIMENSIONS: Vector dimensions — must match the model (default depends on + * provider; required for lmstudio). * EMBEDDING_CONTEXT_LENGTH: Override context window in tokens (auto-detected for known models). 
*/ @@ -36,7 +45,7 @@ import { logger } from "./logger.js"; // ── Types ───────────────────────────────────────────────────────────────── -export type EmbeddingProvider = "ollama" | "openai" | "google"; +export type EmbeddingProvider = "ollama" | "openai" | "google" | "lmstudio"; export type OllamaMode = "docker" | "external" | "auto"; export interface EmbeddingConfig { @@ -46,6 +55,8 @@ export interface EmbeddingConfig { ollamaMode: OllamaMode; /** Ollama API URL (only relevant when embeddingProvider is "ollama"). */ ollamaUrl: string; + /** LM Studio OpenAI-compatible base URL (only relevant when embeddingProvider is "lmstudio"). */ + lmstudioUrl: string; embeddingModel: string; embeddingDimensions: number; /** Max context window in tokens. Used for client-side pre-truncation. */ @@ -55,10 +66,16 @@ export interface EmbeddingConfig { // ── Provider defaults ───────────────────────────────────────────────────── +/** + * lmstudio has empty defaults: LM Studio has no out-of-the-box model — users must load + * one in the UI and choose dimensions to match. We fail-fast in loadEmbeddingConfig() + * when the user picks lmstudio without setting EMBEDDING_MODEL / EMBEDDING_DIMENSIONS. 
+ */ const PROVIDER_DEFAULTS: Record = { - ollama: { model: "nomic-embed-text", dimensions: 768 }, - openai: { model: "text-embedding-3-small", dimensions: 1536 }, - google: { model: "gemini-embedding-001", dimensions: 3072 }, + ollama: { model: "nomic-embed-text", dimensions: 768 }, + openai: { model: "text-embedding-3-small", dimensions: 1536 }, + google: { model: "gemini-embedding-001", dimensions: 3072 }, + lmstudio: { model: "", dimensions: 0 }, }; // ── Ollama mode defaults ────────────────────────────────────────────────── @@ -109,14 +126,39 @@ export function loadEmbeddingConfig(): EmbeddingConfig { // ── Provider ──────────────────────────────────────────────────────── const rawProvider = process.env.EMBEDDING_PROVIDER || "ollama"; - if (rawProvider !== "ollama" && rawProvider !== "openai" && rawProvider !== "google") { + if ( + rawProvider !== "ollama" && + rawProvider !== "openai" && + rawProvider !== "google" && + rawProvider !== "lmstudio" + ) { throw new Error( - `Invalid EMBEDDING_PROVIDER: "${rawProvider}". Must be "ollama", "openai", or "google".`, + `Invalid EMBEDDING_PROVIDER: "${rawProvider}". Must be "ollama", "openai", "google", or "lmstudio".`, ); } const embeddingProvider: EmbeddingProvider = rawProvider; const providerDefaults = PROVIDER_DEFAULTS[embeddingProvider]; + // LM Studio has no sensible defaults — model and dimensions vary per loaded model. + // Fail fast with an actionable message rather than silently sending empty values. + if (embeddingProvider === "lmstudio") { + if (!process.env.EMBEDDING_MODEL) { + throw new Error( + "EMBEDDING_MODEL is required when EMBEDDING_PROVIDER=lmstudio. " + + "LM Studio has no built-in default — set it to the model identifier shown in " + + "LM Studio's Local Server tab (e.g. EMBEDDING_MODEL=nomic-embed-text-v1.5).", + ); + } + if (!process.env.EMBEDDING_DIMENSIONS) { + throw new Error( + "EMBEDDING_DIMENSIONS is required when EMBEDDING_PROVIDER=lmstudio. 
" + + "Different LM Studio models have different output dimensions — check the model card " + + "and set EMBEDDING_DIMENSIONS accordingly (e.g. 768 for nomic-embed-text-v1.5, " + + "1024 for bge-large-en-v1.5, 4096 for qwen3-embedding-8b).", + ); + } + } + // ── Ollama mode (only relevant for ollama provider) ───────────────── const rawMode = process.env.OLLAMA_MODE || "auto"; if (rawMode !== "docker" && rawMode !== "external" && rawMode !== "auto") { @@ -145,6 +187,7 @@ export function loadEmbeddingConfig(): EmbeddingConfig { embeddingProvider, ollamaMode, ollamaUrl: process.env.OLLAMA_URL || modeDefaults.url, + lmstudioUrl: process.env.LMSTUDIO_URL || "http://localhost:1234/v1", embeddingModel, embeddingDimensions, embeddingContextLength: contextLengthEnv @@ -167,6 +210,9 @@ export function loadEmbeddingConfig(): EmbeddingConfig { ollamaMode: _config.ollamaMode, ollamaUrl: _config.ollamaUrl, } : {}), + ...(embeddingProvider === "lmstudio" ? { + lmstudioUrl: _config.lmstudioUrl, + } : {}), embeddingModel: _config.embeddingModel, embeddingDimensions: _config.embeddingDimensions, embeddingContextLength: _config.embeddingContextLength || "auto", @@ -174,7 +220,11 @@ export function loadEmbeddingConfig(): EmbeddingConfig { ? _config.ollamaApiKey : embeddingProvider === "openai" ? process.env.OPENAI_API_KEY - : process.env.GOOGLE_API_KEY), + : embeddingProvider === "google" + ? process.env.GOOGLE_API_KEY + : embeddingProvider === "lmstudio" + ? process.env.LMSTUDIO_API_KEY + : undefined), }); return _config; diff --git a/src/services/embedding-provider.ts b/src/services/embedding-provider.ts index 95faf59..0d664db 100644 --- a/src/services/embedding-provider.ts +++ b/src/services/embedding-provider.ts @@ -8,9 +8,10 @@ * about which backend generates the vectors. * * Providers: - * - ollama (default) — local Ollama (Docker or external) - * - openai — OpenAI Embeddings API (text-embedding-3-small, etc.) 
- * - google — Google Generative AI Embedding API (gemini-embedding-001, etc.) + * - ollama (default) — local Ollama (Docker or external) + * - openai — OpenAI Embeddings API (text-embedding-3-small, etc.) + * - google — Google Generative AI Embedding API (gemini-embedding-001, etc.) + * - lmstudio — local LM Studio server via OpenAI-compatible API */ import type { InfraProgressCallback } from "./docker.js"; @@ -60,9 +61,14 @@ export async function getEmbeddingProvider(onProgress?: InfraProgressCallback): _provider = new GoogleEmbeddingProvider(); break; } + case "lmstudio": { + const { LMStudioEmbeddingProvider } = await import("./provider-lmstudio.js"); + _provider = new LMStudioEmbeddingProvider(); + break; + } default: throw new Error( - `Unknown embedding provider: "${name}". Must be "ollama", "openai", or "google".`, + `Unknown embedding provider: "${name}". Must be "ollama", "openai", "google", or "lmstudio".`, ); } diff --git a/src/services/provider-lmstudio.ts b/src/services/provider-lmstudio.ts new file mode 100644 index 0000000..809d63c --- /dev/null +++ b/src/services/provider-lmstudio.ts @@ -0,0 +1,220 @@ +// SPDX-License-Identifier: AGPL-3.0-only +// Copyright (C) 2026 Giancarlo Erra - Altaire Limited +/** + * LM Studio embedding provider. + * + * LM Studio's Local Server exposes an OpenAI-compatible /v1/embeddings endpoint, + * so we reuse the OpenAI SDK with a custom baseURL. This provider is intentionally + * separate from `provider-openai.ts` because: + * - LM Studio runs locally with no auth by default; OpenAI is cloud-only and requires a key. + * - LM Studio models have no Matryoshka support, so we never send a `dimensions` parameter. + * - Health check messaging, defaults, and error guidance differ meaningfully. 
+ * + * Required env when using this provider: + * EMBEDDING_PROVIDER=lmstudio + * EMBEDDING_MODEL= + * EMBEDDING_DIMENSIONS= + * + * Optional env: + * LMSTUDIO_URL=http://localhost:1234/v1 (default) + * LMSTUDIO_API_KEY= (only if you've enabled API key auth in LM Studio) + * EMBEDDING_CONTEXT_LENGTH= (defaults to 2048 if model unknown) + */ + +import OpenAI from "openai"; +import { getEmbeddingConfig } from "./embedding-config.js"; +import type { EmbeddingHealthStatus, EmbeddingProvider, EmbeddingReadinessResult } from "./embedding-types.js"; +import { logger } from "./logger.js"; + +// ── Constants ─────────────────────────────────────────────────────────── + +/** + * Conservative batch size — LM Studio runs locally and is bound by VRAM, + * not API rate limits. Large batches risk OOM with 7B+ embedding models. + * Tune via implementation if you have specific hardware headroom. + */ +const LMSTUDIO_BATCH_SIZE = 64; + +/** + * Conservative chars-per-token ratio for code. Same value used by provider-openai + * (LM Studio uses the same tokenizers for OpenAI-compat models in most cases). + */ +const CHARS_PER_TOKEN_ESTIMATE = 3.0; + +/** + * Fallback context length when EMBEDDING_CONTEXT_LENGTH is unset and the model + * is not in the known-models table. LM Studio embedding models commonly support + * 512–32768 tokens; 2048 is a safe lower bound that won't blow up for any + * mainstream model. + */ +const DEFAULT_CONTEXT_LENGTH = 2048; + +// ── Client management ─────────────────────────────────────────────────── + +let lmstudioClient: OpenAI | null = null; +let lmstudioBaseUrl: string | null = null; + +function getClient(): OpenAI { + const config = getEmbeddingConfig(); + const baseUrl = config.lmstudioUrl; + if (!lmstudioClient || lmstudioBaseUrl !== baseUrl) { + lmstudioClient = new OpenAI({ + // LM Studio doesn't validate the key by default. We send a non-empty placeholder + // because the OpenAI SDK throws if apiKey is empty/undefined. 
Users who enable + // API key auth in LM Studio's Local Server should set LMSTUDIO_API_KEY. + apiKey: process.env.LMSTUDIO_API_KEY || "lm-studio", + baseURL: baseUrl, + }); + lmstudioBaseUrl = baseUrl; + } + return lmstudioClient; +} + +/** Reset client (for testing or LMSTUDIO_URL hot-swap). */ +export function resetLMStudioClient(): void { + lmstudioClient = null; + lmstudioBaseUrl = null; +} + +// ── Pre-truncation ────────────────────────────────────────────────────── + +function pretruncateTexts(texts: string[], contextLength: number): string[] { + if (contextLength <= 0) return texts; + const maxChars = Math.floor(contextLength * CHARS_PER_TOKEN_ESTIMATE); + return texts.map((t) => (t.length > maxChars ? t.substring(0, maxChars) : t)); +} + +// ── Provider class ────────────────────────────────────────────────────── + +export class LMStudioEmbeddingProvider implements EmbeddingProvider { + readonly name = "lmstudio"; + + async ensureReady(): Promise<EmbeddingReadinessResult> { + const config = getEmbeddingConfig(); + const client = getClient(); + + // Step 1 — connectivity. The Local Server might be off, the port might be wrong, + // or LM Studio itself might not be running. Surface those as a single actionable + // message before checking model load state. + let modelList: Awaited<ReturnType<typeof client.models.list>>; + try { + modelList = await client.models.list(); + } catch (err) { + const message = err instanceof Error ? err.message : String(err); + throw new Error( + `LM Studio is not reachable at ${config.lmstudioUrl}. ` + + "Make sure LM Studio is running and the Local Server is started " + + "(Local Server tab > Start Server). " + + "If you've changed the port, set LMSTUDIO_URL accordingly (e.g. http://localhost:5678/v1). " + + `Underlying error: ${message}`, + ); + } + + // Step 2 — model loaded. LM Studio's /v1/models lists the currently-loaded + // models; if the configured EMBEDDING_MODEL isn't there, every embed() call + // will fail server-side with an opaque error. 
Fail early with a distinct, + // actionable message instead. + const modelLoaded = modelList.data.some((m) => m.id === config.embeddingModel); + if (!modelLoaded) { + throw new Error( + `LM Studio is reachable at ${config.lmstudioUrl} but the embedding model ` + + `"${config.embeddingModel}" is not loaded. Open LM Studio's Local Server tab, ` + + "load the model, and select it as the active embedding model — then retry. " + + "(Use EMBEDDING_MODEL to match the exact model identifier shown in LM Studio.)", + ); + } + + logger.info("LM Studio embedding provider ready", { + baseUrl: config.lmstudioUrl, + model: config.embeddingModel, + }); + // LM Studio is user-managed — no containers, no model pulls. + return { modelPulled: false, containerStarted: false, imagePulled: false }; + } + + async embed(texts: string[]): Promise<number[][]> { + if (texts.length === 0) return []; + + const config = getEmbeddingConfig(); + const client = getClient(); + const contextLength = config.embeddingContextLength > 0 + ? config.embeddingContextLength + : DEFAULT_CONTEXT_LENGTH; + const truncated = pretruncateTexts(texts, contextLength); + + if (truncated.length <= LMSTUDIO_BATCH_SIZE) { + return this._embedBatch(client, truncated, config.embeddingModel); + } + + const results: number[][] = []; + for (let i = 0; i < truncated.length; i += LMSTUDIO_BATCH_SIZE) { + const batch = truncated.slice(i, i + LMSTUDIO_BATCH_SIZE); + const embeddings = await this._embedBatch(client, batch, config.embeddingModel); + results.push(...embeddings); + } + return results; + } + + async embedSingle(text: string): Promise<number[]> { + const results = await this.embed([text]); + if (results.length === 0) { + throw new Error("Embedding failed: no result returned"); + } + return results[0]; + } + + async healthCheck(): Promise<EmbeddingHealthStatus> { + const config = getEmbeddingConfig(); + const lines: string[] = []; + const icon = (ok: boolean) => (ok ? 
"[OK]" : "[MISSING]"); + + try { + const client = getClient(); + const models = await client.models.list(); + lines.push(`${icon(true)} LM Studio: Reachable at ${config.lmstudioUrl}`); + + // LM Studio /v1/models returns the list of currently-loaded models. + // If our embedding model isn't in that list, it likely isn't loaded. + const modelLoaded = models.data.some((m) => m.id === config.embeddingModel); + lines.push( + `${icon(modelLoaded)} Embedding model (${config.embeddingModel}): ` + + (modelLoaded + ? "Loaded" + : "Not loaded — load it in LM Studio's Local Server tab and select it as the active model"), + ); + + return { available: true, modelReady: modelLoaded, statusLines: lines }; + } catch (err) { + const message = err instanceof Error ? err.message : String(err); + lines.push(`${icon(false)} LM Studio: Not reachable at ${config.lmstudioUrl} (${message})`); + return { available: false, modelReady: false, statusLines: lines }; + } + } + + private async _embedBatch( + client: OpenAI, + texts: string[], + model: string, + ): Promise { + // No `dimensions` parameter: LM Studio doesn't implement Matryoshka projection. + // The model returns its native dimension and we trust the user to have set + // EMBEDDING_DIMENSIONS to match. + // + // `encoding_format: "float"` is REQUIRED. The OpenAI SDK (6.x+) defaults to + // `encoding_format: "base64"` for performance, then unconditionally decodes the + // response with toFloat32Array(). LM Studio ignores `encoding_format` and always + // returns a plain JSON array of floats. The SDK's decode path then runs + // `Buffer.from(, 'base64')` — Node.js silently drops the encoding for + // array inputs and clamps each float (<1.0) to uint8 0, producing a 4096-byte + // zero buffer that gets reinterpreted as a 1024-element Float32Array of zeros. + // Setting `encoding_format: "float"` makes the SDK skip the decode step entirely + // (see openai-node/src/resources/embeddings.ts: `if (hasUserProvidedEncodingFormat)`). 
+ const response = await client.embeddings.create({ + model, + input: texts, + encoding_format: "float", + }); + const sorted = response.data.sort((a, b) => a.index - b.index); + return sorted.map((d) => d.embedding); + } +} diff --git a/src/services/qdrant.ts b/src/services/qdrant.ts index 3026fc0..4a644dd 100644 --- a/src/services/qdrant.ts +++ b/src/services/qdrant.ts @@ -337,22 +337,23 @@ async function searchChunksWithVector( const prefetchLimit = Math.max(limit * 3, 30); const activeFilter = filter.must.length > 0 ? filter : undefined; + const queryPayload = { + prefetch: [ + { query: queryVector, using: "dense", limit: prefetchLimit, filter: activeFilter }, + { + query: { text: query, model: "qdrant/bm25" }, + using: "bm25", + limit: prefetchLimit, + filter: activeFilter, + }, + ], + query: { fusion: "rrf" }, + limit, + with_payload: true, + filter: activeFilter, + }; const results = await withRetry( - () => qdrant.query(collectionName, { - prefetch: [ - { query: queryVector, using: "dense", limit: prefetchLimit, filter: activeFilter }, - { - query: { text: query, model: "qdrant/bm25" }, - using: "bm25", - limit: prefetchLimit, - filter: activeFilter, - }, - ], - query: { fusion: "rrf" }, - limit, - with_payload: true, - filter: activeFilter, - }), + () => qdrant.query(collectionName, queryPayload), "Qdrant hybrid search", ); diff --git a/tests/unit/embedding-config.test.ts b/tests/unit/embedding-config.test.ts index 7155e01..20cb132 100644 --- a/tests/unit/embedding-config.test.ts +++ b/tests/unit/embedding-config.test.ts @@ -22,6 +22,8 @@ describe("embedding-config", () => { delete process.env.OLLAMA_API_KEY; delete process.env.OPENAI_API_KEY; delete process.env.GOOGLE_API_KEY; + delete process.env.LMSTUDIO_URL; + delete process.env.LMSTUDIO_API_KEY; }); afterEach(() => { @@ -178,6 +180,7 @@ describe("embedding-config", () => { embeddingProvider: "ollama", ollamaMode: "external", ollamaUrl: "http://remote-gpu:11434", + lmstudioUrl: 
"http://localhost:1234/v1", embeddingModel: "mxbai-embed-large", embeddingDimensions: 1024, embeddingContextLength: 512, @@ -215,7 +218,7 @@ describe("embedding-config", () => { it("throws for invalid EMBEDDING_PROVIDER", () => { process.env.EMBEDDING_PROVIDER = "anthropic"; expect(() => loadEmbeddingConfig()).toThrow( - 'Invalid EMBEDDING_PROVIDER: "anthropic". Must be "ollama", "openai", or "google".', + 'Invalid EMBEDDING_PROVIDER: "anthropic". Must be "ollama", "openai", "google", or "lmstudio".', ); }); @@ -248,4 +251,82 @@ describe("embedding-config", () => { expect(config.embeddingContextLength).toBe(4096); }); }); + + describe("lmstudio provider", () => { + it("loads when EMBEDDING_MODEL and EMBEDDING_DIMENSIONS are set", () => { + process.env.EMBEDDING_PROVIDER = "lmstudio"; + process.env.EMBEDDING_MODEL = "nomic-embed-text-v1.5"; + process.env.EMBEDDING_DIMENSIONS = "768"; + + const config = loadEmbeddingConfig(); + expect(config.embeddingProvider).toBe("lmstudio"); + expect(config.embeddingModel).toBe("nomic-embed-text-v1.5"); + expect(config.embeddingDimensions).toBe(768); + }); + + it("defaults LMSTUDIO_URL to http://localhost:1234/v1", () => { + process.env.EMBEDDING_PROVIDER = "lmstudio"; + process.env.EMBEDDING_MODEL = "nomic-embed-text-v1.5"; + process.env.EMBEDDING_DIMENSIONS = "768"; + + const config = loadEmbeddingConfig(); + expect(config.lmstudioUrl).toBe("http://localhost:1234/v1"); + }); + + it("respects LMSTUDIO_URL override", () => { + process.env.EMBEDDING_PROVIDER = "lmstudio"; + process.env.EMBEDDING_MODEL = "nomic-embed-text-v1.5"; + process.env.EMBEDDING_DIMENSIONS = "768"; + process.env.LMSTUDIO_URL = "http://gpu-rig.local:5678/v1"; + + const config = loadEmbeddingConfig(); + expect(config.lmstudioUrl).toBe("http://gpu-rig.local:5678/v1"); + }); + + it("throws when EMBEDDING_MODEL is missing", () => { + process.env.EMBEDDING_PROVIDER = "lmstudio"; + process.env.EMBEDDING_DIMENSIONS = "768"; + + expect(() => 
loadEmbeddingConfig()).toThrow(
+        /EMBEDDING_MODEL is required when EMBEDDING_PROVIDER=lmstudio/,
+      );
+    });
+
+    it("throws when EMBEDDING_DIMENSIONS is missing", () => {
+      process.env.EMBEDDING_PROVIDER = "lmstudio";
+      process.env.EMBEDDING_MODEL = "nomic-embed-text-v1.5";
+
+      expect(() => loadEmbeddingConfig()).toThrow(
+        /EMBEDDING_DIMENSIONS is required when EMBEDDING_PROVIDER=lmstudio/,
+      );
+    });
+
+    it("includes example dimensions in the error message for discoverability", () => {
+      process.env.EMBEDDING_PROVIDER = "lmstudio";
+      process.env.EMBEDDING_MODEL = "nomic-embed-text-v1.5";
+
+      expect(() => loadEmbeddingConfig()).toThrow(
+        /768 for nomic-embed-text-v1\.5/,
+      );
+    });
+
+    it("does not require LMSTUDIO_API_KEY", () => {
+      process.env.EMBEDDING_PROVIDER = "lmstudio";
+      process.env.EMBEDDING_MODEL = "nomic-embed-text-v1.5";
+      process.env.EMBEDDING_DIMENSIONS = "768";
+      // Intentionally no LMSTUDIO_API_KEY — LM Studio's Local Server has no auth by default.
+
+      expect(() => loadEmbeddingConfig()).not.toThrow();
+    });
+
+    it("respects EMBEDDING_CONTEXT_LENGTH override", () => {
+      process.env.EMBEDDING_PROVIDER = "lmstudio";
+      process.env.EMBEDDING_MODEL = "qwen3-embedding-8b";
+      process.env.EMBEDDING_DIMENSIONS = "4096";
+      process.env.EMBEDDING_CONTEXT_LENGTH = "32768";
+
+      const config = loadEmbeddingConfig();
+      expect(config.embeddingContextLength).toBe(32768);
+    });
+  });
 });
diff --git a/tests/unit/embedding-provider.test.ts b/tests/unit/embedding-provider.test.ts
index 6a3e91d..7ddf9b6 100644
--- a/tests/unit/embedding-provider.test.ts
+++ b/tests/unit/embedding-provider.test.ts
@@ -19,6 +19,8 @@ describe("embedding-provider", () => {
     delete process.env.OLLAMA_API_KEY;
     delete process.env.OPENAI_API_KEY;
     delete process.env.GOOGLE_API_KEY;
+    delete process.env.LMSTUDIO_URL;
+    delete process.env.LMSTUDIO_API_KEY;
   });

   afterEach(() => {
@@ -45,6 +47,14 @@ describe("embedding-provider", () => {
     expect(provider.name).toBe("google");
   });

+  it("creates LMStudioEmbeddingProvider when configured", async () => {
+    process.env.EMBEDDING_PROVIDER = "lmstudio";
+    process.env.EMBEDDING_MODEL = "nomic-embed-text-v1.5";
+    process.env.EMBEDDING_DIMENSIONS = "768";
+    const provider = await getEmbeddingProvider();
+    expect(provider.name).toBe("lmstudio");
+  });
+
   it("caches provider instance", async () => {
     const p1 = await getEmbeddingProvider();
     const p2 = await getEmbeddingProvider();
@@ -133,3 +143,61 @@ describe("GoogleEmbeddingProvider", () => {
     expect(health.statusLines.some((l) => l.includes("Missing"))).toBe(true);
   });
 });
+
+describe("LMStudioEmbeddingProvider", () => {
+  const originalEnv = { ...process.env };
+
+  beforeEach(() => {
+    resetEmbeddingConfig();
+    resetEmbeddingProvider();
+    delete process.env.EMBEDDING_PROVIDER;
+    delete process.env.EMBEDDING_MODEL;
+    delete process.env.EMBEDDING_DIMENSIONS;
+    delete process.env.EMBEDDING_CONTEXT_LENGTH;
+    delete process.env.LMSTUDIO_URL;
+    delete process.env.LMSTUDIO_API_KEY;
+  });
+
+  afterEach(() => {
+    resetEmbeddingConfig();
+    resetEmbeddingProvider();
+    process.env = { ...originalEnv };
+  });
+
+  it("ensureReady throws an actionable error when LM Studio is unreachable", async () => {
+    process.env.EMBEDDING_PROVIDER = "lmstudio";
+    process.env.EMBEDDING_MODEL = "nomic-embed-text-v1.5";
+    process.env.EMBEDDING_DIMENSIONS = "768";
+    // Point at a deliberately closed port so the request fails fast.
+    process.env.LMSTUDIO_URL = "http://127.0.0.1:1/v1";
+
+    const provider = await getEmbeddingProvider();
+    await expect(provider.ensureReady()).rejects.toThrow(
+      /LM Studio is not reachable at http:\/\/127\.0\.0\.1:1\/v1/,
+    );
+  });
+
+  it("healthCheck reports unreachable LM Studio without throwing", async () => {
+    process.env.EMBEDDING_PROVIDER = "lmstudio";
+    process.env.EMBEDDING_MODEL = "nomic-embed-text-v1.5";
+    process.env.EMBEDDING_DIMENSIONS = "768";
+    process.env.LMSTUDIO_URL = "http://127.0.0.1:1/v1";
+
+    const provider = await getEmbeddingProvider();
+    const health = await provider.healthCheck();
+
+    expect(health.available).toBe(false);
+    expect(health.modelReady).toBe(false);
+    expect(health.statusLines.some((l) => l.includes("LM Studio") && l.includes("Not reachable"))).toBe(true);
+  });
+
+  it("does not require LMSTUDIO_API_KEY to construct the provider", async () => {
+    process.env.EMBEDDING_PROVIDER = "lmstudio";
+    process.env.EMBEDDING_MODEL = "nomic-embed-text-v1.5";
+    process.env.EMBEDDING_DIMENSIONS = "768";
+    // Intentionally no LMSTUDIO_API_KEY.
+
+    const provider = await getEmbeddingProvider();
+    expect(provider.name).toBe("lmstudio");
+  });
+});

From 852ae5d6d5b0dfdb99bbfc11319052ae4cc4e6a8 Mon Sep 17 00:00:00 2001
From: Giancarlo Erra
Date: Mon, 4 May 2026 12:48:54 +0100
Subject: [PATCH 2/2] chore: release v1.8.3

---
 .claude-plugin/plugin.json | 2 +-
 .codex-plugin/plugin.json  | 2 +-
 .cursor-plugin/plugin.json | 2 +-
 CHANGELOG.md               | 6 ++++++
 extension/package.json     | 2 +-
 package-lock.json          | 4 ++--
 package.json               | 2 +-
 7 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
index 2744dcc..0cffe04 100644
--- a/.claude-plugin/plugin.json
+++ b/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "socraticode",
-  "version": "1.8.2",
+  "version": "1.8.3",
   "description": "Codebase intelligence — semantic search workflows, dependency graph analysis, and context artifact exploration for SocratiCode",
   "author": {
     "name": "Giancarlo Erra",
diff --git a/.codex-plugin/plugin.json b/.codex-plugin/plugin.json
index aed2d1a..536407a 100644
--- a/.codex-plugin/plugin.json
+++ b/.codex-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "socraticode",
-  "version": "1.8.2",
+  "version": "1.8.3",
   "description": "Codebase intelligence: semantic search workflows, dependency graph analysis, and context artifact exploration for SocratiCode",
   "author": {
     "name": "Giancarlo Erra",
diff --git a/.cursor-plugin/plugin.json b/.cursor-plugin/plugin.json
index 531486b..83bbfb1 100644
--- a/.cursor-plugin/plugin.json
+++ b/.cursor-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "socraticode",
-  "version": "1.8.2",
+  "version": "1.8.3",
   "description": "Codebase intelligence: semantic search workflows, dependency graph analysis, and context artifact exploration for SocratiCode",
   "author": {
     "name": "Giancarlo Erra",
diff --git a/CHANGELOG.md b/CHANGELOG.md
index faa3de2..653bffc 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,12 @@ All notable changes to SocratiCode are documented here.
 This project uses [Conventional Commits](https://www.conventionalcommits.org/) and [Semantic Versioning](https://semver.org/).

+## [1.8.3](https://github.com/giancarloerra/socraticode/compare/v1.8.2...v1.8.3) (2026-05-04)
+
+### Features
+
+* **embeddings:** add LM Studio as a first-class embedding provider ([#42](https://github.com/giancarloerra/socraticode/issues/42)) ([332ee80](https://github.com/giancarloerra/socraticode/commit/332ee800a85fd35ded4e37adabecbfdd6221d31b))
+
 ## [1.8.2](https://github.com/giancarloerra/socraticode/compare/v1.8.1...v1.8.2) (2026-05-04)

 ### Bug Fixes
diff --git a/extension/package.json b/extension/package.json
index c5de980..9965967 100644
--- a/extension/package.json
+++ b/extension/package.json
@@ -2,7 +2,7 @@
   "name": "socraticode",
   "displayName": "SocratiCode",
   "description": "Codebase context engine for AI assistants. Hybrid search, dependency and call graphs, symbol-level impact analysis (blast radius), interactive graph explorer, and searchable architecture artefacts. Works with Copilot agent mode, Cline, Continue, Roo Code, and any MCP-compatible host.",
-  "version": "1.8.2",
+  "version": "1.8.3",
   "publisher": "giancarloerra",
   "license": "AGPL-3.0-only",
   "icon": "images/icon.png",
diff --git a/package-lock.json b/package-lock.json
index 4b7012d..347c08e 100644
--- a/package-lock.json
+++ b/package-lock.json
@@ -1,12 +1,12 @@
 {
   "name": "socraticode",
-  "version": "1.8.2",
+  "version": "1.8.3",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
       "name": "socraticode",
-      "version": "1.8.2",
+      "version": "1.8.3",
       "license": "AGPL-3.0-only",
       "dependencies": {
         "@ast-grep/lang-bash": "^0.0.7",
diff --git a/package.json b/package.json
index e7c0696..efb5b20 100644
--- a/package.json
+++ b/package.json
@@ -1,7 +1,7 @@
 {
   "name": "socraticode",
   "mcpName": "io.github.giancarloerra/socraticode",
-  "version": "1.8.2",
+  "version": "1.8.3",
   "description": "SocratiCode — MCP server for local codebase indexing, semantic search, and code dependency graphs. All private, all local via Docker.",
   "type": "module",
   "main": "dist/index.js",