Bug: MCP server ignores embedding.model from .java-codebase-rag.yml

  Environment

  - OS: macOS Darwin 25.5.0
  - java-codebase-rag commit: 6b883d9 (master)

  Problem

  The MCP server (server.py) never reads embedding.model from .java-codebase-rag.yml. It always
  falls back to the HuggingFace hub default, sentence-transformers/all-MiniLM-L6-v2, so the search tool fails when
  the machine is offline or behind a corporate proxy with self-signed certificates.
  
  Reproduction

  1. Create .java-codebase-rag.yml in the project root:
  embedding:
    model: /Users/<user>/Downloads/sentence-transformers:all-MiniLM-L6-v2
  2. Configure the MCP server in .mcp.json, leaving SBERT_MODEL out of env:
  {
    "mcpServers": {
      "java-codebase-rag": {
        "command": ".venv/bin/java-codebase-rag-mcp",
        "env": {
          "LANCEDB_URI_PROJECT_ROOT": "/path/to/project"
        }
      }
    }
  }
  3. Call the search tool — it fails with:
  RuntimeError: Cannot send a request, as the client has been closed.

  Expected behavior

  The MCP server should read embedding.model from .java-codebase-rag.yml and load the local model it points to, as
  the CLI does.

  Root cause

  The YAML config is resolved by resolve_operator_config() (config.py:248), and apply_to_os_environ() then sets
  SBERT_MODEL in the environment. However, this path runs only from CLI commands (init, reprocess, etc.), not
  from the MCP server startup path.

  The MCP server instead imports SBERT_MODEL from index_common.py:8, where it is evaluated once at module load time:

  SBERT_MODEL = os.path.expandvars(
      os.path.expanduser(os.environ.get("SBERT_MODEL", "sentence-transformers/all-MiniLM-L6-v2"))
  )

  Since SBERT_MODEL is not set in the MCP server's environment, it defaults to the HuggingFace hub name. When search_v2
  runs (mcp_v2.py:903):

  model_name = resolved_sbert_model_for_process_env(SBERT_MODEL)

  That call receives the hub name, so SentenceTransformer tries to reach HuggingFace to check for adapter configs,
  which fails offline or behind a corporate proxy.
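
A minimal sketch of the import-time capture described above, using a plain dict as a stand-in for os.environ. The helper name read_model_from_env and the path /opt/models/all-MiniLM-L6-v2 are illustrative assumptions, not names from the repository. It shows that a value computed once at module load does not pick up an SBERT_MODEL that is set afterwards.

```python
import os

DEFAULT_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


def read_model_from_env(environ):
    """Same expression as the module-level constant, but parameterised on the mapping."""
    return os.path.expandvars(
        os.path.expanduser(environ.get("SBERT_MODEL", DEFAULT_MODEL))
    )


env = {}  # stand-in for os.environ in the MCP server process

# What a module-level constant holds if the module is imported now.
captured_at_import = read_model_from_env(env)

# The variable is set later (hypothetical local path).
env["SBERT_MODEL"] = "/opt/models/all-MiniLM-L6-v2"

# Only a fresh read sees the new value; captured_at_import is unchanged.
read_at_call_time = read_model_from_env(env)
```

If this matches the real behaviour, a startup fix would probably need to set SBERT_MODEL before index_common is imported, unless resolved_sbert_model_for_process_env() consults the environment again, which this report does not show.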

  Affected tools

  - search — BROKEN (requires SentenceTransformers / embeddings)
  - find, resolve, describe, neighbors — unaffected (Kuzu graph only, no embedding model needed)

  Workaround

  Set SBERT_MODEL explicitly in .mcp.json env:
  {
    "mcpServers": {
      "java-codebase-rag": {
        "env": {
          "SBERT_MODEL": "/Users/<user>/Downloads/sentence-transformers:all-MiniLM-L6-v2"
        }
      }
    }
  }
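
Before relying on the workaround, it may help to confirm that the value given to SBERT_MODEL resolves to a directory on disk rather than being treated as a hub name. The check below is a sketch only; resolves_to_local_dir is a hypothetical helper, not part of the project.

```python
import os
import tempfile


def resolves_to_local_dir(value: str) -> bool:
    """True when value, after ~ and $VAR expansion, is an existing directory."""
    expanded = os.path.expandvars(os.path.expanduser(value))
    return os.path.isdir(expanded)


# Usage: a freshly created directory passes, a missing child path does not.
with tempfile.TemporaryDirectory() as tmp:
    existing = resolves_to_local_dir(tmp)
    missing = resolves_to_local_dir(os.path.join(tmp, "no-such-model"))
```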

  Suggested fix

  In server.py, during startup, detect .java-codebase-rag.yml in the project root (using _project_root()), parse the
  embedding.model key, and apply it to os.environ["SBERT_MODEL"] before any tool handler runs, mirroring what
  resolve_operator_config().apply_to_os_environ() already does for the CLI path. The _pick_str and
  _resolve_embedding_model_path helpers in config.py already handle YAML precedence resolution and can be reused.
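
The startup step could look roughly like the sketch below. It does not reuse the real helpers in config.py, whose signatures are not shown in this report; embedding_model_from_config, apply_embedding_model and load_project_config are hypothetical names. Two assumptions are baked in: PyYAML is available for parsing, and an SBERT_MODEL already present in the environment takes precedence over the YAML value.

```python
import os
from pathlib import Path
from typing import Mapping, MutableMapping, Optional

CONFIG_NAME = ".java-codebase-rag.yml"


def embedding_model_from_config(config: object) -> Optional[str]:
    """Return embedding.model from a parsed YAML document, or None if absent."""
    if not isinstance(config, Mapping):
        return None
    embedding = config.get("embedding")
    if not isinstance(embedding, Mapping):
        return None
    model = embedding.get("model")
    if isinstance(model, str) and model.strip():
        return model.strip()
    return None


def apply_embedding_model(
    config: object, environ: MutableMapping[str, str] = os.environ
) -> Optional[str]:
    """Set SBERT_MODEL from the config unless the environment already defines it.

    Returns the value that was applied, or None when nothing changed.
    Assumption: an explicit environment variable wins over the YAML file.
    """
    model = embedding_model_from_config(config)
    if model is None or environ.get("SBERT_MODEL"):
        return None
    environ["SBERT_MODEL"] = os.path.expandvars(os.path.expanduser(model))
    return environ["SBERT_MODEL"]


def load_project_config(project_root: Path) -> object:
    """Parse the project's YAML config, or return None when the file is missing."""
    import yaml  # PyYAML, imported lazily (assumed dependency)

    path = project_root / CONFIG_NAME
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return yaml.safe_load(handle)
```

In server.py this would be called once at startup, for example apply_embedding_model(load_project_config(Path(project_root))), where project_root is whatever _project_root() returns.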

  Key files

  - index_common.py:7-10 — hardcoded default, import-time eval
  - java_codebase_rag/config.py:248 — resolve_operator_config() (CLI-only path)
  - java_codebase_rag/config.py:55-64 — resolved_sbert_model_for_process_env()
  - server.py — MCP server startup (no YAML loading for embedding)
  - mcp_v2.py:903 — where search_v2 reads the model name

Bug: MCP server ignores embedding.model from .java-codebase-rag.yml #238
