[FEATURE] Separate Embedding Base URL Configuration #1354

@pmffromspace

Description

Problem

Currently SurfSense appears to use a single OLLAMA_BASE_URL for both:

  • chat/completion models
  • embedding models

This makes it impossible to:

  • run embeddings on a separate Ollama instance
  • use a dedicated embedding server
  • split workloads across GPUs/machines
  • combine Ollama chat with another embedding backend

At the moment, only the embedding model itself appears to be configurable:

OLLAMA_BASE_URL=http://localhost:11434
EMBEDDING_MODEL=nomic-embed-text

but not a separate embedding endpoint/base URL.


Requested Feature

Add support for a dedicated embedding base URL, for example:

EMBEDDING_BASE_URL=http://localhost:11435

or provider-specific variants such as:

OLLAMA_EMBEDDING_BASE_URL=
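A minimal sketch of how the backend could resolve the embedding endpoint while staying backward compatible. The function name `resolve_embedding_base_url` is hypothetical, not an existing SurfSense API; the idea is simply that `EMBEDDING_BASE_URL` takes precedence when set and everything falls back to `OLLAMA_BASE_URL` otherwise, so existing deployments are unaffected:

```python
import os

def resolve_embedding_base_url() -> str:
    """Return the base URL to use for embedding requests.

    Hypothetical helper: EMBEDDING_BASE_URL is the proposed new variable;
    if it is unset or empty, fall back to the existing OLLAMA_BASE_URL so
    current single-instance setups keep working unchanged.
    """
    return (
        os.getenv("EMBEDDING_BASE_URL")
        or os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
    )
```

With only `OLLAMA_BASE_URL=http://ollama:11434` set, this returns the chat endpoint for embeddings as today; adding `EMBEDDING_BASE_URL=http://tei:8080` redirects embedding traffic without touching the chat configuration.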

Example Use Cases

Separate Ollama instances

OLLAMA_BASE_URL=http://ollama-chat:11434
EMBEDDING_BASE_URL=http://ollama-embed:11434

Dedicated embedding server

OLLAMA_BASE_URL=http://ollama:11434
EMBEDDING_BASE_URL=http://tei:8080

Why This Matters

Embedding generation is often:

  • GPU intensive
  • latency sensitive
  • deployed separately in production RAG systems

Separating inference and embedding backends is a common architecture pattern and would improve scalability and deployment flexibility.

Metadata

Labels: enhancement (New feature or request)