Problem
Currently SurfSense appears to use a single OLLAMA_BASE_URL for both:
- chat/completion models
- embedding models
This makes it impossible to:
- run embeddings on a separate Ollama instance
- use a dedicated embedding server
- split workloads across GPUs/machines
- combine Ollama chat with another embedding backend
At the moment only the embedding model itself seems configurable:
OLLAMA_BASE_URL=http://localhost:11434
EMBEDDING_MODEL=nomic-embed-text
but not a separate embedding endpoint/base URL.
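For illustration, here is a minimal Python sketch of what a single-URL wiring typically looks like (hypothetical, not SurfSense's actual code; the `ollama` client usage is an assumption about how the backend might be built):

```python
import os

import ollama

# Hypothetical sketch of today's single-URL wiring (not SurfSense's
# actual code): chat and embedding clients are built from the same
# variable, so embeddings cannot be routed to a different host.
base_url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")

chat_client = ollama.Client(host=base_url)
embed_client = ollama.Client(host=base_url)  # same host, by construction
```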
Requested Feature
Add support for a dedicated embedding base URL, for example:
EMBEDDING_BASE_URL=http://localhost:11435
or provider-specific variants such as:
OLLAMA_EMBEDDING_BASE_URL=
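One possible resolution order is sketched below (variable names follow this proposal; the precedence and fallback are suggestions, not existing SurfSense behavior). The most specific setting wins, and deployments that set only OLLAMA_BASE_URL keep working unchanged:

```python
import os

def resolve_embedding_base_url() -> str:
    # Proposed precedence (a suggestion, not current behavior):
    # provider-specific > generic > fall back to the shared chat URL,
    # so existing single-URL deployments are unaffected.
    return (
        os.environ.get("OLLAMA_EMBEDDING_BASE_URL")
        or os.environ.get("EMBEDDING_BASE_URL")
        or os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
    )
```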
Example Use Cases
Separate Ollama instances
OLLAMA_BASE_URL=http://ollama-chat:11434
EMBEDDING_BASE_URL=http://ollama-embed:11434
Dedicated embedding server
OLLAMA_BASE_URL=http://ollama:11434
EMBEDDING_BASE_URL=http://tei:8080
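With such a variable in place, the two-instance case could be wired roughly as follows (a sketch using the `ollama` Python client; model names and fallback behavior are assumptions, not SurfSense code):

```python
import os

import ollama

# Chat and embeddings each get their own client, pointed at possibly
# different hosts. EMBEDDING_BASE_URL falls back to OLLAMA_BASE_URL.
chat_url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
embed_url = os.environ.get("EMBEDDING_BASE_URL") or chat_url

chat_client = ollama.Client(host=chat_url)
embed_client = ollama.Client(host=embed_url)

vectors = embed_client.embed(
    model=os.environ.get("EMBEDDING_MODEL", "nomic-embed-text"),
    input=["chunk one", "chunk two"],
)
```

Note that the TEI use case would additionally need a pluggable embedding backend, not just a separate URL, since TEI exposes a different API than Ollama.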
Why This Matters
Embedding generation is often:
- GPU-intensive
- latency-sensitive
- deployed separately in production RAG systems
Separating inference and embedding backends is a common architecture pattern and would improve scalability and deployment flexibility.