Problem
Currently SurfSense appears to use a single OLLAMA_BASE_URL for both:
- chat/completion models
- embedding models
This makes it impossible to:
- run embeddings on a separate Ollama instance
- use a dedicated embedding server
- split workloads across GPUs/machines
- combine Ollama chat with another embedding backend
At the moment only the embedding model itself seems configurable:
OLLAMA_BASE_URL=http://localhost:11434
EMBEDDING_MODEL=nomic-embed-text
but not a separate embedding endpoint/base URL.
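For illustration, here is a minimal Python sketch of what a single-URL wiring typically looks like (hypothetical, not SurfSense's actual code; the `ollama` client usage is an assumption about how the backend might be built):

```python
import os

import ollama

# Hypothetical sketch of today's single-URL wiring (not SurfSense's
# actual code): chat and embedding clients are built from the same
# variable, so embeddings cannot be routed to a different host.
base_url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")

chat_client = ollama.Client(host=base_url)
embed_client = ollama.Client(host=base_url)  # same host, by construction
```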
Requested Feature
Add support for a dedicated embedding base URL, for example:
EMBEDDING_BASE_URL=http://localhost:11435
or provider-specific variants such as:
OLLAMA_EMBEDDING_BASE_URL=
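One possible resolution order is sketched below (variable names follow this proposal; the precedence and fallback are suggestions, not existing SurfSense behavior). The most specific setting wins, and deployments that set only OLLAMA_BASE_URL keep working unchanged:

```python
import os

def resolve_embedding_base_url() -> str:
    # Proposed precedence (a suggestion, not current behavior):
    # provider-specific > generic > fall back to the shared chat URL,
    # so existing single-URL deployments are unaffected.
    return (
        os.environ.get("OLLAMA_EMBEDDING_BASE_URL")
        or os.environ.get("EMBEDDING_BASE_URL")
        or os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
    )
```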
Example Use Cases
Separate Ollama instances
OLLAMA_BASE_URL=http://ollama-chat:11434
EMBEDDING_BASE_URL=http://ollama-embed:11434
Dedicated embedding server
OLLAMA_BASE_URL=http://ollama:11434
EMBEDDING_BASE_URL=http://tei:8080
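With such a variable in place, the two-instance case could be wired roughly as follows (a sketch using the `ollama` Python client; model names and fallback behavior are assumptions, not SurfSense code):

```python
import os

import ollama

# Chat and embeddings each get their own client, pointed at possibly
# different hosts. EMBEDDING_BASE_URL falls back to OLLAMA_BASE_URL.
chat_url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
embed_url = os.environ.get("EMBEDDING_BASE_URL") or chat_url

chat_client = ollama.Client(host=chat_url)
embed_client = ollama.Client(host=embed_url)

vectors = embed_client.embed(
    model=os.environ.get("EMBEDDING_MODEL", "nomic-embed-text"),
    input=["chunk one", "chunk two"],
)
```

Note that the TEI use case would additionally need a pluggable embedding backend, not just a separate URL, since TEI exposes a different API than Ollama.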
Why This Matters
Embedding generation is often:
- GPU-intensive
- latency-sensitive
- deployed separately in production RAG systems
Separating inference and embedding backends is a common architecture pattern and would improve scalability and deployment flexibility.