-
-
Notifications
You must be signed in to change notification settings - Fork 207
8.6 Self Hosted AI Stack for Developers
Harbor is a local AI toolkit for developers building a self-hosted AI stack, private AI stack, or open source AI stack around local LLMs. It wires Ollama, Open WebUI, llama.cpp, vLLM, Docker Model Runner, MLX, oMLX, SearXNG, OpenCode, LiteLLM, Harbor Boost, ComfyUI, Dify, n8n, Flowise, LangFlow, and other services through Docker Compose and one CLI/app.
This guide is the broad developer landing page for Harbor's local AI stack. The practical center is OpenCode through harbor launch: run a coding agent in your current repository while Harbor supplies the local backend, model, and adapter configuration.
Harbor's local AI stack layers: frontends, local LLM backends, satellite services, and shared host resources.
A developer-focused self-hosted AI stack usually needs these layers:
- Local inference: Ollama, llama.cpp, vLLM, Docker Model Runner, MLX, oMLX, or another OpenAI-compatible local backend.
- Browser chat and prompt checks: Open WebUI.
- Coding agents on local models: OpenCode, Codex, Claude Code, Copilot, mi, Pi, and VS Code through
harbor launch. - Local web RAG and research: SearXNG, AnythingLLM, Khoj, Onyx, Kotaemon, Cognee, Docling, Airweave, or Local Deep Research.
- Tools and agent connectivity: MCP, OpenAPI, mcpo, MetaMCP, and MCP Forge.
- Routing, proxying, and evaluation: LiteLLM, Bifrost, Harbor Boost, Harbor Bench, LangFuse, Beszel, K6, and lm-evaluation-harness.
- Optional media and app-builder layers: ComfyUI, Speaches, Dify, n8n, Flowise, and LangFlow.
Harbor keeps this as a Docker Compose local AI setup instead of a pile of unrelated containers. Service handles select the services, cross-service Compose files add integrations, and the CLI keeps model/backend choices available to both web UIs and host coding tools.
The default Harbor stack starts Ollama and Open WebUI:
harbor up
harbor openThat gives you the common "Ollama + Open WebUI" base: a local model backend and a browser UI for testing prompts. For developers, this is the shared local LLM stack that can also feed coding agents, RAG services, tool routers, and benchmarks.
For the Docker Compose model behind this setup, read Local LLM Stack with Docker Compose. To compare this with hand-written Compose, read Harbor vs Manual Docker Compose for Local AI.
OpenCode running through Harbor with local backend integration.
For a developer, the most direct use of a private AI stack is a coding agent that runs in the current repository. Harbor Launch does that for installed host tools:
harbor launch --backend ollama --model qwen3.5:4b opencodeThis runs the host opencode CLI from your current project directory. Harbor supplies the backend URL, API key, model selection, and generated OpenCode provider configuration, so OpenCode can use the same local LLM backend as the rest of the stack.
Generate or inspect the OpenCode adapter config without launching the tool:
harbor launch --config opencodeOpenCode also exists as a Harbor container service:
harbor launch --service opencode --help
harbor up opencode --openChoose the mode that fits your preferred OpenCode workflow. The host harbor launch path and the containerized OpenCode service are both valid ways to connect OpenCode to Harbor's local LLM stack.
harbor launch is the bridge between local OpenAI-compatible backends and host coding tools:
# OpenCode with Ollama
harbor launch --backend ollama --model qwen3.5:4b opencode
# Codex with the same local backend
harbor launch --backend ollama --model qwen3.5:4b codex
# OpenCode with llama.cpp
harbor launch --backend llamacpp --model Qwen3.5-4B opencodeThe launch options go before the tool name. Arguments after the tool name are passed through unchanged:
harbor launch --backend ollama --model qwen3.5:4b codex --sandbox workspace-writeThis is the developer workflow behind searches like local LLM for OpenCode, local LLM for Codex, coding agents local models, and run coding agents with Ollama. For the dedicated walkthrough, read Run Coding Agents with Local LLMs.
Open WebUI using SearXNG for local web search in a RAG workflow.
A private AI stack becomes more useful for development when agents can search docs, read URLs, and pull in repository or project context.
# Add local web search
harbor up searxng
# Launch OpenCode through Harbor Boost with web search and URL reading
harbor launch --web --backend ollama --model qwen3.5:4b opencodeSearXNG is the default local web search service. The --web modifier starts SearXNG and Harbor Boost web tools, then routes the launched OpenAI-compatible host tool through a generated workflow model.
For a browser-chat path, use Ollama + Open WebUI + SearXNG Local Web RAG Setup. For repository RAG and document indexing, evaluate services such as AnythingLLM, Khoj, Onyx, Kotaemon, Cognee, Docling, Airweave, and Local Deep Research from the services catalog.
Use the services below as a practical developer menu, not as a recommendation to start everything at once:
| Need | Harbor services |
|---|---|
| Local LLM backend | Ollama, llama.cpp, vLLM, TabbyAPI, mistral.rs, SGLang, LMDeploy, Aphrodite |
| Browser chat | Open WebUI, LibreChat, AnythingLLM |
| Coding agents | OpenCode, Codex, Claude Code, Copilot, mi, Pi, VS Code through harbor launch
|
| Local web RAG | SearXNG, Open WebUI, Local Deep Research |
| Code and document RAG | AnythingLLM, Khoj, Onyx, Kotaemon, Cognee, Docling, Airweave |
| Tool access | MCP, OpenAPI, mcpo, MetaMCP, MCP Forge, SuperGateway |
| Routing and proxying | LiteLLM, Bifrost, Harbor Boost |
| Evaluation and observability | Harbor Bench, LangFuse, Beszel, K6, lm-evaluation-harness |
| Optional media and workflows | ComfyUI, Speaches, Dify, n8n, Flowise, LangFlow |
This keeps the page aligned with the self-hosted AI stack, private AI stack, open source AI stack, and local AI toolkit intent while still making the developer workflow concrete.
A self-hosted LLM stack is easier to reuse when clients can talk to a local OpenAI-compatible API. Harbor supports that pattern across several backends:
- Ollama for convenient local model management.
- llama.cpp for GGUF-focused local serving.
- vLLM for heavier throughput-oriented serving when hardware fits.
- Other backends from the Services catalog when the model format or runtime calls for them.
Use OpenAI-Compatible Local LLM Backends when choosing between Ollama, llama.cpp, vLLM, and other local inference servers.
Harbor Tools integration: MCP, OpenAPI, MetaMCP, and MCP Forge for connecting agents to tools.
Tool access is what turns a local AI stack into a developer environment. Harbor documents MCP and OpenAPI paths for agents and frontends:
- Harbor Tools explains MCP, OpenAPI, mcpo, MetaMCP, SuperGateway, and MCP Forge.
- mcpo bridges MCP tools into OpenAPI-compatible clients.
- MetaMCP provides a web UI for managing MCP tools.
- MCP Forge provides a gateway and admin UI for larger MCP setups.
Add this layer when OpenCode, Open WebUI, or another agent needs controlled access to project tools, internal APIs, documentation services, issue trackers, or other local systems.
Harbor Bench report comparing model performance across benchmarks.
Local models vary a lot by backend, quantization, context length, and tool behavior. Harbor includes Harbor Bench for local LLM benchmark runs against OpenAI-compatible APIs:
harbor bench run --name local-coding-baseline
harbor bench resultsAdd LiteLLM, Bifrost, Harbor Boost, LangFuse, Beszel, K6, or lm-evaluation-harness when you need routing, request shaping, traces, host monitoring, load tests, or quality evaluation.
A balanced developer setup can start small:
# Start the default local AI stack
harbor up
# Run OpenCode in the current repository with a local model
harbor launch --backend ollama --model qwen3.5:4b opencode
# Inspect OpenCode adapter configuration
harbor launch --config opencode
# Add local web search and launch OpenCode with web tools
harbor up searxng
harbor launch --web --backend ollama --model qwen3.5:4b opencode
# Benchmark the local backend path
harbor bench run --name local-coding-baselineDo not start every service just because it exists. Add the next layer only when the workflow needs it: a backend first, OpenCode or another coding agent next, SearXNG for web RAG, MCP/OpenAPI tools for controlled actions, then routing, evaluation, image, voice, or workflow services as needed.
- Build the base layer with Local LLM Stack with Docker Compose.
- Add web search with Ollama + Open WebUI + SearXNG Local Web RAG Setup.
- Run host coding tools with Run Coding Agents with Local LLMs.
- Compare Harbor with hand-written Compose in Harbor vs Manual Docker Compose for Local AI.
- Choose inference engines with OpenAI-Compatible Local LLM Backends.
- Return to the docs Home for the full documentation index.
- Return to Harbor Guides for the full guide index.