-
-
Notifications
You must be signed in to change notification settings - Fork 207
8.6 Self Hosted AI Stack for Developers
Harbor is a local AI toolkit for developers who want a self-hosted AI stack that can grow beyond a single chat container. It brings together local inference, browser chat, coding agents, RAG/search, tools, image generation, voice, workflows, benchmarks, and observability through Docker Compose and Harbor service handles.
Use this guide as the high-level map for a private AI stack or open source AI stack built with Harbor. It links to the deeper setup guides where a workflow needs more detail.
Most Harbor setups start with the default local LLM stack:
harbor up
harbor openThat starts Ollama and Open WebUI, giving you a local model backend and a browser chat frontend. The important design point is that Harbor treats these as the base layer of a larger developer stack, not the final destination.
For the Compose-level model, read Local LLM Stack with Docker Compose. If you are deciding whether to maintain your own Compose files, read Harbor vs Manual Docker Compose for Local AI.
A self-hosted AI stack needs a reliable path from user interface to model runtime. Harbor gives you several choices:
- Open WebUI for the default chat UI, model switching, RAG features, and compatible tool use.
- Ollama for the default model management path.
- llama.cpp when GGUF serving and direct llama.cpp behavior matter.
- vLLM when throughput-oriented OpenAI-compatible serving is the goal.
- Other frontend and backend services from the Services catalog when your workflow needs a different interface or inference engine.
The common integration point is often an OpenAI-compatible local API. Harbor starts the services, applies matching cross-service Compose files, and lets consumers use the backend endpoints that fit their protocol. For backend selection, see OpenAI-Compatible Local LLM Backends.
Developers usually want the same local models available in their editor, terminal, and repository workflows. Harbor Launch connects installed host tools to Harbor backends:
harbor launch --backend ollama --model qwen3.5:4b codex
harbor launch --backend llamacpp --model Qwen3.5-4B opencodeThis keeps the coding agent in your current project directory while Harbor supplies the backend URL, model selection, and adapter configuration. It is the path for using local LLMs with Codex, Claude Code, OpenCode, Copilot, mi, and other supported host tools.
For the dedicated walkthrough, read Run Coding Agents with Local LLMs.
A private AI stack often needs retrieval and search before it needs more model size. Harbor includes services for local web RAG, document search, and deeper research workflows:
harbor up searxng
harbor up searxng ldrSearXNG is the default local web search building block and is pre-wired into Open WebUI and several other services. Harbor also includes RAG and research-oriented services such as AnythingLLM, Onyx, Khoj, Kotaemon, Local Deep Research, DeerFlow, Cognee, Docling, and Airweave in the services catalog.
For the default Open WebUI path, use Ollama + Open WebUI + SearXNG Local Web RAG Setup.
Tool use turns a chat box into a developer environment. Harbor documents tool connectivity around HTTP-accessible OpenAPI tools, MCP tools, SSE transports, and adapters for stdio-based tools:
- Harbor Tools explains MCP, OpenAPI, mcpo, MetaMCP, SuperGateway, and MCP Forge.
- mcpo bridges MCP tools into OpenAPI-compatible clients.
- MetaMCP provides a web UI for managing MCP tools.
- MCP Forge provides a gateway and admin UI for larger MCP setups.
This lets a self-hosted stack expose tools to Open WebUI or other compatible clients without assuming every service shares the same process, filesystem, or transport.
Local AI stacks are not only text chat. Harbor includes image and voice services that can sit beside the same LLM backend layer:
harbor up comfyui
harbor up speachesComfyUI covers local image generation and visual workflows. Speaches provides OpenAI-compatible speech-to-text and text-to-speech. Harbor also includes services such as Voicebox, openedai-speech, Parler, Lemonade, and SGLang for different audio, image, and multimodal paths.
The practical pattern is to keep the LLM stack stable, then add multimodal services by handle when the workflow needs them.
When prompts become repeatable applications, use workflow services instead of one-off chats. Harbor includes open source AI stack components such as:
- Dify for LLM app development.
- n8n and Activepieces for automation.
- Flowise and LangFlow for visual LLM and RAG flows.
- LiteLLM, Bifrost, and Harbor Boost for proxying, routing, or modifying LLM requests.
Harbor Boost is especially useful when you want a scriptable proxy in front of local models. It can expose OpenAI-compatible, Anthropic-compatible, and Responses-style APIs while routing through custom modules and downstream backends.
A developer stack should make it possible to compare model and backend choices. Harbor includes Harbor Bench, a built-in benchmark service for OpenAI-compatible APIs:
harbor bench run --name local-baseline
harbor bench resultsUse benchmarks when choosing models, quantization levels, temperature settings, or backends for a real task. Harbor also includes evaluation and observability-oriented services such as lm-evaluation-harness, LangFuse, Bifrost, Beszel, and K6, depending on whether you need quality evaluation, tracing, gateway metrics, host monitoring, or load testing.
That makes Harbor useful for both experimentation and longer-lived private AI stack operations.
A balanced self-hosted AI stack for developers might look like this:
# Chat and local inference
harbor up
# Search and RAG
harbor up searxng
# Host coding agents
harbor launch --backend ollama --model qwen3.5:4b codex
# Optional image, voice, workflow, and proxy layers.
# Bring up optional layers one at a time while sizing hardware and storage.
harbor up comfyui
harbor up speaches
harbor up dify
harbor up boostYou do not need to run everything at once. Start with the layer you are actively testing, then add the next handle after the service is healthy. Image, voice, workflow, and proxy services can add first-run downloads, persistent databases, model caches, or GPU pressure: ComfyUI pulls image/model assets and needs enough VRAM for the workflow, Speaches may use CUDA or CPU depending on host support, and Dify brings its own API, worker, web UI, Postgres, Redis, Weaviate, sandbox, and proxy containers. Harbor's value is that each service keeps a stable handle and can be combined with compatible services through the same CLI, app, configuration profile, and Compose matching rules.
- Build the base layer with Local LLM Stack with Docker Compose.
- Add web search with Ollama + Open WebUI + SearXNG Local Web RAG Setup.
- Run host coding tools with Run Coding Agents with Local LLMs.
- Choose inference engines with OpenAI-Compatible Local LLM Backends.
- Compare Harbor with hand-written Compose in Harbor vs Manual Docker Compose for Local AI.
- Return to the docs Home for the full documentation index.
- Return to Harbor Guides for the full guide index.