Skip to content

8.6 Self Hosted AI Stack for Developers

av edited this page May 23, 2026 · 3 revisions

Self-Hosted AI Stack for Developers

Harbor is a local AI toolkit for developers who want a self-hosted AI stack that can grow beyond a single chat container. It brings together local inference, browser chat, coding agents, RAG/search, tools, image generation, voice, workflows, benchmarks, and observability through Docker Compose and Harbor service handles.

Use this guide as the high-level map for a private AI stack or open source AI stack built with Harbor. It links to the deeper setup guides where a workflow needs more detail.

Start with the Core Stack

Most Harbor setups start with the default local LLM stack:

harbor up
harbor open

That starts Ollama and Open WebUI, giving you a local model backend and a browser chat frontend. The important design point is that Harbor treats these as the base layer of a larger developer stack, not the final destination.

For the Compose-level model, read Local LLM Stack with Docker Compose. If you are deciding whether to maintain your own Compose files, read Harbor vs Manual Docker Compose for Local AI.

Chat and Local Inference

A self-hosted AI stack needs a reliable path from user interface to model runtime. Harbor gives you several choices:

  • Open WebUI for the default chat UI, model switching, RAG features, and compatible tool use.
  • Ollama for the default model management path.
  • llama.cpp when GGUF serving and direct llama.cpp behavior matter.
  • vLLM when throughput-oriented OpenAI-compatible serving is the goal.
  • Other frontend and backend services from the Services catalog when your workflow needs a different interface or inference engine.

The common integration point is often an OpenAI-compatible local API. Harbor starts the services, applies matching cross-service Compose files, and lets consumers use the backend endpoints that fit their protocol. For backend selection, see OpenAI-Compatible Local LLM Backends.

Coding Agents on Local Models

Developers usually want the same local models available in their editor, terminal, and repository workflows. Harbor Launch connects installed host tools to Harbor backends:

harbor launch --backend ollama --model qwen3.5:4b codex
harbor launch --backend llamacpp --model Qwen3.5-4B opencode

This keeps the coding agent in your current project directory while Harbor supplies the backend URL, model selection, and adapter configuration. It is the path for using local LLMs with Codex, Claude Code, OpenCode, Copilot, mi, and other supported host tools.

For the dedicated walkthrough, read Run Coding Agents with Local LLMs.

RAG, Search, and Research

A private AI stack often needs retrieval and search before it needs more model size. Harbor includes services for local web RAG, document search, and deeper research workflows:

harbor up searxng
harbor up searxng ldr

SearXNG is the default local web search building block and is pre-wired into Open WebUI and several other services. Harbor also includes RAG and research-oriented services such as AnythingLLM, Onyx, Khoj, Kotaemon, Local Deep Research, DeerFlow, Cognee, Docling, and Airweave in the services catalog.

For the default Open WebUI path, use Ollama + Open WebUI + SearXNG Local Web RAG Setup.

Agents and Tools

Tool use turns a chat box into a developer environment. Harbor documents tool connectivity around HTTP-accessible OpenAPI tools, MCP tools, SSE transports, and adapters for stdio-based tools:

  • Harbor Tools explains MCP, OpenAPI, mcpo, MetaMCP, SuperGateway, and MCP Forge.
  • mcpo bridges MCP tools into OpenAPI-compatible clients.
  • MetaMCP provides a web UI for managing MCP tools.
  • MCP Forge provides a gateway and admin UI for larger MCP setups.

This lets a self-hosted stack expose tools to Open WebUI or other compatible clients without assuming every service shares the same process, filesystem, or transport.

Image, Voice, and Multimodal Workflows

Local AI stacks are not only text chat. Harbor includes image and voice services that can sit beside the same LLM backend layer:

harbor up comfyui
harbor up speaches

ComfyUI covers local image generation and visual workflows. Speaches provides OpenAI-compatible speech-to-text and text-to-speech. Harbor also includes services such as Voicebox, openedai-speech, Parler, Lemonade, and SGLang for different audio, image, and multimodal paths.

The practical pattern is to keep the LLM stack stable, then add multimodal services by handle when the workflow needs them.

Workflow Builders and Automation

When prompts become repeatable applications, use workflow services instead of one-off chats. Harbor includes open source AI stack components such as:

Harbor Boost is especially useful when you want a scriptable proxy in front of local models. It can expose OpenAI-compatible, Anthropic-compatible, and Responses-style APIs while routing through custom modules and downstream backends.

Benchmarks, Evaluation, and Observability

A developer stack should make it possible to compare model and backend choices. Harbor includes Harbor Bench, a built-in benchmark service for OpenAI-compatible APIs:

harbor bench run --name local-baseline
harbor bench results

Use benchmarks when choosing models, quantization levels, temperature settings, or backends for a real task. Harbor also includes evaluation and observability-oriented services such as lm-evaluation-harness, LangFuse, Bifrost, Beszel, and K6, depending on whether you need quality evaluation, tracing, gateway metrics, host monitoring, or load testing.

That makes Harbor useful for both experimentation and longer-lived private AI stack operations.

A Practical Developer Stack Shape

A balanced self-hosted AI stack for developers might look like this:

# Chat and local inference
harbor up

# Search and RAG
harbor up searxng

# Host coding agents
harbor launch --backend ollama --model qwen3.5:4b codex

# Optional image, voice, workflow, and proxy layers.
# Bring up optional layers one at a time while sizing hardware and storage.
harbor up comfyui
harbor up speaches
harbor up dify
harbor up boost

You do not need to run everything at once. Start with the layer you are actively testing, then add the next handle after the service is healthy. Image, voice, workflow, and proxy services can add first-run downloads, persistent databases, model caches, or GPU pressure: ComfyUI pulls image/model assets and needs enough VRAM for the workflow, Speaches may use CUDA or CPU depending on host support, and Dify brings its own API, worker, web UI, Postgres, Redis, Weaviate, sandbox, and proxy containers. Harbor's value is that each service keeps a stable handle and can be combined with compatible services through the same CLI, app, configuration profile, and Compose matching rules.

Next Steps

Clone this wiki locally