Out-of-the-Box · Open-Source · Universal · Vendor-Neutral
Inspired by OpenClaw, we believe the future of personal computing will be shaped by diverse, local-first AI agents running at the edge.
ZimaOS Blue is our answer — a fully open-source, auditable, vendor-neutral, and production-ready agent runtime and toolkit that lets you ship private, self-hosted agents with zero friction.
Built for bold developers who want to vibe-code or handcraft their own agents, Blue is engineered for performance: written in Go, with a memory footprint as low as 19 MB. It runs on any x86 hardware, ZimaOS, Raspberry Pi, Windows, and macOS: anywhere you plug in power.
A quick demo of conversation flow and task execution in Blue.
A quick demo of Blue's LLM providers integration experience.
A quick demo covering the overall product overview, channels, and additional configuration.
100% Go, static binary. Cross-compiles to 5 targets out of the box (linux/amd64, linux/arm64, darwin/amd64, darwin/arm64, windows/amd64). No Node runtime, no Python, no containers required. Drop it on a NAS, a ZimaOS device, a Raspberry Pi, an old x86 router, or a Mac, and it just runs. Then layer on your own UI, logic, and agent skills: one codebase, every platform.
Everyone wants tools that are simple, reliable, and scale when you need them. Tools that just work, so you can focus on what you're actually building.
This isn't a new philosophy. It's the same one that built ZimaOS: simple, reliable, and built to stay out of your way. Blue is that philosophy, extended to the agent stack.
From deep research that delivers a full HTML report, to OCR, PDF, browser automation, and document conversion, Blue handles complex, real-world workflows without sending your data to the cloud. Voice wake, STT/TTS, Talk Mode, and support for local inference make everyday interactions instant, private, and always available.
Get the native application — no dependencies, no compilation. Built-in trial configuration with onboarding in seconds — start chatting instantly via remote connection, no bot setup required. True out-of-the-box experience.
macOS: Download DMG
Windows: Download Installer
Shell (Linux/macOS):

    curl -fsSL https://ota.zimaos.com/blue | sh

PowerShell (Windows):

    irm https://ota.zimaos.com/blue/windows | iex

Build from source:

    git clone https://github.com/IceWhaleTech/ZimaOS-Blue.git
    cd ZimaOS-Blue
    git submodule update --init --recursive
    sh build.sh

On Windows, run the batch script instead of `build.sh`:

    .\build.bat

Note: Windows builds require:
- MinGW-w64 (gcc) and CMake for native C dependencies (espeak-ng, whisper.cpp, opus, kokoro, onnx)
- Windows SDK for system libraries (winmm, etc.)

Make sure `gcc` and `cmake` are in your `PATH`.
Take it further: Blue delivers native support for 20+ IM platforms, voice-driven interfaces for natural, context-aware dialogue, and zero-config model switching with IDE scanning.
⚠️ Important: If you plan to keep tuning or vibe coding on top of Blue, do not treat a few good-looking chats as release evidence. Any change that affects routing, execution behavior, tool surface, budget control, model selection, or the execution framework should be validated with Blue Harness, not with ad hoc spot checks.
Blue should follow one simple rule here: data first, gates first, cut over last. In practice, that means updating the relevant Harness dataset / eval spec before judging a change, then keeping one stable `candidate_id` across the whole attempt so selector, execution, budget, and readiness reports all describe the same candidate instead of four unrelated runs.
- Run `blue harness selector verify`
- Run `blue harness execution verify`
- Reuse the selector eval run for `blue harness budget gate`
- Finish with `blue harness cutover-readiness`
For local iteration, nightly validation, or CI evidence collection, prefer python3 scripts/cutover_candidate_pipeline.py. It runs the full selector -> execution -> budget -> readiness sequence under one shared candidate, which makes the result easier to compare, review, and cut over from.
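The gate order described above can be sketched as a small fail-fast driver. This is an illustration only, not the contents of `scripts/cutover_candidate_pipeline.py`. The four command names come from the list above, but this README does not say how Blue receives the candidate ID, so the `BLUE_CANDIDATE_ID` environment variable and the `run_cli` / `run_gates` helpers are hypothetical.

```python
import os
import subprocess

# Gate order taken from the list above: selector, execution, budget, readiness.
GATES = [
    ("selector", ["blue", "harness", "selector", "verify"]),
    ("execution", ["blue", "harness", "execution", "verify"]),
    ("budget", ["blue", "harness", "budget", "gate"]),
    ("readiness", ["blue", "harness", "cutover-readiness"]),
]


def run_cli(candidate_id, argv):
    """Run one gate as a subprocess and return its exit code.

    ASSUMPTION: BLUE_CANDIDATE_ID is a hypothetical way to pass the ID;
    check the Harness documentation for the real mechanism.
    """
    env = dict(os.environ, BLUE_CANDIDATE_ID=candidate_id)
    return subprocess.run(argv, env=env).returncode


def run_gates(candidate_id, runner=run_cli):
    """Run every gate in order under one candidate ID, stopping at the first failure.

    Returns (names_of_passed_gates, name_of_failed_gate_or_None).
    """
    passed = []
    for name, argv in GATES:
        if runner(candidate_id, argv) != 0:
            return passed, name
        passed.append(name)
    return passed, None
```

Calling `run_gates("my-candidate")` would invoke the four commands in sequence and stop at the first non-zero exit code. The `runner` parameter exists so the ordering logic can be exercised without the `blue` binary installed.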
| Area | What to Watch |
|---|---|
| Baseline stability | Keep the baseline, dataset version, and candidate_id stable, or the comparison will drift and the result will not be trustworthy. |
| Real build output | Rebuild the affected binary or frontend bundle before running Harness, otherwise you may end up validating stale behavior instead of the current change. |
| Route registration | If frontend and backend change together, confirm that all new backend routes are actually registered before judging the feature through UI behavior, because missing registration often looks like a logic bug but is really a 404. |
| Release judgment | A tuning pass is only ready when Harness shows no meaningful regression and cutover-readiness confirms that the candidate is actually ready to cut over. |
In short, tuning on top of Blue is not about "it feels better in a few chats." It is about putting the candidate into Harness, collecting comparable evidence, and letting the gate and readiness results decide whether the change is truly safe to keep.
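The route registration check from the table can be automated before any UI testing. The sketch below is a minimal example under stated assumptions: the base URL and route paths are placeholders you supply yourself, and `http_status` and `find_unregistered_routes` are hypothetical helpers, not part of Blue.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is what we want.
        return err.code


def find_unregistered_routes(base_url, paths, status_of=http_status):
    """Return the paths that answer 404, i.e. routes that are likely not registered."""
    missing = []
    for path in paths:
        url = base_url.rstrip("/") + "/" + path.lstrip("/")
        if status_of(url) == 404:
            missing.append(path)
    return missing
```

An empty result means none of the listed routes returned 404. It does not prove that the handlers behave correctly, only that a missing registration is not the cause of a failure. Connection errors are not caught here and will surface as exceptions.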
| Feature | What It Delivers |
|---|---|
| High-Availability Web Retrieval and Browser Runtime | One of Blue's sharpest differentiators. Blue unifies four web access paths for search, read, extract, and crawl; keeps three fallback layers across HTTP, proxy extraction, and browser sessions; handles anti-bot pages with challenge detection, cookie/session reuse, stealth, and browser handoff; and routes across three browser engines: lightpanda, managed Chromium, and relay/local Chrome. |
| Three-in-One Research Runtime | One public research entry can route into deep_research, analyze, and ui_review. The same discovery and evidence stack then produces citation-first research, bounded reports, and structured UI/UX/accessibility reviews. |
| Harness Runtime, Evaluation, and Evolution Framework | Makes evaluation a runtime primitive across development, training, and production. Harness covers regression and smoke checks, scoring, baselines, reports, and runtime validation, then carries the same evidence into skill evolution, follow-up evaluation, promotion or rollback, and AGENTS.md or instruction proposal review. |
| Multimodal Native-Capability-First Runtime | Keeps voice, OCR, PDF, browser tasks, document conversion, structured form filling, media processing, and local media generation on native and local paths first, with model routing only when it is actually needed. |
| Security and Governance | Includes sandbox execution, prompt-injection defense, session auditing, permissions, RBAC, WebAuthn, operational guardrails, and skill security scanning. |
| LLM Wiki and Knowledge Space | Turns memory, research, and runtime outputs into a wiki-like knowledge surface with summary pages, indexes, backlinks, freshness, and archive workflows. |
| Skill Store and Marketplace | Ships built-in skill discovery, curation, sync, and local scanning so extensibility is available from day one. |
| Production-Grade Provider Pool | Provides a real provider pool with health checks, automatic failover, circuit breakers, and provider racing for long-running workloads. |
| Built-In Local Small Model Runtime | Ships a built-in Qwen3.5-0.8B + llama.cpp runtime for local short Q&A, image recognition, tool routing, summarization, context compression, and document preprocessing. |
| Long-Running Reliability | Treats OTA updates, backup and restore, config hot reload, and post-failure recovery as built-in operating concerns. |
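To make the provider pool row more concrete, here is a minimal sketch of automatic failover combined with a per-provider circuit breaker. It is not Blue's implementation (Blue is written in Go, and its internals are not described here). The `Breaker` and `ProviderPool` classes, the thresholds, and the provider callables are all hypothetical, and health checks and provider racing are left out.

```python
import time


class Breaker:
    """Per-provider circuit breaker: opens after max_failures consecutive errors."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        """A closed breaker always allows calls; an open one allows them again after the cooldown."""
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.cooldown

    def record(self, ok):
        """Reset on success; count failures and open the breaker at the threshold."""
        if ok:
            self.failures = 0
            self.opened_at = None
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


class ProviderPool:
    """Try providers in priority order, skipping any whose breaker is open."""

    def __init__(self, providers, **breaker_opts):
        # providers: list of (name, callable(prompt) -> str), highest priority first.
        self.providers = providers
        self.breakers = {name: Breaker(**breaker_opts) for name, _ in providers}

    def complete(self, prompt):
        errors = []
        for name, call in self.providers:
            breaker = self.breakers[name]
            if not breaker.allow():
                continue
            try:
                result = call(prompt)
            except Exception as exc:
                breaker.record(False)
                errors.append((name, exc))
                continue
            breaker.record(True)
            return name, result
        raise RuntimeError(f"no provider available: {errors}")
```

The breaker keeps a failing provider out of rotation for the cooldown period, so later requests go straight to a healthy provider instead of paying for a failed attempt each time.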
| Date | Version | Keywords / Features |
|---|---|---|
| Jan 26, 2026 | v0.1–v0.9 | Go runtime, plugin system, browser automation |
| Jan 27–28, 2026 | v0.9.0–v0.9.2 | Browser task view, Blue Companion, Smart Form Filler |
| Jan 29–31, 2026 | v0.10.0–v0.10.9 | Claude Code CLI, API Proxy, UI restructure |
| Feb 1–3, 2026 | v0.10.1–v0.10.22 | Metrics, remote access, context cache |
| Feb 5–18, 2026 | v0.10.25–v0.10.29 | i18n, CC Cache, release pipeline |
| Feb 20–25, 2026 | v0.10.28–v0.10.29 | Desktop loader, mobile UX, memory redesign |
| Feb 28–Mar 2, 2026 | v0.10.30 | Deep Research, skill reranker, security scan |
| Mar 9–18, 2026 | v0.10.31 | Dashboard overhaul, VoiceChat refactor, approved sites |
| Mar 19–22, 2026 | v0.10.32 | Harness rollout, transcript audit, web search |
| Mar 23–25, 2026 | v0.10.33 | Harness groups, browser approvals, skill market |
| Mar 29–30, 2026 | v0.10.35 | Harness v3, browser relay, context compression |
| Mar 31–Apr 1, 2026 | v0.10.36 | Transcript audit, Harness overlays, tool parsing |
| Apr 1, 2026 | v0.10.37 | Runtime hardening, Skill+Exec cutover, recovery polish |
| Apr 2–5, 2026 | v0.10.38 | GitHub support, marketplace refinement, reliability improvements |
| Apr 6–7, 2026 | v0.10.39 | Research unification, evolution surfaces, memory footprint reduction |
- Issues: Please file bugs and feature requests here
- Discussions: Discord
- Follow us on GitHub
This project is licensed under the MIT License - see the LICENSE file for details. We believe in open source and giving back to the community.
Thanks to all Blue contributors:
- OpenClaw — Local-first open source agent. Pioneered connecting LLMs to local devices through channel adapters and tool calling, directly inspiring Blue's agent runtime architecture. https://github.com/openclaw/openclaw
- MiroMind — Deep research mode with evidence-backed synthesis. Shaped Blue's built-in deep research pipeline: planning, parallel retrieval, evidence deduplication, and HTML report generation. https://www.miromind.ai
- Karpathy's LLM Wiki — LLM as knowledge compiler. Reframes LLMs to build persistent, evolving knowledge spaces, moving beyond RAG's accumulation trap.
- OpenSpace (HKUDS) — Self-evolving skill engine. A DAG-based framework where agents learn from failures and derive specialized skills. https://github.com/HKUDS/OpenSpace
- Andrew Ng's Context Hub — Versioned API documentation registry for coding agents. Addresses agent hallucinations and forgotten session knowledge. Provides curated, versioned docs with annotation and feedback loops, turning documentation into a self-improving knowledge layer. https://github.com/andrewyng/context-hub
- Notion — Simple, human, and intentionally quiet. Inspired by the minimalist ethos of Notion, Blue brings warmth back to the grid. Where refined serif meets thoughtful design, crafting a space that feels like home. https://www.notion.com/about
- Matrix — Visual inspiration from the iconic digital rain aesthetic. The aesthetic direction for Blue's technical diagrams.
- IceWhale — Love, Death & Robots S2E2 "Ice". A collective that gathers worldwide to break through internet giants' walls and resist data concentration. The ice whale symbolizes a community building sovereign tools together at the edge.
- ZimaOS Blue — Love, Death & Robots S1E14 "Zima Blue". A metaphor: intelligence that begins in service and evolves to explore the world. Blue is an agent for wisdom, rooted in simplicity and reaching for depth.
- ZimaOS — Simplified, Focused, Open design principles. Both ZimaOS and Blue share the belief that technology should serve the user — deploy in 30 seconds, run anywhere, stay vendor-neutral. https://www.zimaspace.com/zimaos