ankurCES edited this page Jun 8, 2026 · 2 revisions

FAQ

blumi is a local-first, provider-agnostic AI coding agent: one Rust binary that runs on your own machines (macOS and Linux, with an Android app called blugo), talks to your choice of model with your own API key (BYOK, bring your own key), and keeps history and config on disk in ~/.blumi. This page answers the most common questions about it (cost, data residency, vendor lock-in, extensibility, connectivity, and security) based on what the code actually does, not on aspirations.

Questions are grouped into four sections: for decision-makers ("why would we use this?"), for developers ("how does it actually work?"), connectivity and remote access, and security and privacy. Each answer leads with a direct response; links point to the page with the full details.


For decision-makers

What is blumi (and what is blugo)?

blumi is a local-first, provider-agnostic AI coding agent distributed as a single Rust binary for macOS and Linux. It runs the agent loop, an always-on gateway, a terminal UI, and a web UI on hardware you own. blugo is its companion Android app, which attaches to the same live session over your network. The project is free and Apache-2.0 licensed, and there is no blumi-operated cloud in the loop.

What problem does blumi solve that a hosted coding assistant doesn't?

blumi is local-first and provider-agnostic: one Rust binary runs on your machines, talks to your choice of model with your API key (BYOK), and keeps history/config on disk in ~/.blumi (SQLite). There's no blumi-operated cloud in the loop. In practice, that buys you three things the codebase is built around:

  1. No vendor lock-in — the same session can run on Anthropic, Google Gemini, Anthropic-on-Azure, or any OpenAI-compatible endpoint (incl. local Ollama), switchable live. See Configuration → Providers.
  2. Use idle hardware you already own — the Grid turns spare Macs/PCs on your LAN into a pool across which work can be dispatched or offloaded.
  3. One agent, every surface — a terminal UI, a web UI, an always-on gateway, and the blugo phone app all mirror the same live session.

What does it cost to run?

The software is free and Apache-2.0 (see below). Your only marginal cost is whatever model provider you point it at — and because it's BYOK, that's billed directly by the provider, not by blumi. Run a local model via Ollama and the marginal cost is $0. There is per-task cost telemetry and a cost-aware router that can send cheap tasks to cheap models (Configuration → router).

Where does our code and data go?

Your code and data stay on your machines except for the model prompts you send to your chosen provider — and even those go nowhere off-machine if you run a local model. Concretely:

  • Prompts/code go only to the provider you configure — or nowhere off-machine if you use a local model (Ollama / llama.cpp via the OpenAI-compatible client).
  • History, sessions, memory, embeddings live locally in ~/.blumi (SQLite + a local vector store). Embeddings are computed in-process with a bundled model — no embedding API call.
  • There is no telemetry or phone-home, because there is no blumi backend to send it to.
  • The one nuance is mobile push: see the mobile-push question under Security & privacy (short answer: only a ~140-char preview, and only if you opt in).

Are we locked into one AI vendor?

No. Providers are pluggable: a native Anthropic client, Anthropic on Azure AI Foundry, native Google Gemini, and a single OpenAI-compatible client that covers OpenAI, OpenRouter, DeepSeek, Ollama, llama.cpp, MiniMax, and any custom base_url. You can configure several at once and switch model/provider from the header picker mid-session.

Can we use it commercially? What's the license?

Yes. It's Apache-2.0 (LICENSE at the repo root), which permits commercial use, modification, and redistribution, provided you keep the license and attribution notices; it also includes an express patent grant from contributors.

Do we need new hardware?

No. It runs on macOS and Linux (the phone app is Android). GPU is optional: the bundled embedder auto-uses CoreML on Apple Silicon and can opt into CUDA on NVIDIA Linux (BLUMI_CUDA=1); for GPU-accelerated LLM inference you run Ollama and point blumi at it. Check what's active with blumi accel doctor.

How much ops does it add?

Minimal. One self-contained binary, installed with a curl … | sh one-liner. The always-on gateway installs as a managed OS service (blumi serve install → launchd/systemd) with start/stop/status, and binary updates are atomic (no "text file busy" even while running). See Installation.

Can a team / multiple machines share it?

Yes, with a caveat. The gateway lets multiple clients (TUI, web, several phones) attach to the same sessions, and the Grid federates multiple nodes on a LAN. Auth is a shared bearer token per gateway (blumi serve pair) — i.e. a single trust domain, ideal for one person or a small trusted team, not a multi-tenant SaaS with per-user accounts/RBAC.

How mature is it?

It's a young, fast-moving 0.x project (currently v0.5.0) with CI (clippy/tests/Flutter build) green on main. Treat it as a capable self-hosted tool under active development, not a locked-down enterprise product. Everything in this wiki reflects shipped behavior.


For developers

Which providers and models can I use?

Four client kinds (ProviderKind in crates/blumi-config/src/provider.rs):

  • anthropic — native /v1/messages
  • anthropic_foundry — Claude on Azure AI Foundry
  • gemini — native Google Gemini
  • openai_compat — anything speaking OpenAI /chat/completions (OpenAI, OpenRouter, DeepSeek, Ollama, llama.cpp, MiniMax, NIM, HF, custom base_url)

Keys come from env vars via api_key_env (preferred) so they're never written into config. See Configuration → Providers.
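The api_key_env pattern can be sketched as follows. This is a minimal Rust illustration, not blumi's code: the ProviderEntry struct and resolve_api_key function are hypothetical, and treating an absent api_key_env as "no key needed" (as for a local Ollama endpoint) is an assumption.

```rust
use std::env;

/// Hypothetical provider entry. The field name mirrors the wiki's
/// `api_key_env` setting, but this is not blumi's actual config struct.
pub struct ProviderEntry {
    pub name: String,
    /// Name of the environment variable that holds the key (never the key itself).
    pub api_key_env: Option<String>,
}

/// Read the key from the named env var at runtime, so the secret
/// never has to be written into the config file.
pub fn resolve_api_key(p: &ProviderEntry) -> Result<Option<String>, String> {
    match &p.api_key_env {
        // Assumption: no api_key_env means a keyless endpoint (e.g. local Ollama).
        None => Ok(None),
        Some(var) => env::var(var)
            .map(Some)
            .map_err(|_| format!("provider '{}': env var {} is not set", p.name, var)),
    }
}
```

Failing loudly when the variable is missing, rather than silently sending an empty key, makes a misconfigured shell obvious at startup.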

Can I run it fully offline / no cloud?

Yes. Point llm.provider at a local Ollama/llama.cpp endpoint (openai_compat + base_url) and the model runs on your box. Embeddings are already local (bundled, in-process). So the whole loop — agent + memory/RAG (retrieval-augmented generation) — can run with no external API calls.

Where do tools actually run? Is it sandboxed?

You choose, per the executor config (crates/blumi-exec):

  • Local (default) — runs on the host.
  • Docker (feature-gated, via bollard) — runs tools inside a container.
  • SSH — runs them on a remote host.

On top of that, a permission engine gates tools with allow/deny/ask globs (e.g. deny rm -rf*, sudo*; ask on git push*), and PreToolUse hooks can block a call before it runs. See Configuration → permissions.
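A minimal sketch of how allow/deny/ask globs could be evaluated against a command string. The Rules type, the star-only glob syntax, the precedence (deny beats ask beats allow), and the default of Ask for unmatched commands are all assumptions for illustration, not blumi's documented semantics.

```rust
/// Minimal glob matcher: `*` matches any run of characters; everything else is literal.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0usize, 0usize);
    let (mut star, mut mark) = (None::<usize>, 0usize);
    while ti < t.len() {
        if pi < p.len() && p[pi] != '*' && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if pi < p.len() && p[pi] == '*' {
            // Remember the star and try matching it against zero characters first.
            star = Some(pi);
            mark = ti;
            pi += 1;
        } else if let Some(s) = star {
            // Backtrack: let the last star absorb one more character.
            pi = s + 1;
            mark += 1;
            ti = mark;
        } else {
            return false;
        }
    }
    // Trailing stars can match the empty remainder.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision { Allow, Ask, Deny }

/// Hypothetical rule set; not blumi's permissions config type.
pub struct Rules { pub allow: Vec<String>, pub deny: Vec<String>, pub ask: Vec<String> }

fn any_match(globs: &[String], command: &str) -> bool {
    globs.iter().any(|g| glob_match(g, command))
}

impl Rules {
    /// Assumed precedence: deny > ask > allow; unmatched commands fall back to Ask.
    pub fn decide(&self, command: &str) -> Decision {
        if any_match(&self.deny, command) {
            Decision::Deny
        } else if any_match(&self.ask, command) {
            Decision::Ask
        } else if any_match(&self.allow, command) {
            Decision::Allow
        } else {
            Decision::Ask
        }
    }
}
```

Checking deny first means a broad allow glob can never override an explicit block such as rm -rf*.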

Is it safe to let it run commands autonomously?

That's exactly what the guardrails are for: the permission engine + interactive approval cards (Allow once / Allow session / Deny) in every UI, a YOLO toggle when you want to drop them, a per-turn iteration cap with auto-continue, an optional local-LLM "brain" that auto-reviews approvals, and Docker/SSH executors to contain blast radius. Nothing destructive runs unattended unless you configure it to.

How do I extend it?

  • Skills — drop a SKILL.md (frontmatter + instructions); they're injected into the system prompt. The agent can even author its own (manage_skill).
  • MCP servers — Model Context Protocol tools; sensible defaults ship and work out of the box (blumi mcp), plus your own in mcp_servers.
  • Hooks: PreToolUse / UserPromptSubmit lifecycle hooks (Configuration → hooks).
  • Sub-agents / personas — delegate to specialized agents; a "team" persona auto-orchestrates.
  • LSP — language servers give code-intel tools (lsp_servers).

What's the "grid"? Do I need it?

Optional. A single machine is fully functional. The Grid is LAN federation: nodes discover each other via mDNS (_blumi._tcp), and you can dispatch sub-tasks across peers (grid_dispatch), aggregate metrics, and offload GPU work (e.g. embeddings) to a node that has the hardware. Local sub-agents are capped (default 4) with overflow spilling to the grid.
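The local cap with grid overflow can be pictured as a simple counter. This hypothetical SubAgentScheduler keeps at most cap sub-agents running locally (4 matches the documented default) and places the rest on the grid; how blumi actually chooses a peer, or what it does when no peer is available, is not modeled here.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Placement { Local, Grid }

/// Hypothetical scheduler: not blumi's API, just the cap-and-spill idea.
pub struct SubAgentScheduler { cap: usize, running_local: usize }

impl SubAgentScheduler {
    pub fn new(cap: usize) -> Self {
        Self { cap, running_local: 0 }
    }

    /// Run locally while under the cap; otherwise spill to the grid.
    pub fn place(&mut self) -> Placement {
        if self.running_local < self.cap {
            self.running_local += 1;
            Placement::Local
        } else {
            Placement::Grid
        }
    }

    /// Call when a local sub-agent finishes, freeing a slot.
    pub fn finish_local(&mut self) {
        self.running_local = self.running_local.saturating_sub(1);
    }
}
```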

How is the same session live on the TUI, web, and phone?

The UI-agnostic core emits one typed event stream. The gateway (blumi serve) exposes it over REST + SSE (with Last-Event-ID replay). Every face — TUI, web UI, blugo — subscribes to the same stream, so they're 1:1 mirrors, not separate apps.
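Last-Event-ID replay is a standard SSE mechanism: each event carries an id, and a reconnecting client sends the last id it saw so the server can resend what it missed. A minimal sketch of a bounded replay buffer follows; the ReplayBuffer type, its capacity, and numeric ids starting at 1 are assumptions, not blumi's gateway code.

```rust
use std::collections::VecDeque;

/// One event in the session stream; ids increase monotonically.
#[derive(Debug, Clone, PartialEq)]
pub struct Event { pub id: u64, pub data: String }

/// Hypothetical bounded buffer of recent events kept for replay.
pub struct ReplayBuffer { cap: usize, next_id: u64, events: VecDeque<Event> }

impl ReplayBuffer {
    pub fn new(cap: usize) -> Self {
        Self { cap, next_id: 1, events: VecDeque::new() }
    }

    /// Append an event, dropping the oldest ones once over capacity.
    pub fn push(&mut self, data: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.events.push_back(Event { id, data: data.to_string() });
        while self.events.len() > self.cap {
            self.events.pop_front();
        }
        id
    }

    /// Events a reconnecting client missed: everything after `last_event_id`,
    /// or the whole buffer if the client sent no Last-Event-ID header.
    pub fn replay_after(&self, last_event_id: Option<u64>) -> Vec<Event> {
        let after = last_event_id.unwrap_or(0);
        self.events.iter().filter(|e| e.id > after).cloned().collect()
    }
}

/// SSE wire format: `id:` and `data:` lines ended by a blank line.
/// Assumes `data` is a single line; multi-line payloads need one `data:` line each.
pub fn to_sse(e: &Event) -> String {
    format!("id: {}\ndata: {}\n\n", e.id, e.data)
}
```

The trade-off in a bounded buffer like this is that a client offline long enough to fall behind the oldest retained event cannot be fully caught up by replay alone and would have to refetch state some other way.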

How does it remember things across sessions?

Layered: SQLite + FTS5 transcript history (searchable via session_search), a dual memory file pair (MEMORY.md / USER.md), semantic long-term memory (a local vector store with RAG + governance), and a code-understanding knowledge index. See Memory & Knowledge.

Won't an autonomous agent burn tokens?

There are explicit controls: per-task cost telemetry, a cost-aware router that downshifts cheap work to cheap models, the per-turn iteration cap, and an auto-continue budget. And again — on Ollama the token cost is zero.
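One way a cost-aware router could downshift work: give each model a price and a capability tier, then pick the cheapest model that meets the tier a task needs. The Model struct, tier numbers, prices, and model names are invented for illustration; this page does not describe how blumi's router actually classifies tasks.

```rust
/// Hypothetical model catalogue entry.
#[derive(Debug, Clone)]
pub struct Model {
    pub name: &'static str,
    /// Price in USD per million tokens (illustrative).
    pub usd_per_mtok: f64,
    /// Higher tier = more capable (illustrative scale).
    pub tier: u8,
}

/// Cheapest model whose tier is at least `required_tier`, or None if nothing qualifies.
pub fn route<'a>(models: &'a [Model], required_tier: u8) -> Option<&'a Model> {
    models
        .iter()
        .filter(|m| m.tier >= required_tier)
        .min_by(|a, b| a.usd_per_mtok.total_cmp(&b.usd_per_mtok))
}
```

With a local Ollama model in the catalogue at a price of zero, any task it is capable enough for would be routed there first.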

Does blumi use my GPU?

For the bundled embedder: yes, via ONNX execution providers — CoreML on Apple Silicon (automatic), CUDA on NVIDIA Linux (BLUMI_CUDA=1, opt-in). For LLM inference, the big win is to run Ollama (which uses your GPU) and point blumi at it — no rebuild. Run blumi accel doctor to see what's detected and active.
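The documented accelerator policy (CoreML automatically on Apple Silicon, CUDA only when BLUMI_CUDA=1 on Linux, CPU otherwise) can be written as a small decision function. This is a hypothetical sketch; a real implementation would also need to confirm the chosen provider actually initializes and fall back to CPU if not, and blumi accel doctor is the place to see what is really active.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ExecProvider { CoreMl, Cuda, Cpu }

/// Hypothetical selection of the embedder's ONNX execution provider.
/// `blumi_cuda` is the value of the BLUMI_CUDA env var, if set.
pub fn pick_provider(os: &str, arch: &str, blumi_cuda: Option<&str>) -> ExecProvider {
    if os == "macos" && arch == "aarch64" {
        ExecProvider::CoreMl // automatic on Apple Silicon
    } else if os == "linux" && blumi_cuda == Some("1") {
        ExecProvider::Cuda // opt-in only
    } else {
        ExecProvider::Cpu
    }
}
```

At runtime the inputs could come from std::env::consts::OS, std::env::consts::ARCH, and std::env::var("BLUMI_CUDA").ok().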

What is "self-healing", realistically?

A learn-from-failures loop (crates/.../heal): a reflex controller reacts to failing steps, failures and their fixes are stored as episodes in memory, and an evolution miner surfaces recurring patterns; /heal shows a summary. It's pragmatic pattern-learning, not magic — see Self-Management.

Can the agent modify itself?

Yes, within guardrails. blumi exposes agent tools to edit its own validated config (self_config), author its own skills (manage_skill), and reload in place without losing the conversation (reload_self). Config writes are validated and atomic, skill names are slug-jailed, and mutating actions still surface an approval card outside YOLO mode. See Self-Management.
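Slug-jailing means a skill name can only ever become a single safe path component, so a name like ../../.ssh cannot escape the skills directory. A minimal sketch, assuming a lowercase-letters/digits/hyphens rule, a 64-character limit, and a skills_dir/slug/SKILL.md layout; all three are assumptions, not blumi's exact rules.

```rust
use std::path::{Path, PathBuf};

/// Accept only lowercase ASCII letters, digits, and single interior hyphens
/// (assumed rule). Slashes, dots, and `..` are therefore impossible.
pub fn is_valid_slug(name: &str) -> bool {
    !name.is_empty()
        && name.len() <= 64
        && !name.starts_with('-')
        && !name.ends_with('-')
        && !name.contains("--")
        && name
            .bytes()
            .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'-')
}

/// Map a skill name to `<skills_dir>/<slug>/SKILL.md` (assumed layout),
/// refusing anything that is not a valid slug.
pub fn skill_path(skills_dir: &Path, name: &str) -> Option<PathBuf> {
    if is_valid_slug(name) {
        Some(skills_dir.join(name).join("SKILL.md"))
    } else {
        None
    }
}
```

Validating the name with an allow-list, rather than trying to strip dangerous sequences, keeps the check simple enough to reason about.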

Can I make the agent double-check risky actions before it runs them?

Yes. Enable RPL (Raskolnikov's Psychological Loop), an opt-in review step that scores a plan's blast radius and submits it to an adversarial "Porfiry" judge before execution. It's off by default (rpl.enabled) and meant for high-stakes or unattended runs. See RPL-Judgement.


Connectivity & remote access

Can I use blumi / blugo from outside my LAN?

The gateway speaks HTTP/SSE on whatever host/IP you bind it to (typically a LAN address). To reach it from elsewhere you need network reachability — a VPN (Tailscale/WireGuard), an SSH tunnel, or a reverse proxy you control. blumi deliberately ships no built-in tunnel; it connects by host:port, so any overlay that gives your client a route to the gateway works transparently.
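Because the client only needs an address, checking reachability is just name resolution plus a TCP connect, regardless of what provides the route. A minimal sketch using only the Rust standard library; gateway_reachable is a hypothetical helper, not part of blumi or blugo.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// True if any address that `host:port` resolves to accepts a TCP connection
/// within `timeout`. Resolution is delegated to the OS resolver.
pub fn gateway_reachable(host_port: &str, timeout: Duration) -> bool {
    let mut addrs = match host_port.to_socket_addrs() {
        Ok(a) => a,
        Err(_) => return false, // unresolvable or malformed (e.g. VPN down, no port)
    };
    addrs.any(|addr| TcpStream::connect_timeout(&addr, timeout).is_ok())
}
```

The same call works whether the host part is a LAN IP, a Tailscale IP, or a MagicDNS name, since resolving and routing are left to the operating system.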

Do I need a Tailscale integration in the app?

No. Because blugo connects by address, if both devices are on Tailscale you just add the gateway by its Tailscale IP or MagicDNS name and it behaves exactly like a LAN connection — no Tailscale API, OAuth, or app integration required. blumi is intentionally "zero Tailscale": it doesn't know or care what gives it the route. (This is the most-asked question — hence this entry.)

Do I need Firebase / FCM? What works without it?

Push is optional and additive. Without any Firebase config, Dispatch and chat still work fully in-app on a reachable network — you just don't get backgrounded push notifications. To enable push, drop a Firebase service account at ~/.blumi/fcm-service-account.json on each gateway (it's a silent no-op otherwise). See Configuration → Push notifications (FCM).

So can I "get messages from anywhere"?

Partly: notifications reach you anywhere, but reading or sending the actual message content needs a network route to your gateway. There are two planes — don't conflate them:

  • Notifications travel via Google's FCM (Firebase Cloud Messaging), so a push arrives anywhere the phone has internet.
  • The message content (sending a dispatch, reading the reply transcript) goes over the gateway's REST/SSE, which needs reachability to the gateway. On the LAN that's automatic; off-LAN you need a VPN/Tailscale. So: notified anywhere; full read/send needs a route to the gateway.

Is the gateway exposed to the internet?

Only if you expose it. It binds to the host/IP you pass (e.g. a LAN IP), and it's token-authed by default. Don't raw-port-forward it; put it behind a VPN/Tailscale or an authenticating reverse proxy.


Security & privacy

How is the gateway protected?

Bearer-token auth established by blumi serve pair; require_auth is on by default, so every client (TUI, web, phone, the push-registration endpoints) must present the token. See Gateway.
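A bearer check of this kind can be sketched as below. The authorized function and its require_auth flag mirror this page's wording but are hypothetical, and the constant-time comparison is a common hardening choice, not something this page says blumi does.

```rust
/// Compare without early exit on the first differing byte, so response timing
/// does not reveal how much of a guessed token was correct.
/// (The length check still returns early; token length is not treated as secret here.)
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Check an `Authorization` header value against the gateway token.
/// Simplification: matches the literal "Bearer " prefix only.
pub fn authorized(header: Option<&str>, token: &str, require_auth: bool) -> bool {
    if !require_auth {
        return true;
    }
    match header.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(presented) => ct_eq(presented.as_bytes(), token.as_bytes()),
        None => false,
    }
}
```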

Where are my secrets stored? Are any committed?

  • Model API keys are read from environment variables (api_key_env), not stored in config or git.
  • The Firebase service account (a private key) stays at ~/.blumi/fcm-service-account.json (gitignored, chmod 600) and the app's google-services.json is gitignored too. Neither is ever committed.
  • Sessions/memory are local SQLite under ~/.blumi.

Mobile push — does our code go through Google?

No transcript does. The turn-complete push carries only a short preview (a title + a body truncated to ~140 chars) through FCM/Google — the full conversation stays on the gateway and is fetched by the app directly over your network when you open the thread. If even the preview is too much, simply don't install the service account: push stays off and Dispatch works in-app.
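A character-safe way to produce such a preview, sketched under assumptions: the preview helper, treating 140 as an exact limit, and the trailing ellipsis are illustrative, not blumi's code. Counting chars (Unicode scalar values) rather than bytes matters in Rust, because slicing a &str in the middle of a multi-byte character panics.

```rust
/// Truncate `body` to at most `max_chars` characters, ending with an
/// ellipsis when something was cut off.
pub fn preview(body: &str, max_chars: usize) -> String {
    let mut chars = body.chars();
    let head: String = chars.by_ref().take(max_chars).collect();
    if chars.next().is_some() {
        // Leave room for the ellipsis so the result stays within max_chars.
        let mut cut: String = head.chars().take(max_chars.saturating_sub(1)).collect();
        cut.push('…');
        cut
    } else {
        head
    }
}
```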


Didn't find your question? Check Troubleshooting, the Configuration reference, or open an issue.
