ankurCES edited this page Jun 7, 2026 · 2 revisions

FAQ

Common questions about blumi, answered against what the code actually does — not aspirations. Two lenses: a decision-maker ("why would we use this?") and a developer ("how does it actually work?"), plus the connectivity and security questions that come up most. Links point to the page with the details.


For decision-makers

What problem does blumi solve that a hosted coding assistant doesn't?

blumi is local-first and provider-agnostic: one Rust binary runs on your machines, talks to your choice of model with your API key (BYOK), and keeps history/config on disk in ~/.blumi (SQLite). There's no blumi-operated cloud in the loop. Practically that buys you three things the codebase is built around:

  1. No vendor lock-in — the same session can run on Anthropic, Google Gemini, Anthropic-on-Azure, or any OpenAI-compatible endpoint (incl. local Ollama), switchable live. See Configuration → Providers.
  2. Use idle hardware you already own — the Grid turns spare Macs/PCs on your LAN into a pool that work can be dispatched/offloaded across.
  3. One agent, every surface — a terminal UI, a web UI, an always-on gateway, and the blugo phone app all mirror the same live session.

What does it cost to run?

The software is free and Apache-2.0 (see below). Your only marginal cost is whatever model provider you point it at — and because it's BYOK, that's billed directly by the provider, not by blumi. Run a local model via Ollama and the marginal cost is $0. There is per-task cost telemetry and a cost-aware router that can send cheap tasks to cheap models (Configuration → router).

Where does our code and data go?

  • Prompts/code go only to the provider you configure — or nowhere off-machine if you use a local model (Ollama / llama.cpp via the OpenAI-compatible client).
  • History, sessions, memory, embeddings live locally in ~/.blumi (SQLite + a local vector store). Embeddings are computed in-process with a bundled model — no embedding API call.
  • There is no telemetry/phone-home to a blumi backend; there isn't one.
  • The one nuance is mobile push: see the mobile push entry under Security & privacy (short answer: only a ~140-char preview, and only if you opt in).

Are we locked into one AI vendor?

No. Providers are pluggable: a native Anthropic client, Anthropic on Azure AI Foundry, native Google Gemini, and a single OpenAI-compatible client that covers OpenAI, OpenRouter, DeepSeek, Ollama, llama.cpp, MiniMax, and any custom base_url. You can configure several at once and switch model/provider from the header picker mid-session.

Can we use it commercially? What's the license?

Yes — it's Apache-2.0 (LICENSE at the repo root), which permits commercial use, modification, and redistribution with attribution and a patent grant.

Do we need new hardware?

No. It runs on macOS and Linux (the phone app is Android). GPU is optional: the bundled embedder auto-uses CoreML on Apple Silicon and can opt into CUDA on NVIDIA Linux (BLUMI_CUDA=1); for GPU-accelerated LLM inference you run Ollama and point blumi at it. Check what's active with blumi accel doctor.

How much ops does it add?

Minimal. One self-contained binary, installed with a curl … | sh one-liner. The always-on gateway installs as a managed OS service (blumi serve install → launchd/systemd) with start/stop/status, and binary updates are atomic (no "text file busy" even while running). See Installation.
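Why an atomic swap avoids "text file busy": on Linux, opening a running executable for writing typically fails with ETXTBSY, while renaming a new file over its path succeeds, because the running process keeps the old inode open. Below is a minimal sketch of that write-then-rename pattern. `atomic_replace` and the `.<name>.new` temp-file name are hypothetical, not blumi's actual updater.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical updater step: write the new binary to a sibling temp file,
/// flush it to disk, then rename() it over the target. The rename swaps the
/// directory entry, so a process still executing the old binary is unaffected.
pub fn atomic_replace(target: &Path, new_bytes: &[u8]) -> io::Result<()> {
    let dir = target.parent().unwrap_or(Path::new("."));
    let file_name = target
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "target has no file name"))?;
    // Same directory => same filesystem, which rename needs to be atomic.
    let tmp = dir.join(format!(".{}.new", file_name.to_string_lossy()));
    {
        let mut f = fs::File::create(&tmp)?;
        f.write_all(new_bytes)?;
        f.sync_all()?; // make sure the bytes are durable before the swap
    }
    #[cfg(unix)]
    {
        use std::os::unix::fs::PermissionsExt;
        fs::set_permissions(&tmp, fs::Permissions::from_mode(0o755))?;
    }
    fs::rename(&tmp, target)
}
```

The temp file is created next to the target on purpose: a rename across filesystems is not atomic (and usually fails outright).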

Can a team / multiple machines share it?

Yes, with a caveat. The gateway lets multiple clients (TUI, web, several phones) attach to the same sessions, and the Grid federates multiple nodes on a LAN. Auth is a shared bearer token per gateway (blumi serve pair) — i.e. a single trust domain, ideal for one person or a small trusted team, not a multi-tenant SaaS with per-user accounts/RBAC.

How mature is it?

It's a young, fast-moving 0.x project (currently v0.3.0) with CI (clippy/tests/Flutter build) green on main. Treat it as a capable self-hosted tool under active development, not a locked-down enterprise product. Everything in this wiki reflects shipped behavior.


For developers

Which providers and models can I use?

Four client kinds (ProviderKind in crates/blumi-config/src/provider.rs):

  • anthropic — native /v1/messages
  • anthropic_foundry — Claude on Azure AI Foundry
  • gemini — native Google Gemini
  • openai_compat — anything speaking OpenAI /chat/completions (OpenAI, OpenRouter, DeepSeek, Ollama, llama.cpp, MiniMax, NIM, HF, custom base_url)
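A hypothetical mirror of the four kinds, showing how a config string maps onto a client and how an openai_compat base_url combines with a request path. The enum layout and the `from_config`, `documented_path`, and `endpoint` helpers are illustrative assumptions; only the four snake_case names and the two paths come from this page. Paths for Gemini and Foundry aren't stated here, so the sketch returns None for them.

```rust
/// Hypothetical mirror of the four client kinds (the real `ProviderKind`
/// is in crates/blumi-config/src/provider.rs and may be shaped differently).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderKind {
    Anthropic,
    AnthropicFoundry,
    Gemini,
    OpenAiCompat,
}

impl ProviderKind {
    /// Map the snake_case kind names used on this page to a variant.
    pub fn from_config(name: &str) -> Option<Self> {
        match name {
            "anthropic" => Some(Self::Anthropic),
            "anthropic_foundry" => Some(Self::AnthropicFoundry),
            "gemini" => Some(Self::Gemini),
            "openai_compat" => Some(Self::OpenAiCompat),
            _ => None,
        }
    }

    /// Request path as named on this page; None where the page doesn't say.
    pub fn documented_path(self) -> Option<&'static str> {
        match self {
            Self::Anthropic => Some("/v1/messages"),
            Self::OpenAiCompat => Some("/chat/completions"),
            Self::AnthropicFoundry | Self::Gemini => None,
        }
    }
}

/// Join a configured base_url with a request path, tolerating a trailing slash.
pub fn endpoint(base_url: &str, path: &str) -> String {
    format!("{}{}", base_url.trim_end_matches('/'), path)
}
```

For example, a local Ollama server exposes its OpenAI-compatible API under http://localhost:11434/v1, so `endpoint("http://localhost:11434/v1", "/chat/completions")` yields the URL an openai_compat client would POST to.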

Keys come from env vars via api_key_env (preferred) so they're never written into config. See Configuration → Providers.
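A sketch of the api_key_env indirection: the config holds only the *name* of an environment variable, and the secret is looked up at runtime. `resolve_api_key` and its error strings are hypothetical; the lookup is passed in so the logic can be tested without touching the real environment.

```rust
use std::env;

/// Resolve a provider key from the env var named by `api_key_env`.
/// The secret itself never has to appear in the config file.
pub fn resolve_api_key<F>(api_key_env: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    match lookup(api_key_env) {
        Some(v) if !v.trim().is_empty() => Ok(v),
        Some(_) => Err(format!("{api_key_env} is set but empty")),
        None => Err(format!("{api_key_env} is not set")),
    }
}

/// Production lookup backed by the process environment.
pub fn from_process_env(name: &str) -> Option<String> {
    env::var(name).ok()
}
```

In a binary you would call `resolve_api_key(name, from_process_env)`; failing loudly on an empty value catches the common `export KEY=` mistake early.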

Can I run it fully offline / no cloud?

Yes. Point llm.provider at a local Ollama/llama.cpp endpoint (openai_compat + base_url) and the model runs on your box. Embeddings are already local (bundled, in-process). So the whole loop — agent + memory/RAG — can run with no external API calls.

Where do tools actually run? Is it sandboxed?

You choose, per the executor config (crates/blumi-exec):

  • Local (default) — runs on the host.
  • Docker (feature-gated, via bollard) — runs tools inside a container.
  • SSH — runs them on a remote host.

On top of that, a permission engine gates tools with allow/deny/ask globs (e.g. deny rm -rf*, sudo*; ask on git push*), and PreToolUse hooks can block a call before it runs. See Configuration → permissions.
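To make the glob rules concrete, here is a minimal allow/ask/deny evaluator using the example patterns above. Two things are assumptions, not documented behavior: `*` is the only wildcard (matching any run of characters), and precedence is deny over ask over allow, with unmatched commands falling back to ask. `Rules`, `Decision`, and `glob_match` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Ask,
    Deny,
}

/// Minimal glob: `*` matches any run of characters (including none);
/// every other character matches itself. Greedy with backtracking.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    let mut star: Option<usize> = None; // position of the last `*` seen
    let mut mark = 0; // text position the last `*` has absorbed up to
    while ti < t.len() {
        if pi < p.len() && p[pi] != '*' && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if pi < p.len() && p[pi] == '*' {
            star = Some(pi);
            mark = ti;
            pi += 1;
        } else if let Some(s) = star {
            // Let the last `*` swallow one more character and retry.
            pi = s + 1;
            mark += 1;
            ti = mark;
        } else {
            return false;
        }
    }
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

pub struct Rules {
    pub allow: Vec<String>,
    pub ask: Vec<String>,
    pub deny: Vec<String>,
}

fn any_match(patterns: &[String], command: &str) -> bool {
    patterns.iter().any(|p| glob_match(p, command))
}

impl Rules {
    pub fn decide(&self, command: &str) -> Decision {
        if any_match(&self.deny, command) {
            Decision::Deny
        } else if any_match(&self.ask, command) {
            Decision::Ask
        } else if any_match(&self.allow, command) {
            Decision::Allow
        } else {
            Decision::Ask // assumption: unknown commands prompt the user
        }
    }
}
```

Checking deny first means an overly broad allow pattern can never unlock something explicitly denied.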

Is it safe to let it run commands autonomously?

That's exactly what the guardrails are for: the permission engine + interactive approval cards (Allow once / Allow session / Deny) in every UI, a YOLO toggle when you want to drop them, a per-turn iteration cap with auto-continue, an optional local-LLM "brain" that auto-reviews approvals, and Docker/SSH executors to contain blast radius. Nothing destructive runs unattended unless you configure it to.
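One plausible reading of the iteration cap with auto-continue, sketched under stated assumptions: a turn runs in segments of at most `iteration_cap` agent steps, and each time a segment hits the cap, one unit of an auto-continue budget buys another segment; once the budget is spent, the turn pauses and control returns to the user. `run_turn`, `Step`, and `TurnEnd` are hypothetical names, and blumi's real loop may count differently.

```rust
/// Outcome of one agent step (a model call plus any tool calls it made).
#[derive(Debug, PartialEq, Eq)]
pub enum Step {
    Continue,
    Done,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TurnEnd {
    Finished { steps: u32 },
    Paused { steps: u32 },
}

/// Run steps until the agent reports Done, in segments of `iteration_cap`
/// steps; each extra segment costs one unit of `auto_continue_budget`.
pub fn run_turn<F>(mut step: F, iteration_cap: u32, mut auto_continue_budget: u32) -> TurnEnd
where
    F: FnMut() -> Step,
{
    let mut total = 0;
    loop {
        for _ in 0..iteration_cap {
            total += 1;
            if step() == Step::Done {
                return TurnEnd::Finished { steps: total };
            }
        }
        if auto_continue_budget == 0 {
            // Cap hit with no budget left: hand control back to the user.
            return TurnEnd::Paused { steps: total };
        }
        auto_continue_budget -= 1;
    }
}
```

Bounding both the segment length and the number of segments gives a hard ceiling of `iteration_cap * (budget + 1)` steps per turn.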

How do I extend it?

  • Skills — drop a SKILL.md (frontmatter + instructions); they're injected into the system prompt. The agent can even author its own (manage_skill).
  • MCP servers — Model Context Protocol tools; sensible defaults ship and work out of the box (blumi mcp), plus your own in mcp_servers.
  • Hooks: PreToolUse / UserPromptSubmit lifecycle hooks (Configuration → hooks).
  • Sub-agents / personas — delegate to specialized agents; a "team" persona auto-orchestrates.
  • LSP — language servers give code-intel tools (lsp_servers).
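A minimal sketch of loading a SKILL.md. It assumes the frontmatter is a `---`-fenced block of flat `key: value` lines (no nested YAML) and that "injected into the system prompt" means appending a titled section. `Skill`, `parse_skill`, the `name` key, and the section format are illustrative assumptions, not blumi's loader.

```rust
/// A parsed SKILL.md: flat `key: value` frontmatter plus the instruction body.
#[derive(Debug)]
pub struct Skill {
    pub meta: Vec<(String, String)>,
    pub body: String,
}

/// Returns None unless the file starts with a `---` line and has a closing
/// `---` line; only simple `key: value` pairs are understood.
pub fn parse_skill(src: &str) -> Option<Skill> {
    let rest = src.strip_prefix("---\n")?;
    let end = rest.find("\n---\n")?;
    let (front, after) = rest.split_at(end);
    let body = after["\n---\n".len()..].trim().to_string();
    let meta = front
        .lines()
        .filter_map(|line| {
            let (k, v) = line.split_once(':')?;
            Some((k.trim().to_string(), v.trim().to_string()))
        })
        .collect();
    Some(Skill { meta, body })
}

impl Skill {
    pub fn get(&self, key: &str) -> Option<&str> {
        self.meta.iter().find(|(k, _)| k == key).map(|(_, v)| v.as_str())
    }

    /// Hypothetical injection format: a titled section for the system prompt.
    pub fn as_prompt_section(&self) -> String {
        format!("## Skill: {}\n{}", self.get("name").unwrap_or("unnamed"), self.body)
    }
}
```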

What's the "grid"? Do I need it?

Optional. A single machine is fully functional. The Grid is LAN federation: nodes discover each other via mDNS (_blumi._tcp), and you can dispatch sub-tasks across peers (grid_dispatch), aggregate metrics, and offload GPU work (e.g. embeddings) to a node that has the hardware. Local sub-agents are capped (default 4) with overflow spilling to the grid.
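The local cap with grid overflow can be sketched as a simple placement rule: fill free local slots first and send the remainder to the grid. As an added assumption, the sketch queues work locally when no peer is available. `place` and `Placement` are hypothetical; only the default cap of 4 comes from this page.

```rust
/// Where a pending sub-task should run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Placement {
    Local,
    Grid,
    Queued, // assumption: no grid peer, so wait for a local slot
}

/// Default local sub-agent cap stated on this page.
pub const DEFAULT_LOCAL_CAP: usize = 4;

/// Fill free local slots first, then spill to grid peers if any exist.
pub fn place(pending: usize, running_local: usize, local_cap: usize, grid_peers: usize) -> Vec<Placement> {
    let free = local_cap.saturating_sub(running_local);
    (0..pending)
        .map(|i| {
            if i < free {
                Placement::Local
            } else if grid_peers > 0 {
                Placement::Grid
            } else {
                Placement::Queued
            }
        })
        .collect()
}
```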

How is the same session live on the TUI, web, and phone?

The UI-agnostic core emits one typed event stream. The gateway (blumi serve) exposes it over REST + SSE (with Last-Event-ID replay). Every face — TUI, web UI, blugo — subscribes to the same stream, so they're 1:1 mirrors, not separate apps.
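A sketch of how Last-Event-ID replay lets any face resume the shared stream: the gateway numbers each event, keeps a bounded buffer, and on reconnect sends everything newer than the ID the client last saw. `EventLog`, the buffer size, and the string payload are assumptions; the `id:` / `data:` framing and the Last-Event-ID header are standard SSE.

```rust
use std::collections::VecDeque;

/// One event from the session's typed stream; payload kept as a string here.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub id: u64,
    pub data: String,
}

/// Bounded buffer of recent events with monotonically increasing ids.
pub struct EventLog {
    next_id: u64,
    buf: VecDeque<Event>,
    cap: usize,
}

impl EventLog {
    pub fn new(cap: usize) -> Self {
        assert!(cap > 0, "buffer needs room for at least one event");
        Self { next_id: 1, buf: VecDeque::with_capacity(cap), cap }
    }

    /// Append an event, evicting the oldest once the buffer is full.
    pub fn push(&mut self, data: impl Into<String>) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        if self.buf.len() == self.cap {
            self.buf.pop_front();
        }
        self.buf.push_back(Event { id, data: data.into() });
        id
    }

    /// Everything newer than the client's Last-Event-ID (None = new client).
    pub fn replay_since(&self, last_event_id: Option<u64>) -> Vec<Event> {
        let after = last_event_id.unwrap_or(0);
        self.buf.iter().filter(|e| e.id > after).cloned().collect()
    }
}

/// Parse the Last-Event-ID request header value, ignoring junk.
pub fn parse_last_event_id(header: Option<&str>) -> Option<u64> {
    header.and_then(|h| h.trim().parse().ok())
}

/// Frame one event for the SSE wire: an id line, one data line per
/// payload line, and a blank line that terminates the event.
pub fn to_sse(e: &Event) -> String {
    let mut out = format!("id: {}\n", e.id);
    for line in e.data.split('\n') {
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    out.push('\n');
    out
}
```

One caveat the sketch glosses over: if a client's last ID is older than the oldest buffered event, the gap is silent; a fuller implementation would detect that and have the client refetch the transcript.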

How does it remember things across sessions?

Layered: SQLite + FTS5 transcript history (searchable via session_search), a pair of memory files (MEMORY.md / USER.md), semantic long-term memory (a local vector store with RAG + governance), and a code-understanding knowledge index. See Memory & Knowledge.

Won't an autonomous agent burn tokens?

There are explicit controls: per-task cost telemetry, a cost-aware router that downshifts cheap work to cheap models, the per-turn iteration cap, and an auto-continue budget. And again — on Ollama the token cost is zero.
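A sketch of the idea behind cost-aware routing: estimate how capable a model a task needs, pick the cheapest configured model at or above that tier, and price each task from its token count for telemetry. The tiering heuristic, the 8,000-token threshold, the model names, and the prices are all made up for illustration; the real router settings are under Configuration → router.

```rust
#[derive(Debug, Clone)]
pub struct Model {
    pub name: &'static str,
    pub tier: u8,          // hypothetical capability tier, higher = stronger
    pub usd_per_mtok: f64, // illustrative price per million tokens
}

/// Hypothetical difficulty estimate: 0 = trivial, 1 = ordinary edit,
/// 2 = multi-step planning.
pub fn required_tier(prompt_tokens: u32, needs_planning: bool) -> u8 {
    if needs_planning {
        2
    } else if prompt_tokens > 8_000 {
        1
    } else {
        0
    }
}

/// Cheapest model whose tier is at least the required tier.
pub fn route(models: &[Model], tier: u8) -> Option<&Model> {
    models
        .iter()
        .filter(|m| m.tier >= tier)
        .min_by(|a, b| a.usd_per_mtok.total_cmp(&b.usd_per_mtok))
}

/// Per-task cost telemetry: dollars for a given token count.
pub fn task_cost(model: &Model, tokens: u64) -> f64 {
    model.usd_per_mtok * tokens as f64 / 1_000_000.0
}
```

With a local Ollama model configured at price 0.0, every task it qualifies for routes there, which is the "token cost is zero" case above.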

Does blumi use my GPU?

For the bundled embedder: yes, via ONNX execution providers — CoreML on Apple Silicon (automatic), CUDA on NVIDIA Linux (BLUMI_CUDA=1, opt-in). For LLM inference, the big win is to run Ollama (which uses your GPU) and point blumi at it — no rebuild. Run blumi accel doctor to see what's detected and active.
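The selection rules in that answer fit in a small pure function. `ExecutionProvider` and `pick_ep` are hypothetical names, and treating only the exact value `1` as opt-in is an assumption; the OS/arch strings are Rust's own `std::env::consts` values. ONNX Runtime also keeps the CPU provider available for operators an accelerated provider can't run, which this sketch doesn't model.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ExecutionProvider {
    CoreMl,
    Cuda,
    Cpu,
}

/// CoreML automatically on Apple Silicon, CUDA on Linux only when
/// BLUMI_CUDA=1, CPU otherwise. Inputs are parameters so this is testable.
pub fn pick_ep(os: &str, arch: &str, blumi_cuda: Option<&str>) -> ExecutionProvider {
    match (os, arch) {
        ("macos", "aarch64") => ExecutionProvider::CoreMl,
        ("linux", _) if blumi_cuda == Some("1") => ExecutionProvider::Cuda,
        _ => ExecutionProvider::Cpu,
    }
}

/// Same decision for the machine this binary is running on.
pub fn pick_ep_for_this_host() -> ExecutionProvider {
    let cuda = std::env::var("BLUMI_CUDA").ok();
    pick_ep(std::env::consts::OS, std::env::consts::ARCH, cuda.as_deref())
}
```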

What is "self-healing", realistically?

A learn-from-failures loop (crates/.../heal): a reflex controller reacts to failing steps, failures and their fixes are stored as episodes in memory, and an evolution miner surfaces recurring patterns; /heal shows a summary. It's pragmatic pattern-learning, not magic — see Self-Management.


Connectivity & remote access

Can I use blumi / blugo from outside my LAN?

The gateway speaks HTTP/SSE on whatever host/IP you bind it to (typically a LAN address). To reach it from elsewhere you need network reachability — a VPN (Tailscale/WireGuard), an SSH tunnel, or a reverse proxy you control. blumi deliberately ships no built-in tunnel; it connects by host:port, so any overlay that gives your client a route to the gateway works transparently.

Do I need a Tailscale integration in the app?

No. Because blugo connects by address, if both devices are on Tailscale you just add the gateway by its Tailscale IP or MagicDNS name and it behaves exactly like a LAN connection — no Tailscale API, OAuth, or app integration required. blumi is intentionally "zero Tailscale": it doesn't know or care what gives it the route. (This is the most-asked question — hence this entry.)

Do I need Firebase / FCM? What works without it?

Push is optional and additive. Without any Firebase config, Dispatch and chat still work fully in-app on a reachable network — you just don't get backgrounded push notifications. To enable push, drop a Firebase service account at ~/.blumi/fcm-service-account.json on each gateway (it's a silent no-op otherwise). See Configuration → Push notifications (FCM).
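Because push is opt-in by file presence, the check can be as small as the sketch below: look for the service account in the blumi home directory and treat absence as "push disabled" rather than an error. `fcm_service_account` is a hypothetical name; only the file name and location come from this page.

```rust
use std::path::{Path, PathBuf};

/// Return the service-account path if push is configured, else None.
/// None is not an error: it just means the gateway never sends pushes.
pub fn fcm_service_account(blumi_home: &Path) -> Option<PathBuf> {
    let path = blumi_home.join("fcm-service-account.json");
    path.is_file().then_some(path)
}
```

On a gateway, `blumi_home` would be ~/.blumi.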

So can I "get messages from anywhere"?

Two planes, don't conflate them:

  • Notifications travel via Google's FCM, so a push arrives anywhere the phone has internet.
  • The message content (sending a dispatch, reading the reply transcript) goes over the gateway's REST/SSE, which needs reachability to the gateway. On the LAN that's automatic; off-LAN you need a VPN/Tailscale.

So: notified anywhere; full read/send needs a route to the gateway.

Is the gateway exposed to the internet?

Only if you expose it. It binds to the host/IP you pass (e.g. a LAN IP), and it's token-authed by default. Don't raw-port-forward it; put it behind a VPN/Tailscale or an authenticating reverse proxy.


Security & privacy

How is the gateway protected?

Bearer-token auth established by blumi serve pair; require_auth is on by default, so every client (TUI, web, phone, the push-registration endpoints) must present the token. See Gateway.
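A sketch of the per-request check, assuming a standard `Authorization: Bearer <token>` header and one shared token per gateway. `authorized`, `bearer_token`, and the hand-rolled constant-time comparison are illustrative; production code would more likely use a vetted crate such as `subtle`.

```rust
/// Extract the token from an `Authorization: Bearer <token>` header value.
pub fn bearer_token(header: Option<&str>) -> Option<&str> {
    let (scheme, token) = header?.trim().split_once(' ')?;
    let token = token.trim();
    if scheme.eq_ignore_ascii_case("bearer") && !token.is_empty() {
        Some(token)
    } else {
        None
    }
}

/// Compare without exiting early on the first differing byte, so response
/// timing doesn't reveal how much of a guessed token was right.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Gate a request; with require_auth on (the default) a valid token is mandatory.
pub fn authorized(require_auth: bool, expected: &str, header: Option<&str>) -> bool {
    if !require_auth {
        return true;
    }
    match bearer_token(header) {
        Some(tok) => constant_time_eq(tok.as_bytes(), expected.as_bytes()),
        None => false,
    }
}
```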

Where are my secrets stored? Are any committed?

  • Model API keys are read from environment variables (api_key_env), not stored in config or git.
  • The Firebase service account (a private key) stays at ~/.blumi/fcm-service-account.json (gitignored, chmod 600) and the app's google-services.json is gitignored too. Neither is ever committed.
  • Sessions/memory are local SQLite under ~/.blumi.

Mobile push — does our code go through Google?

No transcript does. The turn-complete push carries only a short preview (a title + a body truncated to ~140 chars) through FCM/Google — the full conversation stays on the gateway and is fetched by the app directly over your network when you open the thread. If even the preview is too much, simply don't install the service account: push stays off and Dispatch works in-app.
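The preview limit can be sketched as whitespace flattening plus a character-count (not byte-count) cut, so multi-byte UTF-8 text never splits mid-character. `push_preview`, the 140 limit, and the trailing ellipsis are assumptions based on the "~140 chars" above, not blumi's exact rule.

```rust
/// Build a push body of at most `max_chars` characters: collapse runs of
/// whitespace, and if the text is still too long, keep `max_chars - 1`
/// characters and append an ellipsis.
pub fn push_preview(text: &str, max_chars: usize) -> String {
    let flat = text.split_whitespace().collect::<Vec<_>>().join(" ");
    if flat.chars().count() <= max_chars {
        return flat;
    }
    let keep = max_chars.saturating_sub(1);
    let mut out = flat.chars().take(keep).collect::<String>().trim_end().to_string();
    out.push('…');
    out
}
```

Counting `chars()` rather than bytes keeps accented or CJK text from being cut in the middle of a code point, which would otherwise panic on slicing or produce invalid UTF-8.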


Didn't find your question? Check Troubleshooting, the Configuration reference, or open an issue.
