Configuration

blumi is configured through one main JSON file, ~/.blumi/settings.json. This page is the complete reference: every section, its fields, its defaults, and copy-pasteable examples. All sections are optional: blumi ships with sensible defaults, so an empty file is valid and a one-line provider key is usually all you need to start.

The annotated JSON blocks below use // comments for clarity. Strip them if your editor is strict — settings.json is plain JSON.
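If you want to paste an annotated block and strip the comments mechanically, the following is a minimal sketch in Python. It is not part of blumi; `strip_line_comments` is a hypothetical helper. It removes `//` comments only when they sit outside string literals, so URLs such as `https://…` survive. The fragments on this page are object members, so wrap them in `{ }` before parsing.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that sit outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


annotated = """{
  "providers": {
    "openai": { "base_url": "https://api.openai.com/v1" }  // URL slashes survive
  },
  "llm": { "provider": "openai" }  // which entry in "providers"
}"""

settings = json.loads(strip_line_comments(annotated))
print(settings["providers"]["openai"]["base_url"])
```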

How config is loaded (layering)

blumi merges these in order (later wins), via figment:

1. Built-in defaults
2. ~/.blumi/settings.json: global, your main file
3. ./.blumi/settings.json: per-project overrides (commit-safe project tweaks; secrets stay global)
4. Environment variables: prefix BLUMI_, nest with __ (e.g. BLUMI_LLM__MODEL=claude-opus-4-5, BLUMI_GRID__SECRET=…, BLUMI_PROVIDERS__OPENAI__API_KEY=…)

settings.json is written with mode 0600 and holds secrets, so never commit it (the repo's .blumi/ is gitignored). Per-invocation flags (--provider, --model, --persona, --yolo) override every layer above.
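To make "later wins" concrete, here is a minimal Python sketch of layered merging. It illustrates the idea only and is not blumi's or figment's code. `deep_merge` and `env_layer` are hypothetical helpers, and two details are assumptions: environment keys are lowercased to match the JSON keys, and values stay strings (the real loader may parse them into numbers or booleans).

```python
from functools import reduce


def deep_merge(base: dict, override: dict) -> dict:
    """Return base with override applied on top; nested objects merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def env_layer(environ: dict, prefix: str = "BLUMI_") -> dict:
    """Turn BLUMI_LLM__MODEL=x into {"llm": {"model": "x"}} (lowercasing is assumed)."""
    layer: dict = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        path = name[len(prefix):].lower().split("__")
        node = layer
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return layer


defaults = {"llm": {"provider": "anthropic", "model": "", "temperature": 0.7}}
global_file = {"llm": {"model": "claude-sonnet-4-5"}}
project_file = {"llm": {"temperature": 0.2}}
environment = {"BLUMI_LLM__MODEL": "claude-opus-4-5", "HOME": "/home/me"}

# Apply the layers in the documented order; each one overrides the previous.
layers = [defaults, global_file, project_file, env_layer(environment)]
effective = reduce(deep_merge, layers)
print(effective)
```

In this example the environment variable overrides the model from the global file, the project file overrides only the temperature, and the provider falls through from the defaults.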

Fastest path: the login wizard

blumi login

Pick a provider, paste a key (or endpoint), choose a model — it writes the right bits into settings.json. Re-run any time to switch. That's all most people need; the rest of this page is for tuning.

Applying changes

- TUI / gateway session: ask the agent to reload_self, or use the web/phone "reload". It re-reads settings.json without losing the conversation.
- Process-level settings (bind host/port, the web password, the grid identity) are read once at startup, so they need a restart: launchctl kickstart -k gui/$(id -u)/com.blumi.serve (macOS), systemctl --user restart blumi-serve (Linux), or just relaunch blumi tui.

Providers & models

Providers & keys

blumi is provider-agnostic. Built-in presets exist for the common ones (so you usually set only a key), keyed by name in providers:

| Preset name | `kind` | Notes |
|---|---|---|
| `anthropic` | `anthropic` | Claude (API-key auth only) |
| `openai` | `openai_compat` | OpenAI and any OpenAI-compatible endpoint (set `base_url`) |
| `gemini` | `gemini` | Google Gemini (native client) |
| `azure` | `anthropic_foundry` | Azure AI Foundry (Anthropic models) |
| `local` | `openai_compat` | A local server (llama.cpp / Ollama-compatible); no key |
| `mock` | n/a | Deterministic, for tests/demos |

Each provider entry:

"providers": {
  "anthropic": { "api_key_env": "ANTHROPIC_API_KEY" },   // read key from env (preferred)
  "openai":    { "api_key": "sk-…", "base_url": "https://api.openai.com/v1" },
  "local":     { "kind": "openai_compat", "base_url": "http://localhost:11434/v1" }
}

api_key_env (read the key from an env var) is preferred over a literal api_key so the key never sits in the file.
base_url points OpenAI-compatible clients at any endpoint (Ollama, llama.cpp, OpenRouter, …).
kind is only needed for a fully custom provider name; the presets above set it for you.
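The two key fields can be pictured with a small Python sketch. `resolve_api_key` is a hypothetical helper, not blumi's code, and the precedence shown (the environment variable wins, the literal `api_key` is a fallback) is an assumption; the text above only says that `api_key_env` is the preferred way to supply a key.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    entry: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick a key for one provider entry (assumed order: env var, then literal)."""
    env_name = entry.get("api_key_env")
    if env_name and environ.get(env_name):
        return environ[env_name]
    # Keyless providers (for example a local server) resolve to None.
    return entry.get("api_key") or None


fake_env = {"ANTHROPIC_API_KEY": "key-from-env"}
from_env = resolve_api_key({"api_key_env": "ANTHROPIC_API_KEY"}, fake_env)
from_file = resolve_api_key({"api_key": "key-from-file"}, fake_env)
keyless = resolve_api_key({"kind": "openai_compat", "base_url": "http://localhost:11434/v1"}, fake_env)
print(from_env, from_file, keyless)
```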

`llm` — the active model + turn limits

"llm": {
  "provider": "anthropic",          // which entry in "providers"
  "model": "claude-sonnet-4-5",     // "" = let the provider pick/probe
  "context_size": 131072,
  "max_output_tokens": 16384,
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_iterations": 25,             // tool steps allowed per turn
  "max_auto_continue": 12,          // self-continue rounds when a turn hits the step cap (0 = off)
  "max_auto_continue_tokens": 400000, // token ceiling for one auto-continue run (0 = no cap)
  "max_local_agents": 4             // max concurrent local sub-agents (overflow → grid or queue)
}

Override per run without editing: blumi --provider openai --model gpt-4o run "…".

Agent behavior

`permissions` — what the agent may do unattended

"permissions": {
  "yolo": false,                    // true = auto-approve EVERYTHING (use only sandboxed)
  "tools": {
    "Bash":      { "deny": ["rm -rf*", "sudo*"], "ask": ["git push*"] },
    "FileWrite": { "allow": ["src/**"] }
  }
}

Per-tool allow / deny / ask pattern lists. Interactive by default (mutating tools surface an approval card); toggle YOLO live with ctrl+y / /yolo / --yolo. For real guardrails, pair with hooks.pre_tool_use.
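As a rough mental model of how the pattern lists behave, here is a Python sketch using shell-style globs from the standard library. It is an illustration only: blumi's actual matcher and its precedence are not documented on this page, so the order used here (deny, then ask, then allow) and the `decide` helper are assumptions.

```python
from fnmatch import fnmatchcase


def decide(rules: dict, subject: str) -> str:
    """Return "deny", "ask", "allow", or "default" for one tool invocation.

    `subject` is the command line for Bash or the path for FileWrite.
    Assumed precedence: deny beats ask, ask beats allow.
    """
    for verdict in ("deny", "ask", "allow"):
        if any(fnmatchcase(subject, pattern) for pattern in rules.get(verdict, [])):
            return verdict
    return "default"  # no pattern matched: the normal interactive approval applies


bash_rules = {"deny": ["rm -rf*", "sudo*"], "ask": ["git push*"]}
write_rules = {"allow": ["src/**"]}

print(decide(bash_rules, "rm -rf /tmp/build"))
print(decide(bash_rules, "git push origin main"))
print(decide(bash_rules, "ls -la"))
print(decide(write_rules, "src/lib/mod.rs"))
```

Note that a prefix glob such as `rm -rf*` does not catch variants like `rm -fr` or a command chained after `&&`, which is one reason to pair pattern lists with a `pre_tool_use` hook.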

`persona` + `personas` — agent style

"persona": "default",               // active persona (built-in or custom)
"personas": {
  "reviewer": {
    "description": "Careful code reviewer",
    "instructions": "Be terse. Prioritise correctness + security. Propose diffs.",
    "model": "claude-opus-4-5",     // optional: switch model when active
    "temperature": 0.2              // optional override
  }
}

Built-ins include architect, pair, reviewer, team. Switch with --persona <name> or /persona.

`executor` — where tools run

"executor": {
  "backend": "local",               // "local" (host) | "docker" (sandbox) | "ssh" (remote)
  "docker_image": "debian:stable-slim",
  "ssh_host": "user@box",           // for backend = "ssh"
  "ssh_workdir": "/home/user/proj"
}

Use docker (or ssh to a throwaway box) when you want YOLO/gateway autonomy without risking the host.

`brain` — local-LLM approval reviewer

"brain": {
  "mode": "off",                    // "off" | "advisory" (annotate) | "auto" (decide)
  "provider": "local",              // "" = reuse the main agent's provider
  "model": "qwen2.5:3b"             // a small/cheap/local model is ideal here
}

A cheap model vets approval prompts so the flagship isn't interrupted, and escalates to you on uncertainty or danger.

`router` — cost-aware model routing

"router": {
  "mode": "off",                    // "off" | "heuristic" | "hybrid" | "judge"
  "light": { "provider": "", "model": "claude-haiku-4-5" },
  "heavy": { "provider": "", "model": "claude-opus-4-5" },
  "judge": { "provider": "", "model": "" },   // "" = reuse brain.* then llm.*
  "subagent_tier": "light",         // "light" | "heavy" | "inherit"
  "prefer_grid_light": false,       // run the light tier on a grid peer's local model (free)
  "heuristics": {
    "heavy_chars": 1500, "light_chars": 280, "heavy_tool_count": 12,
    "escalate_iteration": 6, "heavy_keywords": [], "light_keywords": []
  }
}

Per turn, the router picks the light or the flagship model to cut cost. Off by default. An empty provider or model in a tier reuses the active llm.* value. See Self-Management → Cost-aware routing.
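The exact semantics of the `heuristics` fields are described elsewhere; the Python sketch below is only a guess at how such thresholds could combine, to help you reason about tuning them. Every rule in `pick_tier` is an assumption: the function name, the order of the checks, and the meaning attached to each field are hypothetical.

```python
from typing import Optional

# Values copied from the "heuristics" block above.
HEURISTICS = {
    "heavy_chars": 1500, "light_chars": 280, "heavy_tool_count": 12,
    "escalate_iteration": 6, "heavy_keywords": [], "light_keywords": [],
}


def pick_tier(prompt: str, tool_count: int, iteration: int, h: dict) -> Optional[str]:
    """Return "heavy", "light", or None when the heuristics are undecided."""
    text = prompt.lower()
    # Assumed: any heavy signal wins over any light signal.
    if any(word.lower() in text for word in h["heavy_keywords"]):
        return "heavy"
    if (
        len(prompt) >= h["heavy_chars"]
        or tool_count >= h["heavy_tool_count"]
        or iteration >= h["escalate_iteration"]
    ):
        return "heavy"
    if any(word.lower() in text for word in h["light_keywords"]):
        return "light"
    if len(prompt) <= h["light_chars"]:
        return "light"
    return None  # mid-sized prompt: what happens next depends on router.mode


print(pick_tier("fix the typo in README", 0, 0, HEURISTICS))
print(pick_tier("x" * 2000, 0, 0, HEURISTICS))
print(pick_tier("x" * 500, 0, 0, HEURISTICS))
```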

`heal` — self-healing & evolution

"heal": {
  "enabled": true,
  "recovery_budget": 2,             // recovery attempts per turn
  "verify": false,                  // only mark a fix "verified" if a later step succeeds
  "learn": true,                    // store failure→fix episodes in memory
  "evolve": "auto",                 // "auto" | "propose" | "off" (mine recurring fixes → skills)
  "redact_paths": true
}

Auto-recovers failed tool calls, learns fixes, and mines recurring ones into recovery skills. On by default. See Self-Management → Self-healing.

Memory & knowledge

The embeddings, memory, and knowledge sections power recall, semantic memory, and code search. All three are on by default. The bundled local embedder downloads a ~90 MB model on first use, then works fully offline. See Memory & Knowledge.

"embeddings": {
  "enabled": true,
  "backend": "local",               // "local" (bundled ONNX) | "openai" (endpoint) | "grid" (peer)
  "provider": "",                   // for backend = "openai": a name from "providers"
  "model": "bge-small-en-v1.5",
  "dim": 384
},
"memory": {
  "enabled": true,
  "recall_k": 5,                    // memories injected per turn (RAG)
  "dedup_threshold": 0.92,          // admission gate: near-duplicates are merged
  "max_per_namespace": 2000,
  "diffuse": true,                  // share non-`user` learnings across the grid
  "sweep_secs": 60                  // governance cadence (eviction/consolidation)
},
"knowledge": {
  "enabled": true,
  "max_file_kb": 256,               // skip files larger than this when indexing
  "exclude": ["target", "node_modules", ".git", "dist"]
}

Semantic memory accrues automatically (failure→fix episodes, the agent's memory tool); browse it with /memories (TUI) or the web/phone Control Center (where you can pin/edit/delete). The user namespace never leaves your node. Tell the agent "remember that …" to force-store; pin an entry to exempt it from eviction.
Knowledge is empty until you index a repo: blumi knowledge ingest . then blumi knowledge status (or /knowledge in the TUI). Powers code_search / code_retrieve.
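To preview which files an ingest would likely consider, here is a Python sketch of the two `knowledge` filters. It is not blumi's indexer. `indexable_files` is a hypothetical helper, and it assumes that `exclude` entries are directory names matched at any depth and that `max_file_kb` counts 1024-byte kilobytes.

```python
import os
import tempfile
from pathlib import Path


def indexable_files(root, exclude, max_file_kb):
    """Yield files under root, skipping excluded directories and oversized files."""
    limit = max_file_kb * 1024  # assumption: 1 KB = 1024 bytes
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in exclude]
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_size <= limit:
                yield path


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "src").mkdir()
    (Path(root) / "target").mkdir()
    (Path(root) / "src" / "main.rs").write_text("fn main() {}\n")
    (Path(root) / "target" / "build.log").write_text("inside an excluded directory\n")
    (Path(root) / "dump.sql").write_bytes(b"x" * (300 * 1024))  # over the 256 KB limit
    found = sorted(
        p.relative_to(root).as_posix()
        for p in indexable_files(root, {"target", "node_modules", ".git", "dist"}, 256)
    )

print(found)
```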

`acceleration` — embedder execution provider

"acceleration": {
  "mode": "auto",                   // "auto" | "cpu" | "apple" (CoreML) | "cuda"
  "embeddings_accel": "auto"        // override just the bundled embedder
}

blumi accel doctor shows what was detected. See Memory & Knowledge → GPU acceleration.

Surfaces (web, phone, bots, grid)

`web` — the gateway / web UI auth

"web": { "password_hash": "" }      // set via `blumi serve pair --password <pw>` (argon2; never plaintext)

A password is required when binding to a non-loopback address. The same server serves the browser UI and the blugo phone app. See Gateway.

`voice` — speech-to-text + text-to-speech

"voice": {
  "enabled": false,
  "api_key": "",
  "stt_base_url": "https://api.openai.com/v1",
  "stt_model": "whisper-1",
  "tts_provider": "openai",         // "openai" | "elevenlabs"
  "tts_base_url": "",               // blank = provider default
  "tts_model": "tts-1",
  "tts_voice": "alloy",             // OpenAI voice name, or an ElevenLabs voice id
  "tts_api_key": ""                 // separate TTS key (falls back to api_key)
}

See Voice.

`gateway` — run blumi as a chat bot

"gateway": {
  "yolo": false,                    // auto-approve in bot sessions (only with a sandboxed executor!)
  "telegram": {
    "token": "123456:AA…",          // @BotFather
    "allowed_chats": [],            // [] = anyone who messages it; or [<your-chat-id>]
    "voice": false                  // transcribe voice notes + speak replies (needs voice.* too)
  },
  "discord":  { "token": "", "allowed_channels": [] },
  "slack":    { "bot_token": "xoxb-…", "app_token": "xapp-…" },
  "whatsapp": { "token": "", "phone_number_id": "", "verify_token": "", "webhook_port": 8080 }
}

Run one transport in the foreground (blumi gateway telegram) or all configured ones as a service (blumi gateway install). One bot per token — see Gateway → Messaging-bot gateway.

`grid` — distributed multi-node

"grid": {
  "enabled": false,
  "secret": "",                     // same value on every node = same grid (or BLUMI_GRID__SECRET)
  "grid_id": "",                    // blank = derived from the secret digest
  "node_name": "",                  // blank = hostname
  "peers": ["10.0.0.150:7777"]      // static peers (in addition to mDNS auto-discovery)
}

Several blumi serve nodes sharing a secret form a grid that hands tasks to peers. The secret is never advertised (only a non-sensitive digest). See Grid.
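The idea of advertising a digest instead of the secret can be sketched in Python as follows. The hash function (SHA-256) and the 16-character truncation are assumptions chosen for illustration, and `grid_digest` is a hypothetical helper; blumi's real derivation may differ. The snippet also shows one way to generate a high-entropy secret, which matters because a digest of a short, guessable secret can be brute-forced.

```python
import hashlib
import secrets


def grid_digest(secret: str) -> str:
    """Derive a short, shareable identifier from a shared secret (assumed scheme)."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()[:16]


# A random secret to copy onto every node (or export as BLUMI_GRID__SECRET).
shared_secret = secrets.token_urlsafe(32)

node_a = grid_digest(shared_secret)
node_b = grid_digest(shared_secret)
outsider = grid_digest("a different secret")

# Nodes holding the same secret derive the same identifier; others do not.
print(node_a == node_b, node_a == outsider)
```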

`remote` + `workspaces`

"remote": { "instances": [] },      // remote blumi instances the TUI can attach to as tabs
"workspaces": { "roots": ["~/code"] } // dirs scanned for sibling git repos in the TUI sidebar

`git` — commit identity

"git": { "author_name": "Blumi", "author_email": "you@example.com" }

Stamped on commits the agent makes (GIT_AUTHOR_* / GIT_COMMITTER_*). Empty author_name disables the override.

Autonomy & notifications

`always_on` — proactive discovery

"always_on": {
  "enabled": false,
  "autonomy": "propose",            // "off" | "propose" (add tasks) | "auto" (reserved)
  "cadence_secs": 900,
  "min_interval_secs": 300,
  "skip_if_todos": 1,               // skip while the board already has todos
  "max_open_discoveries": 5,
  "max_per_pass": 3
}

When idle, the gateway runs a read-only discovery pass and proposes tasks + a report. Off by default. See Self-Management → Always-on discovery.

`notify` — completion notifications

"notify": {
  "enabled": false,
  "on": ["loop", "discovery"],      // which completions fire (also "turn"); [] = loop+discovery
  "desktop": true,                  // OS notification on the host
  "bot": { "transport": "telegram", "target": "<chat-id>" },  // proactive bot message
  "web_push": false                 // browser Web Push (VAPID; secure-context only)
}

Pings you when blumi loop / discovery finishes. Off by default. See Self-Management → Completion notifications.

`hooks` — lifecycle hooks

"hooks": {
  "user_prompt_submit": [
    { "command": "git branch --show-current", "timeout_secs": 5 }
  ],
  "pre_tool_use": [
    { "command": "jq -e '.input.command|test(\"rm -rf\")|not' >/dev/null", "matcher": "Bash" }
  ]
}

user_prompt_submit injects each command's stdout as turn context; pre_tool_use can block a tool (non-zero exit). Off by default. See Self-Management → Lifecycle hooks.
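When a jq one-liner becomes hard to read, the same guard can be written as a small script. The Python sketch below mirrors the `pre_tool_use` example above. It assumes, as that jq filter implies, that the hook receives a JSON payload on stdin with the command under `input.command`; the function names are hypothetical.

```python
import io
import json


def should_block(payload: dict) -> bool:
    """True when a Bash tool call's command contains "rm -rf"."""
    command = (payload.get("input") or {}).get("command") or ""
    return "rm -rf" in command


def run_hook(stream) -> int:
    """Return the hook's exit status: 0 allows the call, non-zero blocks it."""
    return 1 if should_block(json.load(stream)) else 0


# Demo with in-memory payloads. In a real hook script, import sys and finish
# the file with: sys.exit(run_hook(sys.stdin))
blocked = run_hook(io.StringIO('{"input": {"command": "rm -rf build"}}'))
allowed = run_hook(io.StringIO('{"input": {"command": "cargo test"}}'))
print(blocked, allowed)
```

Saved as a script, it would be referenced from the `command` field of a `pre_tool_use` entry with `"matcher": "Bash"`.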

Tools & integrations

`mcp_servers` — Model Context Protocol tools

"mcp_servers": {
  "github": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"],
              "env": { "GITHUB_TOKEN": "ghp_…" }, "enabled": true }
}

External MCP servers launched over stdio; their tools appear to the agent. {workspace} in args is substituted with the project path. See blumi mcp and CLI Usage.
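The `{workspace}` substitution amounts to a plain string replacement on each argument. A minimal Python sketch, assuming a literal replace of every occurrence (`expand_args` is a hypothetical name, and the filesystem server entry is only an example):

```python
def expand_args(args: list, workspace: str) -> list:
    """Replace every {workspace} placeholder with the project path."""
    return [arg.replace("{workspace}", workspace) for arg in args]


configured = ["-y", "@modelcontextprotocol/server-filesystem", "{workspace}"]
launched = expand_args(configured, "/home/user/proj")
print(launched)
```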

`lsp_servers` — language servers (code intel)

"lsp_servers": {
  "rust": { "command": "rust-analyzer", "args": [], "extensions": ["rs"], "language_id": "rust" }
}

Power the Lsp code-intelligence tool (definitions, references, diagnostics).

Putting it together

A realistic everyday settings.json — Claude for the flagship, a cheap local brain, memory on, notifications to Telegram:

{
  "llm": { "provider": "anthropic", "model": "claude-sonnet-4-5" },
  "providers": {
    "anthropic": { "api_key_env": "ANTHROPIC_API_KEY" },
    "local": { "kind": "openai_compat", "base_url": "http://localhost:11434/v1" }
  },
  "brain": { "mode": "advisory", "provider": "local", "model": "qwen2.5:3b" },
  "router": { "mode": "hybrid", "light": { "model": "claude-haiku-4-5" },
              "heavy": { "model": "claude-opus-4-5" } },
  "memory": { "enabled": true },
  "knowledge": { "enabled": true },
  "git": { "author_name": "Blumi", "author_email": "you@example.com" },
  "notify": { "enabled": true, "bot": { "transport": "telegram", "target": "123456789" } },
  "gateway": { "telegram": { "token": "123456:AA…", "allowed_chats": [123456789] } }
}

Auto-continue (token budget)

When a turn stops only because it hit the per-turn step cap, blumi auto-continues the same session and narrates each step — bounded by both llm.max_auto_continue (default 12) and llm.max_auto_continue_tokens (default 400k), whichever hits first. Tune live with /autocontinue <n> (0 disables).
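The "whichever hits first" rule can be sketched as a loop with two independent budgets. This Python sketch is an illustration under assumptions, not blumi's scheduler: `auto_continue` and `run_round` are hypothetical names, a round is assumed to report the tokens it used and whether it hit the step cap again, and the token budget is checked before each round starts.

```python
def auto_continue(run_round, max_rounds=12, max_tokens=400_000):
    """Run continuation rounds until the work finishes or a budget is exhausted.

    run_round() returns (tokens_used, hit_step_cap).
    max_rounds == 0 disables auto-continue; max_tokens == 0 means no token cap.
    """
    if max_rounds == 0:
        return 0, 0, "disabled"
    rounds = 0
    tokens = 0
    while rounds < max_rounds:
        if max_tokens and tokens >= max_tokens:
            return rounds, tokens, "token budget"
        used, hit_step_cap = run_round()
        rounds += 1
        tokens += used
        if not hit_step_cap:
            return rounds, tokens, "finished"
    return rounds, tokens, "round limit"


def always_capped():
    """A round that uses 50k tokens and hits the step cap every time."""
    return 50_000, True


print(auto_continue(always_capped))                # (8, 400000, 'token budget')
print(auto_continue(always_capped, max_tokens=0))  # (12, 600000, 'round limit')
print(auto_continue(always_capped, max_rounds=0))  # (0, 0, 'disabled')
```

With the defaults, rounds that each use 50k tokens stop after eight rounds on the token budget, well before the twelve-round limit.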

The memory files (`MEMORY.md` / `USER.md`)

Beyond semantic memory, two markdown files in ~/.blumi/ are read as a frozen snapshot each session: project MEMORY.md (agent notes) and USER.md (about you). View/edit them in the TUI (/memory), the web/phone Control Center, or on disk.
