atomic-agent

An OpenClaw/Hermes-style local operator agent for llama.cpp.

[atomic-agent terminal demo]

atomic-agent is a local agent that can operate a real desktop: browser, files, shell, documents, notes, memory, scheduled work, approvals, and traces. Think of it in the same product category as OpenClaw Operator and Hermes Agent, but shipped as a standalone SEA (Single Executable Application) binary and tuned to squeeze the most out of local models instead of relying on a hosted control plane. Your data, traces, browser profile, memory, and model traffic stay on your machine by default.

Active development: the runtime is still moving quickly, so APIs, commands, configuration, and behavior may change between releases. Pin a release if you need a stable integration point.

Why This Exists

OpenClaw, Hermes, and OpenCUA showed the shape: an agent should act like an operator, not just answer like a chatbot. It needs to read the browser, call tools, follow task playbooks, ask for approval before dangerous actions, and keep enough state to finish multi-step work.

Local models are good enough to operate software, but only if the runtime stops asking them to be a cloud-scale planner with an infinite context window.

The usual failure mode is predictable: every step stuffs more state into the prompt, JSON tool calls drift, browser pages become token soup, and a 7B model spends more time re-reading history than doing work.

atomic-agent goes the other way:

  • Keep durable state outside the prompt.
  • Keep the cache-hot prompt prefix byte-stable.
  • Force tool calls through a GBNF grammar.
  • Give the model compact browser and filesystem views.
  • Load detailed procedures only when a skill is actually needed.
  • Let the runtime execute independent read-only calls in parallel.

The result is an operator-agent loop that is small-model friendly, inspectable, and shippable.

Lineage

atomic-agent is informed by the same family of systems:

  • OpenClaw-style desktop operation: system browser control, compact terminal UI, persistent local profile, and a local-first product surface.
  • Hermes-style tool discipline: structured tool calls, OpenAI-compatible HTTP shapes where useful, and multi-call batches for independent work.
  • OpenCUA-style browser state: compact accessibility/ARIA snapshots instead of vision-heavy page screenshots for ordinary web operation.
  • Local-first constraints: keep the model, browser, files, traces, and long-lived state on the user's machine.

It is not a fork of those projects and does not claim wire compatibility with their full runtimes. The goal is the same operator-agent class, tuned for llama.cpp, TypeScript, Tauri sidecars, and shippable local products.

Built For Local Inference

atomic-agent is engineered around llama.cpp rather than treating it as a drop-in clone of a hosted API.

  • Stable prompt prefix: persona, rules, skill catalog, tool catalog, capabilities, and instructions stay byte-stable inside a session so cache_prompt and slot_id can reuse KV-cache instead of rebuilding the same context every step.
  • Externalized state: sessions, browser world snapshots, loaded skills, memory notes, task records, and traces live in SQLite or local files. The model receives a bounded slice, not the whole project history.
  • Grammar-constrained calls: every inference emits a JSON array of 1..N tool calls. Even a solo action is [{...}], which avoids the first-token bias that pulls smaller models into the wrong shape (see the example after this list).
  • Parallel tool batches: independent read-only calls can be emitted in the same inference and executed concurrently by resource class. Approval-gated tools and terminal replies stay solo.
  • Bounded prompt tail: conversation, memory, skills, and world state are rendered under caps. Older turns are folded instead of letting the context grow forever.
  • Compact browser state: browser automation uses ARIA snapshots from the installed system browser, which are far cheaper and more stable for local models than screenshots.
  • Narrow retries: transport and parser retries are bounded and never replay already-executed tool calls.
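
For example, one inference might emit a two-call read-only batch like the following (both tool names appear elsewhere in this README; the argument shapes are illustrative, not the runtime's exact schema):

[
  {"tool": "os.fs.grep", "args": {"pattern": "TODO", "path": "src"}},
  {"tool": "browser.read_aria", "args": {}}
]

Both calls are read-only, so the runtime can execute them concurrently; an approval-gated call would be emitted solo.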

This is runtime architecture, not prompt superstition.

What It Can Operate

atomic-agent gives a local model a practical operator surface:

  • Browser: navigate, click, type, search, inspect tabs, and read compact ARIA snapshots through playwright-core against Chrome, Edge, or another Chromium-family executable.
  • Host OS: shell, filesystem reads/writes/patches, glob, grep, document extraction, archives, git inspection, process listing, clipboard, windows, notifications, and HTTP requests.
  • Documents: extract text from PDF, DOCX, DOC, XLSX, RTF, ODT, PPTX, archives, and plain text without sending files to a remote service.
  • Skills: local Markdown playbooks plus optional approved scripts. The stable prefix lists only skill names and descriptions; the full body is loaded with skill.view when the task matches (see the example after this list).
  • Memory: profile facts, FTS5 note recall, pointer-style memory index, and async end-of-turn reflection that writes useful facts without blocking the user-visible reply.
  • Tasks: durable deferred turns, cron or interval schedules, webhook-triggered work, and agent self-scheduling.
  • Vision: optional vision.describe tool for multimodal llama.cpp models with an mmproj projector, kept outside the text conversation transcript.
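
For example, when a task matches a catalog entry, the model can load the full playbook with a call like this (the skill name is hypothetical and the argument shape is illustrative):

[{"tool": "skill.view", "args": {"name": "inbox-triage"}}]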

Dangerous actions go through approvals. Read-heavy inspection stays low-friction.

Privacy And Cost

Most cloud agents have the same hidden tax: your prompts, files, browser context, tool outputs, and usage patterns must pass through somebody else's infrastructure. Even when the product is well-run, that is still remote telemetry, remote retention policy, and a bill that scales with tokens.

atomic-agent keeps the control plane local:

  • Private by default: model calls go to your configured llama-server; sessions, memory, tasks, skills, browser profile, and traces live under <stateDir> on your machine.
  • No SaaS meter: once you have local hardware and model files, the runtime does not charge per prompt, per token, per tool call, or per seat.
  • Inspectable artifacts: traces are local NDJSON, state is local SQLite, and skills are local folders you can read and edit.
  • Explicit egress: network access happens only through configured model endpoints, browser navigation, HTTP tools, webhooks, or user-installed skills. Dangerous actions still pass through approvals.

Local-first does not magically remove every privacy risk: if you point the runtime at a remote llama-server, browse to a website, call an API, or put secrets in skill scripts, those systems still see what you send. The difference is that atomic-agent does not require a hosted agent provider to sit in the middle of every step.

Quick Start

Install From Release

curl -fsSL https://raw.githubusercontent.com/AtomicBot-ai/atomic-agent/main/scripts/install.sh | sh

The installer downloads the matching archive, verifies the checksum, and installs the CLI plus runtime assets such as grammars/, vendor/rg, and native prebuilds.

Optional overrides:

ATOMIC_AGENT_VERSION=v0.1.3       # pin a release
ATOMIC_AGENT_INSTALL_DIR=/opt/bin # choose install directory
ATOMIC_AGENT_NO_PATH=1            # do not edit shell rc files
ATOMIC_AGENT_REPO=owner/repo      # install from a fork
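
For example, to pin a release and leave shell rc files untouched (assuming the installer reads these variables from its environment, as the names above suggest):

export ATOMIC_AGENT_VERSION=v0.1.3
export ATOMIC_AGENT_NO_PATH=1
curl -fsSL https://raw.githubusercontent.com/AtomicBot-ai/atomic-agent/main/scripts/install.sh | sh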

Use Managed Local Models

If you want atomic-agent to manage the local llama.cpp backend and GGUF model files:

atomic-agent models update
atomic-agent models list
atomic-agent models pull qwen-3.5-4b
atomic-agent models use qwen-3.5-4b
atomic-agent models start

atomic-agent tui --cwd /path/to/work

The managed path handles backend download/update, model download/remove, active model selection, and detached llama-server lifecycle. The current catalog focuses on Qwen and Gemma families.

Use Your Own llama-server

If you already run llama.cpp, point the runtime at it:

export ATOMIC_AGENT_LLAMA_URL=http://127.0.0.1:8080

./llama-server -m Qwen2.5-7B-Instruct-Q4_K_M.gguf \
  --parallel 4 \
  --slots \
  --port 8080 \
  --cache-reuse 256

atomic-agent tui --cwd /path/to/work

Configuration lives in <stateDir>/config.json and can be inspected or replaced with:

atomic-agent config get
atomic-agent config set '<full-json>'
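
A simple round-trip for editing, assuming config set replaces the stored document wholesale (as '<full-json>' suggests):

atomic-agent config get > /tmp/atomic-agent-config.json
# edit the file, then write the whole document back
atomic-agent config set "$(cat /tmp/atomic-agent-config.json)"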

The Runtime Loop

This is an agent, not a helper library. One user message becomes one macro-turn:

user message
  -> build compact prompt
  -> llama-server completion with GBNF grammar
  -> parse JSON tool-call array
  -> execute tool calls by resource class
  -> compress results into session state
  -> repeat until reply, finish, cancel, or max steps

There is no hidden planner loop inside a single inference. The runtime owns the loop, the model chooses the next tool call, and every effect is recorded as conversation turns plus trace events.
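
In TypeScript terms, the macro-turn is roughly the following sketch. Every name here is illustrative, not the real internal API:

interface ToolResult { tool: string; kind: "reply" | "result"; text: string; }
interface Session {
  maxSteps: number;
  appendUserTurn(text: string): void;
  foldResults(results: ToolResult[]): void;
}
declare function buildPrompt(session: Session): string;                  // stable prefix + bounded tail
declare function completeWithGrammar(prompt: string): Promise<string>;   // llama-server call under the GBNF grammar
declare function parseToolCallArray(raw: string): object[];
declare function executeByResourceClass(calls: object[]): Promise<ToolResult[]>;

async function macroTurn(session: Session, userMessage: string): Promise<string> {
  session.appendUserTurn(userMessage);
  for (let step = 0; step < session.maxSteps; step++) {
    const prompt = buildPrompt(session);
    const raw = await completeWithGrammar(prompt);
    const calls = parseToolCallArray(raw);                // always a JSON array, even for one call
    const results = await executeByResourceClass(calls);  // read-only calls may run concurrently
    session.foldResults(results);                         // compress outcomes into session state
    const reply = results.find((r) => r.kind === "reply");
    if (reply) return reply.text;                         // terminal reply ends the macro-turn
  }
  throw new Error("max steps reached without a reply");
}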

The prompt itself has two zones:

  • Stable prefix: system persona, rules, skill catalog, tool catalog, capabilities, and tool-call instructions. This is the cache target.
  • Variable tail: loaded skills, profile facts, memory pointers, recalled notes, world snapshot, conversation, notices, and the final response anchor. This is rebuilt every step and kept bounded.

See PROMPT.md for the full anatomy.

Product Surfaces

TUI And CLI

atomic-agent run --cwd /path/to/work
atomic-agent tui --cwd /path/to/work

atomic-agent skill list
atomic-agent skill install ./my-skill

atomic-agent task list
atomic-agent task create --message "hourly triage" --cron "0 * * * *"

atomic-agent trace list --limit 10
atomic-agent trace show <sessionId>

Use run for a simple terminal chat loop. Use tui for approvals, debug panes, local model management, skills, tasks, and long-lived operator sessions.

OpenAI-Compatible HTTP

atomic-agent serve \
  --host 127.0.0.1 \
  --port 8787 \
  --cwd /path/to/work \
  --api-key "$ATOMIC_AGENT_API_KEY"

The main chat surface is POST /v1/chat/completions. One request maps to one full macro-turn: user -> 0..N tool steps -> reply. Atomic-specific routes expose capabilities, config, sessions, approvals, tasks, webhooks, events, and traces.
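
A minimal request looks like standard OpenAI chat-completions traffic. The model value below is a placeholder; whether the server maps it or ignores it in favor of the configured local model is an assumption here:

curl -s http://127.0.0.1:8787/v1/chat/completions \
  -H "Authorization: Bearer $ATOMIC_AGENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local",
    "messages": [{"role": "user", "content": "List the three largest files in this directory."}]
  }'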

Telegram Remote Control

Optional. When enabled in <stateDir>/config.json (or via the TUI Telegram tab), atomic-agent connects your own bot to the same single-user runtime. Telegram is just another way to talk to the same sessions, share the same approvals, and observe the same world.

// <stateDir>/config.json
{
  "telegram": { "enabled": true, "ownerUserId": null }
}
# <stateDir>/.env  (file is created with mode 0600 when written from the TUI)
TELEGRAM_BOT_TOKEN=123456789:AA-your-bot-token

Workflow:

  1. Create a bot with @BotFather and copy the token.
  2. Open atomic-agent tui, switch to the Telegram tab.
  3. Press t to paste the token (input is masked, never echoed back, written only into <stateDir>/.env).
  4. Press e to enable; the channel boots and its status is reported in the tab header.
  5. Press o to open a 60-second pairing window, then send any text from your phone; the first private DM claims ownerUserId. From that point only the owner can drive the bot.
  6. From your phone: send a normal message to drive a turn, /cancel to abort the running turn, /new to rotate the Telegram session, /help for the full command list.

Approvals come back as inline-keyboard messages (Approve / Deny) directly in your DM, with an 8-minute auto-deny timeout. Slash commands (/telegram enable|disable|restart|pair|token|clear-token|clear-owner) are also available from the TUI chat. The Telegram session is persisted separately from the TUI session, so the two never collide.

Telegram is intentionally single-user; multi-user flows are out of scope. See AGENTS.md §"Telegram remote-control channel" for the full architecture, polling carve-out, and locked invariants.

Tauri Sidecar

The sidecar speaks newline-delimited JSON over stdio, which is easy to embed, tail, and debug from a desktop app.

{"kind":"request","id":"r-1","type":"start_session","payload":{"workingDir":"/home/me"}}
{"kind":"request","id":"r-2","type":"send_message","payload":{"sessionId":"s-1","text":"Check the inbox and summarize urgent mail."}}

Events stream back as the turn runs:

{"kind":"event","id":"e-1","type":"turn_started","correlationId":"r-2","payload":{"sessionId":"s-1","turnIndex":0}}
{"kind":"event","id":"e-2","type":"tool_call_result","correlationId":"r-2","payload":{"sessionId":"s-1","stepIndex":0,"tool":"browser.read_aria","status":"ok","summary":"url: https://mail.google.com/ ..."}}
{"kind":"event","id":"e-3","type":"assistant_reply","correlationId":"r-2","payload":{"sessionId":"s-1","text":"You have 3 urgent threads."}}

Safety And Observability

Local does not mean opaque. The runtime is built to be inspected.

  • Approval gate: shell, filesystem writes, patches, archive extraction, process kill, HTTP requests, and skill scripts are gated according to policy.
  • Append-only traces: run, tui, and serve can emit NDJSON traces with prompts, completions, tool invocations, outcomes, and failure categories.
  • Prompt drift replay: atomic-agent trace replay <sessionId> compares current stable-prefix hashes against recorded traces to diagnose lost KV-cache wins.
  • Failure taxonomy: transport, grammar, model, tool, and cancellation failures are classified and propagated through events, traces, metrics, TUI, sidecar, and HTTP.
  • Local state: sessions, memory, tasks, skills, browser profile, and traces live under <stateDir> by default.

Treat traces and <stateDir>/.env as sensitive local artifacts. Secret redaction and per-tool environment filtering are not implemented yet.

Requirements

  • Node.js for development. The release bundle is a Node SEA binary.
  • A reachable llama-server, either managed by atomic-agent models or launched externally.
  • Google Chrome, Microsoft Edge, or another configured Chromium-family executable. Browser binaries are not bundled.
  • macOS users may need Accessibility, Screen Recording, Automation, or Reminders permissions depending on the workflow.
  • Linux window-control workflows work best with wmctrl.
  • External git is expected for git tools. The release bundle ships a pinned ripgrep for os.fs.grep.

Configuration And Secrets

User-facing configuration lives in:

<stateDir>/config.json

Useful environment variables:

  • ATOMIC_AGENT_STATE_DIR: state, config, skills, browser profile, memory, tasks, and traces. Default: ~/.atomic-agent.
  • ATOMIC_AGENT_LLAMA_URL: external llama-server URL.
  • ATOMIC_AGENT_LLAMA_API_KEY: optional bearer token for llama-server.
  • ATOMIC_AGENT_LLAMA_MAX_TOKENS: completion cap.
  • ATOMIC_AGENT_BROWSER_CHANNEL: chrome, msedge, or chromium.
  • ATOMIC_AGENT_BROWSER_EXECUTABLE_PATH: explicit Chromium-family executable path.
  • ATOMIC_AGENT_BROWSER_CDP_URL: attach to an already-running browser via CDP.
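
For example, to use the CDP attach option (the debugging flag belongs to Chrome, not atomic-agent):

google-chrome --remote-debugging-port=9222 &
export ATOMIC_AGENT_BROWSER_CDP_URL=http://127.0.0.1:9222
atomic-agent tui --cwd /path/to/work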

Secrets for skills belong in <stateDir>/.env, not in config.json:

NOTION_API_KEY=ntn_xxxxxxxx
GITHUB_TOKEN=ghp_xxxxxxxx
OBSIDIAN_VAULT_PATH=/Users/me/Documents/Obsidian Vault

The dotenv parser is intentionally small: one KEY=VALUE per line, optional surrounding quotes, no interpolation, no export, no multiline values. Shell-exported variables win.
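
Two more accepted forms under those rules; because there is no interpolation, $HOME on the second line is stored as literal text:

QUOTED_VALUE="a value with spaces"
LITERAL_VALUE=$HOME/notes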

Shipping Model

atomic-agent is designed to ship as a compact local runtime:

  • Node SEA CLI binaries per target.
  • Runtime assets next to the binary: GBNF grammar, pinned ripgrep, native prebuilds, starter skills.
  • No bundled browser.
  • No bundled model weights in the runtime itself.
  • No forced hosted control plane.
  • Tauri-friendly sidecar protocol for desktop products.

See BUNDLING.md for packaging, signing, notarization, target matrix, and runtime asset details.

Non-Goals

  • Not a cloud agent platform.
  • Not a full IDE coding-agent product.
  • Not a giant-prompt framework.
  • Not a hidden multi-agent planner.
  • Not a browser or model distribution.
  • Not a secret-redaction or sandbox-isolation system yet.

That restraint is deliberate. The runtime stays small, explicit, local, and embeddable.

Development

npm install
npm run lint
npm test
npm run build

Core docs:

  • PROMPT.md: full prompt anatomy.
  • AGENTS.md: runtime architecture, invariants, and the Telegram channel internals.
  • BUNDLING.md: packaging, signing, notarization, target matrix, and runtime assets.

License

MIT (c) 2026 Atomic Bot