A personal AI assistant for the desktop — multi-channel, model-agnostic, and deliberately lightweight.
Arini is a full-stack personal AI assistant: a polished Electron desktop application backed by a Python agent runtime. It lets you converse with any major AI model across 16+ messaging platforms — Telegram, Discord, WeChat, Slack, Matrix, and more — while surfacing every conversation in a single native desktop UI.
In spirit, Arini resembles OpenClaw — the open-source agentic framework that shifted the AI assistant paradigm from stateless chatbot to persistent, context-aware agent running on your own machine. Like OpenClaw, Arini operates across the messaging channels you already live in, routes every conversation through a serialized tool-using agent loop, and stores all context as plain files you own. Unlike typical cloud assistants, nothing leaves your machine unless you choose to send it.
What sets Arini apart is a second agent running alongside the main assistant: an Observability Agent that watches your work as it happens — analyzing your conversation history and monitoring macOS screen events via the Accessibility API (similar to how Littlebird passively captures on-screen context) — and continuously evolves its own memory and configuration to fit your actual workflow.
The project is built around a clear architectural conviction: an AI assistant should be powerful without being heavy. Arini handles streaming responses, concurrent tool execution, multi-channel message routing, long-context management, scheduled tasks, and continuous self-improvement — without a database, without a cloud dependency, and without unnecessary framework overhead. The entire backend fits in ~26,000 lines of Python across 95 focused modules.
Download the latest .dmg from Releases, install, and launch. Arini will prompt for a provider and API key on first run.
Or run the gateway with Docker:

```bash
git clone <repo-url> && cd nanobot

# Add credentials
cp .env.example .env   # then edit with your API keys

# Start the gateway
docker compose up nanobot-gateway
# OpenAI-compatible API available at:
# http://localhost:18790/v1/chat/completions
```

Or run from source:

```bash
# Backend
pip install uv
uv pip install -e .
nanobot chat      # Interactive CLI
nanobot gateway   # API server on :8900

# Frontend (separate terminal)
cd frontend && npm install
cd packages/client && npm start   # Electron dev mode, hot reload

# WhatsApp bridge (optional)
cd bridge && npm install && npm run build && npm start
```

| Area | Capability |
|---|---|
| Channels | 16+ platforms: Telegram, Discord, Slack, Matrix, WeChat (Personal + Work), DingTalk, Feishu, QQ, WhatsApp, and more |
| LLM Providers | 30+ backends: Anthropic, OpenAI, DeepSeek, Google Gemini, Zhipu, Qwen, Groq, Mistral, Ollama, vLLM, and more |
| MCP Support | First-class Model Context Protocol integration — any MCP server appears as a native tool |
| Tool Execution | Concurrent async tool dispatch; built-in web search, shell exec, file I/O, and extensible via MCP |
| Skills | Markdown-based skills loaded progressively into context — GitHub, cron, memory, summarize, tmux, and more |
| Scheduling | Cron scheduler with three modes (at, every, cron) and full IANA timezone support |
| API | OpenAI-compatible HTTP API + WebSocket streaming — drop-in for any OpenAI client |
| Desktop UI | Native Electron app with warm, minimal design; real-time streaming, session management, file attachments |
| Observability Agent | Background agent analyzes chat history + macOS screen events; autonomously evolves memory and surfaces workflow suggestions |
| Security | SSRF protection with comprehensive IP blocking, HMAC auth, tool sandboxing |
| Deployment | Single Docker image; no database required; 256 MB RAM reservation |
Arini has two major components that communicate over a local WebSocket and HTTP interface.
```
┌─────────────────────────────────────────────────────────────┐
│ Arini Desktop (Electron) │
│ │
│ ┌──────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Sidebar │ │ NanobotChatView │ │ Settings / │ │
│ │ Sessions │ │ Streaming UI │ │ Onboarding │ │
│ └──────────┘ └──────────────────┘ └──────────────────┘ │
│ │ │ │ │
│ └──────────────┼──────────────────────┘ │
│ │ WebSocket (ws://127.0.0.1:8900) │
│ ┌─────────┴──────────┐ │
│ │ NanobotManager │ (spawns Python process) │
│ └─────────┬──────────┘ │
└────────────────────────│────────────────────────────────────┘
│ subprocess + IPC
┌────────────────────────▼────────────────────────────────────┐
│ nanobot (Python) │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌───────────────────┐ │
│ │ AgentLoop │ │ AgentRunner │ │ Provider Layer │ │
│ │ Orchestrates│→ │ Tool-using │→ │ 30+ LLM backends │ │
│ │ sessions │ │ LLM loop │ │ auto-detected │ │
│ └──────┬──────┘ └──────────────┘ └───────────────────┘ │
│ │ │
│ ┌──────▼──────┐ ┌──────────────┐ ┌───────────────────┐ │
│ │ Message │ │ Channel │ │ Tool Registry │ │
│ │ Bus │ │ Adapters │ │ MCP / Built-in │ │
│ └─────────────┘ │ (16+ chans) │ └───────────────────┘ │
│ └──────────────┘ │
│ ┌─────────────┐ ┌──────────────┐ ┌───────────────────┐ │
│ │ Sessions │ │ Skills │ │ Cron Scheduler │ │
│ │ (JSONL) │ │ (Markdown) │ │ 3 schedule modes │ │
│ └─────────────┘ └──────────────┘ └───────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ ObservabilityAgent (separate scheduled agent) │ │
│ │ macOS screen events → session JSONL analysis → │ │
│ │ auto-evolves MEMORY.md · suggests config changes │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
┌────────▼──────────┐
│ WhatsApp Bridge │ (Node.js / Baileys — optional sidecar)
└───────────────────┘
```
- The Electron app spawns the Python nanobot process on startup, managing its full lifecycle (health polling, SIGTERM/SIGKILL shutdown, auto-restart).
- Channels (Telegram, Discord, etc.) receive messages and publish them to a central async message bus, completely decoupled from the agent layer.
- AgentLoop consumes the bus, routes each message to a session, and invokes AgentRunner.
- AgentRunner runs the tool-using LLM loop: calls the configured provider, dispatches tool calls concurrently via `asyncio.gather()`, applies token budgeting, and streams the response back.
- Streaming responses are delivered over WebSocket to the Electron renderer in real time.
- Sessions persist as JSONL files — no database required.
Every messaging platform is implemented as a BaseChannel subclass with a uniform interface: start(), stop(), send(), and an optional send_delta() for per-token streaming. Channels publish standardized InboundMessage objects to the message bus, completely decoupled from the agent layer. Adding a new channel requires no changes to the agent, tools, or session management.
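For illustration, a custom channel might look like the sketch below. It assumes the `BaseChannel` contract just described — the import paths, constructor wiring, `InboundMessage` fields, and the `_poll_platform()` helper are assumptions for illustration, not Arini's exact API:

```python
from nanobot.channels.base import BaseChannel   # hypothetical import path
from nanobot.bus import InboundMessage          # hypothetical import path

class EchoChannel(BaseChannel):
    name = "echo"

    async def start(self) -> None:
        # Connect to the platform and forward every inbound message to
        # the shared bus; the agent layer is never called directly.
        async for chat_id, text in self._poll_platform():  # hypothetical helper
            await self.bus.publish(
                InboundMessage(channel=self.name, chat_id=chat_id, text=text)
            )

    async def stop(self) -> None:
        ...  # tear down platform connections

    async def send(self, chat_id: str, text: str) -> None:
        print(f"[{chat_id}] {text}")  # deliver the agent's full reply

    async def send_delta(self, chat_id: str, delta: str) -> None:
        print(delta, end="")  # optional per-token streaming
```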
Supported Channels:
| Platform | Type | Notes |
|---|---|---|
| Telegram | Bot API | Media support (photos, files) |
| Discord | Bot | Rich embeds, markdown formatting |
| Slack | Workspace | Block kit formatting |
| Matrix / Element | Client | E2E encryption support |
| WeChat Personal | Web | QR code auth via bridge |
| WeChat Work (WeCom) | Enterprise | Message threading |
| DingTalk | Enterprise | Alibaba ecosystem |
| Feishu / Lark | Enterprise | ByteDance ecosystem |
| QQ | Bot | Tencent ecosystem |
| WhatsApp | Web | Via Node.js/Baileys bridge |
| Email | SMTP/IMAP | Async communication |
| Direct / CLI | Local | Programmatic and terminal access |
| Twilio | SMS/Voice | Telephony integration |
| Web Chat | HTTP | Browser-based interface |
| Voice | Audio | Speech-first interface |
| Mo Chat | IM | Additional IM platform |
Audio transcription is available across channels via the Groq Whisper API — voice messages on any supported platform are automatically transcribed before processing.
Providers are detected automatically from model name keywords, API key prefixes, and base URL patterns — no manual configuration required in most cases.
Provider Detection Strategy:
- Gateway detection (API key prefix — e.g. `sk-or-` → OpenRouter)
- Model keyword matching (e.g. `claude` → Anthropic, `gpt` → OpenAI)
- Environment variable detection
- Fallback to custom endpoint
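A condensed sketch of that detection order — patterns are illustrative; the shipped table covers many more providers, prefixes, and base-URL checks:

```python
import os

def detect_provider(model: str, api_key: str | None, base_url: str | None) -> str:
    # 1. Gateway detection by API key prefix
    if api_key and api_key.startswith("sk-or-"):
        return "openrouter"
    # 2. Model keyword matching
    keywords = {"claude": "anthropic", "gpt": "openai",
                "gemini": "google", "deepseek": "deepseek"}
    for kw, provider in keywords.items():
        if kw in model.lower():
            return provider
    # 3. Environment variable detection
    if os.environ.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    # 4. Fall back to a custom OpenAI-compatible endpoint
    return "custom"
```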
Selected Providers:
| Category | Providers |
|---|---|
| Frontier | Anthropic Claude, OpenAI GPT, Google Gemini, DeepSeek |
| Chinese | Zhipu (GLM), Alibaba Qwen, Moonshot Kimi, Xiaomi MIMO, BytePlus, VolcEngine, SiliconFlow |
| Gateways | OpenRouter, AiHubMix |
| Open Source | Mistral, Groq (Mixtral/Llama), Together AI, Fireworks, Replicate, Perplexity |
| Local | Ollama, vLLM, LM Studio, OpenVINO Model Server |
| Enterprise | Azure OpenAI, GitHub Copilot |
| Custom | Any OpenAI-compatible endpoint |
Each provider is a stateless adapter implementing chat() and chat_stream_with_retry(). Model-specific constraints (e.g. minimum temperature for certain models) are declared per ProviderSpec without forking provider implementations. Transient errors (429, 5xx, timeouts) trigger configurable exponential backoff retry.
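The retry behavior can be pictured as a small wrapper like the following sketch — the exception type, attempt count, and base delay are assumptions, not Arini's actual names:

```python
import asyncio
import random

class TransientProviderError(Exception):
    """Stand-in for 429 / 5xx / timeout errors (hypothetical name)."""

async def with_retry(call, attempts: int = 5, base: float = 1.0):
    # Exponential backoff with jitter: ~1s, 2s, 4s, ... between attempts
    for attempt in range(attempts):
        try:
            return await call()
        except TransientProviderError:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base * 2 ** attempt + random.uniform(0, 0.5))
```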
Arini's backend is ~26,000 lines across 95 Python files — an average of 275 lines per module. This is intentional.
Stateless processors. AgentRunner, all providers, and all tools are stateless. State lives in sessions and is loaded on demand. Horizontal scaling is trivial; reasoning about any single component is easy.
No database. Sessions use JSONL — append-only, human-readable, diffable, zero-dependency. Cron jobs use a single cron.json. The entire persistent state of the assistant is a directory of text files you can version-control or inspect with any editor.
Progressive skill loading. The system prompt includes only a one-line summary per skill. Full skill content is loaded on-demand when the agent invokes read_file. A conversation using two of twelve skills pays token cost for two, not twelve.
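As a sketch of the mechanism (directory layout taken from the Skills section below; the function name and summary format are hypothetical):

```python
from pathlib import Path

def skill_summaries(skills_dir: Path) -> str:
    # One line per skill in the system prompt; the agent fetches the
    # full SKILL.md via read_file only when it actually uses the skill.
    lines = []
    for skill in sorted(skills_dir.iterdir()):
        manifest = skill / "SKILL.md"
        if manifest.is_file():
            first = (manifest.read_text().splitlines() or [""])[0].lstrip("# ")
            lines.append(f"- {skill.name}: {first} → {manifest}")
    return "Available skills (read the file for full instructions):\n" + "\n".join(lines)
```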
Concurrent tool execution. Tools marked concurrency_safe run in parallel via asyncio.gather(). Three 1-second tool calls complete in ~1 second total.
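A sketch of that dispatch rule, assuming a registry mapping tool names to objects with a `concurrency_safe` flag and an async `run()` (all names hypothetical):

```python
import asyncio

async def dispatch(tool_calls, registry):
    safe = [c for c in tool_calls if registry[c.name].concurrency_safe]
    serial = [c for c in tool_calls if not registry[c.name].concurrency_safe]
    # Concurrency-safe tools run in one parallel batch...
    results = list(await asyncio.gather(
        *(registry[c.name].run(c.args) for c in safe)
    ))
    # ...the rest serialize to avoid races on shared state.
    for call in serial:
        results.append(await registry[call.name].run(call.args))
    return results
```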
Smart context governance. The runner tracks token consumption per iteration and trims history intelligently — removing the oldest complete user-assistant pairs while preserving legal message boundaries (no orphaned tool results; always starts at a user turn). Tool results are individually truncated to max_tool_result_chars to prevent context overflow on large outputs.
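The boundary-preserving trim can be sketched roughly as follows (a simplification under the assumption that history always begins at a user turn; `tokens_of` is a caller-supplied estimator):

```python
def trim(history: list[dict], tokens_of, budget: int) -> list[dict]:
    while tokens_of(history) > budget:
        # Everything before the second "user" turn is the oldest complete
        # user→assistant exchange (including its tool results); drop it
        # as a unit so no orphaned tool message survives.
        user_turns = [i for i, m in enumerate(history) if m["role"] == "user"]
        if len(user_turns) < 2:
            break  # nothing left that is safe to cut
        history = history[user_turns[1]:]
    return history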
Configuration-driven extensibility. Providers, channels, tools, and MCP servers are added through config, not subclasses. Thirty providers are handled by five concrete provider implementations.
Resource footprint. In Docker: 0.25 CPU / 256 MB RAM reserved; 1 CPU / 1 GB RAM limit. No Redis, no PostgreSQL, no external queues required.
AgentRunner (nanobot/agent/runner.py, ~600 lines) is the pure tool-using LLM loop, separated from all product concerns:
- Sanitize — remove empty content, validate message structure
- Call — invoke the configured LLM provider with available tools
- Execute — dispatch tool calls concurrently for `concurrency_safe` tools
- Budget — truncate oversized tool results; snip oldest history if approaching context limit
- Iterate — repeat up to `max_iterations` or until `end_turn`
- Stream — deliver each token to the registered `StreamHook` callback in real time
AgentRunSpec declares execution parameters (model, tools, context window, iteration limit, retry mode). AgentRunResult returns the final content, full message history, tool usage, token counts, and stop reason.
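The rough shape of that contract, as dataclasses — field names beyond those listed in the text are assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentRunSpec:
    model: str
    tools: list                 # tool definitions exposed to the LLM
    context_window: int         # token budget for history
    max_iterations: int = 10
    retry_mode: str = "exponential"

@dataclass
class AgentRunResult:
    content: str                # final assistant message
    messages: list              # full message history
    tool_calls: list            # tools invoked during the run
    tokens_in: int = 0
    tokens_out: int = 0
    stop_reason: str = "end_turn"
```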
AgentLoop sits above AgentRunner and handles the product-level orchestration: per-session routing, multi-channel dispatch, observability agent integration, and message bus consumption. When a message with chat_id="observability" arrives, AgentLoop routes it directly to the ObservabilityAgent rather than the main agent — enabling interactive dialogue with the background analysis layer from any channel.
Built-in tools available to the main agent:
| Category | Tools |
|---|---|
| Filesystem | read_file, write_file, edit_file, list_files |
| Web | web_search, fetch_url (with SSRF protection) |
| Shell | exec — sandboxed command execution |
| Memory | memory_read, memory_write — persistent key-value workspace |
| Communication | message — send to any registered channel |
| MCP | Any connected MCP server tool via MCPToolWrapper |
All tools participate in the unified token budget and concurrency system. Tools marked concurrency_safe run in parallel via asyncio.gather(); the rest serialize to prevent race conditions on shared state.
Arini includes a self-improving observability layer — a dedicated background agent that watches your work as it happens and continuously refines the assistant's understanding of your preferences, workflows, and habits. The premise is identical to what Littlebird demonstrated: the AI should already know your context, not require you to re-explain it every session.
Desktop activity via macOS Accessibility API. The Electron frontend continuously polls the active macOS window using the native Accessibility API, extracting structured text from the UI element hierarchy — application name, window title, active document, and file change events — without taking screenshots. This follows the same approach Littlebird pioneered: reading AXUIElement trees gives clean structured text rather than requiring an expensive vision model pass over raw pixels.
Events are batched in a local in-memory buffer (up to 10,000 events), deduplicated, and synced to the Python backend every five minutes over the local API:
```typescript
// Event types captured by the Electron SyncEngine
interface ObservationEvent {
  event_type: "app_switch" | "file_change"
  app_name?: string        // e.g. "Xcode", "Cursor", "Notion"
  window_title?: string    // active window/document title
  file_path?: string       // file being edited
  change_type?: "created" | "modified" | "deleted"
  timestamp: string
}
```

Conversation history. Session JSONL files accumulate every turn of every conversation. The observability agent reads these files to extract behavioral signals: corrections, preferences, frustrations, and positive reinforcement buried in natural language.
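On the Python side, the sync endpoint can be sketched like this — handler name, buffer shape, and dedup key are assumptions; only the route, batch interval, and 10,000-event cap come from the text above:

```python
from aiohttp import web

BUFFER: list[dict] = []
MAX_EVENTS = 10_000

async def obs_events(request: web.Request) -> web.Response:
    # Accept a batch of ObservationEvents, dedupe, buffer in memory
    # until the next scheduled analysis run consumes them.
    events = await request.json()
    seen = {(e.get("event_type"), e.get("app_name"),
             e.get("window_title"), e.get("file_path")) for e in BUFFER}
    for e in events:
        key = (e.get("event_type"), e.get("app_name"),
               e.get("window_title"), e.get("file_path"))
        if key not in seen and len(BUFFER) < MAX_EVENTS:
            BUFFER.append(e)
            seen.add(key)
    return web.json_response({"buffered": len(BUFFER)})
```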
The agent runs on a configurable schedule (default: every 3 hours) and executes a structured prompt-driven pipeline — no bespoke ML model, just the main LLM reading your own data:
| Phase | What Happens |
|---|---|
| 1 — Orient | Reads MEMORY.md, HISTORY.md, cron job history (cron/jobs.json), and the observability state file to establish baseline — what was analyzed last run, what has already been learned |
| 2 — Analyze | Greps session JSONL files for behavioral signals: corrections ("no", "don't", "wrong", "instead"), preferences ("I prefer", "always use", "never"), frustrations ("I already told you", "again"), positive signals ("perfect", "exactly", "great"), and tool call frequency patterns |
| 3 — Evolve | Auto-applies high-confidence improvements directly to workspace files: adds confirmed preferences (3+ occurrences) to MEMORY.md, fixes stale references, consolidates duplicate entries, records validated workflow patterns, appends a timestamped entry to HISTORY.md |
| 4 — Suggest | Queues lower-confidence findings to memory/pending-suggestions.md for user review: new skill recommendations, model/temperature changes, tool configuration updates |
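A sketch of the Phase 2 signal scan — the signal phrases come from the table above, but the file layout, message shape, and counting logic are assumptions:

```python
import json
import re
from pathlib import Path

SIGNALS = {
    "correction":  r"\b(no|don't|wrong|instead)\b",
    "preference":  r"\b(I prefer|always use|never)\b",
    "frustration": r"\b(I already told you|again)\b",
    "positive":    r"\b(perfect|exactly|great)\b",
}

def scan(sessions_dir: Path) -> dict[str, int]:
    # Count behavioral signals in user turns across all session files.
    counts = dict.fromkeys(SIGNALS, 0)
    for path in sessions_dir.glob("*.jsonl"):
        for line in path.read_text().splitlines():
            msg = json.loads(line)
            if msg.get("role") != "user":
                continue
            text = str(msg.get("content", ""))
            for name, pattern in SIGNALS.items():
                counts[name] += len(re.findall(pattern, text, re.I))
    return counts
```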
The agent tracks a fingerprint for each suggestion. Rejected fingerprints are persisted to memory/observability-state.md — dismissed suggestions never reappear.
After each run, the main agent checks pending-suggestions.md and surfaces outstanding suggestions unprompted at the start of new conversations. Users can accept, reject, or ask the observability agent to explain its reasoning — routing to the dedicated chat_id="observability" session:
```
You have 3 pending observability suggestions.
Would you like to review them?

→ message(chat_id="observability", content="Review pending suggestions")
```
Safe changes (preference additions, memory cleanup) are applied automatically. Structural changes (config edits, model switches) always go through the suggestion queue.
The desktop app's dedicated Observability panel receives live WebSocket pushes as the agent runs:
- Stats stream: sessions analyzed, patterns found, pending suggestion count
- Suggestion cards: type, title, body text, and tags — one card per finding
- Activity timeline: recent app switches and file change events, surfacing the raw context the agent is working from
The panel is intentionally read-only except for accept/reject actions — the goal is transparency into what the agent has observed and learned, not a manual configuration UI.
```
Electron (renderer)
  macOS Accessibility API (AXUIElement polling)
        → SyncEngine (batch every 5 min)
        → POST /api/obs/events

Python backend
  ObservationEventBuffer
        ↓
  ObservabilityAgent (scheduled via CronService, every N hours)
        ├── Phase 1: Orient   (read memory, session list, cron history)
        ├── Phase 2: Analyze  (grep session JSONL for behavioral signals)
        ├── Phase 3: Evolve   (auto-write to MEMORY.md, HISTORY.md)
        └── Phase 4: Suggest  (write pending-suggestions.md)
                │
                └──→ WebSocket push → Observability panel (live stats + cards)
```
Any MCP (Model Context Protocol) server integrates transparently. MCPToolWrapper wraps each MCP tool as a first-class nanobot tool — it appears identical to built-in tools from the LLM's perspective and participates fully in the token budget and concurrency system.
Three transport modes are supported: stdio (local subprocess), sse (Server-Sent Events over HTTP), and streamableHttp. Each MCP tool has a configurable timeout (default 30 seconds). The schema normalizer converts MCP's JSON Schema (with oneOf/anyOf/nullable unions) to OpenAI-compatible format automatically.
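The union-flattening step of that normalizer might look like the sketch below (simplified; Arini's real normalizer presumably handles more cases). Unions that merely add `null` are collapsed to their non-null member, since some OpenAI-compatible backends reject `oneOf`/`anyOf` inside function parameter schemas:

```python
def normalize(schema: dict) -> dict:
    schema = dict(schema)
    for key in ("anyOf", "oneOf"):
        variants = schema.get(key)
        if variants:
            non_null = [v for v in variants if v.get("type") != "null"]
            if len(non_null) == 1:              # e.g. string | null → string
                schema.pop(key)
                return {**schema, **normalize(non_null[0])}
    schema.pop("nullable", None)                # drop OpenAPI-style nullable flag
    if schema.get("type") == "object" and "properties" in schema:
        schema["properties"] = {k: normalize(v)
                                for k, v in schema["properties"].items()}
    if schema.get("type") == "array" and "items" in schema:
        schema["items"] = normalize(schema["items"])
    return schema
```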
Skills are markdown documents — a unified format that serves as both documentation and LLM instruction. Each skill lives in a directory with a SKILL.md file and optional supporting code.
12 built-in skills: GitHub, cron management, memory consolidation, observability, skill creation, document summarization, tmux integration, weather, and more.
Skills declare requirements (CLI binaries and environment variables). SkillsLoader validates availability at runtime and marks unavailable skills gracefully rather than failing. Users add custom skills to ~/.nanobot/skills/ without modifying any code.
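A sketch of that validation, assuming a requirements mapping with `binaries` and `env` lists (field names hypothetical):

```python
import os
import shutil

def check_requirements(requires: dict) -> tuple[bool, list[str]]:
    # A skill with a missing CLI binary or env var is marked
    # unavailable rather than crashing the loader.
    missing = []
    for binary in requires.get("binaries", []):
        if shutil.which(binary) is None:
            missing.append(f"binary: {binary}")
    for var in requires.get("env", []):
        if not os.environ.get(var):
            missing.append(f"env var: {var}")
    return (not missing, missing)
```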
The cron service supports three scheduling modes with full IANA timezone support:
| Mode | Description | Example |
|---|---|---|
| `at` | One-time execution at a timestamp | Run once on Dec 25 at 10:00 UTC |
| `every` | Fixed-interval recurrence | Every 3,600,000 ms (1 hour) |
| `cron` | Standard cron expression | `0 9 * * 1` — 9 AM every Monday |
Jobs persist to cron.json with atomic writes. Each job records a run history (last 20 executions), last status, and next scheduled time. delete_after_run supports one-shot alerts.
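The atomic-write pattern for `cron.json` is the standard temp-file-plus-rename trick; a minimal sketch (function name hypothetical):

```python
import json
import os
import tempfile

def save_jobs(path: str, jobs: list[dict]) -> None:
    # Write to a temp file in the same directory, then os.replace():
    # readers never see a half-written file, even on a crash mid-write.
    dir_ = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dir_, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(jobs, f, indent=2)
        os.replace(tmp, path)  # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise
```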
The backend exposes an OpenAI-compatible HTTP API — any existing OpenAI client works without modification:
```
POST /v1/chat/completions   # Chat (streaming and non-streaming)
GET  /v1/models             # List available models
GET  /health                # Health check
POST /api/upload            # File upload (multipart)
WS   /ws                    # WebSocket for streaming + session events
```
Session-level async locks prevent concurrent modification of the same session. WebSocket clients authenticate via HMAC token, then subscribe to specific sessions to receive streaming deltas, tool call events, and session state updates in real time.
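Because the API is OpenAI-compatible, the official SDK should work as a client with only a base URL change. A sketch, assuming the `:8900` gateway port from the quick start, a locally accepted placeholder key, and a hypothetical model name:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8900/v1", api_key="local")

# Stream a reply token-by-token, exactly as with any OpenAI endpoint.
stream = client.chat.completions.create(
    model="claude-sonnet-4",   # hypothetical model name
    messages=[{"role": "user", "content": "What's on my schedule today?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```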
The Arini desktop app is a native macOS application built with Electron 33, React 18, and TypeScript 5 in strict mode. The Python backend is embedded — users get a single .dmg install with no separate server setup.
The interface takes clear design cues from Anthropic's Claude Cowork — the agentic desktop mode inside Claude Desktop that brought warm earth tones, sidebar-based panel navigation, and task-transparency affordances to knowledge work. Arini adopts the same visual DNA: a terracotta accent, cream and beige backgrounds, generous whitespace, and a layout where the left sidebar navigates between contexts (conversations, system sessions, observability) while the main pane stays focused on the current task.
The visual language is warm and minimal — a palette of natural taupes, off-whites, and a restrained terracotta accent that mirrors Anthropic's brand, deliberately avoiding the harsh contrast of typical AI interfaces.
| Token | Value | Usage |
|---|---|---|
| `--bg-primary` | `#f5f5f0` | Main background (warm off-white) |
| `--bg-sidebar` | `#eeede8` | Sidebar (slightly darker beige) |
| `--bg-user-bubble` | `#ddd9ce` | User message bubbles (tan) |
| `--accent` | `#ae5630` | Primary actions (burnt sienna / terracotta) |
| `--green` | `#788c5d` | Active/success states (muted sage) |
| `--text-primary` | `#1a1a18` | Headlines (near-black, warm-tinted) |
Typography uses Inter with system fallbacks; monospace content renders in SF Mono / Fira Code / Cascadia Code. All transitions run at 150ms — snappy without feeling mechanical. The macOS title bar uses hiddenInset for a native feel, and the background color matches the app chrome to eliminate any seam.
Messages stream token-by-token from the backend. The renderer maintains a StreamingTurn with an immutable StreamingBlock[] array — each token delta creates a new TextBlock rather than mutating existing state. An animated blinking dot marks the active text block. Tool calls appear inline as collapsible blocks with execution-type icons (File, Edit, Terminal, Search, List). When streaming ends, blocks are committed to the ChatMessage[] history.
Markdown renders via react-markdown + remark-gfm: headings, lists, tables, code blocks (monospace-styled), and GFM-flavored content all render correctly.
The sidebar groups sessions into two sections — Conversations (direct/channel chats) and System (CLI sessions). Unread badge counts increment locally on incoming messages and sync with the server via session.updated WebSocket events. Sessions persist watch state in localStorage. The WebSocket client reconnects automatically after network interruption, with exponential backoff (1s base, 30s cap) and a queued-send buffer that flushes on reconnect.
The sidebar includes a dedicated Observability section — a live view into the background analysis agent. It displays:
- Stats: sessions analyzed, behavioral patterns found, pending suggestion count
- Suggestion cards: each finding from the latest analysis run, with type (preference / skill / config), title, body, and tags — accept or reject with one click
- Activity timeline: a rolling log of macOS screen events (app switches, file changes) that the agent used as context, making the agent's reasoning transparent and inspectable
Users can also open a direct chat thread with the observability agent from this panel to ask questions, challenge a finding, or trigger an on-demand analysis run. The panel subscribes to observability.* WebSocket messages and updates in real time as the agent works.
The input area supports drag-and-drop, paste (including clipboard images), and click-to-browse. Files up to 10 MB are uploaded to /api/upload and attached to the message as FileRef objects. Image attachments render as inline thumbnails; documents show a file emoji chip. All chips are removable before send.
- No state management library — React hooks + WebSocket subscriptions (intentionally minimal, zero Redux/Zustand overhead)
- Strict TypeScript throughout — `strict: true`, `readonly` on all interfaces, union types for all discriminated states
- Immutable state — spread operators everywhere; no in-place mutation
- Singleton WebSocket client — `NanobotWsClient.getInstance()` with type-based subscription system and `useEffect` cleanup
- Monorepo shared types — `@arini/shared` defines the full API contract (envelope types, session types, file refs, API paths) shared across main and renderer processes
- Auto-update support — `electron-updater` handles background updates with user notification
| Component | Technology |
|---|---|
| Language | Python 3.11+ |
| Async runtime | asyncio (stdlib) |
| Config / validation | Pydantic v2 |
| HTTP server | aiohttp |
| CLI | typer + prompt_toolkit |
| LLM clients | anthropic, openai (official SDKs) |
| MCP | mcp library (stdio, SSE, streamableHttp) |
| Scheduling | croniter |
| Audio transcription | Groq Whisper API |
| Persistence | JSONL files (no database) |
| Security | ipaddress, hmac (stdlib) |
| Packaging | uv |
| Component | Technology |
|---|---|
| Desktop shell | Electron 33 |
| UI framework | React 18 |
| Language | TypeScript 5 (strict mode) |
| Build tool | Vite 5 + Electron Forge |
| Markdown | react-markdown + remark-gfm |
| Auto-update | electron-updater |
| Shared types | @arini/shared (workspace monorepo) |
| Component | Technology |
|---|---|
| WhatsApp Web | Baileys (reverse-engineered WA Web) |
| Runtime | Node.js 20 |
| Language | TypeScript |
```
arini/
├── nanobot/                 # Python agent backend
│   ├── agent/               # AgentLoop, AgentRunner, ObservabilityAgent, tool registry
│   ├── api/                 # HTTP server, WebSocket hub, obs_events endpoint
│   ├── bus/                 # Async message bus
│   ├── channels/            # 16+ platform adapters
│   ├── config/              # Pydantic schemas (incl. ObservabilityConfig)
│   ├── cron/                # Job scheduler (3 modes + timezone)
│   ├── providers/           # 30+ LLM backend adapters
│   ├── security/            # SSRF protection, network validation
│   ├── session/             # JSONL persistence, memory consolidation
│   ├── skills/              # 12 built-in markdown skills
│   └── templates/           # Agent system prompts (incl. observability pipeline)
├── frontend/                # Electron desktop app
│   └── packages/
│       ├── client/          # @arini/client — main + preload + renderer
│       └── shared/          # @arini/shared — API contract types, constants
├── bridge/                  # WhatsApp bridge (Node.js / Baileys)
├── docker-compose.yml       # Gateway + CLI service definitions
└── Dockerfile               # Single-image build (Python + Node.js)
```
MIT — see LICENSE.