v0.11.0
Added
[tools].disabledconfig: omit built-in tools at startup by machine name (requires/restart). Off by default:git_info,git_commit(userun_command/terminal),policy_metrics,check_environment,service_status,project_inspect(useread_file/search_files/terminal),read_channel_history(Slack channel history; opt-in),tool_trace(redundant alias — usegoal_tracewithaction: "tool_trace"). Setdisabled = []to register all base tools. Optional: addgoal_traceto disable forensics entirely.[health].enableddefaults tofalse: thehealth_probetool and background uptime monitor are opt-in. Setenabled = trueand add[health].probes(or let the agent create probes) when you want scheduled service checks and failure alerts.[cli_agents].enableddefaults tofalse:cli_agentandmanage_cli_agentsare opt-in (~1.6k schema tokens combined). Setenabled = truewhen you want aidaemon to delegate to installed CLI coding agents (Claude Code, Codex, etc.).[diagnostics].enableddefaults tofalse: theself_diagnosetool is opt-in (~560 schema tokens).record_decision_pointsstays on by default; usedb_probe, the dashboard, or CLI agents for operator debugging.- Sliding-window cache-reuse observability (Phase 0): per-LLM-call prefix fingerprint (
info!) with region sub-hashes (system prompt, pre-boundary history, tool definitions, session summary) plusforce_text/boundary metadata, a window-decision log tyingkeep_frommovement to fetch mechanics, an explicit per-build window-boundary movement event (old_keep_from/new_keep_fromplus old/new oldest-kept message ids, emitted on every build so the boundary signal is continuous across trim and no-trim paths), split window-trim counters distinct from age-based collapse, and per-stage pre-boundary fingerprints (debug!) across the full message-build pipeline so prefix-cache breaks can be attributed to the exact transform that changed the prompt. Hashes never include raw message content. - Opt-in LLM request payload dumps: setting
AIDAEMON_DUMP_LLM_REQUESTS=1(defaultllm_request_dumps/directory) orAIDAEMON_DUMP_LLM_REQUESTS=/path/to/dirwrites each finalized provider request (messages + tool definitions + model/iteration metadata) as a pretty-printed JSON file, so the exact composition of input tokens can be inspected. Dumps contain raw conversation content — local debugging only. - Stable system-prompt core + per-session core cache (Pillar A of the cross-turn prefix stability design): the system prompt is split into a byte-stable "core" (message zero) and a
[Task Context]tail inserted at the turn boundary, with a per-session cache that reuses the rendered core verbatim across turns and logs the changed component on invalidation. The largest, most expensive prefix region is now byte-identical across turns, so a prompt-caching backend can reuse its KV instead of re-evaluating it every turn. - Turn-anchored conversation history (Pillar B): the message-count sliding window is replaced by whole-turn fetch / render / eviction. Archived turns are rendered once into a byte-stable permanent form (keyed by a content fingerprint), fetched by an immutable per-turn sequence (
MIN(events.id)— timestamps are never an ordering key), and evicted whole-turn at an in-memory anchor against an archived-region token budget, socore + archived[..N-1]is byte-identical across turns and every remaining prefix break is an eviction, a loggedPrefix mutation, or a logged late-write re-render. Adds an idempotentturn_idcolumn + index on conversation events; legacyturn_id = NULLrows are covered by the session summary. - Opt-in llama.cpp slot routing (
[provider.slot_routing]): pins interactive generation to a dedicated KV-cache slot (id_slot) so always-on background tasks (memory consolidation, summarization, etc.) cannot evict the interactive conversation's cache between turns. Default off and cloud-API-safe (noid_slotis sent when disabled); requires a localllama-serverstarted with--parallel >= 2. Operational note: sliding-window-attention models (e.g. Gemma) additionally requirellama-server --swa-fullfor cross-turn KV reuse — without it llama re-processes the full prompt every turn regardless of prefix stability; size-cto cover the added KV memory of the full-size SWA cache.
Changed
-
Per-call LLM payload reduced 27% (median 22.3k → 16.2k tokens; Pillar C
of the cross-turn prefix stability design): the duplicative## Tools
catalog in the system prompt (−16.9k bytes) was replaced by compact
routing, delegation, and runtime API guidance with all load-bearing rules
migrated into the owning tool schemas, and eleven admin-tool schemas were
compressed (−4.1k bytes) under test-enforced byte budgets. Tool roster
membership is unchanged; tool-selection behavior verified by integration
suites and a live smoke. -
Background terminal completion no longer dumps raw stdout: when a backgrounded command finishes, the user now gets a short "Background terminal command completed after Ns" status ping, and the actual output is fed back through the agent so it returns a formatted, summarized reply. The raw output is only delivered verbatim (in a code block) as a fallback when the agent re-engagement is unavailable or produces nothing, so content is never lost.
Removed
- Message-count sliding window / age-ladder (Prior-1/Prior-2 collapse, the adaptive window-size trim, the
current_user_injectedsynthetic-user path, and the index-based identity-preserve bypass): superseded by Pillar B's turn-anchored history. Conversation-history retention is now governed solely by the whole-turn anchor budget, and identity-critical content survives verbatim at turn granularity inside the renderer.
Fixed
- Approving a proposed tool action no longer fails with "I ran into a processing limit": a short affirmation ("Yes, try that", "go ahead", "do it") replying to an assistant that just offered to run a tool was contracted as a text-only turn (the bare affirmation carries no action signal — the intent lives in the prior turn), so the drift guard blocked the approved tool and the turn spun into force-text and hit the safety net. The plain-text gate is now approval-aware: a short approval whose preceding assistant message proposed an action keeps tools enabled, so the daemon executes the action it offered instead of refusing it.
- Background-deferred tasks are no longer scored
failed: when a long command exceeds the run window and is moved to the background (with a "result will be sent when it finishes" ack), the turn is a deferral, not a failure — but the outcome scorer had no background-handoff awareness and scored itfailed.TaskOutcomeDerivationnow readsbackground_handoff_activeand returnspartialfor a deferred-to-background turn (a genuine model error still outranks it and fails). - Live tool-activity pings no longer leak internal tool names or raw commands: progress updates ("Using …", "✓ …") streamed raw internal tool names (
spawn_agent) and full shell commands with absolute paths, bypassing the reply sanitizer. Tool names are now mapped to friendly labels (spawn_agent→ "delegating to a specialist",terminal→ "running a command", etc.), command summaries are dropped (label only), and other summaries are run through secret redaction + char-safe truncation. [Action completed]placeholder no longer leaks to users: the internal sliding-window placeholder for orphaned tool-call-only turns is now stripped from user-facing replies, and consecutive placeholders are collapsed to a single one in the model's context (in both the live message-build path and the skeleton-extraction path). Flooding the context with identical placeholders was inviting the model to regurgitate them verbatim.- Degeneration/repetition guard on final replies: a new conservative guard collapses runaway model repetition loops (4+ consecutive identical lines or repeated sentence cycles) before a reply is sanitized and sent, preventing the wall-of-duplicated-text + chunked-message spam seen when a model (especially a local one) collapses into a loop.
- Marked the encryption-only
db_probediagnostic binary as feature-gated socargo binstall aidaemonaccepts release archives that contain only the mainaidaemonexecutable. - Re-typing the same message is no longer dropped as a duplicate: message dedup now keys on the channel's stable per-message identity (Telegram
message_id, Slackts, Discord message id) instead of hashing the text. Webhook/poll redeliveries of the same message (which reuse the id) are still suppressed, but a user deliberately re-sending the same text (which gets a new id) is treated as a fresh request. Falls back to content hashing when no id is available. - Pronoun follow-ups no longer bind to the pinned core-profile person: a follow-up that carries its subject only via a third-person pronoun ("…what can you infer about her?") was prone to a coreference hijack — small models bound the pronoun to whoever was most salient in the injected core profile (e.g. the pinned partner) instead of the actual subject of the prior exchange. The loop now detects this shape, anchors the pronoun to the immediately preceding exchange, and forces a memory lookup before answering, returning "I don't know" rather than substituting a different person.
- Clearer message when a terse request can't be turned into an action: the force-text safety net no longer emits the misleading "I ran into a processing limit" when the real cause is an under-specified request (e.g. a bare "web search" with no query) that never produced a tool call. It now asks for the missing detail instead.
Full Changelog: v0.10.0...v0.11.0