v0.11.0

@davo20019 davo20019 released this 08 Jun 22:06
· 66 commits to master since this release

Added

  • [tools].disabled config: omit built-in tools at startup by machine name (requires /restart). Disabled by default: git_info, git_commit (use run_command/terminal), policy_metrics, check_environment, service_status, project_inspect (use read_file/search_files/terminal), read_channel_history (Slack channel history; opt-in), tool_trace (redundant alias; use goal_trace with action: "tool_trace"). Set disabled = [] to register all base tools. Optional: add goal_trace to the list to disable forensics entirely.
  • [health].enabled defaults to false: the health_probe tool and background uptime monitor are opt-in. Set enabled = true and add [health].probes (or let the agent create probes) when you want scheduled service checks and failure alerts.
  • [cli_agents].enabled defaults to false: cli_agent and manage_cli_agents are opt-in (~1.6k schema tokens combined). Set enabled = true when you want aidaemon to delegate to installed CLI coding agents (Claude Code, Codex, etc.).
  • [diagnostics].enabled defaults to false: the self_diagnose tool is opt-in (~560 schema tokens). record_decision_points stays on by default; use db_probe, the dashboard, or CLI agents for operator debugging.
  • Sliding-window cache-reuse observability (Phase 0): adds a per-LLM-call prefix fingerprint (info!) with region sub-hashes (system prompt, pre-boundary history, tool definitions, session summary) plus force_text/boundary metadata; a window-decision log tying keep_from movement to fetch mechanics; an explicit per-build window-boundary movement event (old_keep_from/new_keep_from plus old/new oldest-kept message ids), emitted on every build so the boundary signal is continuous across trim and no-trim paths; split window-trim counters distinct from age-based collapse; and per-stage pre-boundary fingerprints (debug!) across the full message-build pipeline, so prefix-cache breaks can be attributed to the exact transform that changed the prompt. Hashes never include raw message content.
  • Opt-in LLM request payload dumps: setting AIDAEMON_DUMP_LLM_REQUESTS=1 (writes to the default llm_request_dumps/ directory) or AIDAEMON_DUMP_LLM_REQUESTS=/path/to/dir (writes to that directory) saves each finalized provider request (messages + tool definitions + model/iteration metadata) as a pretty-printed JSON file, so the exact composition of input tokens can be inspected. Dumps contain raw conversation content, so use them for local debugging only.
  • Stable system-prompt core + per-session core cache (Pillar A of the cross-turn prefix stability design): the system prompt is split into a byte-stable "core" (message zero) and a [Task Context] tail inserted at the turn boundary, with a per-session cache that reuses the rendered core verbatim across turns and logs the changed component on invalidation. The largest, most expensive prefix region is now byte-identical across turns, so a prompt-caching backend can reuse its KV cache instead of re-evaluating that region every turn.
  • Turn-anchored conversation history (Pillar B): the message-count sliding window is replaced by whole-turn fetch / render / eviction. Archived turns are rendered once into a byte-stable permanent form (keyed by a content fingerprint), fetched by an immutable per-turn sequence (MIN(events.id) — timestamps are never an ordering key), and evicted whole-turn at an in-memory anchor against an archived-region token budget, so core + archived[..N-1] is byte-identical across turns and every remaining prefix break is an eviction, a logged Prefix mutation, or a logged late-write re-render. Adds an idempotent turn_id column + index on conversation events; legacy turn_id = NULL rows are covered by the session summary.
  • Opt-in llama.cpp slot routing ([provider.slot_routing]): pins interactive generation to a dedicated KV-cache slot (id_slot) so always-on background tasks (memory consolidation, summarization, etc.) cannot evict the interactive conversation's cache between turns. Default off and cloud-API-safe (no id_slot is sent when disabled); requires a local llama-server started with --parallel >= 2. Operational note: sliding-window-attention models (e.g. Gemma) additionally require llama-server --swa-full for cross-turn KV reuse; without it, llama-server re-processes the full prompt every turn regardless of prefix stability. Size -c to cover the added KV memory of the full-size SWA cache.
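
The whole-turn eviction described under Pillar B can be illustrated with a minimal sketch. This is not aidaemon's actual code: `ArchivedTurn`, `advance_anchor`, and the field names are hypothetical, and the sketch assumes the slice is already sorted by the immutable per-turn sequence. It shows only the core idea that turns are dropped whole, oldest first, and that the anchor moves forward only when the archived region exceeds its token budget.

```rust
/// One archived turn, already rendered into its permanent form.
/// All names here are hypothetical and chosen for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct ArchivedTurn {
    /// Immutable ordering key, e.g. the smallest event id in the turn.
    pub seq: i64,
    /// Token count of the rendered turn.
    pub tokens: usize,
}

/// Advance `anchor` (the index of the oldest kept turn) until the turns
/// from `anchor` onward fit within `budget` tokens.
///
/// Assumes `turns` is sorted ascending by `seq`. Turns are evicted whole,
/// oldest first, and the anchor never moves backward, so the kept prefix
/// only changes when an eviction actually happens.
pub fn advance_anchor(turns: &[ArchivedTurn], anchor: usize, budget: usize) -> usize {
    let mut anchor = anchor.min(turns.len());
    let mut kept: usize = turns[anchor..].iter().map(|t| t.tokens).sum();
    while kept > budget && anchor < turns.len() {
        kept -= turns[anchor].tokens;
        anchor += 1;
    }
    anchor
}
```

With four turns of 100, 200, 300, and 400 tokens and a budget of 500, the anchor moves from 0 to 3; with a budget of 1000 it stays at 0, leaving the rendered prefix untouched.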

Changed

  • Per-call LLM payload reduced 27% (median 22.3k → 16.2k tokens; Pillar C of the cross-turn prefix stability design): the duplicative ## Tools catalog in the system prompt (−16.9k bytes) was replaced by compact routing, delegation, and runtime API guidance with all load-bearing rules migrated into the owning tool schemas, and eleven admin-tool schemas were compressed (−4.1k bytes) under test-enforced byte budgets. Tool roster membership is unchanged; tool-selection behavior was verified by integration suites and a live smoke test.

  • Background terminal completion no longer dumps raw stdout: when a backgrounded command finishes, the user now gets a short "Background terminal command completed after Ns" status ping, and the actual output is fed back through the agent so it returns a formatted, summarized reply. The raw output is delivered verbatim (in a code block) only as a fallback when agent re-engagement is unavailable or produces nothing, so content is never lost.
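
The background-completion fallback above can be sketched roughly as follows. The function name `completion_messages` and its signature are hypothetical, and treating a whitespace-only agent reply as "produced nothing" is an assumption; the sketch only illustrates the ordering of status ping, summarized reply, and verbatim fallback.

```rust
/// Build the messages sent when a background command finishes.
/// Hypothetical helper for illustration; not aidaemon's actual API.
///
/// `agent_reply` is `None` when agent re-engagement is unavailable.
/// A blank reply is treated as "produced nothing" (an assumption).
pub fn completion_messages(
    elapsed_secs: u64,
    agent_reply: Option<&str>,
    raw_output: &str,
) -> Vec<String> {
    // Short status ping, always sent first.
    let ping = format!("Background terminal command completed after {}s", elapsed_secs);

    let body = match agent_reply.map(str::trim).filter(|r| !r.is_empty()) {
        // Preferred path: the agent's formatted, summarized reply.
        Some(reply) => reply.to_string(),
        // Fallback: raw output verbatim inside a code block.
        None => {
            let fence = "`".repeat(3);
            format!("{0}\n{1}\n{0}", fence, raw_output)
        }
    };

    vec![ping, body]
}
```

The raw output reaches the user only through the `None` arm, which is what keeps content from being lost when the agent path yields nothing.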

Removed

  • Message-count sliding window / age-ladder (Prior-1/Prior-2 collapse, the adaptive window-size trim, the current_user_injected synthetic-user path, and the index-based identity-preserve bypass): superseded by Pillar B's turn-anchored history. Conversation-history retention is now governed solely by the whole-turn anchor budget, and identity-critical content survives verbatim at turn granularity inside the renderer.

Fixed

  • Approving a proposed tool action no longer fails with "I ran into a processing limit": a short affirmation ("Yes, try that", "go ahead", "do it") replying to an assistant that just offered to run a tool was contracted as a text-only turn (the bare affirmation carries no action signal — the intent lives in the prior turn), so the drift guard blocked the approved tool and the turn spun into force-text and hit the safety net. The plain-text gate is now approval-aware: a short approval whose preceding assistant message proposed an action keeps tools enabled, so the daemon executes the action it offered instead of refusing it.
  • Background-deferred tasks are no longer scored failed: when a long command exceeds the run window and is moved to the background (with a "result will be sent when it finishes" ack), the turn is a deferral, not a failure — but the outcome scorer had no background-handoff awareness and scored it failed. TaskOutcomeDerivation now reads background_handoff_active and returns partial for a deferred-to-background turn (a genuine model error still outranks it and fails).
  • Live tool-activity pings no longer leak internal tool names or raw commands: progress updates ("Using …", "✓ …") streamed raw internal tool names (spawn_agent) and full shell commands with absolute paths, bypassing the reply sanitizer. Tool names are now mapped to friendly labels (spawn_agent → "delegating to a specialist", terminal → "running a command", etc.), command summaries are dropped (label only), and other summaries are run through secret redaction + char-safe truncation.
  • [Action completed] placeholder no longer leaks to users: the internal sliding-window placeholder for orphaned tool-call-only turns is now stripped from user-facing replies, and consecutive placeholders are collapsed to a single one in the model's context (in both the live message-build path and the skeleton-extraction path). Flooding the context with identical placeholders was inviting the model to regurgitate them verbatim.
  • Degeneration/repetition guard on final replies: a new conservative guard collapses runaway model repetition loops (4+ consecutive identical lines or repeated sentence cycles) before a reply is sanitized and sent, preventing the wall-of-duplicated-text + chunked-message spam seen when a model (especially a local one) collapses into a loop.
  • Marked the encryption-only db_probe diagnostic binary as feature-gated so cargo binstall aidaemon accepts release archives that contain only the main aidaemon executable.
  • Re-typing the same message is no longer dropped as a duplicate: message dedup now keys on the channel's stable per-message identity (Telegram message_id, Slack ts, Discord message id) instead of hashing the text. Webhook/poll redeliveries of the same message (which reuse the id) are still suppressed, but a user deliberately re-sending the same text (which gets a new id) is treated as a fresh request. Falls back to content hashing when no id is available.
  • Pronoun follow-ups no longer bind to the pinned core-profile person: a follow-up that carries its subject only via a third-person pronoun ("…what can you infer about her?") was prone to a coreference hijack — small models bound the pronoun to whoever was most salient in the injected core profile (e.g. the pinned partner) instead of the actual subject of the prior exchange. The loop now detects this shape, anchors the pronoun to the immediately preceding exchange, and forces a memory lookup before answering, returning "I don't know" rather than substituting a different person.
  • Clearer message when a terse request can't be turned into an action: the force-text safety net no longer emits the misleading "I ran into a processing limit" when the real cause is an under-specified request (e.g. a bare "web search" with no query) that never produced a tool call. It now asks for the missing detail instead.
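
The id-keyed message dedup described in the list above can be illustrated with a small sketch. `DedupKey`, `dedup_key`, and `Deduper` are hypothetical names, the channel is assumed to be identified by a plain string, and the seen-set is left unbounded for brevity (a real implementation would presumably expire old keys).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Key used to decide whether a message was already handled.
/// Hypothetical type for illustration.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum DedupKey {
    /// Stable per-message identity supplied by the channel.
    MessageId { channel: String, id: String },
    /// Fallback when the channel provides no id: hash of the text.
    ContentHash { channel: String, hash: u64 },
}

/// Prefer the channel's message id; fall back to hashing the content.
pub fn dedup_key(channel: &str, message_id: Option<&str>, text: &str) -> DedupKey {
    match message_id {
        Some(id) => DedupKey::MessageId {
            channel: channel.to_string(),
            id: id.to_string(),
        },
        None => {
            let mut hasher = DefaultHasher::new();
            text.hash(&mut hasher);
            DedupKey::ContentHash {
                channel: channel.to_string(),
                hash: hasher.finish(),
            }
        }
    }
}

/// Remembers which keys have been seen (unbounded in this sketch).
#[derive(Default)]
pub struct Deduper {
    seen: HashSet<DedupKey>,
}

impl Deduper {
    /// Returns true when the message is new and should be processed.
    pub fn accept(&mut self, channel: &str, message_id: Option<&str>, text: &str) -> bool {
        self.seen.insert(dedup_key(channel, message_id, text))
    }
}
```

Under this scheme a redelivery that reuses the same id is rejected, while the same text re-sent under a new id is accepted as a fresh request.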

Full Changelog: v0.10.0...v0.11.0