What Changed

@chauncygu chauncygu released this 31 May 04:48
· 56 commits to main since this release
60e1eff
  • May 12, 2026 (v3.05.80; latest, security-hardening branch): Two-round security hardening sweep covering the CRITICAL + HIGH findings from the in-repo code review. Lands a cluster of fixes that close real attack surfaces opened by the recent rapid feature growth. Zero regressions across the full 2347-test suite.

    Bot tokens off argv / readline history. cmd_telegram and cmd_slack now accept a single-arg form (/telegram <chat_id> / /slack <channel_id>) and read the bot token from $TELEGRAM_BOT_TOKEN / $SLACK_BOT_TOKEN. Env-supplied tokens never get persisted to ~/.cheetahclaws/config.json; only tokens that actually came in via the deprecated REPL-arg path are saved on disk. New bridges.scrub_token_from_history(token) walks readline.get_history_item backwards and removes any in-memory entry that embeds the token the moment we know its value. Bridge supervisors get a token=/channel= kwarg so the env-sourced token can flow to the worker thread without ever sitting on the config dict — _slack_start_bridge(config, *, token, channel). Telegram already passed the token explicitly to _tg_supervisor. WeChat is unaffected (QR-scan token, never in argv).
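    A minimal sketch of the token-resolution rule described above, assuming the deprecated two-argument form is ordered `<token> <chat_id>` (the notes do not spell out the order). `BridgeArgs` and `resolve_telegram_args` are hypothetical names, not the project's API.

```python
import os
from typing import Mapping, NamedTuple, Optional, Sequence


class BridgeArgs(NamedTuple):
    token: str
    chat_id: str
    persist_token: bool  # True only for the deprecated REPL-arg path


def resolve_telegram_args(
    args: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> BridgeArgs:
    """Pick the bot token from the environment or from the legacy argument."""
    env = os.environ if env is None else env
    if len(args) == 1:
        # Single-arg form: /telegram <chat_id>, token comes from the environment.
        token = env.get("TELEGRAM_BOT_TOKEN", "")
        if not token:
            raise ValueError("set TELEGRAM_BOT_TOKEN or pass the token explicitly")
        return BridgeArgs(token, args[0], persist_token=False)
    if len(args) == 2:
        # Deprecated form (assumed order: <token> <chat_id>); only this path
        # is allowed to write the token to the config file.
        token, chat_id = args
        return BridgeArgs(token, chat_id, persist_token=True)
    raise ValueError("usage: /telegram <chat_id>")
```

    The `persist_token` flag carries the "env-supplied tokens are never saved" decision to whatever writes the config, so that code does not need to know where the token came from.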

    Web UI CSRF — double-submit cookie. Server mints ccsrf=<24B>; Path=/; SameSite=Strict; Max-Age=86400 (non-HttpOnly) on every connection that arrives without one. _handle_connection gates POST/PUT/PATCH/DELETE on a matching X-CSRF-Token request header (rejection: 403 csrf token mismatch). Exempt: /api/auth/{bootstrap,register,login,logout,api/auth} — they establish the session that later carries the cookie. New web/static/js/csrf.js monkey-patches window.fetch so every state-changing request automatically echoes the cookie value; loaded as the first script in chat.html, the inline terminal script in _build_html, and lab.html. Test harness (tests/test_web_api.py:_client) gains an httpx event hook that mirrors the browser behaviour. SameSite=Strict on the JWT cookie remains the first-line defence; CSRF is the second line.
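    The double-submit check can be sketched roughly as follows. This is an illustration under assumptions: the function names are hypothetical, `secrets.token_urlsafe(24)` stands in for however the server encodes its 24 random bytes, and the exempt set is a subset chosen for the example.

```python
import hmac
import secrets
from typing import Optional, Tuple

GATED_METHODS = frozenset({"POST", "PUT", "PATCH", "DELETE"})
# Illustrative subset of the session-establishing endpoints.
EXEMPT_PATHS = frozenset({
    "/api/auth/bootstrap",
    "/api/auth/register",
    "/api/auth/login",
    "/api/auth/logout",
})


def mint_csrf_cookie() -> Tuple[str, str]:
    """Return (token, Set-Cookie value). The cookie is deliberately not
    HttpOnly so that page script can read it and echo it in a header."""
    token = secrets.token_urlsafe(24)  # 24 random bytes, URL-safe encoded
    return token, f"ccsrf={token}; Path=/; SameSite=Strict; Max-Age=86400"


def csrf_ok(method: str, path: str,
            cookie_value: Optional[str], header_value: Optional[str]) -> bool:
    """True when the request may proceed; False maps to a 403."""
    if method.upper() not in GATED_METHODS or path in EXEMPT_PATHS:
        return True
    if not cookie_value or not header_value:
        return False
    # Constant-time comparison of the cookie and the X-CSRF-Token header.
    return hmac.compare_digest(cookie_value.encode(), header_value.encode())
```

    A cross-site page can make the browser send the cookie but cannot read it, so it cannot supply the matching header.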

    Web terminal session ownership. _PtySession(owner_uid=...) records the creator's JWT sub at /api/session time. _check_pty_owner(session, cookie) is consulted at /api/stream / /api/input / /api/resize — any other authenticated user trying to reach a known sid gets 403 not session owner. Password-only mode (no JWT) keeps owner_uid=None and skips the check, preserving the shared-secret model. Closes the trivial-sid-hijack hole in multi-user web deployments.
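    In outline, the ownership rule reduces to a comparison that is skipped when no owner was recorded. The sketch below assumes the caller's JWT `sub` has already been decoded from the cookie; the class and function names are hypothetical stand-ins for `_PtySession` and `_check_pty_owner`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PtySession:
    sid: str
    owner_uid: Optional[str] = None  # None in password-only mode (no JWT)


def check_pty_owner(session: PtySession, caller_uid: Optional[str]) -> bool:
    """Return True if the caller may use the session."""
    if session.owner_uid is None:
        # Shared-secret model: no per-user owner was recorded, so skip the check.
        return True
    return caller_uid == session.owner_uid
```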

    Bash hard-denylist. Eight regexes in tools/shell.py:_BASH_HARD_DENY refuse host-destroying patterns regardless of permission_mode: rm -rf / and its --recursive/--force variants, rm -rf /*, mkfs.*, dd of=/dev/{sd,hd,nvme,vd,mmcblk,xvd}, > /dev/{sd,hd,...}, chmod -R 777 /, chown -R <user> /, and the classic :(){ :|:& };: fork bomb. The denylist applies to the Bash tool, the REPL !cmd escape, and all three bridges' !cmd paths. Every Bash invocation additionally rejects NUL bytes, control characters, and commands longer than 64 KB.
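    To make the shape of such a denylist concrete, here is a sketch with five illustrative patterns. These regexes are written for this example and are not the ones in `_BASH_HARD_DENY`; treating tab, newline, and carriage return as permitted control characters is also an assumption.

```python
import re
from typing import Optional

MAX_COMMAND_BYTES = 64 * 1024
_DEVICES = r"(?:sd|hd|nvme|vd|mmcblk|xvd)"

# Illustrative patterns only.
HARD_DENY = [
    # rm with any flags, targeting "/" or "/*"
    re.compile(r"\brm\s+(?:-{1,2}[A-Za-z-]+\s+)*/\*?(?:\s|$)"),
    re.compile(r"\bmkfs\b"),
    re.compile(r"\bdd\b[^\n;|&]*\bof=/dev/" + _DEVICES),
    re.compile(r">\s*/dev/" + _DEVICES),
    # the :(){ :|:& };: fork bomb, tolerant of spacing
    re.compile(r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:"),
]
# NUL and other control characters, except tab, newline and carriage return.
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def deny_reason(command: str) -> Optional[str]:
    """Return a refusal reason, or None if the command passes the hard checks."""
    if len(command.encode("utf-8", "replace")) > MAX_COMMAND_BYTES:
        return "command exceeds 64 KB"
    if _CONTROL.search(command):
        return "command contains NUL or control characters"
    for pattern in HARD_DENY:
        if pattern.search(command):
            return f"matches hard-deny pattern {pattern.pattern!r}"
    return None
```

    Regex denylists are best-effort: quoting, variable expansion, or a wrapper script can evade them, which is why the notes describe this as one layer alongside permission modes.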

    Filesystem credential denylist. tools/security.py:_check_path_allowed now refuses access to a small denylist by default — SSH private keys (~/.ssh/id_*), ~/.aws, ~/.gnupg, ~/.kube, ~/.docker, ~/.netrc, ~/.pgpass, /etc/shadow, /etc/gshadow, /etc/sudoers*, /root. Public-by-convention SSH files (config, known_hosts, authorized_keys) remain readable. Set CHEETAHCLAWS_FS_NO_SANDBOX=1 to bypass when intentionally auditing your own secrets. Independent of allowed_root, which still works as the strict-mode toggle for multi-user daemon deployments.
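    One way such a check could look, as a sketch rather than the actual `_check_path_allowed`: the function name, the `home` parameter, and the matching rules are assumptions made for illustration. Both the candidate and the denied roots are resolved so symlinked directories compare consistently.

```python
import os
from pathlib import Path
from typing import Optional

HOME_DIRS = (".aws", ".gnupg", ".kube", ".docker")
HOME_FILES = (".netrc", ".pgpass")
SYSTEM_FILES = ("/etc/shadow", "/etc/gshadow")


def _inside(path: Path, root: Path) -> bool:
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


def is_denied(path: str, home: Optional[Path] = None) -> bool:
    """True if `path` falls under the default credential denylist."""
    if os.environ.get("CHEETAHCLAWS_FS_NO_SANDBOX") == "1":
        return False  # explicit bypass
    home_dir = (home or Path.home()).resolve()
    raw = Path(path)
    if raw.parts and raw.parts[0] == "~":
        raw = home_dir.joinpath(*raw.parts[1:])
    target = raw.resolve()

    # ~/.ssh/id_* is denied; config, known_hosts, authorized_keys stay readable.
    if target.parent == home_dir / ".ssh" and target.name.startswith("id_"):
        return True
    if any(_inside(target, home_dir / d) for d in HOME_DIRS):
        return True
    if any(target == home_dir / f for f in HOME_FILES):
        return True
    if any(target == Path(f).resolve() for f in SYSTEM_FILES):
        return True
    etc = Path("/etc").resolve()
    if _inside(target, etc):
        rel = target.relative_to(etc)
        if rel.parts and rel.parts[0].startswith("sudoers"):
            return True  # /etc/sudoers and /etc/sudoers.d/*
    return _inside(target, Path("/root").resolve())
```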

    Plugin loader hardening. Two new env switches in plugin/loader.py: CHEETAHCLAWS_DISABLE_PLUGINS=1 (kill switch) and CHEETAHCLAWS_PLUGIN_ALLOWLIST=a,b,c (whitelist). EXTERNAL-scope plugins (loaded via $CHEETAHCLAWS_PLUGIN_PATH) print a one-time stderr warning on first load so a stolen env-var-set doesn't silently execute. Module path resolution now uses Path.resolve() + relative_to(install_dir) to confine a malicious manifest's "tools": ["../../etc/passwd_loader"] style entry.
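    The `Path.resolve()` + `relative_to(install_dir)` confinement and the two env switches can be sketched as below. The function names and the `.py` suffix convention are assumptions for the example, not the loader's real interface.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_tool_module(install_dir: Path, entry: str) -> Path:
    """Resolve a manifest "tools" entry, refusing anything outside install_dir."""
    root = install_dir.resolve()
    candidate = (root / f"{entry}.py").resolve()  # collapses any ".." segments
    try:
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"tool entry escapes plugin directory: {entry!r}") from None
    return candidate


def plugin_allowed(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Apply the kill switch first, then the optional comma-separated allowlist."""
    env = os.environ if env is None else env
    if env.get("CHEETAHCLAWS_DISABLE_PLUGINS") == "1":
        return False
    allowlist = env.get("CHEETAHCLAWS_PLUGIN_ALLOWLIST", "")
    if not allowlist.strip():
        return True  # no allowlist configured
    return name in {item.strip() for item in allowlist.split(",") if item.strip()}
```

    Resolving before comparing is the important step: a plain string-prefix check on the unresolved path would accept `install_dir/../../etc/...`.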

    MCP env sanitisation. cc_mcp/client.py:_sanitized_mcp_env strips a fixed set of process-hijack keys (LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT, DYLD_INSERT_LIBRARIES, DYLD_LIBRARY_PATH, PYTHONPATH, PYTHONSTARTUP, PYTHONHOME, PYTHONEXECUTABLE, NODE_OPTIONS, NODE_PATH, BASH_ENV, ENV) from any env map an .mcp.json config supplies. Dropped keys print a one-line stderr notice. Bypass: CHEETAHCLAWS_MCP_TRUST_ENV=1. Closes a real local-priv-esc path on a host with multiple MCP server configs of varying trust.
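    As a rough illustration of the filtering step (the key list is taken from the paragraph above; the function signature, the exact-case matching, and the notice wording are assumptions):

```python
import os
import sys
from typing import Dict, Mapping, Optional

HIJACK_KEYS = frozenset({
    "LD_PRELOAD", "LD_LIBRARY_PATH", "LD_AUDIT",
    "DYLD_INSERT_LIBRARIES", "DYLD_LIBRARY_PATH",
    "PYTHONPATH", "PYTHONSTARTUP", "PYTHONHOME", "PYTHONEXECUTABLE",
    "NODE_OPTIONS", "NODE_PATH", "BASH_ENV", "ENV",
})


def sanitized_mcp_env(env: Mapping[str, str],
                      trust: Optional[bool] = None) -> Dict[str, str]:
    """Drop process-hijack keys from a config-supplied env map."""
    if trust is None:
        trust = os.environ.get("CHEETAHCLAWS_MCP_TRUST_ENV") == "1"
    if trust:
        return dict(env)
    clean: Dict[str, str] = {}
    for key, value in env.items():
        if key in HIJACK_KEYS:
            # One line per dropped key so the operator can see what happened.
            print(f"mcp: dropped env key {key}", file=sys.stderr)
            continue
        clean[key] = value
    return clean
```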

    macOS daemon peer-cred. cc_daemon/auth.py:get_peer_uid now branches on sys.platform: Linux keeps SO_PEERCRED, macOS / *BSD goes through ctypes-loaded getpeereid(2). Closes a long-standing TODO that effectively reduced macOS Unix-socket auth to token-only (a stolen daemon-token implied full RCE without peer-uid validation).
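    A sketch of the platform branch, assuming a connected AF_UNIX socket. It mirrors the description above but is not the project's `get_peer_uid`; the struct layout (`pid_t`, `uid_t`, `gid_t` as 32-bit values) is the usual Linux `struct ucred` and is stated here as an assumption.

```python
import ctypes
import ctypes.util
import os
import socket
import struct
import sys


def get_peer_uid(sock: socket.socket) -> int:
    """Return the effective uid of the process on the other end of a Unix socket."""
    if sys.platform.startswith("linux"):
        # struct ucred { pid_t pid; uid_t uid; gid_t gid; }
        fmt = "iII"
        creds = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                                struct.calcsize(fmt))
        _pid, uid, _gid = struct.unpack(fmt, creds)
        return uid
    # macOS / *BSD: int getpeereid(int s, uid_t *euid, gid_t *egid)
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    uid = ctypes.c_uint32()
    gid = ctypes.c_uint32()
    if libc.getpeereid(sock.fileno(), ctypes.byref(uid), ctypes.byref(gid)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return uid.value
```

    A daemon would typically compare the result against `os.getuid()` before honouring the token, so a stolen token alone is not sufficient.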

    Smaller fixes folded in. Web JWT secret loader rewritten with O_CREAT | O_EXCL + 0o600 + post-write mode verification (refuses to read a world-readable secret file; auto-falls-back to in-memory secret if chmod can't be enforced; override with CHEETAHCLAWS_WEB_SECRET). Terminal one-time password lengthened from secrets.token_urlsafe(6)[:6] (6 base64url characters, 36 bits, online-bruteable) to secrets.token_urlsafe(32) (32 random bytes, 256 bits). cc_config.save_config strips permission_mode=accept-all before persisting, so once-confirmed escape hatches no longer outlive the session that set them. session_store.save_session wrapped in a module-level Lock + explicit BEGIN IMMEDIATE / ROLLBACK so two threads writing the same session_id no longer silently drop one set of changes. agent_runner.py err_msg initialised before the try block (defends against a NameError on first iteration if _handle_permission_request returns "error"); quota.QuotaExceeded matched by isinstance instead of class-name string. compaction.compact_messages wraps stream_auxiliary in try/except + falls back to the original messages instead of crashing the agent loop. providers._recover_args_from_text caps the regex scan window to the last 32 KB of accumulated text (was scanning ~100 KB+ on every tool call). context.get_git_info + get_claude_md get TTL caches (30 s / 10 s, keyed by cwd) so the per-turn git rev-parse / status / log and CLAUDE.md re-read stop showing up in profiles. cc_mcp/client.py reader loops use dict.pop() instead of in+index so a late response after a timeout doesn't race the request side. tool_registry._cache_key adds session_id dimension so a Read(/etc/...) cached for one session never leaks to another. session_store.search_sessions LIKE-fallback path escapes %, _ and \ before interpolation.
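    Of the items above, the LIKE escaping is the easiest to get subtly wrong, so here is a small self-contained sketch. The table and column names are hypothetical, and this version binds the pattern as a parameter rather than interpolating it; the escaping itself is the point.

```python
import sqlite3
from typing import List


def escape_like(term: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards so the term matches literally."""
    # Escape the escape character first, otherwise it would be doubled twice.
    return (term.replace(escape, escape + escape)
                .replace("%", escape + "%")
                .replace("_", escape + "_"))


def search_titles(conn: sqlite3.Connection, term: str) -> List[str]:
    pattern = f"%{escape_like(term)}%"
    rows = conn.execute(
        "SELECT title FROM sessions WHERE title LIKE ? ESCAPE '\\' ORDER BY title",
        (pattern,),
    )
    return [row[0] for row in rows]
```

    Without the escaping, a search for `100%` would also match `100 done`, and `a_b` would match `axb`.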

    Frontend XSS audit. Existing _esc (textContent-→-innerHTML) and _renderMd (HTML-tag-strip → marked) cover all user/model content paths. One deep-trust hole closed: web/static/js/settings.js:_renderModels previously injected server-supplied model names directly into an onclick="app.selectModel('${full}')" attribute — now uses data-model + a delegated click handler, so a malicious model registry entry cannot break out of the JS string literal.

    Defaults you can flip. CHEETAHCLAWS_BRIDGE_TERMINAL=0 hard-disables the bridge !cmd shell entirely (default 1; the shell is owner-bound by the chat_id whitelist either way). CHEETAHCLAWS_FS_NO_SANDBOX=1 lifts the credential denylist. CHEETAHCLAWS_DISABLE_PLUGINS=1 / CHEETAHCLAWS_PLUGIN_ALLOWLIST=… / CHEETAHCLAWS_MCP_TRUST_ENV=1 control plugin + MCP behaviour. Full reference in docs/guides/security.md. All 12 CRITICAL + 10 HIGH items from the review are now closed. 4 of those 22 turned out to be review misjudgements: _all_errors init, the permission double-answer race, the _broadcast iter race, and "shell injection in REPL !command", which was reclassified as user-typed input rather than RCE (the QuotaExceeded class-name check next to it was a real fix). Architecture refactor items (cheetahclaws.py / providers.py God-object split, sentinel state machine) are deliberately left for a separate decision: they're shape changes, not bug fixes.

  • May 12, 2026 (daemon/f-4-followups-f-6-9 branch): Daemon foundation roadmap finished — all nine F-1…F-9 items in RFC 0002 now LANDED. Closes the remaining four scope items end-to-end (≈1500 LoC of code + ≈900 LoC of tests + docs). Drilldown:

    F-4 #2 — Bridge notify forwarding. The subprocess-runner reader loop's notify IPC branch used to drop the payload on the floor (F-6/7/8 didn't exist yet). Now it routes through cc_daemon.bridge_supervisor.notify(kind, text). The runner can target a specific bridge via msg["bridge"] (e.g. "telegram") or omit it for a "*" broadcast. agent_runner_notify events on the bus carry {name, run_id, bridge, delivered, text[:500]} so observers can audit deliveries. Empty-text frames are silently dropped (common during agent shutdown).

    F-4 #3 — Restart policy. New RestartPolicy dataclass: mode (none | on-crash), max_restarts, backoff_base_s, backoff_cap_s, backoff_jitter_s. Frozen + a pure next_delay(restart_count) so the decision matrix is unit-testable. agent.start accepts the five fields flat (validation rejects cap < base which would clamp every attempt down to a useless ceiling). On a crash the reader's finally arms a threading.Timer(delay, _do_restart, ...); the Timer respawns via a swappable spawner hook (_RESTART_SPAWNER for tests) and carries restart_count forward. stop() cancels the Timer before the kill ladder, and the same _unregister(name, expected=handle) identity check protects against a Timer-fired respawn racing past a deliberate stop. Bus events: agent_runner_restart_scheduled, agent_runner_restart, agent_runner_restart_failed, agent_runner_restart_exhausted.
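    A frozen policy with a side-effect-free delay function might look like the sketch below. The field names come from the paragraph above; the default values, the exponential `base * 2**n` schedule, and the injectable `rng` for jitter are assumptions, since the notes do not state the exact formula.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class RestartPolicy:
    mode: str = "none"            # "none" | "on-crash"
    max_restarts: int = 3         # assumed default
    backoff_base_s: float = 1.0   # assumed default
    backoff_cap_s: float = 60.0   # assumed default
    backoff_jitter_s: float = 0.0

    def __post_init__(self) -> None:
        if self.mode not in ("none", "on-crash"):
            raise ValueError(f"unknown restart mode: {self.mode!r}")
        if self.backoff_cap_s < self.backoff_base_s:
            # A cap below the base would clamp every attempt to the cap.
            raise ValueError("backoff_cap_s must be >= backoff_base_s")

    def next_delay(self, restart_count: int,
                   rng: Callable[[], float] = random.random) -> Optional[float]:
        """Seconds to wait before the next respawn, or None to give up."""
        if self.mode != "on-crash" or restart_count >= self.max_restarts:
            return None
        delay = min(self.backoff_base_s * (2 ** restart_count), self.backoff_cap_s)
        return delay + self.backoff_jitter_s * rng()
```

    With `backoff_jitter_s=0` the function is deterministic, which is what makes the decision matrix straightforward to unit-test.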

    F-6 / F-7 / F-8 Phase 1 — Telegram / Slack / WeChat in daemon. Single cc_daemon/bridge_supervisor.py owns lifecycle for all three kinds, gated per-bridge by feature flags (CHEETAHCLAWS_ENABLE_F6/7/8, default off, REPL is byte-for-byte unchanged until the operator opts in). The Phase 1 worker invokes today's bridges/<kind>.py:_<kind>_supervisor unchanged — same HTTP code, same reconnect/backoff, just owned by a daemon thread instead of a REPL one. Outbound bridge.notify(kind, text) dispatches via the per-kind sender (_tg_send / _slack_send / _wx_send); F-4 #2 plugs straight into it. Persistence in the F-2 bridges SQLite table (kind, enabled, config_json with secrets redacted, last_poll_at, last_error); bridge.list merges live workers with rows from previous daemon runs so disabled bridges remain visible in daemon status. Wire surface: bridge.{start,stop,list,send,status} RPCs in cc_daemon/bridge_methods.py. F-7 depends on F-6 (shared scaffolding); F-8 the same. WeChat keeps a clear-error path for missing token/base_url since the QR-login handshake is still REPL-driven (/wechat login).

    F-6 Phase 2 — Inbound refactor. When bridge.start daemon_phase2=True is passed, the legacy supervisor is bypassed for a slim daemon-driven loop: (a) outbound subscriber on the event bus, filters session_outbound events by session_id (tg:<chat_id> / sl:<channel> / wc:<user_id>) + target_bridges, calls handle.sender for delivery; (b) per-kind inbound poller (_phase2_telegram_inbound / _phase2_slack_inbound / _phase2_wechat_inbound) that re-uses today's HTTP helpers but publishes session_inbound on every new phone message instead of calling session_ctx.run_query. The agent driver — REPL, Web, or a future automation client — subscribes to session_inbound, runs the agent, calls session.reply(session_id, text, target_bridges?) for outbound chunks. Three new RPCs in cc_daemon/session_methods.py: session.send, session.reply, session.list_recent. Permission requests born inside a bridge-driven turn route only back to the originating bridge via the existing PermissionStore originator stamp (<kind>:<session_id>).

    F-9 — Cost-guardrail defaults + per-runner quota-pause. Headless cheetahclaws serve now sets four conservative defaults (session_token_budget=200_000, session_cost_budget=$2, daily_token_budget=2_000_000, daily_cost_budget=$20) via _apply_serve_defaults; REPL --in-process keeps None (unlimited) for back-compat. New system.status RPC returns {budgets, runners, bridges} so daemon status prints the live ceilings. agent.resume(budget_overrides, name?) merges overrides into daemon_state.config and (when name is supplied) calls runner_supervisor.resume(name) to deliver a resume IPC frame to a paused runner. The hook itself: a new pre-iter quota.check_quota raises into _on_quota_exceeded; the base impl is a no-op (REPL keeps today's behaviour where agent.run catches internally and yields a quota text), while _PipeAgentRunner overrides it to ship a paused_budget IPC frame, set status, and block on _resume_event.wait(). Supervisor reader publishes quota_warn + flips agent_runs.status='paused_budget'. On resume, runner sends resumed IPC, supervisor publishes agent_runner_resumed + flips status back to running. Control loop's stop handler also sets _resume_event so a stop arriving while paused unblocks cleanly.
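    The pause/resume handshake can be illustrated with a `threading.Event`. This is a simplified sketch: `PausableRunner`, the frame dictionaries, and the `send` callback are hypothetical stand-ins for `_PipeAgentRunner` and its IPC channel.

```python
import threading
from typing import Callable, Dict


class PausableRunner:
    def __init__(self, send: Callable[[Dict[str, str]], None]) -> None:
        self._send = send
        self._resume_event = threading.Event()
        self._stopping = False
        self.status = "running"

    def on_quota_exceeded(self) -> bool:
        """Block until resumed. Returns False if a stop arrived while paused."""
        self.status = "paused_budget"
        self._send({"type": "paused_budget"})
        self._resume_event.wait()
        self._resume_event.clear()  # re-arm for the next pause
        if self._stopping:
            self.status = "stopped"
            return False
        self.status = "running"
        self._send({"type": "resumed"})
        return True

    def resume(self) -> None:
        self._resume_event.set()

    def stop(self) -> None:
        # Setting the same event lets a stop unblock a paused runner cleanly.
        self._stopping = True
        self._resume_event.set()
```

    Sharing one event between `resume()` and `stop()` avoids a second wait primitive; the runner distinguishes the two cases by checking the stop flag after it wakes.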

    Post-implementation audit fixed 5 real bugs in the new code. (1) _phase2_wechat_inbound used wrong field names (messages / fromUserName / msgId / syncKey instead of msgs / from_user_id / message_id / get_updates_buf per bridges/wechat.py:411). (2) _phase2_slack_inbound initialized cursor to None, so the first poll would replay the channel's recent backlog — fixed to seed at current wall-clock time (matches bridges/slack.py:_slack_poll_loop). (3) _phase2_telegram_inbound long-polled with timeout=25 s, meaning stop() had to wait up to 25 s for the HTTP call to return before observing stop_event — dropped to 5 s. (4) _unregister(name) was identity-blind; a Timer-fired _do_restart racing with stop() could see its freshly-spawned successor handle silently popped (orphaning the subprocess). Added an optional expected=handle identity check applied at every terminal stop site (runner_supervisor + bridge_supervisor have the symmetric fix). (5) _safe_cfg only matched token / secret keys; since bridge.start merges daemon_state.config into the bridge config, provider API keys (anthropic_api_key, etc.) and password / auth_* fields could bleed through to bridges SQLite rows and SSE events — extended to (token, secret, api_key, apikey, password, passwd, auth). Two new regression tests pin both.
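    Fix (5) amounts to widening a key filter. A sketch of substring-based redaction using the marker list from the paragraph above; the recursion into nested values, the case-insensitive match, and the `"***"` placeholder are assumptions.

```python
from typing import Any

SENSITIVE_MARKERS = ("token", "secret", "api_key", "apikey",
                     "password", "passwd", "auth")
REDACTED = "***"


def _is_sensitive(key: Any) -> bool:
    lowered = str(key).lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)


def safe_cfg(value: Any) -> Any:
    """Return a copy of a config structure with sensitive values redacted."""
    if isinstance(value, dict):
        return {k: (REDACTED if _is_sensitive(k) else safe_cfg(v))
                for k, v in value.items()}
    if isinstance(value, list):
        return [safe_cfg(v) for v in value]
    return value
```

    Substring matching errs on the side of over-redaction (a key named `author` would also be masked), which is usually the safer failure mode for rows and events that leave the process.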

    Full repo suite (three independent runs): 2347 passing, 3 skipped (env-gated live LiteLLM tests), 0 failed, ~3:32 each. ~90 new daemon-specific tests across test_cc_daemon_runner_{restart_policy,notify_routing,quota_pause}.py, test_cc_daemon_{bridge_supervisor,bridge_methods,bridge_phase2,session_methods,f9_budgets}.py. RFC 0002 + docs/architecture.md §Daemon updated to reflect all of F-1 → F-9 landed. Details: RFC 0002.

  • May 12, 2026 (fix/litellm-provider-followup branch): litellm/ provider follow-up to PR #119: make litellm a real optional dep, fix ledger / streaming, and wire it into the CLI / Web UI path. PR #119 (RheagalFire) introduced cc_kernel/runner/llm/litellm_provider.py so CheetahClaws could route to 100+ LLM providers behind one SDK, but a careful re-review against the merge surfaced four classes of integration gap that the 12 mocked unit tests didn't catch. The follow-up branch (fix/litellm-provider-followup, 2 commits, 9 files, +1093/-229) fixes all of them and lands the docs the original PR was missing.

    (1) Dependency classification: description said optional, diff put it in core. Pyproject's [project] dependencies had grown a litellm>=1.60.0,<2.0.0 line, and requirements.txt's core block matched; every pip install cheetahclaws was force-pulling litellm and its transitive chain (tokenizers, tiktoken, pinned pydantic versions). Moved to [project.optional-dependencies] under a new litellm extra, also added to all; requirements.txt now only documents the optional install via a comment. Backed up by a test_litellm_is_optional_dependency regression.

    (2) Not reachable through either user path. cc_kernel/runner/llm/__main__.py:_select_provider only knew mock / scripted / anthropic, and the top-level providers.PROVIDERS registry (which the CLI + Web UI consult to resolve --model <X>) had no litellm entry at all, so end-to-end the new class was reachable only by direct Python import. Added a litellm branch to _select_provider (reads CC_LLM_API_KEY as an optional explicit override), a PROVIDERS["litellm"] entry with type: "litellm", and a new stream_litellm() generator in providers.py mirroring stream_openai_compat's shape: it yields TextChunk per delta, then AssistantTurn at the end. The dispatcher in providers.stream() branches on prov["type"] == "litellm". bare_model("litellm/openai/gpt-4o") strips only the leading litellm/ segment (up to the first /), leaving openai/gpt-4o, which is exactly what litellm.completion(model=...) expects.

    (3) Streaming silently zeroed the ledger. stream() returned tokens_input=0, tokens_output=0, tool_calls=(), finish_reason="stop" unconditionally. The kernel runner emits charge IPC messages from those fields and gates RFC 0022 tool dispatch on response.is_tool_use, so every streamed call bypassed quota and lost any tool_use the model emitted. Fix passes stream_options={"include_usage": True} to litellm.completion and reassembles the chunk list with litellm.stream_chunk_builder(chunks, messages=...) so the synthesized final response carries real token counts, tool_calls, and finish_reason. Two regression tests pin the contract (test_stream_emits_deltas_and_returns_usage, test_stream_preserves_tool_calls); a third (test_cost_unknown_set_when_chunk_builder_fails) covers the fallback when the builder returns None on very old litellm versions.

    (4) cost_micro hard-coded to 0: quota free pass. Both __call__ and stream() returned cost_micro=0 regardless of model. Switched to litellm.completion_cost(completion_response=resp, model=model) which uses litellm's per-model price table (covers 100+ providers, kept in sync upstream); convert USD → micro-USD via the same * 1_000_000 factor AnthropicProvider uses. On completion_cost raising (unknown model) or returning None, the response carries metadata["cost_unknown"]=True so the ledger can distinguish a real $0 (Ollama, free NIM tier) from an unpriced call.

    Exception mapping. try: ... except Exception: raise ProviderUnavailable(...) swallowed every error class into "their fault": 401s, malformed requests and connection timeouts all looked the same to the runner. New _map_exception reads self._litellm.exceptions.{AuthenticationError, BadRequestError, NotFoundError, UnsupportedParamsError} and re-raises those as ProviderInvalidRequest ("your fault"); everything else stays ProviderUnavailable so the runner may retry. Reads exception classes off the already-imported self._litellm module (instead of from litellm import exceptions) so the mapper stays testable without a real SDK installed.

    Lazy import. Top-level import litellm violated the module-level contract in cc_kernel/runner/llm/__init__.py ("imported lazily so the absence of an SDK doesn't break this module's import"): every place that imported the runner's LLM package was implicitly importing litellm. Refactored to an _ensure_litellm() first-use pattern matching AnthropicProvider._ensure_client, with a test_module_imports_without_litellm that strongly verifies the property (the local dev env doesn't have litellm installed, and the test passes).

    Self-review caught 5 more bugs before pushing. (a) _parse_tool_calls called tc.function.name outside the try block, so a malformed tool_call with function=None would crash the whole response instead of the single bad call; fixed by getattr chain + continue-on-empty-name. (b) json.loads("null") and json.loads("[1,2]") return None / list, which trip LlmResponse.__post_init__'s isinstance(tc["input"], dict) validator; fixed by coercing non-dict to {}. (c) Same JSON-non-dict bug in providers.stream_litellm's streaming tool-call assembly; same isinstance guard. (d) The streaming fallback (when stream_chunk_builder returns None) emitted metadata={} instead of {"cost_unknown": True}, breaking ledger consistency. (e) tests/e2e_litellm_provider.py's fixture's try/except ImportError was dead code once the import was lazy: it would confusingly fail on real assertions rather than pytest.skip if CC_LITELLM_E2E=1 was set on a box without litellm. Replaced with an explicit _ensure_litellm() probe + pytest.skip on ProviderUnavailable. 6 new defensive tests pin all five fixes.

    Tests. 23 unit tests in tests/test_litellm_provider.py (was 12 mocked-only), covering lazy import, registry wiring (both _select_provider and providers.PROVIDERS), cost computation with cost_unknown fallback, streaming usage + tool_calls preservation, exception class mapping (AuthenticationError → ProviderInvalidRequest), and 6 defensive tool-call parsing regressions. New tests/e2e_litellm_provider.py mirrors the 3 live-API tests the PR body claimed but never committed (basic call, streaming, system prompt steering); skipif-gated on CC_LITELLM_E2E=1 AND per-provider credentials so CI / dev runs don't accidentally bill. Full non-e2e suite: 2222 / 2222 passing, zero regressions (up from 2154 baseline).

    Docs. New section in docs/guides/recipes.md under Section 1, between the vLLM/custom/ walkthrough and Section 2. It covers Bedrock SigV4, Azure deployment routing, Vertex service-account JWTs with concrete env-var setup, plus a 5-row troubleshooting table mirroring the existing vLLM one (litellm not installed, drop_params masking, cost_unknown semantics, Bedrock 401 region mismatch, Azure 403 stale api_version). README gains a pip install ".[litellm]" line in Optional extras, three Supported Models table rows (Bedrock / Azure / Vertex via litellm), and a dedicated LiteLLM (AWS Bedrock / Azure / Vertex AI) subsection under Closed-Source API Models with concrete invocation examples and an explicit pointer toward custom/ for plain OpenAI-shaped endpoints so users don't pull litellm when they don't need it. i18n READMEs (CN/JP/ES/DE/PT) intentionally left for the maintainer's translation cadence.

    Branch: fix/litellm-provider-followup (2 commits: abc3357 code + tests + recipes, f5f364d README), open for review against main.
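    Three of the small rules in this entry (prefix stripping, coercing non-dict tool arguments, and the cost_unknown marker) are pure functions and can be sketched without litellm installed. The helper names other than `bare_model` are hypothetical, and treating undecodable JSON as `{}` is an assumption; the notes only describe the non-dict case.

```python
import json
from typing import Any, Dict, Optional, Tuple


def bare_model(model: str) -> str:
    """Strip only the leading provider segment, e.g. "litellm/"."""
    _prefix, sep, rest = model.partition("/")
    return rest if sep else model


def parse_tool_input(raw: Optional[str]) -> Dict[str, Any]:
    """Decode tool-call arguments, coercing anything that is not a dict to {}."""
    try:
        value = json.loads(raw) if raw else {}
    except (TypeError, ValueError):
        return {}  # assumption: undecodable arguments are treated as empty
    return value if isinstance(value, dict) else {}


def cost_fields(cost_usd: Optional[float]) -> Tuple[int, Dict[str, bool]]:
    """Convert USD to micro-USD, flagging an unpriced call as cost_unknown."""
    if cost_usd is None:
        return 0, {"cost_unknown": True}
    return round(cost_usd * 1_000_000), {}
```

    Keeping `cost_unknown` separate from a cost of zero is what lets a ledger tell a genuinely free call from one whose price could not be looked up.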

  • May 11, 2026 (daemon/f-4 branch): F-4 skeleton: agent_runner becomes a supervised subprocess (RFC 0002). The fourth piece of the daemon foundation roadmap lands as a feature-flagged skeleton on the daemon/f-4 branch. Today each /agent <template> runner lives in a Python thread inside the REPL / web server process, so one rogue runner can OOM-kill or hang the whole thing. F-4 makes each runner its own python -m agent_runner --pipe subprocess under cc_daemon supervision so a leak, infinite loop, segfault, or kill -9 on the runner becomes an observable event (agent_runner_crash on the daemon event bus) instead of a process-wide failure.

    Components. (1) cc_daemon/runner_supervisor.py (~650 LoC): start / stop / stop_all / get / list_all, 3-phase stop (IPC stop → SIGTERM after 2 s → SIGKILL after another 3 s, bounded ≤ 5 s as required by the RFC acceptance criteria), background reader thread per runner pumping iteration_done / permission_request / notify / log IPC messages, crash classification on EOF, and best-effort writes to F-2's agent_runs + agent_iterations SQLite tables (INSERT OR IGNORE makes iteration re-delivery idempotent; last_iteration UPDATE never regresses). (2) cc_daemon/runner_ipc.py: thin re-export of cc_kernel.runner.ipc.JsonLineChannel so the kernel-side and daemon-side runners share one wire-format implementation (avoids the duplicate-fix-twice trap). (3) cc_daemon/agent_methods.py: four JSON-RPC methods agent.start / agent.stop / agent.list / agent.status registered alongside the F-3 monitor.* family, with full param validation (TypeError → -32602 INVALID_PARAMS via cc_daemon.rpc). (4) agent_runner.py gains a --pipe entry point: _pipe_main reads init from stdin, builds a _PipeAgentRunner subclass that bridges send_fn → IPC notify and _persist_record → IPC iteration_done, then drives the existing _run_loop body so all stagnation-detection / circuit-breaker / dup-summary logic from the threaded path is preserved unchanged. (5) start_runner / stop_runner / stop_all now dispatch on agent_runner_subprocess config key or CHEETAHCLAWS_ENABLE_F4=1 env var; default off, Windows always thread-mode.

    Self-review caught and fixed 3 real bugs before pushing: (a) reader-thread race (started before _register + DB insert) reordered; (b) malformed-message orphan (a null iteration field unwound the reader → finally classified crashed but subprocess kept running), fixed by wrapping per-message dispatch in try/except + hard-kill in finally if proc still alive; (c) pre-handshake log+exit IPC on template-not-found that supervisor misread as the ready reply, switched to stderr + non-zero exit so the handshake EOF surfaces a clean error.

    Tests: 27 new (test_cc_daemon_runner_supervisor.py 19 + test_cc_daemon_agent_methods.py 10), covering handshake, graceful stop ≤ 5 s, SIGKILL escalation on hung runner, external SIGKILL crash detection, IPC shim identity, 9 SQLite persistence cases incl. duplicate-delivery idempotency, 2 malformed-input safety-net regressions, RPC param validation for all 4 methods, end-to-end list → status → stop with an inline runner. 104 / 104 passing across F-4 + cc_daemon + cc_kernel + existing agent_runner tests, zero regressions.

    Still TODO before flipping from "skeleton" to "MERGED": permission routing through cc_daemon/permission.py (currently auto-approves), bridge notify forwarding (waiting on F-6/7/8), restart policy, e2e test with the real python -m agent_runner against a tiny template. Branch: daemon/f-4. RFC: docs/RFC/0002-daemon-foundation-roadmap.md.
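    The 3-phase stop can be sketched with `subprocess` alone. This is an illustration of the escalation ladder, not the supervisor's code: `stop_runner` here is a hypothetical free function, and `send_stop` stands in for writing the IPC stop frame.

```python
import subprocess
import sys
from typing import Callable


def stop_runner(proc: subprocess.Popen,
                send_stop: Callable[[], None],
                term_after: float = 2.0,
                kill_after: float = 3.0) -> str:
    """Stop a child in three phases and report which phase ended it."""
    try:
        send_stop()  # phase 1: ask politely over IPC
    except OSError:
        pass  # pipe already closed; fall through to signals
    try:
        proc.wait(timeout=term_after)
        return "graceful"
    except subprocess.TimeoutExpired:
        pass
    proc.terminate()  # phase 2: SIGTERM on POSIX
    try:
        proc.wait(timeout=kill_after)
        return "sigterm"
    except subprocess.TimeoutExpired:
        pass
    proc.kill()  # phase 3: SIGKILL on POSIX
    proc.wait()
    return "sigkill"
```

    With the defaults the total wait before SIGKILL is 2 s + 3 s, which is where the 5 s bound comes from. For example, `stop_runner(proc, lambda: None, term_after=0.2)` on a child that ignores the IPC request escalates to the terminate phase.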