
Releases: SafeRL-Lab/cheetahclaws

What Changed

07 Jun 05:58
99a3b4a


  • June 5, 2026 (v3.05.82) (latest): User-controllable token / cost budgets — set a spend cap; on hit the session auto-saves and you can resume or raise it. The quota engine (quota.py: per-session + per-day token/cost counters, enforced before each model call) already existed but had no friendly surface — you had to know four config keys (session_token_budget / session_cost_budget / daily_token_budget / daily_cost_budget) and there was no way to see how close you were, no warning before the wall, and the hard stop printed a bare [Quota exceeded]. This adds the UX layer on top of the unchanged engine: a /budget command — no args shows usage vs every budget as colored bars + percentages; /budget $5 sets a session cost cap (the $ means USD), /budget 200k a session token cap (parses 200k / 1.5m / 200000), /budget daily $20 / /budget daily 2m the daily caps, and /budget clear removes all. A --budget $5 / --budget 200k startup flag sets the session cap at launch. Proximity warnings fire at the end of any turn that crosses ≥80% (yellow) / ≥95% (red) of a cap, so the wall never arrives by surprise. On hit the agent now yields a QuotaPause event (instead of a plain text line): the REPL auto-saves the session (session_latest.json + daily backup, the same path /resume reads) and prints a friendly next-steps block — raise the same cap or remove it (/budget clear) then resend, or restart later and /resume. So a long task that runs out of budget is never lost: you analyze, adjust, and continue. Tight enforcement (no surprise overshoot): the check projects the next request's input (compaction.estimate_tokens) and stops before the call if it would cross the cap, and clamps that call's max_tokens to the remaining headroom (quota.output_room) — so a single tool-heavy turn can't blow 40k→49k past the budget the way a pure "already-spent ≥ limit" check let it. 
One budget per scope: setting a cap replaces the other unit for that scope (/budget $5 after /budget 200k switches the session cap to cost rather than stacking), so a leftover token cap can't silently keep blocking after you switch to a $ cap. Unit-matched hint: QuotaExceeded / QuotaPause carry which cap broke (key/scope/unit/limit), so the "raise it" suggestion is in the right unit — a token cap shows /budget 40k, a daily cost cap shows /budget daily $40 — instead of a generic $ amount that wouldn't lift a token cap. New helpers quota.parse_budget / fmt_amount / usage_vs_limits / warnings / output_room; command in commands/core.py:cmd_budget; QuotaPause in agent.py; REPL handling + --budget in cheetahclaws.py; 42-case tests/test_budget.py (isolated quota dir, incl. a regression that the hint matches the breached unit and that switching units clears the stale cap). The daemon's conservative serve-mode defaults (200k tok / $2 per session, 2M / $20 per day) are unchanged — interactive stays unlimited by default, the server stays guard-railed. See docs/guides/features.md · docs/guides/reference.md.
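The budget syntax and the headroom clamp described above can be sketched as follows. This is a minimal illustration, not the project's `quota.parse_budget` / `quota.output_room`: the release notes name those helpers but do not show their signatures, so the argument lists and return shapes here are assumptions.

```python
import re

_SUFFIX = {"": 1, "k": 1_000, "m": 1_000_000}

def parse_budget(text: str):
    """Return (unit, amount): ("cost", 5.0) for "$5", ("tokens", 200000) for "200k"."""
    s = text.strip().lower()
    if s.startswith("$"):
        return ("cost", float(s[1:]))  # the $ prefix means USD
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([km]?)", s)
    if m is None:
        raise ValueError(f"unrecognised budget: {text!r}")
    return ("tokens", int(float(m.group(1)) * _SUFFIX[m.group(2)]))

def output_room(limit: int, spent: int, projected_input: int, max_tokens: int) -> int:
    """Tokens the next call may generate; 0 means stop before the call (assumed contract)."""
    headroom = limit - spent - projected_input
    return max(0, min(max_tokens, headroom))
```

With a 40k cap, 30k already spent and an 8k projected input, the call's output is clamped to the remaining 2k rather than the configured `max_tokens`.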
  • June 5, 2026 (v3.05.82): Adaptive Markdown streaming — live output that stays correct on every device. In-place Rich Live redraw is great on capable terminals but breaks elsewhere: it was disabled wholesale over SSH (so SSH users got raw tokens with no formatting), and where it did run it could leave duplicate or stale frames — on macOS Terminal (which can't erase above the scroll boundary), over laggy network PTYs, or with wide CJK / emoji text whose display width a naive line-count gets wrong. The renderer now selects a streaming tier per device in ui.render.auto_stream_mode(config): live — full in-place redraw, only on terminals known to handle cursor-up (local TTYs, and modern emulators even over SSH: iTerm2, WezTerm, Windows Terminal, VSCode, kitty, Alacritty, Ghostty, detected via TERM_PROGRAM / TERM / WT_SESSION / KITTY_WINDOW_ID / ALACRITTY_WINDOW_ID / WEZTERM_PANE); commit — append-only progressive Markdown, the safe default for unknown-SSH / Apple Terminal / pipes / non-TTY, where each completed block (split on blank lines, respecting open code fences so a fenced block renders atomically) is rendered and printed permanently and the cursor is never moved, making a duplicate frame structurally impossible regardless of terminal, latency, or character width; plain — raw tokens, only when rich is unavailable. The append-only floor is provably duplication-free; live is progressive enhancement on top. Override with /config stream_mode=live|commit|plain (legacy boolean /config rich_live=true|false still works → live/commit). Implemented in ui/render.py (set_stream_mode / auto_stream_mode / _safe_commit_point / _commit_stream / _commit_flush), wired in at REPL start in cheetahclaws.py, with a 26-case test suite in tests/test_stream_modes.py (device routing, code-fence-aware block boundaries, append-only commit, and a regression asserting commit mode emits zero cursor sequences even on a TTY with CJK text). 
Two related UX items shipped alongside: /context is now a visual grid — a Claude-Code-style 20×10 cell grid of context-window usage, colored and broken down by category (system prompt / system tools / memory files / skills / messages / free space) with per-category token counts and percentages, adapting to the model's real context window and falling back to #/. on non-UTF-8 terminals (commands/core.py:cmd_context); and deepseek-v4-flash is registered at its 1M context window in providers._MODEL_CONTEXT_LIMITS (overriding the 128K deepseek provider default, which still applies to deepseek-chat / deepseek-v4-pro), so the prompt %, /context, and the compaction trigger all reflect the true 1M window. See docs/guides/features.md · docs/guides/reference.md.
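The commit tier's block boundary rule (split on blank lines, never inside an open code fence) can be sketched like this. It is an illustration of the idea only: the function name echoes `_safe_commit_point` from the notes, but the signature and return value are assumptions.

```python
FENCE = "`" * 3  # Markdown code-fence marker

def safe_commit_point(buffer: str) -> int:
    """Index up to which `buffer` can be rendered and printed permanently.

    A boundary is a blank line seen while no code fence is open, so a
    fenced block is only ever committed as a whole.
    """
    in_fence = False
    commit = 0  # end index of the last block that is safe to print
    pos = 0
    for line in buffer.splitlines(keepends=True):
        if not line.endswith("\n"):
            break  # trailing partial line: still streaming
        pos += len(line)
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
        elif not line.strip() and not in_fence:
            commit = pos
    return commit
```

Everything before the returned index is printed once and never revisited; the remainder stays in the buffer until more tokens arrive, which is why no cursor movement is needed.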

What Changed

05 Jun 05:59
cb282bf


  • June 4, 2026 (v3.05.81) (latest): Claude-Code-style quiet output — hide tool execution, show one summary line per turn. Long analysis turns used to scroll the terminal with a ⚙ Bash(...) line and a ✓ → N lines (… chars) line for every tool call, and the permission prompt dumped the entire inline script (e.g. a 60-line python3 << 'PYEOF' heredoc). A new quiet mode (on by default) suppresses the per-tool lines — the spinner conveys live activity and a single summary line is emitted at the tool→text boundary, sitting just above the reply (Read 2 files, ran 3 shell commands), the way Claude Code does. Errors and denials still surface so a mid-turn failure is never silent. In quiet mode the permission prompt also collapses a multi-line command to one line (Run: python3 << 'PYEOF' … (+59 行)) instead of printing the whole script. /verbose overrides quiet (full per-tool lines + inputs + token counts); toggle with /quiet, or launch with --show-tools (alias --no-quiet). The startup banner gains an Output: quiet / Output: full line so the active mode is visible at a glance. Live status line: the spinner now shows elapsed time plus a running output-token estimate (Thinking… (7s · ↓ 435 tokens)) — char-based, since providers only report real usage at the end — and each quiet turn closes with a real-usage footer ✻ Worked for 7.2s · ↑ 1.2k · ↓ 435 built from the true TurnDone counts. Implemented in ui/render.py (turn-level tool accumulator + turn_summary_line(), spinner token meter, print_turn_stats()), wired through the REPL event loop in cheetahclaws.py, with the /quiet toggle in commands/config_cmd.py. See docs/guides/features.md.
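A turn-level accumulator that collapses per-tool lines into one summary might look like the sketch below. The phrase table and the "other tools" fallback are hypothetical; only the example output format (Read 2 files, ran 3 shell commands) comes from the notes.

```python
from collections import Counter

# Hypothetical phrase table: tool name -> (verb, singular noun, plural noun)
_PHRASES = {
    "Read": ("read", "file", "files"),
    "Bash": ("ran", "shell command", "shell commands"),
    "Grep": ("searched", "pattern", "patterns"),
}

def turn_summary_line(tool_calls: list[str]) -> str:
    """Collapse the tool names used in one turn into a single summary line."""
    counts = Counter(tool_calls)
    parts = []
    for tool, (verb, one, many) in _PHRASES.items():
        n = counts.get(tool, 0)
        if n:
            parts.append(f"{verb} {n} {one if n == 1 else many}")
    other = sum(n for t, n in counts.items() if t not in _PHRASES)
    if other:
        parts.append(f"used {other} other {'tool' if other == 1 else 'tools'}")
    line = ", ".join(parts)
    return line[:1].upper() + line[1:]  # capitalise only the first letter
```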

  • June 4, 2026: Context-window override — the prompt % and compaction now follow a settable context length. The prompt's context-usage % (and the compaction trigger) derive from the model's context window, which previously could only be a hardcoded provider default — and max_tokens (the OUTPUT cap) doesn't change it, so /config max_tokens=… left the % unchanged (a common point of confusion). New per-session key context_window (/config context_window=<N>, 0 = model default) overrides it, kept deliberately distinct from max_tokens. A single parser (providers.context_window_override) feeds the prompt %, /context, the compaction trigger, and the per-call output-token cap, so all four stay consistent; it is bidirectional — a smaller value forces earlier compaction, a larger value corrects a stale default. The value is read live each prompt, so switching model or context_window updates the % with no restart. /config warns when the value exceeds the model's real window (which would disable compaction and let the API reject oversized prompts). No-op when unset, so existing behavior is unchanged. See docs/guides/reference.md.

  • June 4, 2026: Rich Live streaming — long responses stay live via a bounded tail window. Large streamed responses that would overflow the terminal's redraw area could leave duplicate or stale frames behind on some emulators (macOS Terminal, etc.), because Rich Live redraws the whole accumulated output in place and the cursor can't reach content that has scrolled into the scrollback. Building on the per-response fallback from PR #133, Rich Live now keeps the live region bounded to the viewport: a short response is shown in full, but once it would overflow, only the last screenful of rendered lines (a tail window) is redrawn — so the Live region can never exceed the terminal and cannot leave stale frames. The complete output is committed once when the response finishes (including on Ctrl-C, since the REPL flushes on interrupt), so the head that scrolled out of the window is never lost. Plain streaming is kept only as a safety net (precise render failed, or the terminal is too small to bound a window). A cheap per-line wrap estimate short-circuits the expensive full render_lines() measurement while a response stays well under the limit, so normal responses pay no extra Markdown re-render per chunk. Adds focused tests covering full-frame streaming, the full→tail transition, tail-window commit-on-flush, real Segments rendering, and both safety-net fallbacks. See docs/guides/features.md.
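The bounded tail window reduces to a slice over the rendered lines. The helper below is a hypothetical stand-in for the logic in the renderer, assuming the lines have already been wrapped to the terminal width.

```python
def live_frame(rendered_lines: list[str], viewport_rows: int) -> list[str]:
    """Lines to redraw in place: everything if it fits, else the last screenful."""
    if viewport_rows <= 0:
        return []  # terminal too small to bound a window; caller falls back to plain
    if len(rendered_lines) <= viewport_rows:
        return rendered_lines
    return rendered_lines[-viewport_rows:]
```

Because the returned frame never exceeds `viewport_rows`, the redraw never has to reach content that has scrolled into the scrollback; the full output is printed once when the response finishes.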

  • May 31, 2026: QQ bot bridge — /qq connects cheetahclaws to QQ groups + C2C private chats (PR #121). Uses the official qq-botpy WebSocket + HTTP SDK (pip install "cheetahclaws[qq]"). botpy's async client runs on a dedicated asyncio event loop inside a daemon thread, bridged to the synchronous main thread via thread-safe queues. Handles on_group_at_message_create (group @-mentions, prefix stripped) and on_c2c_message_create (private). Since QQ has no message-edit API, replies stream as new messages every ~2 s (2000-char chunking) instead of updating a placeholder; passive replies reference the original msg_id/event_id within QQ's 5-minute window, then fall back to active pushes. Per-target FIFO job queues, slash-command passthrough, !jobs/!retry/!cancel remote control, image input, and permission prompts scoped to the originating chat (no cross-chat approvals). A supervisor reconnects with exponential backoff (2 s → 120 s). Secret handling matches the hardening standard below: $QQ_SECRET (recommended) > REPL arg (deprecated, warns + scrubs history) > config; env-supplied secrets never touch ~/.cheetahclaws/config.json. /qq <appid>, /qq, /qq stop|status|logout. Two follow-up fixes over the original PR: image downloads moved off the event loop into loop.run_in_executor (a blocking urlopen would freeze the WebSocket heartbeat for up to 30 s), and the secret no longer gets written to disk unconditionally. See docs/guides/bridges.md.
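Two mechanical pieces of the bridge, the 2000-character chunking and the 2 s → 120 s reconnect backoff, can be sketched as below. The function names are hypothetical, and the doubling factor is an assumption; the notes only state that the backoff is exponential between those two bounds.

```python
from itertools import islice

def chunk_reply(text: str, limit: int = 2000) -> list[str]:
    """Split a reply into pieces no longer than `limit` characters."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]

def reconnect_delays(base: float = 2.0, cap: float = 120.0):
    """Yield reconnect delays in seconds: 2, 4, 8, ... capped at `cap`."""
    delay = base
    while True:
        yield delay
        delay = min(cap, delay * 2)

# First eight delays a supervisor would sleep between reconnect attempts.
first_delays = list(islice(reconnect_delays(), 8))
```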

What Changed

31 May 04:48
60e1eff


  • May 12, 2026 (v3.05.80) (latest, security-hardening branch): Two-round security hardening sweep — CRITICAL + HIGH findings from the in-repo code review. Lands a cluster of fixes that close real attack surfaces opened by the recent rapid feature growth. Zero regressions across the full 2347-test suite.

    Bot tokens off argv / readline history. cmd_telegram and cmd_slack now accept a single-arg form (/telegram <chat_id> / /slack <channel_id>) and read the bot token from $TELEGRAM_BOT_TOKEN / $SLACK_BOT_TOKEN. Env-supplied tokens never get persisted to ~/.cheetahclaws/config.json; only tokens that actually came in via the deprecated REPL-arg path are saved on disk. New bridges.scrub_token_from_history(token) walks readline.get_history_item backwards and removes any in-memory entry that embeds the token the moment we know its value. Bridge supervisors get a token=/channel= kwarg so the env-sourced token can flow to the worker thread without ever sitting on the config dict — _slack_start_bridge(config, *, token, channel). Telegram already passed the token explicitly to _tg_supervisor. WeChat is unaffected (QR-scan token, never in argv).

    Web UI CSRF — double-submit cookie. Server mints ccsrf=<24B>; Path=/; SameSite=Strict; Max-Age=86400 (non-HttpOnly) on every connection that arrives without one. _handle_connection gates POST/PUT/PATCH/DELETE on a matching X-CSRF-Token request header (rejection: 403 csrf token mismatch). Exempt: /api/auth/{bootstrap,register,login,logout,api/auth} — they establish the session that later carries the cookie. New web/static/js/csrf.js monkey-patches window.fetch so every state-changing request automatically echoes the cookie value; loaded as the first script in chat.html, the inline terminal script in _build_html, and lab.html. Test harness (tests/test_web_api.py:_client) gains an httpx event hook that mirrors the browser behaviour. SameSite=Strict on the JWT cookie remains the first-line defence; CSRF is the second line.
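The double-submit check itself is small. The sketch below assumes cookies and headers arrive as plain dicts and uses hypothetical function names; the cookie attributes and the gated methods are taken from the description above.

```python
import hmac
import secrets

STATE_CHANGING = {"POST", "PUT", "PATCH", "DELETE"}

def mint_csrf_cookie() -> str:
    """Set-Cookie value for a connection that arrived without a ccsrf cookie."""
    token = secrets.token_urlsafe(24)  # 24 random bytes, URL-safe text
    return f"ccsrf={token}; Path=/; SameSite=Strict; Max-Age=86400"

def csrf_ok(method: str, cookies: dict[str, str], headers: dict[str, str]) -> bool:
    """True when the request may proceed; False maps to 403 csrf token mismatch."""
    if method.upper() not in STATE_CHANGING:
        return True
    cookie = cookies.get("ccsrf", "")
    header = headers.get("X-CSRF-Token", "")
    # Constant-time comparison; an absent cookie never matches.
    return bool(cookie) and hmac.compare_digest(cookie.encode(), header.encode())
```

The cookie is deliberately readable by page scripts (non-HttpOnly) so the page can echo it in the header; a cross-site attacker can trigger the request but cannot read the cookie to produce the matching header.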

    Web terminal session ownership. _PtySession(owner_uid=...) records the creator's JWT sub at /api/session time. _check_pty_owner(session, cookie) is consulted at /api/stream / /api/input / /api/resize — any other authenticated user trying to reach a known sid gets 403 not session owner. Password-only mode (no JWT) keeps owner_uid=None and skips the check, preserving the shared-secret model. Closes the trivial-sid-hijack hole in multi-user web deployments.

    Bash hard-denylist. Eight regexes in tools/shell.py:_BASH_HARD_DENY refuse host-destroying patterns regardless of permission_mode: rm -rf / and its --recursive/--force variants, rm -rf /*, mkfs.*, dd of=/dev/{sd,hd,nvme,vd,mmcblk,xvd}, > /dev/{sd,hd,...}, chmod -R 777 /, chown -R <user> /, and the classic :(){ :|:& };: fork bomb. Hits the Bash tool, the REPL !cmd escape, and all three bridges' !cmd paths. Plus NUL-byte + control-char + 64 KB length rejection on every Bash invocation.
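A reduced version of such a pre-execution check is sketched below. It covers four of the listed patterns with simplified regexes written for illustration; they are not the eight entries in `_BASH_HARD_DENY`, and the exact set of rejected control characters is an assumption.

```python
import re
from typing import Optional

MAX_COMMAND_BYTES = 64 * 1024

# Simplified, illustrative subset of a hard denylist.
_HARD_DENY = [
    re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f[a-zA-Z]*\s+/\*?(?:\s|$)"),  # rm -rf / and rm -rf /*
    re.compile(r"\bmkfs(?:\.\w+)?\b"),                                    # mkfs, mkfs.ext4, ...
    re.compile(r"\bdd\b.*\bof=/dev/(?:sd|hd|nvme|vd|mmcblk|xvd)"),        # dd onto a block device
    re.compile(r":\(\)\s*\{\s*:\|:&\s*\};:"),                             # classic fork bomb
]

def check_command(cmd: str) -> Optional[str]:
    """Return a refusal reason, or None if the command passes these checks."""
    if len(cmd.encode("utf-8", "replace")) > MAX_COMMAND_BYTES:
        return "command too long"
    if any(ord(c) < 32 and c not in "\t\n\r" for c in cmd):
        return "control character in command"
    for pattern in _HARD_DENY:
        if pattern.search(cmd):
            return f"hard-denied pattern: {pattern.pattern}"
    return None
```

Running the check before any permission logic is what makes the list "hard": an accept-all permission mode never gets a chance to approve the command.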

    Filesystem credential denylist. tools/security.py:_check_path_allowed now refuses access to a small denylist by default — SSH private keys (~/.ssh/id_*), ~/.aws, ~/.gnupg, ~/.kube, ~/.docker, ~/.netrc, ~/.pgpass, /etc/shadow, /etc/gshadow, /etc/sudoers*, /root. Public-by-convention SSH files (config, known_hosts, authorized_keys) remain readable. Set CHEETAHCLAWS_FS_NO_SANDBOX=1 to bypass when intentionally auditing your own secrets. Independent of allowed_root, which still works as the strict-mode toggle for multi-user daemon deployments.

    Plugin loader hardening. Two new env switches in plugin/loader.py: CHEETAHCLAWS_DISABLE_PLUGINS=1 (kill switch) and CHEETAHCLAWS_PLUGIN_ALLOWLIST=a,b,c (whitelist). EXTERNAL-scope plugins (loaded via $CHEETAHCLAWS_PLUGIN_PATH) print a one-time stderr warning on first load so a stolen env-var-set doesn't silently execute. Module path resolution now uses Path.resolve() + relative_to(install_dir) to confine a malicious manifest's "tools": ["../../etc/passwd_loader"] style entry.
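The Path.resolve() + relative_to(install_dir) confinement can be sketched as follows. The function name and the `.py` suffix handling are assumptions made for the example; the traversal entry is the one quoted above.

```python
from pathlib import Path

def resolve_tool_module(install_dir: Path, entry: str) -> Path:
    """Resolve a manifest "tools" entry, refusing anything outside install_dir."""
    root = install_dir.resolve()
    candidate = (root / f"{entry}.py").resolve()  # collapses ".." and symlinks
    try:
        candidate.relative_to(root)  # raises ValueError if candidate escapes root
    except ValueError:
        raise ValueError(f"manifest entry escapes plugin dir: {entry!r}") from None
    return candidate
```

Resolving before comparing matters: a purely textual prefix check would accept `tools/../../x`, whereas the resolved path makes the escape visible.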

    MCP env sanitisation. cc_mcp/client.py:_sanitized_mcp_env strips a fixed set of process-hijack keys (LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT, DYLD_INSERT_LIBRARIES, DYLD_LIBRARY_PATH, PYTHONPATH, PYTHONSTARTUP, PYTHONHOME, PYTHONEXECUTABLE, NODE_OPTIONS, NODE_PATH, BASH_ENV, ENV) from any env map an .mcp.json config supplies. Dropped keys print a one-line stderr notice. Bypass: CHEETAHCLAWS_MCP_TRUST_ENV=1. Closes a real local-priv-esc path on a host with multiple MCP server configs of varying trust.
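Stripping the listed keys is a dictionary filter. In this sketch the `trust` parameter stands in for the CHEETAHCLAWS_MCP_TRUST_ENV=1 bypass, and the notice text is invented; the key set is copied from the list above.

```python
import sys

_HIJACK_KEYS = frozenset({
    "LD_PRELOAD", "LD_LIBRARY_PATH", "LD_AUDIT",
    "DYLD_INSERT_LIBRARIES", "DYLD_LIBRARY_PATH",
    "PYTHONPATH", "PYTHONSTARTUP", "PYTHONHOME", "PYTHONEXECUTABLE",
    "NODE_OPTIONS", "NODE_PATH", "BASH_ENV", "ENV",
})

def sanitized_mcp_env(env: dict[str, str], trust: bool = False) -> dict[str, str]:
    """Return a copy of a config-supplied env map without process-hijack keys."""
    if trust:
        return dict(env)
    dropped = sorted(k for k in env if k in _HIJACK_KEYS)
    if dropped:
        print(f"[mcp] dropped env keys: {', '.join(dropped)}", file=sys.stderr)
    return {k: v for k, v in env.items() if k not in _HIJACK_KEYS}
```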

    macOS daemon peer-cred. cc_daemon/auth.py:get_peer_uid now branches on sys.platform: Linux keeps SO_PEERCRED, macOS / *BSD goes through ctypes-loaded getpeereid(2). Closes a long-standing TODO that effectively reduced macOS Unix-socket auth to token-only (a stolen daemon-token implied full RCE without peer-uid validation).

    Smaller fixes folded in. Web JWT secret loader rewritten with O_CREAT | O_EXCL + 0o600 + post-write mode verification (refuses to read a world-readable secret file; auto-falls-back to in-memory secret if chmod can't be enforced; override with CHEETAHCLAWS_WEB_SECRET). Terminal one-time password lengthened from secrets.token_urlsafe(6)[:6] (36 bits, online-bruteable) to secrets.token_urlsafe(32) (256 bits). cc_config.save_config strips permission_mode=accept-all before persisting — once-confirmed escape hatches no longer outlive the session that set them. session_store.save_session wrapped in a module-level Lock + explicit BEGIN IMMEDIATE / ROLLBACK so two threads writing the same session_id no longer silently drop one set of changes. agent_runner.py err_msg initialised before the try block (defends against a NameError on first iteration if _handle_permission_request returns "error"); quota.QuotaExceeded matched by isinstance instead of class-name string. compaction.compact_messages wraps stream_auxiliary in try/except + falls back to the original messages instead of crashing the agent loop. providers._recover_args_from_text caps the regex scan window to the last 32 KB of accumulated text (was scanning ~100 KB+ on every tool call). context.get_git_info + get_claude_md get TTL caches (30 s / 10 s, keyed by cwd) so the per-turn git rev-parse / status / log and CLAUDE.md re-read stop showing up in profiles. cc_mcp/client.py reader loops use dict.pop() instead of in+index so a late response after a timeout doesn't race the request side. tool_registry._cache_key adds session_id dimension so a Read(/etc/...) cached for one session never leaks to another. session_store.search_sessions LIKE-fallback path escapes %/_/\ before interpolation.
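The LIKE-escaping fix in the last item can be illustrated with the standard-library sqlite3 module. The table and function names are hypothetical, and this sketch binds the pattern as a parameter rather than interpolating it, which is a choice made for the example.

```python
import sqlite3

def escape_like(term: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards so the term is matched literally."""
    return (term.replace(escape, escape + escape)  # escape char first
                .replace("%", escape + "%")
                .replace("_", escape + "_"))

def search_titles(conn: sqlite3.Connection, term: str) -> list[str]:
    """Substring search over a hypothetical sessions(title) table."""
    pattern = f"%{escape_like(term)}%"
    rows = conn.execute(
        "SELECT title FROM sessions WHERE title LIKE ? ESCAPE '\\' ORDER BY title",
        (pattern,),
    )
    return [r[0] for r in rows]
```

Without the escaping, a search for `a_b` would also match `axb`, and a search for `%` would match every row.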

    Frontend XSS audit. Existing _esc (textContent-→-innerHTML) and _renderMd (HTML-tag-strip → marked) cover all user/model content paths. One deep-trust hole closed: web/static/js/settings.js:_renderModels previously injected server-supplied model names directly into an onclick="app.selectModel('${full}')" attribute — now uses data-model + a delegated click handler, so a malicious model registry entry cannot break out of the JS string literal.

    Defaults you can flip. CHEETAHCLAWS_BRIDGE_TERMINAL=0 hard-disables the bridge !cmd shell entirely (default 1, owner-bound by chat_id whitelist anyway). CHEETAHCLAWS_FS_NO_SANDBOX=1 lifts the credential denylist. CHEETAHCLAWS_DISABLE_PLUGINS=1 / CHEETAHCLAWS_PLUGIN_ALLOWLIST=… / CHEETAHCLAWS_MCP_TRUST_ENV=1 control plugin + MCP behaviour. Full reference in docs/guides/security.md. All 12 CRITICAL + 10 HIGH items from the review now closed (4 of those 22 turned out to be review misjudgements — _all_errors init, permission double-answer race, _broadcast iter race, and the QuotaExceeded classname check was a real fix but the surrounding "shell injection in REPL !command" was reclassified as user-typed-input not RCE). Architecture refactor items (cheetahclaws.py / providers.py God-object split, sentinel state machine) deliberately left for a separate decision — they're shape changes, not bug fixes.

  • May 12, 2026 (daemon/f-4-followups-f-6-9 branch): Daemon foundation roadmap finished — all nine F-1…F-9 items in RFC 0002 now LANDED. Closes the remaining four scope items end-to-end (≈1500 LoC of code + ≈900 LoC of tests + docs). Drilldown:

    F-4 #2 — Bridge notify forwarding. The subprocess-runner reader loop's notify IPC branch used to drop the payload on the floor (F-6/7/8 didn't exist yet). Now it routes through cc_daemon.bridge_supervisor.notify(kind, text). The runner can target a specific bridge via msg["bridge"] (e.g. "telegram") or omit it for a "*" broadcast. agent_runner_notify events on the bus carry {name, run_id, bridge, delivered, text[:500]} so observers can audit deliveries. Empty-text frames are silently dropped (common during agent shutdown).

    F-4 #3 — Restart policy. New RestartPolicy dataclass: mode (none | on-crash), max_restarts, backoff_base_s, backoff_cap_s, backoff_jitter_s. Frozen + a pure next_delay(restart_count) so the decision matrix is unit-testable. agent.start accepts the five fields flat (validation rejects cap < base which would clamp every attempt down to a useless ceiling). On a crash the reader's finally arms a threading.Timer(delay, _do_restart, ...); the Timer respawns via a swappable spawner hook (_RESTART_SPAWNER for tests) and carries restart_count forward. stop() cancels the Timer before the kill ladder, and the same _unregister(name, expected=handle) identity check protects against a Timer-fired respawn racing past a deliberate stop. Bus events: agent_runner_restart_scheduled, agent_runner_restart, `agent_runner_restart_fai...
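A frozen policy with a pure delay function could be sketched as below. The five field names come from the notes; the defaults, the doubling formula, and the choice to leave jitter to the caller (so that `next_delay` stays deterministic) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RestartPolicy:
    mode: str = "none"            # "none" | "on-crash"
    max_restarts: int = 5         # hypothetical default
    backoff_base_s: float = 1.0   # hypothetical default
    backoff_cap_s: float = 60.0   # hypothetical default
    backoff_jitter_s: float = 0.0 # added by the caller, not by next_delay

    def __post_init__(self) -> None:
        if self.mode not in ("none", "on-crash"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.backoff_cap_s < self.backoff_base_s:
            raise ValueError("backoff_cap_s must be >= backoff_base_s")

    def next_delay(self, restart_count: int) -> Optional[float]:
        """Seconds to wait before restart number restart_count + 1, or None to give up."""
        if self.mode != "on-crash" or restart_count >= self.max_restarts:
            return None
        return min(self.backoff_cap_s, self.backoff_base_s * 2 ** restart_count)
```

Keeping the decision in a side-effect-free method means the whole restart matrix can be asserted in a unit test without spawning a process or arming a timer.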


What Changed

10 May 22:27
d7fc51a


  • May 10, 2026 (latest, v3.05.79): Web Chat UI session organization + headless-bridges slash handler + stale-session reaper crash fix. Three threads of work merged into a single release. Bridges / headless deploys (#84 follow-up): Telegram / Slack / WeChat /help, /monitor, /model, /status produced zero response in Docker / --web deploys because _start_headless_bridges() only wired run_query and agent_state on the shared session_ctx — never handle_slash. The bridge poll loops gate on if slash_cb: and fell through to continue before the 📩 Telegram: log line, so the failure was invisible in docker compose logs -f. Fix: extracted the slash handler (originally inlined in repl()) into a module-level factory _make_bridge_slash_handler(state, config, run_query); both REPL and headless paths now use it (single source of truth, no future drift between modes). Stale-session reaper crash: web/api.py:reap_stale_chat_sessions() called remove_chat_session(sid) without the user_id the function now requires for ownership-check parity — every reaper tick raised TypeError, killing the daemon thread, so stale ChatSession objects accumulated forever in the in-memory cache. Fix: capture (sid, user_id) pairs from the cached ChatSession objects under _chat_lock, then apply outside the lock. Web UI session organization: five-feature bundle layered on top — folders + drag-drop + Move-to context menu, ChatGPT-style active-folder context (click a folder name → + New and direct-typing both drop new sessions into that folder, with a Chat · in <Folder> topbar breadcrumb), batch select with Select-all-respecting-search-filter, batch delete + combined-Markdown export (chats-N-sessions.md), and a 4-px draggable sidebar divider with localStorage persistence. 
Backend adds a folders table, chat_sessions.folder_id nullable FK, in-place PRAGMA table_info + ALTER TABLE migration in init_db(), and 5 new HTTP endpoints (GET/POST /api/folders, PATCH/DELETE /api/folders/{id}, PATCH /api/sessions/{id}/folder). Also rolled in: issue #111 (handle_slash_sync / handle_slash_stream no longer double-broadcast to WS) and --web --model X persistence. Tests: +16 new across test_web_api.py (folder CRUD, batch ops, reaper regression) and the new test_bridge_slash_handler.py (5 cases pinning the headless handler contract). Full suite: 2154 / 2154 passing, zero regressions. User-side guide: docs/guides/web-ui.md.

  • May 10, 2026: Web Chat UI fixes — slash commands no longer reply twice; --web --model X actually applies the model. Two related issues that surfaced when wiring a self-hosted vLLM endpoint into the Chat UI. (1) Issue #111 — slash commands duplicated in Chat UI but not in terminal. web/api.py:handle_slash_sync was both returning events inline in the HTTP response and broadcasting the same events to the WS subscribers of the same client; chat.js then iterated data.events AND fired _handleEvent from ws.onmessage, rendering every reply twice. Same bug in handle_slash_stream for SSE-streamed long commands (/brainstorm, /worker, /agent, /plan). Both helpers now deliver events through a single channel — HTTP/SSE only — so _handleEvent runs exactly once per event. Background-thread events (sentinel flows, agent runs) are unaffected: by the time the worker thread emits, _broadcast is already restored to the live WS broadcaster in finally. (2) --web --model X was silently ignored. The CLI override branch only ran in the interactive-REPL path; the if args.web: branch loaded config straight from disk and started the server, so python cheetahclaws.py --web --model custom/qwen2.5-72b would happily boot but every request handler reloaded ~/.cheetahclaws/config.json with the previous model name (e.g. gemma-4-31B-it), producing a confusing 404: model does not exist against the new endpoint. Fix: cheetahclaws.py now persists args.model to config before calling start_web_server, matching the documented behavior; provider:model → provider/model normalization is identical to the REPL path. User-side guide: docs/guides/web-ui.md (Troubleshooting + Architecture notes updated).

  • May 10, 2026: Small-context local models survive large workloads — 4-part fix: ctx cap, auto-fanout, stagnation-stop, output paths under ~/.cheetahclaws/. Repro that motivated the work: running /agent → 1 (Research Assistant) on a 6.6 MB PDF (AutoRedTeamer.pdf — ~70k tokens of extracted text) with custom/qwen2.5-72b (32k ctx). Old behavior: 400 BadRequest "context length 32768"; the agent_runner kept polling the template every 2 s; the model produced 1500+ identical "task complete" summaries before anything stopped it. New behavior, four cooperating layers: (1) Per-model context-window registry + dynamic max_tokens cap (providers._MODEL_CONTEXT_LIMITS + get_model_context_window + dynamic_cap_max_tokens) — covers Qwen 2.5/3, Llama 3.x, Mistral/Mixtral, Phi, Gemma, DeepSeek local variants; _fetch_custom_model_limit now backfills PROVIDERS["custom"]["context_limit"] so compaction sees the live /v1/models value; per-call shrink based on actual prompt size keeps input + output + 1024 safety ≤ ctx. compaction.get_context_limit gains an optional config arg so custom-endpoint detection works on the very first turn. (2) Auto-fanout for oversize tool outputs (multi_agent/fanout.py) — when a single tool result (Read on a huge PDF, Grep over a giant tree, WebFetch of a long article) exceeds 0.4 × ctx_window, split into chunks at paragraph boundaries with token-overlap, dispatch parallel sub-LLM map calls (one per chunk, default cap 5 subagents), merge with a single reduce call; substitutes the merged summary in conversation history instead of letting the next API call overflow. Hooked at the tool-result append site in agent.py; transparent UX prints [Auto-fanout: <Tool> returned ~N chars (>threshold) → dispatching K parallel sub-summaries]. Configurable: auto_fanout_enabled / _threshold / _max_subagents / _chunk_overlap_tokens. 
(3) Stagnation-stop in agent_runner.py — when the model emits the same summary N iterations in a row (default 3, whitespace/case-normalized), stop the loop with a clear notification instead of burning thousands of API calls; configurable via auto_agent_dup_summary_limit (0 disables). (4) Agent output paths under ~/.cheetahclaws/ — the /agent wizard now resolves relative output filenames (e.g. research_notes.md) to absolute paths under ~/.cheetahclaws/agents/<name>/output/ instead of CWD; AgentRunner exposes runner.output_dir, eagerly mkdir'd; Summary block + post-start info show the resolved path in green; absolute paths pass through unchanged. Tests: +47 new (fanout 23, ctx cap 18, dup-stop 13, output paths 8). Full suite: 2139 passing, zero regressions. User-side guide: docs/guides/extensions.md.
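The stagnation-stop in layer (3) amounts to counting consecutive identical summaries after normalization. The class below is a hypothetical sketch; only the default of 3 and the "0 disables" convention come from the notes.

```python
from typing import Optional

def normalize(summary: str) -> str:
    """Lower-case and collapse whitespace so trivial variations compare equal."""
    return " ".join(summary.lower().split())

class StagnationDetector:
    def __init__(self, limit: int = 3):
        self.limit = limit  # 0 disables the check
        self._last: Optional[str] = None
        self._streak = 0

    def should_stop(self, summary: str) -> bool:
        """Record one iteration's summary; True once `limit` in a row are identical."""
        if self.limit <= 0:
            return False
        key = normalize(summary)
        if key == self._last:
            self._streak += 1
        else:
            self._last, self._streak = key, 1
        return self._streak >= self.limit
```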

  • May 9, 2026: Read tool auto-redirects on overflow — defense-in-depth for the case where model ignores the template instruction. Re-running the same /agent + autodan.pdf failure showed two real-world problems with the prior fix: (1) The user was running the pip-installed binary (/home/shangdinggu/anaconda3/bin/cheetahclaws), not the source tree. New tools / templates added to source had no effect. (2) Even if the user reinstalled, qwen2.5-72b would likely still call Read instead of SummarizeLargeFile — models default to familiar tools no matter what the template says. The fix moves the routing decision into the Read tool itself. (a) New _maybe_redirect_to_summarize helper (tools/files.py). When Read or ReadPDF would return content too large to safely fit in the next API call, it instead returns a short redirect message like [ReadTooLarge: file is too large — call SummarizeLargeFile with file_path='X' instead] PREVIEW: …. The model sees the redirect, calls SummarizeLargeFile, gets a chunked-and-merged summary back. The raw content never enters the API call. (b) CJK-aware token estimation. CJK content tokenizes at ~1 token per character (vs ~2.8 chars/token for English). New _is_cjk_heavy() heuristic: ≥20% CJK characters → use 1:1 char-to-token estimate. A 24K-char Chinese file is 24K tokens, not 8.6K, and now triggers redirect on a 32K-context model. (c) Conservative ceiling for unreliable provider declarations. custom/<model> provider declares 128K context by default but the underlying model is often 32K (qwen2.5-72b, llama 3 8B, etc.). New safe_ctx = min(declared_ctx, 30000) caps the threshold at 30K tokens regardless of provider claims — the redirect now fires on the user's exact ~25K-token PDF case (would NOT have fired with the unconditional 128K ceiling, which is exactly the bug). (d) Wrapped Read registration (tools/__init__.py). 
New _read_with_overflow_check lambda calls _maybe_redirect_to_summarize after _read returns; for results <8KB it skips (not worth the check). ReadPDF gets the same treatment inline in _read_pdf. Why this works even on the old install: as soon as the user updates tools/files.py and tools/__init__.py, the redirect fires regardless of whether SummarizeLargeFile / template changes are present. The redirect's prose tells the model exactly which tool to call and with what args. Tests: 14 new pytest cases (tests/test_read_overflow_redirect.py) — CJK detection (English / Chinese / Japanese / mixed-minority / empty), threshold logic (small file → no redirect; user's exact failure case → redirect with right pointer; CJK at lower char count triggers vs same chars in English; conservative ceiling protects against overconfident provider; preview included for context). Plus 2 integration tests via execute_tool("Read", ...) confirming the wrapper applies the redirect ...
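The CJK-aware estimate and the conservative ceiling from (b) and (c) can be sketched as follows. The Unicode ranges chosen for "CJK" and the function names are assumptions; the 20% threshold, the ~2.8 chars/token ratio, and the 30,000-token ceiling are the figures quoted above.

```python
def _is_cjk(ch: str) -> bool:
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
            or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
            or 0xAC00 <= cp <= 0xD7AF)  # Hangul syllables

def is_cjk_heavy(text: str, threshold: float = 0.20) -> bool:
    """True when at least `threshold` of the characters are CJK."""
    if not text:
        return False
    return sum(1 for ch in text if _is_cjk(ch)) / len(text) >= threshold

def estimate_tokens(text: str) -> int:
    """1 token per char for CJK-heavy text, ~2.8 chars per token otherwise."""
    return len(text) if is_cjk_heavy(text) else int(len(text) / 2.8)

def safe_ctx(declared_ctx: int) -> int:
    """Cap an unreliable provider declaration at 30K tokens."""
    return min(declared_ctx, 30_000)
```

For a 24,000-character file this gives 24,000 tokens when the text is Chinese and 8,571 when it is English, which is the gap the redirect threshold has to account for.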


What Changed

09 May 01:37
61e618d


  • May 8, 2026 (v3.05.78): F-2/F-3 follow-ups + CI unblock (feature/fix-f2). Main has been red since 9c01237d (the trading-agent #99 merge) because tests/test_packaging.py::test_required_module_imports[modular.trading.ml] (issue #97 regression test) caught that modular/trading/ml/features.py and modular/trading/portfolio.py import numpy at module top while numpy is in the [trading] extra — pip install . shipped a broken wheel and #100 / #101 inherited the red. Two-commit fix on top of #101: (a) fix(ci) — drop the dead numpy import from features.py; defer numpy to inside stacker.py:train() / predict_proba() past their early-return paths; gate portfolio.py's numpy behind try/except; add pytest.mark.skipif on the optimizer / managed-portfolio / ML-training / factor-scan tests so lean-install CI skips them cleanly. Verified: clean venv with only [web,autosuggest] (the exact CI install) 1075 passed, 11 skipped; with full extras 1086 passed, no regressions. (b) fix(daemon) — five F-2/F-3 follow-ups: move monitor.scheduler.start(...) past the listener bind in cc_daemon/cli.py:cmd_serve (so a misconfigured fetch/deliver can't fail before the daemon is reachable); add _foreign_daemon_running() step-aside check at every scheduler loop tick to close the race where REPL /monitor start fires before the daemon writes its discovery file (both schedulers would otherwise race on last_run_at); flip cc_daemon/schema.py to PRAGMA synchronous=NORMAL (safe under WAL, 8× faster EventBus.publish — 305 μs/event → 39 μs/event, important for streaming agent output); clarify in jobs.py / monitor/store.py / docs/architecture.md that the JSON→SQLite migration is one-way (PR #101's wording implied a fallback read path that doesn't exist); update docs/RFC/0002-daemon-foundation-roadmap.md F-2/F-3 status from OPEN → MERGED. Branch: feature/fix-f2.
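The synchronous=NORMAL change under WAL corresponds to two pragmas at connection time. The helper name and database path below are hypothetical; the pragmas themselves are standard SQLite.

```python
import sqlite3

def open_event_db(path: str) -> sqlite3.Connection:
    """Open a file-backed database in WAL mode with synchronous=NORMAL."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # persistent, stored in the database file
    conn.execute("PRAGMA synchronous=NORMAL")  # per-connection, must be set on every open
    return conn
```

Under WAL, NORMAL skips the sync on each commit and syncs at checkpoints instead. A power loss can drop the most recent commits but does not corrupt the database, which is the trade-off that suits a high-rate event stream.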

  • Research lab Phase A — autonomous multi-day research; WeChat smart-reply + /draft semi-auto reply; reliability + UX hardening across the lab pipeline. Two big surfaces shipped together: (a) the research lab is no longer single-shot — /lab resume <run_id> [<stage>] reconstructs LabState from SQLite to continue or rewind a run; /lab iterate <run_id> runs a 3-reviewer self-review on the final report (novelty / rigor / clarity / evidence, 1-10), routes the lowest-scoring dimension to the corresponding stage (novelty→QUESTIONING, rigor→IMPLEMENTATION, clarity→DRAFTING, evidence→EXPERIMENT), rewinds + re-runs, loops until target_score / max_iterations / plateau / budget; /lab backlog add <topic> --iterate --target=N --max=N --prio=N queues many topics, /lab daemon start runs them 24/7 in a single-worker loop with crash-recovery (reset_running_backlog unsticks stale rows on next start); /lab models prints the effective per-role model + which API key drove each pick + warns when reviewers span <N families (homogeneous review = no meta-loop signal); /lab migrate-paths [--apply] renames legacy lab_xxx/ output dirs to the human-readable <date>_<time>_<topic-slug>_<run_id_short> form (e.g. 2026-05-08_14-30_post-transformer-architectures-survey_b16036de/). (b) WeChat smart-reply panel — when a whitelisted contact sends an inbound message, an auxiliary cheap model drafts 3 candidate replies and pushes them as a panel to your filehelper (文件传输助手); reply with 1/2/3/AA 1 to send, freeform text to customise, x to skip, q for queue. SQLite-persisted at ~/.cheetahclaws/wx_smart_reply.db (in-memory fallback on init failure); contacts JSON at ~/.cheetahclaws/wx_contacts.json is mtime-hot-reloaded; bot-owner self-uid is auto-recorded on first inbound and excluded from smart-reply unconditionally, so your own messages always reach the agent regardless of whitelist contents. 
(c) /draft <message> slash command — semi-automatic reply suggestion path for cases where the bot can't intercept the inbound directly (bot account ≠ user main account on iLink ClawBot). 3 candidates drafted via the auxiliary model, optionally tone-conditioned via @<contact_uid_or_label> against wx_contacts.json; when invoked from a bridge channel (WeChat / Telegram / Slack), candidates are also echoed back to the originating uid + stashed in bridges.draft_cache so a digit-only reply (1/2/3) consumes the chosen text one-shot, no agent invocation, no smart-reply panel triggered. Reliability hardening on top of #88's MCP work: research/http.py now uses 429-aware backoff (10/30/60/120s vs 0.5/1/2/4s for 5xx) and honours Retry-After headers (capped at 180s); the lab surveyor stage grounds in real research.aggregator.research() hits before invoking the LLM (top-30 academic+tech results passed as context, persisted as survey_search_hits artifact for replay) — fabricated-citation rate drops sharply on tested topics; _dedupe_self_repeat() trims cheap-model degenerate sampling (text == text+text) before storage so reviewer prompts don't see doubled inputs; _extract_numbered dedupes by content (questioner emitting 1..5\n1..5 keeps 5, not 10); the citation verifier now has a per-citation 30s concurrent.futures hard wall-clock (kills slow-loris sockets that urllib's socket-timeout ignores) + a 5-min stage-level cap with progress callbacks surfaced to /lab logs (the 11-min hang we saw in the field is gone). 
REPL ergonomics: /lab daemon start and /lab start now print the eventual report.md path up front + live-stream stage transitions to the terminal as they happen; /lab status <run_id> shows both new + legacy paths so the user can find old reports too; /config parses JSON-style values (lists, dicts, signed numbers, quoted strings) — /config wechat_smart_reply_whitelist=["wxid_..."] no longer silently saved as a literal string; leading whitespace before / is now stripped before slash-dispatch (so a paste with a stray space still hits the dispatcher, not the agent). Tests: 884 passing (842 unit/integration + 22 e2e), zero regressions; ~80 new pytest cases covering iteration scoring, state reconstruction, backlog atomicity, verifier hard-timeout, slug edge cases, dedupe patterns, self-uid bypass.
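The 429-aware backoff mentioned above can be sketched roughly as follows. This is an illustration only, not the code in research/http.py: the two schedules and the 180s cap come from the notes, while the name retry_delay, the 0-based attempt index, applying Retry-After to 5xx responses too, and handling only the delta-seconds form of the header are assumptions.

```python
from typing import Optional

# Schedules and cap quoted in the release notes; the rest is illustrative.
BACKOFF_429 = (10.0, 30.0, 60.0, 120.0)
BACKOFF_5XX = (0.5, 1.0, 2.0, 4.0)
RETRY_AFTER_CAP = 180.0


def retry_delay(status: int, attempt: int,
                retry_after: Optional[str] = None) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None to give up."""
    if status == 429:
        schedule = BACKOFF_429
    elif 500 <= status < 600:
        schedule = BACKOFF_5XX
    else:
        return None  # not treated as retryable in this sketch
    if attempt >= len(schedule):
        return None  # retries exhausted
    if retry_after is not None:
        try:
            # Assumption: only "Retry-After: <seconds>" is parsed here;
            # HTTP-date values fall back to the fixed schedule.
            return min(max(float(retry_after), 0.0), RETRY_AFTER_CAP)
        except ValueError:
            pass
    return schedule[attempt]
```

A caller would sleep for the returned value and stop retrying when it gets None, so a server asking for a 10-minute pause is still retried after at most 180 seconds.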

What Changed

07 May 21:32
530dbe7

  • May 7, 2026 (v3.05.77): MCP HTTP/SSE transport + OAuth 2.0 PKCE, .env loader, ANTHROPIC_ENDPOINT corporate-proxy override, AskUserQuestion UI polish (#88, #89). cc_mcp/client.py now speaks Streamable HTTP (POST → text/event-stream reply) in addition to stdio and pure SSE, with the Accept: application/json, text/event-stream header servers like sap-jira require to stop 406-ing. OAuth 2.0: new cc_mcp/oauth.py implements the full MCP Authorization spec — RFC 9728 resource-server discovery → RFC 8414 AS metadata → RFC 7591 dynamic client registration → Authorization Code + PKCE (S256) flow with browser redirect → automatic refresh-token rotation. Tokens persist atomically to ~/.cheetahclaws/mcp_oauth.json at mode 0600 with the parent directory locked to 0700. The redirect-URI port is picked once and reused for both registration and the local callback server, the OAuth scope is sourced from the AS's advertised scopes_supported (preferring mcp if listed, otherwise the first one, otherwise omitted entirely so servers without an mcp scope no longer reject with invalid_scope), and _ensure_oauth() is guarded by a dedicated lock so concurrent 401-retries can't race on the httpx client rebuild. REPL: /mcp add <name> --transport http <url> and /mcp add <name> --transport sse <url> for one-line HTTP server registration; explicit /mcp list subcommand with full-width tool descriptions wrapped at 72 cols. Server name sanitization: hyphenated names like github-tools now resolve correctly through the mcp__server__tool qualified-name path. .env loader: _load_env() runs at the very top of cheetahclaws.py before any other import reads os.environ, so .env keys are visible to every module without losing existing-shell-var precedence (os.environ.setdefault). MCP HTTP header values are passed through os.path.expandvars, so "Authorization": "Bearer $GITHUB_TOKEN" works out of the box. 
ANTHROPIC_ENDPOINT env var (also reachable via .env) overrides the persisted anthropic_endpoint config and is used by both the streaming Anthropic client (providers.py passes base_url=... to anthropic.Anthropic) and the connectivity probes in /doctor / setup wizard, letting corporate proxies swap api.anthropic.com cleanly. UI: AskUserQuestion is auto-approved alongside EnterPlanMode/ExitPlanMode (it's an interactive tool by definition, a permission prompt was redundant), the spinner and result line are suppressed in print_tool_start/end, the question text is rendered through clr() with Markdown stripped (**bold**, `code`, *italic*), and option indices/descriptions are colorized. The REPL prompt now prints a full-width rule via os.get_terminal_size() (80-char fallback) before each input, matching Claude Code's visual rhythm.
  • May 5, 2026: Telegram bridge file round-trip + cross-channel pickable permission prompts (#84) — bridges/telegram.py previously only had _tg_send (text via sendMessage), so when the model claimed it had "sent a file" it was just text and the [approve][reject] text in permission prompts only looked like buttons. Added _tg_send_document (multipart/form-data upload, 49 MB cap with explicit oversize/empty/missing/network/API-rejection error reporting), an inbound document handler that saves uploads to /workspace (or tempfile.gettempdir() outside Docker) with sanitized filenames and a path-aware prompt, a !sendfile <path> user command for explicit on-demand sends, and an auto-send hook in _bg_runner that mails any file written by the Write tool — FIFO-paired with the in-flight file_path, skipped on Error: / Denied: results, and de-duplicated per turn so parallel writes don't double-mail. Cross-channel permission UX: ask_input_interactive(options=[(label, value), …]) now renders an interactive picker on every bridge — Telegram gets a real inline_keyboard (callback_data="cc:<prompt_id>:<value>", _handle_callback_query does auth + stale-prompt-id drop + answerCallbackQuery + editMessageText "✓ Selected: y"), Slack and WeChat get a numbered menu in the message body (reply with digit / canonical letter / label word — all resolve via _resolve_choice), terminal prints the same numbered menu before the input cursor; ask_permission_interactive passes [(✅ Approve, y), (❌ Reject, n), (✅✅ Accept all, a)]. Backward-compatible: every existing ask_input_interactive call site (no options=) keeps free-text behavior. 49 new pytest cases (tests/test_telegram_bridge.py + tests/test_options_menu.py) — no real network calls. 718 passed, zero regressions on the 669 pre-existing. --accept-all was a red herring; the bridge simply lacked the upload code path.
  • May 2, 2026: Docker chat UI assets 404 follow-up (#73) — web/server.py now resolves _WEB_DIR via importlib.resources.files("web") instead of Path(__file__).parent, so static files are found whether the package is installed editable or non-editable. The dotfile guard in the static-file branch now only inspects path segments inside _WEB_DIR, so installs sitting under .venv/, .local/, etc. no longer 404 every asset. [tool.setuptools.package-data] for web widened to static/**/* so non-editable wheels reliably ship the full web/static/ subtree. Plus a new docs/guides/docker.md "Custom Dockerfile pitfalls" section covering the editable-install requirement and the most common 404 root cause for users rolling their own image.
  • Apr 30, 2026: Docker / home-server support (#73) — Dockerfile, docker-compose.yml, .env.example, host Ollama via host.docker.internal, workspace bind-mount for Samba sharing. --web mode now auto-starts configured Telegram / WeChat / Slack bridges in the same process so a single container delivers browser UI + phone bridge. Plus two terminal/agent fixes: AskUserQuestion no longer deadlocks the terminal (#69) — synchronous render+read instead of a queue/event the agent thread can't drain. messages_to_openai emits content: "" instead of null for tool-only assistant turns so Ollama's OpenAI-compat endpoint stops 400-ing with invalid message content type: <nil>; 400 / BadRequestError reclassified as a non-retryable INVALID_REQUEST so a malformed body no longer trips the circuit breaker (#71).
  • Apr 24, 2026: Support DeepSeek V4 models, multi-model prompt adaptation — single shared default.md baseline + tiny per-family overlays (Anthropic XML tags · Gemini 3 explicit Agentic Mode · OpenAI o-series no-narration). Routing is by model family, not provider/runtime — same Qwen prompt whether served via DashScope, Ollama, or OpenRouter. Overlays must cite a vendor prompting guide (≤ 20 lines, enforced by tests). DeepSeek V4 thinking-mode protocol (reasoning_content round-trip + thinking: ON by default). fix(setup-wizard): tolerate api_key_env=None for ollama/lmstudio (#59)
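For readers unfamiliar with the PKCE (S256) step named in the May 7 entry above, the verifier/challenge derivation defined by RFC 7636 can be sketched like this. It is a generic illustration of the standard, not an excerpt from cc_mcp/oauth.py; the function names and the 32-byte verifier length are assumptions.

```python
import base64
import hashlib
import secrets
from typing import Tuple


def s256_challenge(code_verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(code_verifier)) without '=' padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair(n_bytes: int = 32) -> Tuple[str, str]:
    """Return (code_verifier, code_challenge) for the S256 method."""
    # 32 random bytes encode to 43 URL-safe characters, the minimum
    # verifier length that RFC 7636 allows (43 to 128).
    raw = secrets.token_bytes(n_bytes)
    verifier = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return verifier, s256_challenge(verifier)
```

The client sends the challenge with code_challenge_method=S256 on the authorization request and the verifier on the token request, so an intercepted authorization code is useless without the verifier.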

v3.05.76.5: Merge pull request #74 from mxh1999/daemon-design-note

30 Apr 16:40
2899254

docs: daemon foundation design note (IPC, permission routing, local auth)

What Changed

22 Apr 15:44
00ab4c5

  • Apr 20, 2026 (v3.05.76): Research pipeline — 20 sources, time-range filter, cross-platform heat table, citations analysis, saved reports, Chinese platforms (B站 · 微博 · 小红书 · 知乎), /monitor trend-tracking, one-click /ssj wizard, entity extraction, multi-query expansion, side-by-side compare
    • /research <topic> — fans out to 20 sources in parallel: arXiv · Semantic Scholar · OpenAlex · HuggingFace Papers · alphaXiv · Google Scholar · HackerNews · GitHub · Reddit · StackOverflow · Google News · Polymarket · SEC EDGAR · Tavily · Brave · Twitter/X · 知乎 · B站 · 微博 · 小红书. 13 sources work zero-config; 7 optional (need keys or cookies).
    • Engagement-weighted ranking — each source's native signal (HN points, GitHub stars, Reddit upvotes, citations, HF upvotes, B站播放, 微博赞, 小红书赞, Twitter likes, Polymarket USD volume) is log-normalized against a per-source calibration to a shared 0-1 scale. Blended with a 14-day-half-life recency bonus. Cross-source dedup by URL keeps the highest-engagement entry on duplicates.
    • Time range filter: --range 1d|3d|7d|14d|30d|60d|90d|6m|1y|2y|5y|all (or natural 30days, 6months, 2years) and explicit --since YYYY-MM-DD --until YYYY-MM-DD. Each source translates the window to its native filter: arXiv submittedDate:[...], Semantic Scholar year=LO-HI, OpenAlex from_publication_date:..., HN numericFilters=created_at_i>..., GitHub pushed:>..., Reddit t=hour|day|week|month|year|all, StackOverflow fromdate=/todate=, Google News after:/before:, SEC EDGAR dateRange=custom, Tavily start_published_date, Brave freshness=pd|pw|pm|py, Twitter v2 start_time/end_time, Google Scholar client-side year filter, HuggingFace / Bilibili / Weibo client-side. Polymarket and Zhihu have no date filter API and are documented as exceptions.
    • Cross-platform attention table — every brief renders a Markdown table: per-platform result count · top engagement label · median result age · domain. Skipped/failed sources appear too with clear reasons. The LLM synthesis prompt copies this table verbatim and adds 2-3 sentences comparing attention distribution (academic-heavy vs. social-heavy vs. news-heavy).
    • Publication trend sparkline + 12-month bar chart — a compact Unicode sparkline (▁▂▃▄▅▆▇█) across the last 24 months in the brief header; a full per-month bar chart lower down. Built from ALL dated results across academic/news/social sources, giving a single-glance view of where the buzz has moved.
    • Notable-citer analysis (--citations) — secondary Semantic Scholar calls on top academic results, pulling citing-paper authors and filtering to those with ≥10k total citations (configurable via --citation-threshold). Surfaces a table with name · affiliation · total cites · h-index · which papers they cited. Adds 2-10 API calls per run; recommended to pair with SEMANTIC_SCHOLAR_API_KEY to escape the anonymous 100-req/5-min limit.
    • Entity extraction — offline, zero-LLM pattern-matching that scans every pulled result for frequent named entities across four categories: models (GPT-5, Claude-Opus-5, Llama-4, Gemini-2.5-Pro, GLM-5.1, Qwen-3, DeepSeek-V3, Grok, Mistral, Phi, Yi, Kimi, …), benchmarks (MMLU, MMLU-Pro, GSM8K, MATH, HumanEval, HumanEval+, SWE-bench, LiveCodeBench, MMMU, MathVista, GAIA, AgentBench, WebArena, Arena-Hard, FrontierMath, ARC-AGI, GPQA-Diamond, HLE, C-Eval, CMMLU, RULER, LongBench, …), orgs (OpenAI, Anthropic, Google DeepMind, Meta, xAI, Mistral AI, DeepSeek, Moonshot, Alibaba, Zhipu, Tencent, ByteDance, Hugging Face, NVIDIA, 01.AI, AI2, Mila, Stanford, MIT, Berkeley, CMU, Tsinghua, …), and people (from academic result author fields). Counts dedupe within a single result so one spammy abstract doesn't skew the ranking. Renders as a "Top mentioned entities" section directly beneath the heat table — one glance answers "what's everyone talking about?" without the LLM round-trip.
    • Multi-query expansion (--expand or --expand N) — asks the active model to propose 2-6 sibling subqueries (different angles — theory vs. tooling vs. industry deployment vs. controversy — not paraphrases), then runs each in parallel across all sources with proportionally reduced per-source limits. Results merge into the main pipeline (dedup + rank + synth). Example: /research --expand "frontier LLM benchmarks" auto-expands to LLM evaluation methodology, benchmark saturation and contamination, capability measurement frontier models, human preference benchmarks evaluation. Coverage jumps several-fold for broad topics.
    • Side-by-side compare: /research compare "topic A" vs "topic B" [vs "topic C"] runs 2 or 3 independent research queries in parallel and produces a unified comparative brief: verdict at a glance · side-by-side heat tables · shared themes · unique strengths per topic · open questions. Citations use prefixed [A-N] / [B-N] / [C-N] markers so readers can trace every claim back to the right topic's evidence pool. Falls back to a deterministic no-LLM rendering with all three heat tables + entity tables when no model is configured.
    • Auto-save to ~/.cheetahclaws/research_reports/ — every /research and /research compare run writes two files: <YYYY-MM-DD_HHMMSS>-<slug>.md (rendered brief) + .json sidecar (serialized Brief + notable citers + entities). Opt out with --no-save. Explicit export via --save-as PATH. New /reports command: list (50 most recent) · open <id> (print) · path <id> (print file path) · delete <id>.
    • Weekly trend tracking via /monitor — new topic prefix research:<query> (or research:<range>:<query> — e.g. research:30d:RLHF) dispatches to the full 20-source pipeline each scheduled run. Supports daily/weekly/12h/... schedules and --telegram/--slack/console channels. Each invocation: pulls all 20 sources · filters by the subscription's time window · renders the cross-platform heat table + sparkline as the first digest item · writes a full report · pushes to configured channels. Subscribe via /subscribe research:<topic> weekly or the /monitor wizard's new "Trend tracker" option.
    • /ssj wizard integration — 3 new menu items for zero-flag operation:
      • 16. 🔍 Research — asks topic → time range (1-5) → citations y/N → runs /research with right flags
      • 17. 📊 Trend Track — asks topic → tracking window → frequency → creates the /subscribe research:<range>:<topic> subscription
      • 18. 📁 Reports — opens /reports browser
    • Chinese platform sources (4 of them):
      • Bilibili (B站) — zero-config search-all endpoint; returns video + article results with 播放/点赞/弹幕/评论 engagement. [video · 11:55] 彻底搞懂 Transformer · 54,209 播放 · 2,430 赞 · 78 弹幕.
      • 知乎 Zhihu — v4 search_v3 API, requires ZHIHU_COOKIE (browser-extracted d_c0; z_c0); returns answers / articles / questions with 赞/评论/关注 engagement.
      • 微博 Weibo — m.weibo.cn getIndex endpoint, requires WEIBO_COOKIE (browser-extracted SUB; SUBP); returns posts with 赞/转/评 engagement. Parses relative Chinese time forms (刚刚, 5分钟前, 2小时前, 今天 HH:MM, MM-DD).
      • 小红书 Xiaohongshu — edith.xiaohongshu.com notes search, requires XHS_COOKIE (+ often XHS_X_S); returns notes with 赞/评/收藏 engagement. Note: Xiaohongshu anti-bot is aggressive; cookies may expire hourly. Fallback: use --sources tavily with <query> site:xiaohongshu.com.
    • Architecture:
      • research/ package: __init__.py, types.py, time_range.py, http.py, cache.py (24h SQLite at ~/.cheetahclaws/research_cache.db, keyed on source + query + limit + time range), classifier.py (keyword-based topic→domain routing, zero latency, zero LLM), ranker.py, aggregator.py, synthesizer.py, citations.py, entities.py, reports.py, sources/ (20 modules).
      • tools/research.py: exposes Research tool to agent (14 parameters: topic, domains, sources, limit, time_range, since, until, analyze_citations, citation_threshold, expand, save_as, auto_save, synthesize, use_cache).
      • commands/research_cmd.py: /research (with compare subcommand) and /reports.
      • monitor/fetchers.py: fetch_research() bridges /monitor subscriptions to the research pipeline.
      • commands/advanced.py: SSJ menu entries 16/17/18 delegate to the right /research / /subscribe / /reports command line.
    • Tests (tests/test_research.py) — 88 tests across 23 sections covering: types, classifier routing, engagement ranker, cross-source dedup, SQLite cache roundtrip + TTL expiry, each of the 20 sources (happy path + schema-shift resilience + missing-key skip behavior), aggregator parallel fan-out + failure isolation + cache integration, synthesizer LLM path + deterministic no-LLM fallback, heat table + sparkline + trend rendering, citations helper, time-range preset + ISO parsing + per-source native mapping, reports save/load/delete/path, Chinese platform parsing (including Zhihu answer/article/question shapes, Weibo relative-date parser, Xiaohongshu localized count parsing), monitor research: prefix dispatch + range-prefix form, entity extraction across all four categories + dedup-within-result guarantee, multi-query expansion producing distinct cache keys, compare mode running 2-3 parallel queries + correct prefixed citation markers.
    • Packaging: pyproject.toml adds research and research.sources to the editable packages list so installed binaries can import the new module.
    • Version bumped to 3.05.76.
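The engagement-weighted ranking described in this release (log-normalized per-source signal blended with a 14-day-half-life recency bonus) might look something like the sketch below. Only the log normalization, the 0-1 scale and the 14-day half-life come from the notes; the function names, the per-source calibration value, and the 0.3 blend weight are hypothetical and do not reflect research/ranker.py.

```python
import math

HALF_LIFE_DAYS = 14.0  # from the release notes


def normalize_engagement(raw: float, calibration: float) -> float:
    """Log-scale a raw signal (stars, points, likes) onto 0..1.

    `calibration` is a hypothetical per-source "very high" value that maps to 1.0.
    """
    if raw <= 0 or calibration <= 0:
        return 0.0
    return min(math.log1p(raw) / math.log1p(calibration), 1.0)


def recency_bonus(age_days: float) -> float:
    """Exponential decay that halves every HALF_LIFE_DAYS."""
    return 0.5 ** (max(age_days, 0.0) / HALF_LIFE_DAYS)


def score(raw: float, calibration: float, age_days: float,
          recency_weight: float = 0.3) -> float:
    """Blend engagement and recency; the 0.3 weight is an assumption."""
    engagement = normalize_engagement(raw, calibration)
    return (1.0 - recency_weight) * engagement + recency_weight * recency_bonus(age_days)
```

The log scale keeps one viral item from flattening everything else in its source, and the shared 0-1 range is what lets GitHub stars and Reddit upvotes be compared at all.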

What Changed

21 Apr 03:12

  • Apr 18, 2026 (v3.05.75): External plugin discovery via CHEETAHCLAWS_PLUGIN_PATH + safer dependency management; end-to-end prompt-cache token tracking across providers
    • PluginScope.EXTERNAL — new scope for plugins discovered in-place (never copied to ~/.cheetahclaws/plugins/). Complements existing USER and PROJECT scopes. Use case: shared team/company plugin directories mounted at a common path.

    • CHEETAHCLAWS_PLUGIN_PATH env var — os.pathsep-separated list (colon on POSIX, semicolon on Windows) of directories scanned for plugin subdirs. Each immediate subdirectory that has a plugin.json or PLUGIN.md is surfaced as an external plugin. No new manifest format — reuses the existing PluginManifest.from_plugin_dir() loader. Missing or empty path segments are ignored; hidden directories (.git, .DS_Store, etc.) are skipped.

    • Default disabled — external plugins land in /plugin list as [external] disabled. User must run /plugin enable <name> once to activate. Enable state persists to ~/.cheetahclaws/plugins.json under a new external_enabled: {name: bool} map, so it survives restarts without the plugin being installed.

    • No silent pip install — unlike the original proposal in #49, cheetahclaws never installs plugin dependencies from an import-failure fallback. Dependency installation happens only at explicit user-consent points: /plugin install (existing flow), or the first /plugin enable of an external plugin that declares dependencies. The model cannot trick the runtime into mutating the Python environment.

    • Dependency check uses importlib.metadata.distribution() — new _missing_dependencies(deps) helper keys off the PyPI distribution name, not find_spec(name). This fixes the PyPI-vs-import-name trap that breaks common packages: Pillow (imports as PIL), PyYAML (imports as yaml), opencv-python (cv2), scikit-learn (sklearn), beautifulsoup4 (bs4). The old find_spec("pillow") approach returned None for installed Pillow and would loop-install forever.

    • Safety guards: uninstall_plugin on an EXTERNAL entry only drops the enable-state record; it never shutil.rmtrees the user's source directory. update_plugin refuses external plugins with "update the source directory directly" instead of attempting git pull. Malformed plugin.json files are logged to stderr and skipped, so one bad manifest can't crash /plugin list.

    • Dedupe on name collision — if a plugin name exists in both installed (USER/PROJECT) and external scopes, the installed entry wins. Within external scopes, the earliest directory in CHEETAHCLAWS_PLUGIN_PATH wins (consistent with $PATH semantics).

    • Tests (tests/test_plugin_external.py) — 16 tests covering: env var parsing with empty/nonexistent segments, plugin.json and PLUGIN.md discovery, hidden-directory skip, malformed-JSON resilience, path-order priority, installed-shadows-external dedupe, enable/disable persistence round-trip, PEP 508 requirement parsing (package[extra]>=1.0 → package), and a regression test for the PyPI-vs-import-name bug.

    • New public export: from plugin import PLUGIN_PATH_ENV gives the env var name for use in tooling/docs.

    • Not changed: existing USER/PROJECT install flow, plugin.json/PLUGIN.md manifest format, /plugin command subcommands. Fully backward compatible — unset CHEETAHCLAWS_PLUGIN_PATH and the system behaves exactly as before.
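The distribution-name dependency check described above can be approximated as below. This is a sketch under stated assumptions rather than the project's _missing_dependencies helper: the function names are hypothetical, and the regex only extracts the leading name of a PEP 508 requirement without validating the rest of the string.

```python
import re
from importlib import metadata
from typing import Iterable, List

# Leading distribution name of a PEP 508 requirement, e.g. "package[extra]>=1.0".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_name(requirement: str) -> str:
    """Strip extras, version specifiers and markers, keeping the name only."""
    match = _NAME_RE.match(requirement)
    if not match:
        raise ValueError(f"not a valid requirement: {requirement!r}")
    return match.group(1)


def missing_dependencies(deps: Iterable[str]) -> List[str]:
    """Return the requirements whose distribution is not installed."""
    missing = []
    for dep in deps:
        try:
            # Looks up the installed *distribution* (PyPI name), so
            # "Pillow" is found even though it is imported as PIL.
            metadata.distribution(requirement_name(dep))
        except metadata.PackageNotFoundError:
            missing.append(dep)
    return missing
```

Because the lookup never imports anything, it also avoids running plugin dependency code just to find out whether it is present.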

    • Fix (tool-history integrity for OpenAI-compatible providers) — resolves #57: after long sessions, DeepSeek (and other OpenAI-compatible endpoints) started rejecting requests with "Messages with role 'tool' must be a response to a preceding message with 'tool_calls'" (HTTP 400), only recoverable by rebooting which lost all context. Root cause: compaction.find_split_point() chose a split index by token count alone, so a split could land between an assistant(tool_calls) message and its tool response messages, leaving orphaned tool entries in the kept half. Three-layer defense:

      • compaction._respect_tool_pairs(messages, split) — post-processes the split index: if the last message in the old half is an assistant with tool_calls, advances the split forward past all consecutive tool responses; also skips any standalone tool message the split would land on. Falls back to returning 0 (skip compaction this turn) if no safe split exists — the threshold will re-trigger next turn.
      • compaction.sanitize_history(messages) — single-pass O(n) invariant enforcer. Tracks pending tool_call_ids from the most recent assistant(tool_calls) in a rolling set; drops any tool message whose tool_call_id is not in the set (orphan), and strips unanswered tool_calls entries from assistant messages when a non-tool message intervenes. If all tool_calls on an assistant are stripped, the tool_calls key is removed entirely and content is normalized to a non-null string (required by the OpenAI schema). Does not mutate input.
      • agent.run() — calls sanitize_history after every maybe_compact and before each stream() call. Any divergence (from compaction, crashed tool execution, checkpoint restore, or future code paths) is caught before it reaches the provider; emits a history_sanitized warn-log with the number of messages removed so regressions are visible.
      • Why three layers instead of one: the split-point fix prevents the primary source of orphans; the sanitizer is a defense-in-depth net that keeps the invariant regardless of where history corruption originates; the agent-loop wiring ensures the net is actually applied. No user-visible behavior change on well-formed histories — test_well_formed_history_unchanged pins this.
      • Tests (tests/test_compaction.py) — 15 new tests across three classes (TestFindSplitPoint.test_split_never_splits_tool_pair, TestRespectToolPairs × 4, TestSanitizeHistory × 7) covering split-boundary edge cases (split at every ratio from 0.2 to 0.5, multi-tool-call blocks, standalone orphan tool at split), sanitizer correctness (well-formed history unchanged, orphan drop, partial and full unanswered-tool_calls stripping, unanswered at end of list, wrong tool_call_id drop), and an input-immutability guarantee.
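The sanitizer invariant described above (every tool message answers a tool_call of the most recent assistant message, and unanswered calls are stripped) can be sketched in one pass. This is a simplified illustration that assumes OpenAI-style message dicts; it is not the code in compaction.py, and stripping unanswered calls at the very end of the list is an assumption about the intended behavior.

```python
import copy
from typing import Any, Dict, List, Optional, Set

Message = Dict[str, Any]


def sanitize_history(messages: List[Message]) -> List[Message]:
    """Return a cleaned copy of `messages`; the input is never mutated."""
    out: List[Message] = []
    pending: Set[Any] = set()        # tool_call ids still awaiting a response
    owner: Optional[int] = None      # index in `out` of the assistant that issued them

    def close_pending() -> None:
        """Strip tool_calls that never got a tool response."""
        nonlocal pending, owner
        if owner is not None and pending:
            msg = out[owner]
            kept = [c for c in msg["tool_calls"] if c.get("id") not in pending]
            if kept:
                msg["tool_calls"] = kept
            else:
                del msg["tool_calls"]
                if msg.get("content") is None:
                    msg["content"] = ""  # non-null content once tool_calls is gone
        pending, owner = set(), None

    for original in messages:
        msg = copy.deepcopy(original)
        if msg.get("role") == "tool":
            call_id = msg.get("tool_call_id")
            if call_id in pending:
                pending.discard(call_id)
                out.append(msg)
            continue  # orphan tool messages are dropped
        close_pending()  # a non-tool message ends the response window
        if msg.get("role") == "assistant" and msg.get("tool_calls"):
            pending = {c.get("id") for c in msg["tool_calls"]}
            owner = len(out)
        out.append(msg)
    close_pending()  # assumption: unanswered calls at the end are stripped too
    return out
```

A well-formed history passes through unchanged, which is the property the notes say is pinned by test_well_formed_history_unchanged.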
    • End-to-end prompt-cache token tracking (closes #43) — cache hit/miss counters now flow from provider → AgentState → checkpoint snapshots across every supported provider family. Two new default-0 fields cache_read_tokens / cache_write_tokens on AssistantTurn; AgentState.total_cache_read_tokens / total_cache_write_tokens accumulate via getattr(..., 0) so providers that never set the fields still work. Extraction centralized into two helpers in providers.py: _anthropic_cache_tokens(usage) reads cache_read_input_tokens + cache_creation_input_tokens; _openai_cached_read_tokens(usage) walks prompt_tokens_details.cached_tokens. Both coerce missing / None to 0 — older SDKs, non-cached calls, Bedrock-over-litellm wrappers all fall through instead of raising AttributeError. Provider coverage:

      | Family | Cache read | Cache write | Mechanism |
      |---|---|---|---|
      | Anthropic (stream_anthropic) | cache_read_input_tokens | cache_creation_input_tokens | Both fields on final.usage when prompt-caching beta is active |
      | OpenAI-schema (stream_openai_compat — OpenAI, Gemini, Kimi, Qwen, Zhipu, DeepSeek, MiniMax, Groq, xAI, any compatible endpoint) | prompt_tokens_details.cached_tokens | 0 (by design) | OpenAI's schema has no separate "cache creation" counter; caching is implicit on their side |
      | Ollama (stream_ollama) | 0 | 0 | No prompt-caching in Ollama today |
      | Any future / custom provider | 0 (default) | 0 (default) | getattr(event, "cache_read_tokens", 0) no-op fallback |

      Persistence: checkpoint/store.make_snapshot writes token_snapshot["cache_read"] / ["cache_write"]; /checkpoint <id> (and /rewind) restores them alongside input/output totals so counters stay in lock-step with whatever snapshot the user rewound to. Structured logging: api_call_done records now include cache_read_tokens / cache_write_tokens alongside in_tokens / out_tokens. Note: not yet surfaced in /cost or /status output — the tracking layer landed first, a follow-up will expose it in the user-facing commands.

    • Tests (tests/test_cache_tokens.py) — 14 tests across 5 layers: AssistantTurn field defaults + explicit values; AgentState accumulation across increments; real make_snapshot on tmp_path with all four token fields; Anthropic + OpenAI extraction helpers against synthetic usage objects (populated / missing / None); end-to-end agent.run with a scripted stream — single-turn propagation and multi-turn accumulation; plus a test_rewind_restores_cache_tokens_from_snapshot regression test that asserts the round-trip. tests/e2e_checkpoint.py updated to keep the scripted rewind path in sync with production code.
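The "coerce missing / None to 0" extraction described for the cache-token helpers can be illustrated as follows. The usage field names are the ones quoted in the notes; the function names (without the leading underscore), the attribute-style usage objects, and the SimpleNamespace stand-ins are assumptions made for this sketch, which is not the code in providers.py.

```python
from types import SimpleNamespace
from typing import Any, Tuple


def anthropic_cache_tokens(usage: Any) -> Tuple[int, int]:
    """Return (cache_read, cache_write); missing or None fields count as 0."""
    read = getattr(usage, "cache_read_input_tokens", None) or 0
    write = getattr(usage, "cache_creation_input_tokens", None) or 0
    return int(read), int(write)


def openai_cached_read_tokens(usage: Any) -> int:
    """Walk usage.prompt_tokens_details.cached_tokens, tolerating gaps."""
    details = getattr(usage, "prompt_tokens_details", None)
    return int(getattr(details, "cached_tokens", None) or 0)


# Usage with synthetic objects standing in for SDK usage payloads.
hit = SimpleNamespace(cache_read_input_tokens=120, cache_creation_input_tokens=None)
old_sdk = SimpleNamespace()  # neither field present
openai_hit = SimpleNamespace(prompt_tokens_details=SimpleNamespace(cached_tokens=64))

print(anthropic_cache_tokens(hit))            # (120, 0)
print(anthropic_cache_tokens(old_sdk))        # (0, 0)
print(openai_cached_read_tokens(openai_hit))  # 64
```

Using getattr with a default at every hop is what lets older SDKs and wrapper objects fall through to zero instead of raising AttributeError.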

    • Version bumped to 3.05.75.

What Changed

17 Apr 03:27
cf696e2

  • Apr 16, 2026 (v3.05.74): Web UI production hardening — persistence, multi-user auth, ops endpoints, JS module split, pytest suite

    • SQLite persistence (web/db.py, web/models.py) — SQLAlchemy-backed store with 4 tables: users, chat_sessions, messages, api_credentials. Sessions + message history now survive server restarts (previously in-memory only, lost on restart). DB file at ~/.cheetahclaws/web.db (0600). Config key CHEETAHCLAWS_WEB_DB overrides the path.
    • Multi-user auth (web/auth.py) — replaced single generated password with full accounts: bcrypt password hashing (passlib) + stateless JWT cookies (PyJWT, HS256, 7-day TTL). JWT signing secret persisted to ~/.cheetahclaws/web_secret (0600) so logins survive restarts. New endpoints: POST /api/auth/register (first user becomes admin), POST /api/auth/login, POST /api/auth/logout, GET /api/auth/whoami, GET /api/auth/bootstrap (first-run routing). Legacy POST /api/auth kept for the terminal password page.
    • Session CRUD — new PATCH /api/sessions/{id} to rename, DELETE /api/sessions/{id} to remove, GET /api/sessions/{id}/export to download conversation as Markdown. Auto-titling from first user message. Cross-user isolation enforced even on in-memory cache hits (one session hit patched after smoke test revealed the leak).
    • Structured JSON logging (web/logging_setup.py) — logging + custom JSON formatter emits one record per line to stderr, e.g. {"ts":..., "level":"info", "logger":"web.server", "msg":"req", "method":"POST", "path":"/api/auth/login", "status":200, "dur_ms":259, "user_id":1}. Every HTTP response auto-logs method/path/status/dur_ms/user_id/peer. Level controlled by CHEETAHCLAWS_LOG_LEVEL env (default INFO).
    • Ops endpoints: GET /health returns {ok, db, uptime_s} (503 if DB unreachable); GET /metrics returns Prometheus v0.0.4 text with cheetahclaws_{uptime_seconds, requests_total, requests_4xx, requests_5xx, auth_logins_total, auth_logins_failed, auth_registrations_total, users_total, ws_connections_total}. Unauthenticated so Prometheus/k8s probes can hit them.
    • JS module split (web/static/js/) — monolithic 1813-line chat.html → 552 lines of HTML + 9 vanilla JS modules (chat.js core class, util.js, auth.js, sidebar.js, tools.js, approval.js, settings.js, welcome.js, init.js) loaded via plain <script src> tags. Prototype-mixin pattern (Object.assign(ChatApp.prototype, {...})) keeps app.foo() call sites unchanged. No bundler, no build step.
    • ETag + conditional caching — JS/CSS/HTML served with Cache-Control: no-cache, must-revalidate + weak ETag (mtime-size). Browser gets 304 when unchanged, fresh content after any edit. Binary assets keep 24h cache. Path traversal blocked by resolved-path is_relative_to check.
    • pytest suite (tests/test_web_api.py) — 21 end-to-end HTTP tests using httpx: bootstrap/register/login/whoami/logout, sessions CRUD + export + markdown, cross-user isolation, persistence after cache clear, /health, /metrics counter deltas, CORS preflight, auth gating of every endpoint. Spins up the real server in a thread on a random port, with the DB truncated between tests. Runs in ~5s with pytest tests/test_web_api.py.
    • Sidebar UX — chat sessions now show title + relative time ("just now", "12m ago", "3d ago") + message count + busy dot. Search box filters by title/id on the client. Right-click (or long-press) gives a context menu: Rename / Export Markdown / Delete. Footer shows current username + Sign out link.
    • Register-or-login on first visit — chat UI now calls /api/auth/bootstrap on load; if no user exists it shows a "Create your first account" form (first registration becomes admin), otherwise the "Sign in" form. Username + password instead of a single server-generated password.
    • Theme: light default + system auto — :root now carries the light palette; @media (prefers-color-scheme: dark) swaps in the dark palette when the user hasn't explicitly chosen a theme. Toggle button cycles system → light → dark → system, icon reflects the effective theme, title tooltip spells out the current mode. Inline pre-paint script in <head> sets data-theme before first paint to avoid FOUC.
    • Auto port selection — cheetahclaws --web (no --port) now tries 8080 first; on EADDRINUSE it binds :0 and lets the kernel pick a free port, and the banner reports the real URL. Explicit --port N binds exactly N or fails loudly (user intent preserved). --port argparse default changed from 8080 to None as a sentinel.
    • Favicon + MIME polish — web/static/favicon.{png,ico} cropped from docs/logo-5.png (leaping cheetah, transparent background, multi-size ICO 16/32/48). Served from root as /favicon.ico for browser defaults. MIME table extended with .ico (image/vnd.microsoft.icon), .svg, .jpg, .woff, .woff2.
    • Welcome dashboard rebalanced — old 5-card "Bridges & Media" row (ragged in 2×2 grid) split into two 4-card sections: Bridges (Telegram · WeChat · Slack · Monitor) and Multi-Modal Media (Voice Input · Vision · Copy Output · Export). /cwd added to Development Tools. Tagline changed to "Personal AI Assistant · Support Any Model · Autonomous 24/7".
    • Bridge commands in Chat UI — /telegram, /wechat (+/weixin alias), /slack, /voice are now registered in web/api.py's slash registry (previously only the terminal REPL had them), so clicking the dashboard cards actually runs the command.
    • New extras — pip install 'cheetahclaws[web]' installs sqlalchemy>=2.0, passlib[bcrypt]>=1.7.4, PyJWT>=2.8.0. CLI-only installs remain dependency-free. [all] extra updated. Web UI demos added.
    • Version bumped to 3.05.74.
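The structured JSON logging entry above can be illustrated with a minimal stdlib sketch. It is not the code in web/logging_setup.py: the class name JsonFormatter, the helper make_logger, and the choice of epoch seconds for "ts" are assumptions made for illustration. It shows one JSON object per line, with request fields passed through `extra=`.

```python
import io
import json
import logging

# Attributes every LogRecord carries; anything else arrived via `extra=`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))
_STANDARD_ATTRS |= {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: render each record as a single-line JSON object."""

    def format(self, record):
        payload = {
            "ts": round(record.created, 3),  # assumed format: epoch seconds
            "level": record.levelname.lower(),
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Copy request fields passed through `extra=` (method, path, status, ...).
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)


def make_logger(name, stream, level="INFO"):
    """Hypothetical helper: attach the JSON formatter to a named logger."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger


buf = io.StringIO()  # stands in for stderr so the output can be inspected
log = make_logger("web.server", buf)
log.info("req", extra={"method": "POST", "path": "/api/auth/login",
                       "status": 200, "dur_ms": 259, "user_id": 1})
record = json.loads(buf.getvalue())
```

In a real server the stream would be sys.stderr and the level would come from the CHEETAHCLAWS_LOG_LEVEL environment variable named in the entry.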
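The ETag + conditional caching entry above combines two ideas: a weak validator derived from mtime and size, and a resolved-path is_relative_to check against path traversal. A rough sketch, assuming Python 3.9+ and hypothetical function names (weak_etag, resolve_static, respond) that are not taken from the project:

```python
import tempfile
from pathlib import Path


def weak_etag(path):
    """Weak validator built from modification time and size (hex encoding is an assumption)."""
    st = path.stat()
    return f'W/"{st.st_mtime_ns:x}-{st.st_size:x}"'


def resolve_static(root, url_path):
    """Return the file under `root` for `url_path`, or None if it escapes root."""
    root = root.resolve()
    candidate = (root / url_path.lstrip("/")).resolve()
    if not candidate.is_relative_to(root) or not candidate.is_file():
        return None
    return candidate


def respond(root, url_path, if_none_match=None):
    """Return (status, headers) the way a conditional GET handler might."""
    target = resolve_static(root, url_path)
    if target is None:
        return 404, {}
    headers = {
        "Cache-Control": "no-cache, must-revalidate",
        "ETag": weak_etag(target),
    }
    # A matching If-None-Match means the browser copy is still current.
    status = 304 if if_none_match == headers["ETag"] else 200
    return status, headers


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "static"
    root.mkdir()
    (root / "app.js").write_text("console.log('hi');")
    (Path(tmp) / "secret.txt").write_text("outside the static root")

    first_status, first_headers = respond(root, "/app.js")
    second_status, _ = respond(root, "/app.js", first_headers["ETag"])
    escape_status, _ = respond(root, "/../secret.txt")
```

Resolving both paths before comparing is what makes the check robust: "../" segments and symlinks are collapsed first, so a request can only succeed if the final target really lives under the static root.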
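The auto port selection entry above describes a "preferred port, then kernel-assigned" fallback with None as the sentinel for "no explicit --port". A minimal sketch using only the socket module; bind_listener and its parameters are hypothetical names, not the project's API:

```python
import errno
import socket


def bind_listener(port=None, preferred=8080, host="127.0.0.1"):
    """Bind a listening socket.

    port=None means the user gave no explicit port: try `preferred`, and on
    EADDRINUSE fall back to port 0 so the kernel picks a free one.
    An explicit port is bound exactly or the error propagates.
    """
    explicit = port is not None
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port if explicit else preferred))
    except OSError as exc:
        sock.close()
        if explicit or exc.errno != errno.EADDRINUSE:
            raise
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind((host, 0))
    sock.listen()
    return sock


# Occupy some port, then show both behaviours against it.
blocker = bind_listener(port=0)            # explicit 0: kernel-assigned port
taken = blocker.getsockname()[1]

fallback = bind_listener(preferred=taken)  # no explicit port: falls back
chosen = fallback.getsockname()[1]         # the real port to print in the banner

try:
    bind_listener(port=taken)              # explicit port: must fail loudly
    explicit_failed = False
except OSError:
    explicit_failed = True

blocker.close()
fallback.close()
```

Reading the port back with getsockname() after binding is what lets a startup banner report the real URL rather than the requested one.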
  • Apr 16, 2026 (v3.05.73): Web UI — browser-based Chat UI + structured event API

    • Web Chat UI (web/chat.html) — cheetahclaws --web now serves a rich browser-based chat interface at /chat alongside the existing PTY terminal at /. Features: real-time streaming via Server-Sent Events (SSE), collapsible tool cards with status badges, inline permission approval buttons (Allow/Deny), activity indicator (spinner + state labels for Thinking/Running/Processing), Markdown rendering with XSS sanitization (marked.js bundled), dark/light theme toggle with localStorage persistence, mobile-responsive layout with sidebar overlay.
    • Structured event API (web/api.py) — new ChatSession class bridges agent.run() generator to WebSocket/SSE event streams following the same pattern as the Telegram/Slack/WeChat bridges. Events: text_chunk, thinking_chunk, tool_start, tool_end, permission_request, permission_response, turn_done, command_result, interactive_menu, input_request, status, error. Event buffer with replay for late-joining subscribers.
    • 8 new API endpoints — POST /api/prompt (submit prompt or slash command), WS /api/events (real-time event stream), POST /api/approve (permission response), GET /api/sessions (list sessions), GET /api/sessions/{id} (session details + message history), GET/PATCH /api/config (read/write config), GET /api/models (list all 11 providers and models), POST /api/auth (login, sets HttpOnly cookie).
    • Settings panel — click ⚙ to open: model selector grouped by 11 providers (Anthropic, OpenAI, Gemini, Ollama, DeepSeek, Qwen, etc.), permission mode dropdown, thinking/verbose toggles, max tokens input, per-provider API key management with status indicators, quick action buttons (Compact/Status/Cost/Context), terminal link for fallback.
    • Slash command support in Chat UI — all 45+ commands work. Quick commands (/status, /help, /model, /context) return results instantly via POST response. Long-running commands (/brainstorm, /worker, /plan, /agent) stream events in real-time via SSE (server keeps HTTP connection open). /ssj renders a clickable 12-item interactive menu. /brainstorm (no args) shows a topic input box before starting.
    • SSJ sub-commands — /ssj debate, /ssj commit, /ssj readme, /ssj scan, /ssj propose, /ssj review now run directly as agent queries without showing the interactive menu. The menu only appears for /ssj (no args).
    • Feature dashboard — welcome page shows 24 feature cards organized in 6 categories (Core, Agent Features, Session & Memory, Multi-Model, Development Tools, Bridges & Media) with 7 clickable quick-command chips.
    • Security hardening — hmac.compare_digest() for timing-safe token comparison, XSS sanitization (HTML tags escaped before Markdown rendering), CORS restricted to request Origin echo (no wildcard), HttpOnly + SameSite=Strict cookies, auth checked before WebSocket upgrade, _BufferedSocket wrapper replaces fragile sock.recv monkey-patching.
    • Session management — chat sessions with idle timeout (30 min), background reaper for orphaned sessions, session list in sidebar with message count and busy indicator, click to switch, "+" to create new.
    • Web bridge integration — RuntimeContext extended with web_input_event, web_input_value, in_web_turn fields. tools/interaction.py routes permission prompts to web bridge via threading.Event synchronization. commands/advanced.py detects web turns and skips interactive prompts (uses defaults like Telegram bridge).
    • Thread-safe stdout streaming — _ThreadLocalStdout intercepts print() only from the target command thread, broadcasts as text_chunk events. Other threads unaffected.
    • pyproject.toml packaging — web package added to packages list, *.js, *.css, *.html added to package-data. Static assets (xterm.min.js, marked.min.js, chat.html) correctly included in pip install distributions.
    • Docs — new Web UI Guide (304 lines): quick start, full feature list, s...
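The "event buffer with replay for late-joining subscribers" mentioned in the structured event API entry can be sketched as follows. This is an illustration only: the class name EventBuffer, the "seq" field, and the after_seq parameter are assumptions, and the real ChatSession in web/api.py may be organised differently.

```python
import queue
import threading
from collections import deque


class EventBuffer:
    """Hypothetical bounded event log that replays history to new subscribers."""

    def __init__(self, maxlen=1000):
        self._lock = threading.Lock()
        self._history = deque(maxlen=maxlen)  # oldest events fall off the front
        self._subscribers = []
        self._next_seq = 0

    def publish(self, kind, data=None):
        """Record an event and fan it out to every live subscriber."""
        with self._lock:
            event = {"seq": self._next_seq, "type": kind, "data": data}
            self._next_seq += 1
            self._history.append(event)
            for q in self._subscribers:
                q.put(event)
        return event

    def subscribe(self, after_seq=-1):
        """Return a queue pre-filled with buffered events newer than after_seq."""
        q = queue.Queue()
        with self._lock:
            # Replay and registration happen under one lock, so no event
            # can slip in between the two steps.
            for event in self._history:
                if event["seq"] > after_seq:
                    q.put(event)
            self._subscribers.append(q)
        return q


events = EventBuffer()
events.publish("text_chunk", "Hel")
events.publish("tool_start", {"name": "read_file"})

late = events.subscribe()            # joins after two events were published
events.publish("turn_done")
seen = [late.get_nowait()["type"] for _ in range(3)]

resumed = events.subscribe(after_seq=1)  # reconnect that already saw seq 0 and 1
resumed_types = [resumed.get_nowait()["type"]]
```

Sequence numbers let a reconnecting client ask only for what it missed, which is one common way to make SSE or WebSocket streams resumable.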
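The thread-safe stdout streaming entry describes intercepting print() only on the command's own thread. A minimal sketch of that idea, assuming a wrapper keyed on threading.get_ident(); the class below is a hypothetical stand-in and not the project's _ThreadLocalStdout:

```python
import io
import sys
import threading


class ThreadLocalStdout:
    """Hypothetical stdout wrapper: capture writes from one thread, pass others through."""

    def __init__(self, real, target_thread_id, on_text):
        self._real = real
        self._target = target_thread_id
        self._on_text = on_text

    def write(self, text):
        if threading.get_ident() == self._target:
            if text:
                self._on_text(text)  # e.g. broadcast as a text_chunk event
            return len(text)
        return self._real.write(text)

    def flush(self):
        self._real.flush()


chunks = []                  # what the web client would receive
passthrough = io.StringIO()  # stands in for the real terminal stdout
original = sys.stdout


def command():
    # Install the wrapper for the duration of this command only.
    sys.stdout = ThreadLocalStdout(passthrough, threading.get_ident(), chunks.append)
    try:
        print("from command")
        other = threading.Thread(target=lambda: print("from bystander"))
        other.start()
        other.join()
    finally:
        sys.stdout = original


worker = threading.Thread(target=command)
worker.start()
worker.join()

captured = "".join(chunks)
untouched = passthrough.getvalue()
```

Because sys.stdout is process-wide, the wrapper has to decide per write which thread is calling; replacing it outright would also swallow output from unrelated threads.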