What Changed

@chauncygu released this 05 Jun 05:59
· 8 commits to main since this release
cb282bf
  • June 4, 2026 (v3.05.81) (latest): Claude-Code-style quiet output — hide tool execution, show one summary line per turn. Long analysis turns used to scroll the terminal with a ⚙ Bash(...) line and a ✓ → N lines (… chars) line for every tool call, and the permission prompt dumped the entire inline script (e.g. a 60-line python3 << 'PYEOF' heredoc). A new quiet mode (on by default) suppresses the per-tool lines — the spinner conveys live activity and a single summary line is emitted at the tool→text boundary, sitting just above the reply (Read 2 files, ran 3 shell commands), the way Claude Code does. Errors and denials still surface so a mid-turn failure is never silent. In quiet mode the permission prompt also collapses a multi-line command to one line (Run: python3 << 'PYEOF' … (+59 行)) instead of printing the whole script. /verbose overrides quiet (full per-tool lines + inputs + token counts); toggle with /quiet, or launch with --show-tools (alias --no-quiet). The startup banner gains an Output: quiet / Output: full line so the active mode is visible at a glance. Live status line: the spinner now shows elapsed time plus a running output-token estimate (Thinking… (7s · ↓ 435 tokens)) — char-based, since providers only report real usage at the end — and each quiet turn closes with a real-usage footer ✻ Worked for 7.2s · ↑ 1.2k · ↓ 435 built from the true TurnDone counts. Implemented in ui/render.py (turn-level tool accumulator + turn_summary_line(), spinner token meter, print_turn_stats()), wired through the REPL event loop in cheetahclaws.py, with the /quiet toggle in commands/config_cmd.py. See docs/guides/features.md.
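
The quiet-mode behavior described in this entry can be sketched roughly as follows. This is a minimal illustration, not the code in ui/render.py: `TurnToolAccumulator`, `PHRASES`, and `collapse_command` are hypothetical names, the wording table is assumed, and the collapsed suffix is rendered in English here, whereas the sample above shows the localized form.

```python
from collections import Counter

# Hypothetical wording table: tool name -> (verb, singular noun, plural noun).
PHRASES = {
    "Read": ("read", "file", "files"),
    "Bash": ("ran", "shell command", "shell commands"),
}


class TurnToolAccumulator:
    """Collects tool calls for one turn instead of printing a line per call."""

    def __init__(self):
        self.counts = Counter()  # keeps first-seen order of tool names
        self.errors = []         # kept separately so failures can still be shown

    def record(self, tool, error=None):
        self.counts[tool] += 1
        if error is not None:
            self.errors.append(f"{tool}: {error}")

    def summary_line(self):
        """Build one line such as 'Read 2 files, ran 3 shell commands'."""
        parts = []
        for tool, n in self.counts.items():
            verb, one, many = PHRASES.get(
                tool, ("used", f"{tool} call", f"{tool} calls")
            )
            parts.append(f"{verb} {n} {one if n == 1 else many}")
        line = ", ".join(parts)
        return line[:1].upper() + line[1:]


def collapse_command(command):
    """Collapse a multi-line command to its first line plus a line count."""
    lines = command.splitlines()
    if len(lines) <= 1:
        return f"Run: {command}"
    return f"Run: {lines[0]} … (+{len(lines) - 1} lines)"
```

In this sketch the REPL would call `record()` once per tool event and print `summary_line()` once, when the first text chunk of the reply arrives. Entries in `errors` would be printed immediately rather than deferred, matching the rule that a mid-turn failure is never silent.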

  • June 4, 2026: Context-window override — the prompt % and compaction now follow a settable context length. The prompt's context-usage % (and the compaction trigger) derive from the model's context window, which previously could only be a hardcoded provider default — and max_tokens (the OUTPUT cap) doesn't change it, so /config max_tokens=… left the % unchanged (a common point of confusion). New per-session key context_window (/config context_window=<N>, 0 = model default) overrides it, kept deliberately distinct from max_tokens. A single parser (providers.context_window_override) feeds the prompt %, /context, the compaction trigger, and the per-call output-token cap, so all four stay consistent; it is bidirectional — a smaller value forces earlier compaction, a larger value corrects a stale default. The value is read live each prompt, so switching model or context_window updates the % with no restart. /config warns when the value exceeds the model's real window (which would disable compaction and let the API reject oversized prompts). No-op when unset, so existing behavior is unchanged. See docs/guides/reference.md.
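
To make the distinction between `context_window` and `max_tokens` concrete, here is a small sketch of how a single resolver could feed both the prompt % and the compaction trigger. The function names and the 0.8 compaction threshold are assumptions for illustration; the actual parser is `providers.context_window_override`, whose signature is not shown in these notes.

```python
def parse_override(raw):
    """Return a positive int override, or None when unset, 0, or invalid."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return None
    return value if value > 0 else None


def resolve_context_window(config, model_default):
    """Override wins when set; 0 or a missing key falls back to the model default."""
    override = parse_override(config.get("context_window"))
    return override if override is not None else model_default


def context_usage_percent(prompt_tokens, window):
    """Share of the context window consumed by the current prompt."""
    return 100.0 * prompt_tokens / window


def should_compact(prompt_tokens, window, threshold=0.8):
    """Assumed trigger: compact once the prompt passes a fraction of the window."""
    return prompt_tokens >= threshold * window


def exceeds_model_window(config, model_default):
    """True when the override is larger than the model's real window."""
    override = parse_override(config.get("context_window"))
    return override is not None and override > model_default
```

With a 200,000-token model default, a 16,000-token prompt reads as 8% used. Setting `context_window=32000` moves the same prompt to 50%, while changing `max_tokens` has no effect because the resolver never reads it.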

  • June 4, 2026: Rich Live streaming — long responses stay live via a bounded tail window. Large streamed responses that would overflow the terminal's redraw area could leave duplicate or stale frames behind on some emulators (macOS Terminal, etc.), because Rich Live redraws the whole accumulated output in place and the cursor can't reach content that has scrolled into the scrollback. Building on the per-response fallback from PR #133, Rich Live now keeps the live region bounded to the viewport: a short response is shown in full, but once it would overflow, only the last screenful of rendered lines (a tail window) is redrawn — so the Live region can never exceed the terminal and cannot leave stale frames. The complete output is committed once when the response finishes (including on Ctrl-C, since the REPL flushes on interrupt), so the head that scrolled out of the window is never lost. Plain streaming is kept only as a safety net (precise render failed, or the terminal is too small to bound a window). A cheap per-line wrap estimate short-circuits the expensive full render_lines() measurement while a response stays well under the limit, so normal responses pay no extra Markdown re-render per chunk. Adds focused tests covering full-frame streaming, the full→tail transition, tail-window commit-on-flush, real Segments rendering, and both safety-net fallbacks. See docs/guides/features.md.

  • May 31, 2026: QQ bot bridge — /qq connects cheetahclaws to QQ groups + C2C private chats (PR #121). Uses the official qq-botpy WebSocket + HTTP SDK (pip install "cheetahclaws[qq]"). botpy's async client runs on a dedicated asyncio event loop inside a daemon thread, bridged to the synchronous main thread via thread-safe queues. Handles on_group_at_message_create (group @-mentions, prefix stripped) and on_c2c_message_create (private). Since QQ has no message-edit API, replies stream as new messages every ~2 s (2000-char chunking) instead of updating a placeholder; passive replies reference the original msg_id/event_id within QQ's 5-minute window, then fall back to active pushes. Per-target FIFO job queues, slash-command passthrough, !jobs/!retry/!cancel remote control, image input, and permission prompts scoped to the originating chat (no cross-chat approvals). A supervisor reconnects with exponential backoff (2 s → 120 s). Secret handling matches the hardening standard below: $QQ_SECRET (recommended) > REPL arg (deprecated, warns + scrubs history) > config; env-supplied secrets never touch ~/.cheetahclaws/config.json. /qq <appid>, /qq, /qq stop|status|logout. Two follow-up fixes over the original PR: image downloads moved off the event loop into loop.run_in_executor (a blocking urlopen would freeze the WebSocket heartbeat for up to 30 s), and the secret no longer gets written to disk unconditionally. See docs/guides/bridges.md.
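
The threading model in this entry (an asyncio loop in a daemon thread, thread-safe queues toward the synchronous side, blocking work pushed into an executor) can be sketched with the standard library alone. Nothing below uses qq-botpy; `LoopThread`, `fetch_off_loop`, `chunk_text`, and `backoff_delays` are hypothetical names, and the doubling factor in the backoff is an assumption, since the notes only give the 2 s and 120 s endpoints.

```python
import asyncio
import queue
import threading


def chunk_text(text, limit=2000):
    """Split a reply into pieces no longer than `limit` characters."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]


def backoff_delays(base=2.0, cap=120.0):
    """Yield reconnect delays that double from `base` and saturate at `cap`."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)


async def fetch_off_loop(blocking_fn, *args):
    """Run a blocking call in the default executor so the loop stays responsive."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_fn, *args)


class LoopThread:
    """Dedicated event loop in a daemon thread, bridged to sync code by a queue."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.inbox = queue.Queue()  # async handlers put, the main thread gets
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def submit(self, coro):
        """Schedule a coroutine from the synchronous side; returns a Future."""
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    def stop(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join(timeout=5)
```

In this arrangement a message handler running on the loop would call `bridge.inbox.put(...)`, and the main thread would drain `inbox` with `get()`. An image download would go through `fetch_off_loop` so that a slow blocking call cannot stall the WebSocket heartbeat, which is the failure mode the follow-up fix addresses.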