June 5, 2026 (v3.05.82) (latest): User-controllable token / cost budgets — set a spend cap; on hit the session auto-saves and you can resume or raise it. The quota engine (quota.py: per-session + per-day token/cost counters, enforced before each model call) already existed but had no friendly surface — you had to know four config keys (session_token_budget / session_cost_budget / daily_token_budget / daily_cost_budget) and there was no way to see how close you were, no warning before the wall, and the hard stop printed a bare [Quota exceeded]. This adds the UX layer on top of the unchanged engine: a /budget command — no args shows usage vs every budget as colored bars + percentages; /budget $5 sets a session cost cap (the $ means USD), /budget 200k a session token cap (parses 200k / 1.5m / 200000), /budget daily $20 / /budget daily 2m the daily caps, and /budget clear removes all. A --budget $5 / --budget 200k startup flag sets the session cap at launch. Proximity warnings fire at the end of any turn that crosses ≥80% (yellow) / ≥95% (red) of a cap, so the wall never arrives by surprise. On hit the agent now yields a QuotaPause event (instead of a plain text line): the REPL auto-saves the session (session_latest.json + daily backup, the same path /resume reads) and prints a friendly next-steps block — raise the same cap or remove it (/budget clear) then resend, or restart later and /resume. So a long task that runs out of budget is never lost: you analyze, adjust, and continue. Tight enforcement (no surprise overshoot): the check projects the next request's input (compaction.estimate_tokens) and stops before the call if it would cross the cap, and clamps that call's max_tokens to the remaining headroom (quota.output_room) — so a single tool-heavy turn can't blow 40k→49k past the budget the way a pure "already-spent ≥ limit" check let it. 
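The tight-enforcement rule above (project the next request's input, stop before the call if it would cross the cap, otherwise clamp the output to the remaining headroom) can be illustrated with a minimal sketch for a single token cap. The names `Decision` and `check_before_call` are hypothetical and are not the signatures used in quota.py; `projected_input` stands in for whatever compaction.estimate_tokens returns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    allowed: bool      # False means: pause before making the model call
    max_tokens: int    # output cap to send with the request (0 when paused)


def check_before_call(spent: int, limit: Optional[int],
                      projected_input: int, requested_max_tokens: int) -> Decision:
    """Hypothetical pre-call check for one token cap (illustration only)."""
    if limit is None:
        # No cap configured: the request goes through unchanged.
        return Decision(True, requested_max_tokens)
    if spent + projected_input >= limit:
        # The projected input alone would reach the cap, so stop before the call.
        return Decision(False, 0)
    # Clamp the output budget to whatever headroom is left under the cap.
    headroom = limit - spent - projected_input
    return Decision(True, min(requested_max_tokens, headroom))


# A check that only compares already-spent tokens to the limit would allow
# this call (36k < 40k); the projected check pauses it instead.
print(check_before_call(36_000, 40_000, 5_000, 8_192))
# With room left, the output is clamped to the remaining 5,000 tokens.
print(check_before_call(30_000, 40_000, 5_000, 8_192))
```

Under these assumptions, usage can only reach the cap, never pass it, as long as the input estimate is not too low.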
One budget per scope: setting a cap replaces the other unit for that scope (/budget $5 after /budget 200k switches the session cap to cost rather than stacking), so a leftover token cap can't silently keep blocking after you switch to a $ cap. Unit-matched hint: QuotaExceeded / QuotaPause record which cap was breached (key/scope/unit/limit), so the "raise it" suggestion is in the right unit (a token cap shows /budget 40k, a daily cost cap shows /budget daily $40) instead of a generic $ amount that wouldn't lift a token cap. New helpers quota.parse_budget / fmt_amount / usage_vs_limits / warnings / output_room; command in commands/core.py:cmd_budget; QuotaPause in agent.py; REPL handling + --budget in cheetahclaws.py; 42-case tests/test_budget.py (isolated quota dir, including regressions that the hint matches the breached unit and that switching units clears the stale cap). The daemon's conservative serve-mode defaults (200k tokens / $2 per session, 2M tokens / $20 per day) are unchanged: interactive stays unlimited by default, the server stays guard-railed. See docs/guides/features.md · docs/guides/reference.md.
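As a rough illustration of the amount parsing and the one-budget-per-scope rule, the sketch below parses the documented forms ($5, 200k, 1.5m, 200000) and writes the result to the four config keys named above, clearing the other unit for the same scope. `parse_amount` and `set_cap` are hypothetical names; the real quota.parse_budget may have a different signature and return type.

```python
import re

_SUFFIX = {"": 1, "k": 1_000, "m": 1_000_000}


def parse_amount(text: str) -> tuple[str, float]:
    """Return ("cost", usd) for "$5", or ("token", n) for "200k" / "1.5m" / "200000"."""
    text = text.strip().lower()
    if text.startswith("$"):
        return "cost", float(text[1:])
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([km]?)", text)
    if match is None:
        raise ValueError(f"unrecognized budget: {text!r}")
    number, suffix = match.groups()
    return "token", int(float(number) * _SUFFIX[suffix])


def set_cap(config: dict, scope: str, text: str) -> None:
    """Set one cap for scope ("session" or "daily") and drop the other unit."""
    unit, value = parse_amount(text)
    other = "cost" if unit == "token" else "token"
    config[f"{scope}_{unit}_budget"] = value
    # Removing the other unit keeps a stale cap from blocking silently.
    config.pop(f"{scope}_{other}_budget", None)


config: dict = {}
set_cap(config, "session", "200k")   # session_token_budget = 200000
set_cap(config, "session", "$5")     # replaces the token cap with a cost cap
print(config)
```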
June 5, 2026 (v3.05.82): Adaptive Markdown streaming — live output that stays correct on every device. In-place Rich Live redraw is great on capable terminals but breaks elsewhere: it was disabled wholesale over SSH (so SSH users got raw tokens with no formatting), and where it did run it could leave duplicate or stale frames — on macOS Terminal (which can't erase above the scroll boundary), over laggy network PTYs, or with wide CJK / emoji text whose display width a naive line-count gets wrong. The renderer now selects a streaming tier per device in ui.render.auto_stream_mode(config): live — full in-place redraw, only on terminals known to handle cursor-up (local TTYs, and modern emulators even over SSH: iTerm2, WezTerm, Windows Terminal, VSCode, kitty, Alacritty, Ghostty, detected via TERM_PROGRAM / TERM / WT_SESSION / KITTY_WINDOW_ID / ALACRITTY_WINDOW_ID / WEZTERM_PANE); commit — append-only progressive Markdown, the safe default for unknown-SSH / Apple Terminal / pipes / non-TTY, where each completed block (split on blank lines, respecting open code fences so a fenced block renders atomically) is rendered and printed permanently and the cursor is never moved, making a duplicate frame structurally impossible regardless of terminal, latency, or character width; plain — raw tokens, only when rich is unavailable. The append-only floor is provably duplication-free; live is progressive enhancement on top. Override with /config stream_mode=live|commit|plain (legacy boolean /config rich_live=true|false still works → live/commit). Implemented in ui/render.py (set_stream_mode / auto_stream_mode / _safe_commit_point / _commit_stream / _commit_flush), wired in at REPL start in cheetahclaws.py, with a 26-case test suite in tests/test_stream_modes.py (device routing, code-fence-aware block boundaries, append-only commit, and a regression asserting commit mode emits zero cursor sequences even on a TTY with CJK text). 
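The commit tier's block boundary rule (split on blank lines, but never inside an open code fence) can be sketched as below. This is a simplified illustration, not the code in ui/render.py: `find_commit_point` is a hypothetical name, and it assumes fences are always opened and closed by lines starting with three backticks.

```python
FENCE = "`" * 3  # a Markdown code-fence marker (three backticks)


def find_commit_point(buffer: str) -> int:
    """Return the length of the prefix of `buffer` that is safe to render now.

    A prefix is safe when it ends just after a blank line and no code fence
    is open at that point. Returns 0 when nothing can be committed yet.
    """
    in_fence = False
    commit = 0
    pos = 0
    while True:
        end = buffer.find("\n", pos)
        if end == -1:
            break                      # trailing partial line is still streaming
        line = buffer[pos:end].strip()
        pos = end + 1
        if line.startswith(FENCE):
            in_fence = not in_fence    # a fence line opens or closes a block
        elif not line and not in_fence:
            commit = pos               # blank line outside a fence ends a block
    return commit


streamed = "Intro paragraph.\n\n" + FENCE + "python\nx = 1\n\ny = 2\n"
cut = find_commit_point(streamed)
print(repr(streamed[:cut]))   # only the intro; the open fence is held back
```

The caller would render and print `buffer[:cut]` once, keep `buffer[cut:]` for the next token, and flush whatever remains when the stream ends. Because nothing already printed is revisited, no cursor movement is needed.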
Two related UX items shipped alongside: /context is now a visual grid — a Claude-Code-style 20×10 cell grid of context-window usage, colored and broken down by category (system prompt / system tools / memory files / skills / messages / free space) with per-category token counts and percentages, adapting to the model's real context window and falling back to #/. on non-UTF-8 terminals (commands/core.py:cmd_context); and deepseek-v4-flash is registered at its 1M context window in providers._MODEL_CONTEXT_LIMITS (overriding the 128K deepseek provider default, which still applies to deepseek-chat / deepseek-v4-pro), so the prompt %, /context, and the compaction trigger all reflect the true 1M window. See docs/guides/features.md · docs/guides/reference.md.
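To make the /context grid concrete, here is a minimal sketch of how per-category token counts could be mapped onto a 20×10 grid. It is an assumption-laden illustration rather than the logic in cmd_context: `allocate_cells` and `render_grid` are hypothetical names, the Unicode glyphs are placeholders, and per-category coloring is omitted. Only the #/. fallback pair comes from the text above.

```python
def allocate_cells(usage: dict[str, int], window: int,
                   total_cells: int = 200) -> dict[str, int]:
    """Split `total_cells` among categories in proportion to token usage."""
    cells: dict[str, int] = {}
    running_tokens = 0
    assigned = 0
    for category, tokens in usage.items():
        running_tokens += tokens
        # Rounding the cumulative total keeps the sum exact and avoids drift.
        target = min(total_cells, round(running_tokens * total_cells / window))
        cells[category] = target - assigned
        assigned = target
    cells["free space"] = total_cells - assigned
    return cells


def render_grid(cells: dict[str, int], cols: int = 20,
                ascii_only: bool = False) -> list[str]:
    """Lay the cells out row by row; `ascii_only` uses the #/. fallback."""
    used, free = ("#", ".") if ascii_only else ("\u2588", "\u2591")
    flat = "".join((free if name == "free space" else used) * count
                   for name, count in cells.items())
    return [flat[i:i + cols] for i in range(0, len(flat), cols)]


usage = {"system prompt": 4_000, "system tools": 12_000, "messages": 34_000}
cells = allocate_cells(usage, window=200_000)
for row in render_grid(cells, ascii_only=True):
    print(row)
```

With a 1M window such as the one registered for deepseek-v4-flash, the same usage would fill far fewer cells, which is why the grid has to read the model's real context limit.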