What Changed

@chauncygu chauncygu released this 21 Apr 03:12
· 236 commits to main since this release
  • Apr 18, 2026 (v3.05.75): External plugin discovery via CHEETAHCLAWS_PLUGIN_PATH + safer dependency management; end-to-end prompt-cache token tracking across providers
    • PluginScope.EXTERNAL — new scope for plugins discovered in-place (never copied to ~/.cheetahclaws/plugins/). Complements existing USER and PROJECT scopes. Use case: shared team/company plugin directories mounted at a common path.

    • CHEETAHCLAWS_PLUGIN_PATH env var — colon-separated (os.pathsep) list of directories scanned for plugin subdirs. Each immediate subdirectory that has a plugin.json or PLUGIN.md is surfaced as an external plugin. No new manifest format — reuses the existing PluginManifest.from_plugin_dir() loader. Missing or empty path segments are ignored; hidden directories (.git, .DS_Store, etc.) are skipped.
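The scanning rules above can be sketched as follows. This is a minimal illustration, not the project's actual loader: the function name `discover_external_plugin_dirs` and the `MANIFEST_NAMES` constant are hypothetical, and the real code goes through `PluginManifest.from_plugin_dir()`, which is not reproduced here.

```python
import os
from pathlib import Path

PLUGIN_PATH_ENV = "CHEETAHCLAWS_PLUGIN_PATH"
MANIFEST_NAMES = ("plugin.json", "PLUGIN.md")  # hypothetical constant for this sketch


def discover_external_plugin_dirs(raw_path=None):
    """Return candidate plugin directories, in search-path order."""
    if raw_path is None:
        raw_path = os.environ.get(PLUGIN_PATH_ENV, "")
    found = []
    for segment in raw_path.split(os.pathsep):
        if not segment:
            continue  # empty segment, e.g. a leading or doubled separator
        root = Path(segment).expanduser()
        if not root.is_dir():
            continue  # missing directories are ignored, not errors
        for child in sorted(root.iterdir()):
            # Only immediate, non-hidden subdirectories are considered.
            if child.name.startswith(".") or not child.is_dir():
                continue
            if any((child / name).is_file() for name in MANIFEST_NAMES):
                found.append(child)
    return found
```

Because segments are visited in the order they appear in the variable, an earlier directory naturally takes precedence over a later one if the caller keeps the first match per plugin name.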

    • Default disabled — external plugins land in /plugin list as [external] disabled. User must run /plugin enable <name> once to activate. Enable state persists to ~/.cheetahclaws/plugins.json under a new external_enabled: {name: bool} map, so it survives restarts without the plugin being installed.
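One way to picture the persisted enable state is the sketch below. It assumes `plugins.json` is a single JSON object and that `external_enabled` sits at its top level; the helper names `set_external_enabled` and `is_external_enabled` are hypothetical and are not taken from the cheetahclaws code base.

```python
import json
from pathlib import Path


def set_external_enabled(state_file, name, enabled):
    """Record the enable flag for one external plugin, keeping other keys intact."""
    state_file = Path(state_file)
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    state.setdefault("external_enabled", {})[name] = bool(enabled)
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps(state, indent=2))


def is_external_enabled(state_file, name):
    """External plugins count as disabled until explicitly enabled."""
    state_file = Path(state_file)
    if not state_file.exists():
        return False
    state = json.loads(state_file.read_text())
    return bool(state.get("external_enabled", {}).get(name, False))
```

Keying the map by plugin name rather than by directory is what lets the flag survive restarts without the plugin ever being copied into the install directory.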

    • No silent pip install — unlike the original proposal in #49, cheetahclaws never installs plugin dependencies from an import-failure fallback. Dependency installation happens only at explicit user-consent points: /plugin install (existing flow), or the first /plugin enable of an external plugin that declares dependencies. The model cannot trick the runtime into mutating the Python environment.

    • Dependency check uses importlib.metadata.distribution() — new _missing_dependencies(deps) helper keys off the PyPI distribution name, not find_spec(name). This fixes the PyPI-vs-import-name trap that breaks common packages: Pillow (imports as PIL), PyYAML (imports as yaml), opencv-python (cv2), scikit-learn (sklearn), beautifulsoup4 (bs4). The old find_spec("pillow") approach returned None for installed Pillow and would loop-install forever.
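A minimal sketch of a distribution-name check along these lines is shown below. It is an illustration under assumptions, not the project's `_missing_dependencies`: the regex-based `requirement_name` is a simplified stand-in for full PEP 508 parsing, and the sketch checks only presence, not version constraints.

```python
import re
from importlib import metadata

# Simplified stand-in for PEP 508 parsing: capture the leading distribution name only.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_name(requirement):
    """Extract the distribution name, e.g. 'package[extra]>=1.0' gives 'package'."""
    match = _NAME_RE.match(requirement)
    if match is None:
        raise ValueError(f"not a valid requirement: {requirement!r}")
    return match.group(1)


def missing_dependencies(requirements):
    """Return the requirements whose distribution is not installed."""
    missing = []
    for requirement in requirements:
        try:
            # Looks up installed metadata by PyPI distribution name,
            # so 'Pillow' is found even though it imports as 'PIL'.
            metadata.distribution(requirement_name(requirement))
        except metadata.PackageNotFoundError:
            missing.append(requirement)
    return missing
```

The design point is that `importlib.metadata.distribution()` answers "is this distribution installed?", whereas `importlib.util.find_spec()` answers "is this module importable?", and the two names differ for packages such as Pillow or PyYAML.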

    • Safety guards: uninstall_plugin on an EXTERNAL entry only drops the enable-state record; it never calls shutil.rmtree on the user's source directory. update_plugin refuses external plugins with "update the source directory directly" instead of attempting git pull. Malformed plugin.json files are logged to stderr and skipped, so one bad manifest can't crash /plugin list.

    • Dedupe on name collision — if a plugin name exists in both installed (USER/PROJECT) and external scopes, the installed entry wins. Within external scopes, the earliest directory in CHEETAHCLAWS_PLUGIN_PATH wins (consistent with $PATH semantics).
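The precedence rule can be illustrated with a first-writer-wins merge. The function name `dedupe_plugins` and the `(name, source)` tuple shape are assumptions made for this sketch only.

```python
def dedupe_plugins(installed, external):
    """Merge (name, source) entries; the first entry seen for a name wins.

    `external` is assumed to already be in CHEETAHCLAWS_PLUGIN_PATH order.
    """
    merged = {}
    for name, source in installed:
        merged.setdefault(name, ("installed", source))
    for name, source in external:
        # setdefault leaves an existing entry alone, so installed plugins
        # and earlier path entries shadow later duplicates.
        merged.setdefault(name, ("external", source))
    return merged
```

For example, with an installed `fmt` and external entries for `fmt` and two `lint` directories, the result keeps the installed `fmt` and the first `lint` directory.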

    • Tests (tests/test_plugin_external.py): 16 tests covering env var parsing with empty/nonexistent segments, plugin.json and PLUGIN.md discovery, hidden-directory skip, malformed-JSON resilience, path-order priority, installed-shadows-external dedupe, enable/disable persistence round-trip, PEP 508 requirement parsing (package[extra]>=1.0 → package), and a regression test for the PyPI-vs-import-name bug.

    • New public export: from plugin import PLUGIN_PATH_ENV gives the env var name for use in tooling/docs.

    • Not changed: existing USER/PROJECT install flow, plugin.json/PLUGIN.md manifest format, /plugin command subcommands. Fully backward compatible — unset CHEETAHCLAWS_PLUGIN_PATH and the system behaves exactly as before.

    • Fix (tool-history integrity for OpenAI-compatible providers), resolves #57: after long sessions, DeepSeek (and other OpenAI-compatible endpoints) started rejecting requests with "Messages with role 'tool' must be a response to a preceding message with 'tool_calls'" (HTTP 400). The only recovery was rebooting, which lost all context. Root cause: compaction.find_split_point() chose a split index by token count alone, so a split could land between an assistant(tool_calls) message and its tool response messages, leaving orphaned tool entries in the kept half. Three-layer defense:

      • compaction._respect_tool_pairs(messages, split) — post-processes the split index: if the last message in the old half is an assistant with tool_calls, advances the split forward past all consecutive tool responses; also skips any standalone tool message the split would land on. Falls back to returning 0 (skip compaction this turn) if no safe split exists — the threshold will re-trigger next turn.
      • compaction.sanitize_history(messages) — single-pass O(n) invariant enforcer. Tracks pending tool_call_ids from the most recent assistant(tool_calls) in a rolling set; drops any tool message whose tool_call_id is not in the set (orphan), and strips unanswered tool_calls entries from assistant messages when a non-tool message intervenes. If all tool_calls on an assistant are stripped, the tool_calls key is removed entirely and content is normalized to a non-null string (required by the OpenAI schema). Does not mutate input.
      • agent.run() — calls sanitize_history after every maybe_compact and before each stream() call. Any divergence (from compaction, crashed tool execution, checkpoint restore, or future code paths) is caught before it reaches the provider; emits a history_sanitized warn-log with the number of messages removed so regressions are visible.
      • Why three layers instead of one: the split-point fix prevents the primary source of orphans; the sanitizer is a defense-in-depth net that keeps the invariant regardless of where history corruption originates; the agent-loop wiring ensures the net is actually applied. No user-visible behavior change on well-formed histories — test_well_formed_history_unchanged pins this.
      • Tests (tests/test_compaction.py) — 15 new tests across three classes (TestFindSplitPoint.test_split_never_splits_tool_pair, TestRespectToolPairs × 4, TestSanitizeHistory × 7) covering split-boundary edge cases (split at every ratio from 0.2 to 0.5, multi-tool-call blocks, standalone orphan tool at split), sanitizer correctness (well-formed history unchanged, orphan drop, partial and full unanswered-tool_calls stripping, unanswered at end of list, wrong tool_call_id drop), and an input-immutability guarantee.
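The first two layers can be sketched roughly as below. This is a simplified reading of the description above, not the code in compaction.py: messages are assumed to be plain dicts in the OpenAI chat format, each `tool_calls` entry is reduced to its `id`, and the split index is assumed to be the position where the kept half begins.

```python
import copy


def respect_tool_pairs(messages, split):
    """Move the split forward so the kept half never starts with a tool message."""
    while split < len(messages) and messages[split].get("role") == "tool":
        split += 1
    # Running off the end means no safe split exists: skip compaction this turn.
    return split if split < len(messages) else 0


def sanitize_history(messages):
    """Single pass: drop orphan tool messages and strip unanswered tool_calls."""
    result = []
    pending = set()  # tool_call ids still waiting for a response
    owner = None     # index in `result` of the assistant that issued them

    def close_owner():
        nonlocal pending, owner
        if owner is not None and pending:
            msg = result[owner]
            kept = [c for c in msg["tool_calls"] if c["id"] not in pending]
            if kept:
                msg["tool_calls"] = kept
            else:
                del msg["tool_calls"]
                msg["content"] = msg.get("content") or ""  # non-null string
        pending, owner = set(), None

    for original in messages:
        msg = copy.deepcopy(original)  # never mutate the caller's history
        if msg.get("role") == "tool":
            if msg.get("tool_call_id") in pending:
                pending.discard(msg["tool_call_id"])
                result.append(msg)
            continue  # an orphan tool message is dropped
        close_owner()  # a non-tool message ends the current tool-call block
        if msg.get("role") == "assistant" and msg.get("tool_calls"):
            pending = {c["id"] for c in msg["tool_calls"]}
            owner = len(result)
        result.append(msg)
    close_owner()  # handles unanswered tool_calls at the end of the list
    return result
```

On a well-formed history the sanitizer returns an equal copy, which is why it is cheap to run before every provider call.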
    • End-to-end prompt-cache token tracking (closes #43) — cache hit/miss counters now flow from provider → AgentState → checkpoint snapshots across every supported provider family. Two new default-0 fields cache_read_tokens / cache_write_tokens on AssistantTurn; AgentState.total_cache_read_tokens / total_cache_write_tokens accumulate via getattr(..., 0) so providers that never set the fields still work. Extraction centralized into two helpers in providers.py: _anthropic_cache_tokens(usage) reads cache_read_input_tokens + cache_creation_input_tokens; _openai_cached_read_tokens(usage) walks prompt_tokens_details.cached_tokens. Both coerce missing / None to 0 — older SDKs, non-cached calls, Bedrock-over-litellm wrappers all fall through instead of raising AttributeError. Provider coverage:

      | Family | Cache read | Cache write | Mechanism |
      |---|---|---|---|
      | Anthropic (stream_anthropic) | yes | yes | Both fields on final.usage when prompt-caching beta is active |
      | OpenAI-schema (stream_openai_compat: OpenAI, Gemini, Kimi, Qwen, Zhipu, DeepSeek, MiniMax, Groq, xAI, any compatible endpoint) | yes | 0 (by design) | OpenAI's schema has no separate "cache creation" counter; caching is implicit on their side |
      | Ollama (stream_ollama) | 0 | 0 | No prompt-caching in Ollama today |
      | Any future / custom provider | 0 (default) | 0 (default) | getattr(event, "cache_read_tokens", 0) no-op fallback |

      Persistence: checkpoint/store.make_snapshot writes token_snapshot["cache_read"] / ["cache_write"]; /checkpoint <id> (and /rewind) restores them alongside input/output totals so counters stay in lock-step with whatever snapshot the user rewound to. Structured logging: api_call_done records now include cache_read_tokens / cache_write_tokens alongside in_tokens / out_tokens. Note: not yet surfaced in /cost or /status output — the tracking layer landed first, a follow-up will expose it in the user-facing commands.
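The extraction and accumulation pattern described in this section might look like the sketch below. It assumes usage objects expose their counters as attributes; the helper names mirror those mentioned above without the leading underscore, and `AgentTotals` is a hypothetical stand-in for the relevant part of AgentState.

```python
def _as_int(value):
    """Coerce a missing or None counter to 0."""
    return int(value) if value else 0


def anthropic_cache_tokens(usage):
    """Return (cache_read, cache_write) from an Anthropic-style usage object."""
    read = _as_int(getattr(usage, "cache_read_input_tokens", None))
    write = _as_int(getattr(usage, "cache_creation_input_tokens", None))
    return read, write


def openai_cached_read_tokens(usage):
    """Return cached prompt tokens from an OpenAI-style usage object."""
    details = getattr(usage, "prompt_tokens_details", None)
    return _as_int(getattr(details, "cached_tokens", None))


class AgentTotals:
    """Hypothetical accumulator for the two running cache counters."""

    def __init__(self):
        self.total_cache_read_tokens = 0
        self.total_cache_write_tokens = 0

    def add_turn(self, turn):
        # getattr with a default keeps providers that never set the fields working.
        self.total_cache_read_tokens += getattr(turn, "cache_read_tokens", 0)
        self.total_cache_write_tokens += getattr(turn, "cache_write_tokens", 0)
```

Routing every lookup through `getattr` with a default means an older SDK or a wrapper that omits the fields yields zeros rather than an AttributeError.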

    • Tests (tests/test_cache_tokens.py) — 14 tests across 5 layers: AssistantTurn field defaults + explicit values; AgentState accumulation across increments; real make_snapshot on tmp_path with all four token fields; Anthropic + OpenAI extraction helpers against synthetic usage objects (populated / missing / None); end-to-end agent.run with a scripted stream — single-turn propagation and multi-turn accumulation; plus a test_rewind_restores_cache_tokens_from_snapshot regression test that asserts the round-trip. tests/e2e_checkpoint.py updated to keep the scripted rewind path in sync with production code.

    • Version bumped to 3.05.75.