Release What Changed · SafeRL-Lab/cheetahclaws

Apr 18, 2026 (v3.05.75): External plugin discovery via CHEETAHCLAWS_PLUGIN_PATH + safer dependency management; end-to-end prompt-cache token tracking across providers

PluginScope.EXTERNAL — new scope for plugins discovered in-place (never copied to ~/.cheetahclaws/plugins/). Complements existing USER and PROJECT scopes. Use case: shared team/company plugin directories mounted at a common path.
CHEETAHCLAWS_PLUGIN_PATH env var: a list of directories separated by os.pathsep (":" on Linux/macOS, ";" on Windows), each scanned for plugin subdirectories. Every immediate subdirectory that contains a plugin.json or PLUGIN.md is surfaced as an external plugin. No new manifest format: discovery reuses the existing PluginManifest.from_plugin_dir() loader. Empty or nonexistent path segments are ignored, and hidden entries such as .git and .DS_Store are skipped.
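
A minimal sketch of the discovery rule above, for illustration only. It assumes PLUGIN_PATH_ENV holds the string "CHEETAHCLAWS_PLUGIN_PATH"; iter_external_plugin_dirs and MANIFEST_NAMES are hypothetical names, not the actual cheetahclaws code, which hands each directory to PluginManifest.from_plugin_dir().

```python
import os
from pathlib import Path

PLUGIN_PATH_ENV = "CHEETAHCLAWS_PLUGIN_PATH"     # assumed value of the exported name
MANIFEST_NAMES = ("plugin.json", "PLUGIN.md")    # hypothetical constant


def iter_external_plugin_dirs(env=None):
    """Yield candidate plugin directories in CHEETAHCLAWS_PLUGIN_PATH order (sketch)."""
    env = os.environ if env is None else env
    for segment in env.get(PLUGIN_PATH_ENV, "").split(os.pathsep):
        if not segment:
            continue  # empty segment, e.g. from "a::b" or a trailing separator
        root = Path(segment).expanduser()
        if not root.is_dir():
            continue  # nonexistent or non-directory segment is ignored
        # sorted() gives a deterministic order inside one root (an assumption)
        for child in sorted(root.iterdir()):
            if child.name.startswith("."):
                continue  # hidden entries such as .git or .DS_Store
            if child.is_dir() and any((child / m).is_file() for m in MANIFEST_NAMES):
                yield child  # the real loader would parse this via from_plugin_dir()
```

Because segments are visited left to right, a caller that keeps the first plugin seen for each name gets $PATH-style priority.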
Default disabled — external plugins land in /plugin list as [external] disabled. User must run /plugin enable <name> once to activate. Enable state persists to ~/.cheetahclaws/plugins.json under a new external_enabled: {name: bool} map, so it survives restarts without the plugin being installed.
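
The enable-state persistence can be sketched as below, assuming plugins.json is a single JSON object. Only the external_enabled key comes from these notes; the helper names are hypothetical, and defaulting unknown plugins to False mirrors the "default disabled" rule.

```python
import json
from pathlib import Path


def load_external_enabled(state_file: Path) -> dict:
    """Return the external_enabled map from plugins.json, or {} if absent (sketch)."""
    if not state_file.exists():
        return {}
    data = json.loads(state_file.read_text(encoding="utf-8"))
    return dict(data.get("external_enabled", {}))


def set_external_enabled(state_file: Path, name: str, enabled: bool) -> None:
    """Persist one plugin's flag while preserving every other key in the file."""
    data = {}
    if state_file.exists():
        data = json.loads(state_file.read_text(encoding="utf-8"))
    data.setdefault("external_enabled", {})[name] = enabled
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps(data, indent=2), encoding="utf-8")


def is_external_enabled(state_file: Path, name: str) -> bool:
    # External plugins stay disabled until the user runs /plugin enable once.
    return load_external_enabled(state_file).get(name, False)
```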
No silent pip install — unlike the original proposal in #49, cheetahclaws never installs plugin dependencies from an import-failure fallback. Dependency installation happens only at explicit user-consent points: /plugin install (existing flow), or the first /plugin enable of an external plugin that declares dependencies. The model cannot trick the runtime into mutating the Python environment.
Dependency check uses importlib.metadata.distribution() — new _missing_dependencies(deps) helper keys off the PyPI distribution name, not find_spec(name). This fixes the PyPI-vs-import-name trap that breaks common packages: Pillow (imports as PIL), PyYAML (imports as yaml), opencv-python (cv2), scikit-learn (sklearn), beautifulsoup4 (bs4). The old find_spec("pillow") approach returned None for installed Pillow and would loop-install forever.
Safety guards — uninstall_plugin on an EXTERNAL entry only drops the enable-state record; it never shutil.rmtrees the user's source directory. update_plugin refuses external plugins with "update the source directory directly" instead of attempting git pull. Malformed plugin.json files are logged to stderr and skipped, so one bad manifest can't crash /plugin list.
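
Both guards amount to a scope check before any destructive step. A hypothetical sketch: remove_plugin, refresh_plugin, the string scope tag, and the dict-based enable map are illustrative assumptions, not the real uninstall_plugin/update_plugin signatures.

```python
import shutil
from pathlib import Path

EXTERNAL = "external"  # hypothetical tag standing in for PluginScope.EXTERNAL


def remove_plugin(scope: str, name: str, plugin_dir: Path, external_enabled: dict) -> None:
    """Uninstall sketch: an external plugin only loses its enable-state record."""
    if scope == EXTERNAL:
        external_enabled.pop(name, None)  # never rmtree the user's source directory
        return
    shutil.rmtree(plugin_dir)  # installed copies are owned by cheetahclaws


def refresh_plugin(scope: str, name: str) -> None:
    """Update sketch: refuse external plugins instead of attempting git pull."""
    if scope == EXTERNAL:
        raise RuntimeError(f"{name}: update the source directory directly")
    # Installed plugins would be updated here (not sketched).
```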
Dedupe on name collision — if a plugin name exists in both installed (USER/PROJECT) and external scopes, the installed entry wins. Within external scopes, the earliest directory in CHEETAHCLAWS_PLUGIN_PATH wins (consistent with $PATH semantics).
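
The precedence rule reduces to "first entry per name wins" over a list ordered installed-first, then external in path order. A sketch with a hypothetical PluginEntry type; how USER and PROJECT entries rank against each other is not stated above, so this simply keeps whichever installed entry comes first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginEntry:  # hypothetical record, not the real cheetahclaws type
    name: str
    scope: str  # "user", "project" or "external" (illustrative tags)
    path: str


def merge_plugins(installed, external):
    """Installed entries shadow external ones; among external, the earliest wins."""
    merged = {}
    for entry in installed:
        merged.setdefault(entry.name, entry)
    for entry in external:  # assumed to be in CHEETAHCLAWS_PLUGIN_PATH order
        merged.setdefault(entry.name, entry)
    return list(merged.values())
```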
Tests (tests/test_plugin_external.py) — 16 tests covering: env var parsing with empty/nonexistent segments, plugin.json and PLUGIN.md discovery, hidden-directory skip, malformed-JSON resilience, path-order priority, installed-shadows-external dedupe, enable/disable persistence round-trip, PEP 508 requirement parsing (package[extra]>=1.0 → package), and a regression test for the PyPI-vs-import-name bug.
New public export — from plugin import PLUGIN_PATH_ENV gives the env var name for use in tooling/docs.
Not changed: existing USER/PROJECT install flow, plugin.json/PLUGIN.md manifest format, /plugin command subcommands. Fully backward compatible — unset CHEETAHCLAWS_PLUGIN_PATH and the system behaves exactly as before.
Fix (tool-history integrity for OpenAI-compatible providers), resolves #57: after long sessions, DeepSeek and other OpenAI-compatible endpoints began rejecting requests with "Messages with role 'tool' must be a response to a preceding message with 'tool_calls'" (HTTP 400), and the only recovery was a restart, which lost all context. Root cause: compaction.find_split_point() chose the split index by token count alone, so a split could land between an assistant(tool_calls) message and its tool response messages, leaving orphaned tool entries in the kept half. Three-layer defense:
- compaction._respect_tool_pairs(messages, split) — post-processes the split index: if the last message in the old half is an assistant with tool_calls, advances the split forward past all consecutive tool responses; also skips any standalone tool message the split would land on. Falls back to returning 0 (skip compaction this turn) if no safe split exists — the threshold will re-trigger next turn.
- compaction.sanitize_history(messages) — single-pass O(n) invariant enforcer. Tracks pending tool_call_ids from the most recent assistant(tool_calls) in a rolling set; drops any tool message whose tool_call_id is not in the set (orphan), and strips unanswered tool_calls entries from assistant messages when a non-tool message intervenes. If all tool_calls on an assistant are stripped, the tool_calls key is removed entirely and content is normalized to a non-null string (required by the OpenAI schema). Does not mutate input.
- agent.run() — calls sanitize_history after every maybe_compact and before each stream() call. Any divergence (from compaction, crashed tool execution, checkpoint restore, or future code paths) is caught before it reaches the provider; emits a history_sanitized warn-log with the number of messages removed so regressions are visible.
- Why three layers instead of one: the split-point fix prevents the primary source of orphans; the sanitizer is a defense-in-depth net that keeps the invariant regardless of where history corruption originates; the agent-loop wiring ensures the net is actually applied. No user-visible behavior change on well-formed histories — test_well_formed_history_unchanged pins this.
- Tests (tests/test_compaction.py) — 15 new tests across three classes (TestFindSplitPoint.test_split_never_splits_tool_pair, TestRespectToolPairs × 4, TestSanitizeHistory × 7) covering split-boundary edge cases (split at every ratio from 0.2 to 0.5, multi-tool-call blocks, standalone orphan tool at split), sanitizer correctness (well-formed history unchanged, orphan drop, partial and full unanswered-tool_calls stripping, unanswered at end of list, wrong tool_call_id drop), and an input-immutability guarantee.

End-to-end prompt-cache token tracking (closes #43): cache read/write token counts now flow from provider → AgentState → checkpoint snapshots across every supported provider family. AssistantTurn gains two new fields, cache_read_tokens and cache_write_tokens, both defaulting to 0; AgentState.total_cache_read_tokens / total_cache_write_tokens accumulate them via getattr(..., 0), so providers that never set the fields still work. Extraction is centralized in two helpers in providers.py: _anthropic_cache_tokens(usage) reads cache_read_input_tokens and cache_creation_input_tokens; _openai_cached_read_tokens(usage) walks prompt_tokens_details.cached_tokens. Both coerce missing or None values to 0, so older SDKs, non-cached calls, and Bedrock-over-litellm wrappers fall through instead of raising AttributeError. Provider coverage:

| Family | Cache read | Cache write | Mechanism |
| --- | --- | --- | --- |
| Anthropic (`stream_anthropic`) | ✓ | ✓ | Both fields on `final.usage` when prompt-caching beta is active |
| OpenAI-schema (`stream_openai_compat`: OpenAI, Gemini, Kimi, Qwen, Zhipu, DeepSeek, MiniMax, Groq, xAI, any compatible endpoint) | ✓ | 0 (by design) | OpenAI's schema has no separate "cache creation" counter; caching is implicit on their side |
| Ollama (`stream_ollama`) | 0 | 0 | No prompt-caching in Ollama today |
| Any future / custom provider | 0 (default) | 0 (default) | `getattr(event, "cache_read_tokens", 0)` no-op fallback |

Persistence: checkpoint/store.make_snapshot writes token_snapshot["cache_read"] and ["cache_write"]; /checkpoint <id> (and /rewind) restores them alongside the input/output totals, so the counters stay in lock-step with whichever snapshot the user rewinds to. Structured logging: api_call_done records now include cache_read_tokens / cache_write_tokens alongside in_tokens / out_tokens. Note: the counters are not yet surfaced in /cost or /status output; the tracking layer landed first, and a follow-up will expose it in the user-facing commands.
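
The snapshot round-trip can be sketched as below. The "cache_read" / "cache_write" keys come from the notes above; the "input" / "output" keys, the TokenTotals fields, and defaulting absent keys to 0 for snapshots written before this release are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TokenTotals:  # hypothetical stand-in for the counters on AgentState
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    total_cache_read_tokens: int = 0
    total_cache_write_tokens: int = 0


def make_token_snapshot(t: TokenTotals) -> dict:
    return {
        "input": t.total_input_tokens,  # input/output key names are assumed
        "output": t.total_output_tokens,
        "cache_read": t.total_cache_read_tokens,
        "cache_write": t.total_cache_write_tokens,
    }


def restore_token_snapshot(t: TokenTotals, snap: dict) -> None:
    # Restore all four counters together so they stay in lock-step.
    # .get(..., 0) (an assumption) tolerates snapshots saved before cache tracking.
    t.total_input_tokens = snap.get("input", 0)
    t.total_output_tokens = snap.get("output", 0)
    t.total_cache_read_tokens = snap.get("cache_read", 0)
    t.total_cache_write_tokens = snap.get("cache_write", 0)
```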

Tests (tests/test_cache_tokens.py) — 14 tests across 5 layers: AssistantTurn field defaults + explicit values; AgentState accumulation across increments; real make_snapshot on tmp_path with all four token fields; Anthropic + OpenAI extraction helpers against synthetic usage objects (populated / missing / None); end-to-end agent.run with a scripted stream — single-turn propagation and multi-turn accumulation; plus a test_rewind_restores_cache_tokens_from_snapshot regression test that asserts the round-trip. tests/e2e_checkpoint.py updated to keep the scripted rewind path in sync with production code.
Version bumped to 3.05.75.
