Cut input + output tokens across Claude Code, Cursor, Windsurf, and Codex.
Every turn, AI coding tools ship tool schemas — the JSON descriptions of every tool (Read, Edit, MCP tools, etc.) — to the model. With a handful of MCP servers enabled, this can be 20–40K tokens per turn, before you type anything. Across 20 turns that's 400–800K tokens paid just to describe tools you may not use.
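As a quick sanity check, the arithmetic above (illustrative numbers only):

```python
# Back-of-envelope: schema overhead alone, before you type anything.
per_turn_low, per_turn_high = 20_000, 40_000  # tool-schema tokens shipped each turn
turns = 20

print(per_turn_low * turns, per_turn_high * turns)  # → 400000 800000
```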
agent-lean auto-detects which AI tools you have installed and manages MCP profiles, skill installation, and token measurement across all of them uniformly.
| Tool | Config path | MCP profiles | Memory measure | Skill install |
|---|---|---|---|---|
| Claude Code | `~/.claude.json` | ✅ | ✅ | ✅ |
| Cursor | `~/.cursor/mcp.json` | ✅ | ✅ | ✅ |
| Windsurf | `~/.codeium/windsurf/mcp_config.json` | ✅ | ✅ | ✅ |
| Codex | `~/.codex/config.toml` | ✅ | ✅ | ✅ |
Roadmap: Cline, GitHub Copilot, Aider, Antigravity.
agent-lean gives you five concrete levers:
- MCP profiles — swap which MCP servers are active by task. The big input-side win.
- Scoped agents — tool-limited subagents so tool-heavy work stays out of your main context.
- MCP measurement — real per-turn tool-schema cost (`--exact` spawns MCPs and measures real bytes).
- Memory measurement — `CLAUDE.md` + user memory per-turn cost (`--memory`), with special support for codebase-memory.
- Output-side skills — one-command install of curated MIT-licensed skills like caveman (~65% fewer output tokens).
Pairs naturally with codebase-memory: one writes your per-turn context (rules files so the AI skips re-scanning); the other measures what that context is costing you.
Two ways to use it:
```
# From inside Claude Code:
/plugin install RagavRida/agent-lean
```

Gives you slash commands inside the session:

- `/agent-lean:measure` — combined MCP + memory token measurement
- `/agent-lean:profile` — list/switch MCP profiles
- `/agent-lean:install` — install curated output-savings skills
- `/agent-lean:optimize` — full advisory flow (measure → recommend → apply with consent)
Plus the scoped agents (explorer, editor, researcher, git-worker) ship with the plugin.
```sh
npm install -g agent-lean
# or run without install:
npx agent-lean measure
```

```sh
# 1. See how much MCP schemas cost across ALL detected tools
agent-lean measure

# 2. See how much your CLAUDE.md + user memory cost
agent-lean measure --memory

# 3. See available lean profiles
agent-lean profile list

# 4. Apply a profile to every detected tool (auto-backs-up each)
agent-lean profile use minimal

# 5. Or apply to one specific tool
agent-lean profile use git-only --tool cursor

# 6. Add output-side savings (installs to every detected tool)
agent-lean install caveman

# 7. Restart the affected tool(s). Done.
```

| Profile | MCP servers | Est. MCP tokens |
|---|---|---|
| `minimal` | none | 0 |
| `git-only` | github | ~12K |
| `research` | fetch, brave-search | ~6K |
| `full` | github, fetch, filesystem, slack, linear | ~36K |
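To make the mechanism concrete: after `agent-lean profile use git-only`, the active server set in a tool's config might look something like this — a hypothetical sketch assuming Claude Code's `mcpServers` layout in `~/.claude.json` (the server command and args here are illustrative):

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  }
}
```

Other profiles would swap in their own `mcpServers` entries; `minimal` would leave the object empty.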
The `agents/` directory contains drop-in Claude Code agent definitions. Copy them into your project's `.claude/agents/` directory (or your user-level `~/.claude/agents/`) to use them.
| Agent | Tools scoped to | Use for |
|---|---|---|
| explorer | Read, Grep, Glob | Read-only codebase exploration |
| editor | Read, Edit, Write, Grep | Targeted edits (no exploration) |
| researcher | WebFetch, WebSearch, Read | External docs / research |
| git-worker | Bash, Read | Git-only workflows |
Why this helps: when Claude invokes an agent, only that agent's tool schemas load into its own context. Tool-heavy work doesn't bloat your main conversation.
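For reference, a minimal sketch of what one of these definitions might contain — assuming Claude Code's Markdown-with-YAML-frontmatter agent format; the exact fields in `agents/` may differ:

```markdown
---
name: explorer
description: Read-only codebase exploration. Summarizes findings; never edits.
tools: Read, Grep, Glob
---

You are a read-only exploration agent. Locate and summarize relevant code;
report file paths and line ranges back to the main conversation.
```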
Two modes:

Default (`agent-lean measure`) — parses your `~/.claude.json`, counts MCP servers, and looks up empirical schema sizes in `lib/mcp-sizes.js`. Unknown MCPs default to 8K tokens. Fast (no subprocess spawn), approximate.

Exact (`agent-lean measure --exact`) — spawns each configured MCP server, completes the MCP handshake, calls `tools/list`, and measures real schema JSON bytes. The token count uses a ~3.5 chars/token heuristic (not Anthropic's tokenizer). Slower (10–30s), accurate for your actual server versions.
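The byte-to-token conversion is simple enough to sketch — a re-implementation of the heuristic described above, not the tool's actual code:

```python
# --exact measures real tools/list bytes, then divides by a ~3.5 chars/token
# heuristic (NOT Anthropic's real tokenizer), roughly:
def estimate_tokens(schema_bytes: int, chars_per_token: float = 3.5) -> int:
    return round(schema_bytes / chars_per_token)

print(estimate_tokens(6556))  # → 1873
```

With the 6,556 bytes measured for server-everything, this lands within a token of the ~1,874 figure quoted below.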
Example real measurement of the reference `@modelcontextprotocol/server-everything`: 13 tools, 6,556 bytes, ~1,874 tokens — meaningfully less than the 8K default. The static estimates are conservative for unknown servers; use `--exact` when you want the real number.

See a more accurate number for your setup? Run `--exact` and PR the result into `lib/mcp-sizes.js`.
Input tokens are only half the bill — Claude's output tokens are 5× more expensive (Opus: $75/M vs $15/M). agent-lean install fetches vetted third-party skills that compress output:
```sh
agent-lean install --list
agent-lean install caveman
```

Currently curated:
| Skill | Source | Effect |
|---|---|---|
| caveman | JuliusBrussee/caveman (MIT) | Terse caveman-style output (~65% fewer output tokens) |
| caveman-compress | JuliusBrussee/caveman (MIT) | Compression skill with benchmark scripts |
Skills land in `~/.claude/skills/<name>/` and activate after a Claude Code restart. All attribution points to the original maintainers.
Run the proof script yourself:
```sh
npm run proof
```

It spawns three real MCP servers (everything, memory, sequential-thinking — no auth required), completes the MCP handshake, and measures their actual `tools/list` byte sizes. Sample output from one run:
```
MCP                  Tools    Bytes   Tokens
everything              13    6,556    1,874
memory                   9    9,808    2,803
sequential-thinking      1    4,503    1,287
TOTAL                   23   20,867    5,964

Per-turn savings if disabled: ~5,964 input tokens
Over 20 turns, Opus ($15/M): ~$1.79 per session (no cache)
With prompt cache (90% hit): ~$0.18 per Opus session
```
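The dollar figures in that sample output follow directly from the token counts — reproduced here as a sketch, using Opus's $15/M input rate and treating cache hits as re-billed at ~10% of that rate:

```python
# Cost math behind the sample proof output above.
tokens_per_turn = 5_964            # measured schema tokens per turn
turns = 20
opus_input_rate = 15 / 1_000_000   # $ per input token (Opus, $15/M)

no_cache = tokens_per_turn * turns * opus_input_rate
cached = no_cache * 0.10           # cache reads billed at ~10% of the rate

print(f"${no_cache:.2f}")  # → $1.79
print(f"${cached:.2f}")    # → $0.18
```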
The script writes proof.json with the raw measurements so you can share reproducible evidence. These are real bytes from real MCP servers — not estimates.
Want to see the savings on the wire? docs/verify-with-mitmproxy.md walks you through capturing actual Claude Code → api.anthropic.com requests, inspecting the tools array, and confirming that swapping profiles changes what you're billed for.
- Core Claude Code tools (Read, Edit, Bash, etc.) always load. You can't remove them via settings. agent-lean reduces MCP tokens, not core tool tokens.
- Prompt caching reduces the cost in practice. If your session stays active under 5 minutes between turns, schemas are cached and re-billed at ~10% of the rate. The savings compound when sessions span longer gaps or the cache is missed.
- Deferred tools (built into Claude Code) already help. If you see tools listed by name only with a `ToolSearch` hint, they're already lazy-loaded — this tool complements that, doesn't replace it.
- Hook-based auto-profile switcher (detect task intent on first prompt)
- More accurate MCP size detection (actually spawn the server and count)
- Per-project profile overrides
- Integration with `.claude/settings.json` permissions
PRs welcome. Particularly useful contributions:
- Refined MCP schema sizes in `lib/mcp-sizes.js`
- Additional profiles for common workflows
- Additional scoped agent templates
MIT