
# agent-lean

Cut input + output tokens across Claude Code, Cursor, Windsurf, and Codex.

Every turn, AI coding tools ship tool schemas — the JSON descriptions of every tool (Read, Edit, MCP tools, etc.) — to the model. With a handful of MCP servers enabled, this can be 20–40K tokens per turn, before you type anything. Across 20 turns that's 400–800K tokens paid just to describe tools you may not use.

agent-lean auto-detects which AI tools you have installed and manages MCP profiles, skill installation, and token measurement across all of them uniformly.

## Supported tools

| Tool | Config path | MCP profiles | Memory measure | Skill install |
|---|---|---|---|---|
| Claude Code | `~/.claude.json` | ✓ | ✓ | ✓ |
| Cursor | `~/.cursor/mcp.json` | ✓ | ✓ | ✓ |
| Windsurf | `~/.codeium/windsurf/mcp_config.json` | ✓ | ✓ | ✓ |
| Codex | `~/.codex/config.toml` | ✓ | ✓ | ✓ |

Roadmap: Cline, GitHub Copilot, Aider, Antigravity.

agent-lean gives you five concrete levers:

  1. MCP profiles — swap which MCP servers are active by task. The big input-side win.
  2. Scoped agents — tool-limited subagents so tool-heavy work stays out of your main context.
  3. MCP measurement — real per-turn tool-schema cost (--exact spawns MCPs and measures real bytes).
  4. Memory measurement — CLAUDE.md + user memory per-turn cost (--memory), with special support for codebase-memory.
  5. Output-side skills — one-command install of curated MIT-licensed skills like caveman (~65% fewer output tokens).

Pairs naturally with codebase-memory: one writes your per-turn context (rules files so the AI skips re-scanning); the other measures what that context is costing you.

## Install

Two ways to use it:

### As a Claude Code plugin (recommended)

```shell
# From inside Claude Code:
/plugin install RagavRida/agent-lean
```

Gives you slash commands inside the session:

- `/agent-lean:measure` — combined MCP + memory token measurement
- `/agent-lean:profile` — list/switch MCP profiles
- `/agent-lean:install` — install curated output-savings skills
- `/agent-lean:optimize` — full advisory flow (measure → recommend → apply with consent)

Plus the scoped agents (explorer, editor, researcher, git-worker) ship with the plugin.

### As a standalone CLI

```shell
npm install -g agent-lean
# or run without install:
npx agent-lean measure
```

## Quick start

```shell
# 1. See how much MCP schemas cost across ALL detected tools
agent-lean measure

# 2. See how much your CLAUDE.md + user memory cost
agent-lean measure --memory

# 3. See available lean profiles
agent-lean profile list

# 4. Apply a profile to every detected tool (auto-backs-up each)
agent-lean profile use minimal

# 5. Or apply to one specific tool
agent-lean profile use git-only --tool cursor

# 6. Add output-side savings (installs to every detected tool)
agent-lean install caveman

# 7. Restart the affected tool(s). Done.
```

## What each profile includes

| Profile | MCP servers | Est. MCP tokens |
|---|---|---|
| minimal | none | 0 |
| git-only | github | ~12K |
| research | fetch, brave-search | ~6K |
| full | github, fetch, filesystem, slack, linear | ~36K |
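Under the hood, applying a profile amounts to rewriting each tool's MCP config. As a sketch, after `agent-lean profile use git-only --tool cursor`, a file like `~/.cursor/mcp.json` would contain a single server entry — the exact server command below is an assumption for illustration, not necessarily what agent-lean writes:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  }
}
```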

## Scoped agents

The `agents/` directory contains drop-in Claude Code agent definitions. Copy them into your project's `.claude/agents/` directory (or your user-level `~/.claude/agents/`) to use them.

| Agent | Tools scoped to | Use for |
|---|---|---|
| explorer | Read, Grep, Glob | Read-only codebase exploration |
| editor | Read, Edit, Write, Grep | Targeted edits (no exploration) |
| researcher | WebFetch, WebSearch, Read | External docs / research |
| git-worker | Bash, Read | Git-only workflows |

Why this helps: when Claude invokes an agent, only that agent's tool schemas load into its own context. Tool-heavy work doesn't bloat your main conversation.
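For reference, a Claude Code agent definition is a small markdown file with YAML frontmatter. A sketch of what the `explorer` definition might look like (the shipped files may differ in wording):

```markdown
---
name: explorer
description: Read-only codebase exploration
tools: Read, Grep, Glob
---

You are a read-only exploration agent. Locate and summarize relevant
code; never modify files. Report file paths and line ranges so the
main conversation can act on them.
```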

## How token estimates work

Two modes:

**Default** (`agent-lean measure`) — parses your `~/.claude.json`, counts MCP servers, and looks up empirical schema sizes in `lib/mcp-sizes.js`. Unknown MCPs default to 8K tokens. Fast (no subprocess spawn), approximate.
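The default lookup can be sketched like this — the table values below are illustrative assumptions, not the real entries in `lib/mcp-sizes.js`:

```javascript
// Hypothetical shape of the static size table; unknown servers
// fall back to the 8K-token default mentioned above.
const MCP_SIZES = { github: 12000, fetch: 3000, "brave-search": 3000 };
const DEFAULT_TOKENS = 8000;

function staticEstimate(serverNames) {
  // Sum known sizes, substituting the default for unrecognized servers.
  return serverNames.reduce(
    (sum, name) => sum + (MCP_SIZES[name] ?? DEFAULT_TOKENS),
    0
  );
}

console.log(staticEstimate(["github", "my-custom-mcp"])); // 12000 + 8000 = 20000
```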

**Exact** (`agent-lean measure --exact`) — spawns each configured MCP server, completes the MCP handshake, calls `tools/list`, and measures the real schema JSON bytes. The token count uses a ~3.5 chars/token heuristic (not Anthropic's tokenizer). Slower (10–30s), but accurate for your actual server versions.

Example real measurement of the reference `@modelcontextprotocol/server-everything`: 13 tools, 6,556 bytes, ~1,874 tokens — meaningfully less than the 8K default. The static estimates are conservative for unknown servers; use `--exact` when you want the real number.
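That bytes-to-tokens step is a one-liner. Assuming the heuristic rounds up (the real implementation may round differently), it reproduces the figure above:

```javascript
// ~3.5 chars/token heuristic: schema JSON bytes -> estimated tokens.
// Rounding mode is an assumption; agent-lean may round differently.
function bytesToTokens(bytes) {
  return Math.ceil(bytes / 3.5);
}

console.log(bytesToTokens(6556)); // -> 1874, matching the measurement above
```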

See a more accurate number for your setup? Run `--exact` and PR the result into `lib/mcp-sizes.js`.

## Output-side savings via curated skills

Input tokens are only half the bill — Claude's output tokens are 5× more expensive (Opus: $75/M output vs $15/M input). `agent-lean install` fetches vetted third-party skills that compress output:

```shell
agent-lean install --list
agent-lean install caveman
```

Currently curated:

| Skill | Source | Effect |
|---|---|---|
| caveman | JuliusBrussee/caveman (MIT) | Terse caveman-style output (~65% fewer output tokens) |
| caveman-compress | JuliusBrussee/caveman (MIT) | Compression skill with benchmark scripts |

Skills land in `~/.claude/skills/<name>/` and activate after a Claude Code restart. All attribution points to the original maintainers.

## Proof it works

Run the proof script yourself:

```shell
npm run proof
```

It spawns three real MCP servers (`everything`, `memory`, `sequential-thinking` — no auth required), completes the MCP handshake, and measures their actual `tools/list` byte sizes. Sample output from one run:

```
MCP                        Tools       Bytes      Tokens
everything                    13       6,556       1,874
memory                         9       9,808       2,803
sequential-thinking            1       4,503       1,287
TOTAL                         23      20,867       5,964

Per-turn savings if disabled: ~5,964 input tokens
Over 20 turns, Opus ($15/M):  ~$1.79 per session (no cache)
With prompt cache (90% hit):  ~$0.18 per Opus session
```

The script writes `proof.json` with the raw measurements so you can share reproducible evidence. These are real bytes from real MCP servers — not estimates.
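The dollar figures in the sample output follow from straightforward arithmetic — assuming Opus input at $15/M tokens, and (for the cached figure) every turn billed at the ~10% cache-read rate:

```javascript
// Cost arithmetic behind the sample output above.
const tokensPerTurn = 5964;        // measured schema tokens per turn
const turns = 20;
const dollarsPerToken = 15 / 1e6;  // Opus input: $15 per million tokens

const noCache = tokensPerTurn * turns * dollarsPerToken;
const cached = noCache * 0.1;      // ~10% cache-read rate on every turn

console.log(noCache.toFixed(2)); // -> "1.79"
console.log(cached.toFixed(2));  // -> "0.18"
```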

## End-to-end verification with mitmproxy

Want to see the savings on the wire? `docs/verify-with-mitmproxy.md` walks you through capturing actual Claude Code → api.anthropic.com requests, inspecting the `tools` array, and confirming that swapping profiles changes what you're billed for.

## Honest caveats

- Core Claude Code tools (Read, Edit, Bash, etc.) always load. You can't remove them via settings. agent-lean reduces MCP tokens, not core tool tokens.
- Prompt caching reduces the cost in practice. If your session stays active with under 5 minutes between turns, schemas stay cached and are re-billed at roughly 10% of the base rate. The full savings apply when gaps between turns let the cache expire, or when it is otherwise missed.
- Deferred tools (built into Claude Code) already help. If you see tools listed by name only with a ToolSearch hint, they're already lazy-loaded — this tool complements that, it doesn't replace it.

## Roadmap

- Hook-based auto-profile switcher (detect task intent on first prompt)
- More accurate MCP size detection (actually spawn the server and count)
- Per-project profile overrides
- Integration with `.claude/settings.json` permissions

## Contributing

PRs welcome. Particularly useful contributions:

- Refined MCP schema sizes in `lib/mcp-sizes.js`
- Additional profiles for common workflows
- Additional scoped agent templates

## License

MIT
