tokenwarden

ccusage tells you what you spent. tokenwarden stops you from overspending.

In a hurry? See QUICKSTART.md — install + use in 30 seconds.

tokenwarden is the only tool in the Claude Code ecosystem that acts on cost rather than reporting it. Every other tool in this space — ccusage, Claude-Code-Usage-Monitor, cc-budget — tells you what happened or warns you what might happen. tokenwarden hooks directly into Claude Code's event system and enforces hard budgets, intercepts expensive calls, caps argument bloat, and coaches compaction. It is designed to be complementary to ccusage: track with ccusage, enforce with tokenwarden.

Quick start

npx tokenwarden install

This writes hook entries into your Claude Code settings.json (user scope by default; use --project to install into the project scope instead) and points them at the installed binary. The install is idempotent (safe to re-run) and fully reversible.

By default tokenwarden starts in observe mode: it logs every intervention it would make without actually blocking anything. This lets you see what it would enforce for a full session before it starts refusing calls.

To switch to enforcement:

# In your project directory
npx tokenwarden init   # creates .tokenwarden.json with your settings
# edit .tokenwarden.json: set "mode": "enforce"

Or flip it globally:

echo '{"mode":"enforce"}' > ~/.tokenwarden.json

Features

1. Hard budget gate

Per-session, per-day, per-project, and per-model USD caps enforced via PreToolUse and UserPromptSubmit hooks. When cumulative spend reaches a cap, the hook returns a block decision and the reason is shown to the model and user. (v0.1 enforces dollar caps; the reliable live signal is cost.total_cost_usd, not a cumulative token count.)

tokenwarden: session budget reached — spent $8.50 of $5.00. Run /compact, start a
fresh session, or set TOKENWARDEN_FORCE=1 to override.

Subagent spend (isSidechain:true) counts toward the parent session/project budget — this is the configuration that addresses the runaway fan-out scenario.

2. Compaction coach

At 80% context usage, UserPromptSubmit injects an additionalContext nudge prompting the user to run /compact. The PreCompact hook can veto wasteful auto-compaction that fires below 50% context fill.

What tokenwarden does not do: it cannot trigger /compact programmatically. Hooks have no mechanism to issue commands. The compaction coach nudges; the human runs /compact.

3. Expensive-call escalation

When an expensive model (Opus by default) is about to make a genuinely expensive call — chiefly a subagent spawn (Task/Agent), the documented runaway-cost vector — PreToolUse returns an ask decision so the call needs your confirmation. The reason is calibrated to a dollar estimate:

tokenwarden: Est. $0.60 on claude-opus-4-7 for a subagent spawn — genuinely
expensive on this model. Switch to a cheaper model, narrow the request, or set
TOKENWARDEN_FORCE=1 to override.

Calls whose estimate falls below escalation.trivialThresholdUsd pass silently — ordinary Reads, Edits, and scoped searches never prompt, even on Opus. Friction is proportional to cost (and an oversized Read is silently capped by the bloat rule below, not escalated).

What tokenwarden does not do: it cannot silently downgrade the model. The model field is not reliably available in hook input, and hooks have no model-selection control. Tokenwarden escalates to a human decision instead.

4. Argument-bloat prevention

PreToolUse uses modifiedInput to rewrite tool arguments before they execute:

Read.limit capped to 2000 lines (prevents dumping entire large files into context)
Grep and Glob refused (ask) when called without path/scope constraints
Bash timeout capped to 600 000 ms to prevent runaway background commands

What tokenwarden does not do: it cannot prune or modify existing conversation history. It can only prevent future bloat at the point of the tool call, before output enters the context window.

5. Subscription usage windows (Pro / Max 5x / Max 20x)

If you're on a flat-fee plan, dollars aren't your constraint — you don't pay per token. You throttle against usage windows: a rolling 5-hour window and a rolling 7-day window. tokenwarden tracks your consumption against those windows so you can see how close you are to a throttle, and (in enforce mode) gate before you burn the rest of your quota on low-value work.

Set your plan and the windows turn on:

{ "subscription": { "plan": "max5x" } }

tokenwarden: max5x 5-hour usage window — ~$24.10 API-equivalent used of an est.
$25.00 ceiling. On a flat-fee plan this is throttle risk, not a bill (ceiling is
an estimate). Slow down, switch to a cheaper model, start a fresh session, or set
TOKENWARDEN_FORCE=1 to override.

There's no dollar budget here — the figure is API-equivalent: tokenwarden prices your token consumption at public API rates (the same number Claude Code's statusline reports for subscription accounts) and measures it within each rolling window. tokenwarden status shows both windows when a plan is set.

Two honest caveats, stated plainly:

The ceilings are estimates. Anthropic does not publish hard 5-hour/weekly numbers, and they drift. The built-in per-plan ceilings are deliberately conservative starting points — calibrate them to your own throttling by setting subscription.fiveHour.usd / subscription.weekly.usd once you learn where you actually hit the wall.
It's a local lower-bound. tokenwarden only counts windows it observed on this machine via its statusline capture. It cannot see usage from other machines, claude.ai, or sessions before you installed it. Treat it as a floor, not a precise meter.

plan: null (the default) keeps this off — API / pay-as-you-go users want the dollar budgets above instead.

Cost-telemetry layer

Tokenwarden reads the same ~/.claude/projects/**/*.jsonl transcripts that ccusage reads, deduplicated by requestId. It also captures the live statusline JSON Claude Code emits (the more reliable real-time cost signal). All dollar figures are estimates, always labeled as such.

How it works

Tokenwarden registers handlers for these Claude Code hook events:

Event	What tokenwarden does
`SessionStart`	Captures the active model id; initializes the session ledger entry
`UserPromptSubmit`	Checks budget caps before the prompt is processed; injects compaction-coach nudge at 80% context
`PreToolUse`	Enforces budget gate, escalation, and argument-bloat caps; returns `modifiedInput` to rewrite args
`PreCompact`	Vetoes auto-compaction that fires below the wasteful threshold
`Stop`	Records a session-finalize bookkeeping event (never blocks)
`SessionEnd`	Reconciles final session totals against the ledger

Operational guarantees:

Fail-open: any internal error in a hook exits 0 (proceed). Tokenwarden never blocks your work because of its own bug.
Hot-path performance: hooks complete in well under 100 ms in practice (~50 ms measured cold-start from a single bundled file), comfortably inside the <250 ms budget. A hard 2 s self-timeout guarantees a hung hook still fails open. The ledger is a small local JSON file read in O(1) — no JSONL re-scanning on each hook call. Note this is per tool call, so a turn with many tool calls adds roughly calls × ~50 ms of overhead (e.g. ~1 s across 20 calls) — small relative to model latency, but real.
Observe-only first session: the default mode: "observe" logs what would be blocked without blocking it. You opt into mode: "enforce" after seeing the logs.
Idempotent and reversible install: tokenwarden install does not duplicate hook entries; tokenwarden uninstall removes exactly what it added.

Config precedence

Project .tokenwarden.json overrides user ~/.tokenwarden.json, which overrides built-in defaults. All three layers merge; missing keys fall back to the defaults.

Config

Create .tokenwarden.json in your project (or ~/.tokenwarden.json for user-wide settings). All fields are optional and fall back to defaults.

{
  "mode": "observe",
  "budgets": {
    "session": { "usd": 5 },
    "daily": { "usd": 25 },
    "project": { "usd": null },
    "perModel": {
      "opus": { "usd": 10 }
    }
  },
  "subscription": {
    "plan": null,
    "fiveHour": { "usd": null },
    "weekly": { "usd": null }
  },
  "compaction": {
    "coachAtPercent": 80,
    "vetoAutoBelowPercent": 50
  },
  "escalation": {
    "enabled": true,
    "expensiveModels": ["opus"],
    "trivialThresholdUsd": 0.5
  },
  "bloat": {
    "readMaxLines": 2000,
    "refuseUnscopedSearch": true,
    "bashMaxTimeoutMs": 600000
  },
  "pricingOverrides": {}
}

Field reference:

Field	Default	Description
`mode`	`"observe"`	`"observe"` logs interventions; `"enforce"` blocks them
`budgets.session.usd`	`5`	Hard USD cap per Claude Code session
`budgets.daily.usd`	`25`	Hard USD cap per calendar day
`budgets.project.usd`	`null`	Hard USD cap for this project directory
`budgets.perModel`	`{}`	Per-model caps keyed by model id substring
`subscription.plan`	`null`	Flat-fee plan: `"pro"`, `"max5x"`, `"max20x"`, `"custom"`, or `null` (off)
`subscription.fiveHour.usd`	`null`	Rolling 5-hour ceiling (API-equiv USD); `null` → built-in estimate for the plan
`subscription.weekly.usd`	`null`	Rolling 7-day ceiling (API-equiv USD); `null` → built-in estimate for the plan
`compaction.coachAtPercent`	`80`	Inject /compact nudge at this context-fill %
`compaction.vetoAutoBelowPercent`	`50`	Block auto-compaction that fires below this %
`escalation.expensiveModels`	`["opus"]`	Model id substrings triggering escalation
`escalation.trivialThresholdUsd`	`0.5`	Calls below this pass silently
`bloat.readMaxLines`	`2000`	Cap Read.limit via modifiedInput
`bloat.refuseUnscopedSearch`	`true`	Refuse Grep/Glob with no path constraint
`bloat.bashMaxTimeoutMs`	`600000`	Cap Bash timeout

Subagent (isSidechain) spend always counts toward the parent session/project budget — this is inherent to the live cost.total_cost_usd signal and is what makes the runaway fan-out scenario enforceable.

The subscription ceilings are estimates — Anthropic publishes no hard 5-hour/weekly numbers. Leave usd as null to use the conservative built-in per-plan defaults, then set explicit values once you learn where you actually throttle. See Subscription usage windows.

Commands

npx tokenwarden <command>

Command	What it does
`install`	Writes hook entries into the chosen settings.json scope; idempotent
`uninstall`	Removes exactly the hook entries tokenwarden added; no other changes
`status`	Shows mode, active budgets, current session/day spend, and (if a plan is set) subscription usage windows
`report`	Prints a spend summary parsed from JSONL transcripts (same data source as ccusage)
`inspect`	Shows the raw ledger state and recent enforcement events
`doctor`	Checks that hook entries are wired correctly and the binary is reachable
`init`	Scaffolds a `.tokenwarden.json` in the current directory with the defaults

Data accuracy and estimates

All cost figures tokenwarden shows are estimates. Two reasons:

JSONL is a known lower bound. ccusage issue #866 documents that Claude Code writes token counts from early streaming events and never updates them to finals. Input tokens can be undercounted by 100–174x, output by 10–17x in affected cases. Tokenwarden reads the same JSONL, deduplicated by requestId, and inherits this limitation. JSONL is used for retrospective reporting and trend signals.

Statusline cost is the reliable live signal. Claude Code pipes a live JSON object to the configured statusline command including cost.total_cost_usd and context-window metrics. Tokenwarden uses this for real-time enforcement caps. This is more reliable than JSONL for hard-cap decisions.

Pricing is a versioned snapshot. The pricing table was verified against platform.claude.com on 2026-05-21. Rates can change; tokenwarden shows "pricing updated YYYY-MM-DD" in the report output. You can override per-model rates via pricingOverrides in your config (useful for negotiated enterprise rates).

Benchmarks

Status: harness shipped, numbers pending. tokenwarden ships the reproducible measurement harness in benchmarks/ — but it does not yet publish verified savings figures. We will not quote a savings percentage until real runs back it (the same discipline that makes a flat "60–80%" claim indefensible). benchmarks/results/ is populated by running the harness; it is intentionally empty in the repo until verified runs land before launch.

The harness measures token spend with and without tokenwarden across a fixed suite of representative tasks (bugfix, feature-add, refactor, subagent fan-out, MCP-heavy, long session) at N ≥ 6 runs each, same model and prompts, pinned Claude Code version, measured from the JSONL logs (the same source ccusage reads), reported as median + range per scenario with raw logs published.

Targets we expect the data to support (to be validated, not yet measured):

30–50% lower token spend on typical sessions
up to ~70% on MCP-heavy / long-running agentic sessions, where argument-bloat prevention has the largest effect
the qualitative win that needs no benchmark: a hard cap stops a runaway unattended session before it costs $1,000s

See benchmarks/README.md for the methodology and how to reproduce.

Comparison

Tool	What it does	Acts on cost?
ccusage	Reads JSONL; reports daily/weekly/session spend	No — read-only reporting
Claude-Code-Usage-Monitor	Live burn-rate display and spend predictions	No — warns, does not block
cc-budget	Budget alerts	No — alerts only, no blocking
Anthropic Console spend limits	Hard server-side org/workspace caps	Org-level only — no per-session, per-tool, or per-model control
tokenwarden	Enforces per-session/day/project/model caps via hooks	Yes — blocks, rewrites, and escalates

Tokenwarden is complementary to ccusage, not a replacement. Use ccusage for historical reporting and dashboards; use tokenwarden to enforce the budgets those reports reveal you need.

Why now

The sharpest version of Claude Code cost pain in 2026 is not "it's a bit expensive" — it is runaway unattended agent spend. A few verified examples:

Uber burned its entire 2026 AI budget in approximately four months, primarily on Claude Code.
Indie developer Jenny Ouyang woke up to an unexpected $1,600 bill from invisible context accumulation.
Anthropic's own docs baseline individual developer costs at roughly $13/active day and $150–250/month — before subagent fan-out.

The tools that exist today tell you what you spent after the fact. Tokenwarden is the first tool in this niche that refuses a call before it runs.

Requirements

Node.js >= 18
Claude Code (any recent version)
No other runtime dependencies

License

MIT. See LICENSE.

GitHub topics: claude-code claude-usage claude-code-hooks cost-optimization cli ai-agents

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
benchmarks		benchmarks
scripts		scripts
src		src
test		test
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
PLAN.md		PLAN.md
QUICKSTART.md		QUICKSTART.md
README.md		README.md
RESEARCH.md		RESEARCH.md
compass_artifact_wf-65411950-57e3-474f-977b-5570d097581c_text_markdown.md		compass_artifact_wf-65411950-57e3-474f-977b-5570d097581c_text_markdown.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

tokenwarden

Quick start

Features

1. Hard budget gate

2. Compaction coach

3. Expensive-call escalation

4. Argument-bloat prevention

5. Subscription usage windows (Pro / Max 5x / Max 20x)

Cost-telemetry layer

How it works

Config precedence

Config

Commands

Data accuracy and estimates

Benchmarks

Comparison

Why now

Requirements

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

tokenwarden

Quick start

Features

1. Hard budget gate

2. Compaction coach

3. Expensive-call escalation

4. Argument-bloat prevention

5. Subscription usage windows (Pro / Max 5x / Max 20x)

Cost-telemetry layer

How it works

Config precedence

Config

Commands

Data accuracy and estimates

Benchmarks

Comparison

Why now

Requirements

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages