Skip to content

cryptosaras/tokenward

Repository files navigation

tokenwarden

ccusage tells you what you spent. tokenwarden stops you from overspending.

In a hurry? See QUICKSTART.md — install + use in 30 seconds.

tokenwarden is the only tool in the Claude Code ecosystem that acts on cost rather than reporting it. Every other tool in this space — ccusage, Claude-Code-Usage-Monitor, cc-budget — tells you what happened or warns you what might happen. tokenwarden hooks directly into Claude Code's event system and enforces hard budgets, intercepts expensive calls, caps argument bloat, and coaches compaction. It is designed to be complementary to ccusage: track with ccusage, enforce with tokenwarden.

tokenwarden blocking an expensive Opus call


Quick start

npx tokenwarden install

This writes hook entries into your Claude Code settings.json (user scope by default; use --project to install into the project scope instead) and points them at the installed binary. The install is idempotent (safe to re-run) and fully reversible.

By default tokenwarden starts in observe mode: it logs every intervention it would make without actually blocking anything. This lets you see what it would enforce for a full session before it starts refusing calls.

To switch to enforcement:

# In your project directory
npx tokenwarden init   # creates .tokenwarden.json with your settings
# edit .tokenwarden.json: set "mode": "enforce"

Or flip it globally:

echo '{"mode":"enforce"}' > ~/.tokenwarden.json

Features

1. Hard budget gate

Per-session, per-day, per-project, and per-model USD caps enforced via PreToolUse and UserPromptSubmit hooks. When cumulative spend reaches a cap, the hook returns a block decision and the reason is shown to the model and user. (v0.1 enforces dollar caps; the reliable live signal is cost.total_cost_usd, not a cumulative token count.)

tokenwarden: session budget reached — spent $8.50 of $5.00. Run /compact, start a
fresh session, or set TOKENWARDEN_FORCE=1 to override.

Subagent spend (isSidechain:true) counts toward the parent session/project budget — this is the configuration that addresses the runaway fan-out scenario.

2. Compaction coach

At 80% context usage, UserPromptSubmit injects an additionalContext nudge prompting the user to run /compact. The PreCompact hook can veto wasteful auto-compaction that fires below 50% context fill.

What tokenwarden does not do: it cannot trigger /compact programmatically. Hooks have no mechanism to issue commands. The compaction coach nudges; the human runs /compact.

3. Expensive-call escalation

When an expensive model (Opus by default) is about to make a genuinely expensive call — chiefly a subagent spawn (Task/Agent), the documented runaway-cost vector — PreToolUse returns an ask decision so the call needs your confirmation. The reason is calibrated to a dollar estimate:

tokenwarden: Est. $0.60 on claude-opus-4-7 for a subagent spawn — genuinely
expensive on this model. Switch to a cheaper model, narrow the request, or set
TOKENWARDEN_FORCE=1 to override.

Calls whose estimate falls below escalation.trivialThresholdUsd pass silently — ordinary Reads, Edits, and scoped searches never prompt, even on Opus. Friction is proportional to cost (and an oversized Read is silently capped by the bloat rule below, not escalated).

What tokenwarden does not do: it cannot silently downgrade the model. The model field is not reliably available in hook input, and hooks have no model-selection control. Tokenwarden escalates to a human decision instead.

4. Argument-bloat prevention

PreToolUse uses modifiedInput to rewrite tool arguments before they execute:

  • Read.limit capped to 2000 lines (prevents dumping entire large files into context)
  • Grep and Glob refused (ask) when called without path/scope constraints
  • Bash timeout capped to 600 000 ms to prevent runaway background commands

What tokenwarden does not do: it cannot prune or modify existing conversation history. It can only prevent future bloat at the point of the tool call, before output enters the context window.

5. Subscription usage windows (Pro / Max 5x / Max 20x)

If you're on a flat-fee plan, dollars aren't your constraint — you don't pay per token. You throttle against usage windows: a rolling 5-hour window and a rolling 7-day window. tokenwarden tracks your consumption against those windows so you can see how close you are to a throttle, and (in enforce mode) gate before you burn the rest of your quota on low-value work.

Set your plan and the windows turn on:

{ "subscription": { "plan": "max5x" } }
tokenwarden: max5x 5-hour usage window — ~$24.10 API-equivalent used of an est.
$25.00 ceiling. On a flat-fee plan this is throttle risk, not a bill (ceiling is
an estimate). Slow down, switch to a cheaper model, start a fresh session, or set
TOKENWARDEN_FORCE=1 to override.

There's no dollar budget here — the figure is API-equivalent: tokenwarden prices your token consumption at public API rates (the same number Claude Code's statusline reports for subscription accounts) and measures it within each rolling window. tokenwarden status shows both windows when a plan is set.

Two honest caveats, stated plainly:

  • The ceilings are estimates. Anthropic does not publish hard 5-hour/weekly numbers, and they drift. The built-in per-plan ceilings are deliberately conservative starting points — calibrate them to your own throttling by setting subscription.fiveHour.usd / subscription.weekly.usd once you learn where you actually hit the wall.
  • It's a local lower-bound. tokenwarden only counts windows it observed on this machine via its statusline capture. It cannot see usage from other machines, claude.ai, or sessions before you installed it. Treat it as a floor, not a precise meter.

plan: null (the default) keeps this off — API / pay-as-you-go users want the dollar budgets above instead.

Cost-telemetry layer

Tokenwarden reads the same ~/.claude/projects/**/*.jsonl transcripts that ccusage reads, deduplicated by requestId. It also captures the live statusline JSON Claude Code emits (the more reliable real-time cost signal). All dollar figures are estimates, always labeled as such.


How it works

Tokenwarden registers handlers for these Claude Code hook events:

Event What tokenwarden does
SessionStart Captures the active model id; initializes the session ledger entry
UserPromptSubmit Checks budget caps before the prompt is processed; injects compaction-coach nudge at 80% context
PreToolUse Enforces budget gate, escalation, and argument-bloat caps; returns modifiedInput to rewrite args
PreCompact Vetoes auto-compaction that fires below the wasteful threshold
Stop Records a session-finalize bookkeeping event (never blocks)
SessionEnd Reconciles final session totals against the ledger

Operational guarantees:

  • Fail-open: any internal error in a hook exits 0 (proceed). Tokenwarden never blocks your work because of its own bug.
  • Hot-path performance: hooks complete in well under 100 ms in practice (~50 ms measured cold-start from a single bundled file), comfortably inside the <250 ms budget. A hard 2 s self-timeout guarantees a hung hook still fails open. The ledger is a small local JSON file read in O(1) — no JSONL re-scanning on each hook call. Note this is per tool call, so a turn with many tool calls adds roughly calls × ~50 ms of overhead (e.g. ~1 s across 20 calls) — small relative to model latency, but real.
  • Observe-only first session: the default mode: "observe" logs what would be blocked without blocking it. You opt into mode: "enforce" after seeing the logs.
  • Idempotent and reversible install: tokenwarden install does not duplicate hook entries; tokenwarden uninstall removes exactly what it added.

Config precedence

Project .tokenwarden.json overrides user ~/.tokenwarden.json, which overrides built-in defaults. All three layers merge; missing keys fall back to the defaults.


Config

Create .tokenwarden.json in your project (or ~/.tokenwarden.json for user-wide settings). All fields are optional and fall back to defaults.

{
  "mode": "observe",
  "budgets": {
    "session": { "usd": 5 },
    "daily": { "usd": 25 },
    "project": { "usd": null },
    "perModel": {
      "opus": { "usd": 10 }
    }
  },
  "subscription": {
    "plan": null,
    "fiveHour": { "usd": null },
    "weekly": { "usd": null }
  },
  "compaction": {
    "coachAtPercent": 80,
    "vetoAutoBelowPercent": 50
  },
  "escalation": {
    "enabled": true,
    "expensiveModels": ["opus"],
    "trivialThresholdUsd": 0.5
  },
  "bloat": {
    "readMaxLines": 2000,
    "refuseUnscopedSearch": true,
    "bashMaxTimeoutMs": 600000
  },
  "pricingOverrides": {}
}

Field reference:

Field Default Description
mode "observe" "observe" logs interventions; "enforce" blocks them
budgets.session.usd 5 Hard USD cap per Claude Code session
budgets.daily.usd 25 Hard USD cap per calendar day
budgets.project.usd null Hard USD cap for this project directory
budgets.perModel {} Per-model caps keyed by model id substring
subscription.plan null Flat-fee plan: "pro", "max5x", "max20x", "custom", or null (off)
subscription.fiveHour.usd null Rolling 5-hour ceiling (API-equiv USD); null → built-in estimate for the plan
subscription.weekly.usd null Rolling 7-day ceiling (API-equiv USD); null → built-in estimate for the plan
compaction.coachAtPercent 80 Inject /compact nudge at this context-fill %
compaction.vetoAutoBelowPercent 50 Block auto-compaction that fires below this %
escalation.expensiveModels ["opus"] Model id substrings triggering escalation
escalation.trivialThresholdUsd 0.5 Calls below this pass silently
bloat.readMaxLines 2000 Cap Read.limit via modifiedInput
bloat.refuseUnscopedSearch true Refuse Grep/Glob with no path constraint
bloat.bashMaxTimeoutMs 600000 Cap Bash timeout

Subagent (isSidechain) spend always counts toward the parent session/project budget — this is inherent to the live cost.total_cost_usd signal and is what makes the runaway fan-out scenario enforceable.

The subscription ceilings are estimates — Anthropic publishes no hard 5-hour/weekly numbers. Leave usd as null to use the conservative built-in per-plan defaults, then set explicit values once you learn where you actually throttle. See Subscription usage windows.


Commands

npx tokenwarden <command>
Command What it does
install Writes hook entries into the chosen settings.json scope; idempotent
uninstall Removes exactly the hook entries tokenwarden added; no other changes
status Shows mode, active budgets, current session/day spend, and (if a plan is set) subscription usage windows
report Prints a spend summary parsed from JSONL transcripts (same data source as ccusage)
inspect Shows the raw ledger state and recent enforcement events
doctor Checks that hook entries are wired correctly and the binary is reachable
init Scaffolds a .tokenwarden.json in the current directory with the defaults

Data accuracy and estimates

All cost figures tokenwarden shows are estimates. Two reasons:

JSONL is a known lower bound. ccusage issue #866 documents that Claude Code writes token counts from early streaming events and never updates them to finals. Input tokens can be undercounted by 100–174x, output by 10–17x in affected cases. Tokenwarden reads the same JSONL, deduplicated by requestId, and inherits this limitation. JSONL is used for retrospective reporting and trend signals.

Statusline cost is the reliable live signal. Claude Code pipes a live JSON object to the configured statusline command including cost.total_cost_usd and context-window metrics. Tokenwarden uses this for real-time enforcement caps. This is more reliable than JSONL for hard-cap decisions.

Pricing is a versioned snapshot. The pricing table was verified against platform.claude.com on 2026-05-21. Rates can change; tokenwarden shows "pricing updated YYYY-MM-DD" in the report output. You can override per-model rates via pricingOverrides in your config (useful for negotiated enterprise rates).


Benchmarks

Status: harness shipped, numbers pending. tokenwarden ships the reproducible measurement harness in benchmarks/ — but it does not yet publish verified savings figures. We will not quote a savings percentage until real runs back it (the same discipline that makes a flat "60–80%" claim indefensible). benchmarks/results/ is populated by running the harness; it is intentionally empty in the repo until verified runs land before launch.

The harness measures token spend with and without tokenwarden across a fixed suite of representative tasks (bugfix, feature-add, refactor, subagent fan-out, MCP-heavy, long session) at N ≥ 6 runs each, same model and prompts, pinned Claude Code version, measured from the JSONL logs (the same source ccusage reads), reported as median + range per scenario with raw logs published.

Targets we expect the data to support (to be validated, not yet measured):

  • 30–50% lower token spend on typical sessions
  • up to ~70% on MCP-heavy / long-running agentic sessions, where argument-bloat prevention has the largest effect
  • the qualitative win that needs no benchmark: a hard cap stops a runaway unattended session before it costs $1,000s

See benchmarks/README.md for the methodology and how to reproduce.


Comparison

Tool What it does Acts on cost?
ccusage Reads JSONL; reports daily/weekly/session spend No — read-only reporting
Claude-Code-Usage-Monitor Live burn-rate display and spend predictions No — warns, does not block
cc-budget Budget alerts No — alerts only, no blocking
Anthropic Console spend limits Hard server-side org/workspace caps Org-level only — no per-session, per-tool, or per-model control
tokenwarden Enforces per-session/day/project/model caps via hooks Yes — blocks, rewrites, and escalates

Tokenwarden is complementary to ccusage, not a replacement. Use ccusage for historical reporting and dashboards; use tokenwarden to enforce the budgets those reports reveal you need.


Why now

The sharpest version of Claude Code cost pain in 2026 is not "it's a bit expensive" — it is runaway unattended agent spend. A few verified examples:

  • Uber burned its entire 2026 AI budget in approximately four months, primarily on Claude Code.
  • Indie developer Jenny Ouyang woke up to an unexpected $1,600 bill from invisible context accumulation.
  • Anthropic's own docs baseline individual developer costs at roughly $13/active day and $150–250/month — before subagent fan-out.

The tools that exist today tell you what you spent after the fact. Tokenwarden is the first tool in this niche that refuses a call before it runs.


Requirements

  • Node.js >= 18
  • Claude Code (any recent version)
  • No other runtime dependencies

License

MIT. See LICENSE.


GitHub topics: claude-code claude-usage claude-code-hooks cost-optimization cli ai-agents

About

No description, website, or topics provided.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors