ape together strong. token count small.
Before/After · Install · Levels · Skills · Benchmarks · Evals
A Claude Code skill/plugin and Codex plugin that makes the agent answer in a compressed ape voice – cutting output tokens hard while keeping full technical accuracy. Now with terse commits, one-line code reviews, and a compression tool that cuts ~45% of input tokens every session.
Based on the viral observation that ape-speak dramatically reduces LLM token usage without losing technical substance. So we made it a one-line install.
Same fix. Less word. Brain still big.
Pick your troop level:
Same answer. You pick how many word.
```
┌───────────────────────────────────────┐
│ TOKENS SAVED        ████████   75%    │
│ TECHNICAL ACCURACY  ████████  100%    │
│ SPEED INCREASE      ████████   ~3x    │
│ VIBES               ████████   OOG    │
└───────────────────────────────────────┘
```
- Faster response → less token to generate = speed go brrr
- Easier to read → no wall of text, just the answer
- Same accuracy → all technical info kept, only fluff removed (science say so)
- Save money → fewer output tokens = less cost
- Fun → every code review become comedy
Install as a plugin – includes skills + auto-loading hooks (ape activates every session, mode badge tracks `/ape ultra` etc.):
```shell
claude plugin marketplace add JuliusBrussee/ape
claude plugin install ape@ape
```

Or install as skills:

```shell
npx skills add JuliusBrussee/ape
```

For a specific agent: `npx skills add JuliusBrussee/ape -a cursor`
Note
`npx skills` installs skills only (no hooks). For Claude Code auto-loading hooks, use the plugin install above or run `bash hooks/install.sh`.
- Clone repo → Open Codex in repo → `/plugins` → Search "Ape" → Install
Note
Windows Codex users: Clone repo → VS Code → Codex Settings → Plugins → find Ape under local marketplace → Install → Reload Window. Also set `git config core.symlinks true` before cloning (requires Developer Mode or admin).
Install once. Use in all sessions after that. One troop. That it.
Add an `[APE:ULTRA]` badge to your statusline showing which mode is active. See `hooks/README.md` for the snippet.
Trigger with:
- `/ape` (or `$ape` in Codex)
- "talk like ape"
- "ape mode"
- "less tokens please"
Stop with: "stop ape" or "normal mode"
| Level | Trigger | What it do |
|---|---|---|
| Lite | `/ape lite` | Drop filler, keep grammar. Professional but no fluff |
| Full | `/ape full` | Default ape. Drop articles, fragments, sparse ape tone |
| Ultra | `/ape ultra` | Maximum compression. Telegraphic. Abbreviate hard |
| Micro | `/ape micro` | Answer only. Minimal words. No framing |
Level stick until you change it or session end.
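Rough feel for the levels on one question (illustrative wording only, not captured model output):

```
Q: why my React list re-render every keystroke?

lite:  The rows use array indices as keys, so React remounts them. Use stable IDs.
full:  Index keys. React remount rows each render. Use stable ID.
ultra: idx keys → remount. use stable id.
micro: stable keys.
```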
| Skill | What it do | Trigger |
|---|---|---|
| ape-commit | Terse commit messages. Conventional Commits. ≤50 char subject. Why over what. | `/ape-commit` |
| ape-review | One-line PR comments: `L42: 🔴 bug: user null. Add guard.` No throat-clearing. | `/ape-review` |
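A hypothetical ape-commit message in that shape (illustrative only; the change described is made up):

```
fix(auth): guard null user in middleware

Expired token left req.user null, crashed downstream handlers.
Guard early, return 401. Why: fail fast beats stack trace.
```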
Ape make Claude speak with fewer tokens. Compress make Claude read fewer tokens.
Your CLAUDE.md loads on every session start. Ape Compress rewrites memory files into ape-speak so Claude reads less – without you losing the human-readable original.
/ape:compress CLAUDE.md
- `CLAUDE.md` → compressed (Claude reads this every session → fewer tokens)
- `CLAUDE.original.md` → human-readable backup (you read and edit this)
| File | Original (tokens) | Compressed (tokens) | Saved |
|---|---|---|---|
| claude-md-preferences.md | 706 | 285 | 59.6% |
| project-notes.md | 1145 | 535 | 53.3% |
| claude-md-project.md | 1122 | 687 | 38.8% |
| todo-list.md | 627 | 388 | 38.1% |
| mixed-with-code.md | 888 | 574 | 35.4% |
| Average | 898 | 494 | 45% |
Code blocks, URLs, file paths, commands, headings, dates, version numbers – anything technical passes through untouched. Only prose gets compressed. See the full ape-compress README for details. Security note: Snyk flags this as High Risk due to subprocess/file patterns; it's a false positive.
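The passthrough idea can be sketched in a few lines – a toy illustration, not the plugin's actual implementation (the filler-word list here is invented):

```python
import re

FENCE = "`" * 3  # literal triple-backtick, built up to avoid breaking this fence

# Toy filler list; the real skill uses a much richer rewrite.
FILLER = re.compile(r"\b(?:the|a|an|very|really|just|basically)\b ?", re.IGNORECASE)

def compress_prose(text: str) -> str:
    # Toy "ape" pass: strip filler words from prose only.
    return FILLER.sub("", text)

def ape_compress(markdown: str) -> str:
    # Split on fenced code blocks, keeping the fences as segments;
    # code segments pass through untouched, prose segments get compressed.
    parts = re.split(f"({FENCE}.*?{FENCE})", markdown, flags=re.DOTALL)
    return "".join(p if p.startswith(FENCE) else compress_prose(p) for p in parts)
```

Same principle at full scale: segment first, then compress only the prose segments.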
Historical token counts from an earlier benchmark run (reproduce it yourself):
| Task | Normal (tokens) | Ape (tokens) | Saved |
|---|---|---|---|
| Explain React re-render bug | 1180 | 159 | 87% |
| Fix auth middleware token expiry | 704 | 121 | 83% |
| Set up PostgreSQL connection pool | 2347 | 380 | 84% |
| Explain git rebase vs merge | 702 | 292 | 58% |
| Refactor callback to async/await | 387 | 301 | 22% |
| Architecture: microservices vs monolith | 446 | 310 | 30% |
| Review PR for security issues | 678 | 398 | 41% |
| Docker multi-stage build | 1042 | 290 | 72% |
| Debug PostgreSQL race condition | 1200 | 232 | 81% |
| Implement React error boundary | 3454 | 456 | 87% |
| Average | 1214 | 294 | 65% |
Range: 22%–87% savings across prompts.
Important
Ape only affects output tokens – thinking/reasoning tokens are untouched. Ape no make brain smaller. Ape make mouth smaller. Biggest win is readability and speed; cost savings are a bonus.
A March 2026 paper "Brevity Constraints Reverse Performance Hierarchies in Language Models" found that constraining large models to brief responses improved accuracy by 26 percentage points on certain benchmarks and completely reversed performance hierarchies. Verbose not always better. Sometimes less word = more correct.
Ape not just claim compression. Ape measure it.
The evals/ directory has a three-arm eval harness that measures real token compression against a proper control β not just "verbose vs skill" but "terse vs skill". Because comparing ape to verbose Claude conflate the skill with generic terseness. That cheating. Ape not cheat.
```shell
# Run the eval (needs claude CLI)
uv run python evals/llm_run.py

# Read results (no API key, runs offline)
uv run --with tiktoken python evals/measure.py
```

Snapshots are locally generated artifacts and are not committed. Run the eval when you want fresh numbers. Add a skill, add a prompt; harness picks it up automatically.
If ape save you mass token, mass money β leave mass star. β
- Cavekit – specification-driven development for Claude Code. Ape language → specs → parallel builds → working software.
- Revu – local-first macOS study app with FSRS spaced repetition, decks, exams, and study guides. revu.cards
MIT – free like mass mammoth on open plain.
