say more with less.
wren compresses prompts and tool output for coding agents. it plugs into Claude Code, Codex, and other MCP-capable local workflows. it makes verbose context smaller without dropping the details that change behavior: paths, line numbers, flags, negations, errors, and step ordering.
it runs locally on Apple Silicon via MLX. typical verbose text shrinks by 50-80%. short text passes through unchanged. the fluff disappears; the fragile details stay intact.
coding agents waste context on politeness, repetition, and noisy tool output. Wren is a narrow model built to compress that waste away while keeping the tokens that silently matter.
- system prompts stay directive
- grep and build output stay actionable
- exact values survive verbatim
- local-first workflows stay local
see Wren work before wiring it into anything:
```
wren demo
cat build.log | wren demo --output
```

wren demo shows:
- before / after text
- chars saved
- estimated tokens saved
- a quick preservation report for paths, flags, numbers, negations, errors, and step ordering
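the "estimated tokens saved" figure can be approximated from character counts; a minimal sketch, assuming the common ~4-characters-per-token heuristic (the heuristic and the `estimate_savings` name are illustrative, not wren's actual accounting):

```python
def estimate_savings(before: str, after: str) -> dict:
    # hypothetical helper: wren's real report may count tokens differently
    chars_saved = len(before) - len(after)
    return {
        "chars_saved": chars_saved,
        # rough heuristic: ~4 characters per token for English prose
        "tokens_saved_est": chars_saved // 4,
        "ratio_saved": chars_saved / len(before) if before else 0.0,
    }

print(estimate_savings("a" * 200, "a" * 50))
```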
requires Python 3.10+ and Apple Silicon.
fastest path:
```
pipx install git+https://github.com/baahaus/wren.git
wren doctor
wren demo
```

from source:

```
git clone https://github.com/baahaus/wren.git
cd wren
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
wren doctor
wren demo
```

wren defaults to Qwen/Qwen2.5-1.5B-Instruct, which matches the included adapters. to override it, set a different model in config.json or via WREN_BASE_MODEL.
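resolving the base model amounts to checking the override sources before falling back to the default; a minimal sketch, where the precedence order (env var, then config.json, then default) is an assumption:

```python
import json
import os

DEFAULT_MODEL = "Qwen/Qwen2.5-1.5B-Instruct"

def resolve_base_model(config_path: str = "config.json") -> str:
    # assumed precedence: WREN_BASE_MODEL env var wins, then config.json,
    # then the packaged default that matches the included adapters
    if os.environ.get("WREN_BASE_MODEL"):
        return os.environ["WREN_BASE_MODEL"]
    if os.path.exists(config_path):
        with open(config_path) as f:
            return json.load(f).get("base_model", DEFAULT_MODEL)
    return DEFAULT_MODEL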
```
cp config.example.json config.json
# optional: edit config.json for a different base model or thresholds

# add to PATH
ln -sf $(pwd)/bin/wren ~/.local/bin/wren
```

pipe text in, compressed text out:
```
echo "When implementing a REST API, make sure to use proper HTTP status codes..." | wren
wren "Your verbose prompt here"
wren --file ./my-system-prompt.md
wren demo
wren doctor
```

wren also runs as a local stdio MCP server. if your agent can launch an MCP server process, wren works with it. Claude Code and Codex are the simplest examples; other MCP-capable agents can use the same entrypoint. the model loads once and stays hot in memory, so compression becomes part of the workflow instead of a separate wait.
tools exposed:
| tool | use for | not for |
|---|---|---|
| compressed_read | exploring unfamiliar files | files you need to edit |
| compressed_grep | broad codebase searches | exact line numbers |
| compressed_exec | verbose read/build/test/log commands | arbitrary shell or destructive commands |
| compress_text | shrinking any large text blob | -- |
| wren_status | session compression stats | -- |
setup:
```
# Claude Code
claude mcp add wren -s user -- wren-mcp

# Codex
codex mcp add wren -- wren-mcp
```

for other MCP clients, point them at the same stdio command:

```
wren-mcp
# or, from a source checkout:
/path/to/wren/.venv/bin/python /path/to/wren/mcp_server.py
```

optional auto-approve for the non-exec tools only:
do not auto-approve compressed_exec. add only the read-only tools to your allowlist:

```
"mcp__wren__compressed_read", "mcp__wren__compressed_grep",
"mcp__wren__compress_text", "mcp__wren__wren_status"
```
compressed_exec does not open a shell. it only allows a constrained set of inspection/build/test/log commands, and rejects arbitrary or destructive invocations.
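wren's actual command filter isn't reproduced here, but the idea can be sketched with a small allowlist check (the ALLOWED set and the `is_allowed` helper are illustrative; wren's real set of permitted commands may differ):

```python
import shlex

# illustrative allowlist; wren's real set of inspection/build/test/log
# commands may differ
ALLOWED = {"cat", "head", "tail", "ls", "grep", "rg", "make", "pytest", "git"}

def is_allowed(command: str) -> bool:
    try:
        parts = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes, etc.
    if not parts or parts[0] not in ALLOWED:
        return False
    # reject shell metacharacters that could chain, redirect, or substitute
    return not any(c in tok for tok in parts for c in "|;&><`$")

print(is_allowed("grep -rn TODO src"))  # True
print(is_allowed("rm -rf /"))           # False
```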
| input | output | saved |
|---|---|---|
| "When implementing a REST API, make sure to use proper HTTP status codes for all responses. Use 200 for successful GET requests, 201 for successful POST requests, 204 for DELETE, 400 for bad client requests." | "REST API: 200 GET, 201 POST, 204 DELETE, 400 BAD." | 79% |
| "Before making any changes to the codebase, please read the relevant files first to understand the existing code structure. Do not create new files unless they are absolutely necessary." | "Read existing files, do not create new unless necessary." | 78% |
| "Database migration: 1) pg_dump --format=custom. 2) Maintenance mode. 3) Run db/migrations/0042_add_indexes.sql..." | "1) pg_dump --format=custom. 2) Maintenance mode. 3) db/migrations/0042_add_indexes.sql..." | 40% |
the hard stuff (exact numbers, file paths, flags, negations) stays intact. the fluff disappears.
- negations -- "NEVER do X unless Y" stays "NEVER X unless Y", not "do X"
- values -- numbers, status codes, paths, flags, error codes survive verbatim
- branches -- if/else/when/otherwise logic stays complete
- step ordering -- numbered procedures keep every step in order
- constraints -- limits, thresholds, requirements don't get softened
- file paths -- exact paths, line numbers, function signatures (tool output mode)
wren uses different system prompts depending on what it's compressing:
- input mode (mode="input") -- for user prompts, system instructions, documentation. focuses on preserving meaning and instruction-following behavior.
- output mode (mode="output") -- for tool results (code, grep, build logs). focuses on preserving actionable information: paths, line numbers, errors, signatures.
the CLI uses input mode. the MCP server uses output mode.
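the mode switch amounts to choosing a system prompt before inference; a minimal sketch, where the prompt texts are placeholders rather than wren's actual prompts:

```python
# placeholder prompts; wren ships its own mode-specific system prompts
SYSTEM_PROMPTS = {
    "input": "Compress this prompt. Keep instructions, negations, and exact values.",
    "output": "Compress this tool output. Keep paths, line numbers, and errors.",
}

def build_messages(text: str, mode: str = "input") -> list:
    if mode not in SYSTEM_PROMPTS:
        raise ValueError(f"unknown mode: {mode}")
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[mode]},
        {"role": "user", "content": text},
    ]
```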
wren is LoRA fine-tuned on compression pairs across 20+ categories. training data comes from two pipelines:
input prompts (existing):

```
python3 generate_data.py mine      # mine prompts from conversation history
python3 generate_data.py compress  # compress via Claude API
python3 generate_data.py merge     # merge into train/valid splits
```

tool output (new):

```
python3 generate_tool_output.py mine      # mine Read/Grep/Bash results
python3 generate_tool_output.py compress  # compress via Claude API
python3 generate_tool_output.py merge     # merge into train/valid splits
```

both pipelines write to the same train.jsonl / valid.jsonl. the system prompt in each training example tells the model which mode to use.
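the exact record schema lives in the generator scripts; assuming mlx-lm's chat-style "messages" data format, one train.jsonl line might look like this (the field contents are invented for illustration):

```python
import json

# hypothetical training record: the system prompt encodes the mode,
# the user turn is the verbose text, the assistant turn is the target
line = {
    "messages": [
        {"role": "system", "content": "<output-mode compression instructions>"},
        {"role": "user", "content": "src/app.py:42: error: missing return statement in handler"},
        {"role": "assistant", "content": "src/app.py:42 error: missing return in handler"},
    ]
}
print(json.dumps(line))  # one JSON object per line in train.jsonl
```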
retrain:
```
BASE=$(python3 -c "import json; print(json.load(open('config.json'))['base_model'])")
python3 -m mlx_lm lora \
  --model "$BASE" \
  --train --data data \
  --batch-size 1 --num-layers 8 --iters 800 \
  --learning-rate 1e-5 --adapter-path adapters \
  --steps-per-eval 100 --save-every 100
```

31 test cases across 6 dimensions:
```
python3 eval.py            # summary with letter grades
python3 eval.py -v         # verbose per test case
python3 eval.py -c values  # filter by category
python3 eval.py -j         # JSON output
```

dimensions: compression ratio, value preservation, negation preservation, branch completeness, step ordering, required content retention.
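the value-preservation dimension can be illustrated with a simple check that every number and flag in the input survives in the output (the regexes and the `missing_values` helper are a sketch, not eval.py's actual logic):

```python
import re

def missing_values(before: str, after: str) -> list:
    # collect bare numbers and --flags from the verbose text
    needed = set(re.findall(r"\b\d+\b", before)) | set(re.findall(r"--[\w=./-]+", before))
    return sorted(v for v in needed if v not in after)

before = "Use 200 for GET, 201 for POST, 204 for DELETE, 400 for bad requests."
after = "REST API: 200 GET, 201 POST, 204 DELETE, 400 BAD."
print(missing_values(before, after))  # [] -> every value preserved
```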
- base: 1.5B parameter instruction-tuned model
- fine-tuning: LoRA (8 layers, 2.6M trainable params / 0.17%)
- inference: MLX native on Apple Silicon
- latency: ~2-5s per compression (CLI), near-instant after first call (MCP server)
- modes: input compression (prompts) + output compression (tool results)
smallest bird, loudest song.
made by baahaus
