say more with less.
wren compresses prompts and tool output for coding agents. it plugs into Claude Code, Codex, and other MCP-capable local workflows. it makes verbose context smaller without dropping the details that change behavior: paths, line numbers, flags, negations, errors, and step ordering.
it runs locally on Apple Silicon via MLX. typical verbose text shrinks by 50-80%. short text passes through unchanged. the fluff disappears; the fragile details stay intact.
coding agents waste context on politeness, repetition, and noisy tool output. Wren is a narrow model built to compress that waste away while keeping the tokens that silently matter.
- system prompts stay directive
- grep and build output stay actionable
- exact values survive verbatim
- local-first workflows stay local
see Wren work before wiring it into anything:
```
wren demo
cat build.log | wren demo --output
```

wren demo shows:
- before / after text
- chars saved
- estimated tokens saved
- a quick preservation report for paths, flags, numbers, negations, errors, and step ordering
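the "estimated tokens saved" figure can be approximated from character counts; a minimal sketch, assuming the common ~4-characters-per-token heuristic (the heuristic and the `estimate_savings` name are illustrative, not wren's actual accounting):

```python
def estimate_savings(before: str, after: str) -> dict:
    # hypothetical helper: wren's real report may count tokens differently
    chars_saved = len(before) - len(after)
    return {
        "chars_saved": chars_saved,
        # rough heuristic: ~4 characters per token for English prose
        "tokens_saved_est": chars_saved // 4,
        "ratio_saved": chars_saved / len(before) if before else 0.0,
    }

print(estimate_savings("a" * 200, "a" * 50))
```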
requires Python 3.10+ and Apple Silicon.
fastest path:
```
pipx install git+https://github.com/baahaus/wren.git
wren doctor
wren demo
```

from source:

```
git clone https://github.com/baahaus/wren.git
cd wren
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
wren doctor
wren demo
```

wren defaults to Qwen/Qwen2.5-1.5B-Instruct, which matches the included adapters. to override it, set a different model in config.json or via WREN_BASE_MODEL.
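resolving the base model amounts to checking the override sources before falling back to the default; a minimal sketch, where the precedence order (env var, then config.json, then default) is an assumption:

```python
import json
import os

DEFAULT_MODEL = "Qwen/Qwen2.5-1.5B-Instruct"

def resolve_base_model(config_path: str = "config.json") -> str:
    # assumed precedence: WREN_BASE_MODEL env var wins, then config.json,
    # then the packaged default that matches the included adapters
    if os.environ.get("WREN_BASE_MODEL"):
        return os.environ["WREN_BASE_MODEL"]
    if os.path.exists(config_path):
        with open(config_path) as f:
            return json.load(f).get("base_model", DEFAULT_MODEL)
    return DEFAULT_MODEL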
```
cp config.example.json config.json
# optional: edit config.json for a different base model or thresholds

# add to PATH
ln -sf $(pwd)/bin/wren ~/.local/bin/wren
```

pipe text in, compressed text out:
```
echo "When implementing a REST API, make sure to use proper HTTP status codes..." | wren
wren "Your verbose prompt here"
wren --file ./my-system-prompt.md
wren demo
wren doctor
```

wren also runs as a local stdio MCP server. if your agent can launch an MCP server process, wren works with it. Claude Code and Codex are the simplest examples; other MCP-capable agents can use the same entrypoint. the model loads once and stays hot in memory, so compression becomes part of the workflow instead of a separate wait.
tools exposed:
| tool | use for | not for |
|---|---|---|
| compressed_read | exploring unfamiliar files | files you need to edit |
| compressed_grep | broad codebase searches | exact line numbers |
| compressed_exec | verbose read/build/test/log commands | arbitrary shell or destructive commands |
| compress_text | shrinking any large text blob | -- |
| wren_status | session compression stats | -- |
setup:
```
# Claude Code
claude mcp add wren -s user -- wren-mcp

# Codex
codex mcp add wren -- wren-mcp
```

for other MCP clients, point them at the same stdio command:

```
wren-mcp
# or, from a source checkout:
/path/to/wren/.venv/bin/python /path/to/wren/mcp_server.py
```

optional auto-approve for the non-exec tools only:
do not auto-approve compressed_exec. add only the read-only tools to your allowlist:

```
"mcp__wren__compressed_read", "mcp__wren__compressed_grep",
"mcp__wren__compress_text", "mcp__wren__wren_status"
```
compressed_exec does not open a shell. it only allows a constrained set of inspection/build/test/log commands, and rejects arbitrary or destructive invocations.
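wren's actual command filter isn't reproduced here, but the idea can be sketched with a small allowlist check (the ALLOWED set and the `is_allowed` helper are illustrative; wren's real set of permitted commands may differ):

```python
import shlex

# illustrative allowlist; wren's real set of inspection/build/test/log
# commands may differ
ALLOWED = {"cat", "head", "tail", "ls", "grep", "rg", "make", "pytest", "git"}

def is_allowed(command: str) -> bool:
    try:
        parts = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes, etc.
    if not parts or parts[0] not in ALLOWED:
        return False
    # reject shell metacharacters that could chain, redirect, or substitute
    return not any(c in tok for tok in parts for c in "|;&><`$")

print(is_allowed("grep -rn TODO src"))  # True
print(is_allowed("rm -rf /"))           # False
```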
| input | output | saved |
|---|---|---|
| "When implementing a REST API, make sure to use proper HTTP status codes for all responses. Use 200 for successful GET requests, 201 for successful POST requests, 204 for DELETE, 400 for bad client requests." | "REST API: 200 GET, 201 POST, 204 DELETE, 400 BAD." | 79% |
| "Before making any changes to the codebase, please read the relevant files first to understand the existing code structure. Do not create new files unless they are absolutely necessary." | "Read existing files, do not create new unless necessary." | 78% |
| "Database migration: 1) pg_dump --format=custom. 2) Maintenance mode. 3) Run db/migrations/0042_add_indexes.sql..." | "1) pg_dump --format=custom. 2) Maintenance mode. 3) db/migrations/0042_add_indexes.sql..." | 40% |
the hard stuff (exact numbers, file paths, flags, negations) stays intact. the fluff disappears.
- negations -- "NEVER do X unless Y" stays "NEVER X unless Y", not "do X"
- values -- numbers, status codes, paths, flags, error codes survive verbatim
- branches -- if/else/when/otherwise logic stays complete
- step ordering -- numbered procedures keep every step in order
- constraints -- limits, thresholds, requirements don't get softened
- file paths -- exact paths, line numbers, function signatures (tool output mode)
wren uses different system prompts depending on what it's compressing:
- input mode (mode="input") -- for user prompts, system instructions, documentation. focuses on preserving meaning and instruction-following behavior.
- output mode (mode="output") -- for tool results (code, grep, build logs). focuses on preserving actionable information: paths, line numbers, errors, signatures.
the CLI uses input mode. the MCP server uses output mode.
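the mode switch amounts to choosing a system prompt before inference; a minimal sketch, where the prompt texts are placeholders rather than wren's actual prompts:

```python
# placeholder prompts; wren ships its own mode-specific system prompts
SYSTEM_PROMPTS = {
    "input": "Compress this prompt. Keep instructions, negations, and exact values.",
    "output": "Compress this tool output. Keep paths, line numbers, and errors.",
}

def build_messages(text: str, mode: str = "input") -> list:
    if mode not in SYSTEM_PROMPTS:
        raise ValueError(f"unknown mode: {mode}")
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[mode]},
        {"role": "user", "content": text},
    ]
```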
wren is LoRA fine-tuned on compression pairs across 20+ categories. training data comes from two pipelines:
input prompts (existing):

```
python3 generate_data.py mine      # mine prompts from conversation history
python3 generate_data.py compress  # compress via Claude API
python3 generate_data.py merge     # merge into train/valid splits
```

tool output (new):

```
python3 generate_tool_output.py mine      # mine Read/Grep/Bash results
python3 generate_tool_output.py compress  # compress via Claude API
python3 generate_tool_output.py merge     # merge into train/valid splits
```

both pipelines write to the same train.jsonl / valid.jsonl. the system prompt in each training example tells the model which mode to use.
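the exact record schema lives in the generator scripts; assuming mlx-lm's chat-style "messages" data format, one train.jsonl line might look like this (the field contents are invented for illustration):

```python
import json

# hypothetical training record: the system prompt encodes the mode,
# the user turn is the verbose text, the assistant turn is the target
line = {
    "messages": [
        {"role": "system", "content": "<output-mode compression instructions>"},
        {"role": "user", "content": "src/app.py:42: error: missing return statement in handler"},
        {"role": "assistant", "content": "src/app.py:42 error: missing return in handler"},
    ]
}
print(json.dumps(line))  # one JSON object per line in train.jsonl
```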
retrain:
```
BASE=$(python3 -c "import json; print(json.load(open('config.json'))['base_model'])")
python3 -m mlx_lm lora \
  --model "$BASE" \
  --train --data data \
  --batch-size 1 --num-layers 8 --iters 800 \
  --learning-rate 1e-5 --adapter-path adapters \
  --steps-per-eval 100 --save-every 100
```

31 test cases across 6 dimensions:
```
python3 eval.py            # summary with letter grades
python3 eval.py -v         # verbose per test case
python3 eval.py -c values  # filter by category
python3 eval.py -j         # JSON output
```

dimensions: compression ratio, value preservation, negation preservation, branch completeness, step ordering, required content retention.
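the value-preservation dimension can be illustrated with a simple check that every number and flag in the input survives in the output (the regexes and the `missing_values` helper are a sketch, not eval.py's actual logic):

```python
import re

def missing_values(before: str, after: str) -> list:
    # collect bare numbers and --flags from the verbose text
    needed = set(re.findall(r"\b\d+\b", before)) | set(re.findall(r"--[\w=./-]+", before))
    return sorted(v for v in needed if v not in after)

before = "Use 200 for GET, 201 for POST, 204 for DELETE, 400 for bad requests."
after = "REST API: 200 GET, 201 POST, 204 DELETE, 400 BAD."
print(missing_values(before, after))  # [] -> every value preserved
```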
- base: 1.5B parameter instruction-tuned model
- fine-tuning: LoRA (8 layers, 2.6M trainable params / 0.17%)
- inference: MLX native on Apple Silicon
- latency: ~2-5s per compression (CLI), near-instant after first call (MCP server)
- modes: input compression (prompts) + output compression (tool results)
smallest bird, loudest song.
made by baahaus
