
v4.6.0: help, reluctantly


@DietrichGebert released this 15 Jun 14:39
ce153bc

He has never explained a command in his life. You typed /ponytail-help and got
nothing, because the file was never actually there, only the promise of it in the
docs. The most senior-dev bug there is: works in the standup, missing from the repo.

Now it ships. /ponytail-help is wired up alongside the other commands on every
skill-capable host (Claude Code, Codex, OpenCode, Gemini CLI, pi): one command that
lists the rest. A new parity test makes sure no future command can be advertised
without the files to back it. He hates writing documentation. He hates broken
promises more.
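The notes don't say how the parity test is built. Here is a minimal Python sketch of the idea. It assumes, hypothetically, that the docs advertise commands as `/ponytail-<name>` and that each host ships one file per command. `advertised_commands`, `parity_gaps` and the host-to-commands mapping are illustrative names, not the repo's.

```python
import re

# Hypothetical convention: commands appear in the docs as "/ponytail-<name>".
COMMAND_RE = re.compile(r"/(ponytail-[a-z0-9-]+)")


def advertised_commands(doc_text: str) -> set[str]:
    """Return every /ponytail-* command name the docs mention."""
    return set(COMMAND_RE.findall(doc_text))


def parity_gaps(doc_text: str, shipped_by_host: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each host to the advertised commands it has no backing file for.

    shipped_by_host is assumed to hold, per host, the command names that
    actually exist on disk (e.g. file stems from that host's command folder).
    An empty result means every advertised command ships on every host.
    """
    advertised = advertised_commands(doc_text)
    gaps: dict[str, set[str]] = {}
    for host, shipped in shipped_by_host.items():
        missing = advertised - shipped
        if missing:
            gaps[host] = missing
    return gaps
```

In a real suite the shipped sets would come from listing each host's command directory, and the test would simply assert that `parity_gaps(...)` returns an empty dict. Any doc change that names a command without a file on some host then fails CI.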

We benchmarked him on a tiny local model and shipped the flop.

A contributor added an Ollama runner so you can test ponytail on local models. We
ran it on llama3.2 (3B) and the lines-of-code win turned out to be noise: one run
landed 17% under baseline, the next 50% over, and the median shrugged. The skill is tuned
for models that actually follow instructions. A 3B model nods along and writes the
boilerplate anyway.

We published that instead of burying it. A benchmark you only show when it flatters
you is an ad. The frontier numbers (80-94% less code on Haiku, Sonnet, Opus) still
hold, and now there's an honest note on where they stop. Full reproduction in
benchmarks/results/.

Swept up on the way out: a lines-of-code counter that scored unfenced code as zero, and a Unicode
character that crashed the runner on Windows after the work was already done.
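The notes don't show the counter fix either. Here is one minimal way to avoid the zero score. It assumes responses are Markdown and that a reply with no fences should be counted as-is. `count_code_lines` is a hypothetical name, and the real benchmark may count differently.

```python
import re

# Matches a fenced block: an opening ``` (with optional info string),
# then everything up to the next closing ```.
FENCE_RE = re.compile(r"```[^\n]*\n(.*?)```", re.DOTALL)


def count_code_lines(response: str) -> int:
    """Count non-blank code lines in a model response."""
    blocks = FENCE_RE.findall(response)
    # Fall back to the whole response when the model skipped the fences,
    # so unfenced code is counted instead of scoring zero.
    body = "\n".join(blocks) if blocks else response
    return sum(1 for line in body.splitlines() if line.strip())
```

The trade-off is that the fallback also counts any prose in an unfenced reply. That errs toward overcounting, which can only make the "less code" numbers look worse, not better.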

He'd call it a quiet release. Then he'd stop talking.