Releases: DietrichGebert/ponytail
v4.7.0: lazy in OpenClaw now
OpenClaw is the fastest-growing open-source agent out there, an always-on assistant that reads your messages and runs your workflows. The lazy senior dev now lives inside it.
`clawhub install ponytail` and he's there: an OpenClaw skill that kicks in on coding tasks and tells the agent to write less. The review, audit, debt, and help skills come along too. He does not care how big the house is. He still deletes more than he adds.
The skill is generated straight from ponytail's single source, so the OpenClaw copy cannot drift from every other platform. One ruleset, now on one more agent.
Tested the boring way: installed OpenClaw, loaded the skill, watched it come up ready and visible to the model. Then he went back to not talking.
v4.6.0: help, reluctantly
He has never explained a command in his life. You typed /ponytail-help and got
nothing, because the file was never actually there, only the promise of it in the
docs. The most senior-dev bug there is: works in the standup, missing from the repo.
Now it ships. /ponytail-help is wired up alongside the other commands on every
skill-capable host (Claude Code, Codex, OpenCode, Gemini CLI, pi): one command that
lists the rest. A new parity test makes sure no future command can be advertised
without the files to back it. He hates writing documentation. He hates broken
promises more.
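A parity check of this kind fits in a few lines. The sketch below is not the repo's test: `findMissingCommands` and the `commands/<name>.md` layout are hypothetical, chosen only to show the idea of comparing what is advertised against what actually ships.

```javascript
// Hypothetical parity check: every advertised command needs a backing file.
// The commands/<name>.md layout is an assumption for illustration only.
function findMissingCommands(advertised, shippedFiles) {
  const shipped = new Set(shippedFiles);
  return advertised.filter((name) => !shipped.has(`commands/${name}.md`));
}

// Example inputs: the help command is advertised but its file is absent.
const advertised = ['ponytail', 'ponytail-review', 'ponytail-help'];
const shippedFiles = ['commands/ponytail.md', 'commands/ponytail-review.md'];

const missing = findMissingCommands(advertised, shippedFiles);
console.log(missing); // [ 'ponytail-help' ]
```

A test built on this would fail the build whenever `missing` is non-empty.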
We benchmarked him on a tiny local model and shipped the flop.
A contributor added an Ollama runner so you can test ponytail on local models. We
ran it on llama3.2 (3B) and the lines-of-code win turned out to be noise: one run
lands 17% under baseline, the next 50% over, the median shrugs. The skill is tuned
for models that actually follow instructions. A 3B model nods along and writes the
boilerplate anyway.
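To see why the median shrugs, here is a minimal sketch of the arithmetic. The line counts are invented to reproduce the 17%-under and 50%-over swings quoted above; they are not the published results, and the helper names are hypothetical.

```javascript
// Percent change in lines of code versus baseline (negative = less code).
function relativeChange(baselineLines, skillLines) {
  return ((skillLines - baselineLines) * 100) / baselineLines;
}

// Median of a list of numbers; averages the middle pair for even lengths.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Invented counts against a baseline of 100 lines: one win, one loss, one wash.
const changes = [83, 150, 98].map((lines) => relativeChange(100, lines));
console.log(changes, median(changes));
```

With swings this wide, a single run proves nothing either way, and the middle run sits close to zero.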
We published that instead of burying it. A benchmark you only show when it flatters
you is an ad. The frontier numbers (80-94% less code on Haiku, Sonnet, Opus) still
hold, and now there's an honest note on where they stop. Full reproduction in
benchmarks/results/.
Swept up on the way out: a counter that scored unfenced code as zero, and a Unicode
character that crashed the runner on Windows after the work was already done.
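The counter bug is easy to picture. This is a sketch of one plausible fix, not the repo's code: count lines inside fenced blocks when the reply has any, and fall back to the whole reply when it has none. `countCodeLines` is a hypothetical name.

```javascript
// Build the fence marker from single backticks so this example stays readable.
const FENCE = '`'.repeat(3);

// Count non-blank code lines in a model reply. Assumed behavior, not the
// repo's counter: use fenced blocks when present, else the whole reply.
function countCodeLines(reply) {
  const pattern = new RegExp(FENCE + '[^\\n]*\\n([\\s\\S]*?)' + FENCE, 'g');
  const blocks = [...reply.matchAll(pattern)].map((m) => m[1]);
  const code = blocks.length > 0 ? blocks.join('\n') : reply;
  return code.split('\n').filter((line) => line.trim() !== '').length;
}

const fenced = `Here:\n${FENCE}js\nconst a = 1;\n\nconst b = 2;\n${FENCE}\nDone.`;
const unfenced = 'const a = 1;\nconst b = 2;\nconst c = 3;';
console.log(countCodeLines(fenced), countCodeLines(unfenced));
```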
He'd call it a quiet release. Then he'd stop talking.
v4.5.0: lazy in Copilot
This release is mostly other people, and that's the point. GitHub Copilot CLI is now a full plugin host, contributed by @maxfelker (a Microsoft engineer) who built it with Copilot, tested it live, then reviewed his own PR in ponytail ultra (the headline finding was "delete a test"). CI, an npm test script, and a python3 fix came from @christophermayfield. Plus a fix for the hooks erroring when node isn't on PATH (Nix/nvm setups). The tool that does less got more thorough by getting more hands.
- GitHub Copilot CLI plugin: `copilot plugin marketplace add DietrichGebert/ponytail` then `copilot plugin install ponytail@ponytail`.
- CI on every push and PR, plus `npm test`.
- Correctness checks work on macOS/CI now (python3 probe).
- Hooks degrade gracefully when `node` isn't on PATH.
What's Changed
- Adding support for Copilot Marketplace plugin by @maxfelker in #47
- docs: bump the agents badge to 13 by @DietrichGebert in #56
- fix: use python3 for correctness checks and add CI by @christophermayfield in #50
- fix: hooks degrade gracefully when node is not on PATH by @DietrichGebert in #57
- docs: note the Codex install also covers the desktop app by @DietrichGebert in #59
- chore: bump version to 4.5.0 by @DietrichGebert in #60
New Contributors
- @maxfelker made their first contribution in #47
- @christophermayfield made their first contribution in #50
Full Changelog: v4.4.0...v4.5.0
v4.4.0: field-tested, still lazy
The headline of this release isn't a feature, it's a field test. A user ran ponytail across a from-scratch rewrite of a real system: nine phases, protocol plus desktop app plus simulator plus Raspberry Pi daemon plus ESP32 firmware. The verdict was "net win, kept it on the whole build," and across all nine phases "it never once trimmed a failsafe, validation, or auth check." It also flagged where the laziness needed a tighter leash. v4.4.0 is the result.
- Sharper rules from that feedback: hardware is never the spec ideal (leave the calibration knob), the one-runnable-check rule is now a headline ("lazy code without its check is unfinished"), and explanation you explicitly asked for isn't debt.
- `/ponytail-debt`: harvests the `ponytail:` shortcuts you've deferred into a ledger, so "later" doesn't quietly become "never."
- A behavior-gate eval so those rules can't silently regress.
- A dark-background logo (community-contributed) and a cleaner README.
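The harvesting step can be sketched as a plain scan for the marker. The ledger entry shape, the marker grammar, and the name `harvestDebt` are assumptions for illustration; the real skill is a prompt, not this function.

```javascript
// Collect `ponytail:` markers from a source string into ledger entries.
// The entry shape and the marker's exact grammar are assumptions.
function harvestDebt(file, source) {
  const entries = [];
  source.split('\n').forEach((text, index) => {
    const match = text.match(/ponytail:\s*(.+)$/);
    if (match) entries.push({ file, line: index + 1, note: match[1].trim() });
  });
  return entries;
}

// Invented sample file with one deferred shortcut.
const sample = [
  'const cache = new Map();',
  '// ponytail: unbounded cache, add LRU eviction if memory grows',
  'function get(key) { return cache.get(key); }',
].join('\n');

console.log(harvestDebt('cache.js', sample));
```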
What's Changed
- feat: refine ruleset from a full-project field review by @DietrichGebert in #39
- feat: add ponytail-debt skill by @DietrichGebert in #40
- docs: commands reference + portability accuracy by @DietrichGebert in #41
- feat: add a dark-background logo by @DietrichGebert in #43
- docs: use the dark logo in the README header on dark themes by @DietrichGebert in #44
- chore: bump version to 4.4.0 by @DietrichGebert in #45
- chore: untrack one-off social images committed by mistake by @DietrichGebert in #46
Full Changelog: v4.3.0...v4.4.0
v4.3.0: more agents, still lazy
What's Changed
- feat: add ponytail-audit skill by @rygel in #20
- fix: Windows hooks fail under PowerShell (cmd.exe %VAR% not expanded) by @ousamabenyounes in #26
- feat: add Gemini CLI support by @ousamabenyounes in #25
- docs: document GitHub Copilot CLI support by @DietrichGebert in #30
- docs: document Antigravity and VS Code Codex extension support by @DietrichGebert in #36
- feat(benchmarks): add correctness assertion by @zamal-db in #31
- fix: honor CLAUDE_CONFIG_DIR in hooks by @DietrichGebert in #37
- chore: bump version to 4.3.0 by @DietrichGebert in #38
New Contributors
- @rygel made their first contribution in #20
- @ousamabenyounes made their first contribution in #26
- @zamal-db made their first contribution in #31
Full Changelog: v4.2.0...v4.3.0
v4.2.0: lazy in OpenCode now
Added
- OpenCode support: a plugin that injects the ponytail ruleset every turn and adds `/ponytail` + `/ponytail-review`, with README install steps. (#16)
Fixed
v4.1.0: three more agents
Same lazy senior dev, three new places to put him. No change to the rules or the
benchmark numbers. This release is reach, not behavior.
New adapters
- Codex (#7): full plugin with a marketplace manifest, lifecycle hooks, and a shared runtime that keeps the Claude path byte-identical. Invoke as `@ponytail`, `@ponytail-review`, `@ponytail-help`.
- Pi (#1): a Pi extension with `/ponytail` mode control, per-session persistence, and system-prompt injection. Install with `pi install git:github.com/DietrichGebert/ponytail`.
- Kiro (#6): a drop-in steering file at `.kiro/steering/ponytail.md`.
Eight hosts now: Claude Code, Codex, Pi, Cursor, Windsurf, Cline, Copilot, Kiro.
Tooling and docs
- Rule-copy drift check (#3, #9): `node scripts/check-rule-copies.js` keeps every adapter copy aligned with AGENTS.md and guards the SKILL.md source against silently losing a rule.
- Agent portability doc (#2, #10): one table mapping each host to its files.
- Cross-platform fix for the hook compatibility test (Windows os.homedir).
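The idea behind a rule-copy drift check can be sketched as follows. This is not the contents of `scripts/check-rule-copies.js`; the function name, the input shapes, and the sample rules are all hypothetical.

```javascript
// Hypothetical drift check: report rules from the source that a copy lacks.
// `copies` maps an adapter file name to that file's text.
function findDrift(sourceRules, copies) {
  const drift = {};
  for (const [name, text] of Object.entries(copies)) {
    const missing = sourceRules.filter((rule) => !text.includes(rule));
    if (missing.length > 0) drift[name] = missing;
  }
  return drift;
}

// Invented rules and adapter copies; the second copy has lost a rule.
const sourceRules = ['Write less code.', 'Leave one runnable check.'];
const copies = {
  'cursor-rules.md': 'Write less code.\nLeave one runnable check.\n',
  'kiro-steering.md': 'Write less code.\n',
};

console.log(findDrift(sourceRules, copies));
```

An empty result means every copy still carries every rule; anything else names the file and the rule it dropped.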
Under the hood
The hooks were factored into a shared runtime (ponytail-runtime.js) and
instruction builder (ponytail-instructions.js), so Claude, Codex, and Pi load
the same rules instead of each duplicating the logic. Every adapter is
unit-tested; the Codex and Pi extension APIs were verified against their official
docs.
Full changelog: v4.0.0...v4.1.0
v4.0.0: production grade, still lazy
The hardening release. Three new reflexes, about ten lines of prompt:
- One runnable check. Non-trivial logic leaves behind the smallest test that fails if the logic breaks. No frameworks, no fixtures. One-liners stay test-free.
- Ceilings are named. A `ponytail:` shortcut with a known limit (global lock, O(n²) scan) must name the limit and the upgrade path in the comment.
- Robust beats flimsy at equal size. Between two same-size stdlib options, take the one that is correct on edge cases.
Benchmarked
Six tasks, three arms, same model, adversarial security and concurrency probes. Every arm passes every probe. Then the agreement ends:
| | No skill | Caveman | Ponytail v4 |
|---|---|---|---|
| Lines of code | 3,629 | 1,440 | 490 |
| Agent tokens | 430,697 | 290,546 | 229,370 |
| Surprise-extension lines | 1,115 | 413 | 96 |
Full data and methodology: benchmarks/
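For readers who want the table as percentages, the reductions against the no-skill arm can be worked out directly from the figures above. `percentLess` is a hypothetical helper; the inputs are the table's own numbers.

```javascript
// Percent reduction relative to a baseline, rounded to one decimal place.
function percentLess(baseline, value) {
  return Math.round((1 - value / baseline) * 1000) / 10;
}

// Ponytail v4 versus the no-skill arm, using the table's figures.
const reductions = {
  linesOfCode: percentLess(3629, 490),
  agentTokens: percentLess(430697, 229370),
  surpriseExtensionLines: percentLess(1115, 96),
};

console.log(reductions);
// { linesOfCode: 86.5, agentTokens: 46.7, surpriseExtensionLines: 91.4 }
```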
Also
- All cross-agent rule files updated (Cursor, Windsurf, Cline, Copilot, AGENTS.md)
- `ponytail-review` no longer flags the minimal check as bloat
- README grew a chart
v1.0.0 — He ships.
The lazy senior dev, installable.
- YAGNI ladder, 3 intensity levels (`lite`/`full`/`ultra`)
- Claude Code plugin: auto-active sessions, `[PONYTAIL]` statusline badge, `/ponytail-review`, `/ponytail-help`
- Rules files for Cursor, Windsurf, Cline, Copilot, Aider
- Benchmarked: −16% tokens, ~4× faster, 293 → 47 lines of code (benchmarks)
Install
/plugin marketplace add DietrichGebert/ponytail
/plugin install ponytail@ponytail
The 246 lines nobody wrote have never caused an incident.