v4.0.0: production grade, still lazy

@DietrichGebert DietrichGebert released this 12 Jun 10:56
· 51 commits to main since this release

The hardening release. Three new reflexes, about ten lines of prompt:

  • One runnable check. Non-trivial logic leaves behind the smallest test that fails if the logic breaks. No frameworks, no fixtures. One-liners stay test-free.
  • Ceilings are named. A shortcut tagged ponytail: that has a known limit (global lock, O(n²) scan) must name the limit and the upgrade path in its comment.
  • Robust beats flimsy at equal size. Between two same-size stdlib options, take the one that is correct on edge cases.
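
As a rough illustration of the three reflexes together, here is a minimal Python sketch. The helper names (`parse_command`, `first_duplicate`) and the exact wording of the `# ponytail:` comment are assumptions made for this example, not output from the skill or text taken from its prompt.

```python
import shlex


def parse_command(line):
    # Robust beats flimsy at equal size: shlex.split honours quotes,
    # whereas line.split() would cut "my file.txt" into two arguments.
    return shlex.split(line)


def first_duplicate(items):
    # ponytail: O(n^2) scan, acceptable for short lists.
    # Upgrade path: keep a set of seen values for an O(n) pass.
    for i, item in enumerate(items):
        if item in items[:i]:
            return item
    return None


# One runnable check per piece of non-trivial logic: plain asserts, no framework.
assert parse_command('open "my file.txt"') == ["open", "my file.txt"]
assert first_duplicate([3, 1, 3, 2]) == 3
assert first_duplicate([1, 2, 3]) is None
```

The asserts are the whole test suite here: they fail as soon as either function breaks, and nothing else needs to be installed or configured to run them.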

Benchmarked

Six tasks, three arms, same model, adversarial security and concurrency probes. Every arm passes every probe. Then the agreement ends:

| | No skill | Caveman | Ponytail v4 |
|---|---|---|---|
| Lines of code | 3,629 | 1,440 | 490 |
| Agent tokens | 430,697 | 290,546 | 229,370 |
| Surprise-extension lines | 1,115 | 413 | 96 |

Full data and methodology: benchmarks/
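
To put the table in relative terms, the sketch below expresses each arm as a percentage of the no-skill baseline. The figures are copied from the table above; the `percent_of_baseline` helper is a hypothetical name introduced here and is not part of the benchmark tooling.

```python
# Figures copied from the benchmark table: (no skill, Caveman, Ponytail v4).
results = {
    "Lines of code": (3629, 1440, 490),
    "Agent tokens": (430697, 290546, 229370),
    "Surprise-extension lines": (1115, 413, 96),
}


def percent_of_baseline(value, baseline):
    # Share of the no-skill figure, rounded to one decimal place.
    return round(100 * value / baseline, 1)


for metric, (no_skill, caveman, ponytail) in results.items():
    print(
        f"{metric}: Caveman {percent_of_baseline(caveman, no_skill)}%, "
        f"Ponytail v4 {percent_of_baseline(ponytail, no_skill)}% of no skill"
    )
```

By this calculation the Ponytail v4 arm produces about 13.5% of the no-skill line count and uses about 53% of its agent tokens.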

Also

  • All cross-agent rule files updated (Cursor, Windsurf, Cline, Copilot, AGENTS.md)
  • ponytail-review no longer flags the minimal check as bloat
  • README grew a chart