Release v4.0.0: production grade, still lazy · DietrichGebert/ponytail

The hardening release. Three new reflexes, about ten lines of prompt:

One runnable check. Non-trivial logic leaves behind the smallest test that fails if the logic breaks. No frameworks, no fixtures. One-liners stay test-free.
Ceilings are named. A ponytail: shortcut with a known limit (global lock, O(n²) scan) must name the limit and the upgrade path in the comment.
Robust beats flimsy at equal size. Between two same-size stdlib options, take the one that is correct on edge cases.

Benchmarked

Six tasks, three arms, same model, adversarial security and concurrency probes. Every arm passes every probe. Then the agreement ends:

Full data and methodology: benchmarks/

All cross-agent rule files updated (Cursor, Windsurf, Cline, Copilot, AGENTS.md)
ponytail-review no longer flags the minimal check as bloat
README grew a chart