Skip to content

v1.7.5 — Adversarial-Training Convergence

Latest

Choose a tag to compare

@Caldis Caldis released this 22 May 15:37
· 26 commits to master since this release

Convergence achieved. Six rounds of adversarial subagent testing brought the critical-defect count to zero. Round-6 ran a 6-layer falsification attack and could not break the framework's core promises.

The trajectory

Round Mode Critical Released Lesson
Author E2E 5 v1.7.1 original quality wave
Subagent 1 verify 3 v1.7.1.x scaffold blockers
Subagent 2 verify 3 v1.7.1.x R2-bug class (JSON contract)
Subagent 3 verify 1 v1.7.2 Git Bash MSys trap
Subagent 4 falsify 2 v1.7.3 R3-regression in sibling
Subagent 5 falsify + audit 1 v1.7.4 yet another sibling
Subagent 6 falsify 6-layer 0 v1.7.5 convergence

What round-6 caught (all Lesson-15 sibling regressions)

Two blocking + three minor — all instances of "defence applied to one file, missed the sibling":

  • smoke.sh pytest gate didn't accept "2 passed, 1 skipped" (smoke.ps1 did since v1.7.4)
  • smoke.sh died under set -e + pipefail when sim binary absent
  • smoke.sh MSys trap missed run --no-build (smoke.ps1 had it)
  • run.py build phase lacked the stale-ELF mtime gate (build.py had it since v1.7.2)
  • tools/release.ps1 bumped pyproject but not README

All five fixed. Both shells now at parity.

What the framework now contains

  • 22-case smoke.ps1 + 6-case smoke.sh — every defect that ever shipped has a permanent regression gate
  • tools/release.ps1 — structural prevention of v1.7.2-style "tag without bump", now auto-bumps both pyproject AND README
  • 18 lessons in docs/lessons-v1.7.md — every defect's What broke / Root cause / Why we missed it / Process change, anchored to a smoke case
  • 6-round adversarial-training transcript behind it — every defence pattern has been attacked by a fresh subagent and held (round-6) or been replaced by a stronger pattern (rounds 1-5)

Production-ready convergence

Bar reached: 0 critical, 0 blocking, 3 minor (none escalating to production-blocking). The maintainer's hypothesis at round-6 launch — "if you find zero new critical we've converged" — was tested fairly and held.

🤖 Generated with Claude Code