Releases: Caldis/esp-harness
v1.7.5 — Adversarial-Training Convergence
Convergence achieved. Six rounds of adversarial subagent testing brought the critical-defect count to zero. Round-6 ran a 6-layer falsification attack and could not break the framework's core promises.
The trajectory
| Round | Mode | Critical | Released | Lesson |
|---|---|---|---|---|
| Author E2E | — | 5 | v1.7.1 | original quality wave |
| Subagent 1 | verify | 3 | v1.7.1.x | scaffold blockers |
| Subagent 2 | verify | 3 | v1.7.1.x | R2-bug class (JSON contract) |
| Subagent 3 | verify | 1 | v1.7.2 | Git Bash MSys trap |
| Subagent 4 | falsify | 2 | v1.7.3 | R3-regression in sibling |
| Subagent 5 | falsify + audit | 1 | v1.7.4 | yet another sibling |
| Subagent 6 | falsify 6-layer | 0 | v1.7.5 | convergence |
What round-6 caught (all Lesson-15 sibling regressions)
Two blocking + three minor — all instances of "defence applied to one file, missed the sibling":
- `smoke.sh` pytest gate didn't accept "2 passed, 1 skipped" (smoke.ps1 did since v1.7.4)
- `smoke.sh` died under `set -e + pipefail` when sim binary absent
- `smoke.sh` MSys trap missed `run --no-build` (smoke.ps1 had it)
- `run.py` build phase lacked the stale-ELF mtime gate (build.py had it since v1.7.2)
- `tools/release.ps1` bumped pyproject but not README
All five fixed. Both shells now at parity.
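The pytest-gate fix can be illustrated with a short Python sketch. This is not the logic in `smoke.sh` or `smoke.ps1`; the function name `pytest_gate_ok` and its `expected_total` and `max_skipped` parameters are hypothetical, and the sketch assumes the gate inspects pytest's final summary line.

```python
import re


def pytest_gate_ok(summary: str, expected_total: int = 3, max_skipped: int = 1) -> bool:
    """Return True when the summary shows no failures and the expected case count.

    Hypothetical helper: accepts "3 passed" as well as "2 passed, 1 skipped".
    """
    # Collect counts such as {"passed": 2, "skipped": 1} from the summary line.
    counts = {
        kind: int(n)
        for n, kind in re.findall(r"(\d+) (passed|skipped|failed|error)", summary)
    }
    # Any failure or collection error rejects the run outright.
    if counts.get("failed", 0) or counts.get("error", 0):
        return False
    passed = counts.get("passed", 0)
    skipped = counts.get("skipped", 0)
    # A bounded number of skips is tolerated, but every case must be accounted for.
    return skipped <= max_skipped and passed + skipped == expected_total
```

Counting passed plus skipped against a fixed total keeps the gate strict about missing cases while tolerating a legitimately skipped sim diff.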
What the framework now contains
- 22-case `smoke.ps1` + 6-case `smoke.sh`: every defect that ever shipped has a permanent regression gate
- `tools/release.ps1`: structural prevention of v1.7.2-style "tag without bump", now auto-bumps both pyproject AND README
- 18 lessons in `docs/lessons-v1.7.md`: every defect's What broke / Root cause / Why we missed it / Process change, anchored to a smoke case
- 6-round adversarial-training transcript behind it: every defence pattern has been attacked by a fresh subagent and held (round-6) or been replaced by a stronger pattern (rounds 1-5)
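The "bump both files or refuse" idea behind the release tool can be sketched in Python. The real tool is a PowerShell script whose internals are not shown here; the function `bump_version`, the `version = "..."` line format, and the `vX.Y.Z` README convention are all assumptions made for illustration.

```python
import re


def bump_version(pyproject_text: str, readme_text: str, old: str, new: str) -> tuple[str, str]:
    """Return (pyproject, readme) with `old` replaced by `new`.

    Hypothetical sketch: raises ValueError if either file has nothing to bump,
    so a tag can never be cut with only one of the two files updated.
    """
    # Assumed pyproject format: a line of the form  version = "1.7.4"
    py_pat = re.compile(rf'^(version\s*=\s*"){re.escape(old)}(")', flags=re.MULTILINE)
    py_out, py_hits = py_pat.subn(lambda m: m.group(1) + new + m.group(2), pyproject_text)

    # Assumed README convention: the version appears as "v1.7.4".
    # The lookahead avoids rewriting a longer version such as "v1.7.4.1".
    rd_pat = re.compile(rf"\bv{re.escape(old)}(?!\.?\d)")
    rd_out, rd_hits = rd_pat.subn(lambda m: "v" + new, readme_text)

    if py_hits == 0 or rd_hits == 0:
        raise ValueError("version string not found in both files; refusing to bump")
    return py_out, rd_out
```

Failing when either replacement count is zero is what makes the prevention structural rather than a checklist item.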
Production-ready convergence
Bar reached: 0 critical, 0 blocking, 3 minor (none escalating to production-blocking). The maintainer's hypothesis at round-6 launch — "if you find zero new critical we've converged" — was tested fairly and held.
🤖 Generated with Claude Code
v1.7.4 — Round-5 Falsification Convergence
Round-5 falsified v1.7.3. Process audit caught the same R3-CRIT defect class alive in yet another sibling code path — `run --no-build`'s flash phase. Round-4 patched build phase + flash command; round-5 found the back door round-4's fix didn't cover.
Critical
- `esp-harness run --no-build` from Git Bash silently no-op'd. `idf.py` exits 0 with the MSys/Mingw refusal; `wrote_bytes:0`, `verified:false`, but composite-command JSON reports `ok:true`. Now mirrors flash.py's two-tier defence in run.py's flash phase.
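A minimal Python sketch of a two-tier check for the flash phase follows. It is an assumption about what "two-tier" means here (tier 1 rejects the known refusal message, tier 2 requires evidence that bytes were written and verified) and is not the code in `flash.py` or `run.py`; the function name `flash_ok` is hypothetical.

```python
MSYS_REFUSAL = "MSys/Mingw is no longer supported"


def flash_ok(exit_code: int, output: str, wrote_bytes: int, verified: bool) -> bool:
    """Decide whether a flash phase really succeeded (hypothetical helper)."""
    if exit_code != 0:
        return False
    # Tier 1: the tool exited 0 but printed the refusal message.
    if MSYS_REFUSAL in output:
        return False
    # Tier 2: even without a known message, require positive evidence of a flash.
    return wrote_bytes > 0 and verified
```

With a check of this shape, the `wrote_bytes:0`, `verified:false` case can no longer surface as `ok:true`, because success requires positive evidence rather than the absence of an error code.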
Blocking
- `run.py` lost build.py's `patches.apply_all()` retry — AI agent calling `run` on a fresh checkout hit qmi8658 build failure that `build` would auto-patch. Now run has full parity.
- `smoke.ps1` triple-trap case hardcoded the maintainer's D:\ path — bypassed the checked-out tree. Now uses `$RepoRoot`.
- `smoke.ps1` pytest gate insisted on "3 passed" — broke fresh-clone gates that legitimately skip sim diff. Now accepts "2 passed, 1 skipped".
Smoke gate
7/7 host cases green. Triple-trap case now exercises four invocation forms: `build`, `flash`, `run` (with build), `run --no-build`.
Convergence trajectory
| Round | Mode | Critical | Released |
|---|---|---|---|
| Author E2E | — | 5 | v1.7.1 |
| Subagent 1 | verify | 3 | v1.7.1.x |
| Subagent 2 | verify | 3 | v1.7.1.x |
| Subagent 3 | verify | 1 | v1.7.2 |
| Subagent 4 | falsify | 2 | v1.7.3 |
| Subagent 5 | falsify + process audit | 1 (R3-regression in run --no-build) | v1.7.4 |
Critical count by round: 5→3→3→1→2→1. Lesson 15 (defence covers ALL entry points) reinforced: every adversarial round so far has caught the previous round's defence missing a sibling code path. v1.7.4 explicitly enumerates all four idf_runner entry points in smoke. If round-6 falsifies and finds zero critical, we've converged.
v1.7.3 — Round-4 Falsification Convergence
Round-4 falsified the v1.7.2 release. A convergence-verification subagent given an explicit falsification mandate (vs verify-mode for rounds 1-3) caught six concrete defects on the v1.7.2 tag — including the original R3-CRIT 'silent flash from Git Bash' reborn in `run` and `flash` because R3's patch only touched `build`.
Critical fixes
| Defect | Status pre-round-4 | Fixed in v1.7.3 |
|---|---|---|
| `esp-harness run` silent flash from Git Bash | R3-CRIT regression on v1.7.2 tag | ✓ MSys trap in run.py |
| `esp-harness flash` silent zero-byte success | R3-CRIT regression on v1.7.2 tag | ✓ MSys trap in flash.py |
| `esp-harness init` unbuildable scaffold (dead path) | Broken since v1.5 (3 releases) | ✓ Forwards to `new --link` |
Blocking fixes
- `?keys press pwr` never auto-released — wired pwr into keys_task override + PMIC handler.
- `v1.7.2` tag actually reported as `1.7.1` — pyproject.toml bumped to `1.7.3`.
- `install.ps1` missed `[test]` extras — fresh-clone smoke jumped from 19/20 → 21/21.
- `esp32-harness-showcase` reference sweep — round-3 fix was narrow (1 file); round-4 grep returned 24 hits in 18 files. All non-historical references now point at the esp-harness monorepo.
Smoke gate
21/21 cases green (6 host + 15 device). Material new cases:
- Triple-trap MSys/Mingw refusal (build + flash + run).
- All-3-buttons keys-press synth (boot/user/pwr).
- `--wait-evt` no-match returns timed-out evt_wait_ms (preempted round-4 edge task).
Convergence trajectory
| Round | Critical | Found by | Released as |
|---|---|---|---|
| Author E2E | 5 | — | v1.7.1 |
| Subagent 1 (verify) | 3 | new + scaffold blockers | v1.7.1.x |
| Subagent 2 (verify) | 3 | R2 bugs (manifest/EVT/scaffold) | v1.7.1.x |
| Subagent 3 (verify) | 1 | Git Bash trap | v1.7.2 |
| Subagent 4 (falsify) | 2 | R3-regression + scaffold-rot | v1.7.3 |
The falsification round was the most valuable round — it caught the R3-regression that the previous verify rounds had missed because they didn't test the back-door codepath. A falsify-then-verify pattern is the convergence model going forward.
v1.7.2 — Adversarial Convergence
Adversarial-subagent convergence pass. Three rounds of evaluation by minimal-context subagents simulating first-time users: round 1 found 8 issues (3 critical), round 2 found 4 (3 critical), and round 3 found 10 (1 critical, 4 blocking, 5 minor). Critical-defect count per round: 3 → 3 → 1. All fixed; the 20-case smoke gate now locks every defect class in.
Round-3 highlights
- Git Bash silent-build trap (critical) — `idf.py` exited 0 with `MSys/Mingw is no longer supported` from Git Bash. AI agents flashed stale binaries with no warning. `build.py` now detects the message + ELF-mtime gate; fails closed with `exit_code=100`.
- `?keys press [HOLD_MS]` (new) — synthesize physical BOOT / USER / PWR button presses from host. Synth-override window + count integration; AI agents can now exercise button-gated flows.
- PORTING.md gains the `bsp/esp-bsp.h` convention + `bsp_display_start()` entry-point documentation that round-3's port-pretend subagent had to discover by grep.
- 5 post-monorepo path islands swept: `sim/README.md` 3-scenes claim, `aurora-harness/README.md` `esp32-harness-showcase` refs, `tests/README.md` stale pytest path, `PR_TEMPLATE.md` double-prefixed relative links.
- Last `1.5.0` stragglers in `docs/index.html`, repo-layout SVG, `architecture.md` → `1.7.x`.
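The fail-closed build gate described in the first highlight can be sketched in Python as below. This is an illustration under assumptions, not the contents of `build.py`: the function name `build_exit_code` and the `started_at` parameter are hypothetical, and a real gate would also need a policy for no-op incremental builds that legitimately leave the ELF untouched.

```python
import os

MSYS_REFUSAL = "MSys/Mingw is no longer supported"
EXIT_STALE_BUILD = 100  # exit code named in the release notes


def build_exit_code(idf_exit: int, output: str, elf_path: str, started_at: float) -> int:
    """Map a raw build result to a trustworthy exit code (hypothetical helper)."""
    if idf_exit != 0:
        return idf_exit  # genuine failures pass through unchanged
    # Gate 1: exit 0 plus the refusal message means nothing was built.
    if MSYS_REFUSAL in output:
        return EXIT_STALE_BUILD
    # Gate 2: the ELF must exist and be at least as new as the build start time.
    try:
        fresh = os.path.getmtime(elf_path) >= started_at
    except OSError:
        fresh = False
    return 0 if fresh else EXIT_STALE_BUILD
```

The mtime check is the backstop: it still fires if the refusal wording changes and the string match stops working.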
Smoke gate
20 / 20 cases green (6 host + 14 device). New since v1.7.1:
- `build refuses MSys/Mingw exit-0 (R3-CRIT regression)`
- `?keys press boot synth (R3-bug regression)`
Convergence trajectory
| Round | Defects | Critical | Blocking | Minor |
|---|---|---|---|---|
| Author E2E (v1.7.0→1) | 8 | 5 | — | 3 |
| Subagent 1 (1.7.1→) | 8 | 3 | 2 | 3 |
| Subagent 2 | 4 | 3 | — | 1 |
| Subagent 3 | 10 | 1 | 4 | 5 |
20 distinct defects across 4 rounds of validation on a single codebase. The framework is at the point where further adversarial rounds find papercuts rather than blockers.
v1.7.1 — Quality Convergence
Quality convergence wave. Hardware verification of v1.7.0 surfaced eight separate defects across audio / wifi / build infrastructure. Fixing them was straightforward; the more important deliverable is the framework that catches the next instance of each defect class.
Defects fixed (anchored to lessons)
| Defect | Where | Lesson |
|---|---|---|
| `audio tone` reported `bytes:0` despite playing | audio.c | L1: `esp_codec_dev_write` returns `esp_err_t`, not POSIX bytes |
| `audio mic peak_dbfs:0.0` in silent rooms | audio.c × 3 sites | L2: first DMA buffer is stale; needs throwaway read |
| Peak just above 0 dBFS on one-sample dropouts | audio.c × 3 sites | L3: `abs(INT16_MIN) = 32768` overflows |
| `wifi disconnect` immediately reconnected | wifi.c | L4: event handler needed user-intent flag |
| Console couldn't parse `ssid="…"` | console_protocol.c | No lesson number: tokenizer didn't honour quotes |
| `MBEDTLS_CRYPTO=n` ignored | sdkconfig | L6: kconfig `select` trap |
| `wifi_init ESP_ERR_NO_MEM` with 39 KB free | sdkconfig | L7: `STATIC_TX_BUFFER_NUM=16` is huge |
| `tap_hit` EVT unobservable from host | n/a | L9: async EVT framework gap (v1.8 backlog) |
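The L3 row is easiest to see with numbers. The Python sketch below is an illustration only, not the firmware's C code; `peak_dbfs` and `naive_peak_dbfs` are hypothetical names, and it assumes the symptom came from measuring a magnitude of 32768 against a full scale of 32767.

```python
import math

FULL_SCALE = 32768  # magnitude of INT16_MIN, so the most negative sample maps to 0 dBFS


def naive_peak_dbfs(samples) -> float:
    """Buggy variant for comparison: full scale assumed to be INT16_MAX."""
    peak = max((abs(int(s)) for s in samples), default=0)
    return float("-inf") if peak == 0 else 20.0 * math.log10(peak / 32767)


def peak_dbfs(samples) -> float:
    """Peak level in dBFS for 16-bit samples; never exceeds 0.0."""
    # Python integers do not overflow; in C the magnitude must be widened first.
    peak = max((abs(int(s)) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(peak / FULL_SCALE)
```

A single sample at -32768 makes the naive version report slightly more than 0 dBFS, while the corrected version reports exactly 0.0.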
Quality infrastructure
- `docs/lessons-v1.7.md` — ten lessons in the format What broke / Root cause / Why we missed it / Process change. Reference for every future release.
- `tools/smoke.ps1` — pre-release gate. 13 cases, all regression-anchored: 4 host (`doctor 8/8`, `pytest 3/3`, `sim diff 13/13`, `manifest ≥17`) + 9 device (`?ping`, `?stat` fps/scene_count, `?sys`, audio tone bytes > 0, audio mic peak in [-90,-10], `?ota info`, scene switch). Each device case that traces back to a lesson carries an `(L regression)` tag — failures surface both symptom and root-cause doc reference.
Verification
Smoke gate: 13 / 13 green on hardware. Sim diff regression: 13 / 13 identical.
v1.7.0 — Connectivity + Real OTA
Connectivity + real OTA. Closes the gap between the v1.6 `?ota` skeleton (info / mark-valid / rollback only) and a device that can actually pull a new image over the air. Pairs with WiFi STA connect + NVS-backed credential persistence so the loop is one command per side: connect once, then `?ota download url=…` whenever a new build lands.
Aurora firmware
- `peripherals/wifi_creds.{h,c}` — NVS-backed credential store (namespace `wifi_cred`, 5-function API).
- `peripherals/wifi.c` — STA-mode connect: `wifi_connect(ssid, pass, timeout_ms)` with event-group + IP_GOT gate + 3 automatic retries. Plus `wifi_disconnect / wifi_is_connected / wifi_get_status`.
- `wifi` console command — rewritten as multi-subcommand: `scan | connect ssid=… pass=… save=1 | disconnect | forget | status`.
- `?ota` console command — gains `download url=…` via `esp_https_ota` streaming with integer-percent EVT progress. Does not auto-reboot — host decides when to flip slots.
- `partitions.csv` — switched from `factory 8M + storage 7M` to dual `ota_0 5M + ota_1 5M + storage 5M`. `otadata` (2 sectors) added.
- `sdkconfig.defaults` — `CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE=y`, HTTP client HTTPS, dev-friendly `OTA_ALLOW_HTTP`.
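The `key=value` argument style of the `wifi` subcommands can be parsed on the host side with a quote-aware tokenizer. The Python sketch below is a host-side illustration, not the firmware's console parser; `parse_console_args` is a hypothetical name, and the SSID and password in the usage note are made up.

```python
import shlex


def parse_console_args(line: str) -> tuple[list[str], dict[str, str]]:
    """Split a console line into positional words and key=value options."""
    positional: list[str] = []
    options: dict[str, str] = {}
    # shlex.split honours double quotes, so ssid="My Network" stays one token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep and key:
            options[key] = value
        else:
            positional.append(token)
    return positional, options
```

For example, `parse_console_args('wifi connect ssid="My Network" pass=secret save=1')` yields the positional words `wifi` and `connect` plus an options mapping in which `ssid` keeps its embedded space.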
Toolkit + build infra
- `core/patches.py` — centralizes `managed_components/` patches. First entry: the `waveshare__qmi8658` upstream bug pair (missing `esp_driver_i2c` REQUIRES + `M_PI` without `#ifndef` guard).
- `build.py` — applies patches pre-build; on failure inspects stderr for known retry signatures and retries once.
- `examples/aurora/CMakeLists.txt` — `esp_harness_apply_known_patches()` CMake fallback. A fresh clone now builds without needing the toolkit at all.
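The patch-then-retry-once behaviour described for `build.py` can be sketched as follows. This is a simplified illustration, not the toolkit's code: `build_with_retry`, the injected `run_build` and `apply_patches` callables, and the signature strings are all hypothetical.

```python
from typing import Callable

# Hypothetical stderr fragments that indicate a known, patchable failure.
KNOWN_RETRY_SIGNATURES = ("esp_driver_i2c", "M_PI")


def build_with_retry(
    run_build: Callable[[], tuple[int, str]],
    apply_patches: Callable[[], None],
) -> tuple[int, int]:
    """Run a build with patches applied; retry once on a known failure signature.

    Returns (exit_code, attempts).
    """
    apply_patches()  # patches go in before the first attempt
    code, stderr = run_build()
    attempts = 1
    if code != 0 and any(sig in stderr for sig in KNOWN_RETRY_SIGNATURES):
        # Components may have been re-fetched mid-build: re-apply, then retry once.
        apply_patches()
        code, stderr = run_build()
        attempts = 2
    return code, attempts
```

Capping the retry at one attempt keeps an unrelated failure from looping, and an unknown stderr signature is reported immediately.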
Verification
- Target build: clean rebuild 65.5s, 0 warnings, all artifacts produced
- Sim diff: 13 / 13 scenes identical (no UI regression)
- pytest: 3 / 3 passing (doctor / manifest / sim-diff)
- Manifest: 17 toolkit commands surfaced
Deferred to v1.8
- BLE-peripheral WiFi provisioning UI (NimBLE GATT receiver writing to the v1.7 `wifi_creds` API). API surface is ready; only the BLE side is missing.
v1.6.0 — Project maturity polish
Phase J: full docs ecosystem + brand + homepage + contributing infrastructure.
Brand identity — Logo, wordmark, social card, favicon at docs/brand/. The mark is an "H" reframed as a control harness.
Docs ecosystem — manifesto / FAQ / troubleshooting / 3 SVG diagrams / homepage HTML. Built so a fresh reader gets oriented in under 10 minutes.
Root README — proper landing page with badges, what-this-is, 30-second quickstart, three-path onboarding, comparison table.
Contributing — .github/CONTRIBUTING.md + 3 issue templates + PR template + config.
New example — examples/hello-minimal/ proves the new-template integrity.
Per-artifact polish — badges on every README, cross-linked. Old esp32-harness-showcase sibling language removed.
GitHub — description / homepage / 14 topics set via gh repo edit. Pages enabled at https://caldis.github.io/esp-harness/.
Full changelog: CHANGELOG.md