Releases: Caldis/esp-harness

v1.7.5 — Adversarial-Training Convergence

22 May 15:37

Convergence achieved. Six rounds of adversarial subagent testing brought the critical-defect count to zero. Round-6 ran a 6-layer falsification attack and could not break the framework's core promises.

The trajectory

| Round | Mode | Critical | Released | Lesson |
| --- | --- | --- | --- | --- |
| Author | E2E | 5 | v1.7.1 | original quality wave |
| Subagent 1 | verify | 3 | v1.7.1.x | scaffold blockers |
| Subagent 2 | verify | 3 | v1.7.1.x | R2-bug class (JSON contract) |
| Subagent 3 | verify | 1 | v1.7.2 | Git Bash MSys trap |
| Subagent 4 | falsify | 2 | v1.7.3 | R3-regression in sibling |
| Subagent 5 | falsify + audit | 1 | v1.7.4 | yet another sibling |
| Subagent 6 | falsify 6-layer | 0 | v1.7.5 | convergence |

What round-6 caught (all Lesson-15 sibling regressions)

Two blocking + three minor — all instances of "defence applied to one file, missed the sibling":

  • smoke.sh pytest gate didn't accept "2 passed, 1 skipped" (smoke.ps1 did since v1.7.4)
  • smoke.sh died under set -e + pipefail when sim binary absent
  • smoke.sh MSys trap missed run --no-build (smoke.ps1 had it)
  • run.py build phase lacked the stale-ELF mtime gate (build.py had it since v1.7.2)
  • tools/release.ps1 bumped pyproject but not README

All five fixed. Both shells now at parity.

What the framework now contains

  • 22-case smoke.ps1 + 6-case smoke.sh — every defect that ever shipped has a permanent regression gate
  • tools/release.ps1 — structural prevention of v1.7.2-style "tag without bump", now auto-bumps both pyproject AND README
  • 18 lessons in docs/lessons-v1.7.md — every defect's What broke / Root cause / Why we missed it / Process change, anchored to a smoke case
  • 6-round adversarial-training transcript behind it — every defence pattern has been attacked by a fresh subagent and held (round-6) or been replaced by a stronger pattern (rounds 1-5)

Production-ready convergence

Bar reached: 0 critical, 0 blocking, 3 minor (none escalating to production-blocking). The maintainer's hypothesis at round-6 launch — "if you find zero new critical we've converged" — was tested fairly and held.

🤖 Generated with Claude Code

v1.7.4 — Round-5 Falsification Convergence

22 May 15:17

Round-5 falsified v1.7.3. Process audit caught the same R3-CRIT defect class alive in yet another sibling code path — `run --no-build`'s flash phase. Round-4 patched build phase + flash command; round-5 found the back door round-4's fix didn't cover.

Critical

  • `esp-harness run --no-build` from Git Bash silently no-op'd. `idf.py` exits 0 with the MSys/Mingw refusal; `wrote_bytes:0`, `verified:false`, but composite-command JSON reports `ok:true`. Now mirrors flash.py's two-tier defence in run.py's flash phase.
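
A minimal sketch of what a two-tier check of this kind could look like. The refusal string and the `wrote_bytes` / `verified` fields come from these notes; the `FlashResult` type and `flash_ok` helper are hypothetical names used only for illustration, and the actual split between the two tiers in flash.py may differ.

```python
from dataclasses import dataclass

# Refusal text quoted in the v1.7.2 notes; idf.py prints it and still exits 0.
MSYS_REFUSAL = "MSys/Mingw is no longer supported"


@dataclass
class FlashResult:
    """Hypothetical container for what a flash phase observed."""
    exit_code: int
    output: str
    wrote_bytes: int
    verified: bool


def flash_ok(result: FlashResult) -> bool:
    """Report success only when the flash demonstrably happened."""
    # Tier 1: the tool refused to run, whatever its exit code says.
    if MSYS_REFUSAL in result.output:
        return False
    # Tier 2: exit code 0 alone is not evidence; require written, verified bytes.
    return result.exit_code == 0 and result.wrote_bytes > 0 and result.verified
```

With this shape, the failure described above (`exit_code` 0, `wrote_bytes:0`, `verified:false`) is rejected by either tier, so a composite command cannot report `ok:true` on it.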

Blocking

  • `run.py` lost build.py's `patches.apply_all()` retry — AI agent calling `run` on a fresh checkout hit qmi8658 build failure that `build` would auto-patch. Now run has full parity.
  • `smoke.ps1` triple-trap case hardcoded the maintainer's D:\ path — bypassed the checked-out tree. Now uses `$RepoRoot`.
  • `smoke.ps1` pytest gate insisted on "3 passed" — broke fresh-clone gates that legitimately skip sim diff. Now accepts "2 passed, 1 skipped".
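
The pytest gate change can be illustrated with a small Python sketch. It assumes the gate inspects pytest's one-line summary; the function name `pytest_gate_ok` and the `min_passed` threshold are hypothetical, and the real gates live in `smoke.ps1` / `smoke.sh` rather than in Python.

```python
import re

# Matches count/outcome pairs in a pytest summary such as "2 passed, 1 skipped in 0.9s".
_COUNT = re.compile(r"(\d+) (passed|failed|skipped|errors?)\b")


def pytest_gate_ok(summary: str, min_passed: int = 2) -> bool:
    """Accept a run when nothing failed or errored and enough tests passed."""
    counts = {"passed": 0, "failed": 0, "skipped": 0, "error": 0}
    for number, kind in _COUNT.findall(summary):
        key = "error" if kind.startswith("error") else kind
        counts[key] += int(number)
    # Skips are tolerated; failures and errors are not.
    return counts["failed"] == 0 and counts["error"] == 0 and counts["passed"] >= min_passed
```

Checking for the absence of failures instead of an exact "3 passed" string is what lets a fresh clone that legitimately skips the sim diff still pass.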

Smoke gate

7/7 host cases green. Triple-trap case now exercises four invocation forms: `build`, `flash`, `run` (with build), `run --no-build`.

Convergence trajectory

| Round | Mode | Critical | Released |
| --- | --- | --- | --- |
| Author | E2E | 5 | v1.7.1 |
| Subagent 1 | verify | 3 | v1.7.1.x |
| Subagent 2 | verify | 3 | v1.7.1.x |
| Subagent 3 | verify | 1 | v1.7.2 |
| Subagent 4 | falsify | 2 | v1.7.3 |
| Subagent 5 | falsify + process audit | 1 (R3-regression in run --no-build) | v1.7.4 |

Critical count by round: 5→3→3→1→2→1 (the first falsification round raised it from 1 to 2 before it fell again). Lesson 15 (defence covers ALL entry points) reinforced: every adversarial round so far has caught the previous round's defence missing a sibling code path. v1.7.4 explicitly enumerates all four idf_runner entry points in smoke. If round-6 falsifies and finds zero critical, we've converged.

v1.7.3 — Round-4 Falsification Convergence

22 May 14:50

Round-4 falsified the v1.7.2 release. A convergence-verification subagent given an explicit falsification mandate (vs verify-mode for rounds 1-3) caught six concrete defects on the v1.7.2 tag — including the original R3-CRIT 'silent flash from Git Bash' reborn in `run` and `flash` because R3's patch only touched `build`.

Critical fixes

| Defect | Status pre-round-4 | Fixed in v1.7.3 |
| --- | --- | --- |
| `esp-harness run` silent flash from Git Bash | R3-CRIT regression on v1.7.2 tag | ✓ MSys trap in run.py |
| `esp-harness flash` silent zero-byte success | R3-CRIT regression on v1.7.2 tag | ✓ MSys trap in flash.py |
| `esp-harness init` unbuildable scaffold (dead path) | Broken since v1.5 (3 releases) | ✓ Forwards to `new --link` |

Blocking fixes

  • `?keys press pwr` never auto-released — wired pwr into keys_task override + PMIC handler.
  • `v1.7.2` tag actually reported as `1.7.1` — pyproject.toml bumped to `1.7.3`.
  • `install.ps1` missed `[test]` extras; with them installed, fresh-clone smoke went from 19/20 to 21/21.
  • `esp32-harness-showcase` reference sweep — round-3 fix was narrow (1 file); round-4 grep returned 24 hits in 18 files. All non-historical references now point at the esp-harness monorepo.

Smoke gate

21/21 cases green (6 host + 15 device). Material new cases:

  • Triple-trap MSys/Mingw refusal (build + flash + run).
  • All-3-buttons keys-press synth (boot/user/pwr).
  • `--wait-evt` no-match returns timed-out evt_wait_ms (preempted round-4 edge task).

Convergence trajectory

| Round | Critical | Found by | Released as |
| --- | --- | --- | --- |
| Author E2E | 5 | | v1.7.1 |
| Subagent 1 (verify) | 3 | new + scaffold blockers | v1.7.1.x |
| Subagent 2 (verify) | 3 | R2 bugs (manifest/EVT/scaffold) | v1.7.1.x |
| Subagent 3 (verify) | 1 | Git Bash trap | v1.7.2 |
| Subagent 4 (falsify) | 2 | R3-regression + scaffold-rot | v1.7.3 |

The falsification round was the most valuable round — it caught the R3-regression that the previous verify rounds had missed because they didn't test the back-door codepath. A falsify-then-verify pattern is the convergence model going forward.

v1.7.2 — Adversarial Convergence

22 May 14:25

Adversarial-subagent convergence pass. Three rounds of evaluation by minimal-context subagents simulating first-time users: round 1 found 8 issues (3 critical), round 2 found 4 (3 critical), round 3 found 10 (1 critical + 4 blocking + 5 minor). Critical-defect count per round: 3 → 3 → 1. All fixed; the 20-case smoke gate now locks every defect class in.

Round-3 highlights

  • Git Bash silent-build trap (critical) — `idf.py` exited 0 with `MSys/Mingw is no longer supported` from Git Bash. AI agents flashed stale binaries with no warning. `build.py` now detects the message + ELF-mtime gate; fails closed with `exit_code=100`.
  • `?keys press [HOLD_MS]` (new) — synthesize physical BOOT / USER / PWR button presses from host. Synth-override window + count integration; AI agents can now exercise button-gated flows.
  • PORTING.md gains the `bsp/esp-bsp.h` convention + `bsp_display_start()` entry-point documentation that round-3's port-pretend subagent had to discover by grep.
  • 5 post-monorepo path islands swept: `sim/README.md` 3-scenes claim, `aurora-harness/README.md` `esp32-harness-showcase` refs, `tests/README.md` stale pytest path, `PR_TEMPLATE.md` double-prefixed relative links.
  • Last `1.5.0` stragglers in `docs/index.html`, repo-layout SVG, `architecture.md` → `1.7.x`.
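
The ELF-mtime half of the Git Bash defence can be sketched as follows. This is an illustration under assumptions, not the code in `build.py`: the function name `gated_exit_code` and its parameters are hypothetical, only the `exit_code=100` value comes from these notes, and the refusal-message check is left out to keep the focus on the mtime gate. A real gate would also have to decide how to treat incremental builds that legitimately leave the ELF untouched.

```python
from pathlib import Path

# Exit code the notes give for the fail-closed path.
FAIL_CLOSED_EXIT = 100


def gated_exit_code(idf_exit: int, elf_path: Path, build_started: float) -> int:
    """Turn a suspicious exit-0 build into a failure when the ELF is stale.

    build_started is a time.time() timestamp taken just before the build ran.
    """
    if idf_exit != 0:
        # A genuine failure is passed through unchanged.
        return idf_exit
    if not elf_path.exists():
        # Exit 0 but no artifact at all: fail closed.
        return FAIL_CLOSED_EXIT
    if elf_path.stat().st_mtime < build_started:
        # The ELF predates this build, so nothing was actually produced.
        return FAIL_CLOSED_EXIT
    return 0
```

The point of the gate is that success is judged by the artifact on disk, so a tool that exits 0 without building cannot hand a stale binary to the flash step.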

Smoke gate

20 / 20 cases green (6 host + 14 device). New since v1.7.1:

  • `build refuses MSys/Mingw exit-0 (R3-CRIT regression)`
  • `?keys press boot synth (R3-bug regression)`

Convergence trajectory

Round Defects Critical Blocking Minor
Author E2E (v1.7.0→1) 8 5 3
Subagent 1 (1.7.1→) 8 3 2 3
Subagent 2 4 3 1
Subagent 3 10 1 4 5

20 distinct defects across 4 rounds of validation on a single codebase. The framework is at the point where further adversarial rounds find papercuts rather than blockers.

v1.7.1 — Quality Convergence

21 May 18:02

Quality convergence wave. Hardware verification of v1.7.0 surfaced eight separate defects across audio / wifi / build infrastructure. Fixing them was straightforward; the more important deliverable is the framework that catches the next instance of each defect class.

Defects fixed (anchored to lessons)

| Defect | Where | Lesson |
| --- | --- | --- |
| `audio tone` reported `bytes:0` despite playing | audio.c | L1: `esp_codec_dev_write` returns `esp_err_t`, not POSIX bytes |
| `audio mic peak_dbfs:0.0` in silent rooms | audio.c × 3 sites | L2: first DMA buffer is stale; needs throwaway read |
| Peak just above 0 dBFS on one-sample dropouts | audio.c × 3 sites | L3: `abs(INT16_MIN) = 32768` overflows |
| `wifi disconnect` immediately reconnected | wifi.c | L4: event handler needed user-intent flag |
| Console couldn't parse `ssid="…"` | console_protocol.c | (none): tokenizer didn't honour quotes |
| `MBEDTLS_CRYPTO=n` ignored | sdkconfig | L6: kconfig `select` trap |
| `wifi_init ESP_ERR_NO_MEM` with 39 KB free | sdkconfig | L7: `STATIC_TX_BUFFER_NUM=16` is huge |
| `tap_hit` EVT unobservable from host | | L9: async EVT framework gap (v1.8 backlog) |
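
The arithmetic behind lesson L3 can be worked through in a few lines. The sketch below is in Python for brevity (the firmware itself is C) and assumes full scale is taken as 32767; the function name `peak_dbfs` is hypothetical. A single sample at `INT16_MIN` has magnitude 32768, which exceeds that full-scale value and so yields a peak slightly above 0 dBFS unless the magnitude is clamped.

```python
import math

INT16_MAX = 32767
INT16_MIN = -32768


def peak_dbfs(samples):
    """Peak level in dBFS for 16-bit samples, clamped so it never exceeds 0."""
    # Clamp each magnitude to INT16_MAX so abs(INT16_MIN) cannot exceed full scale.
    peak = max((min(abs(s), INT16_MAX) for s in samples), default=0)
    if peak == 0:
        # Digital silence (or no samples) has no finite level.
        return float("-inf")
    return 20.0 * math.log10(peak / INT16_MAX)


# Unclamped, the same sample lands just above 0 dBFS.
naive = 20.0 * math.log10(abs(INT16_MIN) / INT16_MAX)
```

Here `peak_dbfs([INT16_MIN])` is exactly 0.0, while `naive` is a small positive number, which matches the "just above 0 dBFS" symptom in the table.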

Quality infrastructure

  • `docs/lessons-v1.7.md` — ten lessons in the format What broke / Root cause / Why we missed it / Process change. Reference for every future release.
  • `tools/smoke.ps1` — pre-release gate. 13 cases, all regression-anchored: 4 host (`doctor 8/8`, `pytest 3/3`, `sim diff 13/13`, `manifest ≥17`) + 9 device (`?ping`, `?stat` fps/scene_count, `?sys`, audio tone bytes > 0, audio mic peak in [-90,-10], `?ota info`, scene switch). Each device case that traces back to a lesson carries an `(L regression)` tag — failures surface both symptom and root-cause doc reference.

Verification

Smoke gate: 13 / 13 green on hardware. Sim diff regression: 13 / 13 identical.

v1.7.0 — Connectivity + Real OTA

21 May 16:43

Connectivity + real OTA. Closes the gap between the v1.6 `?ota` skeleton (info / mark-valid / rollback only) and a device that can actually pull a new image over the air. Pairs with WiFi STA connect + NVS-backed credential persistence so the loop is one command per side: connect once, then `?ota download url=…` whenever a new build lands.

Aurora firmware

  • `peripherals/wifi_creds.{h,c}` — NVS-backed credential store (namespace `wifi_cred`, 5-function API).
  • `peripherals/wifi.c` — STA-mode connect: `wifi_connect(ssid, pass, timeout_ms)` with event-group + IP_GOT gate + 3 automatic retries. Plus `wifi_disconnect / wifi_is_connected / wifi_get_status`.
  • `wifi` console command — rewritten as multi-subcommand: `scan | connect ssid=… pass=… save=1 | disconnect | forget | status`.
  • `?ota` console command — gains `download url=…` via `esp_https_ota` streaming with integer-percent EVT progress. Does not auto-reboot — host decides when to flip slots.
  • `partitions.csv` — switched from `factory 8M + storage 7M` to dual `ota_0 5M + ota_1 5M + storage 5M`. `otadata` (2 sectors) added.
  • `sdkconfig.defaults` — `CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE=y`, HTTP client HTTPS, dev-friendly `OTA_ALLOW_HTTP`.

Toolkit + build infra

  • `core/patches.py` — centralizes `managed_components/` patches. First entry: the `waveshare__qmi8658` upstream bug pair (missing `esp_driver_i2c` REQUIRES + `M_PI` without `#ifndef` guard).
  • `build.py` — applies patches pre-build; on failure inspects stderr for known retry signatures and retries once.
  • `examples/aurora/CMakeLists.txt` — `esp_harness_apply_known_patches()` CMake fallback. A fresh clone now builds without needing the toolkit at all.
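
The patch-then-retry flow described for `build.py` could be sketched roughly as below. The callables and the signature list are assumptions made for illustration: `run_build` and `apply_patches` stand in for whatever the toolkit actually calls, and the two signature strings are borrowed from the qmi8658 bug description above, not taken from the real retry table.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical stderr fragments that mark a failure as worth one retry.
RETRY_SIGNATURES: Sequence[str] = ("esp_driver_i2c", "M_PI")


def build_with_retry(
    run_build: Callable[[], Tuple[int, str]],
    apply_patches: Callable[[], None],
    signatures: Sequence[str] = RETRY_SIGNATURES,
) -> int:
    """Patch, build, and retry exactly once if stderr shows a known signature."""
    apply_patches()  # patches are applied before the first build
    exit_code, stderr = run_build()
    if exit_code != 0 and any(sig in stderr for sig in signatures):
        # Known failure: re-apply patches (the component may have been re-fetched) and retry once.
        apply_patches()
        exit_code, stderr = run_build()
    return exit_code
```

Capping the retry at one attempt keeps an unknown or persistent failure from looping, while still covering the case where the first build re-downloads an unpatched component.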

Verification

  • Target build: clean rebuild 65.5s, 0 warnings, all artifacts produced
  • Sim diff: 13 / 13 scenes identical (no UI regression)
  • pytest: 3 / 3 passing (doctor / manifest / sim-diff)
  • Manifest: 17 toolkit commands surfaced

Deferred to v1.8

  • BLE-peripheral WiFi provisioning UI (NimBLE GATT receiver writing to the v1.7 `wifi_creds` API). API surface is ready; only the BLE side is missing.

v1.6.0 — Project maturity polish

21 May 16:22

Phase J: full docs ecosystem + brand + homepage + contributing infrastructure.

Brand identity — Logo, wordmark, social card, favicon at docs/brand/. The mark is an "H" reframed as a control harness.

Docs ecosystem — manifesto / FAQ / troubleshooting / 3 SVG diagrams / homepage HTML. Built so a fresh reader gets oriented in under 10 minutes.

Root README — proper landing page with badges, what-this-is, 30-second quickstart, three-path onboarding, comparison table.

Contributing — .github/CONTRIBUTING.md + 3 issue templates + PR template + config.

New example — examples/hello-minimal/ proves the new-template integrity.

Per-artifact polish — badges on every README, cross-linked. Old esp32-harness-showcase sibling language removed.

GitHub — description / homepage / 14 topics set via gh repo edit. Pages enabled at https://caldis.github.io/esp-harness/.

Full changelog: CHANGELOG.md