Releases: dsj1984/mandrel-bench

mandrel-bench: v0.5.0

26 Jun 11:47
de81bbb

0.5.0 (2026-06-24)

Added

  • bench: differential-trap spike apparatus — auth-trap scenario (refs #57) (#63) (b60e2f1)
  • results: 1.75.0 cohort — mandrel@1.75.0 / claude-opus-4-8 (#62) (e888d0c)

Fixed

  • bench: git-exclude the framework overlay so it never enters the deliverable diff (#58) (97e5e1e)
  • bench: security scanner measured the overlaid framework, not the deliverable (#53) (54802d8)
  • bench: stop counting test-fixture creds as secrets; proportional secret penalty (refs #55) (#59) (e1a7c40)

mandrel-bench: v0.4.0

19 Jun 18:14
c55a73c

0.4.0 (2026-06-19)

Added

  • agents: add durable /benchmark workflow under .agents/local (#45) (c43ee5f)
  • bench: instrument the standalone path so its value dims are measured (#48) (#51) (bd5e517)
  • Epic #32 (#43) (955684a)
  • project-api as the 1.75.0 Epic rung + first complete 1.75.0 cohort (closes #50) (#52) (e152ab3)

Fixed

  • bench: skip npm audit without a lockfile; allow project-api scenario (#49) (57e40b3)
  • score: null (not a default) for ledger-derived dims when no ledger (#47) (c3f4a32)

mandrel-bench: v0.3.0

18 Jun 00:47
3eb5da6

0.3.0 (2026-06-18)

Added

  • bench: batch-ready run orchestrator — resumable, cost-bounded loop (refs #22) (#24) (9d4d871)
  • bench: drive the mandrel arm via /plan --idea --yes (headless, fresh Epic per run) (#28) (81d5093)
  • bench: make mandrel-arm runs clean and repeatable (#27) (4aaf208)
  • restructure results/ into per-cohort directories and add a generated zero-dep results.html dashboard (#17) (#19) (dfe8c13)
  • results: first N=8 baseline cohort — mandrel@1.72.0 / claude-opus-4-8 (refs #23) (#29) (5100d9d)

Fixed

  • bench: render the value-add report over the full cohort store (resume-safe) (#31) (e564b3d)
  • bench: sanitize GITHUB_TOKEN before gh in resetSandboxBaseline (#30) (a50cfe5)

mandrel-bench: v0.2.0

17 Jun 01:23
005d731

0.2.0 (2026-06-17)

Added

  • bench: wire harness end-to-end + first benchmark result (Epic #2) (#15) (e21c42f)
  • bootstrap mandrel-bench — re-home self-benchmark harness from mandrel#4211 (1287546)

Fixed

  • ci: exclude generated CHANGELOG from markdownlint (refs #14) (#18) (88aba4c)
  • ci: green up test discovery, markdown lint, and biome config (d6d3e9e)
  • docs: stop MD004 reading "+ noise-band" as a list bullet (f4ddd0f)