Releases: Dmatut7/shoppay-audit-benchmark
v0.1.2 - Reference run and outreach update
ShopPay Audit Benchmark v0.1.2
Outreach and reference-run update.
Added
- First scored Codex reference audit run.
- JSON scorecard example for benchmark comparisons.
- Outreach log updated to record four pull requests submitted to public directories.
Links
- Reference audit report: https://github.com/Dmatut7/shoppay-audit-benchmark/blob/main/examples/runs/2026-06-05-codex-reference-audit.md
- Scorecard JSON: https://github.com/Dmatut7/shoppay-audit-benchmark/blob/main/examples/runs/2026-06-05-codex-reference-scorecard.json
- Outreach log: https://github.com/Dmatut7/shoppay-audit-benchmark/blob/main/docs/OUTREACH.md
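The scorecard JSON is intended for comparing audit runs. The sketch below shows one way such a comparison could work. It assumes each scorecard is an object with hypothetical `agent` and `totalScore` fields on a 0-100 scale. These field names are placeholders, not the real schema, so check the linked scorecard file before adapting it.

```javascript
// Minimal sketch: rank audit runs by score.
// ASSUMPTION: `agent` and `totalScore` are hypothetical field names;
// the real schema is in examples/runs/*-scorecard.json.
function rankScorecards(scorecards) {
  for (const card of scorecards) {
    // Reject entries outside the assumed 0-100 range.
    if (typeof card.totalScore !== "number" || card.totalScore < 0 || card.totalScore > 100) {
      throw new RangeError(`invalid totalScore for ${card.agent}`);
    }
  }
  // Copy before sorting so the caller's array is untouched;
  // highest score first, ties broken alphabetically by agent name.
  return [...scorecards].sort(
    (a, b) => b.totalScore - a.totalScore || a.agent.localeCompare(b.agent)
  );
}

// Illustrative input only (not real benchmark results).
const ranked = rankScorecards([
  { agent: "agent-a", totalScore: 62 },
  { agent: "agent-b", totalScore: 78 },
]);
console.log(ranked.map((c) => c.agent)); // [ 'agent-b', 'agent-a' ]
```

Sorting a copy keeps the original list in submission order, which can be useful when the same array also feeds a report.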
v0.1.1 - Promotion and discovery update
ShopPay Audit Benchmark v0.1.1
Promotion and discovery update for the initial public benchmark.
Added
- GitHub Pages landing page: https://dmatut7.github.io/shoppay-audit-benchmark/
- Social card artwork for sharing.
- Promotion kit with launch copy for X, Hacker News, Reddit, LinkedIn, and blogs.
- llms.txt, robots.txt, sitemap, and structured metadata.
- Security policy and citation metadata.
- Link to the project from the GitHub profile README.
Validation
npm test
Expected result: 9 passing baseline tests.
v0.1.0 - Initial public benchmark baseline
ShopPay Audit Benchmark v0.1.0
Initial public baseline release for evaluating whether AI coding agents can find business-logic defects from written product rules.
Included
- SPEC.md: business-rule source of truth.
- Intentionally flawed ShopPay service implementation.
- Baseline tests documenting seeded defects.
- BENCHMARK.md: audit task and maintainer answer key.
- docs/SCORING.md: 100-point scoring rubric.
- examples/audit-report.md: sample report format.
- GitHub Actions test workflow.
- Contribution guide, code of conduct, issue templates, PR template, and roadmap.
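docs/SCORING.md defines a 100-point rubric. The sketch below shows one way a points-based rubric could be totaled: each category's points are clamped to that category's maximum, and the maxima sum to 100. The category names and weights are hypothetical placeholders, not the benchmark's actual breakdown.

```javascript
// Minimal sketch of totaling a points-based rubric.
// ASSUMPTION: these categories and maxima are hypothetical; the real
// 100-point breakdown is defined in docs/SCORING.md.
const RUBRIC_MAX = {
  defectsFound: 60,
  evidenceQuality: 25,
  reportClarity: 15,
};

function scoreAudit(awarded) {
  let total = 0;
  for (const [category, max] of Object.entries(RUBRIC_MAX)) {
    const points = awarded[category] ?? 0; // a missing category scores 0
    // Clamp into [0, max] so no category can exceed its share.
    total += Math.min(Math.max(points, 0), max);
  }
  return total; // never above 100, because the maxima sum to 100
}

console.log(scoreAudit({ defectsFound: 45, evidenceQuality: 30, reportClarity: 10 })); // 80
```

Clamping per category keeps an over-generous score in one area from masking gaps in another. Categories not listed in `RUBRIC_MAX` are ignored.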
How to run
npm test
Expected baseline result: all tests pass against the intentionally flawed benchmark implementation, because the baseline tests document the seeded defects.