Skip to content

v0.1.0 - Initial public benchmark baseline

Choose a tag to compare

@Dmatut7 Dmatut7 released this 05 Jun 18:24
· 7 commits to main since this release

ShopPay Audit Benchmark v0.1.0

Initial public baseline release for evaluating whether AI coding agents can find business-logic defects from written product rules.

Included

  • SPEC.md business-rule source of truth.
  • Intentionally flawed ShopPay service implementation.
  • Baseline tests documenting seeded defects.
  • BENCHMARK.md audit task and maintainer answer key.
  • docs/SCORING.md 100-point scoring rubric.
  • examples/audit-report.md sample report format.
  • GitHub Actions test workflow.
  • Contribution guide, code of conduct, issue templates, PR template, and roadmap.

How to run

npm test

Expected baseline result: all tests pass against the intentionally flawed benchmark implementation.