v0.1.0 - Initial public benchmark baseline
ShopPay Audit Benchmark v0.1.0
Initial public baseline release for evaluating whether AI coding agents can find business-logic defects from written product rules.
Included
- SPEC.md: business-rule source of truth.
- Intentionally flawed ShopPay service implementation.
- Baseline tests documenting seeded defects.
- BENCHMARK.md: audit task and maintainer answer key.
- docs/SCORING.md: 100-point scoring rubric.
- examples/audit-report.md: sample report format.
- GitHub Actions test workflow.
- Contribution guide, code of conduct, issue templates, PR template, and roadmap.
How to run
npm test
Expected baseline result: all tests pass against the intentionally flawed benchmark implementation, because the baseline tests document the seeded defects rather than fail on them.