Releases: Dmatut7/shoppay-audit-benchmark

v0.1.2 - Reference run and outreach update

05 Jun 18:44

ShopPay Audit Benchmark v0.1.2

Outreach and reference-run update.

Added

  • First scored Codex reference audit run.
  • JSON scorecard example for benchmark comparisons.
  • Outreach log updated with four pull-request submissions to public directories.

v0.1.1 - Promotion and discovery update

05 Jun 18:31

ShopPay Audit Benchmark v0.1.1

Promotion and discovery update for the initial public benchmark.

Added

  • GitHub Pages landing page: https://dmatut7.github.io/shoppay-audit-benchmark/
  • Social card artwork for sharing.
  • Promotion kit with launch copy for X, Hacker News, Reddit, LinkedIn, and blogs.
  • llms.txt, robots.txt, sitemap, and structured metadata.
  • Security policy and citation metadata.
  • GitHub profile README link to the project.

Validation

npm test

Expected result: 9 passing baseline tests.

v0.1.0 - Initial public benchmark baseline

05 Jun 18:24

ShopPay Audit Benchmark v0.1.0

Initial public baseline release for evaluating whether AI coding agents can find business-logic defects from written product rules.

Included

  • SPEC.md business-rule source of truth.
  • Intentionally flawed ShopPay service implementation.
  • Baseline tests documenting seeded defects.
  • BENCHMARK.md audit task and maintainer answer key.
  • docs/SCORING.md 100-point scoring rubric.
  • examples/audit-report.md sample report format.
  • GitHub Actions test workflow.
  • Contribution guide, code of conduct, issue templates, PR template, and roadmap.

How to run

npm test

Expected baseline result: all tests pass against the intentionally flawed benchmark implementation.