05 Jun 18:44

Dmatut7

a59d1ef

v0.1.2 - Reference run and outreach update (Latest)

ShopPay Audit Benchmark v0.1.2

Outreach and reference-run update.

Added

First scored Codex reference audit run.
JSON scorecard example for benchmark comparisons.
Outreach log update with four public directory PR submissions.

ShopPay Audit Benchmark v0.1.1

Promotion and discovery update for the initial public benchmark.

Added

GitHub Pages landing page: https://dmatut7.github.io/shoppay-audit-benchmark/
Social card artwork for sharing.
Promotion kit with launch copy for X, Hacker News, Reddit, LinkedIn, and blogs.
llms.txt, robots.txt, sitemap, and structured metadata.
Security policy and citation metadata.
GitHub profile README link to the project.

Validation

npm test

Expected result: 9 passing baseline tests.

05 Jun 18:24

Dmatut7

v0.1.0

2c8244c

v0.1.0 - Initial public benchmark baseline

ShopPay Audit Benchmark v0.1.0

Initial public baseline release for evaluating whether AI coding agents can find business-logic defects from written product rules.

Included

SPEC.md business-rule source of truth.
Intentionally flawed ShopPay service implementation.
Baseline tests documenting seeded defects.
BENCHMARK.md audit task and maintainer answer key.
docs/SCORING.md 100-point scoring rubric.
examples/audit-report.md sample report format.
GitHub Actions test workflow.
Contribution guide, code of conduct, issue templates, PR template, and roadmap.

How to run

npm test

Expected baseline result: all tests pass against the intentionally flawed benchmark implementation.

Releases: Dmatut7/shoppay-audit-benchmark