
Add retrieval benchmark harness #15

Merged
TheGreenCedar merged 1 commit into main from codex/accuracy-pass-05-benchmark-eval on Jun 3, 2026


@TheGreenCedar
Owner

Summary

  • Adds holdout/local-real benchmark task manifests and expands the task schema/readme.
  • Updates the agent A/B benchmark for mandatory sidecar provenance, packet runtime scoring, and cache preparation.
  • Adds value scoring and benchmark-contract scripts with tests, plus a small bench compile fix.

Tests

  • cargo check -p codestory-bench --benches
  • node --check scripts/codestory-agent-ab-benchmark.mjs
  • node --check scripts/codestory-agent-value-score.mjs
  • node --check scripts/codestory-benchmark-contract.mjs
  • node --check scripts/fetch-holdout-repos.mjs
  • node --test scripts/tests/codestory-agent-ab-analyzer.test.mjs scripts/tests/codestory-agent-value-score.test.mjs scripts/tests/codestory-benchmark-contract.test.mjs
  • cargo fmt --check
  • git diff --check
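The checks listed above can be chained into a single helper. This is a minimal sketch, not a script from the PR: the function name `run_checks` and the `DRY_RUN` variable are hypothetical, and it assumes the commands are run from the repository root with `cargo`, `node`, and `git` on the PATH.

```shell
# Hypothetical helper: run_checks and DRY_RUN are illustrative names,
# not part of the repository. The commands are copied verbatim from
# the Tests list in this PR description.
run_checks() {
  while IFS= read -r cmd; do
    # Skip blank lines in the command list.
    [ -n "$cmd" ] || continue
    printf '+ %s\n' "$cmd"
    # With DRY_RUN=1, only print the command; otherwise run it and
    # stop at the first failure.
    if [ "${DRY_RUN:-0}" != "1" ]; then
      sh -c "$cmd" </dev/null || return 1
    fi
  done <<'EOF'
cargo check -p codestory-bench --benches
node --check scripts/codestory-agent-ab-benchmark.mjs
node --check scripts/codestory-agent-value-score.mjs
node --check scripts/codestory-benchmark-contract.mjs
node --check scripts/fetch-holdout-repos.mjs
node --test scripts/tests/codestory-agent-ab-analyzer.test.mjs scripts/tests/codestory-agent-value-score.test.mjs scripts/tests/codestory-benchmark-contract.test.mjs
cargo fmt --check
git diff --check
EOF
}
```

Calling `DRY_RUN=1 run_checks` prints the eight commands without executing them; calling `run_checks` on its own runs them in order and returns non-zero at the first failing check.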

Stack

Base: codex/accuracy-pass-04-cli-retrieval (#14)
Next: codex/accuracy-pass-06-docs-ci

@TheGreenCedar TheGreenCedar changed the base branch from codex/accuracy-pass-04-cli-retrieval to main June 3, 2026 00:17
@TheGreenCedar TheGreenCedar merged commit bbe36fa into main Jun 3, 2026