Parent tracker: #209
Goal
Replace the one-example downstream fixture with a public-safe benchmark pack containing at least 100 labeled candidate-reranking examples.
Scope
- Define source selection and license policy for benchmark tasks and candidate patches.
- Preserve fixture mode for tests and local development.
- Emit source/license, split-leakage, readiness, manifest, and secret-scan reports.
- Keep candidate code as untrusted text; do not import or execute it.
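A minimal sketch of the last two scope items, assuming a hypothetical record layout (`task_id`, `patch_text`, `split`, `label`) that is not defined in this issue. It shows one way to keep candidate patches as inert strings and to flag tasks that appear in more than one split; the real pack schema may differ.

```python
import hashlib


def candidate_record(task_id: str, patch_text: str, split: str, label: int) -> dict:
    """Store a candidate patch as inert text plus a content hash.

    The patch is only ever hashed and serialized; it is never passed to
    exec(), eval(), or an import mechanism. Field names are assumptions.
    """
    return {
        "task_id": task_id,
        "split": split,
        "label": label,
        "patch_text": patch_text,
        "patch_sha256": hashlib.sha256(patch_text.encode("utf-8")).hexdigest(),
    }


def split_leakage(examples: list[dict]) -> list[str]:
    """Return the task_ids that occur in more than one split."""
    splits_by_task: dict[str, set[str]] = {}
    for example in examples:
        splits_by_task.setdefault(example["task_id"], set()).add(example["split"])
    return sorted(task for task, splits in splits_by_task.items() if len(splits) > 1)
```

Grouping by task rather than by individual patch is a design choice in this sketch: two different candidate patches for the same task would still count as leakage if they land in different splits.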
Acceptance Criteria
- codelewm eval downstream-pack builds the scaled pack from a checked-in config.
- Readiness reports example_count >= 100 and no source/license or split-leakage blockers.
- Artifacts are JSON-native, checksum-verifiable, and secret-scanned.
- Docs state exactly what labels mean and which claims are still blocked before evaluation.
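The readiness and checksum criteria could be checked along these lines. This is a sketch under stated assumptions: the function names, the report fields (`blockers`, `ready`), and the canonical-JSON SHA-256 scheme are hypothetical and are not the actual `codelewm` output format.

```python
import hashlib
import json

MIN_EXAMPLES = 100  # threshold taken from the acceptance criteria


def readiness_report(example_count: int,
                     license_blockers: list[str],
                     leakage_blockers: list[str]) -> dict:
    """Collect blockers; the pack is ready only when none remain."""
    blockers = []
    if example_count < MIN_EXAMPLES:
        blockers.append(f"example_count {example_count} < {MIN_EXAMPLES}")
    blockers += [f"source/license: {b}" for b in license_blockers]
    blockers += [f"split-leakage: {b}" for b in leakage_blockers]
    return {"example_count": example_count, "blockers": blockers, "ready": not blockers}


def artifact_checksum(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra spaces)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_manifest(manifest: dict[str, str], artifacts: dict[str, dict]) -> list[str]:
    """Return names of artifacts that are missing or whose checksum differs."""
    return sorted(
        name for name, digest in manifest.items()
        if name not in artifacts or artifact_checksum(artifacts[name]) != digest
    )
```

Hashing a canonical encoding rather than raw file bytes keeps the checksum stable across key ordering and whitespace changes, at the cost of requiring every artifact to be valid JSON, which matches the JSON-native criterion.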