data: build public-safe 100-example downstream reranking set #210

@AbdelStark

Parent tracker: #209

Goal

Replace the one-example downstream fixture with a public-safe benchmark pack containing at least 100 labeled candidate-reranking examples.

Scope

  • Define source selection and license policy for benchmark tasks and candidate patches.
  • Preserve fixture mode for tests and local development.
  • Emit source/license, split-leakage, readiness, manifest, and secret-scan reports.
  • Keep candidate code as untrusted text; do not import or execute it.
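
One way the split-leakage check could treat candidate patches strictly as text is sketched below. This is a minimal illustration, not the project's implementation: the field names `split` and `candidate_patch` and both helper functions are hypothetical, and matching on an exact content hash is an assumed leakage criterion.

```python
import hashlib
from collections import defaultdict


def patch_fingerprint(patch_text: str) -> str:
    """Hash candidate code as plain text; it is never imported or executed."""
    return hashlib.sha256(patch_text.encode("utf-8")).hexdigest()


def find_split_leakage(examples: list[dict]) -> dict[str, list[str]]:
    """Return fingerprints of candidate patches that occur in more than one split."""
    splits_by_fingerprint: dict[str, set[str]] = defaultdict(set)
    for example in examples:
        # Hypothetical field names; the real pack schema may differ.
        fingerprint = patch_fingerprint(example["candidate_patch"])
        splits_by_fingerprint[fingerprint].add(example["split"])
    return {
        fingerprint: sorted(splits)
        for fingerprint, splits in splits_by_fingerprint.items()
        if len(splits) > 1
    }


examples = [
    {"split": "train", "candidate_patch": "return a + b"},
    {"split": "test", "candidate_patch": "return a + b"},
    {"split": "test", "candidate_patch": "return a - b"},
]
leaks = find_split_leakage(examples)
```

Here `leaks` has a single entry, because only the first patch text appears in both splits. An exact hash catches verbatim duplicates only; near-duplicate detection would need a separate policy.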

Acceptance Criteria

  • codelewm eval downstream-pack builds the scaled pack from a checked-in config.
  • The readiness report shows example_count >= 100 and no source/license or split-leakage blockers.
  • Artifacts are JSON-native, checksum-verifiable, and secret-scanned.
  • Docs state exactly what labels mean and which claims are still blocked before evaluation.
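
The readiness and checksum criteria above could be checked roughly as follows. This is a sketch under stated assumptions: the manifest layout (a `sha256` mapping from artifact name to hex digest), the `blockers` field, and the file name `readiness.json` are all hypothetical; only the `example_count >= 100` threshold comes from this issue.

```python
import hashlib
import json

MIN_EXAMPLES = 100  # threshold from the acceptance criteria


def sha256_of(payload: bytes) -> str:
    """Return the hex SHA-256 digest of raw artifact bytes."""
    return hashlib.sha256(payload).hexdigest()


def verify_manifest(manifest: dict, artifacts: dict[str, bytes]) -> list[str]:
    """Return names of artifacts that are missing or whose checksum differs."""
    mismatches = []
    # Hypothetical manifest layout: {"sha256": {artifact_name: hex_digest}}.
    for name, expected in manifest["sha256"].items():
        payload = artifacts.get(name)
        if payload is None or sha256_of(payload) != expected:
            mismatches.append(name)
    return mismatches


def is_ready(readiness: dict) -> bool:
    """True when the pack is large enough and no blockers are listed."""
    return readiness["example_count"] >= MIN_EXAMPLES and not readiness["blockers"]


readiness_bytes = json.dumps(
    {"example_count": 100, "blockers": []}, sort_keys=True
).encode("utf-8")
artifacts = {"readiness.json": readiness_bytes}
manifest = {"sha256": {"readiness.json": sha256_of(readiness_bytes)}}

bad = verify_manifest(manifest, artifacts)
ready = is_ready(json.loads(readiness_bytes))
```

Serializing with `sort_keys=True` keeps the JSON bytes stable across runs, which is what makes a checksum over them reproducible.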

Metadata

Labels

  • area:data (Area: data)
  • area:evaluation (Area: evaluation)
  • effort:l (Large multi-file implementation change)
  • priority:p1 (Required for v1.0 or core follow-through)
  • spec:rfc-0013 (Derived from RFC-0013)
  • type:feature (Feature implementation work)
