[P2] Build Code2TestBench harness + validate metrics #14

@pro-utkarshM

Problem

Proposal §7 promises Code2TestBench and headline metrics (70% acceptance, 80% first-run pass). No benchmark harness exists; metrics are unvalidated.

Tasks

  • Harness: run generation against sample repos (flask, requests) with their real tests hidden.
  • Measure acceptance rate, first-run pass rate, diagnostic accuracy.
  • Record results in docs; reconcile with proposal targets.
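The three measurements above could be aggregated roughly as follows. This is a minimal sketch, not the project's harness: the `GeneratedTest` record shape and the working definitions in its comments ("accepted", "passed on first run", "diagnosis correct") are assumptions, since the issue does not define them and the proposal's own definitions should take precedence.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class GeneratedTest:
    """Outcome of one generated test (hypothetical record shape)."""
    repo: str
    accepted: bool            # assumption: the test was kept after review
    passed_first_run: bool    # assumption: passed with no edits or retries
    diagnosis_correct: Optional[bool] = None  # None: no diagnostic was produced


def rate(hits: int, total: int) -> float:
    """Return hits / total, or 0.0 when there is nothing to count."""
    return hits / total if total else 0.0


def summarize(records: Iterable[GeneratedTest]) -> dict[str, float]:
    """Compute the three metrics named in the task list as fractions in [0, 1]."""
    records = list(records)
    # Diagnostic accuracy is only defined over tests that produced a diagnostic.
    diagnosed = [r for r in records if r.diagnosis_correct is not None]
    return {
        "acceptance_rate": rate(sum(r.accepted for r in records), len(records)),
        "first_run_pass_rate": rate(
            sum(r.passed_first_run for r in records), len(records)
        ),
        "diagnostic_accuracy": rate(
            sum(bool(r.diagnosis_correct) for r in diagnosed), len(diagnosed)
        ),
    }
```

Keeping one record per generated test, rather than per-repo totals, lets the same data be re-sliced by repository (flask, requests) without re-running generation.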

Acceptance

Reproducible benchmark command + a results table committed.
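One possible shape for the committed results table is sketched below. The two targets are the headline figures quoted in the Problem section. The function name, the metric keys, and the Markdown layout are assumptions, and no measured values are implied.

```python
from typing import Mapping

# Targets quoted from proposal §7 in this issue; the metric keys are hypothetical.
TARGETS = {"acceptance_rate": 0.70, "first_run_pass_rate": 0.80}


def results_table(
    measured: Mapping[str, float],
    targets: Mapping[str, float] = TARGETS,
) -> str:
    """Render measured metrics next to their targets as a Markdown table."""
    lines = [
        "| Metric | Measured | Target | Met |",
        "| --- | --- | --- | --- |",
    ]
    for name, value in measured.items():
        target = targets.get(name)
        if target is None:
            # No target stated for this metric (e.g. diagnostic accuracy).
            target_cell, met = "n/a", "n/a"
        else:
            target_cell = f"{target:.0%}"
            met = "yes" if value >= target else "no"
        lines.append(f"| {name} | {value:.0%} | {target_cell} | {met} |")
    return "\n".join(lines)
```

Generating the table from the harness output, rather than editing it by hand, keeps the committed numbers tied to a re-runnable command.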

Labels

P2 Validation / polish
