spike: sample merged PRs in agentv and classify eval-case yield #1156

@christso

Part of #1155.

Objective

Validate the premise behind the PR/issue-mining direction by sampling real merged PRs in this repo and checking how many convert to useful eval cases. If yield is too low, the dependent work (sub-issue on skill extension) should be rescoped or dropped.

Scope

  • Take the 20 most recent PRs merged into main.
  • For each, classify as one of:
    • useful: a plausible eval case — the PR title/body is a task spec, the diff represents a behavioral change an agent could reproduce.
    • not useful: typo fix, version bump, dep update, pure refactor with no behavior change, doc-only, or too small to be meaningful.
  • Record classification with a one-line reason per PR.
  • Report yield percentage.
  • Recommendation: proceed, rescope, or drop.

Acceptance signals

  • A short markdown note (in this issue, as a comment, or linked from the research repo) with a table: PR number, one-line summary, classification, reason.
  • Yield percentage computed.
  • Yes/no/rescope recommendation with rationale.
  • No code changes.

Non-goals

  • Not building any tooling to mine PRs programmatically — this is manual classification.
  • Not evaluating quality of generated cases (we aren't generating any here).
  • Not extending beyond 20 PRs unless initial signal is borderline.

Rule of thumb

  • ≥50% useful: proceed with the skill extension as proposed.
  • 30% to under 50%: proceed but narrow the scope (e.g., filter by label, commit message pattern, or PR size).
  • <30%: rescope or drop — the premise doesn't hold for this codebase.
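
The thresholds above can be sketched as a small calculation, which may help keep the yield figure and the recommendation consistent in the write-up. This is only an illustration, not tooling proposed by this issue: the function names and the returned labels are hypothetical, and the boundary handling (exactly 50% counts as "proceed", exactly 30% counts as "narrow scope") is an assumption read from the "≥50%" and "<30%" wording.

```python
def yield_percentage(useful: int, total: int) -> float:
    """Return the share of sampled PRs classified as useful, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= useful <= total:
        raise ValueError("useful must be between 0 and total")
    return 100.0 * useful / total


def recommend(yield_pct: float) -> str:
    """Map a yield percentage to the rule-of-thumb recommendation.

    Assumed boundaries: >= 50 proceeds, 30 up to (but not including) 50
    narrows the scope, anything below 30 is rescoped or dropped.
    """
    if yield_pct >= 50.0:
        return "proceed"
    if yield_pct >= 30.0:
        return "narrow scope"
    return "rescope or drop"


if __name__ == "__main__":
    # Made-up counts for illustration only; not a result from this repo.
    pct = yield_percentage(useful=9, total=20)
    print(f"yield: {pct:.0f}% -> {recommend(pct)}")
```

With a sample of 20 PRs, each PR moves the yield by 5 percentage points, so the cut-offs fall at 10 useful PRs (50%) and 6 useful PRs (30%).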

Blocks

Sub-issue for agentv-eval-writer extension (see #1155).

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request), eval-writer (Work on or enabling the agentv-eval-writer skill)

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
