fix: apply before_each reset-only hooks for static workspace evals #1058

@christso


Summary

workspace.hooks.before_each.reset is parsed and documented, but reset-only hooks are ignored during normal eval execution. In static workspaces this leaves the shared git baseline and prior run state in place, so file_changes can diff against stale committed content from a previous run.

Repro

  1. Configure an eval with workspace.path and workspace.hooks.before_each.reset: fast, with no script or command on the hook.
  2. Run the eval twice against the same static workspace.
  3. Inspect the workspace git history and the grader file_changes payload.

Expected

A reset-only before_each hook should actually reset the workspace before the agent run, and file-change capture should compare against a fresh baseline for that reset state.
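The expected ordering can be sketched as: reset, take the baseline from the reset state, run the agent, then diff. This is only an illustration using an in-memory map in place of a real workspace; `Files`, `changedPaths`, and `runOnce` are hypothetical names, not agentv APIs.

```typescript
// Hypothetical in-memory model of a workspace: path -> file content.
type Files = Map<string, string>;

// Paths that were added, removed, or modified relative to the baseline.
function changedPaths(baseline: Files, current: Files): string[] {
  const all = Array.from(baseline.keys()).concat(Array.from(current.keys()));
  const unique = Array.from(new Set(all));
  return unique.filter((p) => baseline.get(p) !== current.get(p)).sort();
}

// One eval run: reset first, snapshot the baseline from the reset state,
// then run the agent and diff against that fresh baseline.
function runOnce(
  workspace: Files,
  pristine: Files,
  agent: (w: Files) => void
): string[] {
  workspace.clear();
  pristine.forEach((content, path) => workspace.set(path, content)); // reset step
  const baseline: Files = new Map(workspace); // fresh baseline for this run
  agent(workspace);
  return changedPaths(baseline, workspace);
}
```

With this ordering, running the same agent twice against the same workspace reports the same changes both times, because the second run never sees files left behind by the first.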

Actual

before_each.reset has no effect unless a script/command is also present. Static workspaces accumulate agentv-baseline commits, and later runs can produce diffs relative to stale prior-run state.
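One way this behavior could arise is a guard that decides whether a hook has any work to do by looking only at its script/command. The sketch below is a guess at the shape of the problem, not code from the agentv source; the `BeforeEachHook` interface and both function names are hypothetical, with field names taken from the wording of this issue.

```typescript
// Hypothetical hook shape; fields mirror the issue text, not the real agentv types.
interface BeforeEachHook {
  command?: string[];
  script?: string;
  reset?: string; // e.g. "fast"
}

// A guard that would reproduce the reported behavior:
// a hook with only `reset` set is treated as having nothing to do.
function hasWorkScriptOnly(hook: BeforeEachHook | undefined): boolean {
  if (!hook) return false;
  const hasCommand = Array.isArray(hook.command) && hook.command.length > 0;
  return Boolean(hook.script) || hasCommand;
}

// A guard that also counts a reset-only hook as work to perform.
function hasWork(hook: BeforeEachHook | undefined): boolean {
  return hasWorkScriptOnly(hook) || Boolean(hook && hook.reset);
}
```

Under these assumptions, `{ reset: "fast" }` is skipped by the first guard and picked up by the second, which matches the difference between the Actual and Expected sections.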

Notes

Observed against a real eval that writes a CSV artifact into a static workspace path on Windows.

    Projects

    Status

    Done