Summary
workspace.hooks.before_each.reset is parsed and documented, but reset-only hooks are ignored during normal eval execution. In static workspaces this leaves the shared git baseline and prior run state in place, so file_changes can diff against stale committed content from a previous run.
Repro
- Configure an eval with workspace.path and hooks.before_each.reset: fast.
- Run the eval twice against the same static workspace.
- Inspect the workspace git history and the grader file_changes payload.
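For reference, a minimal sketch of the workspace block used in the first step, assuming the eval config is YAML and that hooks nests under workspace as the key workspace.hooks.before_each.reset implies. The path value is a hypothetical placeholder, and the rest of the eval file is omitted.

```yaml
workspace:
  # Hypothetical path to the static workspace shared across runs.
  path: ./static-workspace
  hooks:
    before_each:
      # Reset-only hook: no script or command is configured alongside it.
      reset: fast
```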
Expected
A reset-only before_each hook should actually reset the workspace before the agent run, and file-change capture should compare against a fresh baseline for that reset state.
Actual
before_each.reset has no effect unless a script/command is also present. Static workspaces accumulate agentv-baseline commits, and later runs can produce diffs relative to stale state from a prior run.
Notes
Observed against a real eval that writes a CSV artifact into a static workspace path on Windows.