Context
Follow-up from #431 (execution status classification).
Request
Add agentv run --retry-errors <previous-output.jsonl> that reads a previous eval output, filters for results with execution_status === 'execution_error', and re-runs only those test cases.
Why
When infrastructure issues (missing runtime, network timeout) cause execution errors, users shouldn't have to re-run the entire eval suite. They should be able to fix the environment and re-run just the failed cases.
Design
- Read JSONL output file
- Filter for
execution_status === 'execution_error'
- Extract test IDs
- Re-run eval with
--filter set to those test IDs
- Merge results into a new output file
Related
Context
Follow-up from #431 (execution status classification).
Request
Add
agentv run --retry-errors <previous-output.jsonl>that reads a previous eval output, filters for results withexecution_status === 'execution_error', and re-runs only those test cases.Why
When infrastructure issues (missing runtime, network timeout) cause execution errors, users shouldn't have to re-run the entire eval suite. They should be able to fix the environment and re-run just the failed cases.
Design
execution_status === 'execution_error'--filterset to those test IDsRelated