Summary
AgentV should finish the transition to canonical run workspaces and consistent terminology.
Current active design uses:
- .agentv/results/runs/<run-id>/index.jsonl as the canonical persisted run layout
- dataset as the persisted result field and intended filter term
- test case as the more accurate concept name for individual cases
Follow-up cleanup should remove remaining support for legacy flat results.jsonl inputs and align mixed internal naming that still uses terms like suite, evalSetName, and evalCase.
Why
This reduces design drift and avoids carrying legacy compatibility into new commands like agentv trend.
Scope
Remove legacy result layout support
Audit CLI/result-loading surfaces and remove support for legacy flat result file inputs where we still accept them.
Target canonical input shape:
.agentv/results/runs/<run-id>/index.jsonl
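A minimal sketch of what rejecting non-canonical inputs at a loading surface could look like. The function name `parseCanonicalRunIndex`, the `RunIndexRef` type, and the error text are hypothetical and not taken from the AgentV codebase. Only the path shape comes from this issue.

```typescript
// Hypothetical helper: names and error text are illustrative, not AgentV's actual API.
// Matches an optional directory prefix followed by the canonical run layout.
const CANONICAL_INDEX = /^(?:.*\/)?\.agentv\/results\/runs\/([^/]+)\/index\.jsonl$/;

interface RunIndexRef {
  runId: string;
  indexPath: string;
}

function parseCanonicalRunIndex(inputPath: string): RunIndexRef {
  // Normalize Windows separators so the check behaves the same on every platform.
  const normalized = inputPath.replace(/\\/g, "/");
  const match = CANONICAL_INDEX.exec(normalized);
  if (!match) {
    // Legacy flat files such as results.jsonl end up here instead of being loaded.
    throw new Error(
      `Unsupported result input "${inputPath}": expected .agentv/results/runs/<run-id>/index.jsonl`,
    );
  }
  return { runId: match[1], indexPath: normalized };
}
```

Under these assumptions, a new command would call the helper once on its input argument and fail fast, rather than probing for a flat results.jsonl fallback.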
Align terminology
Proposed renames:
| Current | Proposed | Why |
| --- | --- | --- |
| suite | dataset | aligns with persisted result field and CLI filter term |
| evalSetName | datasetName | same reason |
| evalCase | testCase | object is a test case, not an eval run |
| evalCases | testCases | same reason |
| rawTestcases | rawTestCases | same reason plus consistent casing |
| legacy evalId fallback naming | testId only where possible | reduce dual-term confusion |
Non-Goals
- Do not change external wire format away from dataset/test_id
- Do not block feature work like #913
- Do not bundle risky behavioral changes unrelated to result input layout or naming cleanup
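To illustrate how internal renames can stay separate from the wire format, here is a rough sketch of a mapping boundary. The wire field names dataset and test_id come from this issue. The `WireResultRecord` and `TestCaseResult` types and both function names are hypothetical, and a real record would carry more fields than shown.

```typescript
// Wire shape: only the two field names mentioned in the issue; other fields are omitted.
interface WireResultRecord {
  dataset: string;
  test_id: string;
}

// Hypothetical internal shape using the proposed naming.
interface TestCaseResult {
  datasetName: string;
  testId: string;
}

// Parse one line of index.jsonl into the internal shape.
function parseResultLine(line: string): TestCaseResult {
  const raw = JSON.parse(line) as Partial<WireResultRecord> | null;
  if (raw === null || typeof raw.dataset !== "string" || typeof raw.test_id !== "string") {
    throw new Error("Result record must contain string fields `dataset` and `test_id`");
  }
  return { datasetName: raw.dataset, testId: raw.test_id };
}

// Serialize back to the unchanged wire names.
function serializeResult(result: TestCaseResult): string {
  const wire: WireResultRecord = { dataset: result.datasetName, test_id: result.testId };
  return JSON.stringify(wire);
}
```

Keeping the translation in one place like this would let variable names change freely while persisted records continue to use dataset and test_id.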
Acceptance Signals
- remaining legacy flat result-file compatibility is removed or explicitly isolated behind a deliberate compatibility boundary
- new commands only document and accept canonical run workspace inputs
- variable naming in touched areas is aligned toward dataset and testCase
- docs and code comments use consistent terminology