feat: align repeat config with attempt artifacts #1608
Deploying agentv with
| Latest commit: | 35456a0 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://68d7783a.agentv.pages.dev |
| Branch Preview URL: | https://feat-av-d64j-repeat-config.agentv.pages.dev |
Review verdict: changes requested. Findings:
Checks run: local diff and targeted |
3b1642c to 36e9680
|
Review verdict: changes requested. Findings:
Checks run: |
|
Addressed the re-review vocabulary finding in 35456a0.

Changes:
- selected-attempt Dashboard detail now uses
|
Final re-review verdict: clean. The prior vocabulary finding is resolved at head, and I did not find new blockers in the latest three-file fix. Fresh PR CI is green for this head: Build, Typecheck, Lint, Test, Check Links, Validate Marketplace, Validate Evals, and Cloudflare Pages all succeeded. The orchestrator may proceed to ready/merge.
Checks run:
Summary
Repeat authoring now follows the Promptfoo-style shape while AgentV keeps its richer produced-attempt artifacts. YAML authors can put `evaluate_options.repeat: 3` or a richer repeat object in `evaluate_options.repeat`, and individual cases can override the repeat policy with `tests[].options.repeat`.

This also finishes the vocabulary split: `repeat` is configuration, `attempts[]` is produced execution metadata, and per-execution sidecars live in `attempt-N/` directories. Result readers keep compatibility fallbacks for older `trials[]`/`run_path` manifests while new writers emit the canonical attempt shape.

- `evaluate_options.repeat`; rejects removed top-level `repeat` with migration guidance
- `tests[].options.repeat` over the global repeat object/count
- `attempts[]`, `attempt_path`, `attempt-N/`, and `total_attempts`/`passed_attempts` summaries
- `attempts` while still reading legacy `trials` and `run_path`

Validation
- `bun run build`
- `bun run lint`
- `bun run typecheck`
- `bun run test`
- `bun run validate:examples`
- `bun test packages/core/test/evaluation/experiment.test.ts packages/core/test/evaluation/eval-inline-experiment.test.ts packages/core/test/evaluation/validation/eval-file-schema.test.ts packages/core/test/evaluation/validation/eval-validator.test.ts apps/cli/test/commands/eval/artifact-writer.test.ts apps/cli/test/commands/results/serve.test.ts apps/dashboard/src/lib/result-table.test.ts packages/sdk/test/eval-authoring.test.ts`
- `bun test apps/cli/test/eval.integration.test.ts apps/cli/test/commands/prepare/prepare.test.ts`

Evidence
- `http://127.0.0.1:10531/v1` `gpt-5.3-codex-spark`
- `EntityProcess/agentv-private:evidence/av-d64j-repeat-config` `d42dceb`
- `contract-check.json` showing:
  - `global-repeat-shorthand` emitted 2 attempts with `attempt-1`, `attempt-2`
  - `per-case-repeat-object` emitted 3 attempts with `attempt-1`, `attempt-2`, `attempt-3`
  - `attempts[]` and do not emit legacy top-level `trials`

Code Review
Simplify/code-review pass completed before PR. No actionable residual findings.
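
For reviewers who want the repeat precedence from the Summary in concrete form, here is a minimal sketch of how a per-case `tests[].options.repeat` could win over the global `evaluate_options.repeat`. The type names, the `count` field inside the repeat object, the default of one attempt, and the helper names `repeatCount` and `resolveRepeat` are all assumptions made for illustration. They are not taken from the AgentV source.

```typescript
// Hypothetical shapes: only `evaluate_options.repeat` and `tests[].options.repeat`
// come from the PR description. The `count` field name is an assumption.
type RepeatConfig = number | { count: number };

interface EvalFile {
  evaluate_options?: { repeat?: RepeatConfig };
  tests: Array<{ id: string; options?: { repeat?: RepeatConfig } }>;
}

// Normalize the shorthand (a bare number) and the object form to a plain count.
function repeatCount(repeat: RepeatConfig | undefined): number {
  if (repeat === undefined) return 1; // assumed default: a single attempt
  return typeof repeat === "number" ? repeat : repeat.count;
}

// The per-case policy takes precedence; the global policy is the fallback.
function resolveRepeat(file: EvalFile, testId: string): number {
  const test = file.tests.find((t) => t.id === testId);
  if (!test) throw new Error(`unknown test: ${testId}`);
  return repeatCount(test.options?.repeat ?? file.evaluate_options?.repeat);
}

// Illustrative config only; it borrows the case names from the Evidence section
// but is not the actual dogfood eval file.
const exampleFile: EvalFile = {
  evaluate_options: { repeat: 2 },
  tests: [
    { id: "global-repeat-shorthand" },
    { id: "per-case-repeat-object", options: { repeat: { count: 3 } } },
  ],
};

const globalCount = resolveRepeat(exampleFile, "global-repeat-shorthand");
const perCaseCount = resolveRepeat(exampleFile, "per-case-repeat-object");
```

In this sketch the first case inherits the global shorthand and the second uses its own object, which matches the attempt counts reported in the Evidence section.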
Post-Deploy Monitoring & Validation
No production service deployment is required for this package/schema/docs change. After publishing or merging, validate by watching:
- `repeat`.
- `agentv results serve/validate/export` cannot read older manifests with `trials[]` or `run_path`.

Healthy signal: new eval files with `evaluate_options.repeat` validate, new repeat runs write `attempts[]` and `attempt-N/`, and older run bundles remain readable. Rollback trigger: CI or dogfood shows newly written bundles missing attempt sidecars or existing legacy bundles becoming unreadable.

Related: Bead `av-d64j`.
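
To make the "older run bundles remain readable" signal concrete, the reader-side fallback can be pictured roughly as below. This is a sketch under assumptions, not the code in this PR: the `Manifest`, `Attempt`, and `LegacyTrial` types, the `passed` field, the placement of `run_path` on each legacy trial entry, and the helpers `readAttempts` and `summarize` are hypothetical. Only the names `attempts`, `attempt_path`, `trials`, `run_path`, `total_attempts`, and `passed_attempts` come from the description above.

```typescript
// Hypothetical manifest shapes for illustration.
interface Attempt {
  attempt_path: string;
  passed: boolean; // assumed field, not confirmed by the PR
}

interface LegacyTrial {
  run_path: string; // assumed to live on each legacy trial entry
  passed: boolean;
}

interface Manifest {
  attempts?: Attempt[];
  trials?: LegacyTrial[];
}

// Prefer the canonical `attempts[]`; fall back to legacy `trials[]` and map
// `run_path` onto `attempt_path` so callers only ever see the new shape.
function readAttempts(manifest: Manifest): Attempt[] {
  if (manifest.attempts) return manifest.attempts;
  return (manifest.trials ?? []).map((trial) => ({
    attempt_path: trial.run_path,
    passed: trial.passed,
  }));
}

// Build the summary counters from whichever shape was read.
function summarize(manifest: Manifest): { total_attempts: number; passed_attempts: number } {
  const attempts = readAttempts(manifest);
  return {
    total_attempts: attempts.length,
    passed_attempts: attempts.filter((a) => a.passed).length,
  };
}

// A legacy-style manifest and a new-style manifest normalize to the same shape.
const legacyManifest: Manifest = {
  trials: [
    { run_path: "run-1", passed: true },
    { run_path: "run-2", passed: false },
  ],
};
const currentManifest: Manifest = {
  attempts: [{ attempt_path: "attempt-1", passed: true }],
};

const legacySummary = summarize(legacyManifest);
const currentSummary = summarize(currentManifest);
```

Normalizing at the read boundary keeps the legacy vocabulary out of downstream consumers, which is one way to get the split the Summary describes: writers emit only the attempt shape, readers accept both.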