feat: align repeat config with attempt artifacts by christso · Pull Request #1608 · EntityProcess/agentv

christso · 2026-07-02T15:38:08Z

Summary

Repeat authoring now follows the Promptfoo-style shape while AgentV keeps its richer produced-attempt artifacts. YAML authors can set evaluate_options.repeat to a number (for example, evaluate_options.repeat: 3) or to a richer repeat object, and individual cases can override the repeat policy with tests[].options.repeat.
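A minimal sketch of how the numeric shorthand, the object form, and the per-case override could resolve to one effective policy. This is not AgentV's parser: the `count` field name, the field-by-field merge, and the default of one attempt when nothing is configured are all assumptions made for illustration.

```typescript
// Hypothetical shapes: the PR only confirms that `repeat` may be a number or
// an object. The `count` field name is an assumption.
interface RepeatObject {
  count: number;
}
type RepeatConfig = number | RepeatObject;

// Expand the numeric shorthand (repeat: 3) into the object form.
function normalizeRepeat(repeat: RepeatConfig | undefined): RepeatObject | undefined {
  if (repeat === undefined) return undefined;
  const normalized: RepeatObject =
    typeof repeat === "number" ? { count: repeat } : { ...repeat };
  if (!Number.isInteger(normalized.count) || normalized.count < 1) {
    throw new Error(`repeat count must be a positive integer, got ${normalized.count}`);
  }
  return normalized;
}

// Layer tests[].options.repeat over evaluate_options.repeat.
// Assumption: with neither set, a case runs exactly once.
function resolveRepeat(globalRepeat?: RepeatConfig, caseRepeat?: RepeatConfig): RepeatObject {
  return {
    ...(normalizeRepeat(globalRepeat) ?? { count: 1 }),
    ...(normalizeRepeat(caseRepeat) ?? {}),
  };
}
```

For example, `resolveRepeat(3, { count: 2 })` lets the per-case value win, while `resolveRepeat(3)` keeps the global count.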

This also finishes the vocabulary split: repeat is configuration, attempts[] is produced execution metadata, and per-execution sidecars live in attempt-N/ directories. Result readers keep compatibility fallbacks for older trials[] / run_path manifests while new writers emit the canonical attempt shape.
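The reader-side compatibility fallback described above could look roughly like the sketch below. Only the names `attempts`, `trials`, `attempt_path`, and `run_path` come from this PR; the record shape, the placement of the path fields on each record, and the `readAttempts` helper are hypothetical.

```typescript
// Hypothetical record and row shapes, for illustration only.
interface AttemptRecord {
  attempt_path?: string; // canonical name written by new bundles
  run_path?: string; // legacy name found in older manifests
  [key: string]: unknown;
}

interface ManifestRow {
  attempts?: AttemptRecord[]; // canonical
  trials?: AttemptRecord[]; // legacy
}

interface NormalizedAttempt {
  path: string | undefined;
  raw: AttemptRecord;
}

// Prefer the canonical fields and fall back to the legacy ones, so older
// run bundles stay readable.
function readAttempts(row: ManifestRow): NormalizedAttempt[] {
  const records = row.attempts ?? row.trials ?? [];
  return records.map((record) => ({
    path: record.attempt_path ?? record.run_path,
    raw: record,
  }));
}
```

Keeping the fallback in one normalizing function means the rest of a reader only ever sees the canonical attempt shape.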

Area	Outcome
YAML/schema	Accepts numeric and object `evaluate_options.repeat`; rejects removed top-level `repeat` with migration guidance
Per-case overrides	Applies `tests[].options.repeat` over the global repeat object/count
Artifacts	Emits `attempts[]`, `attempt_path`, `attempt-N/`, and `total_attempts`/`passed_attempts` summaries
Readers	CLI results and Dashboard prefer `attempts` while still reading legacy `trials` and `run_path`
Docs/examples	Updates public docs, examples, verification notes, and eval-writer schema guidance
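To make the Artifacts row concrete, here is a hedged sketch of deriving the `total_attempts` / `passed_attempts` summary and the `attempt-N/` directory names from a list of outcomes. The `AttemptOutcome` shape and both helper names are assumptions; the 1-based numbering matches the `attempt-1`, `attempt-2` directories reported in the Evidence section.

```typescript
// Hypothetical per-attempt outcome; the real artifact carries more metadata.
interface AttemptOutcome {
  passed: boolean;
}

interface AttemptSummary {
  total_attempts: number;
  passed_attempts: number;
  attempt_dirs: string[];
}

// Directory name for the Nth attempt, counting from 1.
function attemptDirName(attemptNumber: number): string {
  return `attempt-${attemptNumber}`;
}

// Summarize produced attempts into counts and sidecar directory names.
function summarizeAttempts(outcomes: AttemptOutcome[]): AttemptSummary {
  return {
    total_attempts: outcomes.length,
    passed_attempts: outcomes.filter((outcome) => outcome.passed).length,
    attempt_dirs: outcomes.map((_, index) => attemptDirName(index + 1)),
  };
}
```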

Validation

bun run build
bun run lint
bun run typecheck
bun run test
bun run validate:examples
Focused parser/schema/artifact/dashboard/sdk tests during development:
- bun test packages/core/test/evaluation/experiment.test.ts packages/core/test/evaluation/eval-inline-experiment.test.ts packages/core/test/evaluation/validation/eval-file-schema.test.ts packages/core/test/evaluation/validation/eval-validator.test.ts apps/cli/test/commands/eval/artifact-writer.test.ts apps/cli/test/commands/results/serve.test.ts apps/dashboard/src/lib/result-table.test.ts packages/sdk/test/eval-authoring.test.ts
- bun test apps/cli/test/eval.integration.test.ts apps/cli/test/commands/prepare/prepare.test.ts

Evidence

Live dogfood run through the local OpenAI-compatible endpoint: http://127.0.0.1:10531/v1
Live model/target and grader model: gpt-5.3-codex-spark
Evidence branch: EntityProcess/agentv-private:evidence/av-d64j-repeat-config
Evidence commit: d42dceb
Evidence contents: source eval/targets, canonical run bundle, artifact tree, and contract-check.json showing:
- global-repeat-shorthand emitted 2 attempts with attempt-1, attempt-2
- per-case-repeat-object emitted 3 attempts with attempt-1, attempt-2, attempt-3
- manifest rows use attempts[] and do not emit legacy top-level trials
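The three evidence points above amount to a small contract check. The sketch below is not the script that produced contract-check.json; `checkAttemptContract` and its inputs are hypothetical, and it only illustrates the kind of assertions involved.

```typescript
// Hypothetical manifest row view: only the presence of the fields is checked.
interface ManifestRowLike {
  attempts?: unknown;
  trials?: unknown;
}

// Returns a list of violations; an empty list means the row and its
// sidecar directories match the canonical attempt shape.
function checkAttemptContract(
  row: ManifestRowLike,
  attemptDirs: string[],
  expectedAttempts: number,
): string[] {
  const violations: string[] = [];

  if (!Array.isArray(row.attempts)) {
    violations.push("row is missing attempts[]");
  } else if (row.attempts.length !== expectedAttempts) {
    violations.push(`expected ${expectedAttempts} attempts, found ${row.attempts.length}`);
  }

  if (row.trials !== undefined) {
    violations.push("row emits legacy top-level trials");
  }

  // Expect exactly attempt-1 .. attempt-N on disk.
  const expectedDirs = Array.from({ length: expectedAttempts }, (_, i) => `attempt-${i + 1}`);
  for (const dir of expectedDirs) {
    if (!attemptDirs.includes(dir)) violations.push(`missing directory ${dir}`);
  }
  for (const dir of attemptDirs) {
    if (!expectedDirs.includes(dir)) violations.push(`unexpected directory ${dir}`);
  }

  return violations;
}
```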

Code Review

Simplify/code-review pass completed before PR. No actionable residual findings.

Post-Deploy Monitoring & Validation

No production service deployment is required for this package/schema/docs change. After publishing or merging, validate by watching:

CI checks for schema sync, example validation, package build, CLI tests, and Dashboard tests.
Any docs/example validation failures mentioning removed top-level repeat.
Any consumer reports where Dashboard or agentv results serve/validate/export cannot read older manifests with trials[] or run_path.

Healthy signal: new eval files with evaluate_options.repeat validate, new repeat runs write attempts[] and attempt-N/, and older run bundles remain readable. Rollback trigger: CI or dogfood shows newly written bundles missing attempt sidecars or existing legacy bundles becoming unreadable.

Related: Bead av-d64j.

cloudflare-workers-and-pages · 2026-07-02T15:38:44Z

Deploying agentv with Cloudflare Pages

Latest commit:	`35456a0`
Status:	✅ Deploy successful!
Preview URL:	https://68d7783a.agentv.pages.dev
Branch Preview URL:	https://feat-av-d64j-repeat-config.agentv.pages.dev

christso · 2026-07-02T15:49:13Z

Review verdict: changes requested.

Findings:

P2: Remaining user-facing repeat-attempt vocabulary still says runs/trials. This PR establishes repeat as authored configuration and attempts / attempt-N as produced executions, but the Dashboard repeat UI still renders Run success, Passed runs, runs passed, and runs at apps/dashboard/src/components/EvalDetail.tsx:621, apps/dashboard/src/components/EvalDetail.tsx:623, apps/dashboard/src/components/ResultTable.tsx:714, and apps/dashboard/src/components/ResultTable.tsx:721. The CLI also exposes the internal name in warnings, e.g. trials.count at packages/core/src/evaluation/orchestrator.ts:816 and packages/core/src/evaluation/orchestrator.ts:880. These should use attempts / repeat count / evaluate_options.repeat wording so new Dashboard and CLI output does not reintroduce the legacy public vocabulary.

Checks run: local diff and targeted rg inspection across schema, parser, artifact writer, CLI readers, Dashboard, docs/examples/skills; git diff --check origin/main...HEAD.

christso · 2026-07-02T16:04:22Z

Review verdict: changes requested.

Findings:

P2: Remaining public repeat-attempt vocabulary still says run/trial in the attempt-facing surfaces. The previous Dashboard aggregate labels and CLI warnings are fixed, and CI is green for 36e9680, but the selected attempt detail still renders Run score in apps/dashboard/src/components/EvalDetail.tsx:732 and says This run does not include a transcript artifact from the selected-attempt transcript tab at apps/dashboard/src/components/EvalDetail.tsx:897. Public docs/API comments also still describe the new attempt artifact layout with old terms: apps/web/src/content/docs/docs/tools/results.mdx:130 says attempt details live under run-N/, and the exported core types still say Configuration for running multiple trials per eval case, Result of a single trial attempt, and run-N folders at packages/core/src/evaluation/types.ts:1075, packages/core/src/evaluation/types.ts:1086, and packages/core/src/evaluation/types.ts:1103. These should use attempts / attempt-N wording so the new public contract does not continue leaking the legacy vocabulary.

Checks run: git fetch origin --prune; git status --short --branch; gh pr view 1608 --json ...; fetched PR head into refs/review/pr-1608-head; inspected git diff origin/main...refs/review/pr-1608-head; targeted git grep/git show inspection across Dashboard, CLI warnings, docs, artifacts, tests, and exported type comments; git diff --check origin/main...refs/review/pr-1608-head. No tests/builds run per research-only review instructions and because the fresh PR CI is green.

christso · 2026-07-02T16:11:05Z

Addressed the re-review vocabulary finding in 35456a0.

Changes:
- selected-attempt Dashboard detail now uses Attempt score and This attempt does not include a transcript artifact.
- results docs now describe per-attempt artifacts under attempt-N/ instead of run-N/
- exported core comments now describe repeated attempts / attempt-N folders while preserving compatibility type names

Local validation:
- bun run --cwd apps/dashboard test
- bun run --cwd apps/dashboard build
- bun run --cwd packages/core lint
- bun run --cwd packages/core typecheck
- bun run --cwd apps/web build
- bun run lint
- git diff --check
- targeted greps for reviewed stale strings
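The targeted greps for stale strings could also be expressed as a small check, sketched below. The review used git grep and rg rather than this code; `findStaleTerms` is a hypothetical helper, and the term list is taken from the strings named in the review findings.

```typescript
interface StaleHit {
  line: number; // 1-based line number within the scanned text
  term: string;
  text: string;
}

// Report every line of `source` that still contains a legacy term.
function findStaleTerms(source: string, staleTerms: string[]): StaleHit[] {
  const hits: StaleHit[] = [];
  source.split("\n").forEach((text, index) => {
    for (const term of staleTerms) {
      if (text.includes(term)) {
        hits.push({ line: index + 1, term, text: text.trim() });
      }
    }
  });
  return hits;
}

// Strings flagged across the two review rounds.
const reviewedStaleTerms = [
  "Run success",
  "Passed runs",
  "runs passed",
  "Run score",
  "This run does not include a transcript artifact",
  "run-N/",
  "trials.count",
];
```

An empty result for each reviewed file would correspond to the clean outcome reported in the final re-review.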

christso · 2026-07-02T16:15:32Z

Final re-review verdict: clean.

The prior vocabulary finding is resolved at head 35456a00e1b6be9c1b91d5a19851bece34a576ea. The selected-attempt Dashboard detail now says Attempt score and This attempt does not include a transcript artifact; the tools/results docs now use attempt-N/ for per-attempt artifacts; and the exported core comments now describe repeated attempts / attempt-N folders.

I did not find new blockers in the latest three-file fix. Fresh PR CI is green for this head: Build, Typecheck, Lint, Test, Check Links, Validate Marketplace, Validate Evals, and Cloudflare Pages all succeeded. The orchestrator may proceed to ready/merge.

Checks run: git fetch origin --prune; git status --short --branch; inspected origin/main...origin/feat/av-d64j-repeat-config; reviewed 36e96801..35456a00 for EvalDetail.tsx, tools/results.mdx, and packages/core/src/evaluation/types.ts; targeted git grep for the stale strings from the prior finding; git diff --check origin/main...origin/feat/av-d64j-repeat-config; gh pr view 1608 --json .... No local builds/tests/evals run per research-only instructions and because CI evidence was sufficient.

feat: align repeat config with attempt artifacts

46e9e61

fix(core): use attempt wording for repeat output

36e9680

christso force-pushed the feat/av-d64j-repeat-config branch from 3b1642c to 36e9680 July 2, 2026 15:57

fix(dashboard): use attempt wording in repeat details

35456a0

christso marked this pull request as ready for review July 2, 2026 16:16

christso merged commit 74b961c into main Jul 2, 2026
8 checks passed

christso deleted the feat/av-d64j-repeat-config branch July 2, 2026 16:16

christso merged 3 commits into main from feat/av-d64j-repeat-config
