feat(results): materialize per-test task bundles by christso · Pull Request #1330 · EntityProcess/agentv

christso · 2026-06-08T23:55:18Z

Summary

AgentV runs can now carry a native, per-test task bundle beside each result row. Instead of a parallel run-source schema, each completed result can point from index.jsonl to task/EVAL.yaml, task/targets.yaml, copied input files, and copied grader assets, giving future audit/rerun flows a normal AgentV eval contract to consume.

This hard-deprecates the unreleased run-source.json surface rather than preserving a compatibility reader. Historical runs still load without task metadata; new bundle-capable runs expose focused index links (artifact_dir, task_dir, eval_path, targets_path, files_path, graders_path) and Dashboard file APIs can traverse those task paths.
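As a reader-side illustration of the index links above, here is a minimal sketch of how a consumer might tell historical rows from bundle-capable rows in `index.jsonl`. Only the six link field names come from this PR; the `test_id` field, the helper names, and the sample path values are hypothetical, and this is not AgentV's actual reader.

```typescript
// Link field names are taken from the PR summary. Everything else here
// (test_id, helper names, sample values) is a hypothetical illustration.
interface TaskLinks {
  artifact_dir?: string;
  task_dir?: string;
  eval_path?: string;
  targets_path?: string;
  files_path?: string;
  graders_path?: string;
}

interface IndexRow extends TaskLinks {
  test_id?: string; // hypothetical identifier field
  [key: string]: unknown;
}

const TASK_LINK_KEYS = [
  "artifact_dir",
  "task_dir",
  "eval_path",
  "targets_path",
  "files_path",
  "graders_path",
] as const;

// Parse JSONL text into rows, skipping blank lines.
function parseIndexJsonl(text: string): IndexRow[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as IndexRow);
}

// Return the task links for a row, or null for a historical row that
// carries no task metadata (assumed here to mean "no task_dir").
function taskLinks(row: IndexRow): TaskLinks | null {
  if (typeof row.task_dir !== "string") return null;
  const links: TaskLinks = {};
  for (const key of TASK_LINK_KEYS) {
    const value = row[key];
    if (typeof value === "string") links[key] = value;
  }
  return links;
}

// Usage: one historical row and one bundle-capable row.
const sampleIndex = [
  '{"test_id":"legacy-case"}',
  '{"test_id":"bundle-case","artifact_dir":"bundle-case","task_dir":"bundle-case/task","eval_path":"bundle-case/task/EVAL.yaml","targets_path":"bundle-case/task/targets.yaml"}',
].join("\n");

for (const row of parseIndexJsonl(sampleIndex)) {
  console.log(row.test_id, taskLinks(row) === null ? "historical" : "has task bundle");
}
```

Treating the absence of `task_dir` as "historical" keeps older runs loadable without a compatibility reader, which matches the intent described above.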

Key design choices:

| Area | Decision |
| --- | --- |
| Portable unit | Per-test `task/` folder, not run-level source/recipe JSON |
| Rerunnable source | Native `EVAL.yaml` plus selected `targets.yaml` |
| Secret handling | Preserve `${{ ENV_VAR }}` placeholders; redact literal secret-looking values and secret files |
| Output safety | Materialization never creates `.agentv/results` under the test artifact folder or `task/` |
| Dry-run aliases | Result rows may use resolved target names, while `task/EVAL.yaml` preserves the selected target name |
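The secret-handling row can be pictured with a small sketch: values that are `${{ ENV_VAR }}` placeholders pass through unchanged, while literal values under secret-looking keys are replaced. The key-name heuristic, the `[REDACTED]` marker, and the function names below are assumptions made for illustration; the PR does not specify how "secret-looking" is detected.

```typescript
// Matches a whole-value placeholder such as "${{ OPENAI_API_KEY }}".
const ENV_PLACEHOLDER = /^\$\{\{\s*[A-Za-z_][A-Za-z0-9_]*\s*\}\}$/;

// Hypothetical heuristic: treat these key names as secret-bearing.
const SECRET_KEY_HINT = /(api[_-]?key|secret|token|password)/i;

// Hypothetical redaction marker.
const REDACTED = "[REDACTED]";

// Keep placeholders, redact literal values under secret-looking keys,
// and leave every other value untouched.
function sanitizeValue(key: string, value: string): string {
  if (ENV_PLACEHOLDER.test(value)) return value;
  if (SECRET_KEY_HINT.test(key)) return REDACTED;
  return value;
}

// Apply sanitizeValue to every entry of a flat target definition.
function sanitizeTarget(target: Record<string, string>): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [key, value] of Object.entries(target)) {
    result[key] = sanitizeValue(key, value);
  }
  return result;
}

// Usage: the placeholder survives, the literal key is redacted.
console.log(
  sanitizeTarget({
    name: "mock-target",
    api_key: "${{ OPENAI_API_KEY }}",
    fallback_api_key: "sk-literal-value",
  }),
);
```

Checking for the placeholder before the key heuristic is what lets a copied `targets.yaml` stay rerunnable: the reference to the environment variable is kept, and only the literal value is dropped.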

Verification

`bun test apps/cli/test/commands/eval/task-bundle.test.ts apps/cli/test/commands/eval/artifact-writer.test.ts apps/cli/test/commands/results/serve.test.ts apps/cli/test/commands/results/combine.test.ts` (128 pass)
`bun --filter agentv typecheck`
Pre-push hook: `bun --filter @agentv/core typecheck && bun --filter @agentv/phoenix-adapter typecheck && bun --filter agentv typecheck`, then `biome check .`

Red/Green UAT

Used the same dry-run eval with input_files on origin/main and this branch. The dry-run quality score was the same on both sides; the checked behavior was artifact materialization.

| State | Evidence |
| --- | --- |
| Red (`origin/main`) | `index.jsonl` had no `task_dir`; no `task/EVAL.yaml`; no copied task file (`/tmp/agentv-red-uat.nlqOYC`) |
| Green (this branch) | `index.jsonl` included `task_dir`, `eval_path`, `targets_path`, and `files_path`; `task/EVAL.yaml` preserved `target: mock-target`; the fixture was copied under `task/files/external/...`; no nested `.agentv/results` (`/tmp/agentv-green-uat.SKAgFt`) |
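The red/green evidence above could be turned into a repeatable check along these lines. This is a sketch only: it operates on an already-parsed index row and a list of relative file paths rather than the real filesystem, and the function names and sample paths are hypothetical.

```typescript
// True when a relative path contains a ".agentv/results" segment pair,
// i.e. a results directory nested inside the artifact folder.
function hasNestedResultsDir(relativePath: string): boolean {
  const segments = relativePath.split(/[\\/]+/);
  return segments.some(
    (segment, i) => segment === ".agentv" && segments[i + 1] === "results",
  );
}

// Collect human-readable problems for one result row and the files found
// under its artifact folder. An empty list corresponds to the "green" state.
function findBundleProblems(
  row: Record<string, unknown>,
  artifactFiles: string[],
): string[] {
  const problems: string[] = [];
  // The four links the green run was observed to include.
  for (const key of ["task_dir", "eval_path", "targets_path", "files_path"]) {
    if (typeof row[key] !== "string") problems.push(`missing ${key}`);
  }
  for (const file of artifactFiles) {
    if (hasNestedResultsDir(file)) problems.push(`nested results dir: ${file}`);
  }
  return problems;
}

// Usage: a row without task links (red) versus a row with them (green).
const redRow = { test_id: "case-1" };
const greenRow = {
  test_id: "case-1",
  task_dir: "case-1/task",
  eval_path: "case-1/task/EVAL.yaml",
  targets_path: "case-1/task/targets.yaml",
  files_path: "case-1/task/files",
};
console.log(findBundleProblems(redRow, []));
console.log(findBundleProblems(greenRow, ["task/EVAL.yaml", "task/targets.yaml"]));
```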

christso added 3 commits June 8, 2026 23:52

chore(beads): finalize task bundle artifact design

84dee2f

chore(beads): dispatch task bundle worker

94b4561

feat(results): materialize per-test task bundles

2d80dfb

christso merged commit d678615 into main Jun 9, 2026
8 checks passed

christso deleted the feat/av-wy0.3-task-bundles branch June 9, 2026 01:35

christso mentioned this pull request Jun 9, 2026

feat(cli): rerun captured task bundles #1335

Merged

christso merged 3 commits into main from feat/av-wy0.3-task-bundles
