feat(results): materialize per-test task bundles #1330

Merged: christso merged 3 commits into main from feat/av-wy0.3-task-bundles on Jun 9, 2026

christso (Collaborator) commented Jun 8, 2026

Summary

AgentV runs can now carry a native, per-test task bundle beside each result row. Instead of a parallel run-source schema, each completed result can point from index.jsonl to task/EVAL.yaml, task/targets.yaml, copied input files, and copied grader assets, giving future audit/rerun flows a normal AgentV eval contract to consume.

This hard-deprecates the unreleased run-source.json surface rather than preserving a compatibility reader. Historical runs still load without task metadata; new bundle-capable runs expose focused index links (artifact_dir, task_dir, eval_path, targets_path, files_path, graders_path) and Dashboard file APIs can traverse those task paths.
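
As a rough illustration of how a consumer might handle both historical rows and bundle-capable rows, here is a minimal sketch. Only the six link field names come from this description; the `test_id` field, the `IndexRow` and `TaskLinks` types, the helper names, and the assumption that paths are plain strings are hypothetical and may not match the actual implementation.

```typescript
// Hypothetical row shape. Only the six link field names are taken from the PR text.
interface IndexRow {
  test_id?: string; // assumed identifier field, not confirmed by the PR
  artifact_dir?: string;
  task_dir?: string;
  eval_path?: string;
  targets_path?: string;
  files_path?: string;
  graders_path?: string;
}

interface TaskLinks {
  taskDir: string;
  evalPath?: string;
  targetsPath?: string;
  filesPath?: string;
  gradersPath?: string;
}

// Parses JSONL text into rows, skipping blank lines.
function parseIndexJsonl(text: string): IndexRow[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as IndexRow);
}

// Returns the task links for a row, or null for a historical row
// that carries no task metadata.
function taskLinksOf(row: IndexRow): TaskLinks | null {
  if (!row.task_dir) return null;
  return {
    taskDir: row.task_dir,
    evalPath: row.eval_path,
    targetsPath: row.targets_path,
    filesPath: row.files_path,
    gradersPath: row.graders_path,
  };
}
```

Returning `null` rather than throwing mirrors the stated behavior that historical runs still load without task metadata.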

Key design choices:

| Area | Decision |
| --- | --- |
| Portable unit | Per-test task/ folder, not run-level source/recipe JSON |
| Rerunnable source | Native EVAL.yaml plus selected targets.yaml |
| Secret handling | Preserve ${{ ENV_VAR }} placeholders; redact literal secret-looking values and secret files |
| Output safety | Materialization never creates .agentv/results under the test artifact folder or task/ |
| Dry-run aliases | Result rows may use resolved target names, while task/EVAL.yaml preserves the selected target name |
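
The secret-handling rule can be sketched as follows. This is an illustration only: the key-name heuristic, the `[REDACTED]` marker, and the function names are assumptions, and the PR does not say how "secret-looking" values are actually detected.

```typescript
// Matches a value that consists entirely of a ${{ NAME }} placeholder.
const PLACEHOLDER = /^\$\{\{\s*[A-Za-z_][A-Za-z0-9_]*\s*\}\}$/;
// Assumed heuristic: key names that suggest the value is a secret.
const SECRET_KEY = /(api[_-]?key|token|secret|password)/i;
// Hypothetical redaction marker.
const REDACTED = "[REDACTED]";

// Keeps env placeholders verbatim and redacts literal values under secret-looking keys.
function redactValue(key: string, value: string): string {
  if (PLACEHOLDER.test(value)) return value;
  if (SECRET_KEY.test(key)) return REDACTED;
  return value;
}

// Applies redactValue to every entry of a flat string map.
function redactTarget(target: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(target)) {
    out[key] = redactValue(key, value);
  }
  return out;
}
```

Checking for the placeholder first means a bundle stays rerunnable wherever the referenced environment variable is set, while literal credentials never reach the copied targets.yaml.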

Verification

  • bun test apps/cli/test/commands/eval/task-bundle.test.ts apps/cli/test/commands/eval/artifact-writer.test.ts apps/cli/test/commands/results/serve.test.ts apps/cli/test/commands/results/combine.test.ts (128 pass)
  • bun --filter agentv typecheck
  • Pre-push hook: bun --filter @agentv/core typecheck && bun --filter @agentv/phoenix-adapter typecheck && bun --filter agentv typecheck, then biome check .

Red/Green UAT

Used the same dry-run eval with input_files on origin/main and this branch. The dry-run quality score was the same on both sides; the checked behavior was artifact materialization.

| State | Evidence |
| --- | --- |
| Red (origin/main) | index.jsonl had no task_dir; no task/EVAL.yaml; no copied task file (/tmp/agentv-red-uat.nlqOYC) |
| Green (this branch) | index.jsonl included task_dir, eval_path, targets_path, and files_path; task/EVAL.yaml preserved target: mock-target; the fixture was copied under task/files/external/...; no nested .agentv/results (/tmp/agentv-green-uat.SKAgFt) |
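
The "no nested .agentv/results" check could be automated along these lines. This is a hedged sketch, not the check used in the UAT: `nestedResultsPaths` is a hypothetical helper, and it assumes the caller has already listed the artifact folder's contents as relative paths.

```typescript
// Returns the relative paths that contain a ".agentv" segment
// immediately followed by a "results" segment.
function nestedResultsPaths(relativePaths: string[]): string[] {
  return relativePaths.filter((p) => {
    const parts = p.split(/[\\/]+/); // accept both POSIX and Windows separators
    return parts.some((part, i) => part === ".agentv" && parts[i + 1] === "results");
  });
}
```

An empty result for every test artifact folder corresponds to the Green state above; any returned path would indicate that materialization wrote results inside a bundle.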

Compound Engineering
Codex

christso merged commit d678615 into main on Jun 9, 2026 (8 checks passed)
christso deleted the feat/av-wy0.3-task-bundles branch on June 9, 2026 01:35