Objective
Add an agentv bundle command that compiles an EVAL.yaml and all its referenced assets into a self-contained, portable directory that can be run without access to the original repo.
Problem
An EVAL.yaml references external assets scattered across the filesystem:
tests: ./cases.jsonl — external test data
workspace: ./template/ — workspace template directory
use_target: default — delegation chains in targets.yaml
${{ ENV_VAR }} — environment variable interpolation
command: ./scripts/verify.sh — code-grader scripts
hooks.before_each.command: ./setup.sh — setup/teardown scripts
Reproducing a run requires the entire repo at the correct commit, the right env vars, and knowledge of which CLI flags were used. Sharing results with a colleague means "clone my repo, checkout commit abc123, set these env vars, run this command."
Design
agentv bundle is a compiler step that resolves and inlines all external references into a self-contained output directory.
agentv bundle my-eval.yaml --output ./bundled-eval/
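As a rough illustration of what "resolve and inline" could mean for test data, here is a minimal sketch. It assumes the parsed eval config is a plain dict whose `tests` key is either a relative path to a JSONL file or an already-inline list; `inline_tests` is a hypothetical helper name, not part of AgentV.

```python
import json
from pathlib import Path


def inline_tests(eval_config, base_dir):
    """Return a copy of the config with an external JSONL `tests` path inlined.

    Hypothetical helper: assumes `tests` is either a path string or a list.
    """
    tests = eval_config.get("tests")
    if not isinstance(tests, str):
        # Already inline (or absent): nothing to resolve.
        return dict(eval_config)

    path = Path(base_dir) / tests
    cases = [
        json.loads(line)
        for line in path.read_text(encoding="utf-8").splitlines()
        if line.strip()  # skip blank lines in the JSONL file
    ]

    resolved = dict(eval_config)
    resolved["tests"] = cases
    return resolved
```

Paths are resolved relative to the directory containing the eval file, so the result no longer depends on where the command was run.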
What it does
- Inlines test data — resolves tests: ./cases.jsonl into inline test cases
- Copies workspace assets — copies workspace template directory contents into the bundle
- Flattens target config — resolves use_target chains into concrete target definitions
- Copies scripts — copies code-grader scripts, setup/teardown hooks, prompt files
- Records provenance — AgentV version, git commit (if in a repo), timestamp, CLI flags
- Optionally resolves env vars — --capture-env flag to snapshot resolved ${{ VAR }} values (redacted by default for secrets)
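The target-flattening step in the list above can be sketched as follows. This assumes targets.yaml parses to a mapping from target name to definition, and that a delegating entry carries a `use_target` key naming another entry; that shape and the `flatten_target` name are assumptions made for illustration.

```python
def flatten_target(name, targets):
    """Follow use_target links until a concrete target definition is reached.

    Hypothetical helper: `targets` maps target names to definition dicts.
    """
    seen = []
    current = name
    while True:
        if current in seen:
            chain = " -> ".join(seen + [current])
            raise ValueError(f"use_target cycle: {chain}")
        if current not in targets:
            raise KeyError(f"unknown target: {current}")
        seen.append(current)

        definition = targets[current]
        next_name = definition.get("use_target")
        if next_name is None:
            # Concrete definition: no further delegation.
            return dict(definition)
        current = next_name


# Illustrative data only; the provider and model names are placeholders.
targets = {
    "default": {"use_target": "fast"},
    "fast": {"use_target": "local-model"},
    "local-model": {"provider": "example", "model": "example-model"},
}
print(flatten_target("default", targets))
```

Detecting cycles up front seems worthwhile here, since a bundle that cannot be flattened should fail at bundle time rather than at run time.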
Output structure
bundled-eval/
eval.yaml # fully resolved, all references inlined or relative to bundle
workspace/ # copied workspace template
scripts/ # copied grader and hook scripts
data/ # inlined test data (if external)
manifest.json # provenance: agentv version, git commit, timestamp, source paths
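One possible way to assemble the provenance record for manifest.json is sketched below. The field names (`agentv_version`, `git_commit`, `timestamp`, `cli_flags`, `source_paths`) are assumptions derived from the description above, not a confirmed schema, and `build_manifest` is a hypothetical helper.

```python
import json
import subprocess
from datetime import datetime, timezone


def git_commit(cwd="."):
    """Return the current commit hash, or None when not in a git repo."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=cwd, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Not a repository, or git is not installed.
        return None


def build_manifest(agentv_version, source_paths, cli_flags, cwd="."):
    """Collect provenance fields into a JSON-serializable dict."""
    return {
        "agentv_version": agentv_version,
        "git_commit": git_commit(cwd),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "cli_flags": list(cli_flags),
        "source_paths": sorted(source_paths),
    }


# The version string here is a placeholder, not a real release.
manifest = build_manifest("0.0.0", ["evals/cases.jsonl"], ["--output", "out/"])
print(json.dumps(manifest, indent=2))
```

Recording `git_commit` as null outside a repository matches the "if in a repo" wording in the provenance bullet.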
Usage
# Bundle an eval
agentv bundle evals/my-benchmark.eval.yaml -o benchmark-v1/
# Run from a bundle (just a normal eval run — the bundle IS an eval directory)
agentv eval benchmark-v1/eval.yaml
# Share it
zip -r benchmark-v1.zip benchmark-v1/
# Inspect provenance
cat benchmark-v1/manifest.json
Key design decisions
- The bundle is just a directory with a resolved EVAL.yaml — no new format, no special runtime. agentv eval runs it like any other eval.
- Opt-in, not required — normal agentv eval continues to work with repo-based eval files. Bundling is for sharing and reproducibility.
- No secrets by default — env var values are NOT captured unless --capture-env is explicitly passed. Manifest records env var names only.
- Idempotent — bundling an already-bundled directory is a no-op.
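The "names only" rule for environment variables could look roughly like this. The sketch assumes references always take the `${{ NAME }}` form shown earlier, with optional whitespace inside the braces; `env_var_names` and `env_section` are hypothetical helpers, and the output keys are illustrative.

```python
import re

# Matches ${{ NAME }} with optional whitespace around the variable name.
ENV_REF = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def env_var_names(text):
    """Return the sorted, de-duplicated variable names referenced in text."""
    return sorted(set(ENV_REF.findall(text)))


def env_section(text, environ, capture=False):
    """Build the env part of a manifest: names always, values only on opt-in."""
    names = env_var_names(text)
    section = {"env_vars": names}
    if capture:
        # Only reached when the caller opts in (the --capture-env case).
        section["env_values"] = {name: environ.get(name) for name in names}
    return section


sample = "key: ${{ API_KEY }}\nmodel: ${{MODEL}}\nagain: ${{ API_KEY }}"
print(env_section(sample, {"API_KEY": "placeholder", "MODEL": "m"}))
```

Keeping value capture behind an explicit argument mirrors the design decision that nothing secret lands in a bundle unless the user asks for it.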
Non-goals
- Not a Docker image or container
- Not a replacement for EVAL.yaml as the authoring format
- Not required for normal eval workflows
Acceptance signals
- agentv bundle produces a self-contained directory from an EVAL.yaml
- The bundled directory runs with agentv eval on a different machine without the original repo
- manifest.json records provenance (agentv version, git commit, timestamp)