feat: immutable run bundle / reproducibility artifact #1133

## Objective

Add an `agentv bundle` command that compiles an EVAL.yaml and all its referenced assets into a self-contained, portable directory that can be run without access to the original repo.

## Problem

An EVAL.yaml references external assets scattered across the filesystem:
- `tests: ./cases.jsonl` — external test data
- `workspace: ./template/` — workspace template directory
- `use_target: default` — delegation chains in targets.yaml
- `${{ ENV_VAR }}` — environment variable interpolation
- `command: ./scripts/verify.sh` — code-grader scripts
- `hooks.before_each.command: ./setup.sh` — setup/teardown scripts
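Before `agentv bundle` can copy or inline anything, it has to find these references. Below is a minimal TypeScript sketch of that discovery pass. It assumes the EVAL.yaml has already been parsed into a plain object (YAML parsing is out of scope here). It uses a deliberately crude heuristic: any string value starting with `./` or `../` counts as a file reference, `use_target` values are collected as target names, and `${{ NAME }}` patterns yield env var names. `collectRefs` and `ExternalRefs` are hypothetical names, not existing AgentV APIs. Commands with a leading interpreter (e.g. `bash ./verify.sh`) would slip past this heuristic and need real command parsing.

```typescript
// Hypothetical: any JSON-like value produced by a YAML parser.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface ExternalRefs {
  paths: string[];   // relative file/dir references, e.g. "./cases.jsonl"
  envVars: string[]; // names used in ${{ VAR }} interpolation
  targets: string[]; // use_target names to resolve against targets.yaml
}

// Matches ${{ NAME }} with optional inner whitespace and captures NAME.
const ENV_PATTERN = /\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;

function collectRefs(
  node: Json,
  refs: ExternalRefs = { paths: [], envVars: [], targets: [] },
  key = "",
): ExternalRefs {
  if (typeof node === "string") {
    // Record every interpolated env var name once.
    for (const m of node.matchAll(ENV_PATTERN)) {
      if (!refs.envVars.includes(m[1])) refs.envVars.push(m[1]);
    }
    if (key === "use_target") {
      if (!refs.targets.includes(node)) refs.targets.push(node);
    } else if (node.startsWith("./") || node.startsWith("../")) {
      refs.paths.push(node);
    }
  } else if (Array.isArray(node)) {
    // Array items inherit the key of the field that holds the array.
    for (const item of node) collectRefs(item, refs, key);
  } else if (node !== null && typeof node === "object") {
    for (const [k, v] of Object.entries(node)) collectRefs(v, refs, k);
  }
  return refs;
}
```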

Reproducing a run requires the entire repo at the correct commit, the right env vars, and knowledge of which CLI flags were used. Sharing results with a colleague means "clone my repo, checkout commit abc123, set these env vars, run this command."

## Design

`agentv bundle` is a compiler step that resolves and inlines all external references into a self-contained output directory.

```bash
agentv bundle my-eval.yaml --output ./bundled-eval/
```
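As an illustration of the inlining step for `tests: ./cases.jsonl`, here is a minimal TypeScript sketch. It assumes the external file is JSON Lines (one test case per non-empty line) and that the parsed eval is a plain object whose `tests` field is either a path string or an already-inline array. `parseJsonl`, `inlineTests`, and `EvalDoc` are hypothetical names. An eval whose tests are already inline is returned unchanged, which is one small piece of the idempotency goal.

```typescript
import { readFileSync } from "node:fs";
import * as path from "node:path";

// Parse JSON Lines: one JSON value per non-empty line, with 1-based line numbers in errors.
function parseJsonl(text: string, source = "<jsonl>"): unknown[] {
  const cases: unknown[] = [];
  text.split(/\r?\n/).forEach((line, i) => {
    if (line.trim() === "") return;
    try {
      cases.push(JSON.parse(line));
    } catch (err) {
      throw new Error(`${source}:${i + 1}: invalid JSON (${(err as Error).message})`);
    }
  });
  return cases;
}

// Hypothetical shape of a parsed EVAL.yaml; only `tests` matters here.
type EvalDoc = { tests?: unknown; [key: string]: unknown };

// Replace a `tests: <path>` reference with the parsed cases. `readFile` is
// injectable so the step can be exercised without touching the filesystem.
function inlineTests(
  doc: EvalDoc,
  evalDir: string,
  readFile: (p: string) => string = (p) => readFileSync(p, "utf8"),
): EvalDoc {
  if (typeof doc.tests !== "string") return doc; // already inline: no-op
  const file = path.resolve(evalDir, doc.tests);  // paths are relative to the eval file
  return { ...doc, tests: parseJsonl(readFile(file), file) };
}
```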

### What it does

1. **Inlines test data** — resolves `tests: ./cases.jsonl` into inline test cases
2. **Copies workspace assets** — copies workspace template directory contents into the bundle
3. **Flattens target config** — resolves `use_target` chains into concrete target definitions
4. **Copies scripts** — copies code-grader scripts, setup/teardown hooks, prompt files
5. **Records provenance** — AgentV version, git commit (if in a repo), timestamp, CLI flags
6. **Optionally captures env vars** — by default only variable names are recorded; `--capture-env` snapshots resolved `${{ VAR }}` values, with secret values redacted
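Step 3 is the most algorithmic part of the list. The TypeScript sketch below follows a `use_target` chain until it reaches a concrete definition, and it fails loudly on unknown names or cycles. It assumes targets.yaml has been parsed into a map from target name to definition and that a target carrying `use_target` acts as a pure alias (any other fields on it are ignored). If AgentV merges fields along the chain, the loop would need to accumulate overrides instead. `flattenTarget`, `TargetDef`, and `Flattened` are hypothetical names.

```typescript
// Hypothetical parsed targets.yaml: target name -> definition.
type TargetDef = { use_target?: string; [key: string]: unknown };
type Targets = Record<string, TargetDef>;

interface Flattened {
  chain: string[];       // names visited, starting with the requested target
  definition: TargetDef; // concrete target (no use_target) to write into the bundle
}

function flattenTarget(name: string, targets: Targets): Flattened {
  const chain: string[] = [];
  let current = name;
  for (;;) {
    // Revisiting a name means the delegation chain loops forever.
    if (chain.includes(current)) {
      throw new Error(`use_target cycle: ${[...chain, current].join(" -> ")}`);
    }
    chain.push(current);
    // Own-property check so names like "toString" are not read from the prototype.
    if (!Object.prototype.hasOwnProperty.call(targets, current)) {
      throw new Error(`unknown target "${current}" (chain: ${chain.join(" -> ")})`);
    }
    const def = targets[current];
    if (def.use_target === undefined) {
      return { chain, definition: { ...def } }; // copy so the bundle owns its data
    }
    current = def.use_target;
  }
}
```

Recording `chain` alongside the definition lets the manifest show which names were flattened, even though the bundled eval.yaml only needs the concrete definition.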

### Output structure

```
bundled-eval/
  eval.yaml              # fully resolved, all references inlined or relative to bundle
  workspace/             # copied workspace template
  scripts/               # copied grader and hook scripts
  data/                  # inlined test data (if external)
  manifest.json          # provenance: agentv version, git commit, timestamp, source paths
```
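A possible shape for `manifest.json`, sketched in TypeScript, is shown next. The field names (`agentvVersion`, `gitCommit`, `createdAt`, `cliArgs`, `sourcePaths`, `envVarNames`) are hypothetical and only mirror the provenance items listed in this issue. The git commit comes from `git rev-parse HEAD` and is `null` when git is unavailable or the eval is not inside a repository. Env vars are recorded by name only, matching the "no secrets by default" decision.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical manifest.json shape; illustrative, not a spec.
interface BundleManifest {
  agentvVersion: string;
  gitCommit: string | null;            // null outside a git repo
  createdAt: string;                   // ISO-8601 UTC timestamp
  cliArgs: string[];                   // flags passed to `agentv bundle`
  sourcePaths: Record<string, string>; // bundle-relative path -> original path
  envVarNames: string[];               // names only; values are never written here
}

interface ManifestInput {
  agentvVersion: string;
  sourceDir: string; // directory of the source EVAL.yaml
  cliArgs: string[];
  sourcePaths: Record<string, string>;
  envVarNames: string[];
}

function currentGitCommit(cwd: string): string | null {
  try {
    return execFileSync("git", ["rev-parse", "HEAD"], {
      cwd,
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"], // silence git's stderr
    }).trim();
  } catch {
    return null; // git missing, bad cwd, or not a repository
  }
}

function buildManifest(input: ManifestInput, now: Date = new Date()): BundleManifest {
  return {
    agentvVersion: input.agentvVersion,
    gitCommit: currentGitCommit(input.sourceDir),
    createdAt: now.toISOString(),
    cliArgs: [...input.cliArgs],
    sourcePaths: { ...input.sourcePaths },
    envVarNames: [...new Set(input.envVarNames)].sort(), // dedupe for stable output
  };
}
```

Writing it out would then be a single `JSON.stringify(manifest, null, 2)`; sorting and deduplicating the names keeps the file stable across repeated bundles of the same eval.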

### Usage

```bash
# Bundle an eval
agentv bundle evals/my-benchmark.eval.yaml -o benchmark-v1/

# Run from a bundle (just a normal eval run — the bundle IS an eval directory)
agentv eval benchmark-v1/eval.yaml

# Share it
zip -r benchmark-v1.zip benchmark-v1/

# Inspect provenance
cat benchmark-v1/manifest.json
```

## Key design decisions

1. **The bundle is just a directory with a resolved EVAL.yaml** — no new format, no special runtime. `agentv eval` runs it like any other eval.
2. **Opt-in, not required** — normal `agentv eval` continues to work with repo-based eval files. Bundling is for sharing and reproducibility.
3. **No secrets by default** — env var values are NOT captured unless `--capture-env` is explicitly passed. Manifest records env var names only.
4. **Idempotent** — bundling an already-bundled directory is a no-op.
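Decision 3 can be sketched as a small TypeScript function. Without `--capture-env`, it records each variable name with a `null` value. With the flag, it snapshots values but still redacts names that look like credentials. The secret-detection regex and the `"<redacted>"` placeholder are assumptions made for illustration; the issue does not say how secrets are identified or where the snapshot is stored. `snapshotEnv` is a hypothetical name.

```typescript
// Hypothetical heuristic: names containing these words are treated as secrets.
const SECRET_NAME = /(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)/i;

type EnvSnapshot = Record<string, string | null>;

function snapshotEnv(
  names: string[],
  env: Record<string, string | undefined>,
  captureEnv: boolean, // true only when --capture-env is passed
): EnvSnapshot {
  const snapshot: EnvSnapshot = {};
  for (const name of [...new Set(names)].sort()) {
    if (!captureEnv) {
      snapshot[name] = null;              // default: record the name, never the value
    } else if (SECRET_NAME.test(name)) {
      snapshot[name] = "<redacted>";      // captured run, but the value looks secret
    } else {
      snapshot[name] = env[name] ?? null; // null if unset at bundle time
    }
  }
  return snapshot;
}
```

In a real CLI, `env` would be `process.env` and `names` the interpolated variables found while scanning the eval.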

## Non-goals

- Not a Docker image or container
- Not a replacement for EVAL.yaml as the authoring format
- Not required for normal eval workflows

## Acceptance signals

- [ ] `agentv bundle` produces a self-contained directory from an EVAL.yaml
- [ ] The bundled directory can be run with `agentv eval` on a different machine without the original repo
- [ ] `manifest.json` records provenance (agentv version, git commit, timestamp)
- [ ] External test data, workspace templates, and scripts are all resolved into the bundle
- [ ] Target delegation chains are flattened
- [ ] Env var values are redacted by default
