feat(core): replay normalized transcripts by christso · Pull Request #1658 · EntityProcess/agentv

christso · 2026-07-05T03:40:48Z

Summary

Recorded coding-agent trajectories can now be replayed through AgentV's provider-agnostic replay target from normalized transcript JSONL. Users import expensive Copilot, Codex, or Claude sessions once, match them by stable test_id plus source_target, and rerun graders without invoking the live agent again.
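The matching rule described above can be sketched as follows. This is a minimal illustration, not AgentV's implementation: only the `test_id` and `source_target` field names come from this PR, while the `TranscriptRecord` shape, its `messages` field, and the helpers `parseTranscriptJsonl` and `findTranscript` are hypothetical names chosen for the example.

```typescript
// Hypothetical record shape. Only `test_id` and `source_target` are taken
// from the PR description; `messages` is an assumed payload field.
interface TranscriptRecord {
  test_id: string;
  source_target: string;
  messages: { role: string; content: string }[];
}

// Parse JSONL: one JSON object per non-empty line.
function parseTranscriptJsonl(jsonl: string): TranscriptRecord[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as TranscriptRecord);
}

// Return the single record matching both keys. Throw on zero or multiple
// matches rather than guessing which transcript to grade.
function findTranscript(
  records: TranscriptRecord[],
  testId: string,
  sourceTarget: string,
): TranscriptRecord {
  const matches = records.filter(
    (r) => r.test_id === testId && r.source_target === sourceTarget,
  );
  const [match] = matches;
  if (matches.length !== 1 || match === undefined) {
    throw new Error(
      `expected exactly one transcript for ${testId}/${sourceTarget}, found ${matches.length}`,
    );
  }
  return match;
}
```

Keying on both fields means the same `test_id` can be recorded once per source agent without the records shadowing each other.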

This hard-removes the authored provider: copilot-log target surface. Copilot events.jsonl parsing remains available as an import/normalization adapter through agentv import copilot, while graders and Dashboard-facing artifacts consume AgentV transcript/replay output instead of provider-native logs.

Key decisions:

Area	Contract
Replay source	`provider: replay` accepts exactly one of `fixtures`, `execution_traces`, or `transcripts`
Matching	Transcript replay matches `test_id` and `source_target`; direct `--transcript` evals validate exact eval `test_id` coverage before running
Provenance	Raw provider logs stay importer/debug input; replay responses expose normalized `Message[]` plus `raw.replay_transcript` provenance
Copilot logs	Removed as authored target provider; `agentv import copilot` is the supported path
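The "exactly one of" rule in the Replay source row could be enforced along these lines. This is a sketch under assumptions: the three key names and `provider: replay` come from the table, but the `ReplayTargetConfig` shape (string-valued paths), the `resolveReplaySource` helper, and the error wording are hypothetical and do not reflect the actual validator in `packages/core`.

```typescript
// The three source keys are named in the contract table above.
type ReplaySourceKey = "fixtures" | "execution_traces" | "transcripts";

const REPLAY_SOURCE_KEYS: readonly ReplaySourceKey[] = [
  "fixtures",
  "execution_traces",
  "transcripts",
];

// Hypothetical config shape: each source is assumed to be a path string.
interface ReplayTargetConfig {
  provider: "replay";
  fixtures?: string;
  execution_traces?: string;
  transcripts?: string;
}

// Return the one configured source key, or throw if zero or several are set.
function resolveReplaySource(config: ReplayTargetConfig): ReplaySourceKey {
  const present = REPLAY_SOURCE_KEYS.filter((key) => config[key] !== undefined);
  const [only] = present;
  if (present.length !== 1 || only === undefined) {
    throw new Error(
      `provider: replay needs exactly one of ${REPLAY_SOURCE_KEYS.join(", ")}; got ${present.length}`,
    );
  }
  return only;
}
```

Rejecting both the empty and the ambiguous case up front keeps the replay source unambiguous before any grader runs.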

Validation

bun test packages/core/test/import/transcript-provider.test.ts packages/core/test/evaluation/providers/replay-transcripts.test.ts packages/core/test/evaluation/providers/targets.test.ts packages/core/test/evaluation/validation/targets-validator.test.ts packages/core/test/evaluation/providers/normalize-tool-call.test.ts packages/core/test/evaluation/providers/copilot-log-parser.test.ts -> 198 pass
bun run typecheck -> pass
bun run lint -> pass
bun run validate:examples -> 108 valid / 0 invalid
git diff --check and git diff --cached --check -> pass
Manual simplify/review fallback completed; no remaining actionable findings. The subagent review path was not used because the available Codex spawn tool only permits delegation when explicitly requested by the user.

Evidence

Replay target path: bun apps/cli/src/cli.ts eval examples/showcase/trace-evaluation/evals/transcript-import.eval.yaml --target replay_imported_codex_transcript --output .agentv/results/replay-contract -> PASS 1/1, mean 100%; bundle at .agentv/results/replay-contract
Direct transcript path: bun apps/cli/src/cli.ts eval examples/showcase/trace-evaluation/evals/transcript-import.eval.yaml --transcript examples/showcase/trace-evaluation/fixtures/imported-codex-transcript.jsonl --output .agentv/results/transcript-direct -> PASS 1/1, mean 100%; bundle at .agentv/results/transcript-direct

No live provider was needed for this proof because the changed path is explicitly replay-only: it verifies rerunning graders over normalized recorded trajectories without agent invocation. The replay-target run wrote its local bundle and manifest; its optional results-ref export warned on a local agentv/results/v1 ref lock after another run updated that ref. The direct transcript run exported successfully.

Post-Deploy Monitoring & Validation

No additional production/runtime monitoring required. This is local/CI eval infrastructure and docs behavior, not a deployed service path.

Healthy signals after merge:

CI typecheck, lint, focused tests, and example validation remain green.
Users importing Copilot sessions see normalized transcript JSONL and replay them with provider: replay + transcripts.
Authored provider: copilot-log configs fail with the explicit migration error.

Failure signals and rollback trigger:

Reports that replay grades the wrong transcript or silently ignores test_id mismatches.
Example validation failures around copilot-transcript-replay or trace-evaluation replay targets.
Unexpected Dashboard/grader consumption of provider-native raw logs instead of AgentV artifacts.
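The first failure signal, silently ignored `test_id` mismatches, is what a pre-run coverage check guards against. The sketch below assumes that "exact coverage" means the eval's `test_id` set and the transcript's `test_id` set must be equal in both directions; that reading, along with the `checkTestIdCoverage` and `assertExactCoverage` names, is an assumption for illustration rather than the PR's actual code.

```typescript
interface CoverageReport {
  missing: string[]; // eval test_ids with no transcript
  unexpected: string[]; // transcript test_ids the eval does not define
}

// Compare the two id sets in both directions, ignoring duplicates.
function checkTestIdCoverage(
  evalTestIds: string[],
  transcriptTestIds: string[],
): CoverageReport {
  const evalSet = new Set(evalTestIds);
  const transcriptSet = new Set(transcriptTestIds);
  return {
    missing: Array.from(evalSet).filter((id) => !transcriptSet.has(id)),
    unexpected: Array.from(transcriptSet).filter((id) => !evalSet.has(id)),
  };
}

// Fail loudly before grading starts if coverage is not exact.
function assertExactCoverage(
  evalTestIds: string[],
  transcriptTestIds: string[],
): void {
  const { missing, unexpected } = checkTestIdCoverage(evalTestIds, transcriptTestIds);
  if (missing.length > 0 || unexpected.length > 0) {
    throw new Error(
      `test_id coverage mismatch: missing=[${missing.join(", ")}] unexpected=[${unexpected.join(", ")}]`,
    );
  }
}
```

Reporting both directions in a single error makes it easier to tell a stale transcript from a renamed test.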

Validation window/owner: next CI run and first reviewer smoke test; PR author owns follow-up before merge.

Deploying agentv with Cloudflare Pages

Latest commit:	`35622a8`
Status:	✅ Deploy successful!
Preview URL:	https://344c0a37.agentv.pages.dev
Branch Preview URL:	https://coding-replay-contract.agentv.pages.dev

christso merged commit 1d882db into main Jul 5, 2026
8 checks passed

christso deleted the coding-replay-contract branch July 5, 2026 04:00
