
feat(core): replay normalized transcripts#1658

Merged: christso merged 1 commit into main from coding-replay-contract on Jul 5, 2026

christso (Collaborator) commented Jul 5, 2026

Summary

Recorded coding-agent trajectories can now be replayed through AgentV's provider-agnostic replay target from normalized transcript JSONL. Users import expensive Copilot, Codex, or Claude sessions once, match them by stable test_id plus source_target, and rerun graders without invoking the live agent again.
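
As a rough illustration of the matching rule, the sketch below indexes normalized transcript JSONL by the pair (test_id, source_target) and fails loudly on a miss. It is not the AgentV implementation: the `TranscriptRecord` shape, the `messages` field, the function names, and the duplicate-key error are all assumptions made for the example; only `test_id` and `source_target` come from this PR.

```typescript
// Hypothetical record shape. Only test_id and source_target are named in the PR;
// the messages field and its layout are assumptions for illustration.
interface TranscriptRecord {
  test_id: string;
  source_target: string;
  messages: { role: string; content: string }[];
}

// Composite key so a lookup only succeeds when both fields match.
function replayKey(testId: string, sourceTarget: string): string {
  return JSON.stringify([testId, sourceTarget]);
}

// Parse JSONL text (one JSON object per non-empty line) into an index.
// Rejecting duplicate keys is an assumption, not documented behavior.
function indexTranscripts(jsonl: string): Map<string, TranscriptRecord> {
  const index = new Map<string, TranscriptRecord>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const record = JSON.parse(line) as TranscriptRecord;
    const key = replayKey(record.test_id, record.source_target);
    if (index.has(key)) {
      throw new Error(`Duplicate transcript for ${key}`);
    }
    index.set(key, record);
  }
  return index;
}

// Return the recorded transcript, or throw instead of grading the wrong one.
function findTranscript(
  index: Map<string, TranscriptRecord>,
  testId: string,
  sourceTarget: string,
): TranscriptRecord {
  const record = index.get(replayKey(testId, sourceTarget));
  if (record === undefined) {
    throw new Error(
      `No transcript for test_id=${testId}, source_target=${sourceTarget}`,
    );
  }
  return record;
}
```

Keying on both fields means the same test_id recorded from two different source targets stays distinguishable, which is the property the matching rule depends on.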

This hard-removes the authored provider: copilot-log target surface. Copilot events.jsonl parsing remains available as an import/normalization adapter through agentv import copilot, while graders and Dashboard-facing artifacts consume AgentV transcript/replay output instead of provider-native logs.

Key decisions:

| Area | Contract |
| --- | --- |
| Replay source | provider: replay accepts exactly one of fixtures, execution_traces, or transcripts |
| Matching | Transcript replay matches test_id and source_target; direct --transcript evals validate exact eval test_id coverage before running |
| Provenance | Raw provider logs stay importer/debug input; replay responses expose normalized Message[] plus raw.replay_transcript provenance |
| Copilot logs | Removed as authored target provider; agentv import copilot is the supported path |
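
For reviewers who want the first two contract rows in code form, here is a minimal sketch of an "exactly one of" source check and a test_id coverage check. The `ReplayTargetConfig` shape, the string-valued source fields, and both function names are hypothetical; only the three source names and the notion of exact test_id coverage come from this PR. Reading "exact" as "nothing missing and nothing extra" is also an assumption.

```typescript
// Hypothetical config shape. The three source names come from the PR;
// treating each as an optional string path is an assumption.
interface ReplayTargetConfig {
  provider: "replay";
  fixtures?: string;
  execution_traces?: string;
  transcripts?: string;
}

const SOURCE_KEYS = ["fixtures", "execution_traces", "transcripts"] as const;
type SourceKey = (typeof SOURCE_KEYS)[number];

// Return the single configured source, or throw when zero or several are set.
function resolveReplaySource(config: ReplayTargetConfig): SourceKey {
  const present = SOURCE_KEYS.filter((key) => config[key] !== undefined);
  const [only] = present;
  if (present.length !== 1 || only === undefined) {
    throw new Error(
      `provider: replay needs exactly one of ${SOURCE_KEYS.join(", ")}; got ${present.length}`,
    );
  }
  return only;
}

// Compare eval test_ids with transcript test_ids in both directions.
function checkCoverage(
  evalTestIds: string[],
  transcriptTestIds: string[],
): { missing: string[]; extra: string[] } {
  const evalSet = new Set(evalTestIds);
  const transcriptSet = new Set(transcriptTestIds);
  return {
    // In the eval but with no recorded transcript.
    missing: evalTestIds.filter((id) => !transcriptSet.has(id)),
    // Recorded in the transcript file but not part of the eval.
    extra: transcriptTestIds.filter((id) => !evalSet.has(id)),
  };
}
```

A caller would refuse to run when either `missing` or `extra` is non-empty, so a mismatch surfaces before any grader executes.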

Validation

  • bun test packages/core/test/import/transcript-provider.test.ts packages/core/test/evaluation/providers/replay-transcripts.test.ts packages/core/test/evaluation/providers/targets.test.ts packages/core/test/evaluation/validation/targets-validator.test.ts packages/core/test/evaluation/providers/normalize-tool-call.test.ts packages/core/test/evaluation/providers/copilot-log-parser.test.ts -> 198 pass
  • bun run typecheck -> pass
  • bun run lint -> pass
  • bun run validate:examples -> 108 valid / 0 invalid
  • git diff --check and git diff --cached --check -> pass
  • Manual simplify/review fallback completed; no remaining actionable findings. The subagent review path was not used because the available Codex spawn tool only permits delegation when explicitly requested by the user.

Evidence

  • Replay target path: bun apps/cli/src/cli.ts eval examples/showcase/trace-evaluation/evals/transcript-import.eval.yaml --target replay_imported_codex_transcript --output .agentv/results/replay-contract -> PASS 1/1, mean 100%; bundle at .agentv/results/replay-contract
  • Direct transcript path: bun apps/cli/src/cli.ts eval examples/showcase/trace-evaluation/evals/transcript-import.eval.yaml --transcript examples/showcase/trace-evaluation/fixtures/imported-codex-transcript.jsonl --output .agentv/results/transcript-direct -> PASS 1/1, mean 100%; bundle at .agentv/results/transcript-direct

No live provider was needed for this proof because the changed path is explicitly replay-only: it verifies rerunning graders over normalized recorded trajectories without agent invocation. The replay-target run wrote its local bundle and manifest; its optional results-ref export warned on a local agentv/results/v1 ref lock after another run updated that ref. The direct transcript run exported successfully.

Post-Deploy Monitoring & Validation

No additional production/runtime monitoring required. This is local/CI eval infrastructure and docs behavior, not a deployed service path.

Healthy signals after merge:

  • CI typecheck, lint, focused tests, and example validation remain green.
  • Users importing Copilot sessions see normalized transcript JSONL and replay them with provider: replay + transcripts.
  • Authored provider: copilot-log configs fail with the explicit migration error.
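
The last healthy signal, an explicit migration error for removed providers, could look roughly like the sketch below. The `REMOVED_PROVIDERS` map, the function name, and the error wording are hypothetical and are not the message AgentV actually emits; the only facts taken from this PR are that `copilot-log` is removed and that `agentv import copilot` followed by `provider: replay` with `transcripts` is the supported path.

```typescript
// Hypothetical registry of removed provider names and their migration hints.
const REMOVED_PROVIDERS = new Map<string, string>([
  [
    "copilot-log",
    "Run `agentv import copilot`, then use provider: replay with transcripts.",
  ],
]);

// Throw a descriptive error for removed providers; accept everything else.
function assertProviderSupported(provider: string): void {
  const hint = REMOVED_PROVIDERS.get(provider);
  if (hint !== undefined) {
    throw new Error(`provider: ${provider} is no longer supported. ${hint}`);
  }
}
```

Failing at config validation time, rather than at run time, keeps a stale `copilot-log` target from silently falling through to some other provider.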

Failure signals and rollback trigger:

  • Reports that replay grades the wrong transcript or silently ignores test_id mismatches.
  • Example validation failures around copilot-transcript-replay or trace-evaluation replay targets.
  • Unexpected Dashboard/grader consumption of provider-native raw logs instead of AgentV artifacts.

Validation window/owner: next CI run and first reviewer smoke test; PR author owns follow-up before merge.

Related

Related: av-t2o5.2


Compound Engineering
GPT_5

@cloudflare-workers-and-pages


Deploying agentv with Cloudflare Pages

Latest commit: 35622a8
Status: ✅  Deploy successful!
Preview URL: https://344c0a37.agentv.pages.dev
Branch Preview URL: https://coding-replay-contract.agentv.pages.dev


@christso christso merged commit 1d882db into main Jul 5, 2026
8 checks passed
@christso christso deleted the coding-replay-contract branch July 5, 2026 04:00