feat(core): replay normalized transcripts #1658
Merged
Deploying agentv with Cloudflare Pages
| Latest commit: | 35622a8 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://344c0a37.agentv.pages.dev |
| Branch Preview URL: | https://coding-replay-contract.agentv.pages.dev |
Summary
Recorded coding-agent trajectories can now be replayed through AgentV's provider-agnostic `replay` target from normalized transcript JSONL. Users import expensive Copilot, Codex, or Claude sessions once, match them by stable `test_id` plus `source_target`, and rerun graders without invoking the live agent again.

This hard-removes the authored `provider: copilot-log` target surface. Copilot `events.jsonl` parsing remains available as an import/normalization adapter through `agentv import copilot`, while graders and Dashboard-facing artifacts consume AgentV transcript/replay output instead of provider-native logs.

Key decisions:
- `provider: replay` accepts exactly one of `fixtures`, `execution_traces`, or `transcripts`
- […] `test_id` and `source_target`; direct `--transcript` evals validate exact eval `test_id` coverage before running
- […] `Message[]` plus `raw.replay_transcript` provenance
- `agentv import copilot` is the supported path

Validation
- `bun test packages/core/test/import/transcript-provider.test.ts packages/core/test/evaluation/providers/replay-transcripts.test.ts packages/core/test/evaluation/providers/targets.test.ts packages/core/test/evaluation/validation/targets-validator.test.ts packages/core/test/evaluation/providers/normalize-tool-call.test.ts packages/core/test/evaluation/providers/copilot-log-parser.test.ts` -> 198 pass
- `bun run typecheck` -> pass
- `bun run lint` -> pass
- `bun run validate:examples` -> 108 valid / 0 invalid
- `git diff --check` and `git diff --cached --check` -> pass

Evidence
- `bun apps/cli/src/cli.ts eval examples/showcase/trace-evaluation/evals/transcript-import.eval.yaml --target replay_imported_codex_transcript --output .agentv/results/replay-contract` -> PASS 1/1, mean 100%; bundle at `.agentv/results/replay-contract`
- `bun apps/cli/src/cli.ts eval examples/showcase/trace-evaluation/evals/transcript-import.eval.yaml --transcript examples/showcase/trace-evaluation/fixtures/imported-codex-transcript.jsonl --output .agentv/results/transcript-direct` -> PASS 1/1, mean 100%; bundle at `.agentv/results/transcript-direct`

No live provider was needed for this proof because the changed path is explicitly replay-only: it verifies rerunning graders over normalized recorded trajectories without agent invocation. The replay-target run wrote its local bundle and manifest; its optional results-ref export warned on a local `agentv/results/v1` ref lock after another run updated that ref. The direct transcript run exported successfully.

Post-Deploy Monitoring & Validation
No additional production/runtime monitoring required. This is local/CI eval infrastructure and docs behavior, not a deployed service path.
Healthy signals after merge:
- […] `provider: replay` + `transcripts`.
- `provider: copilot-log` configs fail with the explicit migration error.

Failure signals and rollback trigger:
- […] `test_id` mismatches.
- […] `copilot-transcript-replay` or trace-evaluation replay targets.

Validation window/owner: next CI run and first reviewer smoke test; PR author owns follow-up before merge.
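
For reviewers reasoning about the `test_id` mismatch signal, the sketch below shows one way an exact-coverage check over transcript records could look. It is illustrative only and not taken from this PR: `TranscriptRecord`, `CoverageReport`, and `checkTestIdCoverage` are hypothetical names, and reading "exact coverage" as "no missing and no extra `test_id` values for a given `source_target`" is an assumption.

```typescript
// Hypothetical sketch: none of these names come from the AgentV codebase.
interface TranscriptRecord {
  test_id: string;
  source_target: string;
}

interface CoverageReport {
  missing: string[]; // eval test_ids with no recorded transcript
  extra: string[]; // recorded test_ids that the eval does not define
}

// Compare the eval's test_ids with the transcript records for one source_target.
// Assumption: coverage is "exact" when both lists in the report are empty.
function checkTestIdCoverage(
  evalTestIds: string[],
  records: TranscriptRecord[],
  sourceTarget: string,
): CoverageReport {
  const recorded = new Set<string>(
    records
      .filter((r) => r.source_target === sourceTarget)
      .map((r) => r.test_id),
  );
  const expected = new Set<string>(evalTestIds);

  return {
    missing: Array.from(expected).filter((id) => !recorded.has(id)).sort(),
    extra: Array.from(recorded).filter((id) => !expected.has(id)).sort(),
  };
}
```

Under this reading, a run would proceed only when both `missing` and `extra` are empty; reporting the two lists separately makes it easier to tell a stale transcript from an eval that gained new cases.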
Related
Related: av-t2o5.2