[Voice evals 16] Add support-agent end-to-end smoke test#787
Conversation
Greptile SummaryThis PR adds
Confidence Score: 4/5The change is a test-only addition with no production code changes; it is safe to merge and will not affect runtime behavior. The end-to-end flow is logically correct: all key timing, latency-fallback, and scorecard-pass conditions trace through cleanly against the fixture data. Two gaps hold the score below clean: the local backend/internal/voicee2e/support_agent_smoke_test.go — the only changed file; focus on the unused fixture fields and the
|
| Filename | Overview |
|---|---|
| backend/internal/voicee2e/support_agent_smoke_test.go | New deterministic smoke test; end-to-end flow is logically sound, but the local contains helper diverges from voicetextsim.contains in whitespace handling, and several loaded fixture fields (ExpectedAgentTextOutput, ExpectedTraceJSON, etc.) are never consumed, creating silent drift risk. |
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
backend/internal/voicee2e/support_agent_smoke_test.go:280-285
**Local `contains` silently disagrees with `voicetextsim.contains`**
The local helper does exact string comparison, while `voicetextsim.validateInput` trims whitespace before comparing. If the fixture YAML ever has a transport value padded with whitespace (e.g., `" text_sim"`), `parseSupportPack` would fatalf with "want text_sim transport" even though the actual `voicetextsim.Run` call would succeed — making the failure message actively misleading.
### Issue 2 of 2
backend/internal/voicee2e/support_agent_smoke_test.go:85-105
**Several fixture fields are loaded but never consumed**
`loadSupportFixture` populates `fixture.ExpectedAgentTextOutput`, `fixture.ExpectedTraceJSON`, `fixture.ExpectedScorecardJSON`, `fixture.ScriptedUserTurnsJSON`, and `fixture.ExpectedToolResultJSON`, but none of these are referenced anywhere in the test. In particular, `fakeDeploymentScript` sources the happy-path agent text from `script.Steps[0].ExpectedAgentText` (the separate `voicesim` script file), completely bypassing the canonical `ExpectedAgentTextOutput` embedded in the fixture pack. If these two sources drift apart, the smoke test will keep passing while the fixtures are inconsistent.
Reviews (1): Last reviewed commit: "test(voice): add support eval smoke" | Re-trigger Greptile
| for _, value := range values { | ||
| if value == want { | ||
| return true | ||
| } | ||
| } | ||
| return false |
There was a problem hiding this comment.
Local
contains silently disagrees with voicetextsim.contains
The local helper does exact string comparison, while voicetextsim.validateInput trims whitespace before comparing. If the fixture YAML ever has a transport value padded with whitespace (e.g., " text_sim"), parseSupportPack would fatalf with "want text_sim transport" even though the actual voicetextsim.Run call would succeed — making the failure message actively misleading.
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/internal/voicee2e/support_agent_smoke_test.go
Line: 280-285
Comment:
**Local `contains` silently disagrees with `voicetextsim.contains`**
The local helper does exact string comparison, while `voicetextsim.validateInput` trims whitespace before comparing. If the fixture YAML ever has a transport value padded with whitespace (e.g., `" text_sim"`), `parseSupportPack` would fatalf with "want text_sim transport" even though the actual `voicetextsim.Run` call would succeed — making the failure message actively misleading.
How can I resolve this? If you propose a fix, please make it concise.| if err != nil { | ||
| t.Fatalf("LoadSupportBillingFixture returned error: %v", err) | ||
| } | ||
| return fixture | ||
| } | ||
|
|
||
| func parseSupportPack(t *testing.T, fixture voicefixtures.SupportBillingFixture) challengepack.Bundle { | ||
| t.Helper() | ||
| bundle, err := challengepack.ParseYAML(fixture.ChallengePackYAML) | ||
| if err != nil { | ||
| t.Fatalf("ParseYAML returned error: %v", err) | ||
| } | ||
| if bundle.Modality != challengepack.ModalityVoice { | ||
| t.Fatalf("bundle modality = %q, want voice", bundle.Modality) | ||
| } | ||
| if bundle.InterfaceSpec == nil || !contains(bundle.InterfaceSpec.Transports, "text_sim") { | ||
| t.Fatalf("bundle interface spec = %+v, want text_sim transport", bundle.InterfaceSpec) | ||
| } | ||
| return bundle | ||
| } | ||
|
|
There was a problem hiding this comment.
Several fixture fields are loaded but never consumed
loadSupportFixture populates fixture.ExpectedAgentTextOutput, fixture.ExpectedTraceJSON, fixture.ExpectedScorecardJSON, fixture.ScriptedUserTurnsJSON, and fixture.ExpectedToolResultJSON, but none of these are referenced anywhere in the test. In particular, fakeDeploymentScript sources the happy-path agent text from script.Steps[0].ExpectedAgentText (the separate voicesim script file), completely bypassing the canonical ExpectedAgentTextOutput embedded in the fixture pack. If these two sources drift apart, the smoke test will keep passing while the fixtures are inconsistent.
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/internal/voicee2e/support_agent_smoke_test.go
Line: 85-105
Comment:
**Several fixture fields are loaded but never consumed**
`loadSupportFixture` populates `fixture.ExpectedAgentTextOutput`, `fixture.ExpectedTraceJSON`, `fixture.ExpectedScorecardJSON`, `fixture.ScriptedUserTurnsJSON`, and `fixture.ExpectedToolResultJSON`, but none of these are referenced anywhere in the test. In particular, `fakeDeploymentScript` sources the happy-path agent text from `script.Steps[0].ExpectedAgentText` (the separate `voicesim` script file), completely bypassing the canonical `ExpectedAgentTextOutput` embedded in the fixture pack. If these two sources drift apart, the smoke test will keep passing while the fixtures are inconsistent.
How can I resolve this? If you propose a fix, please make it concise.|
Verdict: approve Blocking issues: none found. Step review:
Final test contract review:
External dependency check:
Commands run from
Review JSON: {
"steps": [
{
"step_number": 1,
"title": "Add support-agent voice eval smoke test",
"review_result": {
"status": "pass",
"issues_found": [],
"notes": "Independently verified the actual diff and local tests. The smoke covers the #770 contract, uses fake/local-only voice evaluation pieces, validates deterministic happy-path output, and proves the known-bad candidate fails the compare gate."
},
"cumulative_review": {
"previous_steps_still_valid": true,
"integration_issues": [],
"notes": "No integration drift found across fixture loading, text_sim execution, artifact verification, replay, scorecard generation, or releasegate comparison."
}
},
{
"step_number": "final",
"title": "Final review against test contract",
"test_contract_review": {
"functional_behavior": "pass - issue #770 contract verified against the actual diff and local tests.",
"unit_tests": "pass - TestSupportAgentVoiceEvalLoopSmoke covers the required smoke path.",
"integration_tests": "pass - focused package smoke passed.",
"smoke_tests": "pass - documented command, neighboring packages, and go test ./... passed.",
"e2e_tests": "pass - deterministic internal E2E-style smoke covers the requested slice without external services.",
"manual_tests": "N/A - no manual HTTP route required."
},
"overall_verdict": "approve",
"blocking_issues": []
}
]
} |
|
Verdict: approve Blocking issues: none found. Note: GitHub rejected a formal approving review from this local token because it belongs to the PR author, so I am posting the independent review as a PR conversation comment. Step review:
Greptile follow-up verification:
Final test contract review:
External dependency check:
Commands run from
Review JSON: {"steps":[{"step_number":1,"title":"Add support-agent voice eval smoke test","review_result":{"status":"pass","issues_found":[],"notes":"Actual diff and local tests satisfy issue #770 and the PR test contract. Greptile follow-up commit 1a2a0986 addressed fixture-field usage and whitespace matching."},"cumulative_review":{"previous_steps_still_valid":true,"integration_issues":[],"notes":"No integration drift found across fixture loading, text-sim execution, artifact verification, replay, scorecard generation, or releasegate comparison."}},{"step_number":"final","title":"Final review against test contract","test_contract_review":{"functional_behavior":"pass","unit_tests":"pass","integration_tests":"pass","smoke_tests":"pass","e2e_tests":"pass","manual_tests":"N/A"},"overall_verdict":"approve","blocking_issues":[]}]} |
Summary
backend/internal/voicee2ewith a deterministic support-agent voice eval smoke test.text_simwith fake deployment/tool behavior, validates canonical events plus artifact manifest checksums, builds replay projection, generates scorecards, and evaluates voice compare gates.go test ./internal/voicee2e -run TestSupportAgentVoiceEvalLoopSmoke -count=1frombackend/.Closes #770.
Parent: #754.
Tests
go test ./internal/voicee2e -run TestSupportAgentVoiceEvalLoopSmoke -count=1go test ./internal/voicee2e ./internal/voicetextsim ./internal/voicereplay ./internal/voicescorecard ./internal/releasegatego test ./...Test Contract
codex/voice-support-e2e-smoke — Test Contract
Functional Behavior
modality: voice,text_simmode, scripted user simulator data, fake voice-agent deployment, fake tool call, canonical voice events, artifact manifest validation, replay projection, scorecard generation, and voice compare gate evaluation.go test ./internal/voicee2e -run TestSupportAgentVoiceEvalLoopSmoke -count=1frombackend/.Unit Tests
TestSupportAgentVoiceEvalLoopSmokeloads the support-agent pack, runs text-sim with fake adapters, validates events/artifact manifest, builds replay projection, generates scorecards, and evaluates pass/fail compare gates.Integration / Functional Tests
go test ./internal/voicee2efrombackend/verifies the full deterministic smoke path.Smoke Tests
go test ./internal/voicee2e -run TestSupportAgentVoiceEvalLoopSmoke -count=1frombackend/.go test ./internal/voicee2e ./internal/voicetextsim ./internal/voicereplay ./internal/voicescorecard ./internal/releasegatefrombackend/.go test ./...frombackend/if focused tests pass.E2E Tests
Manual / cURL Tests
Review Checkpoint JSON
{ "branch": "codex/voice-support-e2e-smoke", "test_contract": "/tmp/codex-voice-support-e2e-smoke-test-contract.md", "created_at": "2026-05-13T17:45:00Z", "steps": [ { "step_number": 1, "title": "Add support-agent voice eval smoke test", "timestamp": "2026-05-13T17:52:00Z", "files_changed": [ "backend/internal/voicee2e/support_agent_smoke_test.go" ], "what_changed": "Added a deterministic support-agent voice evaluation smoke test that loads the support billing fixture pack, runs text-sim with fake deployment/tool call behavior, validates canonical events and artifact manifest checksums, builds a replay projection, generates voice scorecards, and evaluates the voice compare gate for both happy and known-bad candidates.", "review_instructions": "Verify the test exercises the full #770 contract without external LLM, telephony/WebRTC, object storage, or network calls; verify the documented command works and that both the happy pass and failing gate paths are deterministic.", "review_result": { "status": "pass", "issues_found": [], "notes": "Self-review checked the one-command smoke path, fixture pack modality/text_sim checks, fake deployment/tool call wiring, event validation, manifest checksum verification, replay projection, scorecard pass, and compare gate failure. Focused, neighboring, and full backend tests passed." }, "cumulative_review": { "previous_steps_still_valid": true, "integration_issues": [], "notes": "The smoke test composes existing voice packages without changing their behavior." } }, { "step_number": "final", "title": "Final review against test contract", "timestamp": "2026-05-13T17:53:00Z", "test_contract_review": { "functional_behavior": "pass - the smoke test covers the support billing voice pack, text_sim, fake deployment/tool call, canonical events, manifest checksums, replay projection, scorecards, and pass/fail compare gate paths without external services.", "unit_tests": "pass - TestSupportAgentVoiceEvalLoopSmoke is present and covers the contract.", "integration_tests": "pass - go test ./internal/voicee2e verifies the full deterministic internal smoke path.", "smoke_tests": "pass - documented command, neighboring package set, and go test ./... from backend passed.", "e2e_tests": "pass - deterministic internal E2E-style smoke test covers the final voice-evals slice.", "manual_tests": "N/A - no HTTP/manual route in this slice." }, "overall_verdict": "ready", "blocking_issues": [] } ] }