[Voice evals 15] Add production-call import and redaction contract#786

Merged
Atharva-Kanherkar merged 2 commits into main from codex/voice-call-import-redaction on May 13, 2026

@Atharva-Kanherkar
Collaborator

Summary

  • Adds backend/internal/voiceimport with a strict JSON contract for externally captured production voice calls.
  • Converts approved/redacted imports into existing multimodaltrace.Trace and voiceartifacts.Manifest models while preserving refs/checksums.
  • Gates regression promotion on approved_for_regression redaction status and emits deterministic challengepack.CaseDefinition cases.

Closes #769.
Parent: #754.

Tests

  • go test ./internal/voiceimport
  • go test ./internal/voiceimport ./internal/multimodaltrace ./internal/voiceartifacts ./internal/challengepack
  • go test ./...

Test Contract

codex/voice-call-import-redaction — Test Contract

Functional Behavior

  • Define a deterministic production-call import format for captured voice-agent calls without ingesting from any real provider.
  • The format must carry transcript entries, audio artifact references, provider event fragments, explicit PII/redaction metadata, reviewer labels, expected outcome, failure category, and a promotion target challenge/input-set/case.
  • Parsed imports must validate into the existing multimodaltrace.Trace and voiceartifacts.Manifest models without mutating artifact references or checksums.
  • Redaction status must be one of unreviewed, redacted, approved_for_regression, or rejected.
  • Promotion to a regression challenge input case is allowed only for approved_for_regression imports with explicit redaction metadata.
  • No external provider dependency, network call, or LLM call is introduced.
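
The status and promotion rules above can be sketched as follows. This is a minimal illustration, not the voiceimport implementation: the `Redaction` struct, `canPromote`, and both error values are hypothetical names chosen for the example; only the four status strings come from the contract.

```go
package main

import (
	"errors"
	"fmt"
)

// RedactionStatus mirrors the four statuses named in the contract.
type RedactionStatus string

const (
	StatusUnreviewed            RedactionStatus = "unreviewed"
	StatusRedacted              RedactionStatus = "redacted"
	StatusApprovedForRegression RedactionStatus = "approved_for_regression"
	StatusRejected              RedactionStatus = "rejected"
)

// Redaction is a hypothetical, trimmed-down stand-in for the redaction metadata.
type Redaction struct {
	Status   RedactionStatus
	Reviewer string
}

// Hypothetical sentinel errors used only in this sketch.
var (
	errMissingRedaction = errors.New("redaction metadata is required")
	errNotApproved      = errors.New("import is not approved for regression")
)

// validStatus reports whether s is one of the four allowed values.
func validStatus(s RedactionStatus) bool {
	switch s {
	case StatusUnreviewed, StatusRedacted, StatusApprovedForRegression, StatusRejected:
		return true
	}
	return false
}

// canPromote returns nil only when explicit metadata is present and the
// status is approved_for_regression.
func canPromote(r *Redaction) error {
	if r == nil {
		return errMissingRedaction
	}
	if !validStatus(r.Status) {
		return fmt.Errorf("unknown redaction status %q", r.Status)
	}
	if r.Status != StatusApprovedForRegression {
		return fmt.Errorf("%w: status is %q", errNotApproved, r.Status)
	}
	return nil
}

func main() {
	fmt.Println(canPromote(&Redaction{Status: StatusApprovedForRegression}))
	fmt.Println(canPromote(&Redaction{Status: StatusUnreviewed}))
	fmt.Println(canPromote(nil))
}
```

Note that `redacted` alone does not unlock promotion in this sketch; only the explicit `approved_for_regression` status does, matching the rule stated above.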

Unit Tests

  • TestImportValidRedactedCallFixture parses a valid redacted fixture and validates the derived trace and artifact manifest.
  • TestPromotionRejectsUnreviewedCall asserts unreviewed imports cannot produce regression cases.
  • TestImportRejectsMissingRedactionMetadata asserts missing PII/redaction metadata fails validation.
  • TestApprovedFixtureProducesDeterministicChallengeCase asserts approved imports produce the exact deterministic challenge case shape.
  • TestImportPreservesOriginalArtifactReferencesAndChecksums asserts imported artifact refs and SHA-256 checksums are preserved.
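
As an illustration of what the checksum-preservation test is guarding, the sketch below copies a list of artifacts and compares the copy to the original by key. The `artifact` struct, `copyArtifacts`, and `firstMismatch` are hypothetical names for this example; the real voiceartifacts.Manifest model may be shaped differently.

```go
package main

import "fmt"

// artifact is a hypothetical stand-in for one manifest entry.
type artifact struct {
	Key            string
	Path           string
	ChecksumSHA256 string
}

// copyArtifacts returns an independent copy, so changes to the result
// cannot alter the source slice.
func copyArtifacts(in []artifact) []artifact {
	out := make([]artifact, len(in))
	copy(out, in)
	return out
}

// firstMismatch returns the key of the first original artifact whose path or
// checksum differs in imported (or is missing from it), or "" if all match.
func firstMismatch(original, imported []artifact) string {
	byKey := make(map[string]artifact, len(imported))
	for _, a := range imported {
		byKey[a.Key] = a
	}
	for _, a := range original {
		got, ok := byKey[a.Key]
		if !ok || got.Path != a.Path || got.ChecksumSHA256 != a.ChecksumSHA256 {
			return a.Key
		}
	}
	return ""
}

func main() {
	original := []artifact{{Key: "caller-audio", Path: "audio/caller.wav", ChecksumSHA256: "abc123"}}
	imported := copyArtifacts(original)
	fmt.Printf("mismatch after copy: %q\n", firstMismatch(original, imported))

	imported[0].ChecksumSHA256 = "tampered"
	fmt.Printf("mismatch after edit: %q\n", firstMismatch(original, imported))
}
```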

Integration / Functional Tests

  • N/A — this PR defines an internal import contract and deterministic conversion helpers only.

Smoke Tests

  • go test ./internal/voiceimport from backend/.
  • go test ./internal/voiceimport ./internal/multimodaltrace ./internal/voiceartifacts ./internal/challengepack from backend/.
  • go test ./... from backend/ if focused tests pass.

E2E Tests

  • N/A — no production provider import is performed in this slice.

Manual / cURL Tests

  • N/A — deterministic unit tests cover the contract.

Review Checkpoint JSON

{
  "branch": "codex/voice-call-import-redaction",
  "test_contract": "/tmp/codex-voice-call-import-redaction-test-contract.md",
  "created_at": "2026-05-13T17:31:00Z",
  "steps": [
    {
      "step_number": 1,
      "title": "Add deterministic production-call import and redaction contract",
      "timestamp": "2026-05-13T17:41:00Z",
      "files_changed": [
        "backend/internal/voiceimport/import.go",
        "backend/internal/voiceimport/import_test.go"
      ],
      "what_changed": "Added the voiceimport package with a strict JSON fixture contract for externally captured calls, including transcript entries, audio artifact refs, provider event fragments, voice artifact manifests, explicit redaction metadata/status, reviewer labels, expected outcome, failure category, and promotion targets. Added conversion to multimodaltrace.Trace, manifest preservation, and approved-only promotion into deterministic challengepack.CaseDefinition values.",
      "review_instructions": "Verify the import format is provider-neutral, has no provider/network/LLM dependency, validates redaction metadata explicitly, rejects non-approved promotion, preserves original artifact refs/checksums, and produces deterministic trace/manifest/challenge case outputs.",
      "review_result": {
        "status": "pass",
        "issues_found": [],
        "notes": "Self-review checked strict decode, redaction status validation, trace reference ordering, manifest preservation, approved-only promotion, provider event metadata, and the exact tests. Focused, neighboring, and full backend tests passed."
      },
      "cumulative_review": {
        "previous_steps_still_valid": true,
        "integration_issues": [],
        "notes": "This package builds on existing multimodaltrace, voiceartifacts, and challengepack contracts without changing their behavior."
      }
    },
    {
      "step_number": "final",
      "title": "Final review against test contract",
      "timestamp": "2026-05-13T17:42:00Z",
      "test_contract_review": {
        "functional_behavior": "pass - the import contract includes transcript, audio artifact refs, provider events, redaction metadata/status, reviewer labels, expected outcome, failure category, and promotion target; it validates into trace and artifact models and only approved imports can promote.",
        "unit_tests": "pass - tests cover valid redacted import, unreviewed promotion rejection, missing redaction rejection, deterministic approved challenge case, and artifact ref/checksum preservation.",
        "integration_tests": "N/A - internal contract package only.",
        "smoke_tests": "pass - go test ./internal/voiceimport, go test ./internal/voiceimport ./internal/multimodaltrace ./internal/voiceartifacts ./internal/challengepack, and go test ./... from backend passed.",
        "e2e_tests": "N/A - no production provider import in this slice.",
        "manual_tests": "N/A - deterministic unit tests cover the contract."
      },
      "overall_verdict": "ready",
      "blocking_issues": []
    }
  ]
}

@Atharva-Kanherkar
Collaborator Author

Verdict: approve

Blocking issues: None found.

Step review:

  1. Add deterministic production-call import and redaction contract - pass
    Notes: The actual GitHub PR diff contains only backend/internal/voiceimport/import.go and backend/internal/voiceimport/import_test.go. The new package defines a provider-neutral JSON import shape carrying transcript entries, audio references, provider event fragments, artifact manifest, redaction metadata/status, reviewer labels, expected outcome, failure category, and promotion target. Decode uses strict unknown-field rejection; Validate requires the artifact manifest, provider events, explicit redaction metadata/findings, expected outcome, failure category, and promotion target. ToTrace validates through multimodaltrace.Trace, ToArtifactManifest returns a copied manifest preserving artifact paths/checksums, and PromoteToChallengeCase only proceeds for approved_for_regression. The imports are stdlib/internal packages plus uuid; I found no provider integration, network call, or LLM dependency.
    Issues: None.
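
For readers unfamiliar with strict unknown-field rejection, a minimal sketch using the standard library's `json.Decoder.DisallowUnknownFields` follows. The two-field `fixture` struct and the `decodeStrict` helper are hypothetical; the trailing-data check is an extra assumption of this sketch and is not confirmed by the PR.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// fixture is a hypothetical two-field stand-in for the real import shape.
type fixture struct {
	ImportID        string `json:"import_id"`
	ExpectedOutcome string `json:"expected_outcome"`
}

// decodeStrict rejects any JSON key that has no matching struct field.
func decodeStrict(data []byte) (fixture, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()

	var f fixture
	if err := dec.Decode(&f); err != nil {
		return fixture{}, fmt.Errorf("decode fixture: %w", err)
	}
	// Assumption of this sketch: anything after the first JSON value is an error.
	if err := dec.Decode(&struct{}{}); !errors.Is(err, io.EOF) {
		return fixture{}, errors.New("decode fixture: unexpected trailing data")
	}
	return f, nil
}

func main() {
	_, err := decodeStrict([]byte(`{"import_id":"import-prod-call-001","surprise":true}`))
	fmt.Println(err)
}
```

Strict decoding makes a misspelled or unexpected key fail loudly at import time instead of being silently dropped.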

Final test contract review:

  • Functional behavior: pass - The implementation satisfies the issue #769 contract for deterministic import data, redaction metadata/status, promotion gating, trace/manifest conversion, and deterministic approved challenge case generation.
  • Unit tests: pass - The named tests exist and cover valid redacted import, unreviewed promotion rejection, missing redaction metadata rejection, deterministic approved challenge case shape, and artifact ref/checksum preservation.
  • Integration tests: N/A - This PR only adds the internal import contract/conversion helper package.
  • Smoke tests: pass - Focused, neighboring, and full backend Go test commands passed locally.
  • E2E tests: N/A - No production provider import is performed in this slice.
  • Manual tests: N/A - Deterministic unit tests cover the stated contract.

Commands run:

  • sed -n '1,220p' /Users/atharva/.codex/skills/review-checkpoint-pr-review/SKILL.md - passed
  • gh pr view 786 --repo agentclash/agentclash --json number,title,body,baseRefName,headRefName,headRepositoryOwner,headRepository,author,url,state,mergeStateStatus,reviewDecision,commits,files - passed
  • gh issue view 769 --repo agentclash/agentclash --json number,title,body,state,url - passed
  • gh pr diff 786 --repo agentclash/agentclash --name-only - passed; confirmed two-file PR diff
  • gh pr diff 786 --repo agentclash/agentclash --patch | sed -n '1,260p' - passed
  • nl -ba backend/internal/voiceimport/import.go | sed -n '1,260p' - passed
  • nl -ba backend/internal/voiceimport/import.go | sed -n '261,560p' - passed
  • nl -ba backend/internal/voiceimport/import_test.go | sed -n '1,360p' - passed
  • rg -n "voiceimport|http\\. |http\\.Default|net/http|openai|llm|anthropic|provider" backend/internal/voiceimport backend/go.mod backend/go.sum -S - passed; no network/LLM/provider dependency in the new package
  • go test ./internal/voiceimport - passed
  • go test ./internal/voiceimport ./internal/multimodaltrace ./internal/voiceartifacts ./internal/challengepack - passed
  • go test ./... from backend/ - passed

Review JSON:

{
  "steps": [
    {
      "step_number": 1,
      "title": "Add deterministic production-call import and redaction contract",
      "review_result": {
        "status": "pass",
        "issues_found": [],
        "notes": "Verified actual PR diff, issue #769 contract, strict import validation, trace/manifest conversion, approved-only promotion, artifact path/checksum preservation via manifest, deterministic labels/artifact ordering, and absence of provider/network/LLM dependencies. Focused, neighboring, and full backend tests passed."
      },
      "cumulative_review": {
        "previous_steps_still_valid": true,
        "integration_issues": [],
        "notes": "No drift found between the new voiceimport package and existing multimodaltrace, voiceartifacts, or challengepack contracts."
      }
    },
    {
      "step_number": "final",
      "title": "Final review against test contract",
      "test_contract_review": {
        "functional_behavior": "pass - satisfies #769 import format, redaction metadata/status, promotion gating, trace/manifest conversion, and deterministic challenge case requirements.",
        "unit_tests": "pass - named contract tests exist and pass locally.",
        "integration_tests": "N/A - internal contract/conversion helper only.",
        "smoke_tests": "pass - go test ./internal/voiceimport, neighboring package tests, and backend go test ./... passed locally.",
        "e2e_tests": "N/A - no production provider import in this slice.",
        "manual_tests": "N/A - deterministic unit tests cover the stated contract."
      },
      "overall_verdict": "approve",
      "blocking_issues": []
    }
  ]
}

@greptile-apps
Contributor

greptile-apps Bot commented May 13, 2026

Greptile Summary

This PR adds the backend/internal/voiceimport package, which defines a strict JSON import contract for externally captured production voice calls and converts validated fixtures into the existing multimodaltrace.Trace, voiceartifacts.Manifest, and challengepack.CaseDefinition models, gating promotion to regression cases on approved_for_regression redaction status.

  • import.go introduces Fixture, Decode(), Validate(), ToTrace(), ToArtifactManifest(), and PromoteToChallengeCase() with deterministic, provider-neutral conversion logic and explicit redaction-status enforcement.
  • import_test.go covers five unit-test scenarios, though the determinism test asserts same-struct JSON identity rather than independent-call determinism.

Confidence Score: 3/5

The package introduces a well-structured, provider-neutral import contract, but a gap in Fixture.Validate() means callers cannot rely on validation as a stable pre-flight check before calling ToTrace().

A fixture with duplicate segment IDs passes Fixture.Validate() but fails inside ToTrace() when trace.Validate() catches the duplicate, breaking the stated contract that validated imports produce valid traces.

backend/internal/voiceimport/import.go needs a cross-entry uniqueness check in Fixture.Validate() for all segment IDs contributed to the trace.

Important Files Changed

| Filename | Overview |
| --- | --- |
| backend/internal/voiceimport/import.go | New voiceimport package defining the production call import contract; has a gap where Fixture.Validate() omits duplicate segment-ID checks, letting fixtures that validate still fail in ToTrace(), plus redundant triple-validation in PromoteToChallengeCase(). |
| backend/internal/voiceimport/import_test.go | Five unit tests covering the main import/promote/validate paths; determinism check marshals the same struct twice and is trivially true rather than testing cross-invocation determinism. |

Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
backend/internal/voiceimport/import.go:164-180
**Segment ID uniqueness not checked in `Fixture.Validate()`**

`Fixture.Validate()` iterates over transcript entries and validates each in isolation, but never checks that `TranscriptEntry.SegmentID` values are unique across the slice, that `AudioReference.SegmentID` values are unique, or that an audio segment ID doesn't collide with a transcript segment ID. Both sets of IDs feed directly into `trace.Segments` in `ToTrace()`, so a fixture with duplicate segment IDs will pass `Fixture.Validate()` and then fail inside `ToTrace()` when `trace.Validate()` rejects the duplicate — contradicting the documented contract: *"Parsed imports must validate into the existing models"* and defeating the purpose of a standalone `Validate()` gate.

### Issue 2 of 3
backend/internal/voiceimport/import_test.go:111-128
**Determinism check is trivially true and proves nothing**

The test serializes the same `got` value twice and asserts the two byte strings match. Since `got` is the same struct instance both times and Go's `json.Marshal` is deterministic for the same in-memory value, this comparison will always pass regardless of whether `PromoteToChallengeCase()` is actually deterministic across independent calls. A meaningful test would call `PromoteToChallengeCase()` a second time on an equivalent freshly-decoded fixture and compare the two serialized outputs — that would catch any ordering-instability introduced by, for example, an unsorted `map[string]any` or a non-stable sort.

### Issue 3 of 3
backend/internal/voiceimport/import.go:211-237
**`PromoteToChallengeCase()` triggers `f.Validate()` three times**

`PromoteToChallengeCase()` calls `f.Validate()` once at line 212, then delegates to `f.ToTrace()` (which calls `f.Validate()` again at line 185) and `f.ToArtifactManifest()` (which calls `f.Validate()` a third time at line 200). Each `Validate()` also re-runs `validateArtifactManifest()`. For the happy-path the work is wasted; for large manifests or transcript slices this could be measurable. Internal helper variants (e.g., `toTraceUnchecked`) or inlining the post-validation logic would avoid the redundancy.

Reviews (1): Last reviewed commit: "feat(voice): add call import redaction c..."

Comment on lines +164 to +180
	for idx, event := range f.ProviderEvents {
		if err := event.Validate(); err != nil {
			return fmt.Errorf("provider_events[%d]: %w", idx, err)
		}
	}
	if f.Redaction == nil {
		return ErrMissingRedaction
	}
	if err := f.Redaction.Validate(); err != nil {
		return err
	}
	if strings.TrimSpace(f.ExpectedOutcome) == "" {
		return errors.New("expected_outcome is required")
	}
	if strings.TrimSpace(f.FailureCategory) == "" {
		return errors.New("failure_category is required")
	}
Contributor

P1 Segment ID uniqueness not checked in Fixture.Validate()

Fixture.Validate() iterates over transcript entries and validates each in isolation, but never checks that TranscriptEntry.SegmentID values are unique across the slice, that AudioReference.SegmentID values are unique, or that an audio segment ID doesn't collide with a transcript segment ID. Both sets of IDs feed directly into trace.Segments in ToTrace(), so a fixture with duplicate segment IDs will pass Fixture.Validate() and then fail inside ToTrace() when trace.Validate() rejects the duplicate — contradicting the documented contract: "Parsed imports must validate into the existing models" and defeating the purpose of a standalone Validate() gate.
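
A cross-entry uniqueness check of the kind described could look roughly like the sketch below. The `transcriptEntry` and `audioRef` types and the `checkSegmentIDs` helper are hypothetical, and nesting the audio reference inside the transcript entry is an assumption; the key idea is a single seen-map shared by transcript and audio IDs so collisions across the two sets are caught too.

```go
package main

import (
	"fmt"
	"strings"
)

// audioRef and transcriptEntry are hypothetical, trimmed-down fixture types.
type audioRef struct {
	SegmentID string
}

type transcriptEntry struct {
	SegmentID string
	Audio     *audioRef // optional audio reference for this entry
}

// checkSegmentIDs rejects empty IDs and any ID reused across transcript or
// audio segments, since both kinds end up in the same trace segment list.
func checkSegmentIDs(entries []transcriptEntry) error {
	seen := make(map[string]string, len(entries))
	record := func(id, field string) error {
		id = strings.TrimSpace(id)
		if id == "" {
			return fmt.Errorf("%s is required", field)
		}
		if prev, ok := seen[id]; ok {
			return fmt.Errorf("%s duplicates segment id %q already used by %s", field, id, prev)
		}
		seen[id] = field
		return nil
	}
	for i, e := range entries {
		if e.Audio != nil {
			if err := record(e.Audio.SegmentID, fmt.Sprintf("transcript[%d].audio.segment_id", i)); err != nil {
				return err
			}
		}
		if err := record(e.SegmentID, fmt.Sprintf("transcript[%d].segment_id", i)); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	entries := []transcriptEntry{
		{SegmentID: "seg-1", Audio: &audioRef{SegmentID: "audio-1"}},
		{SegmentID: "audio-1"}, // collides with the audio segment above
	}
	fmt.Println(checkSegmentIDs(entries))
}
```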

Collaborator Author

Fixed in c7bb639.

Comment on lines +111 to +128
		if originalByKey[key].ChecksumSHA256 != importedByKey[key].ChecksumSHA256 {
			t.Fatalf("%s checksum = %q, want %q", key, importedByKey[key].ChecksumSHA256, originalByKey[key].ChecksumSHA256)
		}
	}
}

func assertCaseShape(t *testing.T, got challengepack.CaseDefinition) {
	t.Helper()
	if got.ChallengeKey != "voice-support-regression" || got.CaseKey != "prod-call-001" || got.ItemKey != "prod-call-001" {
		t.Fatalf("case identity = %s/%s/%s, want voice-support-regression/prod-call-001/prod-call-001", got.ChallengeKey, got.CaseKey, got.ItemKey)
	}
	if got.Payload["source_import_id"] != "import-prod-call-001" ||
		got.Payload["source_provider"] != "acme-contact-center" ||
		got.Payload["source_call_id"] != "acme-call-001" ||
		got.Payload["redaction_status"] != string(RedactionStatusApprovedForRegression) ||
		got.Payload["failure_category"] != "billing_refund_policy" ||
		got.Payload["promotion_pack_slug"] != "voice-support-regressions" ||
		got.Payload["promotion_input_set"] != "approved-production-calls" {
Contributor

P2 Determinism check is trivially true and proves nothing

The test serializes the same got value twice and asserts the two byte strings match. Since got is the same struct instance both times and Go's json.Marshal is deterministic for the same in-memory value, this comparison will always pass regardless of whether PromoteToChallengeCase() is actually deterministic across independent calls. A meaningful test would call PromoteToChallengeCase() a second time on an equivalent freshly-decoded fixture and compare the two serialized outputs — that would catch any ordering-instability introduced by, for example, an unsorted map[string]any or a non-stable sort.
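
The difference between the two approaches can be sketched as below: the raw fixture bytes are decoded and promoted twice, independently, and only then are the serialized results compared. `promoteFromJSON` is a hypothetical stand-in for the real decode plus `PromoteToChallengeCase()` path, and `caseDefinition` is a simplified illustration type.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// caseDefinition is a simplified, hypothetical challenge case shape.
type caseDefinition struct {
	CaseKey string         `json:"case_key"`
	Payload map[string]any `json:"payload"`
}

// promoteFromJSON stands in for decoding a fixture and promoting it.
func promoteFromJSON(data []byte) (caseDefinition, error) {
	var in struct {
		CallID          string `json:"call_id"`
		FailureCategory string `json:"failure_category"`
	}
	if err := json.Unmarshal(data, &in); err != nil {
		return caseDefinition{}, err
	}
	return caseDefinition{
		CaseKey: in.CallID,
		Payload: map[string]any{
			"source_call_id":   in.CallID,
			"failure_category": in.FailureCategory,
		},
	}, nil
}

// sameOutputAcrossRuns runs the whole decode-and-promote path twice from the
// raw bytes and compares the serialized outputs byte for byte.
func sameOutputAcrossRuns(data []byte) (bool, error) {
	var runs [2][]byte
	for i := range runs {
		c, err := promoteFromJSON(data)
		if err != nil {
			return false, err
		}
		b, err := json.Marshal(c)
		if err != nil {
			return false, err
		}
		runs[i] = b
	}
	return bytes.Equal(runs[0], runs[1]), nil
}

func main() {
	raw := []byte(`{"call_id":"prod-call-001","failure_category":"billing_refund_policy"}`)
	fmt.Println(sameOutputAcrossRuns(raw))
}
```

Because each run starts from the raw bytes, any instability inside the promotion path (for example, output built from unordered map iteration into a slice) would surface as a byte mismatch, which marshaling one value twice cannot reveal.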

Collaborator Author

Fixed in c7bb639.

Comment on lines +211 to +237
				Kind:       audioKindForSpeaker(entry.Speaker),
				Actor:      actorForSpeaker(entry.Speaker),
				OccurredAt: entry.OccurredAt.UTC(),
				Audio: &multimodaltrace.AudioPayload{
					ArtifactRef: strings.TrimSpace(entry.Audio.ArtifactRef),
					Format:      strings.TrimSpace(entry.Audio.Format),
					Channel:     strings.TrimSpace(entry.Audio.Channel),
					DurationMS:  entry.Audio.DurationMS,
				},
			})
		}
		sequence++
		kind := multimodaltrace.SegmentKindTranscriptPartial
		if entry.Final {
			kind = multimodaltrace.SegmentKindTranscriptFinal
		}
		trace.Segments = append(trace.Segments, multimodaltrace.Segment{
			SegmentID:      strings.TrimSpace(entry.SegmentID),
			SequenceNumber: sequence,
			Kind:           kind,
			Actor:          actorForSpeaker(entry.Speaker),
			OccurredAt:     entry.OccurredAt.UTC(),
			Transcript: &multimodaltrace.TranscriptPayload{
				Text:            entry.Text,
				Language:        strings.TrimSpace(entry.Language),
				Confidence:      entry.Confidence,
				SourceSegmentID: sourceSegmentID,
Contributor

P2 PromoteToChallengeCase() triggers f.Validate() three times

PromoteToChallengeCase() calls f.Validate() once at line 212, then delegates to f.ToTrace() (which calls f.Validate() again at line 185) and f.ToArtifactManifest() (which calls f.Validate() a third time at line 200). Each Validate() also re-runs validateArtifactManifest(). For the happy-path the work is wasted; for large manifests or transcript slices this could be measurable. Internal helper variants (e.g., toTraceUnchecked) or inlining the post-validation logic would avoid the redundancy.
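
The validate-once pattern suggested here can be sketched as follows. The `fixture` type is hypothetical and returns plain strings instead of real trace or manifest values; the `validated` counter exists only so the example can show how many times validation ran.

```go
package main

import (
	"errors"
	"fmt"
)

// fixture is a hypothetical stand-in for the import fixture.
type fixture struct {
	CallID    string
	validated int // counts Validate calls; present only for this illustration
}

func (f *fixture) Validate() error {
	f.validated++
	if f.CallID == "" {
		return errors.New("call_id is required")
	}
	return nil
}

// ToTrace is the public entry point: it validates, then converts.
func (f *fixture) ToTrace() (string, error) {
	if err := f.Validate(); err != nil {
		return "", err
	}
	return f.toTraceUnchecked(), nil
}

// ToArtifactManifest is the public entry point: it validates, then converts.
func (f *fixture) ToArtifactManifest() (string, error) {
	if err := f.Validate(); err != nil {
		return "", err
	}
	return f.toArtifactManifestUnchecked(), nil
}

// The unchecked helpers assume the caller has already validated f.
func (f *fixture) toTraceUnchecked() string            { return "trace:" + f.CallID }
func (f *fixture) toArtifactManifestUnchecked() string { return "manifest:" + f.CallID }

// Promote validates once and then reuses the unchecked helpers.
func (f *fixture) Promote() (string, error) {
	if err := f.Validate(); err != nil {
		return "", err
	}
	return f.toTraceUnchecked() + "|" + f.toArtifactManifestUnchecked(), nil
}

func main() {
	f := &fixture{CallID: "prod-call-001"}
	out, err := f.Promote()
	fmt.Println(out, err, "validate calls:", f.validated)
}
```

The exported methods keep their own validation so external callers stay safe, while the composite path pays for validation only once.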

Collaborator Author

Fixed in c7bb639.

@Atharva-Kanherkar
Collaborator Author

Verdict: approve

No blocking issues found.

Step review:

  1. Add deterministic production-call import and redaction contract - pass
    Notes: The PR adds only backend/internal/voiceimport/import.go and backend/internal/voiceimport/import_test.go. The fixture contract covers transcript entries, audio artifact references, provider event fragments, redaction metadata/status, reviewer labels, expected outcome, failure category, and promotion target. Decode uses DisallowUnknownFields, Fixture.Validate enforces explicit redaction metadata/status and required promotion fields, and conversion preserves artifact refs/checksums through the existing manifest model.
    Greptile fixes verified: Fixture.Validate records both TranscriptEntry.SegmentID and AudioReference.SegmentID in one seenSegmentIDs map before ToTrace construction; the determinism test compares two independently freshly decoded approved fixtures; PromoteToChallengeCase validates once and then uses toTraceUnchecked/toArtifactManifestUnchecked rather than calling Fixture.Validate through each conversion helper.
    Issue #769 contract verified: promotion is gated to approved_for_regression; unreviewed promotion fails; missing redaction metadata fails; valid imports parse into multimodal trace and voice artifact manifest; approved imports produce deterministic challenge case shape; no provider, network, or LLM dependency was introduced.

Final test contract review:

  • Functional behavior: pass - implementation satisfies the production-call import/redaction contract from issue #769 and the PR contract.
  • Unit tests: pass - required tests are present, with additional duplicate segment ID coverage.
  • Integration tests: N/A - internal contract/conversion helpers only.
  • Smoke tests: pass - focused, package-set, and full backend tests passed locally.
  • E2E tests: N/A - no production provider import is performed in this slice.
  • Manual tests: N/A - deterministic unit tests cover the contract.

Commands run:

  • git status --short --branch - clean on codex/voice-call-import-redaction, head matches requested PR head.
  • git rev-parse HEAD && git remote -v - confirmed c7bb639a0caed5ea028009959ab5a50f9f7c9103 and agentclash/agentclash remote.
  • gh pr view 786 --repo agentclash/agentclash --json number,title,headRefName,headRefOid,baseRefName,baseRefOid,body,url,state,author - confirmed PR metadata and test contract.
  • gh issue view 769 --repo agentclash/agentclash --json number,title,body,state,url - re-checked issue contract.
  • git diff --stat a3933c67dbd8425c546cfda02e902a6b856a746a...HEAD - reviewed changed file scope.
  • git diff --name-only a3933c67dbd8425c546cfda02e902a6b856a746a...HEAD - confirmed only voiceimport files changed.
  • sed -n '1,260p' backend/internal/voiceimport/import.go - reviewed implementation.
  • sed -n '1,360p' backend/internal/voiceimport/import_test.go - reviewed tests.
  • sed -n '180,620p' backend/internal/voiceimport/import.go - reviewed promotion/conversion helpers and validators.
  • nl -ba backend/internal/voiceimport/import.go | sed -n '90,260p' - checked validation and trace construction line references.
  • nl -ba backend/internal/voiceimport/import_test.go | sed -n '55,130p' - checked determinism and duplicate tests line references.
  • rg -n "Validate\(|PromoteToChallengeCase|recordSegmentID|duplicates|freshFixture|gotAgain" backend/internal/voiceimport - verified validation/promotion call sites.
  • go test ./internal/voiceimport -run 'TestImportRejectsDuplicateSegmentIDsBeforeTraceConversion|TestApprovedFixtureProducesDeterministicChallengeCase' -count=1 - passed.
  • go test ./internal/voiceimport - passed.
  • go test ./internal/voiceimport ./internal/multimodaltrace ./internal/voiceartifacts ./internal/challengepack - passed.
  • go test ./... from backend/ - passed.
  • rg -n "http|https|openai|llm|provider|net/|os/exec|grpc|client|Do\(" backend/internal/voiceimport - found only provider metadata/test fixture references, no network/LLM/client dependency.
  • git diff --check a3933c67dbd8425c546cfda02e902a6b856a746a...HEAD - passed.

Review JSON:

{
  "overall_verdict": "approve",
  "blocking_issues": [],
  "steps": [
    {
      "step_number": 1,
      "title": "Add deterministic production-call import and redaction contract",
      "review_result": {
        "status": "pass",
        "issues_found": [],
        "notes": "Verified implementation against issue #769 and PR contract; Greptile fixes are present; focused and full backend tests passed."
      },
      "cumulative_review": {
        "previous_steps_still_valid": true,
        "integration_issues": [],
        "notes": "No drift found across multimodaltrace, voiceartifacts, or challengepack integration."
      }
    },
    {
      "step_number": "final",
      "title": "Final review against test contract",
      "test_contract_review": {
        "functional_behavior": "pass",
        "unit_tests": "pass",
        "integration_tests": "N/A",
        "smoke_tests": "pass",
        "e2e_tests": "N/A",
        "manual_tests": "N/A"
      },
      "overall_verdict": "approve",
      "blocking_issues": []
    }
  ]
}

@Atharva-Kanherkar Atharva-Kanherkar merged commit 2e66340 into main May 13, 2026
3 checks passed
@Atharva-Kanherkar Atharva-Kanherkar deleted the codex/voice-call-import-redaction branch May 13, 2026 17:39