Add generic voice artifact manifest schema#812
Conversation
Greptile SummaryThis PR introduces a producer-neutral JSON Schema (Draft 2020-12) for AgentClash voice artifact manifests, adds backend Go tests that validate both positive and negative examples against that schema, and links the new schema from the voice-artifact contracts docs.
Confidence Score: 4/5Safe to merge; the schema and documentation are correct, and the tests cover the intended cases, but the test helper code has minor rough edges that could cause confusion on future library upgrades. The JSON schema itself is well-formed and the positive/negative test coverage is meaningful. Two issues in the test file keep the score from being clean: an unused backend/internal/voiceartifacts/schema_test.go — the rejection test struct and error-string check both warrant a quick look before merging.
|
| Filename | Overview |
|---|---|
| backend/internal/voiceartifacts/schema_test.go | Adds two new test functions for the manifest schema; contains an unreachable validate field and an unconditional error-string assertion that diverges from the established pattern. |
| docs/schemas/voice-artifact-manifest.schema.json | New JSON Schema (Draft 2020-12) for voice artifact manifests; correctly encodes required kinds, location-based if/then constraints, path-traversal guards, and checksum/UUID patterns. |
| testing/codex-voice-manifest-schema.md | New markdown test-contract document for the manifest schema; describes functional behavior, unit tests, and expected test commands. Documentation only, no issues. |
| web/content/docs/concepts/voice-artifact-contracts.mdx | Documentation update linking the new manifest schema and clarifying that the Go validator enforces unique artifact keys beyond what JSON Schema's uniqueItems covers. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
M[Voice Artifact Manifest] --> SV["schema_version == '2026-05-13'"]
M --> RI["run_id (UUID)"]
M --> RAI["run_agent_id (UUID)"]
M --> VSI["voice_session_id (minLength: 1)"]
M --> A["artifacts array (minItems: 1, uniqueItems: true)"]
M --> META["metadata (optional)"]
A --> CONTAINS["allOf: contains checks"]
CONTAINS --> C1["caller_audio ✓"]
CONTAINS --> C2["agent_audio ✓"]
CONTAINS --> C3["transcript_json ✓"]
CONTAINS --> C4["waveform_timeline_json ✓"]
CONTAINS --> C5["structured_output_json ✓"]
A --> AREF["artifact_ref"]
AREF --> KEY["key (minLength: 1)"]
AREF --> KIND["kind (enum: 11 values)"]
AREF --> LOC["location"]
AREF --> CHKSUM["checksum_sha256 (^[0-9a-f]{64}$)"]
LOC --> LP["local_path"]
LOC --> OS["object_storage"]
LP --> LPREQ["requires: path (relative, no traversal)"]
LP --> LPFORBID["must NOT have: bucket, object_key"]
OS --> OSREQ["requires: bucket + object_key"]
OS --> OSFORBID["must NOT have: path"]
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
backend/internal/voiceartifacts/schema_test.go:254-258
**Unused `validate` field is dead code**
The `validate func(t *testing.T, manifest map[string]any)` field is declared in the struct but none of the eight test cases ever assign it. The `if tt.validate != nil` guard on line 316 is therefore unreachable. The existing rejection-test helper (`TestVoiceReportSchemasRejectInvalidExamples`) uses a simpler `errSubstr string` pattern, which is both populated and checked. Either remove the field or populate it in at least one case to keep the code meaningful.
### Issue 2 of 2
backend/internal/voiceartifacts/schema_test.go:322-325
**Unconditional schema-filename assertion diverges from the established pattern and may mask real failures**
Every rejection case now hard-requires `err.Error()` to contain `"voice-artifact-manifest.schema.json"`. The existing `TestVoiceReportSchemasRejectInvalidExamples` uses a per-case optional `errSubstr` field that is only checked `if tt.errSubstr != ""`, so the substring check can be enabled selectively. If the `jsonschema` library changes how it formats its error strings (e.g., during a library upgrade), *all* eight rejection sub-tests would fail with `"error … does not mention manifest schema"` even though each invalid document was correctly rejected — the misleading failure would make a library upgrade look like a schema regression.
Reviews (1): Last reviewed commit: "fix: align voice manifest schema preflig..." | Re-trigger Greptile
| tests := []struct { | ||
| name string | ||
| mutate func(map[string]any) | ||
| validate func(t *testing.T, manifest map[string]any) | ||
| }{ |
There was a problem hiding this comment.
Unused
validate field is dead code
The validate func(t *testing.T, manifest map[string]any) field is declared in the struct but none of the eight test cases ever assign it. The if tt.validate != nil guard on line 316 is therefore unreachable. The existing rejection-test helper (TestVoiceReportSchemasRejectInvalidExamples) uses a simpler errSubstr string pattern, which is both populated and checked. Either remove the field or populate it in at least one case to keep the code meaningful.
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/internal/voiceartifacts/schema_test.go
Line: 254-258
Comment:
**Unused `validate` field is dead code**
The `validate func(t *testing.T, manifest map[string]any)` field is declared in the struct but none of the eight test cases ever assign it. The `if tt.validate != nil` guard on line 316 is therefore unreachable. The existing rejection-test helper (`TestVoiceReportSchemasRejectInvalidExamples`) uses a simpler `errSubstr string` pattern, which is both populated and checked. Either remove the field or populate it in at least one case to keep the code meaningful.
How can I resolve this? If you propose a fix, please make it concise.| if tt.validate != nil { | ||
| tt.validate(t, manifest) | ||
| } | ||
| err := schema.Validate(asJSONDocument(t, manifest)) |
There was a problem hiding this comment.
Unconditional schema-filename assertion diverges from the established pattern and may mask real failures
Every rejection case now hard-requires err.Error() to contain "voice-artifact-manifest.schema.json". The existing TestVoiceReportSchemasRejectInvalidExamples uses a per-case optional errSubstr field that is only checked if tt.errSubstr != "", so the substring check can be enabled selectively. If the jsonschema library changes how it formats its error strings (e.g., during a library upgrade), all eight rejection sub-tests would fail with "error … does not mention manifest schema" even though each invalid document was correctly rejected — the misleading failure would make a library upgrade look like a schema regression.
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/internal/voiceartifacts/schema_test.go
Line: 322-325
Comment:
**Unconditional schema-filename assertion diverges from the established pattern and may mask real failures**
Every rejection case now hard-requires `err.Error()` to contain `"voice-artifact-manifest.schema.json"`. The existing `TestVoiceReportSchemasRejectInvalidExamples` uses a per-case optional `errSubstr` field that is only checked `if tt.errSubstr != ""`, so the substring check can be enabled selectively. If the `jsonschema` library changes how it formats its error strings (e.g., during a library upgrade), *all* eight rejection sub-tests would fail with `"error … does not mention manifest schema"` even though each invalid document was correctly rejected — the misleading failure would make a library upgrade look like a schema regression.
How can I resolve this? If you propose a fix, please make it concise.
Closes #811\n\n## Summary\n- adds a producer-neutral JSON Schema for voice artifact manifests\n- validates manifest examples and common malformed references in backend tests\n- links the manifest schema from the generic voice artifact contract docs\n\n## Testing\n- cd backend && go test ./internal/voiceartifacts\n\n## Review checkpoint\n- testing/codex-voice-manifest-schema.md\n\nThis stays generic to AgentClash voice evals. Voicey is not a canonical schema surface here; product-specific details belong in metadata or raw traces.