feat(framework): Phase 3 PR 1 — audit prompt templates + output schema #85
Merged
First of 6 PRs implementing Phase 3 (multi-model external audit) plus the open frictions F2/F5/F7. Framework-only: no CLI code yet.

Artifacts (all under `dist/.devtrail/`, auto-distributed via the existing recursive manifest pattern):
- audit-prompts/auditor-primary.md
- audit-prompts/auditor-secondary.md
- audit-prompts/calibrator-reconciler.md
- schemas/audit-output.schema.v0.json

Architectural decision A1 (per the Phase 3 plan): Phase 3 v0 is ORCHESTRATION-ONLY, not an HTTP-API client. The CLI prepares and persists prompts, awaits the operator's responses, validates outputs against the schema, and integrates findings into the Charter telemetry, but it does NOT invoke any LLM API directly. Adopters paste the resolved prompts into their auditor of choice (Copilot, Gemini, Claude, etc.), save the responses to the canonical paths, and the CLI consolidates them.

Rationale for orchestration-only:
- Implementing 3 HTTP clients (OpenAI / Google / Anthropic) is 1-2 weeks of work plus perpetual maintenance whenever the APIs change. For an EXPERIMENTAL v0 schema, that investment is premature.
- Sentinel's empirical pattern (the 6-cycle dual-audit experiment that motivated Phase 3) ALREADY uses this human-in-the-loop shape via /plan-audit skills. The CLI's value-add is the canon (prompt shape + output schema + telemetry integration), not the API call.
- Closes RFC #82 (audit visibility) by design: the resolved prompt and the auditor's response are both files on disk, version-controlled, inspectable, and reproducible by hand if an external auditor call fails.
- Aligns with principle #10 (honesty about what the tool does NOT do): "no LLM gateway, no model evaluation".

Schema design:
- `audit-output.schema.v0.json` uses `oneOf` to distinguish auditor outputs (primary/secondary, fresh findings) from calibrator outputs (reconciliation across the two). The `audit_role` field is the discriminator: three fixed roles, not an arbitrary N.
- The `findings_by_category` enum (hallucination | implementation_gap | real_debt | false_positive) is the same vocabulary used by the `external_audit` array in `charter-telemetry.schema.v0.json`, so the audit cycle output integrates directly into the Charter telemetry at close.
- Every output declares `prompt_used: <relative path>`, satisfying RFC #82's requirement that the prompt path be discoverable from the output.

Prompt design:
- Primary and secondary prompts are STRUCTURALLY IDENTICAL. The heterogeneity signal lives in the auditor MODEL (a different family per §5.2), not in different prompts. A/B-testing prompt phrasings is forward-looking; v0 keeps them symmetric for clean comparability.
- The calibrator prompt takes both auditor outputs as context and asks for a status assignment (agreed | disputed | unique_primary | unique_secondary | rejected) per finding. Status counts are cross-checked against the body section count; the schema enforces consistency.
- All three prompts include explicit categorization rules plus discipline rules ("don't fabricate findings", "no external sources beyond the prompt"). The rules are duplicated across the three so the auditor doesn't need to consult external documentation.

What's NOT in this PR:
- No CLI code yet: the `devtrail charter audit` command lands in PR 2.
- No heterogeneity validation (`--implementer-family` enforcement): deferred to v1.
- No invocation of LLM APIs: orchestration-only by design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
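The status cross-check described in the calibrator bullet can be sketched in Rust, the CLI's language. This is a minimal illustration, not the shipped validator: it assumes the calibrator's frontmatter declares a count per status and that each finding section in the body carries exactly one status string. `CalibratorStatus` and `check_status_counts` are hypothetical names.

```rust
use std::collections::HashMap;

/// The five calibrator statuses named in the prompt design.
/// (Hypothetical Rust mirror; the authoritative values live in the schema.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum CalibratorStatus {
    Agreed,
    Disputed,
    UniquePrimary,
    UniqueSecondary,
    Rejected,
}

impl CalibratorStatus {
    fn parse(s: &str) -> Option<Self> {
        match s {
            "agreed" => Some(Self::Agreed),
            "disputed" => Some(Self::Disputed),
            "unique_primary" => Some(Self::UniquePrimary),
            "unique_secondary" => Some(Self::UniqueSecondary),
            "rejected" => Some(Self::Rejected),
            _ => None,
        }
    }
}

/// Checks that the per-status counts declared in the frontmatter match the
/// statuses actually assigned to the finding sections in the body.
fn check_status_counts(
    declared: &HashMap<CalibratorStatus, usize>,
    body_statuses: &[&str],
) -> Result<(), String> {
    // Tally the statuses observed in the body, rejecting unknown strings.
    let mut observed: HashMap<CalibratorStatus, usize> = HashMap::new();
    for raw in body_statuses {
        let status = CalibratorStatus::parse(raw)
            .ok_or_else(|| format!("unknown status: {raw}"))?;
        *observed.entry(status).or_insert(0) += 1;
    }
    // The declared total must equal the number of finding sections.
    let declared_total: usize = declared.values().sum();
    if declared_total != body_statuses.len() {
        return Err(format!(
            "declared {declared_total} findings, body has {} sections",
            body_statuses.len()
        ));
    }
    // Every declared status count must match what the body contains.
    for (status, &count) in declared {
        let seen = observed.get(status).copied().unwrap_or(0);
        if seen != count {
            return Err(format!("{status:?}: declared {count}, body has {seen}"));
        }
    }
    Ok(())
}
```

Because the totals must agree and every declared status must match its observed count, a response that silently drops, adds, or relabels a finding fails the check.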
montfort added a commit that referenced this pull request on May 3, 2026:
…n) (#86)

Second of 6 PRs implementing Phase 3 + open frictions. Adds the CLI command that orchestrates the dual-audit + calibrator cycle, using the prompt templates and output schema shipped in PR 1 (#85).

Architecture A1 (orchestration-only) means the CLI does NOT invoke LLM APIs. The operator pastes resolved prompts into their auditor of choice (Copilot, Gemini, Claude, etc.) and saves the responses to canonical paths under audit/charters/<CHARTER-ID>/. The CLI's value is structure (prompt resolution + output schema validation + telemetry-ready YAML), not invocation.

Three steps, each invokable independently:
- `devtrail charter audit CHARTER-01` (Step 1/3: PREPARE): resolves auditor-primary.prompt.md and auditor-secondary.prompt.md against the Charter content + git diff + originating AILOGs and writes them to audit/charters/CHARTER-01/prompts/.
- `devtrail charter audit CHARTER-01 --calibrate` (Step 2/3: CALIBRATE): validates the two auditor responses against audit-output.schema.v0.json and resolves the calibrator-reconciler prompt with both responses embedded as context.
- `devtrail charter audit CHARTER-01 --finalize` (Step 3/3: FINALIZE): validates all 3 outputs (auditor-primary + auditor-secondary + calibrator), prints a YAML-formatted external_audit array block ready to paste into the Charter telemetry, and points to the calibrator's reconciliation summary for outcome.scope_change_notes.

Each step is a filesystem mutation. Files persist between steps: the operator can run prepare, walk away, come back days later, and run calibrate. Each step prints clear next-action guidance pointing to the exact paths involved.

Per RFC #82, the resolved prompt is persisted BEFORE any external action. The schema's prompt_used field cites which prompt template was used, so the calibrator can verify provenance.

Module shape:
- src/audit_schema.rs: jsonschema wrapper with oneOf-aware error formatting, mirroring telemetry_schema.rs and charter_schema.rs.
- src/commands/charter/audit.rs: 3-step run dispatch, template resolution with placeholder substitution, frontmatter parsing for auditor summaries, and external_audit YAML rendering.

Placeholders supported in templates: `{{charter_id}}`, `{{charter_title}}`, `{{charter_path}}`, `{{charter_content}}`, `{{git_range}}`, `{{git_diff}}`, `{{ailog_paths}}`, `{{ailog_contents}}`, `{{audit_role}}`, `{{schema_path}}`, `{{auditor_primary_findings}}`, `{{auditor_secondary_findings}}`. Unknown placeholders are left as literals (no surprise mutations).

Tests:
- 5 unit tests in src/audit_schema.rs (auditor vs calibrator oneOf discriminator, charter_id pattern, auditors_reconciled minItems).
- 5 unit tests in src/commands/charter/audit.rs (canonical_id, template substitution, frontmatter parsing, AuditorSummary).
- 7 integration tests in cli/tests/charter_audit_test.rs covering all three steps plus error paths (devtrail-not-installed, unknown charter, calibrate-without-auditor-outputs, schema validation failure, full cycle, mutually-exclusive flags).

400/400 tests pass.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
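The placeholder rule (substitute known `{{name}}` tokens, leave unknown ones as literals) can be illustrated with a small std-only Rust sketch. `resolve_template` is a hypothetical name, not the function in src/commands/charter/audit.rs, and the single-pass scan is an assumption about one way to get "no surprise mutations".

```rust
use std::collections::HashMap;

/// Replaces `{{name}}` placeholders with values from `vars` in a single pass.
/// Placeholders with no value, and an unterminated `{{`, are copied through
/// verbatim, so unknown names stay visible as literals in the output.
fn resolve_template(template: &str, vars: &HashMap<&str, String>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        // Copy the text before the placeholder unchanged.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        match after_open.find("}}") {
            Some(end) => {
                let name = &after_open[..end];
                match vars.get(name) {
                    Some(value) => out.push_str(value),
                    // Unknown placeholder: keep the literal `{{name}}`.
                    None => {
                        out.push_str("{{");
                        out.push_str(name);
                        out.push_str("}}");
                    }
                }
                rest = &after_open[end + 2..];
            }
            None => {
                // No closing braces: copy the remainder unchanged and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

Scanning once means a substituted value that itself contains `{{...}}` (for example a Charter body that quotes a template) is not expanded a second time.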
Summary
First of 6 PRs implementing Phase 3 (multi-model external audit) + the open frictions F2/F5/F7. Framework-only — no CLI code yet.
What's added
- `dist/.devtrail/audit-prompts/auditor-primary.md`: prompt template for the primary auditor.
- `dist/.devtrail/audit-prompts/auditor-secondary.md`: prompt template for the secondary auditor (different model family).
- `dist/.devtrail/audit-prompts/calibrator-reconciler.md`: prompt template for the third-tier calibrator that reconciles the two auditor outputs.
- `dist/.devtrail/schemas/audit-output.schema.v0.json`: JSON Schema Draft 2020-12 with a `oneOf` discriminator on `audit_role` (auditor outputs vs. calibrator output).

Architectural decision A1: orchestration-only
Phase 3 v0 is orchestration-only, not an HTTP-API client. The CLI prepares and persists prompts, awaits the operator's responses, validates outputs against the schema, integrates findings into the Charter telemetry — but does not invoke any LLM API directly.
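Under A1 the CLI never waits on a network call; "awaiting the operator's responses" reduces to checking which files exist. A minimal Rust sketch of that idea follows. The file names and the `NextAction` states are hypothetical, and the function is kept pure (it takes a set of relative paths) so it can be tested without touching the filesystem.

```rust
use std::collections::HashSet;

/// What the operator should do next, derived only from which files exist.
#[derive(Debug, PartialEq, Eq)]
enum NextAction {
    Prepare,
    PasteAuditorPrompts,
    Calibrate,
    PasteCalibratorPrompt,
    Finalize,
}

// Hypothetical file names, relative to audit/charters/<CHARTER-ID>/.
const PRIMARY_PROMPT: &str = "prompts/auditor-primary.prompt.md";
const SECONDARY_PROMPT: &str = "prompts/auditor-secondary.prompt.md";
const PRIMARY_OUTPUT: &str = "auditor-primary.md";
const SECONDARY_OUTPUT: &str = "auditor-secondary.md";
const CALIBRATOR_PROMPT: &str = "prompts/calibrator-reconciler.prompt.md";
const CALIBRATOR_OUTPUT: &str = "calibrator.md";

/// Walks the cycle in order and returns the first step whose inputs exist
/// but whose outputs do not. No step ever calls an LLM API.
fn next_action(existing: &HashSet<&str>) -> NextAction {
    let has = |p: &str| existing.contains(p);
    if !(has(PRIMARY_PROMPT) && has(SECONDARY_PROMPT)) {
        NextAction::Prepare
    } else if !(has(PRIMARY_OUTPUT) && has(SECONDARY_OUTPUT)) {
        // Prompts are resolved; the operator still has to run the auditors.
        NextAction::PasteAuditorPrompts
    } else if !has(CALIBRATOR_PROMPT) {
        NextAction::Calibrate
    } else if !has(CALIBRATOR_OUTPUT) {
        NextAction::PasteCalibratorPrompt
    } else {
        NextAction::Finalize
    }
}
```

A caller would build `existing` from a directory listing (for example with `std::fs::read_dir`) and print the guidance for the returned state, which is what lets the operator stop between steps and resume days later.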
Rationale:
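- Sentinel's empirical pattern (the 6-cycle dual-audit experiment that motivated Phase 3) already uses this human-in-the-loop shape via `/plan-audit` skills. The CLI's value-add is the canon (prompt shape + output schema + telemetry integration), not the API call.

Schema design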
- `oneOf` discriminator on `audit_role`: three fixed roles, not an arbitrary N.
- The `findings_by_category` enum (hallucination | implementation_gap | real_debt | false_positive) is the same vocabulary used by `external_audit` in `charter-telemetry.schema.v0.json`. The audit cycle output integrates directly into Charter telemetry at close.
- Every output declares `prompt_used: <relative path>`, satisfying RFC #82's requirement that the prompt path be discoverable from the output.

Prompt design
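- The calibrator prompt takes both auditor outputs as context and asks for a status assignment (agreed | disputed | unique_primary | unique_secondary | rejected) per finding. Status counts are cross-checked against the body section count.

Test plan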
- … `json` module).
- No `dist-manifest.yml` change needed: `.devtrail/` is already declared recursively.

🤖 Generated with Claude Code
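The schema design's two fixed vocabularies (three `audit_role` values that select a `oneOf` branch, and the four `findings_by_category` keys shared with `external_audit`) can be mirrored in Rust for illustration. This sketch assumes role strings such as `auditor-primary` and `calibrator` and assumes `findings_by_category` holds per-category counts; neither detail is confirmed by this PR description, and `AuditRole`, `SchemaBranch`, and `FindingsByCategory` are hypothetical names.

```rust
/// The three fixed roles the schema's `oneOf` distinguishes.
/// String values below are assumptions, not copied from the schema.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AuditRole {
    AuditorPrimary,
    AuditorSecondary,
    Calibrator,
}

/// Which `oneOf` branch an output must satisfy.
#[derive(Debug, PartialEq, Eq)]
enum SchemaBranch {
    Auditor,
    Calibrator,
}

impl AuditRole {
    fn parse(s: &str) -> Result<Self, String> {
        match s {
            "auditor-primary" => Ok(Self::AuditorPrimary),
            "auditor-secondary" => Ok(Self::AuditorSecondary),
            "calibrator" => Ok(Self::Calibrator),
            // A closed set: any other role is rejected, never added.
            other => Err(format!("unknown audit_role: {other:?}")),
        }
    }

    /// Both auditors share one branch (fresh findings); the calibrator has
    /// its own (reconciliation across the two).
    fn branch(self) -> SchemaBranch {
        match self {
            Self::AuditorPrimary | Self::AuditorSecondary => SchemaBranch::Auditor,
            Self::Calibrator => SchemaBranch::Calibrator,
        }
    }
}

/// Per-category counts using the vocabulary shared with `external_audit`.
#[derive(Debug, Default, PartialEq, Eq)]
struct FindingsByCategory {
    hallucination: u32,
    implementation_gap: u32,
    real_debt: u32,
    false_positive: u32,
}

impl FindingsByCategory {
    /// Tallies one category string per finding, rejecting unknown categories.
    fn tally(categories: &[&str]) -> Result<Self, String> {
        let mut counts = Self::default();
        for c in categories {
            match *c {
                "hallucination" => counts.hallucination += 1,
                "implementation_gap" => counts.implementation_gap += 1,
                "real_debt" => counts.real_debt += 1,
                "false_positive" => counts.false_positive += 1,
                other => return Err(format!("unknown category: {other:?}")),
            }
        }
        Ok(counts)
    }
}
```

Keeping both sets closed is what lets an auditor's tally drop straight into the Charter telemetry without a translation table.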