
The 4-Judge Quality Council

Harery edited this page May 30, 2026 · 1 revision

A major bottleneck for LLM-driven engineering tools is the trust barrier: LLM-based code review tools often hallucinate issues, claim that code exists when it does not, or generate incorrect configurations.

Praetor addresses this with a Quality Council (QC) layer of four specialized judges. Every artifact generated by the Tier 2, 3, and 4 agents must pass the Quality Council before it can be emitted.


🏛️ The Four Judges and Their Mandates

1. Judge 1: Coverage Completeness

  • Focus: Are we auditing the right things?
  • Mandate: This judge compares the emitted artifacts against the codebase domain structures mapped in Phase 2 (Domain Mapping). If an agent skipped a critical payment endpoint or database transaction check without explicit justification, the artifact is rejected.
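A minimal sketch of the coverage check in Python, assuming the Phase 2 domain map can be reduced to a dictionary that flags each domain identifier (an endpoint, a transaction path) as critical or not. `judge_coverage`, `CoverageResult`, and the shape of `justified_skips` are hypothetical illustrations, not Praetor's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageResult:
    passed: bool
    missing: list[str] = field(default_factory=list)


def judge_coverage(domain_map: dict[str, bool],
                   audited: set[str],
                   justified_skips: dict[str, str]) -> CoverageResult:
    """Reject if any critical domain was neither audited nor skipped with a reason.

    domain_map maps a (hypothetical) domain identifier to whether it is critical.
    justified_skips maps a skipped domain to the agent's stated justification.
    """
    missing = sorted(
        name
        for name, critical in domain_map.items()
        if critical
        and name not in audited
        # A skip only counts if its justification is non-empty.
        and not justified_skips.get(name, "").strip()
    )
    return CoverageResult(passed=not missing, missing=missing)
```

For example, `judge_coverage({"POST /payments": True, "db.tx.transfer": True}, {"POST /payments"}, {})` fails and lists `db.tx.transfer` as missing, because the transaction path was neither audited nor explicitly justified.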

2. Judge 2: Citation Verification (100% Correctness)

  • Focus: Is the evidence real?
  • Mandate: The Citation Judge is the main safeguard against hallucination. For every code observation, vulnerability finding, or SRE hook, the judge checks the referenced file:line citation:
    • If the file path is incorrect, the artifact is rejected.
    • If the line range points to comments or irrelevant code, the artifact is rejected.
    • If the citation does not exist in the source-resolution cache, the artifact is rejected.
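The three rejection rules can be sketched as a single function. This assumes citations follow the `payment_gateway.go:L54` form used in the QC report example (with ranges written as `file.go:L54-L60`) and models the source-resolution cache as a dictionary from path to file lines. `verify_citation` and the comment heuristic are hypothetical; deciding whether cited code is *relevant* to the claim is not something this sketch attempts.

```python
import re

# Hypothetical citation syntax modeled on "payment_gateway.go:L54";
# ranges are written as "file.go:L54-L60".
CITATION_RE = re.compile(r"^(?P<path>[^:]+):L(?P<start>\d+)(?:-L(?P<end>\d+))?$")

# Rough comment detection; a real judge would need a language-aware parser.
COMMENT_PREFIXES = ("//", "#", "/*")


def verify_citation(citation: str,
                    source_cache: dict[str, list[str]]) -> tuple[bool, str]:
    """Return (passed, reason) for a single file:line citation.

    source_cache maps a repository-relative path to its lines (index 0 = line 1).
    """
    match = CITATION_RE.match(citation)
    if not match:
        return False, f"malformed citation {citation!r}"
    path = match["path"]
    start = int(match["start"])
    end = int(match["end"] or start)  # single-line citation if no range given
    lines = source_cache.get(path)
    if lines is None:
        return False, f"{path} is not in the source-resolution cache"
    if start < 1 or end < start or end > len(lines):
        return False, f"{citation} is outside {path} ({len(lines)} lines)"
    cited = lines[start - 1:end]
    # Reject ranges that contain nothing but comments or blank lines.
    if all(not ln.strip() or ln.strip().startswith(COMMENT_PREFIXES) for ln in cited):
        return False, f"{citation} points only at comments or blank lines"
    return True, "ok"
```

Treating an unknown path and a path missing from the cache as the same failure keeps the judge strict: anything that cannot be resolved against real source is rejected.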

3. Judge 3: Clarity & Readability

  • Focus: Is this actionable for humans?
  • Mandate: This judge reviews formatting, layout, tone, and structure. It ensures that operational runbooks are readable by an SRE at 3:00 AM, that compliance evidence reads as clean control mappings for external auditors, and that UAT scripts are step-by-step and executable.
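Most of this judge's work is a judgment call, but a few requirements can be pre-checked mechanically. A minimal sketch, assuming UAT scripts are plain text with one numbered step per line; `uat_steps_are_sequential` is a hypothetical helper that covers only the "step-by-step" requirement, not tone, layout, or whether each step is actually executable.

```python
import re

STEP_RE = re.compile(r"^(\d+)\.\s+\S")


def uat_steps_are_sequential(script: str) -> bool:
    """Check that every non-blank line is a numbered step: 1., 2., 3., ..."""
    expected = 1
    for line in script.splitlines():
        if not line.strip():
            continue  # blank lines between steps are allowed
        match = STEP_RE.match(line.strip())
        if not match or int(match.group(1)) != expected:
            return False  # free-form prose or a skipped/out-of-order number
        expected += 1
    return expected > 1  # an empty script is not a valid UAT script
```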

4. Judge 4: Skip-Validity Review

  • Focus: Are agent omissions justified?
  • Mandate: During analysis, agents may defer or skip checks (e.g., skipping WCAG audits for an API-only service). This judge audits every deferred-list declaration. If an agent is found to be "taking shortcuts" rather than analyzing the code, the task is sent back for re-execution.
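A minimal sketch of skip validation, assuming each deferral carries the skipped check's name and the agent's reason, and that each skippable check has an applicability rule over a service profile. The `APPLICABILITY` table, the profile keys (`has_ui`, `uses_sql`), and `invalid_deferrals` are hypothetical; the WCAG rule mirrors the API-only example above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Deferral:
    check: str   # e.g. "wcag_audit"
    reason: str  # the agent's stated justification


# Hypothetical applicability rules: a check may only be skipped when its rule
# says it does not apply to the service profile.
APPLICABILITY: dict[str, Callable[[dict], bool]] = {
    "wcag_audit": lambda profile: profile.get("has_ui", False),
    "sql_injection_scan": lambda profile: profile.get("uses_sql", False),
}


def invalid_deferrals(deferred: list[Deferral], profile: dict) -> list[str]:
    """Return the checks whose skip is unjustified and must be re-executed."""
    rejected = []
    for item in deferred:
        rule = APPLICABILITY.get(item.check)
        if not item.reason.strip():
            rejected.append(item.check)  # no justification given
        elif rule is None or rule(profile):
            rejected.append(item.check)  # unknown check, or the check does apply
    return rejected
```

Rejecting deferrals for checks that have no applicability rule is a deliberately conservative choice: an agent cannot escape review by skipping a check the judge does not know how to evaluate.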

🔄 The Feedback Loop

If any of the four judges rejects an artifact:

  1. The Quality Council compiles a QC Review Report outlining the failure (e.g., “Judge 2: Citation in payment_gateway.go:L54 does not match authentication code”).
  2. The orchestrator resets the agent state and passes the review feedback back to the agent.
  3. The agent regenerates the artifact to address the reported failures.
  4. The council re-audits the artifact. Only after all four judges mark the artifact as PASSED does Praetor proceed to final gate generation.
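The loop can be sketched as follows, assuming each judge is a function returning a pass/fail flag plus a note, and the agent is a function from the latest QC feedback to a regenerated artifact. `run_quality_council` and the `max_rounds` cap are hypothetical: this page does not say whether Praetor bounds the number of retries.

```python
from typing import Callable

Judge = Callable[[str], tuple[bool, str]]  # artifact -> (passed, feedback note)
Agent = Callable[[list[str]], str]         # QC feedback -> regenerated artifact


def run_quality_council(agent: Agent, judges: dict[str, Judge],
                        max_rounds: int = 3) -> tuple[str, int]:
    """Regenerate the artifact until every judge passes it.

    Returns (artifact, rounds_used); raises RuntimeError once max_rounds
    (a hypothetical cap) is exhausted.
    """
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        # The agent starts from a reset state and sees only the latest report.
        artifact = agent(feedback)
        # Every judge audits the artifact; failures form the QC Review Report.
        report = []
        for name, judge in judges.items():
            passed, note = judge(artifact)
            if not passed:
                report.append(f"{name}: {note}")
        if not report:
            return artifact, round_no  # all judges marked it PASSED
        feedback = report  # passed back to the agent on the next round
    raise RuntimeError("QC failed after max rounds: " + "; ".join(feedback))
```

Passing only the most recent report, rather than the full history, matches the reset-then-feedback step above; an implementation could equally accumulate earlier reports if agents tend to reintroduce fixed failures.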
