An open-source, high-integrity audit layer that sits downstream of any LLM-as-Judge pipeline. It catches sycophancy, hallucination, and reasoning failures before evaluated outputs reach downstream consumers.
LLM-as-Judge pipelines are increasingly used to evaluate AI outputs in safety-critical domains. But the evaluating model can tell the evaluated model what it wants to hear rather than what is true. There is currently no standard infrastructure for auditing the judge's own work.
The Correspondence Auditor provides that infrastructure.
The audit engine runs every judged output through a Three-Stage Gauntlet:
| Gate | Name | What It Does | LLM? |
|---|---|---|---|
| Gate 1 | Sanity Check | Deterministic structural validation — required fields, score ranges, forbidden terms | No |
| Gate 2 | The Librarian | Semantic truth verification — every claim is checked against provided source text, producing tripartite verdicts: SUPPORTED / CONTRADICTED / UNSUPPORTED |
Yes |
| Gate 3 | The Logic Engine | Logical coherence audit — uses Gate 2's results as context to catch internal contradictions, category errors, and reasoning failures | Yes |
Gates 2 and 3 are LLM-powered but separated by duty and grounded against provided evidence, not against the model's own beliefs. This is a fundamentally different approach to the "who watches the watchmen" problem.
The auditor generates a standalone HTML dashboard to visualize run telemetry, failure rates, and the LLM's deep reasoning traces.
(To view the live demo, click the image above. The dashboard runs entirely in the browser with no backend required).
Developers integrating this pipeline usually want to see exactly what the auditor catches. Here is a real example of the pipeline catching a hallucinated penalty.
1. The LLM Judge's Output (Input to Auditor) Your initial LLM judge evaluates a vendor and penalizes them for supposedly lacking a feature:
{
"factor": "Missing Independent Recovery Mechanism",
"polarity": -1,
"reason_headline": "Structural Risk Gap"
}2. The Ground Truth Source Material The auditor checks the judge's claim against the actual text provided to the judge:
"Entra ID provides the independent recovery paths and architectural safeguards required for fiduciary peace of mind."
3. The Quarantine Trace (Gate 2 Output) Gate 2 catches the contradiction. It fails the run, prevents the output from moving downstream, and generates a strict reasoning trace for the human auditor:
{
"status": "FAIL",
"failures": [
{
"claim": "Missing Independent Recovery Mechanism",
"status": "CONTRADICTED",
"source": "Main Body",
"quote": "Entra ID provides the independent recovery paths and architectural safeguards required for fiduciary peace of mind."
}
],
"model_thinking": "Text explicitly states 'independent recovery paths' in main body ('Entra ID provides the independent recovery paths...'), contradicting the 'missing' claim."
}The Correspondence Auditor is designed with segregation of duties in mind. While it comes with a CLI (run_audit.py) for asynchronous batch processing of files, the core engine is fully modular.
You can import the individual gates directly into your existing Python application and pass them standard dictionaries, entirely bypassing the file system:
from steps.audit_engine import run_gate_1_sanity, run_gate_2_facts, run_gate_3_logic
# Pass your in-memory JSON objects directly to the gates
sanity_result = run_gate_1_sanity(llm_output_dict)Fail-closed, not fail-open. Infrastructure errors produce ERROR states rather than silent passes. No output reaches downstream consumers without clearing all three gates.
High recall, false-positive bias. The system is tuned to minimise the risk of an undetected sycophantic or hallucinated error slipping through. Borderline cases are flagged, not passed.
Quarantine with reasoning traces. Failed outputs are quarantined with full reasoning traces for human review — not just a pass/fail label, but the auditor's working shown in full.
Backend-agnostic. Gates 2 and 3 support configurable LLM backends (local Ollama, OpenRouter, or Cerebras), so you can run the audit pipeline entirely on-premises if your threat model requires it.
├── run_audit.py # CLI entry point
├── steps/
│ └── audit_engine.py # Core 3-gate audit logic
├── prompts/
│ ├── P10_fact_checker_v3.1.json # Gate 2 prompt manifest
│ └── P11_logic_auditor_v3.0.json # Gate 3 prompt manifest
├── shared/
│ ├── llm_utils.py # Prompt loading, LLM dispatch, retry logic
│ ├── string_utils.py # Robust JSON extraction from LLM output
│ ├── ui_utils.py # Terminal formatting
│ ├── logging_utils.py # Telemetry logging (stubbed for standalone use)
│ └── api_clients/
│ ├── ollama_client.py # Local Ollama backend
│ ├── openrouter_client.py # OpenRouter API backend
│ └── cerebras_client.py # Cerebras API backend
├── requirements.txt
└── output/ # Place outputs to audit here
pip install -r requirements.txtConfigure API keys in a .env file at the project root:
OPENROUTER_API_KEY=sk-or-...
CEREBRAS_API_KEY_FREE=csk-...
CEREBRAS_API_KEY_PAID=csk-...
For local inference, ensure Ollama is running with the model specified in the prompt manifests.
python run_audit.pyThe CLI will prompt you to select an output folder and execution target, then offer options to run audits, view results in an interactive HTML dashboard, or archive failures.
Each audit run folder should contain:
- JSON outputs from your LLM-as-Judge pipeline (the evaluated outputs to audit)
- Source material (JSON or YAML) containing the evidence the judge was supposed to evaluate against
- Optionally, a
_provenance/manifest.jsonpointing to the source material
The auditor is domain-agnostic — it validates any LLM-as-Judge output against any provided source text. The current reference implementation audits cognitive simulation outputs, but the architecture applies wherever an LLM is judging another LLM's work.
Working production code with 16 months of deployment data. Currently being transitioned to community-owned infrastructure under AGPLv3.
AGPL-3.0 — ensuring this tool remains a public good and cannot disappear behind a paywall.
Adrian St. Vaughan
- LinkedIn: Adrian St. Vaughan
