Correspondence Auditor — Three-Stage Gauntlet for LLM-as-Judge Pipelines

An open-source, high-integrity audit layer that sits downstream of any LLM-as-Judge pipeline. It catches sycophancy, hallucination, and reasoning failures before evaluated outputs reach downstream consumers.

The Problem

LLM-as-Judge pipelines are increasingly used to evaluate AI outputs in safety-critical domains. But the judge model can be sycophantic, telling the evaluated model what it wants to hear rather than what is true, and there is currently no standard infrastructure for auditing the judge's own work.

The Correspondence Auditor provides that infrastructure.

How It Works

The audit engine runs every judged output through a Three-Stage Gauntlet:

| Gate | Name | What It Does | LLM? |
|------|------|--------------|------|
| Gate 1 | Sanity Check | Deterministic structural validation: required fields, score ranges, forbidden terms | No |
| Gate 2 | The Librarian | Semantic truth verification: every claim is checked against the provided source text, producing tripartite verdicts: `SUPPORTED` / `CONTRADICTED` / `UNSUPPORTED` | Yes |
| Gate 3 | The Logic Engine | Logical coherence audit: uses Gate 2's results as context to catch internal contradictions, category errors, and reasoning failures | Yes |
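
Because Gate 1 is described as purely deterministic, its behaviour can be illustrated without any model call. The following is a minimal sketch, not the project's `run_gate_1_sanity`: the field names come from the example judge output in this README, while the `polarity` range, the forbidden-terms list, and the function name `sanity_check` are assumptions made for illustration.

```python
from typing import Any

# Hypothetical rules, chosen only for illustration.
REQUIRED_FIELDS = ("factor", "polarity", "reason_headline")
POLARITY_RANGE = (-1, 1)
FORBIDDEN_TERMS = ("as an ai language model",)


def sanity_check(output: dict[str, Any]) -> dict[str, Any]:
    """Deterministic structural validation; no LLM is involved."""
    failures: list[str] = []

    # 1. Required fields must be present.
    for field in REQUIRED_FIELDS:
        if field not in output:
            failures.append(f"missing field: {field}")

    # 2. Scores must be numeric and inside the allowed range.
    if "polarity" in output:
        polarity = output["polarity"]
        low, high = POLARITY_RANGE
        if isinstance(polarity, bool) or not isinstance(polarity, (int, float)):
            failures.append("polarity is not numeric")
        elif not low <= polarity <= high:
            failures.append(f"polarity {polarity} outside [{low}, {high}]")

    # 3. No string value may contain a forbidden term (case-insensitive).
    for key, value in output.items():
        if isinstance(value, str):
            lowered = value.lower()
            for term in FORBIDDEN_TERMS:
                if term in lowered:
                    failures.append(f"forbidden term in {key}: {term!r}")

    return {"status": "FAIL" if failures else "PASS", "failures": failures}
```

Collecting every failure instead of stopping at the first one keeps the result useful for a human reviewer, at no cost since nothing here calls a model.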

Gates 2 and 3 are LLM-powered but separated by duty and grounded against provided evidence, not against the model's own beliefs. This is a fundamentally different approach to the "who watches the watchmen" problem.

Interactive Dashboard

The auditor generates a standalone HTML dashboard to visualize run telemetry, failure rates, and the LLM's deep reasoning traces.

The dashboard runs entirely in the browser, with no backend required.

Example Input & Output (The Data Shape)

Developers integrating this pipeline usually want to see exactly what the auditor catches. Here is a real example of the pipeline catching a hallucinated penalty.

1. The LLM Judge's Output (Input to Auditor)

Your initial LLM judge evaluates a vendor and penalizes them for supposedly lacking a feature:

{
  "factor": "Missing Independent Recovery Mechanism",
  "polarity": -1,
  "reason_headline": "Structural Risk Gap"
}

2. The Ground Truth Source Material

The auditor checks the judge's claim against the actual text provided to the judge:

"Entra ID provides the independent recovery paths and architectural safeguards required for fiduciary peace of mind."

3. The Quarantine Trace (Gate 2 Output)

Gate 2 catches the contradiction. It fails the run, prevents the output from moving downstream, and generates a strict reasoning trace for the human auditor:

{
  "status": "FAIL",
  "failures": [
    {
      "claim": "Missing Independent Recovery Mechanism",
      "status": "CONTRADICTED",
      "source": "Main Body",
      "quote": "Entra ID provides the independent recovery paths and architectural safeguards required for fiduciary peace of mind."
    }
  ],
  "model_thinking": "Text explicitly states 'independent recovery paths' in main body ('Entra ID provides the independent recovery paths...'), contradicting the 'missing' claim."
}
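
The trace above pairs a run-level `status` with a list of per-claim verdicts. As a rough sketch of how such verdicts might be rolled up, the function below treats any verdict other than `SUPPORTED` as a failure. That choice is an assumption: it fits the high-recall principle described in this README, but the README does not say how `UNSUPPORTED` claims are handled. The name `summarize_verdicts` is hypothetical.

```python
from typing import Any

# Assumption: which verdicts fail a run is not stated in this README.
# Both non-SUPPORTED verdicts are treated as failures here.
FAILING_VERDICTS = frozenset({"CONTRADICTED", "UNSUPPORTED"})
KNOWN_VERDICTS = FAILING_VERDICTS | {"SUPPORTED"}


def summarize_verdicts(claims: list[dict[str, Any]]) -> dict[str, Any]:
    """Roll per-claim verdicts up into a run-level status."""
    failures = []
    for claim in claims:
        verdict = claim.get("status")
        if verdict not in KNOWN_VERDICTS:
            # An unrecognised verdict is treated as an error, never as a pass.
            return {"status": "ERROR", "failures": [claim]}
        if verdict in FAILING_VERDICTS:
            failures.append(claim)
    return {"status": "FAIL" if failures else "PASS", "failures": failures}
```

Applied to a list that contains the contradicted claim from the example, this returns `"status": "FAIL"` with that claim as the only entry in `failures`.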

Integrating into Existing Pipelines

The Correspondence Auditor is designed with segregation of duties in mind. While it comes with a CLI (run_audit.py) for asynchronous batch processing of files, the core engine is fully modular.

You can import the individual gates directly into your existing Python application and pass them standard dictionaries, entirely bypassing the file system:

from steps.audit_engine import run_gate_1_sanity, run_gate_2_facts, run_gate_3_logic

# Pass your in-memory JSON objects directly to the gates
sanity_result = run_gate_1_sanity(llm_output_dict)
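
Only the Gate 1 call is shown above, and this README does not document the signatures of `run_gate_2_facts` and `run_gate_3_logic`. The sketch below therefore uses stand-in callables to illustrate the ordering described earlier: stop at the first gate that does not pass, and hand Gate 2's result to Gate 3 as context. `run_gauntlet`, the argument shapes, and the stub gates are all hypothetical.

```python
from typing import Any, Callable

Result = dict[str, Any]


def run_gauntlet(
    output: dict[str, Any],
    source_text: str,
    gate_1: Callable[[dict[str, Any]], Result],
    gate_2: Callable[[dict[str, Any], str], Result],
    gate_3: Callable[[dict[str, Any], Result], Result],
) -> Result:
    """Run the gates in order, stopping at the first non-PASS result."""
    r1 = gate_1(output)
    if r1.get("status") != "PASS":
        return {"gate": 1, **r1}

    r2 = gate_2(output, source_text)
    if r2.get("status") != "PASS":
        return {"gate": 2, **r2}

    # Gate 3 receives Gate 2's result as context.
    r3 = gate_3(output, r2)
    return {"gate": 3, **r3}


# Stand-in gates so the sketch runs on its own; swap in the real functions
# once you have confirmed their signatures.
def stub_pass(*_args: Any) -> Result:
    return {"status": "PASS"}


def stub_fail(*_args: Any) -> Result:
    return {"status": "FAIL", "failures": ["example"]}


result = run_gauntlet({"factor": "x"}, "source", stub_pass, stub_fail, stub_pass)
print(result["gate"], result["status"])  # prints: 2 FAIL
```

Passing the gates in as arguments keeps the ordering logic testable without an LLM backend.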

Design Principles

Fail-closed, not fail-open. Infrastructure errors produce ERROR states rather than silent passes. No output reaches downstream consumers without clearing all three gates.

High recall, false-positive bias. The system is tuned to minimise the risk of an undetected sycophantic or hallucinated error slipping through. Borderline cases are flagged, not passed.

Quarantine with reasoning traces. Failed outputs are quarantined with full reasoning traces for human review — not just a pass/fail label, but the auditor's working shown in full.

Backend-agnostic. Gates 2 and 3 support configurable LLM backends (local Ollama, OpenRouter, or Cerebras), so you can run the audit pipeline entirely on-premises if your threat model requires it.
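
The fail-closed principle above can be sketched as a thin wrapper around each gate call. This is an illustration under assumptions, not the project's implementation: the helper names `fail_closed` and `may_proceed` and the `detail` field are hypothetical, and the wrapper assumes a gate returns a dict whose `status` is `PASS` or `FAIL`.

```python
from typing import Any, Callable


def fail_closed(gate: Callable[..., dict[str, Any]], *args: Any) -> dict[str, Any]:
    """Call a gate; turn any exception or malformed result into ERROR."""
    try:
        result = gate(*args)
    except Exception as exc:  # e.g. backend timeout or unparseable model output
        return {"status": "ERROR", "detail": f"{type(exc).__name__}: {exc}"}
    if not isinstance(result, dict) or result.get("status") not in {"PASS", "FAIL"}:
        return {"status": "ERROR", "detail": "gate returned no recognisable status"}
    return result


def may_proceed(results: list[dict[str, Any]]) -> bool:
    """Release downstream only if there are results and every one is PASS."""
    return bool(results) and all(r.get("status") == "PASS" for r in results)
```

Under this sketch an empty result list also blocks release, so a run in which no gate executed cannot be mistaken for a clean pass.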

Project Structure

├── run_audit.py                  # CLI entry point
├── steps/
│   └── audit_engine.py           # Core 3-gate audit logic
├── prompts/
│   ├── P10_fact_checker_v3.1.json   # Gate 2 prompt manifest
│   └── P11_logic_auditor_v3.0.json  # Gate 3 prompt manifest
├── shared/
│   ├── llm_utils.py              # Prompt loading, LLM dispatch, retry logic
│   ├── string_utils.py           # Robust JSON extraction from LLM output
│   ├── ui_utils.py               # Terminal formatting
│   ├── logging_utils.py          # Telemetry logging (stubbed for standalone use)
│   └── api_clients/
│       ├── ollama_client.py      # Local Ollama backend
│       ├── openrouter_client.py  # OpenRouter API backend
│       └── cerebras_client.py    # Cerebras API backend
├── requirements.txt
└── output/                       # Place outputs to audit here

Setup

pip install -r requirements.txt

Configure API keys in a .env file at the project root:

OPENROUTER_API_KEY=sk-or-...
CEREBRAS_API_KEY_FREE=csk-...
CEREBRAS_API_KEY_PAID=csk-...

For local inference, ensure Ollama is running with the model specified in the prompt manifests.

Usage

python run_audit.py

The CLI will prompt you to select an output folder and execution target, then offer options to run audits, view results in an interactive HTML dashboard, or archive failures.

Expected Input

Each audit run folder should contain:

- JSON outputs from your LLM-as-Judge pipeline (the evaluated outputs to audit)
- Source material (JSON or YAML) containing the evidence the judge was supposed to evaluate against
- Optionally, a `_provenance/manifest.json` pointing to the source material

The auditor is domain-agnostic — it validates any LLM-as-Judge output against any provided source text. The current reference implementation audits cognitive simulation outputs, but the architecture applies wherever an LLM is judging another LLM's work.
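
Before starting a run, it can help to confirm that a folder contains the pieces listed above. The sketch below only lists candidate files and reports whether the optional manifest exists. It does not read the manifest, because its schema is not described in this README, and it cannot tell judge outputs from JSON source material. The function name `inspect_run_folder` is hypothetical.

```python
from pathlib import Path
from typing import Any


def inspect_run_folder(run_dir: Path) -> dict[str, Any]:
    """List candidate files in a run folder without interpreting them."""
    manifest = run_dir / "_provenance" / "manifest.json"
    json_files = sorted(p.name for p in run_dir.glob("*.json"))
    yaml_files = sorted(
        p.name for p in run_dir.iterdir() if p.suffix in {".yaml", ".yml"}
    )
    return {
        "json_files": json_files,
        "yaml_files": yaml_files,
        # None means the optional manifest is absent.
        "manifest": manifest if manifest.is_file() else None,
    }
```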

Status

Working production code with 16 months of deployment data. Currently being transitioned to community-owned infrastructure under AGPLv3.

License

AGPL-3.0 — ensuring this tool remains a public good and cannot disappear behind a paywall.

Contact & Maintainer

Adrian St. Vaughan

LinkedIn: Adrian St. Vaughan
