Anti-sycophancy multi-evaluator engine for LLM agents.
Empanel a grand jury of independent reviewers over the same code. Each lens returns its own verdict. Weighers score them in isolation. A deterministic synthesizer combines the findings into a ranked report — without any reviewer ever seeing another's opinion.
The thesis: LLM-as-judge workflows anchor catastrophically when one model sees another's output. Empanel enforces isolation at every stage so diverse lenses produce diverse findings, and weighers can't be flattered into consensus.
```bash
pip install empanel
```
```bash
# Review a file with four independent lenses, synthesize a ranked report
empanel review \
  --files src/contract.sol \
  --spec spec.md \
  --model claude-opus-4-7 \
  --output review.json \
  --markdown review.md
```

```python
from pathlib import Path

from empanel import CodeReviewEngine
from empanel.lenses import SECURITY, SPEC, EDGE_CASES, ARCHITECTURE

engine = CodeReviewEngine(lenses=[SECURITY, SPEC, EDGE_CASES, ARCHITECTURE])
result = engine.run(
    code=Path("src/contract.sol").read_text(),
    spec=Path("spec.md").read_text(),
)

for finding in result.findings:
    print(f"[{finding.confidence}] {finding.title} — {finding.location}")
```

A grand jury is the closest real proceeding to what this tool does:
- Multiple independent reviewers hearing the same evidence
- No cross-examination between reviewers — each works in isolation
- Output is a ranked list of indictments (issues worth pursuing), not a verdict
- Evidence-enforced — every finding must cite the code
The architecture maps directly onto those properties. Adding a new lens is seating another juror; tightening weighing is tightening isolation rules.
Three phases, isolated by construction:
- Evaluate — each lens reviews the code independently. No cross-talk.
- Weigh — each weigher scores the raw findings without seeing other weighers' scores or lens identities. This is the anti-sycophancy wedge: a weigher can't be flattered into agreement with the majority.
- Synthesize — deterministic math combines the scores. Finding fingerprints deduplicate near-identical reports. Confidence tiers fall out of reviewer concurrence, not vibes.
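The synthesize phase can be sketched in a few lines. This is an illustrative stand-in, not the package's actual implementation: the field names (`lens`, `title`, `location`) and the tier thresholds are assumptions, but it shows the idea of fingerprint deduplication and concurrence-derived confidence.

```python
import hashlib
from collections import defaultdict

def fingerprint(title: str, location: str) -> str:
    # Normalize so near-identical reports collide on the same key.
    norm = f"{title.strip().lower()}|{location.strip().lower()}"
    return hashlib.sha256(norm.encode()).hexdigest()[:12]

def synthesize(findings: list[dict], n_lenses: int) -> list[dict]:
    """Deduplicate by fingerprint; confidence tier comes from concurrence."""
    groups = defaultdict(list)
    for f in findings:
        groups[fingerprint(f["title"], f["location"])].append(f)
    report = []
    for fp, group in groups.items():
        concurrence = len({f["lens"] for f in group})
        # Illustrative tiers: a majority of lenses => high, 2+ => medium.
        if concurrence >= n_lenses // 2 + 1:
            tier = "high"
        elif concurrence > 1:
            tier = "medium"
        else:
            tier = "low"
        report.append({"fingerprint": fp,
                       "title": group[0]["title"],
                       "location": group[0]["location"],
                       "concurrence": concurrence,
                       "confidence": tier})
    # Deterministic ranking: concurrence first, fingerprint as tiebreak.
    report.sort(key=lambda r: (-r["concurrence"], r["fingerprint"]))
    return report
```

Because the ranking key is fully determined by the inputs, re-running synthesis over the same replay artifact always yields the same report.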
Replay artifacts are stored as JSON so any review can be reproduced or disputed after the fact.
- `SECURITY` — adversarial threat-model framing
- `SPEC` — compares implementation against an optional spec
- `EDGE_CASES` — boundaries, error paths, null/empty, overflow
- `ARCHITECTURE` — coupling, leaky abstractions, hidden state
- `PERFORMANCE` — complexity, allocation, hot paths
- `READABILITY` — naming, flow, cognitive load
Register custom lenses by subclassing `Lens` and passing them to the engine. The only contract is "return a list of `Finding`s with evidence."
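A minimal custom-lens sketch. The `Lens` and `Finding` stand-ins below are assumptions so the example runs on its own; the real base class, its method name, and the `Finding` fields may differ from what the package actually exports.

```python
from dataclasses import dataclass

# Stand-in types, assumed shapes only; import the real ones from empanel.
@dataclass
class Finding:
    title: str
    location: str
    evidence: str  # the contract: every finding must cite the code

class Lens:
    name = "BASE"

    def review(self, code: str, spec: str = "") -> list[Finding]:
        raise NotImplementedError

class TodoLens(Lens):
    """Flags leftover TODO markers, citing the offending line as evidence."""
    name = "TODO"

    def review(self, code: str, spec: str = "") -> list[Finding]:
        return [
            Finding(title="Unresolved TODO",
                    location=f"line {i}",
                    evidence=line.strip())
            for i, line in enumerate(code.splitlines(), start=1)
            if "TODO" in line
        ]
```

Each finding carries the line it cites, so the evidence requirement is satisfied by construction.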
- Claude Code slash command — `qa-hard` uses empanel as the review backend. Runs without an API key by dispatching each evaluator through the Claude Code Agent tool.
- Standalone CLI — `empanel review` works with any Anthropic-API-keyed setup. The model is a flag, so anything that quacks like Claude works.
- Fixtures — `empanel.fixtures` bundles the regression corpus of real bugs the engine has caught. Use `tests/test_fixtures.py` as a template for pinning your own.
- v0.3.0 — renamed from `independent-eval`. 141 tests pass.
- Self-review converged after three rounds at v0.2.x — the engine reviews its own source without regressions.
- Roadmap: `ROADMAP.md` (pluggable lenses, cost budgeting, cross-session replay diffing).
MIT.