Claude states things confidently that turn out to be wrong.
Truth Shield is a Claude Code skill that verifies every factual claim against real sources — and flags anything it can't confirm.
|
Mac / Linux curl -sfL https://raw.githubusercontent.com/BAS-More/truth-shield/master/install.sh | bash |
|
Windows (PowerShell) irm https://raw.githubusercontent.com/BAS-More/truth-shield/master/install.ps1 | iex |
Tip
Add TRUTH_SHIELD_HOOK=yes before the install command to also install the always-on enforcement hook — a deterministic gate that runs outside Claude's context window and cannot be prompt-injected.
Manual install
# Mac / Linux
mkdir -p ~/.claude/skills
cp SKILL.md ~/.claude/skills/truth-shield.md
# Windows (PowerShell)
New-Item -ItemType Directory -Path "$env:USERPROFILE\.claude\skills" -Force | Out-Null
Copy-Item SKILL.md "$env:USERPROFILE\.claude\skills\truth-shield.md"Optional: enforcement hook
# Mac / Linux
mkdir -p ~/.claude/hooks
cp hooks/truth-shield-enforcer.js ~/.claude/hooks/
# Windows (PowerShell)
New-Item -ItemType Directory -Path "$env:USERPROFILE\.claude\hooks" -Force | Out-Null
Copy-Item hooks\truth-shield-enforcer.js "$env:USERPROFILE\.claude\hooks\"Then add to ~/.claude/hooks.json — see ENHANCE.md for full hook configuration.
Verify it works: Open Claude Code, ask any factual question, then type verify this.
Truth Shield is a single markdown file (SKILL.md) that Claude loads as a skill. It instructs Claude to:
- Extract claims — parse every factual statement from a response
- Pre-screen — self-consistency check flags uncertain claims
- Verify — check against available sources in tier order (fast/local first, slow/external last)
- Isolated context — read evidence without the original claim to break confirmation bias
- Cross-check — MiniCheck and multi-model sampling for independent verification
- Report — table with every claim, its verdict, and the evidence
The enforcement hook (v4) adds a deterministic layer outside Claude's context. Claude's confidence has zero influence — every response with factual claims must pass verification or it gets blocked.
Tested against 47 adversarial scenarios — 100% accuracy, 100% recall, zero false negatives.
| Metric | Score |
|---|---|
| True Positives | 18/18 — every hallucination caught |
| True Negatives | 29/29 — no false alarms |
| Precision | 100% — no false positives |
| Recall | 100% — no false negatives |
The hardest category — confident-but-wrong claims — improved from ~40% to ~75% catch rate (+35 pts) thanks to the enforcement hook bypassing Claude's own confidence bias.
Run the full test yourself:
node hooks/catch-rate-test.js
See CATCH-RATE.md for the full breakdown, confusion matrix, and before/after comparison.
You: How does React's useMemo work?
Claude: [responds with claims about useMemo]
You: verify this
shield on
Every response is now verified before you see it. Inline markers flag anything wrong:
useMemo guarantees the cached value is never stale.
[CONTRADICTED — React docs: "You may rely on useMemo as a performance
optimization, not as a semantic guarantee."]
[shield: 4/5 verified, 1 contradicted | v3]
Turn it off: shield off
Add this line to your CLAUDE.md:
Always verify factual claims before presenting them using the truth-shield skill.
| Rating | Meaning |
|---|---|
| VERIFIED | Confirmed by a real source. Evidence quoted. |
| UNVERIFIED | No source could confirm or deny. Not wrong — just unconfirmed. |
| CONTRADICTED | A source directly contradicts the claim. Correction provided. |
| CONFLICTED | Sources disagree with each other. Both positions presented. |
| UNCERTAIN | Self-consistency pre-screen flagged low internal confidence. |
Claude's own confidence is never treated as a source. A claim stated with certainty gets the same scrutiny as a hedged guess.
Trigger phrases
| Phrase | What happens |
|---|---|
verify this |
Full verification of the previous response |
truth-check this / fact-check this |
Same as above |
shield this / truth shield / is this true |
Same as above |
check your work |
Full verification |
shield on |
Continuous mode — every response verified |
shield off |
Stop continuous mode |
are you sure... |
Spot-check a specific claim |
really? / source? / prove it |
Spot-check the most recent claim |
how do you know / is that right / double-check that |
Spot-check the most recent claim |
Out of the box, Truth Shield uses tools every Claude Code session has:
| Tool | What it verifies |
|---|---|
| Grep / Read / Glob | Code claims — function names, file paths, signatures |
| WebSearch | General knowledge — dates, versions, people, facts |
Optional: add more verification sources (11 tiers)
Install additional MCP servers, models, and hooks to unlock more tiers. See ENHANCE.md for setup.
| Add this | What you gain |
|---|---|
| Context7 (MCP) | Live library docs — React, Express, Prisma, 9,000+ libraries |
| DepScope (MCP) | Package existence checking — catches hallucinated package names |
| Total Recall (MCP) | Persistent memory — corrections survive across sessions |
| fact-mcp (MCP) | Cached verifications — instant repeat lookups |
| Graphiti (MCP) | Entity relationships, temporal facts |
| Knowledge Graph (MCP) | Code structure — call chains, symbol maps, dependency trees |
| MiniCheck (Ollama) | External fact-checker — matches GPT-4 on grounding benchmarks |
| Local LLM proxy | Multi-model cross-check + self-consistency sampling |
| LLM Council (skill) | FACTS-style multi-judge conflict resolution |
| Stop hook v4 (hook) | Always-on enforcement with hook-level MiniCheck |
When Truth Shield finds a wrong claim and you have Total Recall installed, it persists the correction so the same mistake never happens again.
- Claude claims "useEffect runs before render"
- Truth Shield checks Context7 docs, finds it runs after render
- The correction is stored in Total Recall with trigger phrases
- Next week, Claude tries the same claim → caught instantly
Without Total Recall, corrections are reported but not remembered across sessions.
- Self-consistency pre-screen — flags claims Claude is internally uncertain about
- Isolated-context verification — breaks confirmation bias by reading evidence without the original claim
- DepScope package checking — catches hallucinated package names across 19 ecosystems
- MiniCheck fact-checker — purpose-built verification model (EMNLP 2024)
- Multi-judge arbitration — FACTS-style 3-judge panel resolves conflicts
- Always-on enforcement hook (v4) — deterministic enforcement outside Claude's context window
- Sources can be wrong. Docs can be outdated. Search results can be inaccurate.
- UNVERIFIED ≠ wrong. Many true things will be UNVERIFIED because no source was available.
- Cannot verify opinions or predictions. Only factual claims.
- Shield-on mode is slower. Every response goes through verification.
- MiniCheck requires Ollama. Without it, Tier 8 is skipped gracefully.
The goal is to move from "Claude said it confidently" to "Claude said it, and here's the evidence."
- Claude Code (any version with skills support)
- That's it. No API keys, no dependencies, no config files.
Optional enhancements — see ENHANCE.md.
Found a hallucination pattern Truth Shield misses? Open an issue with:
- What Claude said (the wrong claim)
- What the truth is (with source)
- Which verification tier should have caught it
Pull requests welcome.
Built by Avi Bendetsky.
v3 informed by research from: SelfCheckGPT, Chain of Verification (Meta 2023), MiniCheck (EMNLP 2024), Semantic Entropy (Nature 2024), FACTS Grounding (DeepMind 2024), DepScope.
MIT — see LICENSE.