BAS-More/truth-shield: Stop Claude from lying to you. A Claude Code skill that fact-checks every claim against real sources (code files, live docs, web search) and flags anything it can't confirm.

Claude states things confidently that turn out to be wrong.
Truth Shield is a Claude Code skill that verifies every factual claim against real sources — and flags anything it can't confirm.

Install (30 seconds)

Mac / Linux

curl -sfL https://raw.githubusercontent.com/BAS-More/truth-shield/master/install.sh | bash

Windows (PowerShell)

irm https://raw.githubusercontent.com/BAS-More/truth-shield/master/install.ps1 | iex

Tip

Add TRUTH_SHIELD_HOOK=yes before the install command to also install the always-on enforcement hook — a deterministic gate that runs outside Claude's context window and cannot be prompt-injected.

Manual install

# Mac / Linux
mkdir -p ~/.claude/skills
cp SKILL.md ~/.claude/skills/truth-shield.md

# Windows (PowerShell)
New-Item -ItemType Directory -Path "$env:USERPROFILE\.claude\skills" -Force | Out-Null
Copy-Item SKILL.md "$env:USERPROFILE\.claude\skills\truth-shield.md"

Optional: enforcement hook

# Mac / Linux
mkdir -p ~/.claude/hooks
cp hooks/truth-shield-enforcer.js ~/.claude/hooks/

# Windows (PowerShell)
New-Item -ItemType Directory -Path "$env:USERPROFILE\.claude\hooks" -Force | Out-Null
Copy-Item hooks\truth-shield-enforcer.js "$env:USERPROFILE\.claude\hooks\"

Then register the hook in ~/.claude/hooks.json. See ENHANCE.md for the full hook configuration.

Verify it works: open Claude Code, ask any factual question, then type `verify this`.

How it works

Truth Shield is a single markdown file (SKILL.md) that Claude loads as a skill. It instructs Claude to:

1. Extract claims: parse every factual statement from a response
2. Pre-screen: a self-consistency check flags uncertain claims
3. Verify: check against available sources in tier order (fast/local first, slow/external last)
4. Isolated context: read evidence without the original claim to break confirmation bias
5. Cross-check: MiniCheck and multi-model sampling for independent verification
6. Report: a table with every claim, its verdict, and the evidence
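The steps above can be sketched roughly as follows. This is only an illustration of the extract, verify-in-tier-order, and report flow. The function names, the two stub tiers, and the sentence-splitting rule are all assumptions and do not come from SKILL.md. The pre-screen, isolated-context, and cross-check steps are not modeled.

```javascript
// Hypothetical sketch: none of these names come from SKILL.md.
// A tier returns a verdict for a claim, or null when it has nothing to say.
const tiers = [
  // Fast, local tier: stand-in for Grep/Read over the codebase.
  function localTier(claim) {
    const knownFunctions = ["parseConfig", "loadUser"];
    const match = claim.match(/function (\w+) exists/);
    if (!match) return null;
    return knownFunctions.includes(match[1])
      ? { verdict: "VERIFIED", evidence: `found ${match[1]} locally` }
      : { verdict: "CONTRADICTED", evidence: `${match[1]} not found locally` };
  },
  // Slow, external tier: stand-in for a docs or web lookup.
  function externalTier(claim) {
    const facts = new Set(["useEffect runs after render"]);
    return facts.has(claim)
      ? { verdict: "VERIFIED", evidence: "stub docs lookup" }
      : null;
  },
];

// Naive claim extraction: treat each sentence as one claim.
function extractClaims(response) {
  return response.split(".").map((s) => s.trim()).filter(Boolean);
}

// Try tiers in order; the first tier with an answer wins.
function verifyClaim(claim) {
  for (const tier of tiers) {
    const result = tier(claim);
    if (result) return { claim, ...result };
  }
  return { claim, verdict: "UNVERIFIED", evidence: "no source available" };
}

// One row per claim: the claim, its verdict, and the evidence.
function report(response) {
  return extractClaims(response).map(verifyClaim);
}

console.table(report("The function parseConfig exists. The moon is made of cheese."));
```

Ordering the tiers from cheap to expensive means the external lookup only runs for claims the local tier could not settle.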

The enforcement hook (v4) adds a deterministic layer outside Claude's context. Claude's confidence has zero influence — every response with factual claims must pass verification or it gets blocked.
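As a rough illustration of what a deterministic gate means here, the sketch below decides from verification results alone. The `gate` function, the result shape, and the rule that only a CONTRADICTED claim blocks a response are assumptions made for this example. The actual logic lives in hooks/truth-shield-enforcer.js and may differ.

```javascript
// Hypothetical gate: the verdict names come from this README,
// the function name, result shape, and blocking rule are assumed.
function gate(results) {
  const failures = results.filter((r) => r.verdict === "CONTRADICTED");
  if (failures.length === 0) {
    return { decision: "allow" };
  }
  return {
    decision: "block",
    reason: failures.map((r) => `${r.claim}: ${r.evidence}`).join("; "),
  };
}

const outcome = gate([
  { claim: "useMemo never returns a stale value", verdict: "CONTRADICTED", evidence: "React docs disagree" },
  { claim: "useMemo takes a dependency array", verdict: "VERIFIED", evidence: "React docs" },
]);
console.log(outcome.decision);
```

Note that the function takes no confidence score as input, so how certain the model sounded cannot change the outcome.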

Measured catch rate

Tested against 47 adversarial scenarios: 100% accuracy, with 100% precision and 100% recall (no false positives, no false negatives).

Metric	Score
True Positives	18/18 — every hallucination caught
True Negatives	29/29 — no false alarms
Precision	100% — no false positives
Recall	100% — no false negatives
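For readers who want to check the arithmetic, the scores follow from the counts in the table using the standard definitions. The sketch below assumes zero false positives and zero false negatives, as the table states. The `metrics` helper is illustrative and is not part of the repository.

```javascript
// Standard confusion-matrix metrics. The helper name is illustrative.
function metrics({ tp, tn, fp, fn }) {
  return {
    accuracy: (tp + tn) / (tp + tn + fp + fn), // share of all scenarios classified correctly
    precision: tp / (tp + fp), // share of flagged claims that were real hallucinations
    recall: tp / (tp + fn), // share of real hallucinations that were flagged
  };
}

// Counts from the table: 18 hallucinations caught, 29 correct claims passed.
const m = metrics({ tp: 18, tn: 29, fp: 0, fn: 0 });
console.log(m); // { accuracy: 1, precision: 1, recall: 1 }
```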

The hardest category, confident-but-wrong claims, improved from a ~40% to a ~75% catch rate (+35 percentage points) because the enforcement hook bypasses Claude's own confidence bias.

Run the full test yourself: `node hooks/catch-rate-test.js`

See CATCH-RATE.md for the full breakdown, confusion matrix, and before/after comparison.

How to use it

Say "verify this" after any response

You: How does React's useMemo work?
Claude: [responds with claims about useMemo]
You: verify this

Turn on continuous mode for high-stakes work

shield on

Every response is now verified before you see it. Inline markers flag anything wrong:

useMemo guarantees the cached value is never stale.
[CONTRADICTED — React docs: "You may rely on useMemo as a performance
optimization, not as a semantic guarantee."]

[shield: 4/5 verified, 1 contradicted | v3]

Turn it off: `shield off`

Always-on mode (no trigger needed)

Add this line to your CLAUDE.md:

Always verify factual claims before presenting them using the truth-shield skill.

Confidence ratings

Rating	Meaning
VERIFIED	Confirmed by a real source. Evidence quoted.
UNVERIFIED	No source could confirm or deny. Not wrong — just unconfirmed.
CONTRADICTED	A source directly contradicts the claim. Correction provided.
CONFLICTED	Sources disagree with each other. Both positions presented.
UNCERTAIN	Self-consistency pre-screen flagged low internal confidence.

Claude's own confidence is never treated as a source. A claim stated with certainty gets the same scrutiny as a hedged guess.
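One way to read the table is as a function from the evidence found to a rating. The sketch below is an assumption about how the ratings could relate to each other. The evidence shape, the `rate` function, and the choice to report UNCERTAIN only when no source had anything to say are all hypothetical.

```javascript
// Hypothetical evidence shape: { source: string, stance: "supports" | "contradicts" }.
// The model's own confidence is deliberately not a parameter.
function rate(evidence, prescreenFlagged = false) {
  const supports = evidence.some((e) => e.stance === "supports");
  const contradicts = evidence.some((e) => e.stance === "contradicts");

  if (supports && contradicts) return "CONFLICTED"; // sources disagree with each other
  if (contradicts) return "CONTRADICTED"; // a source refutes the claim
  if (supports) return "VERIFIED"; // a source confirms the claim
  // No source had anything to say. Assumed: a pre-screen flag is surfaced here.
  return prescreenFlagged ? "UNCERTAIN" : "UNVERIFIED";
}

console.log(rate([{ source: "React docs", stance: "supports" }]));
console.log(rate([]));
```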

Trigger phrases

Phrase	What happens
`verify this`	Full verification of the previous response
`truth-check this` / `fact-check this`	Same as above
`shield this` / `truth shield` / `is this true`	Same as above
`check your work`	Full verification
`shield on`	Continuous mode — every response verified
`shield off`	Stop continuous mode
`are you sure...`	Spot-check a specific claim
`really?` / `source?` / `prove it`	Spot-check the most recent claim
`how do you know` / `is that right` / `double-check that`	Spot-check the most recent claim
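In practice Claude interprets these phrases itself from the instructions in SKILL.md, so no matcher code is involved. Purely to make the table concrete, the mapping could be modeled as below. The action names and the case-insensitive substring matching are assumptions.

```javascript
// Phrases are copied from the table above. Action names are hypothetical.
const TRIGGERS = [
  { action: "continuous-on", phrases: ["shield on"] },
  { action: "continuous-off", phrases: ["shield off"] },
  {
    action: "full-verify",
    phrases: ["verify this", "truth-check this", "fact-check this", "shield this",
      "truth shield", "is this true", "check your work"],
  },
  { action: "spot-check-specific", phrases: ["are you sure"] },
  {
    action: "spot-check-latest",
    phrases: ["really?", "source?", "prove it", "how do you know",
      "is that right", "double-check that"],
  },
];

// Returns the first matching action, or null when the message is not a trigger.
function matchTrigger(message) {
  const text = message.trim().toLowerCase();
  for (const { action, phrases } of TRIGGERS) {
    if (phrases.some((p) => text.includes(p))) return action;
  }
  return null;
}

console.log(matchTrigger("Verify this"));
console.log(matchTrigger("Are you sure useMemo caches forever?"));
```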

What gets checked

Out of the box, Truth Shield uses tools every Claude Code session has:

Tool	What it verifies
Grep / Read / Glob	Code claims — function names, file paths, signatures
WebSearch	General knowledge — dates, versions, people, facts

Optional: add more verification sources (11 tiers)

Install additional MCP servers, models, and hooks to unlock more tiers. See ENHANCE.md for setup.

Add this	What you gain
Context7 (MCP)	Live library docs — React, Express, Prisma, 9,000+ libraries
DepScope (MCP)	Package existence checking — catches hallucinated package names
Total Recall (MCP)	Persistent memory — corrections survive across sessions
fact-mcp (MCP)	Cached verifications — instant repeat lookups
Graphiti (MCP)	Entity relationships, temporal facts
Knowledge Graph (MCP)	Code structure — call chains, symbol maps, dependency trees
MiniCheck (Ollama)	External fact-checker — matches GPT-4 on grounding benchmarks
Local LLM proxy	Multi-model cross-check + self-consistency sampling
LLM Council (skill)	FACTS-style multi-judge conflict resolution
Stop hook v4 (hook)	Always-on enforcement with hook-level MiniCheck

Learning loop

When Truth Shield finds a wrong claim and you have Total Recall installed, it persists the correction so the same mistake never happens again.

1. Claude claims "useEffect runs before render"
2. Truth Shield checks Context7 docs, finds it runs after render
3. The correction is stored in Total Recall with trigger phrases
4. Next week, Claude tries the same claim → caught instantly

Without Total Recall, corrections are reported but not remembered across sessions.
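The loop can be pictured with a small in-memory stand-in for the memory layer. This is not the Total Recall API. The `CorrectionStore` class, its methods, and the rule that every trigger phrase must appear in a claim are assumptions made for illustration.

```javascript
// Hypothetical in-memory stand-in for a persistent correction store.
class CorrectionStore {
  constructor() {
    this.corrections = [];
  }

  // Save a correction together with the phrases that should trigger it.
  remember(wrongClaim, correction, triggers) {
    this.corrections.push({
      wrongClaim,
      correction,
      triggers: triggers.map((t) => t.toLowerCase()),
    });
  }

  // Return a stored correction when every trigger phrase appears in the claim.
  check(claim) {
    const text = claim.toLowerCase();
    const hit = this.corrections.find((c) => c.triggers.every((t) => text.includes(t)));
    return hit || null;
  }
}

const store = new CorrectionStore();
store.remember(
  "useEffect runs before render",
  "useEffect runs after render",
  ["useeffect", "before render"]
);

const caught = store.check("In React, useEffect runs before render");
console.log(caught ? caught.correction : "no stored correction");
```

A real store would persist to disk or a database so that corrections survive across sessions, which is the part Total Recall provides.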

v3 / v4 upgrades

Self-consistency pre-screen — flags claims Claude is internally uncertain about
Isolated-context verification — breaks confirmation bias by reading evidence without the original claim
DepScope package checking — catches hallucinated package names across 19 ecosystems
MiniCheck fact-checker — purpose-built verification model (EMNLP 2024)
Multi-judge arbitration — FACTS-style 3-judge panel resolves conflicts
Always-on enforcement hook (v4) — deterministic enforcement outside Claude's context window

Limitations

Sources can be wrong. Docs can be outdated. Search results can be inaccurate.
UNVERIFIED ≠ wrong. Many true things will be UNVERIFIED because no source was available.
Cannot verify opinions or predictions. Only factual claims.
Shield-on mode is slower. Every response goes through verification.
MiniCheck requires Ollama. Without it, Tier 8 is skipped gracefully.

The goal is to move from "Claude said it confidently" to "Claude said it, and here's the evidence."

Requirements

Claude Code (any version with skills support)
That's it. No API keys, no dependencies, no config files.

Optional enhancements — see ENHANCE.md.

Contributing

Found a hallucination pattern Truth Shield misses? Open an issue with:

What Claude said (the wrong claim)
What the truth is (with source)
Which verification tier should have caught it

Pull requests welcome.

Credit

Built by Avi Bendetsky.

v3 informed by research from: SelfCheckGPT, Chain of Verification (Meta 2023), MiniCheck (EMNLP 2024), Semantic Entropy (Nature 2024), FACTS Grounding (DeepMind 2024), DepScope.

License

MIT — see LICENSE.
