Deterministic citation verification for US legal writing. Catches fabricated cases, fabricated quotes, and pincite errors against CourtListener. Refuses to accuse what it cannot verify.
Author: David Leung · davidleung.co
I work in compliance at a single family office in Hong Kong. I'm not a lawyer. I've spent a long time in offshore structuring, and enough time with AI tools to know where they fail quietly.
In April 2026, a well-known US law firm filed a letter with a federal bankruptcy court disclosing that an emergency motion they had filed nine days earlier contained hallucinated citations and other AI-generated errors. The remedial letter included a schedule correcting the errors line by line. It is a careful, public, and generous piece of work — the kind of documentation most firms never publish — and it makes a concrete benchmark possible for the rest of us.
I built this skill to make that schedule's four error classes deterministically detectable before a similar filing reaches a court.
The idea was prompted by Anthropic's recent Claude Skills webinar and by Mark Pike's Q&A discussion of the growing legal community around Claude Skills. Most of the code was written in collaboration with Claude Code.
Given a brief, motion, memo, or any document containing case citations, the skill runs a deterministic pipeline against CourtListener — the Free Law Project's open US case-law database (the first two steps are sketched after this list):
- Parses every citation with eyecite.
- Resolves each citation to a canonical record via three fallback strategies.
- Fetches the full opinion text.
- Verifies that any quoted language actually appears in the opinion — with normalization for smart quotes, whitespace, editorial bracketing, ellipses, and star-pagination.
- Cross-checks the pincite where star-pagination is available.
- Emits a layered evidence ledger plus a human-readable report.
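A minimal sketch of the first two steps, assuming eyecite's `get_citations` API and one resolution strategy against CourtListener's citation-lookup endpoint. The endpoint path, response handling, and function name here are illustrative, not the skill's actual code:

```python
import os
import requests
from eyecite import get_citations
from eyecite.models import FullCaseCitation

# Illustrative single strategy; the skill itself uses three fallbacks.
LOOKUP_URL = "https://www.courtlistener.com/api/rest/v4/citation-lookup/"
HEADERS = {"Authorization": f"Token {os.environ['COURTLISTENER_API_TOKEN']}"}

def parse_and_resolve(brief_text: str) -> list[dict]:
    """Parse every citation, then ask CourtListener to resolve each one."""
    results = []
    for cite in get_citations(brief_text):
        if not isinstance(cite, FullCaseCitation):
            continue  # short-form cites need antecedent linking; see limitations
        resp = requests.post(
            LOOKUP_URL, headers=HEADERS,
            data={"text": cite.corrected_citation()},
        )
        resp.raise_for_status()
        results.append({
            "citation": cite.corrected_citation(),
            "lookup": resp.json(),  # clusters matched by CourtListener, if any
        })
    return results
```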
Two modes:
- Preflight — auditing your own draft before filing.
- Adversary — auditing opposing counsel's filing for a response brief or 28(j) letter.
The tool reports findings in layered form across seven dimensions. Two of those layers are always reported as `not_reviewed`:
- Proposition support. No judgment of whether the cited case stands for what the brief says it stands for. That requires legal reasoning; the tool retrieves the passage and leaves the call to the human.
- Treatment / citator. No KeyCite or Shepard's integration.
It also does not verify non-US law, statutes, regulations, or secondary sources. Those are out of scope, honestly labeled, and flagged to the reviewer.
A ✅ badge from this tool never means "this case supports the proposition." It means the quote and metadata are internally consistent. Proposition support is human judgment. See GOVERNANCE.md for the full honesty model.
I'd rather you know these upfront than discover them in a filing:
- CourtListener coverage gaps. Recent bankruptcy court opinions, unpublished decisions, and many Westlaw- or Lexis-only citations are not indexed. When that happens, the tool labels the result `unresolved — coverage gap` with a manual-verification note. It does not label it a potential fabrication. This matters: in adversary mode, miscalling a coverage gap would be defamatory. The tool is built not to do that (the labeling rule is sketched after this list).
- Short-form back-references. Citations like `400 B.R. at 291` that reference an earlier full cite are resolved but currently show `case_name: None`, because eyecite doesn't automatically link short-form citations back to their antecedents. Not a correctness bug — the underlying citation is still verified — but the display is rough.
- Reporter-page vs. star-page pincites. When a brief cites by reporter page (e.g., `283`) and CourtListener's opinion text only marks star-pagination (`*284`), the tool cannot deterministically translate between the two numbering systems. It returns `pincite: unresolved` rather than guess.
- Non-US law, statutes, regulations, treaties, secondary sources. Out of scope by design.
- Proposition support and treatment. Out of scope by design, as above.
- Not a lawyer. I built this as an engineer and compliance practitioner. The legal judgment in any output is yours. This tool is an evidence retrieval and matching pipeline, not substitute counsel.
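To make the coverage-gap rule concrete, here is a hypothetical sketch of the labeling decision. Only the `unresolved — coverage gap` label appears in the real reports; the function name and the other status strings are invented for illustration:

```python
def classify_unresolved(found: bool, reporter_covered: bool) -> str:
    """Never escalate a CourtListener miss into an accusation of fabrication."""
    if found:
        return "resolved"
    if not reporter_covered:
        # CourtListener may simply not index this reporter, court, or date range.
        return "unresolved — coverage gap (verify manually)"
    # Covered reporter but no hit: still a flag for human review, not a verdict.
    return "unresolved — review manually before drawing any conclusion"
```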
The CHANGELOG.md documents the bugs I found during live testing and the fixes for each. I recommend reading it before relying on the tool — particularly the v1.3 and v1.6 entries, which concern false-positive patterns that live testing surfaced and mock testing would not have.
```bash
pip install eyecite
git clone https://github.com/dave817/case-verification
cd case-verification

# Free token from courtlistener.com/profile/
export COURTLISTENER_API_TOKEN=your_token

# Audit your own draft
python3 scripts/verify.py your_brief.txt --mode preflight --output report.md

# Audit opposing counsel's filing
python3 scripts/verify.py their_brief.txt --mode adversary --output response_memo.md
```

For every citation, the tool emits a seven-layer status table:
| Layer | What it checks |
|---|---|
| A. Authority resolution | Does the citation resolve in CourtListener? |
| B. Source retrieval | Can we retrieve opinion text? |
| C. Metadata consistency | Do year and case name match the brief? |
| D. Quote verification | Does quoted language appear in the opinion? |
| E. Pincite verification | Is the `*N` page consistent with where the quote appears? |
| F. Proposition support | ALWAYS `not_reviewed` — engine does not judge |
| G. Treatment / citator | ALWAYS `not_reviewed` — no citator integration |
No top-level status reads as bare `verified`. The most confident label is `verified_quote_and_metadata` — a deliberate reminder that proposition support remains human work.
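For a sense of what a ledger entry looks like, a hypothetical example. The layer letters and the `not_reviewed` and `verified_quote_and_metadata` labels come from the table above; the field names, per-layer statuses, and example citation are invented for illustration:

```python
example_entry = {
    "citation": "In re Example Corp., 400 B.R. 281 (Bankr. S.D.N.Y. 2009)",
    "A_authority_resolution": "resolved",      # found in CourtListener
    "B_source_retrieval": "retrieved",         # full opinion text fetched
    "C_metadata_consistency": "consistent",    # year and case name match
    "D_quote_verification": "quote_found",     # quoted language located
    "E_pincite_verification": "consistent",    # quote sits on the cited page
    "F_proposition_support": "not_reviewed",   # always: human judgment
    "G_treatment_citator": "not_reviewed",     # always: no citator
    "status": "verified_quote_and_metadata",   # most confident label possible
}
```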
In adversary mode, the tool produces draftable paragraphs you can use in a response. The language is calibrated to distinguish a genuine catch (fabricated quote, wrong pincite) from a CourtListener coverage gap that calls for manual verification rather than accusation.
One sentence: the worst failure of a verification tool is not missing a hallucination — it is falsely accusing a real citation of being one.
Every iteration of this tool has been shaped by that sentence. The layered status model exists because a single verified badge overclaims. The distinction between reporter and WL citations exists because CourtListener is comprehensive on the former and spotty on the latter, and conflating them in adversary output is how you end up writing a response brief that damages your own credibility. The pincite path runs on raw text while quote matching runs on normalized text, because otherwise star-pagination markers contaminate the match positions.
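To illustrate that last point, a toy normalization pass, assuming star pages appear in the raw opinion text as `*284`. The rules here are a simplification of the skill's actual normalization:

```python
import re

def normalize_for_quote_match(text: str) -> str:
    """Normalize opinion text before quote matching (simplified)."""
    text = text.replace("\u201c", '"').replace("\u201d", '"')  # smart double quotes
    text = text.replace("\u2018", "'").replace("\u2019", "'")  # smart single quotes
    text = re.sub(r"\*\d+\s*", "", text)  # strip star-pagination markers
    return re.sub(r"\s+", " ", text)      # collapse whitespace

raw = "the *284 debtor \u201cacted in good faith\u201d"
print(normalize_for_quote_match(raw))  # the debtor "acted in good faith"
# Character offsets in the normalized string no longer line up with the raw
# text, so the pincite path has to search the raw text for "*284" markers.
```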
Those decisions came from running the tool against real filed briefs and watching the early versions produce confident wrong answers. Each wrong answer pointed at a better design.
MIT. Pull requests welcome, especially:
- Additional test fixtures drawn from real filed briefs.
- Bug reports with live CourtListener responses attached.
- Coverage for citation forms I haven't handled yet.
If you find this useful, please consider donating to Free Law Project. Their work building open US case-law infrastructure is what makes this tool possible.
- Anthropic, whose Claude Skills webinar made this class of tool look buildable on a weekend rather than a quarter, and whose Claude Code product wrote most of the tests with me.
- Mark Pike, whose Q&A discussion of the growing legal community around Claude Skills pointed at exactly this kind of use case.
- Free Law Project, for a decade of work building CourtListener and eyecite and keeping them free. Commercial citators are good; free ones are civilisational.
- The compliance practitioners I've learned from, who taught me across many years why verifiable evidence matters more than confident assertion.
David Leung · Hong Kong · April 2026 · davidleung.co