Extract references from academic PDFs and verify them against the CrossRef API.
- PDF reference extraction — automatically finds the References section and parses numbered [n] entries
- DOI extraction from hyperlinks — uses embedded PDF links (most reliable) with regex fallback for line-wrapped DOIs
- CrossRef verification — checks each reference by DOI resolution or strict title matching
- Issue detection — flags year mismatches, broken DOIs, and title discrepancies
- Missing DOI suggestions — high-confidence only (near-exact title + year match)
- Rich console output — color-coded tables with progress bar
- Export — CSV and/or JSON output
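The regex fallback for line-wrapped DOIs could look roughly like the sketch below. The patterns and the `find_dois` helper are illustrative assumptions, not the tool's actual code: a DOI is only re-joined when the line break falls directly after `/`, `-`, or `_`, since breaks elsewhere are ambiguous.

```python
import re

# Assumed pattern: "10.<4-9 digit registrant>/<suffix>"; not the tool's actual regex.
DOI_RE = re.compile(r'10\.\d{4,9}/[^\s"<>]+')

# A DOI cut right after its first "/" or after a later "-", "/" or "_",
# continuing on the next line. Other break positions are left alone.
WRAPPED_DOI_RE = re.compile(r"(10\.\d{4,9}/(?:\S*[-/_])?)\n\s*(?=\S)")


def find_dois(text: str) -> list[str]:
    """Return unique DOIs in order of appearance, re-joining wrapped ones."""
    joined = WRAPPED_DOI_RE.sub(r"\1", text)
    # Trailing sentence punctuation is not part of the DOI.
    dois = (m.group(0).rstrip(".,;)") for m in DOI_RE.finditer(joined))
    return list(dict.fromkeys(dois))


print(find_dois("doi:10.1234/abc-\n123.456."))  # ['10.1234/abc-123.456']
```

A heuristic like this can still mis-join text, which is presumably why embedded hyperlinks are treated as the more reliable source.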
Requires Python 3.10+ and uv.
git clone git@github.com:ademasi/ref-checker.git
cd ref-checker
uv sync

# Verify references in a PDF
uv run check_references.py paper.pdf
# Export results to CSV
uv run check_references.py paper.pdf --export csv
# Export results to JSON
uv run check_references.py paper.pdf --export json
# Export both CSV and JSON
uv run check_references.py paper.pdf --export both

Output files are saved next to the input PDF as <name>.refs.csv / <name>.refs.json.
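As an illustration of the naming scheme, the sketch below derives the output paths from the input PDF. It assumes that `<name>` is the PDF file name without its `.pdf` extension; `export_paths` is a hypothetical helper, not part of the tool.

```python
from pathlib import Path


def export_paths(pdf_path: str, export: str) -> list[Path]:
    """Return output paths for an --export value of csv, json, or both."""
    pdf = Path(pdf_path)
    formats = ["csv", "json"] if export == "both" else [export]
    # Assumption: <name> is the PDF's stem, so paper.pdf -> paper.refs.csv
    return [pdf.with_name(f"{pdf.stem}.refs.{fmt}") for fmt in formats]


for path in export_paths("papers/paper.pdf", "both"):
    print(path)
```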
The tool displays:
- Summary panel — total count with OK / Issues / Not Found / Missing DOI breakdown
- All References table — status, year, first author, title, DOI presence, verification method (with similarity %)
- Issues table — details on DOI resolution failures, year/title mismatches
- Not Found table — references that couldn't be matched on CrossRef (may be correct but not indexed)
- Missing DOIs table — high-confidence DOI suggestions for references that omit them
Processing steps:
- Extract text and DOI hyperlinks from the PDF using PyMuPDF
- Parse the References section into individual entries (authors, year, title, DOI)
- Verify each reference:
  - If a DOI is present → resolve it via the CrossRef API
  - Otherwise → search CrossRef by title, then rank candidates by normalized title similarity + year match
- Report results with color-coded status
# Lint and format
uvx ruff check .
uvx ruff format .
# Run tests
uv run pytest

License: MIT