ref-checker

Extract references from academic PDFs and verify them against the CrossRef API.

Features

  • PDF reference extraction — automatically finds the References section and parses numbered [n] entries
  • DOI extraction from hyperlinks — uses embedded PDF links (most reliable) with regex fallback for line-wrapped DOIs
  • CrossRef verification — checks each reference by DOI resolution or strict title matching
  • Issue detection — flags year mismatches, broken DOIs, and title discrepancies
  • Missing DOI suggestions — high-confidence only (near-exact title + year match)
  • Rich console output — color-coded tables with progress bar
  • Export — CSV and/or JSON output

Installation

Requires Python 3.10+ and uv.

git clone git@github.com:ademasi/ref-checker.git
cd ref-checker
uv sync

Usage

# Verify references in a PDF
uv run check_references.py paper.pdf

# Export results to CSV
uv run check_references.py paper.pdf --export csv

# Export results to JSON
uv run check_references.py paper.pdf --export json

# Export both CSV and JSON
uv run check_references.py paper.pdf --export both

Output files are saved next to the input PDF as <name>.refs.csv / <name>.refs.json.
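
The naming rule can be sketched with `pathlib`. The helper `export_paths` is hypothetical and only mirrors the `<name>.refs.csv` / `<name>.refs.json` convention described above; it is not taken from the tool's source.

```python
from pathlib import Path


def export_paths(pdf_path: str, export: str) -> list[Path]:
    """Return the output paths implied by an --export choice (csv, json, both)."""
    pdf = Path(pdf_path)
    extensions = {"csv": ["csv"], "json": ["json"], "both": ["csv", "json"]}[export]
    # with_name keeps the PDF's directory, so results land next to the input.
    return [pdf.with_name(f"{pdf.stem}.refs.{ext}") for ext in extensions]


paths = export_paths("papers/paper.pdf", "both")
```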

Output

The tool displays:

  1. Summary panel — total count with OK / Issues / Not Found / Missing DOI breakdown
  2. All References table — status, year, first author, title, DOI presence, verification method (with similarity %)
  3. Issues table — details on DOI resolution failures, year/title mismatches
  4. Not Found table — references that couldn't be matched on CrossRef (may be correct but not indexed)
  5. Missing DOIs table — high-confidence DOI suggestions for references that omit them

How it works

  1. Extract text and DOI hyperlinks from the PDF using PyMuPDF
  2. Parse the References section into individual entries (authors, year, title, DOI)
  3. Verify each reference:
    • If DOI present → resolve via CrossRef API
    • Otherwise → search CrossRef by title, rank candidates by normalized title similarity + year match
  4. Report results with color-coded status
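
The title-search branch of step 3 can be sketched with the standard library. This is an illustration only: the normalization rules, the use of `difflib.SequenceMatcher`, the 0.9 threshold, and the candidate dictionary keys (`title`, `year`, `doi`) are all assumptions, and the tool's real scoring may differ.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", " ", ascii_only.lower()).strip()


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def best_candidate(title: str, year: int, candidates: list[dict], threshold: float = 0.9):
    """Return (candidate, score) for the best match, or None below the threshold."""
    if not candidates:
        return None
    # Rank by title similarity first, using a year match as the tie-breaker.
    top = max(
        candidates,
        key=lambda c: (title_similarity(title, c["title"]), c.get("year") == year),
    )
    score = title_similarity(title, top["title"])
    return (top, score) if score >= threshold else None


candidates = [
    {"title": "Deep Learning for Citation Parsing", "year": 2020, "doi": "10.1000/a"},
    {"title": "A Survey of Graph Databases", "year": 2020, "doi": "10.1000/b"},
]
match = best_candidate("Deep learning for citation parsing.", 2020, candidates)
no_match = best_candidate("Totally unrelated quantum chemistry", 2020, candidates)
```

A strict threshold trades recall for precision: a reference that CrossRef indexes under a noticeably different title ends up in the Not Found table instead of being matched to the wrong work.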

Development

# Lint and format
uvx ruff check .
uvx ruff format .

# Run tests
uv run pytest

License

MIT
