Convert PDF files to clean, well-structured Markdown using Claude AI.
- Extracts text from PDFs and formats it into proper Markdown
- Automatic chunking for large documents (handles PDFs of any size)
- Proper heading hierarchy, lists, tables, and code block detection
- Token usage and cost reporting
- Command-line interface with sensible defaults
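
The README does not describe how the automatic chunking works, so the following is only a minimal sketch of one plausible approach: split the extracted text on blank lines and pack paragraphs into chunks under a character budget. The function name `chunk_text` and the `max_chars` parameter are hypothetical and are not taken from `pdf2markdown.py`.

```python
def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks.

    Hypothetical helper: the real tool may chunk by pages or tokens instead.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # A paragraph longer than the budget is hard-split so that
        # no chunk ever exceeds max_chars.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        # Try to append the paragraph to the chunk being built.
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    for index, chunk in enumerate(chunk_text(sample, max_chars=40), start=1):
        print(f"chunk {index}: {len(chunk)} chars")
```

Each chunk could then be sent to the model separately and the Markdown results concatenated in order. A character budget is used here only for simplicity; a token-based budget would track model limits more closely.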
pip install -r requirements.txt

Option A: Environment variable (recommended)
export ANTHROPIC_API_KEY='your-api-key-here'

Add it to your shell profile (~/.zshrc, ~/.bashrc) to persist across sessions.
Option B: .env file
Create a .env file in the project root:
ANTHROPIC_API_KEY=your-api-key-here
ANTHROPIC_MODEL=claude-sonnet-4-6

See .env.example for all available options.
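Then load it before running. Because the file uses plain KEY=value lines, wrap the source call in set -a / set +a so the variables are exported to the Python process:

set -a && source .env && set +a && python pdf2markdown.py input.pdf

The .env file is gitignored by default to prevent accidental key exposure.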
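
As an alternative to sourcing the file in the shell, the same KEY=value lines could be read from Python. The sketch below is an assumption for illustration, not a description of what `pdf2markdown.py` does; `load_env_file` is a hypothetical helper that handles only simple lines, comments, and optional quotes.

```python
import os


def load_env_file(path=".env", environ=None):
    """Load simple KEY=value lines from a .env file into a mapping.

    Hypothetical helper. Values already present in the mapping are kept,
    so variables exported in the shell take priority over the file.
    """
    environ = os.environ if environ is None else environ
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and anything that is not KEY=value.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional single or double quotes.
            environ.setdefault(key.strip(), value.strip().strip("'\""))
    return environ
```

Calling `load_env_file()` at the top of a script would populate `os.environ` before the API key is read.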
# Basic usage (outputs document.md alongside the PDF)
python pdf2markdown.py document.pdf
# Specify output path
python pdf2markdown.py document.pdf -o output.md
# Use a different Claude model
python pdf2markdown.py document.pdf -m claude-haiku-4-5

| Flag | Description | Default |
|---|---|---|
| pdf | Path to the input PDF file | (required) |
| -o, --output | Output Markdown file path | Same name as PDF with .md extension |
| -m, --model | Claude model to use | claude-sonnet-4-6 |

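
For readers curious how flags like these are typically wired up, here is a minimal `argparse` sketch. It mirrors the flag names and the default output path from the table, but the function name `parse_args` and the internal structure are assumptions, not the actual source of `pdf2markdown.py`.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the three documented arguments (illustrative sketch only)."""
    parser = argparse.ArgumentParser(description="Convert a PDF to Markdown.")
    parser.add_argument("pdf", type=Path, help="Path to the input PDF file")
    parser.add_argument(
        "-o", "--output", type=Path, default=None,
        help="Output Markdown file path (default: PDF name with .md extension)",
    )
    parser.add_argument(
        "-m", "--model", default=None,
        help="Claude model to use (default: claude-sonnet-4-6)",
    )
    args = parser.parse_args(argv)
    # Derive the default output path: same location and name, .md suffix.
    if args.output is None:
        args.output = args.pdf.with_suffix(".md")
    return args


if __name__ == "__main__":
    parsed = parse_args()
    print(f"input={parsed.pdf} output={parsed.output} model={parsed.model}")
```

Leaving the model as `None` when `-m` is absent lets the caller fall back to the `ANTHROPIC_MODEL` variable or the built-in default afterwards.
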
| Model | Best For | Input $/M tokens | Output $/M tokens |
|---|---|---|---|
| claude-opus-4-6 | Highest quality output | $15.00 | $75.00 |
| claude-sonnet-4-6 | Balance of quality and cost (default) | $3.00 | $15.00 |
| claude-haiku-4-5 | Fastest and cheapest | $0.80 | $4.00 |

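
To show how per-million-token prices translate into a per-run cost, here is a small sketch using the figures from the table above. The names `PRICING` and `estimate_cost` are hypothetical, and the tool's own cost report may be computed differently.

```python
# USD per million tokens as (input, output), copied from the pricing table.
PRICING = {
    "claude-opus-4-6": (15.00, 75.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5": (0.80, 4.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one run (illustrative only)."""
    input_price, output_price = PRICING[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example: 120k input tokens and 30k output tokens on the default model.
    cost = estimate_cost("claude-sonnet-4-6", 120_000, 30_000)
    print(f"Estimated cost: ${cost:.4f}")
```

With these numbers the input side contributes 0.12 × $3.00 = $0.36 and the output side 0.03 × $15.00 = $0.45, for about $0.81 in total.
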
Set your preferred model via the ANTHROPIC_MODEL env var or the -m CLI flag. The CLI flag takes priority.
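
The precedence described above (CLI flag first, then the environment variable, then the default) can be sketched as a single lookup. `resolve_model` and `DEFAULT_MODEL` are hypothetical names used for illustration; the actual script may structure this differently.

```python
import os

DEFAULT_MODEL = "claude-sonnet-4-6"


def resolve_model(cli_model=None, environ=None):
    """Pick the model: CLI flag, then ANTHROPIC_MODEL, then the default."""
    environ = os.environ if environ is None else environ
    # An empty or missing value at one level falls through to the next.
    return cli_model or environ.get("ANTHROPIC_MODEL") or DEFAULT_MODEL


if __name__ == "__main__":
    print(resolve_model())
```

Passing the mapping explicitly, as in `resolve_model("claude-haiku-4-5", {"ANTHROPIC_MODEL": "claude-opus-4-6"})`, makes the precedence easy to check without touching the real environment.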