
MokeyBytes/pdf2markdown

PDF2Markdown

Convert PDF files to clean, well-structured Markdown using Claude AI.

Features

  • Extracts text from PDFs and formats it into proper Markdown
  • Automatic chunking, so large documents are split into smaller pieces and converted piece by piece
  • Detection of heading hierarchy, lists, tables, and code blocks
  • Token usage and cost reporting
  • CLI interface with sensible defaults
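
The README does not say how chunks are formed, so the following is only a minimal sketch under stated assumptions: it assumes the text has already been extracted page by page, and the function name `chunk_pages` and the `max_chars` budget are hypothetical, not taken from `pdf2markdown.py`. Whole pages are grouped greedily until adding the next page would exceed the budget, so no page is cut mid-text.

```python
from typing import List

SEP = "\n\n"  # separator placed between pages inside one chunk


def chunk_pages(pages: List[str], max_chars: int = 40_000) -> List[str]:
    """Greedily group page texts into chunks of at most max_chars characters.

    Hypothetical helper: a single page longer than max_chars is kept whole
    as its own chunk rather than being split.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for page in pages:
        # Cost of adding this page, counting the separator if the chunk is non-empty.
        extra = len(page) if not current else len(SEP) + len(page)
        if current and current_len + extra > max_chars:
            # Budget exceeded: close the current chunk and start a new one.
            chunks.append(SEP.join(current))
            current, current_len = [], 0
            extra = len(page)
        current.append(page)
        current_len += extra
    if current:
        chunks.append(SEP.join(current))
    return chunks


if __name__ == "__main__":
    pages = ["a" * 30, "b" * 30, "c" * 30]
    print([len(c) for c in chunk_pages(pages, max_chars=70)])  # → [62, 30]
```

Character counts are only a rough stand-in for tokens; a real implementation might budget by token count instead.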

Setup

1. Install dependencies

pip install -r requirements.txt

2. Set your Anthropic API key

Option A: Environment variable (recommended)

export ANTHROPIC_API_KEY='your-api-key-here'

Add this line to your shell profile (e.g. ~/.zshrc or ~/.bashrc) so the key persists across sessions.

Option B: .env file

Create a .env file in the project root:

ANTHROPIC_API_KEY=your-api-key-here
ANTHROPIC_MODEL=claude-sonnet-4-6

See .env.example for all available options.

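Then export its variables before running. Plain KEY=value lines are not exported by source alone, so the Python process would not see them; set -a marks every variable assigned while it is active for export:

set -a && source .env && set +a && python pdf2markdown.py input.pdf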

The .env file is gitignored by default to prevent accidental key exposure.
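
If you would rather not source the file in your shell, a script can read .env itself. The sketch below is a hypothetical stdlib-only loader (`load_dotenv_file` is an illustrative name, and this is not necessarily what `pdf2markdown.py` does); the `python-dotenv` package's `load_dotenv` offers a fuller version of the same idea. It handles simple `KEY=value` lines, skips blanks and `#` comments, strips one pair of matching quotes, and never overwrites a variable that is already set, so a key exported in your shell still wins.

```python
import os
from pathlib import Path


def load_dotenv_file(path: str = ".env") -> dict:
    """Parse simple KEY=value lines from path into os.environ (hypothetical helper).

    Variables already present in the environment are left untouched.
    Returns the pairs that were parsed from the file.
    """
    parsed = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        # Accept an optional leading "export " as written in shell files.
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
            value = value[1:-1]
        parsed[key] = value
        # setdefault keeps any value already exported in the shell.
        os.environ.setdefault(key, value)
    return parsed
```

Calling `load_dotenv_file()` at the top of a script makes `ANTHROPIC_API_KEY` and `ANTHROPIC_MODEL` from .env visible through `os.environ`.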

Usage

# Basic usage (writes document.md next to document.pdf)
python pdf2markdown.py document.pdf

# Specify output path
python pdf2markdown.py document.pdf -o output.md

# Use a different Claude model
python pdf2markdown.py document.pdf -m claude-haiku-4-5

Options

| Flag | Description | Default |
|---|---|---|
| pdf | Path to the input PDF file | (required) |
| -o, --output | Output Markdown file path | Same name as the PDF, with a .md extension |
| -m, --model | Claude model to use | ANTHROPIC_MODEL if set, otherwise claude-sonnet-4-6 |

Available Models

| Model | Best for | Input ($/M tokens) | Output ($/M tokens) |
|---|---|---|---|
| claude-opus-4-6 | Highest quality output | $15.00 | $75.00 |
| claude-sonnet-4-6 | Balance of quality and cost (default) | $3.00 | $15.00 |
| claude-haiku-4-5 | Fastest and cheapest | $0.80 | $4.00 |
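
Cost reporting comes down to multiplying token counts by these per-million-token rates. The sketch below copies the rates from the model table; `PRICES_PER_MTOK` and `estimate_cost` are hypothetical names, and the script's own bookkeeping may differ (for example, it may sum usage across chunks before pricing).

```python
# Per-million-token rates in USD, copied from the model table in this README.
PRICES_PER_MTOK = {
    "claude-opus-4-6": (15.00, 75.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5": (0.80, 4.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for the given token usage (hypothetical helper)."""
    try:
        in_rate, out_rate = PRICES_PER_MTOK[model]
    except KeyError:
        raise ValueError(f"No pricing entry for model {model!r}") from None
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    cost = estimate_cost("claude-sonnet-4-6", 120_000, 30_000)
    print(f"${cost:.2f}")  # → $0.81
```

For instance, 120,000 input tokens and 30,000 output tokens on claude-sonnet-4-6 come to (120,000 × $3 + 30,000 × $15) / 1,000,000 = $0.81.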

Set your preferred model via the ANTHROPIC_MODEL env var or the -m CLI flag. The CLI flag takes priority.
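
The Options table and this precedence rule map naturally onto Python's standard argparse module. The following is a hypothetical sketch of such a parser, not the script's actual code (`parse_args` and `DEFAULT_MODEL` are illustrative names). Reading ANTHROPIC_MODEL into the `-m` default gives the stated priority directly: an explicit `-m` replaces the default, and claude-sonnet-4-6 applies only when neither is set.

```python
import argparse
import os
from pathlib import Path
from typing import List, Optional

DEFAULT_MODEL = "claude-sonnet-4-6"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse CLI arguments following the Options table (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Convert a PDF file to Markdown using Claude."
    )
    parser.add_argument("pdf", help="Path to the input PDF file")
    parser.add_argument(
        "-o", "--output",
        help="Output Markdown file path (default: the PDF path with a .md extension)",
    )
    parser.add_argument(
        "-m", "--model",
        # Env var first, then the built-in default; an explicit -m overrides both.
        default=os.environ.get("ANTHROPIC_MODEL") or DEFAULT_MODEL,
        help="Claude model to use",
    )
    args = parser.parse_args(argv)
    if args.output is None:
        # document.pdf -> document.md, in the same directory as the PDF.
        args.output = str(Path(args.pdf).with_suffix(".md"))
    return args
```

For example, with ANTHROPIC_MODEL unset, `parse_args(["docs/report.pdf"])` yields an output path of docs/report.md (on POSIX systems) and the model claude-sonnet-4-6.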

About

Python script that extracts text from PDFs into Markdown using Claude.
