BookConvert

Convert PDF books to clean Markdown files for use in Claude Projects, NotebookLM, and other LLM tools.

Handles both text-based PDFs (using Marker) and scanned/image-based PDFs (using OCR via Tesseract).

Setup

1. Install system dependencies (macOS)

brew install tesseract poppler ghostscript

2. Clone and set up Python environment

git clone https://github.com/AndySparks/BookConvert.git
cd BookConvert
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Usage

Convert a single PDF

python convert.py input/MyBook.pdf

Convert all PDFs in a directory

python convert.py input/
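Because the same argument can be a single file or a directory, the script has to expand it into a list of PDFs. The sketch below shows one way that could work; `collect_pdfs` is a hypothetical helper written for illustration and is not taken from convert.py.

```python
from pathlib import Path


def collect_pdfs(target: str) -> list[Path]:
    """Return the PDF files named by target (hypothetical helper, not from convert.py)."""
    path = Path(target)
    if path.is_dir():
        # Sort for a stable processing order; match the extension case-insensitively.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() == ".pdf"
        )
    if path.is_file() and path.suffix.lower() == ".pdf":
        return [path]
    # Anything else (missing path, non-PDF file) yields nothing to convert.
    return []
```

This sketch does not recurse into subdirectories; whether the real script does is not stated here.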

Convert a scanned/image-based PDF (OCR mode)

python convert.py input/ScannedBook.pdf --ocr

Specify output directory

python convert.py input/MyBook.pdf --output output/Philosophy/
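As a rough illustration of what --output changes, the sketch below maps an input PDF to a Markdown path inside the chosen directory. The naming rule (PDF file name with a .md extension) and the helper `output_path_for` are assumptions made for this example, not documented behavior of convert.py.

```python
from pathlib import Path


def output_path_for(pdf: Path, output_dir: Path = Path("output")) -> Path:
    """Map input/MyBook.pdf to <output_dir>/MyBook.md (assumed naming rule)."""
    # Use the stem rather than with_suffix so names like "Vol.1.pdf" become "Vol.1.md".
    return output_dir / (pdf.stem + ".md")
```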

How it works

1. Drop your PDF(s) into the input/ folder.
2. Run convert.py. By default it uses Marker, which produces high-quality Markdown from text-based PDFs.
3. For scanned books (where the pages are images), add the --ocr flag to extract the text with Tesseract OCR.
4. Converted Markdown appears in output/.
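The steps above can be sketched as a small command-line parser that picks a backend from the flags. This is a minimal sketch under assumptions: the --ocr and --output flags come from the usage examples, but `build_parser`, `choose_backend`, and the backend names are hypothetical and do not show how convert.py actually calls Marker or Tesseract.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="convert.py", description="Convert PDF books to Markdown."
    )
    parser.add_argument("path", help="a PDF file or a directory of PDFs")
    parser.add_argument(
        "--ocr", action="store_true", help="use Tesseract OCR for scanned PDFs"
    )
    parser.add_argument(
        "--output", default="output", help="directory for the converted Markdown"
    )
    return parser


def choose_backend(args: argparse.Namespace) -> str:
    """Return the assumed backend name for the parsed arguments."""
    # Marker is the default; Tesseract is used only when --ocr is passed.
    return "tesseract" if args.ocr else "marker"
```

Keeping the backend choice in one small function makes the default easy to see: nothing switches to OCR unless the flag is given explicitly.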

Using with Claude Code

This project includes a CLAUDE.md file, so Claude Code understands the project and can help you convert and clean up books. Just open the project directory in Claude Code and ask it to help convert your PDFs.

Tips

Start with the default mode (no --ocr flag). It's faster and produces better formatting for text-based PDFs.
Use --ocr only if the default mode produces empty or garbled output — this usually means the PDF is scanned/image-based.
OCR output may need cleanup. Claude Code can help you fix OCR artifacts, add proper headings, and improve formatting.
Organize your output into subdirectories by topic (e.g., output/Coaching/, output/Writing/) to keep things tidy.
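To make "empty or garbled output" a little more concrete, here is one possible check you could run on the default-mode Markdown before deciding to retry with --ocr. It is a hypothetical heuristic: the function name `looks_scanned` and both thresholds are illustrative assumptions, not part of BookConvert, and they would need tuning on real books.

```python
def looks_scanned(markdown: str, page_count: int, min_chars_per_page: int = 200) -> bool:
    """Guess whether default-mode output is too thin or garbled to trust.

    Hypothetical heuristic; the thresholds are illustrative, not tuned.
    """
    text = markdown.strip()
    if not text:
        return True  # Nothing was extracted at all.
    pages = max(page_count, 1)  # Guard against a zero or unknown page count.
    visible = [c for c in text if not c.isspace()]
    if len(visible) / pages < min_chars_per_page:
        return True  # Far less text than a book page normally holds.
    # Garbled extraction tends to contain few letters and digits.
    alnum_ratio = sum(c.isalnum() for c in visible) / len(visible)
    return alnum_ratio < 0.6
```

A result of True only suggests that the PDF may be image-based; reading a page or two of the output is still the more reliable test.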

Project structure

BookConvert/
  input/          <- Drop your PDFs here
  output/         <- Converted markdown files appear here
  convert.py      <- Main conversion script
  requirements.txt
  CLAUDE.md       <- Instructions for Claude Code

License

MIT
