Snap photos of a book, get a single plain-text file ready to paste into Speechify (or any text-to-speech app).
Born out of frustration with Speechify's own scan feature after an update broke it. OCR runs entirely on your Mac using Apple Vision: no cloud, no API key, no upload. An optional review pass uses your existing Claude Code subscription to fix obvious OCR typos; that pass does send the text to Claude.
```text
photos/ → sips (HEIC→JPG) → Apple Vision OCR → reflow & clean → speechify.txt
                                                      ↘ optional Claude Code review
```
`bookshot.sh` is the orchestrator. Behind it:

- `sips` converts every HEIC/JPG/JPEG/PNG in the input folder to a normalized JPG.
- `ocr.swift` runs Apple Vision text recognition on each image. Two-page spreads are split down the middle and each page is OCR'd top-to-bottom.
- `clean.py` reflows the raw OCR: it strips page numbers and per-page headers, joins wrapped paragraph lines into single-line paragraphs (Speechify treats every line break as a sentence boundary), and keeps headings and bullet items on their own lines.
- `review.py` (optional, `--review`) splits the output by chapter, asks `claude -p` for a JSON list of `{find, replace}` OCR fixes, and applies only the fixes whose `find` text occurs exactly once.
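
The reflow step can be approximated as follows. This is a minimal sketch, not `clean.py` itself: `reflow`, its regexes, and the heading rule are hypothetical. It assumes paragraphs in the raw OCR are separated by blank lines, and it treats a line as a heading when it is at most 35 characters (the cap mentioned under limitations), starts fresh after a sentence end, and has no trailing punctuation. Per-page header stripping and hyphen handling are left out.

```python
import re

# Hypothetical heuristics for illustration; clean.py's real rules may differ.
HEADING_MAX_LEN = 35                              # heading cap described in this README
PAGE_NUMBER = re.compile(r"^\d{1,4}$")            # a line holding only a page number
BULLET = re.compile(r"^(?:[•*-]|\d+[.)])\s+")     # "• item", "- item", "1. item"
SENTENCE_END = (".", "!", "?", '"', "\u201d")     # punctuation that can end a sentence


def is_heading(line: str) -> bool:
    """A short line that does not end in sentence or clause punctuation."""
    return len(line) <= HEADING_MAX_LEN and not line.endswith(SENTENCE_END + (",", ";", ":"))


def reflow(raw: str) -> str:
    """Join wrapped OCR lines so each paragraph becomes one line."""
    out: list[str] = []
    current: list[str] = []

    def flush() -> None:
        # Emit the paragraph collected so far as a single line.
        if current:
            out.append(" ".join(current))
            current.clear()

    for block in re.split(r"\n\s*\n", raw):          # assumed: blank lines separate paragraphs
        for line in (ln.strip() for ln in block.splitlines()):
            if not line or PAGE_NUMBER.match(line):
                continue                             # skip blanks and bare page numbers
            if BULLET.match(line):
                flush()
                current.append(line)                 # each bullet starts its own line
            elif (not current or current[-1].endswith(SENTENCE_END)) and is_heading(line):
                flush()
                out.append(line)                     # short standalone line: a heading
            else:
                current.append(line)                 # wrapped line: join onto paragraph
        flush()
    return "\n".join(out)
```

A heading longer than the cap fails `is_heading` and is joined into the surrounding paragraph, which is the folding behavior described under limitations.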

Requirements:

- macOS (uses `sips` and Apple's Vision framework via Swift)
- `python3` (standard library only)
- For `--review`: the Claude Code CLI on your `PATH`. It runs `claude -p` headlessly against your existing Claude Code subscription, so there is no separate API key or billing.
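
A quick preflight check for these requirements can be sketched in Python. This is a hypothetical helper (saved as, say, `preflight.py`), not part of bookshot. It assumes `ocr.swift` is run through the `swift` command; drop that entry if you compile it ahead of time. It only looks for `claude` when `--review` is requested.

```python
import shutil
import sys


def missing_requirements(review: bool = False) -> list[str]:
    """Return human-readable problems that would stop a run; an empty list means ready."""
    problems: list[str] = []
    if sys.platform != "darwin":
        problems.append("macOS is required (sips and Apple Vision)")
    # Assumption: ocr.swift is run with the `swift` command rather than precompiled.
    for tool in ("sips", "swift", "python3"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if review and shutil.which("claude") is None:
        problems.append("--review needs the Claude Code CLI (claude) on PATH")
    return problems


if __name__ == "__main__":
    # Example: python3 preflight.py --review
    issues = missing_requirements(review="--review" in sys.argv)
    print("\n".join(issues) if issues else "ready")
```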

Usage: `./bookshot.sh <input-folder> [output-file] [flags]`

Flags:

- `--no-split`: treat each photo as a single page (the default assumes two-page spreads)
- `--keep-temp`: keep the intermediate `_book_tmp/` directory for inspection
- `--review`: run the AI cleanup pass (chunked by chapter, parallelized, ~1 minute per 20 chapters)

Examples:
```bash
# basic run: output goes to <folder>/speechify.txt
./bookshot.sh ~/Documents/my-book

# custom output path
./bookshot.sh ~/Documents/my-book ~/Desktop/my-book.txt

# single-page photos (e.g., from a book scanner app)
./bookshot.sh ~/Documents/my-book --no-split

# with AI typo cleanup
./bookshot.sh ~/Documents/my-book --review
```

Add to `~/.zshrc` to run from anywhere:

```bash
alias bookshot="$HOME/path/to/bookshot/bookshot.sh"
```

Photo tips:

- Two-page spreads are easier: open the book flat, photograph the whole spread, move on. The script splits each photo down the middle.
- Make sure the photos sort in the order they were taken (iPhone's default filenames already do).
- Make sure the spine is roughly centered so the split lines up. Slight tilt is fine.
- Avoid fingers, glare, and shadows on the text. Vision is robust but not magic.
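
Input ordering can be sketched as below, assuming the photos are simply processed in filename order. `collect_photos` and `natural_key` are hypothetical helpers, not code from `bookshot.sh`; the extension list comes from the `sips` step described earlier. Comparing digit runs as numbers keeps `page2` ahead of `page10`; for zero-padded iPhone names like `IMG_0012.HEIC` it gives the same order as a plain sort.

```python
import re
from pathlib import Path

# Extensions the README lists for the sips conversion step (matched case-insensitively).
IMAGE_EXTS = {".heic", ".jpg", ".jpeg", ".png"}


def natural_key(name: str) -> list:
    """Compare digit runs as numbers, so 'page2' sorts before 'page10'."""
    return [int(part) if part.isdigit() else part.lower() for part in re.split(r"(\d+)", name)]


def collect_photos(folder: str) -> list[Path]:
    """Return the folder's images in filename order (hypothetical helper)."""
    photos = [p for p in Path(folder).iterdir() if p.is_file() and p.suffix.lower() in IMAGE_EXTS]
    return sorted(photos, key=lambda p: natural_key(p.name))
```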

Limitations:

- Designed for prose books with chapter headings, paragraphs, and bullet lists. Novels with stylized typography or textbooks with multi-column layouts may produce jumbled output.
- The single OCR pass on a half-spread reads top-to-bottom in a single column. Side-by-side text blocks within one page will interleave (rare in trade paperbacks).
- The heading detector caps section headings at 35 characters; longer ones get folded into the next paragraph (still readable, just not visually split).
- Hyphenated line breaks are rejoined by removing only the newline: `pebble-\nlike` becomes `pebble-like` (correct), but `hap-\npened` becomes `hap-pened` (cosmetic only; Speechify still reads it correctly).
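
The hyphen rule in the last limitation amounts to the following. This is a sketch assuming wrapped lines arrive as a list of strings; `join_wrapped` is a hypothetical name, not a function from `clean.py`. It shows why the rule cannot tell a real compound (`pebble-like`) from a word split only by layout (`hap-pened`): at the line break, both look identical.

```python
def join_wrapped(lines: list[str]) -> str:
    """Join wrapped lines into one; after a trailing hyphen, only the line break is removed."""
    text = ""
    for line in lines:
        if not text:
            text = line
        elif text.endswith("-"):
            text += line          # keep the hyphen, drop only the newline
        else:
            text += " " + line    # ordinary wrap: the newline becomes a space
    return text


print(join_wrapped(["smooth, pebble-", "like stones"]))  # smooth, pebble-like stones
print(join_wrapped(["it hap-", "pened"]))                # it hap-pened
```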

| File | Purpose |
|---|---|
| `bookshot.sh` | Orchestrates the pipeline |
| `ocr.swift` | Apple Vision OCR with two-page split and portrait fallback |
| `clean.py` | Reflows raw OCR into Speechify-ready paragraphs |
| `review.py` | Optional `claude -p` typo-fix pass |
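
The review pass (split by chapter, ask for `{find, replace}` pairs, apply only unique matches, run chapters in parallel) can be sketched as follows. This is a minimal sketch, not `review.py`: the chapter-heading regex, the prompt wording, the worker count, and the `ask_model` parameter are hypothetical. `ask_claude` assumes `claude -p <prompt>` prints the reply to stdout. Passing the model call in as a function lets the chunking and fix-application logic run without the CLI.

```python
import json
import re
import subprocess
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: a new chapter starts at a line beginning with "Chapter" (any case).
CHAPTER_START = re.compile(r"^(?=chapter\b)", re.IGNORECASE | re.MULTILINE)


def split_chapters(text: str) -> list[str]:
    """Split the cleaned text just before each chapter heading."""
    return [part for part in CHAPTER_START.split(text) if part.strip()]


def apply_unique_fixes(chunk: str, fixes: list) -> str:
    """Apply a {find, replace} pair only when `find` occurs exactly once."""
    for fix in fixes:
        if not isinstance(fix, dict):
            continue
        find, replace = fix.get("find"), fix.get("replace")
        if isinstance(find, str) and isinstance(replace, str) and find and chunk.count(find) == 1:
            chunk = chunk.replace(find, replace)
    return chunk


def ask_claude(prompt: str) -> str:
    """Assumed invocation: run Claude Code headlessly and return its stdout."""
    result = subprocess.run(["claude", "-p", prompt], capture_output=True, text=True, check=True)
    return result.stdout


def review_chapter(chunk: str, ask_model: Callable[[str], str]) -> str:
    """Ask for OCR fixes as JSON; leave the chapter unchanged if the reply isn't valid JSON."""
    prompt = (
        "List obvious OCR typos in the text below as a JSON array of "
        'objects like {"find": "tbe", "replace": "the"}. Reply with JSON only.\n\n' + chunk
    )
    try:
        fixes = json.loads(ask_model(prompt))
    except json.JSONDecodeError:
        return chunk
    return apply_unique_fixes(chunk, fixes if isinstance(fixes, list) else [])


def review(text: str, ask_model: Callable[[str], str] = ask_claude, workers: int = 4) -> str:
    """Review chapters in parallel and reassemble them in their original order."""
    chapters = split_chapters(text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so chapters stay in sequence.
        return "".join(pool.map(lambda ch: review_chapter(ch, ask_model), chapters))
```

Requiring a unique match is a conservative choice: a short `find` string such as a single letter would otherwise rewrite every occurrence in the chapter.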