pdf-to-text

AI coding skill for converting PDF files to clean text. Handles both embedded-text PDFs and scanned/image PDFs via OCR.

setup

Navigate to your AI project's skills folder and clone the repository:

git clone https://github.com/RandyHaylor/pdf-to-text.git

Then do one of the following:

Option 1: tell your AI agent to use the pdf-to-text skill for your task.
Option 2: update your project or global agent instructions to incorporate the pdf-to-text skill.

This skill goes beyond basic extraction with several techniques that improve reliability and output quality.

Two-page sample before full extraction
Rather than blindly processing the entire PDF, the agent extracts just the first two pages and evaluates quality before committing to a method. This catches encoding issues, column mangling, and layout problems early, before any time is spent on a bad full run.
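
As a rough illustration of the sampling step, the sketch below builds a pdftotext call limited to the first two pages. It assumes Poppler's pdftotext is the embedded-text extractor; the function names and the use of -layout are illustrative choices, not taken from SKILL.md.

```python
import subprocess


def sample_command(pdf_path, first_page=1, last_page=2):
    """Build a pdftotext command that writes only the sample pages to stdout."""
    return [
        "pdftotext",
        "-f", str(first_page),   # first page to convert
        "-l", str(last_page),    # last page to convert
        "-layout",               # keep the physical layout (assumed preference)
        pdf_path,
        "-",                     # "-" sends the text to stdout instead of a file
    ]


def extract_sample(pdf_path):
    """Run the sample extraction and return the text (requires pdftotext on PATH)."""
    result = subprocess.run(
        sample_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling extract_sample("report.pdf") returns only the text of pages 1 and 2, which is cheap enough to inspect before deciding how to process the rest.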
Comparative method selection
When embedded text extraction produces poor results, the agent runs OCR on the same sample pages and compares both outputs side by side. The better method wins. No guessing.
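
One way the comparison could be automated is sketched below. The scoring rule (share of tokens that look like ordinary English words) is a hypothetical heuristic chosen for illustration; the skill itself may rely on the agent's own judgment of the two samples instead.

```python
import re


def text_quality(text):
    """Fraction of whitespace-separated tokens that look like ordinary words.

    Hypothetical heuristic: a token counts if it is two or more ASCII letters,
    optionally followed by one punctuation mark. It is English-centric and
    undercounts one-letter words, so treat the score as a rough signal only.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    wordlike = sum(
        1 for token in tokens
        if re.fullmatch(r"[A-Za-z]{2,}[.,;:!?]?", token)
    )
    return wordlike / len(tokens)


def pick_method(embedded_text, ocr_text):
    """Score both sample outputs and return (winning method, all scores)."""
    scores = {
        "embedded": text_quality(embedded_text),
        "ocr": text_quality(ocr_text),
    }
    # On a tie, max() keeps the first key, so embedded text is preferred.
    best = max(scores, key=scores.get)
    return best, scores
```

Preferring embedded text on a tie is a deliberate choice in this sketch, since it avoids the cost of OCR when both outputs look equally clean.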
Visual inspection via AI vision
When neither extraction method produces clean output, the agent renders pages as images and visually inspects them using multimodal vision. It can see what's actually on the page (watermarks, unusual fonts, columns, embedded images of text) and diagnose the specific issue before recommending a strategy.
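
The rendering step might look like the following sketch, assuming Poppler's pdftoppm is used to produce the images. The function name, the PNG format, and the 150 DPI default are assumptions made for illustration.

```python
import subprocess


def render_command(pdf_path, out_prefix, first_page=1, last_page=2, dpi=150):
    """Build a pdftoppm command that renders the chosen pages to PNG files."""
    return [
        "pdftoppm",
        "-png",                  # write PNG instead of the default PPM
        "-r", str(dpi),          # resolution in DPI (assumed value)
        "-f", str(first_page),
        "-l", str(last_page),
        pdf_path,
        out_prefix,              # output files are named from this prefix plus a page number
    ]


def render_pages(pdf_path, out_prefix):
    """Render the sample pages to images (requires pdftoppm on PATH)."""
    subprocess.run(render_command(pdf_path, out_prefix), check=True)
```

The resulting image files can then be handed to the agent's vision capability for inspection.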
User checkpoint before full processing
The agent reports its sample findings and recommended approach to the user before running full extraction. No surprises, no wasted processing on the wrong method.
Graceful tool fallback chain
The skill defines three extraction paths (pdftotext → ocrmypdf → tesseract + pdftoppm) and checks tool availability upfront. If the preferred tool is missing, it falls through to the next option rather than failing.
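
The availability check can be sketched as follows. The three paths and their order come from the description above; the data structure and function names are illustrative assumptions, and shutil.which is used simply as a standard way to test whether a tool is on PATH.

```python
import shutil

# Extraction paths in order of preference, each with the tools it needs.
EXTRACTION_PATHS = [
    ("pdftotext", ["pdftotext"]),
    ("ocrmypdf", ["ocrmypdf"]),
    ("tesseract + pdftoppm", ["tesseract", "pdftoppm"]),
]


def available_paths(which=shutil.which):
    """Return the names of all paths whose tools are all installed."""
    return [
        name
        for name, tools in EXTRACTION_PATHS
        if all(which(tool) for tool in tools)
    ]


def first_available(which=shutil.which):
    """Return the most preferred usable path, or None if nothing is installed."""
    paths = available_paths(which)
    return paths[0] if paths else None
```

Passing the lookup function as a parameter keeps the check easy to exercise without installing anything, for example first_available(lambda tool: None) returns None.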
