PDF to Text

A tiny CLI utility to stream large PDF files into plain text without loading the entire file into memory. It wraps pdfminer.six with page-based iteration, configurable LAParams, a friendly CLI spinner, and safe logging so you can batch-process enormous PDFs. When conversion finishes, the CLI prints a summary showing file sizes and elapsed time.
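The page-by-page approach described above can be sketched roughly as follows. This is not the code from pdf_to_text.py; the function names iter_page_text and write_pages are hypothetical. The sketch uses pdfminer.six's extract_pages and LAParams to yield one page of text at a time, so only the current page is held in memory.

```python
from typing import IO, Iterable, Iterator, Optional, Sequence


def iter_page_text(pdf_path: str,
                   page_numbers: Optional[Sequence[int]] = None,
                   **layout_options) -> Iterator[str]:
    """Yield the text of one page at a time (page_numbers are zero-based)."""
    # Imported lazily so write_pages below is usable without pdfminer.six.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LAParams, LTTextContainer

    # layout_options are passed straight to LAParams, e.g. char_margin=2.0.
    laparams = LAParams(**layout_options)
    for page_layout in extract_pages(pdf_path, page_numbers=page_numbers,
                                     laparams=laparams):
        yield "".join(element.get_text()
                      for element in page_layout
                      if isinstance(element, LTTextContainer))


def write_pages(pages: Iterable[str], out: IO[str]) -> int:
    """Write pages to `out` as they arrive and return how many were written."""
    count = 0
    for text in pages:
        out.write(text)
        count += 1
    return count
```

With pdfminer.six installed, opening a text file for writing and calling write_pages(iter_page_text("documents/manual.pdf"), out) would stream the whole document to it one page at a time.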

Installation

Create or activate your Python virtual environment (the repository already contains .venv/).
Install the requirements:

pip install -r requirements.txt

Usage

python pdf_to_text.py INPUT_PDF [-o OUTPUT_TXT] [OPTIONS]

Examples

Convert an entire PDF:

python pdf_to_text.py documents/manual.pdf

Extract a page range, replacing the output file if it already exists:

python pdf_to_text.py big-output.pdf --page-range 50-150 \
    --output extracted.txt --overwrite

When processing PDFs in chunks, append each chunk to an existing transcript. The CLI shows a lightweight animation while it works:

python pdf_to_text.py another-chunk.pdf --append --output extracted.txt
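For many chunks, the commands can be generated rather than typed by hand. The sketch below is only an illustration: build_chunk_commands is a hypothetical helper, and it assumes that --overwrite also creates the output file when it does not exist yet. The first chunk is written with --overwrite and every later chunk with --append.

```python
import sys
from typing import List, Sequence


def build_chunk_commands(chunks: Sequence[str], output: str,
                         script: str = "pdf_to_text.py") -> List[List[str]]:
    """Return one argv list per chunk, all targeting the same output file."""
    commands = []
    for index, pdf in enumerate(chunks):
        # First chunk replaces any stale transcript; later chunks append to it.
        mode = "--overwrite" if index == 0 else "--append"
        commands.append([sys.executable, script, pdf, mode, "--output", output])
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True) so that a failing chunk stops the batch.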

Helpful flags

--page-range: pass START-END to limit conversion to a page window; the end may be omitted (e.g., 10- for everything after page 10).
--encoding: control the output text encoding (default utf-8).
--char-margin, --line-margin, --word-margin, --boxes-flow, --detect-vertical: customize pdfminer.six layout heuristics when dealing with complex columns or rotated text.
--quiet / --log-level: silence logging or set its verbosity.
--no-spinner: disable the CLI animation (it is automatically muted when --quiet is used).
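To make the flag semantics concrete, here is a minimal argparse sketch covering a few of the flags above. It is a reconstruction under assumptions, not the script's actual parser: parse_page_range, build_parser, and spinner_enabled are hypothetical names, and page numbers are assumed to be 1-based.

```python
import argparse
from typing import Optional, Tuple


def parse_page_range(value: str) -> Tuple[int, Optional[int]]:
    """Parse 'START-END' or 'START-' into (start, end); end is None if open."""
    start_text, sep, end_text = value.partition("-")
    start_text, end_text = start_text.strip(), end_text.strip()
    if not sep or not start_text.isdigit() or (end_text and not end_text.isdigit()):
        raise argparse.ArgumentTypeError(
            f"expected START-END or START-, got {value!r}")
    start = int(start_text)
    end = int(end_text) if end_text else None
    # Assumption: pages are numbered from 1 and the window must not be empty.
    if start < 1 or (end is not None and end < start):
        raise argparse.ArgumentTypeError(f"invalid page window: {value!r}")
    return start, end


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pdf_to_text.py")
    parser.add_argument("input_pdf")
    parser.add_argument("-o", "--output")
    parser.add_argument("--page-range", type=parse_page_range)
    parser.add_argument("--encoding", default="utf-8")
    parser.add_argument("--quiet", action="store_true")
    parser.add_argument("--no-spinner", action="store_true")
    return parser


def spinner_enabled(args: argparse.Namespace) -> bool:
    """The animation is off when either --quiet or --no-spinner is given."""
    return not (args.quiet or args.no_spinner)
```

Validating the range inside the argparse type callback means a malformed value such as 5-3 is rejected with a usage error before any PDF is opened.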

Testing

Run the CLI with --help to verify the script starts without errors:

python pdf_to_text.py --help

For more thorough testing you can write automated tests that call convert_pdf_to_text with a short PDF fixture (e.g., created by fpdf or reportlab).
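If adding fpdf or reportlab as a test dependency is undesirable, a one-page fixture can also be assembled by hand. The sketch below swaps in that alternative: build_minimal_pdf is a hypothetical helper that writes a small PDF 1.4 file with a single line of Helvetica text, computing the cross-reference offsets itself. The result should be simple enough for pdfminer.six to parse, but that is an assumption worth checking in your own test run.

```python
def build_minimal_pdf(text: str) -> bytes:
    """Return a one-page PDF containing `text` (Latin-1 characters only)."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = f"BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_position = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # mandatory free entry for object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_position
    return bytes(out)
```

A test could write these bytes to a temporary file, pass the path to convert_pdf_to_text, and assert that the extracted text contains the original string. The function's exact signature is not documented here, so adapt the call to match pdf_to_text.py.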

LiteObject/pdf-to-text
