pdf2html is a lightweight command-line utility that batch-converts PDF files into structured, readable HTML using Python. Ideal for archiving, web conversion, and text extraction pipelines.

Synaptechlabs/pdf2html

pdf2html

Convert PDF files to simple, readable HTML using a command-line tool.

Features

  • Converts single PDFs or entire folders
  • Retains original filenames
  • Simple, semantic HTML output
  • CLI-friendly and pip-installable

Installation

Option 1: Local install (run from the repository root)

pip install .

Option 2: pipx (recommended)

pipx install path/to/pdf2html/

Usage

Convert a single file:

pdf2html path/to/file.pdf -o output_folder

Convert all PDFs in a folder:

pdf2html path/to/folder -o output_folder
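The two invocations above differ only in whether the input path is a file or a folder. As a rough illustration of that behavior, and not the tool's actual source, the sketch below resolves an input path to a list of PDFs and derives an output name for each. The helper names `collect_pdfs` and `output_path_for` are hypothetical, and two details are assumptions: that folder scanning is non-recursive, and that "retains original filenames" means the same file stem with an `.html` extension.

```python
from pathlib import Path
from typing import List


def collect_pdfs(source: Path) -> List[Path]:
    """Return the PDFs named by `source` (hypothetical helper).

    A folder yields every PDF directly inside it; a single PDF file
    yields itself; anything else yields an empty list.
    """
    if source.is_dir():
        # Assumption: non-recursive scan, case-insensitive extension match.
        return sorted(
            p for p in source.iterdir()
            if p.is_file() and p.suffix.lower() == ".pdf"
        )
    if source.is_file() and source.suffix.lower() == ".pdf":
        return [source]
    return []


def output_path_for(pdf: Path, output_dir: Path) -> Path:
    """Map a PDF to its HTML output path (hypothetical helper).

    Assumption: the original file stem is kept and only the
    extension changes to .html.
    """
    return output_dir / (pdf.stem + ".html")
```

Under these assumptions, `output_path_for(Path("docs/report.pdf"), Path("output_folder"))` gives `output_folder/report.html`.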

Requirements

  • Python 3.8+
  • pdfminer.six
  • beautifulsoup4
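To show how these two dependencies could fit together, here is a minimal sketch of a text-to-HTML conversion. It is an illustration under stated assumptions, not pdf2html's actual implementation: the function names `split_paragraphs`, `text_to_html`, and `pdf_to_html` are hypothetical, and the choice to emit one `<p>` per blank-line-separated block is an assumption. The library calls used are `pdfminer.high_level.extract_text` from pdfminer.six and `BeautifulSoup` from beautifulsoup4.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split extracted text on blank lines and collapse inner whitespace."""
    blocks = re.split(r"\n\s*\n", text)
    return [" ".join(block.split()) for block in blocks if block.strip()]


def text_to_html(text: str, title: str) -> str:
    """Wrap each paragraph of `text` in a <p> tag inside a small HTML page."""
    # Imported here so split_paragraphs stays usable without beautifulsoup4.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(
        "<html><head><title></title></head><body></body></html>",
        "html.parser",
    )
    soup.title.string = title
    for paragraph in split_paragraphs(text):
        tag = soup.new_tag("p")
        tag.string = paragraph  # BeautifulSoup escapes <, > and & on output
        soup.body.append(tag)
    return soup.prettify()


def pdf_to_html(pdf_path: str) -> str:
    """Extract text from a PDF and return it as HTML (hypothetical helper)."""
    # Imported here so the helpers above work without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    return text_to_html(extract_text(pdf_path), title=pdf_path)
```

A call such as `pdf_to_html("path/to/file.pdf")` would return the page as a string. Writing it to the output folder is left out to keep the sketch short.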

License

MIT
