pdf2any

PDF to DOCX, HTML, and Markdown converter — extract text, tables, and images from PDFs.

Features

Convert PDF to DOCX (editable Word documents that preserve the source formatting)
Convert PDF to HTML (preserves layout, tables and images)
Convert PDF to Markdown (clean, readable text with tables)
Preserve document structure: paragraphs, tables, images, text styling
Extract tables from PDFs
Multi-processing support for large documents
Command-line and Python API interfaces

Installation

pip install pdf2any

Quick Start

Command Line

# Convert PDF to DOCX
pdf2any convert input.pdf output.docx

# Convert PDF to HTML
pdf2any convert-html input.pdf output.html

# Convert PDF to Markdown (no page breaks)
pdf2any convert-md input.pdf output.md --nopage_break

# Convert specific pages
pdf2any convert input.pdf output.docx --pages=1,3,5
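To convert a whole folder, the commands above can be scripted. The following is a minimal sketch that assumes the `pdf2any` executable is on `PATH` and accepts the `convert`, `convert-html`, and `convert-md` subcommands exactly as shown. `build_command` and `batch_convert` are hypothetical helpers for illustration, not part of pdf2any.

```python
import subprocess
from pathlib import Path

# Subcommand for each output extension, taken from the Command Line examples.
SUBCOMMANDS = {
    ".docx": "convert",
    ".html": "convert-html",
    ".md": "convert-md",
}


def build_command(src: Path, dst: Path) -> list[str]:
    """Return the pdf2any argument list that converts src into dst."""
    try:
        sub = SUBCOMMANDS[dst.suffix.lower()]
    except KeyError:
        raise ValueError(f"unsupported output type: {dst.suffix!r}") from None
    return ["pdf2any", sub, str(src), str(dst)]


def batch_convert(in_dir: Path, out_dir: Path, ext: str, run: bool = True) -> list[list[str]]:
    """Convert every PDF in in_dir to `ext`, writing the results into out_dir.

    With run=False the commands are only built and returned (a dry run).
    """
    commands = []
    for pdf in sorted(in_dir.glob("*.pdf")):
        cmd = build_command(pdf, out_dir / (pdf.stem + ext))
        commands.append(cmd)
        if run:
            out_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # raise if pdf2any exits non-zero
    return commands
```

Calling `batch_convert(Path("pdfs"), Path("md"), ".md", run=False)` returns the command lists without running anything, which is a convenient way to inspect them before a real run.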

Python API

from pdf2any import Converter

cv = Converter("input.pdf")
try:
    # Convert to DOCX
    cv.convert("output.docx")

    # Convert to HTML (no page breaks)
    cv.convert_html("output.html", page_break=False)

    # Convert to Markdown (no page breaks)
    cv.convert_md("output.md", page_break=False)

    # Extract tables
    tables = cv.extract_tables()
finally:
    # Always close the converter, even if a conversion fails
    cv.close()
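The tables returned by `extract_tables()` can be saved for use in a spreadsheet. This is a minimal sketch that assumes each table is a list of rows and each row is a list of cell values (possibly `None` for empty cells); the README does not document the return type, so check it against your installed version. `write_tables_csv` is a hypothetical helper, not part of pdf2any.

```python
import csv
from pathlib import Path


def write_tables_csv(tables, out_dir) -> list[Path]:
    """Write each table to out_dir/table_<n>.csv and return the paths written.

    Assumes `tables` is a list of tables, each a list of rows, and each row a
    list of cell values.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, table in enumerate(tables, start=1):
        path = out_dir / f"table_{n}.csv"
        # newline="" lets the csv module control line endings itself.
        with path.open("w", newline="", encoding="utf-8") as f:
            # csv.writer writes None as an empty string, so blank cells stay blank.
            csv.writer(f).writerows(table)
        paths.append(path)
    return paths
```

For example, `write_tables_csv(cv.extract_tables(), "tables")` would be called before `cv.close()`.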

Key Options

Option	Description	Default
`--pages`	Specific pages to convert (e.g. `1,3,5`)	All
`--nopage_break`	Remove page separators in output	`False`
`--remove_header_footer`	Remove headers and footers	`False`
`--multi_processing`	Enable parallel processing	`False`
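When driving the CLI from a script, the options in the table can be assembled from keyword arguments. This is a minimal sketch that assumes the boolean options are plain presence flags and that `--pages` takes a comma-separated list, as in the examples above. `cli_flags` and its keyword names are hypothetical; page numbers are passed through unchanged because the README does not say whether they are 0- or 1-based.

```python
def cli_flags(pages=None, page_break=True, remove_header_footer=False,
              multi_processing=False) -> list[str]:
    """Translate keyword options into the pdf2any flags from the Key Options table."""
    flags = []
    if pages:
        # e.g. [1, 3, 5] -> "--pages=1,3,5"
        flags.append("--pages=" + ",".join(str(p) for p in pages))
    if not page_break:
        flags.append("--nopage_break")
    if remove_header_footer:
        flags.append("--remove_header_footer")
    if multi_processing:
        flags.append("--multi_processing")
    return flags
```

A full invocation could then be built as `["pdf2any", "convert", "input.pdf", "output.docx", *cli_flags(pages=[1, 3, 5])]`. Options left at their defaults produce no flags, so the CLI's own defaults apply.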

Documentation

License

MIT License — see LICENSE for details.
