PDF to DOCX, HTML, and Markdown converter — extract text, tables, and images from PDFs.
- Convert PDF to DOCX (Word documents with full formatting)
- Convert PDF to HTML (preserves layout, tables and images)
- Convert PDF to Markdown (clean, readable text with tables)
- Preserve document structure: paragraphs, tables, images, text styling
- Extract tables from PDFs
- Multi-processing support for large documents
- Command-line and Python API interfaces

pip install pdf2any

# Convert PDF to DOCX
pdf2any convert input.pdf output.docx
# Convert PDF to HTML
pdf2any convert-html input.pdf output.html
# Convert PDF to Markdown (no page breaks)
pdf2any convert-md input.pdf output.md --nopage_break
# Convert specific pages
pdf2any convert input.pdf output.docx --pages=1,3,5

from pdf2any import Converter
# Convert to DOCX
cv = Converter("input.pdf")
cv.convert("output.docx")
# Convert to HTML (no page breaks)
cv.convert_html("output.html", page_break=False)
# Convert to Markdown (no page breaks)
cv.convert_md("output.md", page_break=False)
# Extract tables
tables = cv.extract_tables()
cv.close()

| Option | Description | Default |
|---|---|---|
| --pages | Specific pages to convert (e.g. 1,3,5) | All |
| --nopage_break | Remove page separators in output | False |
| --remove_header_footer | Remove headers and footers | False |
| --multi_processing | Enable parallel processing | False |

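
When calling the command-line tool from a script, it can help to assemble the argument list in one place. The sketch below maps an output file extension to the subcommands shown in the examples above (`convert`, `convert-html`, `convert-md`) and appends the options from the table. The helper name `build_command` and its keyword arguments are hypothetical and not part of pdf2any. It also assumes that `--remove_header_footer` and `--multi_processing` are bare flags like `--nopage_break`, and that every option is accepted by every subcommand, neither of which is confirmed here.

```python
import shlex
from pathlib import Path

# Subcommand per output extension, taken from the CLI examples in this README.
SUBCOMMANDS = {".docx": "convert", ".html": "convert-html", ".md": "convert-md"}


def build_command(src, dst, pages=None, page_break=True,
                  remove_header_footer=False, multi_processing=False):
    """Return the pdf2any argument list for one conversion (hypothetical helper)."""
    suffix = Path(dst).suffix.lower()
    if suffix not in SUBCOMMANDS:
        raise ValueError(f"unsupported output format: {suffix!r}")

    cmd = ["pdf2any", SUBCOMMANDS[suffix], str(src), str(dst)]
    if pages:
        # Same comma-separated form as the --pages=1,3,5 example.
        cmd.append("--pages=" + ",".join(str(p) for p in pages))
    if not page_break:
        cmd.append("--nopage_break")
    # Assumption: the two options below are bare flags, like --nopage_break.
    if remove_header_footer:
        cmd.append("--remove_header_footer")
    if multi_processing:
        cmd.append("--multi_processing")
    return cmd


if __name__ == "__main__":
    cmd = build_command("input.pdf", "output.md", pages=[1, 3, 5], page_break=False)
    # Print the shell-quoted command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps file names with spaces intact when the list is passed to `subprocess.run(cmd, check=True)`.
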
MIT License — see LICENSE for details.