pyxpdf

pyxpdf is a fast, memory-efficient Python module for parsing PDF documents, based on the xpdf reader sources.

Features

Almost 20 times faster than pure-Python PDF parsers (see Speed Comparison)
Extract text while preserving the original document layout as closely as possible
Support almost all PDF encodings, CMaps and predefined CMaps.
Extract LZW, RLE, CCITTFax, DCT, JBIG2 and JPX compressed images and image masks along with their BBox.
Render PDF pages as images, with support for the '1', 'L', 'LA', 'RGB', 'RGBA' and 'CMYK' color modes.
No explicit dependencies (except optional ones, see Installation)
Thread Safe

pyxpdf is licensed under the GNU General Public License (GPL), version 2 or 3. See the LICENSE file.
