py-pdf/benchmarks

PDF Library Benchmarks

This benchmark covers reading pure (born-digital) PDF files: not scanned documents, and not documents to which OCR has been applied.

Benchmarking machine

Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz

Input Documents

# Name File Size Pages
1 2201.00214 2.4MiB 22
2 GeoTopo-book 5.1MiB 117
3 2201.00151 1.5MiB 12
4 1707.09725 7.0MiB 134
5 2201.00021 2.6MiB 10
6 2201.00037 2.9MiB 33
7 2201.00069 14.7MiB 15
8 2201.00178 2.3MiB 16
9 2201.00201 1.3MiB 9
10 1602.06541 2.9MiB 16
11 2201.00200 284.8KiB 7
12 2201.00022 1.2MiB 14
13 2201.00029 797.6KiB 12
14 1601.03642 1004.9KiB 8

Libraries

Name Last PyPI Release License Version Dependencies
pypdfium2 2024-12-19 Apache-2.0 or BSD-3-Clause 4.30.1 PDFium (Foxit/Google)
pdfminer.six 2025-05-06 MIT/X 20250506
pdfplumber 2025-06-12 MIT 0.11.7 pdfminer.six
pdfrw 2017-09-18 MIT 0.4
pdftotext - GPL 0.86.1 build-essential libpoppler-cpp-dev pkg-config python3-dev
PyMuPDF 2025-06-12 GNU AFFERO GPL 3.0 / Commercial 1.26.1 MuPDF
pypdf 2025-06-29 BSD 3-Clause 5.7.0
Tika 2025-03-26 Apache v2 3.1.0 Apache Tika

Text Extraction Speed

# Library Average 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 PyMuPDF 0.1s 0.4s 0.3s 0.2s 0.2s 0.0s 0.1s 0.0s 0.1s 0.0s 0.1s 0.0s 0.1s 0.0s 0.0s
2 pypdfium2 0.1s 0.5s 0.3s 0.2s 0.2s 0.0s 0.1s 0.0s 0.0s 0.0s 0.1s 0.0s 0.0s 0.0s 0.0s
3 Tika 0.2s 0.8s 0.5s 0.3s 0.3s 0.1s 0.2s 0.1s 0.1s 0.1s 0.1s 0.1s 0.1s 0.0s 0.0s
4 pdftotext 0.3s 0.7s 0.9s 0.2s 0.8s 0.1s 0.3s 0.4s 0.1s 0.1s 0.2s 0.1s 0.1s 0.0s 0.0s
5 pypdf 3.5s 26.2s 6.4s 6.8s 3.3s 0.9s 1.6s 0.6s 0.6s 0.5s 0.8s 0.6s 0.6s 0.5s 0.3s
6 pdfminer.six 5.8s 35.1s 16.6s 10.2s 5.5s 1.5s 2.5s 1.1s 1.6s 1.1s 2.0s 1.5s 1.4s 0.7s 0.6s
7 pdfplumber 9.5s 60.9s 16.6s 17.0s 10.7s 3.1s 5.3s 2.6s 2.5s 2.3s 3.8s 2.5s 2.7s 1.4s 1.3s
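
The numbered columns in the table above correspond to the input documents, and the Average column is the mean over all 14. As a rough illustration of how such per-document timings could be collected, here is a minimal sketch. It is not the harness used for this benchmark; the function names `time_extraction` and `average` are hypothetical, and the extractor is passed in as a plain callable so that any library can be plugged in.

```python
import time
from typing import Callable, Dict, Iterable


def time_extraction(extract: Callable[[str], str], paths: Iterable[str]) -> Dict[str, float]:
    """Return the wall-clock seconds spent in `extract` for each path.

    `extract` is any callable that takes a file path and returns text.
    (Hypothetical helper, not part of the benchmark code.)
    """
    timings: Dict[str, float] = {}
    for path in paths:
        start = time.perf_counter()
        extract(path)  # the result is discarded; only the duration matters here
        timings[path] = time.perf_counter() - start
    return timings


def average(timings: Dict[str, float]) -> float:
    """Arithmetic mean of the per-document timings."""
    return sum(timings.values()) / len(timings)
```

With pypdf, for example, the extractor could be something like `lambda p: "\n".join(page.extract_text() for page in PdfReader(p).pages)`, using `PdfReader` from `pypdf`. A single run per document is noisy, so repeating each measurement and taking the minimum or median is probably a better choice in practice.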

Image Extraction Speed

# Library Average 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 PyMuPDF 0.5s 0.3s 0.5s 0.0s 1.6s 0.4s 0.0s 2.9s 0.4s 0.4s 0.1s 0.0s 0.3s 0.2s 0.0s
2 pypdfium2 1.1s 1.2s 1.8s 0.0s 3.3s 0.9s 0.2s 5.1s 0.7s 0.6s 0.4s 0.0s 0.5s 0.2s 0.0s
3 pypdf 4.2s 21.6s 6.1s 5.7s 11.8s 1.3s 0.6s 6.5s 1.2s 1.2s 0.8s 0.2s 0.9s 0.2s 0.2s
4 pdfminer.six 7.4s 43.9s 17.5s 12.7s 15.4s 1.6s 2.5s 1.6s 1.5s 1.0s 1.8s 1.2s 1.3s 0.7s 0.5s

Watermarking Speed

# Library Average 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 pdfrw 0.1s 0.1s 0.5s 0.0s 0.3s 0.1s 0.1s 0.1s 0.1s 0.1s 0.1s 0.0s 0.1s 0.0s 0.0s
2 PyMuPDF 0.2s 0.4s 0.6s 0.2s 0.4s 0.1s 0.1s 0.1s 0.1s 0.1s 0.1s 0.0s 0.1s 0.0s 0.0s
3 pypdf 0.5s 0.6s 2.0s 0.4s 1.1s 0.2s 0.3s 0.3s 0.3s 0.2s 0.3s 0.1s 0.6s 0.1s 0.1s

Watermarking File Size

# Library Average 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 pypdf 3.4MB 2.5MB 5.6MB 1.6MB 7.2MB 2.7MB 3.1MB 15.4MB 2.4MB 1.3MB 3.0MB 0.3MB 1.2MB 0.8MB 1.0MB
2 pdfrw 3.5MB 2.5MB 5.7MB 1.6MB 7.3MB 2.7MB 3.1MB 15.4MB 2.4MB 1.3MB 3.0MB 0.3MB 1.2MB 0.8MB 1.0MB
3 PyMuPDF 3.7MB 2.7MB 6.9MB 1.7MB 8.5MB 2.8MB 3.4MB 15.5MB 2.5MB 1.4MB 3.2MB 0.3MB 1.3MB 0.9MB 1.1MB
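
Note that the input sizes under "Input Documents" are given in MiB, while this table uses MB. Assuming MB here means 10^6 bytes (the source does not say so explicitly), the two can be compared with a small conversion. The helper name `mib_to_mb` is hypothetical.

```python
BYTES_PER_MIB = 1024 ** 2  # binary mebibyte
BYTES_PER_MB = 1000 ** 2   # decimal megabyte (assumed meaning of "MB" in the table)


def mib_to_mb(size_mib: float) -> float:
    """Convert a size in MiB to decimal MB."""
    return size_mib * BYTES_PER_MIB / BYTES_PER_MB


# Document 1 is listed as 2.4MiB, document 7 as 14.7MiB.
print(round(mib_to_mb(2.4), 1))   # 2.5
print(round(mib_to_mb(14.7), 1))  # 15.4
```

Under that assumption, the converted input sizes for documents 1 and 7 match the pypdf row (2.5MB and 15.4MB), which suggests that watermarking adds little to the file size for those documents.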

Text Extraction Quality

# Library Average 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 pypdfium2 97% 99% 97% 94% 99% 98% 96% 99% 99% 99% 99% 98% 78% 99% 99%
2 pypdf 96% 99% 95% 93% 98% 99% 96% 97% 99% 99% 99% 99% 78% 100% 99%
3 PyMuPDF 96% 98% 96% 93% 97% 98% 95% 99% 98% 98% 98% 97% 77% 98% 99%
4 Tika 95% 99% 98% 92% 97% 98% 96% 93% 97% 98% 93% 98% 73% 98% 96%
5 pdftotext 91% 96% 93% 91% 94% 92% 96% 96% 96% 97% 83% 94% 77% 96% 79%
6 pdfminer.six 89% 95% 79% 86% 92% 86% 93% 95% 93% 92% 92% 93% 71% 98% 86%
7 pdfplumber 75% 94% 84% 68% 97% 61% 93% 61% 89% 57% 59% 67% 58% 98% 67%
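
The source does not state how the quality percentages are computed. Purely as an illustration of how extracted text can be scored against a ground-truth transcript, the sketch below uses the similarity ratio from Python's standard `difflib` module. This is an assumed stand-in, not the benchmark's actual metric, and `similarity_percent` is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity_percent(extracted: str, ground_truth: str) -> float:
    """Score extracted text against a reference, as a percentage from 0 to 100.

    Uses difflib's ratio (2 * matches / total length). autojunk is disabled
    so that frequent characters in long texts are not ignored.
    """
    matcher = SequenceMatcher(None, extracted, ground_truth, autojunk=False)
    return 100.0 * matcher.ratio()


print(similarity_percent("abc", "abc"))  # 100.0
print(similarity_percent("abc", "xyz"))  # 0.0
```

A character-level score like this penalizes whitespace and line-break differences as well as missing words, so normalizing whitespace before comparison may be appropriate, depending on what "quality" is meant to capture.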