Description
I tried the exact example from the documentation, but it produces blank output even though I copied the same code and used the same PDF file. (The fix is at the bottom of this report.)
Environment
Debian
```bash
$ python -m platform
Linux-6.1.0-12-amd64-x86_64-with-glibc2.36

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.0.1, crypt_provider=('cryptography', '41.0.7'), PIL=10.2.0
```
Code + PDF
This is a minimal, complete example that shows the issue (the same example as in the documentation):
```python
from pypdf import PdfReader

reader = PdfReader("GeoBase_NHNC1_Data_Model_UML_EN.pdf")
page = reader.pages[3]

parts = []


def visitor_body(text, cm, tm, font_dict, font_size):
    # cm[5] is the vertical translation of the current transformation matrix
    y = cm[5]
    # keep only text between the header and the footer
    if y > 50 and y < 720:
        parts.append(text)


page.extract_text(visitor_text=visitor_body)
text_body = "".join(parts)

print(text_body)
```
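To see why the filter keeps nothing, it can help to log where each chunk sits according to both matrices. A minimal sketch follows; `make_position_logger` is a hypothetical helper name, and only the `visitor_text` callback signature `(text, cm, tm, font_dict, font_size)` comes from the example above.

```python
def make_position_logger(rows):
    """Hypothetical helper: build a visitor that records (text, cm[5], tm[5]) per chunk."""

    def visitor(text, cm, tm, font_dict, font_size):
        # Skip whitespace-only chunks so the log stays readable.
        if text.strip():
            # cm[5]: y translation of the current transformation matrix
            # tm[5]: y translation of the text matrix
            rows.append((text.strip(), cm[5], tm[5]))

    return visitor
```

Passing `make_position_logger(rows)` as `visitor_text` to `page.extract_text` and printing the first few entries of `rows` should show whether `cm[5]` stays constant (for example at 0) while `tm[5]` changes down the page. That this is what happens in this particular PDF is an assumption, not something verified here.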
Fix
In `y = cm[5]`, just change `cm` to `tm`. The vertical position (y) must be taken from the text matrix (`tm`), not from the current transformation matrix (`cm`).
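Applied to the example, the fix looks roughly like the sketch below. A plausible explanation, assumed rather than verified against this file, is that the page positions text with text-matrix operators while leaving the current transformation matrix at identity, so `cm[5]` is 0 for every chunk and `y > 50` rejects all of them. `make_body_visitor` and `text_origin_y` are hypothetical helpers. `text_origin_y` additionally maps the text origin through `cm`, which may matter for PDFs where `cm` is not the identity; it follows the PDF convention of writing a matrix as `[a b c d e f]` and multiplying row vectors on the left.

```python
import os


def make_body_visitor(parts, y_min=50.0, y_max=720.0):
    """Hypothetical helper: visitor keeping text whose tm y lies in (y_min, y_max)."""

    def visitor_body(text, cm, tm, font_dict, font_size):
        y = tm[5]  # the fix: use the text matrix, not cm
        if y_min < y < y_max:
            parts.append(text)

    return visitor_body


def text_origin_y(cm, tm):
    """Hypothetical helper: y of the text origin after applying tm, then cm.

    The origin (0, 0) in text space maps to (tm[4], tm[5]); applying
    cm = [a b c d e f] gives y' = x*b + y*d + f.
    """
    return tm[4] * cm[1] + tm[5] * cm[3] + cm[5]


PDF_PATH = "GeoBase_NHNC1_Data_Model_UML_EN.pdf"
if os.path.exists(PDF_PATH):
    # Imported here so the helpers above can be used without pypdf installed.
    from pypdf import PdfReader

    parts = []
    page = PdfReader(PDF_PATH).pages[3]
    page.extract_text(visitor_text=make_body_visitor(parts))
    print("".join(parts))
```

Filtering on `tm[5]` alone matches the fix described above. Swapping in `text_origin_y(cm, tm)` inside the visitor is one option if a document also translates or scales text through `cm`.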
Here's the link to the PDF file.