pdf2image outputs 1x1 blank image #34
Comments
As pdf2image is only a thin wrapper around pdftoppm, I would try directly from the CLI. Something like If you still get a 1x1 pixel then the problem is on their side and I can't help much. It would also help if you could provide the exact call you do on |
@Belval, I figured this is on the pdftoppm side, and I am already investigating. When I run pdftoppm I get: Bogus memory allocation size. The output is a 1x1 blank image. Thanks for the help |
Just to share the solution: the mediaBox of my pdf was huge and it was eating up all the memory available rendering a lot of blank space. Turns out that the cropBox of the pdf was correct, so converting using the cropBox ( It would be nice to have an option to use the crop box instead, like:
Cheers |
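For readers landing here, the crop-box workaround described above could be sketched roughly as follows. This assumes pdftoppm's `-cropbox` flag and the `use_cropbox` keyword of `convert_from_path` that is mentioned later in this thread; the helper names and the file name `long_page.pdf` are hypothetical, not part of pdf2image.

```python
def pdftoppm_cropbox_command(pdf_path, output_prefix, dpi=200):
    """Build a pdftoppm call that renders the crop box instead of the media box.

    Hypothetical helper: it only assembles the argument list, it does not run it.
    """
    return ["pdftoppm", "-r", str(dpi), "-cropbox", "-png", pdf_path, output_prefix]


def render_with_cropbox(pdf_path, dpi=200):
    """Render through pdf2image using the crop box.

    Requires pdf2image and poppler to be installed, so the import is done lazily.
    """
    from pdf2image import convert_from_path

    return convert_from_path(pdf_path, dpi=dpi, use_cropbox=True)


if __name__ == "__main__":
    # Print the equivalent CLI call so it can be tried directly in a shell.
    print(" ".join(pdftoppm_cropbox_command("long_page.pdf", "out")))
```

Trying the printed command in a shell first is a quick way to check whether the crop box alone fixes the rendering before changing any Python code.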
Can you try with support-cropbox and see if it fixes your issue? I will upload the new package to PyPi tonight if it does. |
I just tested. It works fine. Thanks!! |
Pull request merged: #35 Package uploaded: https://pypi.org/project/pdf2image/ Be aware that PyPi caches packages so it can take a few minutes until it is available. |
As this was fixed in a previous version, I am closing it. |
Hi, I've run into the same issue. Running pdftoppm directly, without the wrapper, does not show the problem: the JPEG is generated correctly. When opening the image afterwards with PIL, I get a DecompressionBombError. So it might be that the image is simply too big to be processed by pdf2image. |
I was unable to reproduce the issue described with the linked file. Using pdftoppm version 0.62.0, the output image seems to correspond to the PDF. Here is my code snippet:
from pdf2image import convert_from_path
convert_from_path("test.pdf", size=(3000,))[0].save("out.png")
I changed the name of your PDF for readability and set a size to avoid the Pillow decompression bomb check. Could you provide your pdftoppm version with |
Hi, I didn't set any size, so the issue is indeed that the image is too big. Would it be possible to have convert_from_path raise an error, explaining that the size is too big and that it can be avoided by setting the size parameter, instead of simply silently returning a white pixel? |
That's the thing, the underlying library that parses the image file should and does raise an exception on the PDF you linked:
>>> from pdf2image import convert_from_path
>>> convert_from_path("test.pdf")[0].save("out.png")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/.local/lib/python3.6/site-packages/pdf2image/pdf2image.py", line 202, in convert_from_path
images += parse_buffer_func(data)
File "/home/ubuntu/.local/lib/python3.6/site-packages/pdf2image/parsers.py", line 21, in parse_buffer_to_ppm
images.append(Image.open(BytesIO(data[index : index + file_size])))
File "/home/ubuntu/.local/lib/python3.6/site-packages/PIL/Image.py", line 2881, in open
im = _open_core(fp, filename, prefix)
File "/home/ubuntu/.local/lib/python3.6/site-packages/PIL/Image.py", line 2868, in _open_core
_decompression_bomb_check(im.size)
File "/home/ubuntu/.local/lib/python3.6/site-packages/PIL/Image.py", line 2793, in _decompression_bomb_check
"could be decompression bomb DOS attack." % (pixels, 2 * MAX_IMAGE_PIXELS)
PIL.Image.DecompressionBombError: Image size (382369975 pixels) exceeds limit of 178956970 pixels, could be decompression bomb DOS attack.
Do you override the Pillow limit somewhere else in your code? |
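To make the numbers in that traceback concrete, here is a rough sketch of how one might estimate the rendered pixel count of a page and pick a resolution that stays under the limit Pillow reports. The limit constant is copied from the traceback; the page dimensions, the rounding rule, and both function names are assumptions for illustration, and pdftoppm's exact rounding may differ slightly.

```python
import math

# Limit quoted in the DecompressionBombError message above.
PILLOW_ERROR_LIMIT = 178956970


def rendered_pixels(width_pt, height_pt, dpi):
    """Approximate pixel count of a page rendered at `dpi` (72 points per inch)."""
    width_px = math.ceil(width_pt / 72 * dpi)
    height_px = math.ceil(height_pt / 72 * dpi)
    return width_px * height_px


def max_safe_dpi(width_pt, height_pt, limit=PILLOW_ERROR_LIMIT, start_dpi=200):
    """Largest integer dpi <= start_dpi whose rendering stays within `limit`."""
    dpi = start_dpi
    while dpi > 1 and rendered_pixels(width_pt, height_pt, dpi) > limit:
        dpi -= 1
    return dpi


if __name__ == "__main__":
    # Hypothetical very long single page: US Letter width, 100000 pt tall.
    print(max_safe_dpi(612, 100000))
```

The result could then be passed as the `dpi` argument of `convert_from_path`, as an alternative to capping the output with `size` as in the earlier snippet.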
The issue is not that pdf2image returns an image that is too big, but that it returns one single white pixel, without any warnings. |
I understand that, but I am simply unable to reproduce the issue. If I disable Pillow's warning, the saved image is correctly rendered. In other words, I am unable to get the 1x1 white pixel output you describe. |
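Until the silent 1x1 output can be reproduced and fixed upstream, a caller-side guard along these lines might help surface it. This is only a sketch: `BlankRenderError` and `check_rendered_pages` are hypothetical names, not part of pdf2image, and the check relies only on each page exposing a `.size` tuple the way PIL images do.

```python
class BlankRenderError(RuntimeError):
    """Raised when a rendered page is suspiciously small (hypothetical error type)."""


def check_rendered_pages(pages, min_side=2):
    """Return `pages` unchanged, or raise if any page is smaller than `min_side` pixels.

    `pages` is expected to be a list of objects with a `.size` attribute of the
    form (width, height), such as the PIL images returned by convert_from_path.
    """
    for index, page in enumerate(pages):
        width, height = page.size
        if width < min_side or height < min_side:
            raise BlankRenderError(
                f"page {index} rendered as {width}x{height}; the page may be too "
                "large to render, consider passing size= or use_cropbox=True"
            )
    return pages
```

A call site would then look like `pages = check_rendered_pages(convert_from_path("test.pdf"))`, turning the white pixel into an explicit error.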
same issue here, Help. |
It seems use_cropbox is False by default; refer to https://pdf2image.readthedocs.io/en/latest/reference.html |
Describe the bug
For some PDF files, convert_from_path and convert_from_bytes output a blank 1x1 PIL image. Interestingly, for very similar PDFs it works fine. The documents are mostly PDFs with one very long page. Any ideas?
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I would expect to see a normal PIL image, as happens with other similar PDFs.
Screenshots
Output:
[<PIL.PpmImagePlugin.PpmImageFile image mode=RGB size=1x1 at 0x7FCC6C3FA4A8>]
The number of pages is correct; the PDF is just one very long page.
Desktop (please complete the following information):