Error on lossless compression #69

Open
ken-huston opened this issue Jun 28, 2020 · 3 comments

ken-huston commented Jun 28, 2020

Hi,

With lossy compression I get fantastic results (a more than tenfold reduction in size for a PDF made from .jpg images). The loss in quality is bearable, but since the reduction is so large, I wanted to try lossless compression to compare the results.

From other issues I read that this is done without the -s option, but if I do that, I get this error:

Processing "pages-000.jpg"...
source image: 708 x 1121 (8 bits) 0dpi x 0dpi, refcount = 1
thresholded image: 708 x 1121 (1 bits) 0dpi x 0dpi, refcount = 1
[binary output printed to the terminal, omitted]
/usr/local/bin/pdf.py: symbol table output.sym not found!
Usage: /usr/local/bin/pdf.py [file_basename] > out.pdf

I do not know how to deal with it. Any help would be very much appreciated.

DingoDog commented Jul 2, 2020

A solution was already found in 2016 by klivens:
#24 (comment)

I use a sort of one-liner that does the same task, but without requiring modifications to pdf.py.

ken-huston (author) commented

Thanks for the response, Dingo, it works.

I'm trying to compress a PDF composed of JPG text images. I extracted them with pdfimages and used jbig2 compression. With the lossy option I can reduce the size of the PDF from 40 MB to 2.5 MB, and with the lossless option to 5.5 MB. But I see almost no difference in quality between the two outputs (both reduce the quality of the original PDF).

Am I doing something wrong, or are these results what is expected?

Thanks again.

joshuakraemer commented

> I use a sort of one-liner that does the same task, but without requiring modifications of pdf.py

@DingoDog, would you please be so kind as to share your one-line solution?

> I'm trying to compress a PDF composed of JPG text images. I extracted them with pdfimages and used jbig2 compression. With the lossy option I can reduce the size of the PDF from 40 MB to 2.5 MB, and with the lossless option to 5.5 MB. But I see almost no difference in quality between the two outputs (both reduce the quality of the original PDF).
>
> Am I doing something wrong, or are these results what is expected?

@ken-huston, lossy and lossless compression are expected to look similar, because lossy JBIG2 compression is good at preserving visual quality. You should be aware, though, that lossy compression can also lead to letter substitutions (see https://en.wikipedia.org/wiki/JBIG2#Disadvantages). Always check your results when using lossy compression.

I assume your sources are grayscale or color JPG files. You might be able to improve final quality and file size by first converting the JPG files to black-and-white image files with a different program. I often use ImageMagick with Otsu binarization, e.g.:

magick in.jpg -auto-threshold OTSU out.pbm
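
For a whole directory of extracted pages, the same command can be wrapped in a loop. This is only a minimal sketch, assuming ImageMagick 7 (the `magick` command) is installed and the pages are named `pages-*.jpg` as in the log above. The function name `binarize_all` and the `MAGICK` override variable are made up for illustration and are not part of ImageMagick or jbig2enc:

```shell
#!/bin/sh
# Binarize every pages-*.jpg in a directory to a 1-bit PBM file using
# ImageMagick's Otsu auto-threshold.
# MAGICK is an illustrative override (e.g. MAGICK=echo for a dry run that
# only prints the arguments); it is not an ImageMagick feature.
MAGICK="${MAGICK:-magick}"

binarize_all() {
    dir="${1:-.}"
    for src in "$dir"/pages-*.jpg; do
        # If nothing matches, the pattern is left as a literal string: skip it.
        [ -e "$src" ] || continue
        # pages-000.jpg -> pages-000.pbm, written next to the source file.
        dst="${src%.jpg}.pbm"
        $MAGICK "$src" -auto-threshold OTSU "$dst" || return 1
    done
}
```

Calling `binarize_all ./extracted` (the directory name is a placeholder) should write one `.pbm` file next to each `.jpg`; those files can then be given to jbig2 in place of the original JPGs.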
