UTF-8 error while uploading file #46

Closed
jhf2442 opened this issue Jul 22, 2020 · 5 comments
Labels
bug Something isn't working

Comments

jhf2442 commented Jul 22, 2020

I downloaded and started the Docker image 5 minutes ago, so it is the latest/greatest version.

While uploading a 180 kB, 20-page PDF document (which opens perfectly in Okular), I get the following error:

papermerge_service | upload for f=2020-07-17_AGB_A02092019.pdf user=admin
papermerge_service | Internal Server Error: /upload/
papermerge_service | Traceback (most recent call last):
papermerge_service |   File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 34, in inner
papermerge_service |     response = get_response(request)
papermerge_service |   File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/base.py", line 115, in _get_response
papermerge_service |     response = self.process_exception_by_middleware(e, request)
papermerge_service |   File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/base.py", line 113, in _get_response
papermerge_service |     response = wrapped_callback(request, *callback_args, **callback_kwargs)
papermerge_service |   File "/usr/local/lib/python3.7/dist-packages/django/contrib/auth/decorators.py", line 21, in _wrapped_view
papermerge_service |     return view_func(request, *args, **kwargs)
papermerge_service |   File "/usr/local/lib/python3.7/dist-packages/django/views/generic/base.py", line 71, in view
papermerge_service |     return self.dispatch(request, *args, **kwargs)
papermerge_service |   File "/usr/local/lib/python3.7/dist-packages/django/views/generic/base.py", line 97, in dispatch
papermerge_service |     return handler(request, *args, **kwargs)
papermerge_service |   File "/opt/papermerge/papermerge/core/views/documents.py", line 373, in post
papermerge_service |     page_count = get_pagecount(f.temporary_file_path())
papermerge_service |   File "/usr/local/lib/python3.7/dist-packages/pmworker/pdfinfo.py", line 63, in get_pagecount
papermerge_service |     lines = compl.stdout.decode('utf-8').split('\n')
papermerge_service | UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfe in position 101: invalid start byte

ciur (Owner) commented Jul 23, 2020

Thank you for your feedback.

The pdfinfo utility produced unexpected output :(
pdfinfo (part of Poppler) is used internally to determine the number of pages in the document.
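The traceback shows get_pagecount in pmworker/pdfinfo.py decoding pdfinfo's stdout as UTF-8 and splitting it on newlines. A minimal sketch of that approach follows; `run_pdfinfo` and `parse_pagecount` are hypothetical names, and the real pmworker code may differ. The strict `decode("utf-8")` mirrors line 63 of the traceback, which is why one bad byte in any field breaks the whole page count.

```python
import subprocess


def run_pdfinfo(path: str) -> bytes:
    """Run the Poppler `pdfinfo` CLI on `path` and return its raw stdout bytes."""
    compl = subprocess.run(["pdfinfo", path], capture_output=True, check=True)
    return compl.stdout


def parse_pagecount(stdout: bytes) -> int:
    """Hypothetical parser: find the "Pages:" line in pdfinfo output.

    The strict UTF-8 decode mirrors the traceback and raises
    UnicodeDecodeError if *any* field contains non-UTF-8 bytes.
    """
    lines = stdout.decode("utf-8").split("\n")
    for line in lines:
        if line.startswith("Pages:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Pages:' line in pdfinfo output")


# Clean, ASCII-only output (like the Brother scanner example) parses fine.
sample = b"Creator:        Brother Scanner System\nPages:          3\nEncrypted:      no\n"
print(parse_pagecount(sample))  # 3
```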

Could you please run the pdfinfo utility on 2020-07-17_AGB_A02092019.pdf again and paste the output here?
For example:

eugen@dell-xps:Scans$ pdfinfo brother_003962.pdf
Creator:        Brother Scanner System
Producer:       Brother Scanner System Image Conversion
CreationDate:   Wed Jun 24 12:22:17 2020 CEST
ModDate:        Wed Jun 24 12:22:17 2020 CEST
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          3
Encrypted:      no
Page size:      610.56 x 1074.24 pts
Page rot:       0
File size:      876133 bytes
Optimized:      no
PDF version:    1.4

ciur added the bug label (Something isn't working) Jul 23, 2020

jhf2442 (Author) commented Jul 24, 2020

Here we go:

pdfinfo 2020-07-17_AGB_A02092019.pdf 
Title:          Versicherungsbedingungen
Producer:       M/TEXT CS version 6.7.0.476
CreationDate:   ��
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          20
Encrypted:      yes (print:yes copy:yes change:no addNotes:no algorithm:RC4)
Page size:      595.276 x 841.89 pts (A4)
Page rot:       0
File size:      181193 bytes
Optimized:      no
PDF version:    1.4

-> it's the creation date field that contains some strange data! (yes, it's two diamonds)
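The diamond theory lines up with the traceback. The sketch below rebuilds the start of the pdfinfo output pasted above, assuming pdfinfo pads each label to 16 columns (as it appears in the paste). Only the byte 0xfe comes from the traceback; the second byte, 0xff, is a hypothetical stand-in for the second diamond.

```python
def label(name: str) -> bytes:
    """Pad a pdfinfo field label to 16 columns (assumed from the pasted output)."""
    return f"{name + ':':<16}".encode("ascii")


stdout = (
    label("Title") + b"Versicherungsbedingungen\n"
    + label("Producer") + b"M/TEXT CS version 6.7.0.476\n"
    + label("CreationDate") + b"\xfe\xff\n"  # 0xff is hypothetical
    + label("Pages") + b"20\n"
)

try:
    stdout.decode("utf-8")
except UnicodeDecodeError as exc:
    # The first invalid byte is the first byte of the CreationDate value.
    print(exc.start, hex(stdout[exc.start]), exc.reason)
    # 101 0xfe invalid start byte
```

Title (41 bytes) plus Producer (44 bytes) plus the 16-column CreationDate label put the first "diamond" at offset 101, the same position reported in the traceback.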

jhf2442 (Author) commented Jul 24, 2020

Here is the header of the PDF:

%PDF-1.4
%\252\253\254\255
4 0 obj
<<
/Title (^S\325ޥ\\\365\202\276\240^]K@\b\324諪\213"\237^GO\276\303!\262%\335\362Ӓz\\\3443iC\357^RR\256\265'\251x\344K\2260\232)
/Producer (^S\325\336\276\\\277\202\230\240+Kq\b\343\350㪭"\276^G^Z\276\333!\260%\334\362\302\222v\\\3573nC\241^R^C\256\356'\360x\255K\3030ڹ\2474نK)
/CreationDate (^S\325\336\267\\\252\202\376\240^K^[\b\207\350\363\252\335"\337^G^O\276\234!\343%\234\362\205\222.\\\2653+C\261^R^D\256\347'\367x\263K\324)
>>
endobj

ciur (Owner) commented Jul 24, 2020

it's the creation date field that contains some strange data! (yes, it's two diamonds)

I think those two diamonds are to blame, as they might be encoded in something other than UTF-8 (just guessing).
Does the document contain sensitive information?
If it is just a generic AGB (general terms and conditions, i.e. no sensitive data), could you send me a copy (my email is at the very bottom of the README page)? Otherwise I have no other way to troubleshoot the issue.
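The guess that only one field is not valid UTF-8 can be checked line by line on the raw pdfinfo output, without needing the document itself. A minimal sketch; `non_utf8_fields` is a hypothetical helper, and the bytes 0xfe 0xff are an assumed stand-in for the two diamonds.

```python
from typing import List


def non_utf8_fields(stdout: bytes) -> List[str]:
    """Return the names of pdfinfo fields whose line is not valid UTF-8."""
    bad = []
    for raw_line in stdout.split(b"\n"):
        try:
            raw_line.decode("utf-8")
        except UnicodeDecodeError:
            # The field name is the ASCII text before the first colon.
            name = raw_line.split(b":", 1)[0].decode("ascii", errors="replace")
            bad.append(name)
    return bad


sample = (
    b"Title:          Versicherungsbedingungen\n"
    b"CreationDate:   \xfe\xff\n"  # hypothetical bytes behind the "two diamonds"
    b"Pages:          20\n"
)
print(non_utf8_fields(sample))  # ['CreationDate']
```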

ciur (Owner) commented Jul 25, 2020

I received your document and fixed the encoding issue.

The fix will be available in 1.4.0 (in about two weeks).
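The thread does not say how the 1.4.0 fix was implemented. One common way to make this kind of parsing robust is to decode with `errors="replace"`, so that undecodable bytes in unrelated fields such as CreationDate turn into U+FFFD instead of raising. This is a hedged sketch of that general technique, not the actual papermerge/pmworker change; `parse_pagecount_tolerant` is a hypothetical name.

```python
def parse_pagecount_tolerant(stdout: bytes) -> int:
    """Hypothetical tolerant variant: does not fail on non-UTF-8 metadata."""
    # errors="replace" substitutes U+FFFD (the "diamond") for undecodable bytes.
    text = stdout.decode("utf-8", errors="replace")
    for line in text.splitlines():
        if line.startswith("Pages:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Pages:' line in pdfinfo output")


sample = (
    b"Title:          Versicherungsbedingungen\n"
    b"CreationDate:   \xfe\xff\n"  # assumed stand-in bytes; break a strict decode
    b"Pages:          20\n"
)
print(parse_pagecount_tolerant(sample))  # 20
```

Matching on bytes instead (`line.startswith(b"Pages:")` on `stdout.split(b"\n")`) would work just as well, since pdfinfo's field names are plain ASCII.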

Thank you again for providing useful feedback!

ciur closed this as completed Jul 25, 2020