UnicodeDecodeError: 'ascii' codec can't decode . . . #42

Open
quietlyconfident opened this issue Dec 10, 2015 · 6 comments

@quietlyconfident quietlyconfident commented Dec 10, 2015

When I use python-pdfkit with HTML content that contains non-ASCII characters, it fails with one of these errors if the HTML content is loaded into memory:

File ". . . /pdfkit.py", line 100, in to_pdf
    input = self.source.to_s().encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 64: ordinal not in range(128)

or

File ". . ./pdfkit.py", line 102, in to_pdf
    input = self.source.source.read().encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 64: ordinal not in range(128)

However, python-pdfkit works fine when it is given just a filename, and so does wkhtmltopdf.

I think python-pdfkit is doing something unsafe with strings; perhaps it should assume that the input is just bytes.
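
The traceback is consistent with Python 2 behavior, where calling `.encode('utf-8')` on a byte string first decodes it implicitly as ASCII. Below is a minimal sketch of that mechanism, written for Python 3 with the hidden decode made explicit. The function `encode_like_python2` and the sample HTML are hypothetical names for illustration and are not part of python-pdfkit.

```python
def encode_like_python2(data: bytes) -> bytes:
    """Mimic Python 2's byte-string .encode('utf-8'):
    an implicit ASCII decode, followed by the requested encode."""
    return data.decode("ascii").encode("utf-8")


# Sample HTML that is already UTF-8 encoded; the copyright sign
# U+00A9 becomes the two bytes 0xC2 0xA9.
html = "<p>\u00a9 2015</p>".encode("utf-8")

message = None
try:
    encode_like_python2(html)
except UnicodeDecodeError as exc:
    # The failure comes from the decode step, even though the
    # caller only asked for an encode.
    message = str(exc)

print(message)
```

This would also explain why passing a filename works: in that case the HTML never goes through the in-memory encode step shown in the tracebacks.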

python-pdfkit error demo.zip

@debaetsr debaetsr commented Dec 23, 2015

I also have problems when the source is already in UTF-8 (encoding UTF-8 to UTF-8 gives weird results).

Removing the encode works for me. My HTML source files are in UTF-8, as we have many accents in Belgium.

I assume it's the programmer's job to ensure correct encoding before calling the library, so they are in complete control of what to do if unsupported characters occur.
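
A minimal sketch of the approach suggested here, assuming the library would skip the encode when the caller already supplies bytes and encode only when it receives text. The helper name `to_utf8_bytes` is hypothetical and is not part of python-pdfkit.

```python
def to_utf8_bytes(source):
    """Return UTF-8 bytes for the given HTML source.

    Bytes are assumed to be correctly encoded by the caller and are
    passed through untouched; text is encoded as UTF-8.
    """
    if isinstance(source, bytes):
        return source
    return source.encode("utf-8")


already_encoded = "caf\u00e9".encode("utf-8")  # caller-encoded bytes
as_text = "caf\u00e9"                          # plain text

# Both inputs end up as the same UTF-8 byte sequence.
from_bytes = to_utf8_bytes(already_encoded)
from_text = to_utf8_bytes(as_text)
```

With a helper like this, the caller keeps control of the encoding: pre-encoded input is never re-encoded, and only text goes through `encode`.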

Regards,
Ruben

@patrickyan patrickyan commented Apr 29, 2016

@debaetsr how did you fix the problem?

@alanhamlett alanhamlett (Collaborator) commented May 4, 2017

Should be fixed on master branch with #81 and released in the next version.

@gbrowdy gbrowdy commented Jun 5, 2017

The commit you referenced deals with the decoding, whereas the problem stated here (which I am also having) is about the encode function in the to_pdf method.

@alexandrezia alexandrezia commented Jun 3, 2018

I'm also having this issue. All my HTML files are already UTF-8, as they are in Portuguese.

@mdesjardins mdesjardins commented Apr 18, 2019

I am having the same problem with French text, which is also UTF-8.
