UnicodeDecodeError: 'ascii' codec can't decode . . . #42

Closed
quietlyconfident opened this issue Dec 10, 2015 · 7 comments

Comments

@quietlyconfident

When I use python-pdfkit with HTML content that contains non-ASCII characters, it fails with one of these errors if the HTML content is loaded into memory:

File ". . . /pdfkit.py", line 100, in to_pdf
    input = self.source.to_s().encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 64: ordinal not in range(128)

or

File ". . ./pdfkit.py", line 102, in to_pdf
    input = self.source.source.read().encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 64: ordinal not in range(128)

However, python-pdfkit works fine if it is given just a filename, and so does wkhtmltopdf.

I think that python-pdfkit is doing something unsafe with strings; perhaps it should assume that the input is just bytes and pass it through unchanged.

python-pdfkit error demo.zip
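
A minimal sketch of what the tracebacks above suggest is happening. On Python 2, calling `.encode('utf-8')` on a byte string first decodes it implicitly with the `ascii` codec, which fails on any byte above 0x7F. The snippet below runs on Python 3 and makes that implicit step explicit. The helper `to_utf8_bytes` is hypothetical (it is not part of python-pdfkit) and only illustrates the "treat bytes as bytes" idea, assuming the input is already UTF-8 when it arrives as bytes.

```python
# -*- coding: utf-8 -*-

def to_utf8_bytes(source):
    """Hypothetical helper, NOT part of python-pdfkit.

    Bytes are assumed to be UTF-8 already and are returned unchanged;
    text is encoded to UTF-8 exactly once.
    """
    if isinstance(source, bytes):
        return source
    return source.encode("utf-8")


# HTML that is already UTF-8 encoded; the copyright sign becomes 0xc2 0xa9.
html_bytes = "<p>\u00a9 2015</p>".encode("utf-8")

# Reproduce the failure: decoding UTF-8 bytes as ASCII, which is what
# Python 2 does behind the scenes for byte_string.encode('utf-8').
error_message = None
try:
    html_bytes.decode("ascii").encode("utf-8")
except UnicodeDecodeError as exc:
    error_message = str(exc)

# The guarded version handles both bytes and text without raising.
safe_from_bytes = to_utf8_bytes(html_bytes)
safe_from_text = to_utf8_bytes("<p>\u00a9 2015</p>")
```

Both calls to `to_utf8_bytes` yield the same bytes, so the caller can hand over either type. Whether the library should do this itself or leave it to the caller is a design choice for the maintainers.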

@debaetsr

I also have problems when the source is already in UTF-8 (encoding UTF-8 to UTF-8 gives weird results).

Removing the encode call works for me. My HTML source files are in UTF-8, as we have many accents in Belgium.

I assume it's the programmer's job to ensure correct encoding before calling the library, so he can be in complete control of what to do if unsupported characters occur.

Regards,
Ruben
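
One way the "weird results" mentioned above can arise, sketched under the assumption that already-encoded UTF-8 bytes get misread as Latin-1 text somewhere and are then encoded a second time. This is an illustration of double encoding in general, not a trace of what python-pdfkit does internally.

```python
# -*- coding: utf-8 -*-

original = "\u00e9"  # e with acute accent

# Encoding once gives the two-byte UTF-8 sequence for the character.
utf8_once = original.encode("utf-8")

# If those bytes are mistaken for Latin-1 text and encoded again,
# each byte becomes its own character and is encoded separately.
utf8_twice = utf8_once.decode("latin-1").encode("utf-8")

# Reading the result back as UTF-8 shows mojibake instead of the accent.
garbled = utf8_twice.decode("utf-8")

print(repr(utf8_once))   # b'\xc3\xa9'
print(repr(utf8_twice))  # b'\xc3\x83\xc2\xa9'
```

Encoding exactly once, at a single well-defined point, avoids this, which is why removing a redundant encode step can fix the output.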

@patrickyan

@debaetsr how did you fix the problem?

@alanhamlett
Collaborator

Should be fixed on master branch with #81 and released in the next version.

@gbrowdy

gbrowdy commented Jun 5, 2017

The commit you referenced deals with the decoding, whereas the problem stated here (which I am also having) is about the encode function in the to_pdf method.

@alexandrezia

I'm also having this issue.
All my HTML files are already UTF-8, as they are in Portuguese.

@mdesjardins

I am having the same problem with French content, which is also UTF-8.

JazzCore added a commit that referenced this issue Mar 19, 2021
@JazzCore
Owner

Should be fixed now.

Sorry that it took so long for me to get to it...
