
avoid unicode decode error with html_parser #147

Merged
deanmalmgren merged 1 commit into deanmalmgren:master from suned:patch-1 on Mar 31, 2017

suned (Contributor) commented Mar 28, 2017

The HTML parser currently assumes that HTML files are UTF-8 encoded, so files in other encodings raise a UnicodeDecodeError. BeautifulSoup accepts byte streams and detects the encoding itself, so it can be trusted to parse HTML files in unusual encodings if it is given a byte stream instead of a text stream. The extracted text is then decoded by BaseParser.process at the end.
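A minimal sketch of the byte-stream approach, assuming the beautifulsoup4 package is installed. extract_html_text is a hypothetical helper, not textract's actual html_parser code, and the Latin-1 sample page with its meta charset declaration is an illustrative assumption. The textract step where BaseParser.process decodes the result is not reproduced here; BeautifulSoup's get_text() already returns a str.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def extract_html_text(path):
    """Hypothetical helper: return the text content of an HTML file.

    The file is opened in binary mode, so no UTF-8 decoding happens here.
    Given bytes, BeautifulSoup works out the encoding itself (byte-order
    mark, <meta charset> declaration, then fallback guesses) before parsing.
    """
    with open(path, "rb") as stream:
        soup = BeautifulSoup(stream, "html.parser")
    return soup.get_text()


if __name__ == "__main__":
    # Illustrative Latin-1 page: "é" is the single byte 0xE9, which is not
    # a valid UTF-8 sequence when followed by "<".
    html = '<html><head><meta charset="iso-8859-1"></head><body><p>café</p></body></html>'
    path = Path(tempfile.mkdtemp()) / "latin1.html"
    path.write_bytes(html.encode("iso-8859-1"))

    # Assuming UTF-8 (reading as text) fails on this file.
    try:
        path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        print("text mode failed:", exc.reason)

    # Handing BeautifulSoup the raw bytes lets it use the declared charset.
    print(extract_html_text(path))
```

Reading in binary mode moves the encoding decision from the caller's assumption to BeautifulSoup's detection logic, which is the point of the change.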

coveralls commented Mar 28, 2017


Coverage remained the same at 90.789% when pulling aa7b126 on suned:patch-1 into e5a046f on deanmalmgren:master.

deanmalmgren (Owner) commented Mar 31, 2017

Nice! Thanks for the PR :)

deanmalmgren merged commit 7e2c41c into deanmalmgren:master on Mar 31, 2017

1 check passed: continuous-integration/travis-ci/pr (the Travis CI build passed)