Encoding issue with BeautifulSoup #184

Closed
lunusvir opened this issue Jan 3, 2017 · 1 comment

Comments

lunusvir commented Jan 3, 2017

Hi Nandaka,

When I was downloading some images (for example, illust_id=59140460), BeautifulSoup, called in the function 'getImagePage', turned the UTF-8 encoded content (variable 'response') into a mess of HTML entities (variable 'parsed'). BeautifulSoup was probably confused by some characters (I have no idea which ones) and treated the page as a different encoding. This only happened on some of my computers, so I am not sure whether the problem is reproducible with the illust_id above.
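The failure mode described above, where UTF-8 bytes are read under the wrong encoding, can be reproduced in isolation with the standard library. This is a minimal sketch that assumes the wrong guess was ISO-8859-1 (Latin-1). The issue does not say which encoding BeautifulSoup actually picked, and the sample string is illustrative. It is not taken from illust_id=59140460.

```python
# Assumption: the parser guessed Latin-1; the real guess is unknown.
original = "イラスト"            # sample Japanese text (illustrative only)
raw = original.encode("utf-8")   # bytes as the server sends them

# Decoding UTF-8 bytes with the wrong codec does not raise an error:
# Latin-1 maps every byte to some character, so the result is mojibake.
garbled = raw.decode("latin-1")

# Decoding with the codec the server actually used gives the right text.
correct = raw.decode("utf-8")

# Because Latin-1 is a 1:1 byte mapping, this particular mistake is reversible.
recovered = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled))
print(correct == original, recovered == original)
```

Each katakana character takes 3 bytes in UTF-8, so the wrong decode produces three garbage characters per original one. Those are the characters a serializer may later escape as HTML entities.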

I tried adding 'Accept-Charset: utf-8' to the request header and passing the content as BeautifulSoup(response.decode('utf-8')). After that it ran well. I'm not sure whether that solution is ugly (XD) or not, and I'm looking for an official fix, either on your side or on BeautifulSoup's.
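The workaround above (ask for UTF-8, then decode the bytes yourself before parsing) can be sketched with the standard library's urllib. This is a hedged illustration, not PixivUtil2's actual code. The helper names build_request, decode_page and fetch_html are hypothetical, and the fallback to UTF-8 when the server declares no charset is an assumption.

```python
import urllib.request
from typing import Optional

# Assumption: fall back to UTF-8 when the server declares no charset.
DEFAULT_CHARSET = "utf-8"


def build_request(url: str) -> urllib.request.Request:
    """Hypothetical helper: request UTF-8 explicitly, as tried in the issue."""
    return urllib.request.Request(url, headers={"Accept-Charset": "utf-8"})


def decode_page(raw: bytes, charset: Optional[str] = None) -> str:
    """Hypothetical helper: decode the body ourselves so the parser never guesses.

    Uses the charset declared by the server if there is one, else UTF-8.
    """
    return raw.decode(charset or DEFAULT_CHARSET)


def fetch_html(url: str) -> str:
    """Download a page and return it as text (performs a network request)."""
    with urllib.request.urlopen(build_request(url)) as response:
        # Charset from the Content-Type header, or None if absent.
        charset = response.headers.get_content_charset()
        return decode_page(response.read(), charset)
```

The returned str can then be given to BeautifulSoup (for example BeautifulSoup(html_text, "html.parser")), which skips its encoding detection for text input. If the raw bytes are passed instead, BeautifulSoup's from_encoding="utf-8" argument expresses the same intent without a manual decode.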

Thanks.

Nandaka added a commit that referenced this issue Jan 5, 2017
Nandaka closed this Jan 31, 2017
Nandaka added a commit that referenced this issue Mar 4, 2017
2 participants