Cheerio Crawler / httpRequest encoding problems #524

Closed
AndreyBykov opened this issue Dec 3, 2019 · 2 comments

Comments

@AndreyBykov
Member

Cyrillic characters in the response body are not decoded correctly. The website is in Bulgarian and renders perfectly in a browser, but when it is fetched with the Cheerio scraper (httpRequest/got), the HTML comes back with the wrong encoding.

Here's the example page: https://www.imot.bg/pcgi/imot.cgi?act=5&adv=2b157484078874523&slink=51kk4i&f1=1

I tried returning the raw buffer and converting it with iconv using various encodings, but with no luck.

Any ideas how we could actually handle this? Although this is obviously an edge case, it seems likely to come up from time to time.
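
For illustration, the symptom described above can be reproduced without any network access. This is a minimal sketch, assuming the page is served as windows-1251 (the encoding suggested later in this thread) and that Node.js is built with full ICU so the built-in `TextDecoder` knows that charset. The sample bytes are illustrative and not taken from the actual page.

```javascript
// Bytes of the Bulgarian word "Имот" as windows-1251 would encode it (illustrative sample).
const cyrillicBytes = Buffer.from([0xc8, 0xec, 0xee, 0xf2]);

// Decoding with the wrong charset: these bytes do not form valid UTF-8 sequences,
// so the decoder substitutes U+FFFD replacement characters.
const asUtf8 = new TextDecoder('utf-8').decode(cyrillicBytes);

// Decoding with the charset the bytes were written in restores the text.
const asCp1251 = new TextDecoder('windows-1251').decode(cyrillicBytes);

console.log({ asUtf8, asCp1251 });
```

If the body has already been decoded as UTF-8 into a string, the original bytes are lost, which may explain why converting afterwards does not help. Any conversion has to start from the raw buffer.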

@mnmkng
Member

mnmkng commented Apr 15, 2020

The latest version of CheerioCrawler supports fallback encodings. Would it work if you set suggestResponseEncoding: 'windows-1251'?
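
To make the idea of a fallback encoding concrete, here is a rough sketch of the selection logic: use the charset declared in the Content-Type header when there is one, otherwise fall back to a suggested encoding, and only then to UTF-8. The helpers `resolveEncoding` and `decodeBody` are hypothetical names invented for this illustration. They are not the SDK's implementation, and the exact behavior of `suggestResponseEncoding` should be checked against the CheerioCrawler documentation.

```javascript
// Hypothetical helper, not part of the Apify SDK: pick a charset for a response body.
function resolveEncoding(contentType, suggestedEncoding) {
    // Prefer an explicit charset from the Content-Type header, e.g. "text/html; charset=UTF-8".
    const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || '');
    if (match) return match[1].toLowerCase();
    // No charset declared: use the suggested fallback, or UTF-8 as a last resort.
    return suggestedEncoding || 'utf-8';
}

// Hypothetical helper: decode a raw body using the resolved charset.
// Assumes a Node.js build with full ICU so TextDecoder supports windows-1251.
function decodeBody(body, contentType, suggestedEncoding) {
    const encoding = resolveEncoding(contentType, suggestedEncoding);
    return new TextDecoder(encoding).decode(body);
}

// Illustrative bytes for "Имот" in windows-1251, served without a charset in the header.
const body = Buffer.from([0xc8, 0xec, 0xee, 0xf2]);
const decoded = decodeBody(body, 'text/html', 'windows-1251');
console.log(decoded);
```

In the crawler itself, the option would presumably be passed alongside the other constructor options, as in `suggestResponseEncoding: 'windows-1251'`.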

@mnmkng
Member

mnmkng commented Apr 15, 2020

Seems to be working. Closing.
