.html() method returns html with UNICODE when Chinese words are passed #565

Closed
Tjatse opened this issue Sep 22, 2014 · 8 comments

Comments


Tjatse commented Sep 22, 2014

var cheerio = require('cheerio');
var $ = cheerio.load('<title>文章抓取</title>');
console.log($('title').html());

<=0.15.0 output:

文章抓取 

0.16.0-0.17.0 output:

&#x6587;&#x7AE0;&#x6293;&#x53D6;
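
The two outputs above encode the same text: each `&#x....;` sequence is a hexadecimal numeric character reference for one of the original characters. As a rough illustration only, the sketch below converts such references back into literal characters. `decodeHexEntities` is a hypothetical helper written for this example, not part of cheerio, and it handles only the hexadecimal `&#x...;` form shown in the output.

```javascript
// Hypothetical helper (not a cheerio API): turn hexadecimal numeric
// character references such as "&#x6587;" back into literal characters.
// Assumes every reference holds a valid Unicode code point; a value above
// 0x10FFFF would make String.fromCodePoint throw a RangeError.
function decodeHexEntities(input) {
  return input.replace(/&#x([0-9a-fA-F]+);/g, function (match, hex) {
    return String.fromCodePoint(parseInt(hex, 16));
  });
}

var encoded = '&#x6587;&#x7AE0;&#x6293;&#x53D6;';
var decoded = decodeHexEntities(encoded);
console.log(decoded === '文章抓取'); // true
```

This is a post-processing workaround, so it only makes sense when the serialization itself cannot be changed.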

Tjatse commented Sep 22, 2014

But

$('title').text()

works fine. Could you look into it? Thanks a lot.

Member

fb55 commented Sep 22, 2014

Try passing decodeEntities: false.


Tjatse commented Sep 23, 2014

@fb55
Hi Felix, I've tried every combination of normalizeWhitespace, xmlMode and decodeEntities, and it still doesn't work: .html() keeps returning the text as numeric character references, while .text() and .attr([ATTR_NAME]) both return it correctly.


Tjatse commented Sep 23, 2014

More information: I've tested on Linux, Mac and PC in a pure Node.js environment, with app.js saved as UTF-8. Results by cheerio version:
0.15.0(√)
0.16.0(x)
0.17.0(x)


Tjatse commented Nov 4, 2014

Everything works fine now. Weird. Sorry for the bother...

@surmon-china

@fb55
thanks! it works!

@MohammedEssehemy

@fb55 Thanks, worked like a charm.

@indatawetrust

@fb55 thanks
