This repository has been archived by the owner. It is now read-only.

chinese charset not show right #10164

Closed
ariya opened this Issue Jul 8, 2011 · 32 comments

@ariya
Owner

ariya commented Jul 8, 2011

coley...@gmail.com commented:

Which version of PhantomJS are you using?
1.2

What steps will reproduce the problem?
1. phantomjs examples/render_multi_url.js www.sohu.com
2. See the generated www.sohu.com.cn.png
3. The Chinese characters are not rendered

What is the expected output? What do you see instead?
Chinese characters rendered correctly

Which operating system are you using?
Windows XP SP3

Did you use binary PhantomJS or did you compile it from source?
1.2

Please provide any additional information below.

Disclaimer:
This issue was migrated on 2013-03-15 from the project's former issue tracker on Google Code, Issue #164.
🌟   10 people had starred this issue at the time of migration.

Owner

ariya commented Jul 9, 2011

roejame...@gmail.com commented:

If you go to the site in a regular browser, do the characters display correctly?

Contributor

aportale commented Jul 10, 2011

alessandro.portale@gmail.com commented:

I assume you have Chinese fonts installed on Windows. What do you see instead of the Chinese text? Are there just square boxes? Ideally, please attach your www.sohu.com.cn.png to this task.

Owner

ariya commented Aug 25, 2011

ariya.hi...@gmail.com commented:

Any further updates on this? Any screenshot to help us troubleshoot the issue?

otakustay commented Sep 1, 2011

otakus...@gmail.com commented:

Here is a screenshot showing the Chinese text problem.
When I took it, I was using Windows 7 with the system locale set to Chinese (Simplified), so I am fairly sure the system encoding is GB2312. The web page's encoding is also GB2312 (declared with a meta tag, not in the HTTP Content-Type header).
The URL is: http://www.tiexue.net

masdude commented Dec 1, 2011

bigdude6...@gmail.com commented:

I have this problem too.
In a regular browser http://www.baidu.com renders well, but in phantomjs I got this (as the image below shows):

baidu.com is encoded as gb2312, whereas phantomjs renders Chinese sites encoded as utf-8 very well. It could be a problem with how phantomjs deals with non-UTF-8 encoded sites.

Contributor

firedfox commented Dec 13, 2011

wangyang...@gmail.com commented:

I'm running phantomjs 1.4.0 on Ubuntu 11.10, and gb2312 web pages are rendered correctly with the following simple demo code:

var page = require('webpage').create();
// Render a screenshot once the page has finished loading, then quit.
page.onLoadFinished = function (status) {
    if (status === 'success') {
        page.render('gb2312.png');
    }
    phantom.exit();
};
page.open('http://www.baidu.com/');

Maybe you can upgrade to 1.4.0 and try again.

Owner

ariya commented Apr 13, 2012

bla...@layar.com commented:

I'm using phantomjs 1.5 / Mac OS X 10.6.8.

UTF-8 encoded traditional Chinese is not working (the characters neither show nor take up space), but simplified Chinese is. In my regular browser, both traditional and simplified work.

Baidu shows the same as the attachment in Comment 6.

Attached is rendering of http://zh.wikipedia.org/wiki/%E4%BD%A0%E5%A5%BD%EF%BC%8C%E5%87%AF%E5%85%B0

mashihua commented Jul 23, 2012

mashi...@gmail.com commented:

I'm using phantomjs 1.6.1 / Mac OS X 10.7.4

I have this problem too. baidu.com is encoded as gb2312, and its font-family is 'arial'. A regular browser displays the site correctly, but the following demo code displays '°Ù¶Èһϣ¬Äã¾ÍÖªµÀ'.

fs = require 'fs'
page = require('webpage').create()

page.open "http://www.baidu.com", (status) ->
  if status isnt 'success'
    console.log "Unable to open"
  else
    # Read the title inside the page context
    title = page.evaluate ->
      document.body.bgColor = 'white'
      document.title
    console.log title
    fs.write "test.txt", title, 'w'
    page.render "www.baidu.com.png"
  phantom.exit()

mashihua commented Jul 24, 2012

mashi...@gmail.com commented:

I googled this issue and found that if you compile from source instead of using the static Mac build, the problem goes away. https://gist.github.com/1260734

Owner

ariya commented Aug 16, 2012

bla...@layar.com commented:

I had no success with my own compiled build.

Mac OS X 10.6.8
phantomjs 1.6.2

Owner

ariya commented Sep 9, 2012

astonia....@gmail.com commented:

It's weird, but I found a workaround to solve the Chinese gb2312 encoding problem.
Take www.baidu.com as the example: with the default encoding settings, the output is utf-8 that was encoded from text decoded as iso8859-1. That's weird, right? phantomjs does not recognize that baidu.com returns gb2312, and treats the response as iso8859-1 instead.
So you can convert the output from utf-8 back to iso8859-1, and you will get the original gb2312-encoded bytes.
Use the commands:
iconv -f utf-8 -t iso8859-1 output > out.gb2312
iconv -f gb2312 -t utf-8 out.gb2312

The attachment is my test program, ugly, sorry.
Run it by
$ phantomjs test.js http://www.baidu.com baidu.html
and then
$ iconv -f utf-8 -t iso8859-1 baidu.html > baidu.gb2312
now baidu.gb2312 is properly encoded.
For me, I'll convert it to utf-8
$ iconv -f gb2312 -t utf-8 baidu.gb2312 > baidu.utf-8
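
The two iconv steps above can be sketched in JavaScript for Node.js. This is an illustration under assumptions, not part of PhantomJS: the helper names are made up, and the sample string is only the first four characters of the garbled title reported earlier in the thread.

```javascript
// Assumption: the garbled string is GB2312 bytes that were decoded as iso8859-1.

// Step 1 (iconv -f utf-8 -t iso8859-1): map each character back to one byte.
function recoverBytes(garbled) {
  return Buffer.from(garbled, 'latin1');
}

// Step 2 (iconv -f gb2312 -t utf-8): decode the recovered bytes.
// TextDecoder only knows 'gbk' when Node is built with full ICU data,
// so return null instead of throwing when the label is unsupported.
function decodeGbk(bytes) {
  try {
    return new TextDecoder('gbk').decode(bytes);
  } catch (err) {
    return null;
  }
}

const garbled = '\u00b0\u00d9\u00b6\u00c8'; // the four characters "°Ù¶È"
const bytes = recoverBytes(garbled);
console.log(bytes.toString('hex')); // b0d9b6c8
console.log(decodeGbk(bytes));
```

If the second line prints null, the Node build lacks the GBK decoder and the bytes can still be written to a file and converted with iconv as described above.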

wisesimpson commented Dec 26, 2012

Wise.Sim...@gmail.com commented:

I had the same issue

Mac OS X 10.8.2
phantomjs 1.8.0

here is my screenshot for baidu.com

wisesimpson commented Dec 26, 2012

Wise.Sim...@gmail.com commented:

I'm wondering whether there is a way to set the page encoding. I read the docs and didn't find any setting for it.

greyglay commented Jan 8, 2013

cheryl4...@gmail.com commented:

Hi All,
I solved it. You should install all the Chinese fonts (including the X11 fonts):
yum install font

nikescar commented Mar 16, 2013

nikes...@gmail.com commented:

astonia's solution (#12) is a good one.
It solved my problem.

shepherdwind commented May 30, 2013

I have the same problem. The binary package provided by phantomjs.org does not work when the page charset is gbk. I then built it myself, following the instructions at http://phantomjs.org/build.html.

Mac OS X 10.8.3
phantomjs 1.9.0

airclear commented Nov 27, 2013

When the page does not declare a Content-Type, phantomjs falls back to iso8859-1 by default, so a page that contains Chinese text is displayed incorrectly.

Can I modify the Content-Type in the JS code?
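
The effect of such a fallback can be illustrated outside PhantomJS with a small Node.js sketch. It assumes the fallback behaves like decoding the raw page bytes as iso8859-1 (latin1 in Node), and the helper name is made up for the example.

```javascript
// Assumption: the page bytes are UTF-8 but get decoded as iso8859-1 (latin1).
function decodeAsLatin1(text) {
  const utf8Bytes = Buffer.from(text, 'utf8'); // what the server sends
  return utf8Bytes.toString('latin1');         // what a wrong decoder sees
}

// ASCII survives because both encodings agree on those bytes.
console.log(decodeAsLatin1('PhantomJS'));

// Each of these Chinese characters is three UTF-8 bytes,
// so two characters turn into six unrelated latin1 characters.
console.log(decodeAsLatin1('中文').length); // 6
```

This is why pages in plain English look fine under the wrong decoder while Chinese text turns into accented Latin characters.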

tpneumat commented Dec 22, 2013

This fixed it for me:
yum install fonts-japanese fonts-chinese fonts-korean

Also, make sure to output headers as 'Content-Type: text/html; charset=utf-8'
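
For pages you serve yourself, the header advice above might look like the following Node.js sketch. The function names are assumptions made for illustration; only the header value comes from the comment.

```javascript
const http = require('http');

// Header value quoted in the comment above.
const HTML_UTF8 = { 'Content-Type': 'text/html; charset=utf-8' };

// Build a request handler that always declares UTF-8 (hypothetical helper).
function makeHandler(html) {
  return function handler(req, res) {
    res.writeHead(200, HTML_UTF8);
    res.end(Buffer.from(html, 'utf8')); // send the body as UTF-8 bytes
  };
}

// Create the server without starting it; call .listen(port) to serve.
function createUtf8Server(html) {
  return http.createServer(makeHandler(html));
}
```

Declaring the charset in the header means the renderer does not have to guess the encoding from the page body.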

mozbugbox commented May 17, 2014

We just need a function such as page.set_charset(charset_str) to force the page to be rendered with a given charset.

mozbugbox commented May 17, 2014

OK, it seems that if a font family is set in the page and that font is not found, it is not substituted with a system font. In that case, boxes are displayed in place of the Chinese text.

For www.baidu.com, 宋体 (song) is specified in the page, so the text fails to display if no song-family font is installed on the system. BTW, baidu.com uses UTF-8 encoding now.

On a Debian box, the package fonts-arphic-uming needs to be installed to render baidu.com properly with phantomjs.

quentindemetz commented Jul 3, 2014

On Debian box, the package fonts-arphic-uming need to be installed to render the baidu.com properly with phantomjs.

👍
On Arch that's ttf-arphic-uming

Collaborator

zackw commented Apr 19, 2015

PhantomJS 2.0 includes a major update to Webkit that may well have solved this problem, and I cannot reproduce it (on Debian) with any of the suggested test URLs. It's possible that there is still a problem with some combination of character encodings and fonts, but I'm going to go ahead and close this issue. If you are still having this problem, please file a new issue and be specific about what website, what characters, and what fonts are involved.

@zackw zackw closed this Apr 19, 2015

@zackw zackw added Bug 1.x and removed old.Priority-Medium labels Apr 19, 2015

vaneri commented Dec 17, 2015

Had a similar issue on Ubuntu; installing the Chinese fonts as described at this link helped me out:
https://en.wikipedia.org/wiki/Help:Multilingual_support_(East_Asian)

sudo apt-get install fonts-arphic-ukai fonts-arphic-uming fonts-ipafont-mincho fonts-ipafont-gothic fonts-unfonts-core

Collaborator

zackw commented Dec 17, 2015

@vaneri Yes, PhantomJS doesn't bundle any fonts.

vaneri commented Dec 17, 2015

Is there a way to bundle fonts in phantomjs?

Collaborator

zackw commented Dec 17, 2015

@vaneri We don't have any way to do that right now. I think it's possible, but it would be a big chunk of new code to write. I don't think any of us has time to do it anytime soon, sorry.

vaneri commented Dec 17, 2015

thanks for the info @zackw ;)

Collaborator

Vitallium commented Dec 17, 2015

I think we can extend our QPA plugin with this feature (or use http://doc.qt.io/qt-5/qfontdatabase.html#addApplicationFont), i.e. add a custom option to specify font folders from which PhantomJS should load fonts. But this is just my guess.

alex88 commented Aug 12, 2016

I have the same issue: no Chinese system fonts are installed, but even when I import custom fonts that include Chinese glyphs it doesn't work.
E.g. I import 2 fonts from Typekit; the regular alphabet works fine, Chinese doesn't.

koodalingam commented Sep 21, 2017

Hi, when exporting to PDF using PhantomJS, some of the Chinese characters are missing. Can you help with this?

vaneri commented Sep 21, 2017

koodalingam commented Sep 21, 2017

Thanks for the reply. For example, the HTML content has

Simple 中国是东亚人口众多的国家,其广阔的景观包括草原。

Its output was Simple 中国是东 人口众多的国家,其广阔的景观包括草. Here a character is replaced by a space.

The required fonts are installed on my OS, as shown below:

/usr/share/fonts/chinese/TrueType/fireflysung.ttf: AR PL New Sung,文鼎PL新宋:style=Regular
/usr/share/fonts/chinese/TrueType/zysong.ttf: ZYSong18030,中易宋体18030:style=regular
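
Since an earlier comment in this thread observed that a font family named by the page must be present on the system, it can help to check a listing like the one above for an exact family name. The following Node.js sketch assumes the lines follow the "file: family[,family...]:style=..." shape shown above (the comment does not say which tool produced it), and both helper names are made up for the example.

```javascript
// Assumption: each line looks like "file: family[,family...]:style=...".
function parseFontLine(line) {
  const sep = line.indexOf(': ');
  if (sep === -1) return null; // not a font line
  const [familyPart, ...props] = line.slice(sep + 2).split(':');
  const styleProp = props.find((p) => p.startsWith('style='));
  return {
    file: line.slice(0, sep),
    families: familyPart.split(',').map((name) => name.trim()),
    style: styleProp ? styleProp.slice('style='.length) : null,
  };
}

// True when some installed font advertises exactly the requested family name.
function hasFamily(lines, wanted) {
  return lines
    .map(parseFontLine)
    .some((font) => font !== null && font.families.includes(wanted));
}

const listing = [
  '/usr/share/fonts/chinese/TrueType/fireflysung.ttf: AR PL New Sung,文鼎PL新宋:style=Regular',
  '/usr/share/fonts/chinese/TrueType/zysong.ttf: ZYSong18030,中易宋体18030:style=regular',
];
console.log(hasFamily(listing, 'AR PL New Sung')); // true
console.log(hasFamily(listing, '宋体')); // false
```

The exact-name check is deliberately strict. A false result only means that no listed font advertises that exact family name, not that the text cannot be rendered, since the system's font matching may still substitute another font.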
