reading_time helper doesn't work with Chinese #9507
Comments
This is not yet implemented, as we need to find a solution that works for Latin-script languages as well as Asian languages such as Chinese. We're happy for PRs here to solve this issue!
@AileenCGN The comment in the word count utility says
I wonder why the result is different? 🤔
I think the client word-count.js is unused. The Markdown editor uses simpleMDE's wordcount function. Any objections to using that implementation in the server wordCount? I can raise a PR for that when I get home tonight.
Sounds good, should prob remove it from the client as well!
issue TryGhost/Ghost#9507 - Removed unused wordCount utility
closes #9507
- Changed the utils.wordCount implementation to the one used by simpleMDE
- Added extra À-ÿ to the regex to support diacritic characters
- Added corresponding text with Chinese text mentioned in the issue
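For reference, a word counter along the lines this commit describes might look like the sketch below. It is an approximation, not the merged code: the function name `countWords` is hypothetical, the character ranges are an assumption modeled on a simpleMDE-style pattern with À-ÿ added, and the threshold for "count per character" is a choice made for this sketch.

```javascript
// Hypothetical sketch: runs of Latin-style characters count as one word each,
// runs of CJK characters count one word per character.
// \u00c0-\u00ff is the À-ÿ range mentioned in the commit message.
const WORD_PATTERN = /[a-zA-Z0-9_\u00c0-\u00ff\u0392-\u03c9\u0410-\u04f9]+|[\u4e00-\u9fff\u3400-\u4dbf\uf900-\ufaff\u3040-\u309f\uac00-\ud7af]+/g;

function countWords(text) {
    const matches = text.match(WORD_PATTERN);
    if (matches === null) {
        return 0;
    }
    let count = 0;
    for (const run of matches) {
        // Every range in the second alternative starts at or above U+3040,
        // while every range in the first alternative stays below U+0500.
        if (run.charCodeAt(0) >= 0x3040) {
            count += run.length;
        } else {
            count += 1;
        }
    }
    return count;
}
```

With this sketch the sentence from the issue is counted by its characters, while space-separated text is still counted by words.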
Issue Summary
The reading_time helper does not work with Chinese.
It may always show "<1 min read".
Steps to Reproduce
2. In Chinese we seldom use spaces to separate words. A Chinese word consists of characters; we express our thoughts in words, but we write in characters. So reading time for Chinese should be based on a character count.
For example:
我今天在家吃了好多好多好吃的,现在的我非常开心非常满足。
This Chinese sentence contains 26 characters and 2 punctuation marks.
But the Ghost reading_time helper probably counts it as 2 words.
So a 4000-character Chinese article may show "<1 min read", even though it actually takes about 10 minutes to read.
3. The Ghost reading_time helper could accept a parameter to count characters for Chinese.
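Here's a regex to count Chinese characters:
// Matches characters in the common CJK Unified Ideographs range.
var reg = /[\u4e00-\u9fa5]/g;
var str = "js计算一段文本中有多少个汉字?";
// match() returns null when nothing matches, so fall back to an empty array.
console.debug((str.match(reg) || []).length); // 13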
As far as I know, other languages, such as Japanese, are also written with characters rather than space-separated words.
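The proposal above could be sketched roughly as follows: count Chinese characters separately from space-delimited words and convert each count with its own reading speed. Everything named here is an assumption for illustration and not part of Ghost's API: the function `estimateReadingMinutes`, its `options` object, and the default speeds. The 400 characters per minute follows from the 4000-character, 10-minute estimate above; 275 words per minute is only a placeholder.

```javascript
// Hypothetical helper, not part of Ghost.
const HAN_CHAR = /[\u4e00-\u9fa5]/g;

function estimateReadingMinutes(text, options = {}) {
    const wordsPerMinute = options.wordsPerMinute || 275; // assumed placeholder
    const charsPerMinute = options.charsPerMinute || 400; // 4000 chars in ~10 min

    // match() returns null when nothing matches, so fall back to an empty array.
    const hanCount = (text.match(HAN_CHAR) || []).length;

    // Replace Han characters with spaces, then count the remaining
    // whitespace-separated tokens that contain a letter or digit.
    const rest = text.replace(HAN_CHAR, ' ');
    const wordCount = rest
        .split(/\s+/)
        .filter((token) => /[A-Za-z0-9]/.test(token))
        .length;

    return hanCount / charsPerMinute + wordCount / wordsPerMinute;
}
```

Under these assumed defaults, a 4000-character Chinese article comes out at 10 minutes rather than under one. Japanese kana, Korean Hangul, and less common Han ranges are not covered by this regex and would need their own ranges.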
Technical details: