Counting words for Asian languages #17323

Closed
atimmer opened this issue Jul 4, 2016 · 115 comments

Comments

@atimmer
Contributor

atimmer commented Jul 4, 2016

WordPress has built code to count words in Asian languages. We should take this as inspiration. WordPress actually ships the word-counting code only in the Chinese language pack; the third link shows where to find it.

Patch for word-count.js
https://core.trac.wordpress.org/ticket/20738
https://core.trac.wordpress.org/ticket/30966

Where to find the language pack:
https://core.trac.wordpress.org/ticket/33454
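The underlying idea is that, for some languages, counting characters is more useful than counting space-separated words. Below is a minimal sketch of a counter that supports both modes. `countText` is a hypothetical name, and the type names are modeled loosely on the options WordPress's word counter appears to use; this is not WordPress's actual code, and it ignores HTML, shortcodes and punctuation rules.

```javascript
// Hypothetical sketch: count either space-separated words or characters.
// Not WordPress's implementation; HTML, shortcodes and punctuation are ignored.
function countText(text, type = "words") {
  switch (type) {
    case "words":
      // Split on any Unicode whitespace (JavaScript's \s includes U+3000).
      return text.split(/\s+/).filter(Boolean).length;
    case "characters_excluding_spaces":
      // Spread into code points so characters outside the BMP count once.
      return [...text.replace(/\s+/g, "")].length;
    case "characters_including_spaces":
      return [...text].length;
    default:
      throw new Error(`Unknown count type: ${type}`);
  }
}

console.log(countText("Counting words is easy")); // 4
console.log(countText("日本の文化を紹介します", "words")); // 1
console.log(countText("日本の文化を紹介します", "characters_excluding_spaces")); // 11
```

The "words" result for the Japanese sentence shows the problem this issue describes: without spaces, the whole sentence counts as a single word.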

@monbauza

monbauza commented Dec 6, 2016

Please inform the customer of conversation # 99154 when this conversation has been closed.

@monbauza

monbauza commented Dec 6, 2016

Please inform the customer of conversation # 148759 when this conversation has been closed.

@paulovsky

paulovsky commented Jan 3, 2017

Any developments/ideas on this issue? Not being able to use Yoast with Chinese content is keeping the plugin away from 450 million internet users.

@monbauza

Please inform the customer of conversation # 172656 when this conversation has been closed.

@a4jp-com

Maybe a character count is better for Japanese/Chinese. One kanji character usually takes up the space of two regular characters.
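A character count for Japanese/Chinese could be sketched with Unicode script property escapes. This is an illustrative helper (`countCjkCharacters` is a hypothetical name), not Yoast code; it counts only Han, Hiragana and Katakana characters and skips punctuation, Latin letters and digits.

```javascript
// Sketch: count Han (kanji/hanzi), Hiragana and Katakana characters.
// Unicode property escapes need the "u" flag (ES2018 and later).
// Note: the prolonged sound mark ー (U+30FC) has Script=Common, so it is not
// counted here; \p{Script_Extensions=Katakana} would include it.
const CJK_CHAR = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]/gu;

function countCjkCharacters(text) {
  // match() with a global regex returns every match, or null if there is none.
  const matches = text.match(CJK_CHAR);
  return matches ? matches.length : 0;
}

console.log(countCjkCharacters("下さい")); // 3
console.log(countCjkCharacters("本願寺,錦市場,八坂神社")); // 10
```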

@terw-dan
Member

@a4jp-com thanks for your suggestion. So each kanji character can be seen as 1 word? Or is there more to it?

@paulovsky

paulovsky commented Jan 24, 2017

Japanese (like Chinese, Korean and Vietnamese) is written with logographic characters from the Han family; that means that not all characters represent morphemes on their own: some morphemes are composed of more than one character (see more here).

@terw-dan The approach is to take the word count as character count, mostly because each character counts as one "word" and there are no spaces between characters (at least in Chinese).

As for keyword analysis, if the user inputs a combination of two or more characters, that must be treated as one word.
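Treating a multi-character keyphrase as one unit suggests a simple substring search. The sketch below assumes that approach (`countKeyphraseOccurrences` is hypothetical). It is deliberately naive: 京都 (Kyoto) also matches inside 東京都 (Tokyo Metropolis), which is exactly the kind of false positive a real implementation would need word segmentation to avoid.

```javascript
// Sketch: count non-overlapping occurrences of a keyphrase in unspaced text.
// The keyphrase is treated as a single unit and matched as a plain substring.
function countKeyphraseOccurrences(text, keyphrase) {
  if (keyphrase.length === 0) return 0;
  let count = 0;
  let index = text.indexOf(keyphrase);
  while (index !== -1) {
    count++;
    // Continue searching after the end of this match (no overlaps).
    index = text.indexOf(keyphrase, index + keyphrase.length);
  }
  return count;
}

console.log(countKeyphraseOccurrences("京都の寺と京都の神社", "京都")); // 2
console.log(countKeyphraseOccurrences("東京都", "京都")); // 1 (false positive)
```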

@a4jp-com

Individual kanji characters sometimes have a meaning on their own, but in some situations they are combined with a few hiragana characters to make different words with different readings.

Kanji only:
下 (shita) down

Kanji with hiragana:
下さい (kudasai) please

Each hiragana, katakana or kanji character takes up the space of two English characters.
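If length were measured by visual space rather than by character count, one rough approach is to give full-width characters a weight of two. This is a sketch under a simplifying assumption: only a few Unicode blocks are treated as full-width, whereas the authoritative rules live in the Unicode East Asian Width property. `displayWidth` is a hypothetical helper.

```javascript
// Sketch: approximate display width, counting full-width characters as 2.
// Simplified assumption: only these blocks count as full-width:
// CJK symbols/punctuation, Hiragana, Katakana (U+3000-30FF),
// CJK Extension A (U+3400-4DBF), CJK Unified Ideographs (U+4E00-9FFF),
// and the full-width ASCII variants (U+FF01-FF60).
const FULL_WIDTH = /[\u3000-\u30FF\u3400-\u4DBF\u4E00-\u9FFF\uFF01-\uFF60]/;

function displayWidth(text) {
  let width = 0;
  // for...of iterates by code point rather than by UTF-16 code unit.
  for (const ch of text) {
    width += FULL_WIDTH.test(ch) ? 2 : 1;
  }
  return width;
}

console.log(displayWidth("下さい")); // 6
console.log(displayWidth("OECD加盟国")); // 10
```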

@terw-dan
Member

Thanks to you both for the explanation. This will be helpful when we start implementing and testing this.

@a4jp-com

a4jp-com commented Feb 1, 2017

Is there anything I can do to help?

@IreneStr
Contributor

IreneStr commented Feb 2, 2017

@a4jp-com Thank you for your eagerness to help! At the moment, our main problem is the absence of spaces in Asian languages. Could you confirm that Japanese, too, has no spaces between individual words?
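One way around the missing spaces, without writing a custom tokenizer, is dictionary-based word segmentation. Modern JavaScript runtimes expose ICU's segmenter as `Intl.Segmenter`; the sketch below uses it to count word-like segments. This assumes a runtime that supports `Intl.Segmenter` (for example Node.js 16+ with full ICU data), and the exact segments depend on the ICU version, so counts may differ between browsers. Whether Yoast could rely on it is an open question.

```javascript
// Sketch: count words with the built-in Intl.Segmenter (ICU word breaking).
// Segment boundaries come from ICU dictionaries and may vary by version.
function countSegmentedWords(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  let count = 0;
  for (const { isWordLike } of segmenter.segment(text)) {
    // isWordLike is false for whitespace and punctuation segments.
    if (isWordLike) count++;
  }
  return count;
}

console.log(countSegmentedWords("Counting words is easy", "en")); // 4
// Japanese has no spaces, but ICU still finds several word-like segments.
console.log(countSegmentedWords("日本の文化を紹介しています", "ja"));
```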

@a4jp-com

a4jp-com commented Feb 2, 2017

Sometimes a regular space is used but in other situations a Japanese space is used.

Here is the encoding for the Unicode character 'IDEOGRAPHIC SPACE' (U+3000).
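For a JavaScript implementation, U+3000 needs no special handling: the `\s` class in JavaScript regular expressions already matches it, so splitting on `/\s+/` treats a Japanese space like a regular one. The sketch (`splitOnSpaces` is a hypothetical helper) also shows the limit of this approach: a name written without any space stays one token.

```javascript
// Sketch: JavaScript's \s already matches U+3000 IDEOGRAPHIC SPACE,
// so splitting on /\s+/ treats a Japanese space like a regular space.
function splitOnSpaces(text) {
  return text.split(/\s+/).filter((part) => part.length > 0);
}

console.log(/\s/.test("\u3000")); // true
console.log(splitOnSpaces("山田\u3000たろ").length); // 2
console.log(splitOnSpaces("山田 たろ").length); // 2
console.log(splitOnSpaces("山田たろ").length); // 1
```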

@IreneStr
Contributor

IreneStr commented Feb 3, 2017

@a4jp-com Thank you for your reply. From googling some Japanese websites, I got the impression that spaces are generally speaking only used between sentences, but not between words.

In the following sentences, for example, there appears to be a space after 、 and 。:

日本の文化を美術や音楽、演劇、映画からファッションやデザインまで幅広く世界に紹介しています。また、言葉を超えた共感の場をつくり出し、ともに創造する喜びをわかちあって、人と人との交流を深めていきます。

However, these spaces are part of the 、 (U+3001) and 。 (U+3002) characters themselves, so the space is not a separate character.

In what situations do people use the regular space or Japanese space?
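Because 、 and 。 are distinct characters, a readability check could split Japanese sentences on 。 (and the full-width ！ and ？) directly, while 、, being a comma, does not end a sentence. A minimal sketch, assuming that rule; it does not handle quotation brackets such as 「」 and `splitJapaneseSentences` is a hypothetical name.

```javascript
// Sketch: split Japanese text into sentences after 。(U+3002), ！(U+FF01), ？(U+FF1F).
// 、(U+3001) is a comma and is left inside the sentence.
function splitJapaneseSentences(text) {
  return text
    .split(/(?<=[。！？])/u) // zero-width split right after each terminator
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0);
}

const sample =
  "日本の文化を美術や音楽、演劇、映画からファッションやデザインまで幅広く世界に紹介しています。" +
  "また、言葉を超えた共感の場をつくり出し、ともに創造する喜びをわかちあって、人と人との交流を深めていきます。";
console.log(splitJapaneseSentences(sample).length); // 2
```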

@a4jp-com

a4jp-com commented Feb 3, 2017

Regular spaces are used between the surname and given name.
山田 たろ
But sometimes no space is used, or a Japanese space is used:
山田たろ
山田 たろ

It's kind of a design choice.

Regular spaces are also used when romaji or English phrases are used in advertising.

@idpokute

Most Japanese writers don't use spaces between words. Maybe Yoast could offer an option that turns off some deduction rules for Japanese.

@a4jp-com

a4jp-com commented Feb 12, 2017

I've been living in Japan for 16 years and have worked as a system engineer in 3 Japanese companies. Spaces are used in sites here.

Especially in pages mixed with English words.

For example:
https://www.toshiba-newenergy.com/
OECD加盟国34ヵ国中33位
IEA Energy Balance of OECD Countries 2013
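Pages like this mix Japanese with English words and numbers, so neither pure word counting nor pure character counting fits. One possible policy, assumed here purely for illustration and not Yoast's behavior, is to count each Han/Hiragana/Katakana character as one unit and each run of other letters or digits as one word. `countMixed` is a hypothetical helper.

```javascript
// Sketch: hybrid count for text that mixes Japanese/Chinese with English.
// Assumed policy: each Han, Hiragana or Katakana character counts as one unit;
// each run of other letters/digits counts as one word. Punctuation and spaces
// are ignored.
const CJK = "\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}";
// The negative lookahead keeps Latin/digit runs from swallowing CJK characters.
const TOKEN = new RegExp(`[${CJK}]|(?:(?![${CJK}])[\\p{L}\\p{N}])+`, "gu");

function countMixed(text) {
  const tokens = text.match(TOKEN);
  return tokens ? tokens.length : 0;
}

console.log(countMixed("OECD加盟国34ヵ国中33位")); // 10
console.log(countMixed("IEA Energy Balance of OECD Countries 2013")); // 7
```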

@IreneStr
Contributor

@idpokute @a4jp-com Thank you both for the information. It'll be very helpful when we want to implement this feature in the future.

@mmikhan
Member

mmikhan commented Feb 16, 2017

Please inform the customer of conversation # 179998 when this conversation has been closed.

@a4jp-com

a4jp-com commented Feb 17, 2017

Thank you very much @IreneStr.

@monbauza

Please inform the customer of conversation # 184521 when this conversation has been closed.

@a4jp-com

a4jp-com commented Mar 16, 2017

I have been following the rules set out in the plugin but I have lost 75% of my views on one Japanese site. I've gone from about 200 views a day down to about 50 views a day.

The only other change I made was changing the site to HTTPS. I thought that was meant to increase the ranking. Any ideas what could be causing the problem? https://agreatdream.com/word-lists/

Is this somehow linked to the count being off?

@ullivr

ullivr commented Mar 17, 2017

I write my blog in Chinese and really, really, really need this function.

@terw-dan
Member

@a4jp-com The word count is only shown as an indication. It is not something we (can) save to your post, and it has no influence on your rankings. So something else must have caused the decrease.

@a4jp-com

I was just thinking that, since the numbers are wrong, we might be writing titles that are too long, etc., when we make pages.

@saitaiky

saitaiky commented Apr 7, 2017

I write my blog in Chinese and Japanese. Here are some examples of my post titles for you to test:
本願寺,錦市場,八坂神社 <==Chinese only
京阪神八日之旅 <==Chinese only
伏見稻荷大社,けんどん屋,奈良公園 <==Chinese mixed with Japanese

Please help fix it; we really need this amazing function! It keeps telling us our posts are poor, which makes us sad.
Thank you so so much

@mmikhan
Member

mmikhan commented Apr 21, 2017

Please inform the customer of conversation # 192085 when this conversation has been closed.

@saitaiky

Has it been fixed yet, guys?

@a4jp-com

I'd love to find out the character count through this plugin.

@mmikhan
Member

mmikhan commented May 24, 2017

Please inform the customer of conversation # 198257 when this conversation has been closed.

@a4jp-com

Any ideas on checking the word count?

@a4jp-com

Is anyone still working on this?

@amboutwe
Member

Please inform the customer of conversation # 516867 when this conversation has been closed.

@amboutwe
Member

amboutwe commented Sep 6, 2019

Please inform the customer of conversation # 538393 when this conversation has been closed.

@hawm

hawm commented Sep 18, 2019

Still a problem as of September 2019.

@mayada-ibrahim

Please inform the customer of conversation # 565766 when this conversation has been closed.

@mmikhan
Member

mmikhan commented Feb 16, 2020

Please inform the customer of conversation # 585879 when this conversation has been closed.

@a4jp-com

a4jp-com commented Feb 17, 2020

Is anyone working on this anymore? This has been here for about 4 years.

@Djennez
Member

Djennez commented Feb 17, 2020

This is currently not being worked on; it is a feature request that might get implemented in the future.

@a4jp-com

Okay. Thanks for the honest reply @Djennez. I'll post this update in the plugin download area of WordPress. No one had said this until now, which isn't very good. This should have been made clear years ago.

Is it okay to fork the current plugin and make a Japanese version? I'll just edit the character count code. What is the licence of the current free version? Is it a General Public License?

@Pcosta88
Contributor

Pcosta88 commented Apr 2, 2020

Please inform the customer of conversation # 598441 when this conversation has been closed.

@7creo

7creo commented Apr 20, 2020

> Okay. Thanks for the honest reply @Djennez. I'll post this update in the plugin download area of WordPress. No one has said this up till now which isn't very good. This should have been made clear years ago.
>
> Is it okay to fork the current plugin and make a Japanese version. I'll just edit the character count code. What is the licence of the current free version? Is it a General Public License?

Have you succeeded with this yet? I still don't understand how the developers can ignore this situation for such a long time. I would never purchase a paid version with this issue, as it makes the plugin 50% useless.

@Pcosta88
Contributor

Pcosta88 commented May 6, 2020

Please inform the customer of conversation # 607816 when this conversation has been closed.

@michaelbriantina

@a4jp-com

a4jp-com commented Jul 13, 2020

Can you include a conditional statement based on the page language that either counts words or characters based on the language set?

Example 1:

var lang = document.documentElement.lang;
if (lang === 'ja') {
  // ...do something - Japanese character count code here...
}

Example 2:

 <?php if(ICL_LANGUAGE_CODE=='zh' || ICL_LANGUAGE_CODE=='ja' || ICL_LANGUAGE_CODE=='ko'){ ?>
...Chinese/Japanese/Korean character count code...
<?php } ?> 

I'm not sure how you have separated other languages but I'm sure you already have code like this in your plugin for other languages.
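A slightly more robust version of Example 1 would compare only the primary language subtag, because `document.documentElement.lang` is often a tag such as "ja-JP", which the strict `lang === 'ja'` check would miss. The sketch below assumes the language comes either from an HTML `lang` attribute or from a WordPress-style locale such as "zh_CN"; `countModeFor` and the language set are hypothetical. Korean is left out of the set here because modern Korean is normally written with spaces between words.

```javascript
// Sketch of the language-based switch suggested above. Accepts HTML lang
// values ("ja", "ja-JP") and WordPress-style locales ("zh_CN"); the set of
// character-counted languages is an illustrative assumption.
const CHARACTER_COUNT_LANGUAGES = new Set(["zh", "ja"]);

function countModeFor(languageTag) {
  // Compare only the primary language subtag, ignoring case and region.
  const primary = String(languageTag || "").split(/[-_]/)[0].toLowerCase();
  return CHARACTER_COUNT_LANGUAGES.has(primary) ? "characters" : "words";
}

console.log(countModeFor("ja-JP")); // characters
console.log(countModeFor("zh_CN")); // characters
console.log(countModeFor("en-US")); // words
// In a browser this could be called as countModeFor(document.documentElement.lang).
```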

@crpeng

crpeng commented Aug 24, 2020

I also need this improvement for Chinese language.

@a4jp-com

T-T. Whatever happened to the programmer that was assigned to fixing this problem?

@suascat

suascat commented Sep 15, 2020

Please inform the customer of conversation # 649514 when this conversation has been closed.

@priscillamc

@michaelbriantina

Please inform the customer of conversation # 729318 when this conversation has been closed.

@a4jp-com

Is there a way to just get a character count instead of word count? You already have the code in the plugin. You just need a function for the option.

@ogodoabiola

Please inform the customer of conversation #744721 when this conversation has been closed.

@Isildur00

Please inform the customer of conversation #760565 when this conversation has been closed.

@a4jp-com

a4jp-com commented Jul 5, 2021

> WordPress has build code to be able to count words in Asian languages. We should take this as inspiration. WordPress actually ships the word counting code only in the Chinese language pack, the third link shows where to find it.
>
> Patch for word-count.js
> https://core.trac.wordpress.org/ticket/20738
> https://core.trac.wordpress.org/ticket/30966
>
> Where to find the language pack:
> https://core.trac.wordpress.org/ticket/33454

If you already have the code, can we find out what is happening, please?

Djennez transferred this issue from Yoast/YoastSEO.js Aug 6, 2021
@amyuki

amyuki commented Nov 17, 2021

Can you simply count words the way the WordPress editor does?

@michaelbriantina

Please inform the customer of conversation # 894142 when this conversation has been closed.

@mmikhan
Member

mmikhan commented May 12, 2022

We are going to close this issue, since "support for Asian languages" is too generic a request: there are a great many languages in Asia. Instead, we will split this issue and open new ones for specific languages. If you want Yoast SEO to support Chinese, Korean, or Vietnamese, feel free to open a specific issue (if there isn't one already) so that we can see how many users are requesting each language.

That said, Yoast SEO v18.0 already includes support for Japanese (which is also one of the Asian languages).

mmikhan closed this as completed May 12, 2022
@a4jp-com

Thanks for adding Japanese. I never noticed you did that. ♥

Labels
None yet
Projects
None yet
Development

No branches or pull requests