Use proper BCP 47 language tags for Chinese #1
Comments
I agree. zh-CN is not the technically correct way of unambiguously specifying a language, but it's arguably more commonly used on the Web (Accept-Language, navigator.languages), so I'd keep it in the example. In the meantime, I've created a PR to include the script subtag in the example.
I think the time when it was commonly used on the Web to refer to Simplified Chinese was many years ago. Nowadays zh-Hans works pretty well everywhere, and by using zh-CN prominently in your example you only promote the incorrect usage. So I still think you should change it. I can refer this issue to the i18n WG if you like. I think it's fine to mention zh-CN as something a user may type in, but which should be interpreted to mean zh-Hans, which you do in #8. But I think it should be framed so that the recogniser is clearly correcting incorrect input (which it is, since the actual script/orthography is very important for handwriting). I see an implication in the quoted text above (especially because it doesn't even mention zh-Hans) that zh-CN is an appropriate way of referring to Simplified Chinese. It's really not. It's only appropriate if the language tag ignores script information and actually focuses on the region, which it may do, for example, when what's important is the spoken language (although that's problematic with respect to zh too, unless there's an implicit association of zh with cmn), or the locale (e.g. for location services, legal reasons, etc.).
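The correction described above (treat a region-only tag such as zh-CN as a stand-in for a script, because the script is what matters for handwriting) can be sketched as follows. `resolveChineseScript` is a hypothetical helper, not part of the handwriting recognition API, and the region-to-script table is an assumption modeled on CLDR likely-subtags data that covers only a few regions.

```typescript
// Hypothetical helper: decide which Chinese script a recognizer should assume.
// Assumed defaults for region-only Chinese tags (a small subset, modeled on CLDR data).
const DEFAULT_ZH_SCRIPT_BY_REGION: Record<string, string> = {
  CN: "Hans",
  SG: "Hans", // Simplified Chinese is also used in Singapore
  TW: "Hant",
  HK: "Hant",
  MO: "Hant",
};

// Returns "Hans" or "Hant" for Chinese tags, or undefined if the tag is not
// Chinese or the region has no assumed default.
function resolveChineseScript(tag: string): string | undefined {
  // Accept "_" as well, since users often type "zh_CN".
  const subtags = tag.split(/[-_]/);
  if (subtags[0].toLowerCase() !== "zh") return undefined;

  let region: string | undefined;
  for (const subtag of subtags.slice(1)) {
    // A single-character subtag ("u", "x", ...) starts an extension or
    // private-use sequence; nothing after it is a script or region.
    if (subtag.length === 1) break;
    // An explicit 4-letter script subtag always wins, e.g. "zh-Hant".
    if (/^[A-Za-z]{4}$/.test(subtag)) {
      return subtag[0].toUpperCase() + subtag.slice(1).toLowerCase();
    }
    // Remember the first 2-letter region subtag, e.g. "CN" in "zh-CN".
    if (region === undefined && /^[A-Za-z]{2}$/.test(subtag)) {
      region = subtag.toUpperCase();
    }
  }
  // Bare "zh" (no script, no region) is assumed to mean Simplified Chinese.
  if (region === undefined) return "Hans";
  return DEFAULT_ZH_SCRIPT_BY_REGION[region];
}
```

With this sketch, `resolveChineseScript("zh-CN")` and `resolveChineseScript("zh-SG")` both give `"Hans"`, while `resolveChineseScript("zh-TW")` gives `"Hant"`. In environments that support `Intl.Locale`, `new Intl.Locale("zh-CN").maximize().script` derives the same answer from the full CLDR likely-subtags data instead of a hand-written table.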
I wouldn't say using "zh-CN" here is incorrect, given:
Hi @r12a, we have a question about language tags for non-standard "languages". We have handwriting models for recognizing geometric shapes and/or user gestures (e.g. a square); what language tag could we use for this case? I see there is a "zxx" primary tag for "No linguistic content; Not applicable". Is it suitable? For example, we could use "zxx-Shape" for the above recognizer. Or are private-use subtags more suitable?
I think it's best to avoid private subtags if at all possible, and
There are really two choices that occur to me here. One is to use
Closing this issue. zh-CN and zh-Hans convey different meanings: "zh-CN" means "Chinese as used in mainland China", while "zh-Hans" means "Simplified Chinese regardless of where it's used". Web applications should choose whichever is more suitable for their use cases. We allow the browser implementation and the underlying recognizer to make reasonable assumptions about the script (since different handwriting recognizer implementations identify their models differently). For shape / user gesture models, we will use a zxx private-use tag ("zxx-x-shape"), following the precedent of MLKit shape detection models.
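The distinction drawn in the closing comment is visible in the subtags themselves: zh-CN carries a region, zh-Hans carries a script, and zxx-x-shape carries only a private-use sequence after the "no linguistic content" primary tag. A minimal sketch, assuming simple tags only (language, optional script, optional region, extensions, private use); `parseSimpleTag` and `isShapeModelTag` are hypothetical helpers, not part of the handwriting recognition API, and a production implementation should use a full BCP 47 parser.

```typescript
// Hypothetical, simplified view of a BCP 47 tag.
interface SimpleTag {
  language: string;
  script?: string;
  region?: string;
  privateUse: string[];
}

// Splits a simple tag into language, script, region and private-use subtags.
// Extension sequences (e.g. "-u-co-pinyin") are skipped, not interpreted.
function parseSimpleTag(tag: string): SimpleTag {
  const subtags = tag.split("-");
  const result: SimpleTag = { language: subtags[0].toLowerCase(), privateUse: [] };
  let inExtension = false;

  for (let i = 1; i < subtags.length; i++) {
    const s = subtags[i];
    if (s.length === 1) {
      // "x" introduces private use: everything after it is private.
      if (s.toLowerCase() === "x") {
        result.privateUse = subtags.slice(i + 1).map((p) => p.toLowerCase());
        break;
      }
      // Any other singleton starts an extension; ignore what follows it.
      inExtension = true;
      continue;
    }
    if (inExtension) continue;
    if (result.script === undefined && result.region === undefined && /^[A-Za-z]{4}$/.test(s)) {
      // Script subtags are 4 letters and precede the region, e.g. "Hans".
      result.script = s[0].toUpperCase() + s.slice(1).toLowerCase();
    } else if (result.region === undefined && /^([A-Za-z]{2}|\d{3})$/.test(s)) {
      // Region subtags are 2 letters or 3 digits, e.g. "CN" or "419".
      result.region = s.toUpperCase();
    }
  }
  return result;
}

// Hypothetical check for the shape/gesture convention described above.
function isShapeModelTag(tag: string): boolean {
  const parsed = parseSimpleTag(tag);
  return parsed.language === "zxx" && parsed.privateUse.includes("shape");
}
```

For example, `parseSimpleTag("zh-CN")` yields a region of `"CN"` and no script, `parseSimpleTag("zh-Hans")` yields a script of `"Hans"` and no region, and `isShapeModelTag("zxx-x-shape")` returns `true`.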
https://github.com/WICG/handwriting-recognition/blob/main/explainer.md
zh-CN is presumably meant to indicate Simplified Chinese, which is also used in Singapore. That's why it is better to use zh-Hans as the language tag, rather than zh-CN (and zh-Hant, rather than zh-TW).
Please change the example.