Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow four character country code #255

Merged

Conversation

@gwynjudd
Copy link
Contributor

@gwynjudd gwynjudd commented Mar 27, 2017

No description provided.

@BenjaminVanRyseghem
Copy link
Owner

@BenjaminVanRyseghem BenjaminVanRyseghem commented Mar 27, 2017

can you add a unit test please?

@gwynjudd gwynjudd force-pushed the gwynjudd:language-codes branch from 468a586 to 4286390 Mar 27, 2017
@gwynjudd
Copy link
Contributor Author

@gwynjudd gwynjudd commented Mar 27, 2017

@BenjaminVanRyseghem
Copy link
Owner

@BenjaminVanRyseghem BenjaminVanRyseghem commented Mar 28, 2017

now there is only the test 😄

@BenjaminVanRyseghem
Copy link
Owner

@BenjaminVanRyseghem BenjaminVanRyseghem commented Mar 28, 2017

I just learn that the ISO convention is {2 digits}-{2 digits}, so maybe this PR is going in the wrong direction? What do you think?

@ArmorDarks an opinion?

@ArmorDarks
Copy link

@ArmorDarks ArmorDarks commented Mar 28, 2017

@BenjaminVanRyseghem In short, yes, it's like that, you can read in details about it here.

So, by GNU Gettext it usually strictly {2 digits}-{2 digits}.

However, on practice it slightly different, because some locales can occur in few "variants". For instance, serbian sr_CS can be cyrillic, or latin, and it is quite common to define it based on Microsoft's cultures: sr-Latn-CS for latin and sr-Cyrl-CS for cyrillic. Here you can see, that language code (do not confuse with complete locale or country code!) actually defined not by two characters, but by {2 chars}-{4 chars}.

Pure GNU way to define same would be sr_CS@latin (because used by GNU ISOs does not allow non 2-chars language codes), however, sr-Latn-CS-like variant is so often occurring in Node environment, that I usually recommend to account that possible format during parsing too (in other words, to expect that language code can be and {2 chars}, and {2 chars}-{4 chars} (or {2 chars}-{2 chars}, and {2 chars}-{4 chars}-{2 chars} if we're talking about full locale)).

I can't clearly understand what about this PR, but if it's is compatible with that slight deviation from ISO-only compatible locales, I'm 👍 for it.

@ArmorDarks
Copy link

@ArmorDarks ArmorDarks commented Mar 28, 2017

Just to highlight this once again, title of this PR says Allow four character country code. However, {2 chars}-{4 chars} for country codes not allowed, even in MS cultures. What author probably meant is to allow {2 chars}-{4 chars} format for language code, which sounds ok to me, even despite it is deviation from GNU classic format,

@gwynjudd
Copy link
Contributor Author

@gwynjudd gwynjudd commented Mar 28, 2017

@ArmorDarks sorry yes I am probably using the wrong terminology, but the main goal is to allow e.g. zh-HANS as language code which has 4 chars in the second part

@gwynjudd
Copy link
Contributor Author

@gwynjudd gwynjudd commented Mar 28, 2017

@BenjaminVanRyseghem

now there is only the test 😄

Is it ok? The intent of this fix is just to make it so that this existing test doesn't fail for e.g. zh-Hans

@BenjaminVanRyseghem
Copy link
Owner

@BenjaminVanRyseghem BenjaminVanRyseghem commented Mar 29, 2017

ah ok 😄

@BenjaminVanRyseghem BenjaminVanRyseghem merged commit 9ce6b1d into BenjaminVanRyseghem:develop Mar 29, 2017
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@gwynjudd gwynjudd deleted the gwynjudd:language-codes branch Mar 29, 2017
@BenjaminVanRyseghem
Copy link
Owner

@BenjaminVanRyseghem BenjaminVanRyseghem commented May 2, 2017

in numbro 1.11.0

@ArmorDarks
Copy link

@ArmorDarks ArmorDarks commented Jul 29, 2017

Btw, finally figured completely is {2 chars}-{4 chars} a valid language code or not.

Turns out, it is based on (relatively) new specs IETF BCP 47, which todays becomes de-facto standard and replaces old ISO 369-1, which was quite limited.

For instance, BCP 47 declares that locale can have following format:

language-Script-REGION-variants

Script is Hans in our case, defined by BCP 47. Language should be used according to ISO 639-1. Region based on ISO 3166. BCP 47 combines them in something, called language tag.

As turns out, even HTML5 accepts language tag as a valid html lang.

A bit more info:

Unfortunately, seems that Gettext does not switch to that standard yet and continues to use ISO 639-1 for language and ISO 3166 for region only (so, in Gettext only locales like {language_REGION} are valid), and script or any other modifier should be denoted by @:

language_REGION@script

But I really hope they switch to BCP 47, since it definately much better geared tower modern standards.

MS cultures seems to resemble BCP 47 standard, but I can't tell for sure — it differs from one MS document to another.

@ArmorDarks ArmorDarks mentioned this pull request Jul 29, 2017
0 of 4 tasks complete
@BenjaminVanRyseghem
Copy link
Owner

@BenjaminVanRyseghem BenjaminVanRyseghem commented Jul 31, 2017

Thanks for the memo.

I am all in for leaving behind MS notations as they've proven many times to not be reliable in their "best practices".

BCP 47 sounds good. Let's go for it 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.