Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow four character country code #255

Merged

Conversation

gwynjudd
Copy link
Contributor

No description provided.

@BenjaminVanRyseghem
Copy link
Owner

can you add a unit test please?

@gwynjudd
Copy link
Contributor Author

@BenjaminVanRyseghem done

@BenjaminVanRyseghem
Copy link
Owner

now there is only the test 😄

@BenjaminVanRyseghem
Copy link
Owner

I just learn that the ISO convention is {2 digits}-{2 digits}, so maybe this PR is going in the wrong direction? What do you think?

@ArmorDarks an opinion?

@ArmorDarks
Copy link

@BenjaminVanRyseghem In short, yes, it's like that, you can read in details about it here.

So, by GNU Gettext it usually strictly {2 digits}-{2 digits}.

However, on practice it slightly different, because some locales can occur in few "variants". For instance, serbian sr_CS can be cyrillic, or latin, and it is quite common to define it based on Microsoft's cultures: sr-Latn-CS for latin and sr-Cyrl-CS for cyrillic. Here you can see, that language code (do not confuse with complete locale or country code!) actually defined not by two characters, but by {2 chars}-{4 chars}.

Pure GNU way to define same would be sr_CS@latin (because used by GNU ISOs does not allow non 2-chars language codes), however, sr-Latn-CS-like variant is so often occurring in Node environment, that I usually recommend to account that possible format during parsing too (in other words, to expect that language code can be and {2 chars}, and {2 chars}-{4 chars} (or {2 chars}-{2 chars}, and {2 chars}-{4 chars}-{2 chars} if we're talking about full locale)).

I can't clearly understand what about this PR, but if it's is compatible with that slight deviation from ISO-only compatible locales, I'm 👍 for it.

@ArmorDarks
Copy link

Just to highlight this once again, title of this PR says Allow four character country code. However, {2 chars}-{4 chars} for country codes not allowed, even in MS cultures. What author probably meant is to allow {2 chars}-{4 chars} format for language code, which sounds ok to me, even despite it is deviation from GNU classic format,

@gwynjudd
Copy link
Contributor Author

@ArmorDarks sorry yes I am probably using the wrong terminology, but the main goal is to allow e.g. zh-HANS as language code which has 4 chars in the second part

@gwynjudd
Copy link
Contributor Author

@BenjaminVanRyseghem

now there is only the test 😄

Is it ok? The intent of this fix is just to make it so that this existing test doesn't fail for e.g. zh-Hans

@BenjaminVanRyseghem
Copy link
Owner

ah ok 😄

@BenjaminVanRyseghem BenjaminVanRyseghem merged commit 9ce6b1d into BenjaminVanRyseghem:develop Mar 29, 2017
@gwynjudd gwynjudd deleted the language-codes branch March 29, 2017 20:35
@BenjaminVanRyseghem
Copy link
Owner

in numbro 1.11.0

@ArmorDarks
Copy link

ArmorDarks commented Jul 29, 2017

Btw, finally figured completely is {2 chars}-{4 chars} a valid language code or not.

Turns out, it is based on (relatively) new specs IETF BCP 47, which todays becomes de-facto standard and replaces old ISO 369-1, which was quite limited.

For instance, BCP 47 declares that locale can have following format:

language-Script-REGION-variants

Script is Hans in our case, defined by BCP 47. Language should be used according to ISO 639-1. Region based on ISO 3166. BCP 47 combines them in something, called language tag.

As turns out, even HTML5 accepts language tag as a valid html lang.

A bit more info:

Unfortunately, seems that Gettext does not switch to that standard yet and continues to use ISO 639-1 for language and ISO 3166 for region only (so, in Gettext only locales like {language_REGION} are valid), and script or any other modifier should be denoted by @:

language_REGION@script

But I really hope they switch to BCP 47, since it definately much better geared tower modern standards.

MS cultures seems to resemble BCP 47 standard, but I can't tell for sure — it differs from one MS document to another.

@BenjaminVanRyseghem
Copy link
Owner

Thanks for the memo.

I am all in for leaving behind MS notations as they've proven many times to not be reliable in their "best practices".

BCP 47 sounds good. Let's go for it 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants