New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow four character country code #255

Merged
merged 1 commit into from Mar 29, 2017

Conversation

Projects
None yet
3 participants
@gwynjudd
Contributor

gwynjudd commented Mar 27, 2017

No description provided.

@BenjaminVanRyseghem

This comment has been minimized.

Show comment
Hide comment
@BenjaminVanRyseghem

BenjaminVanRyseghem Mar 27, 2017

Owner

can you add a unit test please?

Owner

BenjaminVanRyseghem commented Mar 27, 2017

can you add a unit test please?

@gwynjudd

This comment has been minimized.

Show comment
Hide comment
@gwynjudd
Contributor

gwynjudd commented Mar 27, 2017

@BenjaminVanRyseghem

This comment has been minimized.

Show comment
Hide comment
@BenjaminVanRyseghem

BenjaminVanRyseghem Mar 28, 2017

Owner

now there is only the test 😄

Owner

BenjaminVanRyseghem commented Mar 28, 2017

now there is only the test 😄

@BenjaminVanRyseghem

This comment has been minimized.

Show comment
Hide comment
@BenjaminVanRyseghem

BenjaminVanRyseghem Mar 28, 2017

Owner

I just learn that the ISO convention is {2 digits}-{2 digits}, so maybe this PR is going in the wrong direction? What do you think?

@ArmorDarks an opinion?

Owner

BenjaminVanRyseghem commented Mar 28, 2017

I just learn that the ISO convention is {2 digits}-{2 digits}, so maybe this PR is going in the wrong direction? What do you think?

@ArmorDarks an opinion?

@ArmorDarks

This comment has been minimized.

Show comment
Hide comment
@ArmorDarks

ArmorDarks Mar 28, 2017

@BenjaminVanRyseghem In short, yes, it's like that, you can read in details about it here.

So, by GNU Gettext it usually strictly {2 digits}-{2 digits}.

However, on practice it slightly different, because some locales can occur in few "variants". For instance, serbian sr_CS can be cyrillic, or latin, and it is quite common to define it based on Microsoft's cultures: sr-Latn-CS for latin and sr-Cyrl-CS for cyrillic. Here you can see, that language code (do not confuse with complete locale or country code!) actually defined not by two characters, but by {2 chars}-{4 chars}.

Pure GNU way to define same would be sr_CS@latin (because used by GNU ISOs does not allow non 2-chars language codes), however, sr-Latn-CS-like variant is so often occurring in Node environment, that I usually recommend to account that possible format during parsing too (in other words, to expect that language code can be and {2 chars}, and {2 chars}-{4 chars} (or {2 chars}-{2 chars}, and {2 chars}-{4 chars}-{2 chars} if we're talking about full locale)).

I can't clearly understand what about this PR, but if it's is compatible with that slight deviation from ISO-only compatible locales, I'm 👍 for it.

ArmorDarks commented Mar 28, 2017

@BenjaminVanRyseghem In short, yes, it's like that, you can read in details about it here.

So, by GNU Gettext it usually strictly {2 digits}-{2 digits}.

However, on practice it slightly different, because some locales can occur in few "variants". For instance, serbian sr_CS can be cyrillic, or latin, and it is quite common to define it based on Microsoft's cultures: sr-Latn-CS for latin and sr-Cyrl-CS for cyrillic. Here you can see, that language code (do not confuse with complete locale or country code!) actually defined not by two characters, but by {2 chars}-{4 chars}.

Pure GNU way to define same would be sr_CS@latin (because used by GNU ISOs does not allow non 2-chars language codes), however, sr-Latn-CS-like variant is so often occurring in Node environment, that I usually recommend to account that possible format during parsing too (in other words, to expect that language code can be and {2 chars}, and {2 chars}-{4 chars} (or {2 chars}-{2 chars}, and {2 chars}-{4 chars}-{2 chars} if we're talking about full locale)).

I can't clearly understand what about this PR, but if it's is compatible with that slight deviation from ISO-only compatible locales, I'm 👍 for it.

@ArmorDarks

This comment has been minimized.

Show comment
Hide comment
@ArmorDarks

ArmorDarks Mar 28, 2017

Just to highlight this once again, title of this PR says Allow four character country code. However, {2 chars}-{4 chars} for country codes not allowed, even in MS cultures. What author probably meant is to allow {2 chars}-{4 chars} format for language code, which sounds ok to me, even despite it is deviation from GNU classic format,

ArmorDarks commented Mar 28, 2017

Just to highlight this once again, title of this PR says Allow four character country code. However, {2 chars}-{4 chars} for country codes not allowed, even in MS cultures. What author probably meant is to allow {2 chars}-{4 chars} format for language code, which sounds ok to me, even despite it is deviation from GNU classic format,

@gwynjudd

This comment has been minimized.

Show comment
Hide comment
@gwynjudd

gwynjudd Mar 28, 2017

Contributor

@ArmorDarks sorry yes I am probably using the wrong terminology, but the main goal is to allow e.g. zh-HANS as language code which has 4 chars in the second part

Contributor

gwynjudd commented Mar 28, 2017

@ArmorDarks sorry yes I am probably using the wrong terminology, but the main goal is to allow e.g. zh-HANS as language code which has 4 chars in the second part

@gwynjudd

This comment has been minimized.

Show comment
Hide comment
@gwynjudd

gwynjudd Mar 28, 2017

Contributor

@BenjaminVanRyseghem

now there is only the test 😄

Is it ok? The intent of this fix is just to make it so that this existing test doesn't fail for e.g. zh-Hans

Contributor

gwynjudd commented Mar 28, 2017

@BenjaminVanRyseghem

now there is only the test 😄

Is it ok? The intent of this fix is just to make it so that this existing test doesn't fail for e.g. zh-Hans

@BenjaminVanRyseghem

This comment has been minimized.

Show comment
Hide comment
@BenjaminVanRyseghem
Owner

BenjaminVanRyseghem commented Mar 29, 2017

ah ok 😄

@BenjaminVanRyseghem BenjaminVanRyseghem merged commit 9ce6b1d into BenjaminVanRyseghem:develop Mar 29, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@gwynjudd gwynjudd deleted the gwynjudd:language-codes branch Mar 29, 2017

@BenjaminVanRyseghem

This comment has been minimized.

Show comment
Hide comment
@BenjaminVanRyseghem

BenjaminVanRyseghem May 2, 2017

Owner

in numbro 1.11.0

Owner

BenjaminVanRyseghem commented May 2, 2017

in numbro 1.11.0

@ArmorDarks

This comment has been minimized.

Show comment
Hide comment
@ArmorDarks

ArmorDarks Jul 29, 2017

Btw, finally figured completely is {2 chars}-{4 chars} a valid language code or not.

Turns out, it is based on (relatively) new specs IETF BCP 47, which todays becomes de-facto standard and replaces old ISO 369-1, which was quite limited.

For instance, BCP 47 declares that locale can have following format:

language-Script-REGION-variants

Script is Hans in our case, defined by BCP 47. Language should be used according to ISO 639-1. Region based on ISO 3166. BCP 47 combines them in something, called language tag.

As turns out, even HTML5 accepts language tag as a valid html lang.

A bit more info:

Unfortunately, seems that Gettext does not switch to that standard yet and continues to use ISO 639-1 for language and ISO 3166 for region only (so, in Gettext only locales like {language_REGION} are valid), and script or any other modifier should be denoted by @:

language_REGION@script

But I really hope they switch to BCP 47, since it definately much better geared tower modern standards.

MS cultures seems to resemble BCP 47 standard, but I can't tell for sure — it differs from one MS document to another.

ArmorDarks commented Jul 29, 2017

Btw, finally figured completely is {2 chars}-{4 chars} a valid language code or not.

Turns out, it is based on (relatively) new specs IETF BCP 47, which todays becomes de-facto standard and replaces old ISO 369-1, which was quite limited.

For instance, BCP 47 declares that locale can have following format:

language-Script-REGION-variants

Script is Hans in our case, defined by BCP 47. Language should be used according to ISO 639-1. Region based on ISO 3166. BCP 47 combines them in something, called language tag.

As turns out, even HTML5 accepts language tag as a valid html lang.

A bit more info:

Unfortunately, seems that Gettext does not switch to that standard yet and continues to use ISO 639-1 for language and ISO 3166 for region only (so, in Gettext only locales like {language_REGION} are valid), and script or any other modifier should be denoted by @:

language_REGION@script

But I really hope they switch to BCP 47, since it definately much better geared tower modern standards.

MS cultures seems to resemble BCP 47 standard, but I can't tell for sure — it differs from one MS document to another.

@ArmorDarks ArmorDarks referenced this pull request Jul 29, 2017

Open

BCP 47 language tag #275

0 of 4 tasks complete
@BenjaminVanRyseghem

This comment has been minimized.

Show comment
Hide comment
@BenjaminVanRyseghem

BenjaminVanRyseghem Jul 31, 2017

Owner

Thanks for the memo.

I am all in for leaving behind MS notations as they've proven many times to not be reliable in their "best practices".

BCP 47 sounds good. Let's go for it 😄

Owner

BenjaminVanRyseghem commented Jul 31, 2017

Thanks for the memo.

I am all in for leaving behind MS notations as they've proven many times to not be reliable in their "best practices".

BCP 47 sounds good. Let's go for it 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment