Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework locale specific pluralization rules #137

Closed
wants to merge 1 commit into from

Conversation

@schleyfox
Copy link
Member

commented Sep 6, 2019

I regenerated pluralization rules according to internal sources that are
used in our non-JavaScript environments. This is needed to ensure
consistency between translations regardless of platform. The plural rule
functions were automatically generated from a Java source, so they are
somewhat more verbose and less hand-factored.

Names of plural types are purely mnemonic and do not imply a specific
relationship between two languages, only that their pluralType functions
would be the same.

Polyglot supports locales that we don't use internally that were
contributed by the community. I've ported those over to maintain
compatibility. There were several places where exact locales were
specified. I changed those to be based on language-only when doing so
would not change the result. Behavior for pt-BR is preserved, as it does
have a different pluralization rule than pt.

There were a couple areas of disagreement between sources. In these
cases, I went with what we were using internally, which tends to be close to
https://developer.mozilla.org/en-US/docs/Mozilla/Localization/Localization_and_Plurals

The specific changes in this PR are

  1. tr from germanLike to chineseLike
  2. lt rule changed
  3. tl from frenchLike to tagalog

Additionally, per our records, fa would be chineseLike instead of
germanLike, but as we do not yet support this language, I have left it
as is.

I also generated tests based on some other pluralization specs we had in a different implementation.

/cc @mereskin-zz

Rework locale specific pluralization rules
I regenerated pluralization rules according to internal sources that are
used in our non-JavaScript environments. This is needed to ensure
consistency between translations regardless of platform. The plural rule
functions were automatically generated from a Java source, so they are
somewhat more verbose and less hand-factored.

Names of plural types are purely mnemonic and do not imply a specific
relationship between two languages, only that their pluralType functions
would be the same.

Polyglot supports locales that we don't use internally that were
contributed by the community. I've ported those over to maintain
compatibility. There were several places where exact locales were
specified. I changed those to be based on language-only when doing so
would not change the result. Behavior for pt-BR is preserved, as it does
have a different pluralization rule than pt.

There were a couple areas of disagreement between sources. In these
cases, I went with what we were using internally, which tends to be close to
https://developer.mozilla.org/en-US/docs/Mozilla/Localization/Localization_and_Plurals

The specific changes in this PR are

1. tr from germanLike to chineseLike
2. lt rule changed

Additionally, per our records, fa would be chineseLike instead of
germanLike, but as we do not yet support this language, I have left it
as is.
@ljharb

This comment has been minimized.

Copy link
Collaborator

commented Sep 7, 2019

We should be sticking to CLDR as the source of truth; what's used internally that deviates from that?


it('pluralizes in Turkish', function () {

This comment has been minimized.

Copy link
@ljharb

ljharb Sep 7, 2019

Collaborator

can these tests be kept? deleting tests makes it harder to know if it's a breaking change or not.

This comment has been minimized.

Copy link
@schleyfox

schleyfox Sep 7, 2019

Author Member

These are breaking. I deleted these tests because the rules for turkish and lithuanian changed completely, so I opted to cover those cases with new tests checking the exemplar plural rules. Testing each locale's categorization is different from what these tests are doing.

@schleyfox
Copy link
Member Author

left a comment

I'm working off an internal list reviewed by our linguists. Most things do match up with CLDR, but there are a couple deviations from either CLDR or what's implemented in polyglot. Specifically

  1. CLDR (and polyglot prior to this PR) has Turkish as germanLike , while we internally have it as chineseLike, which matches the Mozilla definition
  2. Polyglot has Tagalog as frenchLike, but CLDR (and us internally) have a tagalog specific rule
  3. CLDR (and Polyglot) have one rule for Lithuanian while Mozilla (and us internally) use a completely different one , however I'm double checking which we want to use, as I have some indication that we meant to use the CLDR form
  4. CLDR has Azerbaijani using germanLike while our linguists indicated that it should use chineseLike with the note that this appeared to be a bug in CLDR.

I did not audit the categorizations that did not change to see whether those were in compliance with CLDR or some other source. Additionally, our pluralization does not support decimals, which leads to some mismatches with CLDR definitions for number of forms (in czechLike and lithuanian and possibly others)

Conceptually, I agree that it would be good to use CLDR as the source of truth, but our internal review recommended some deviations.


it('pluralizes in Turkish', function () {

This comment has been minimized.

Copy link
@schleyfox

schleyfox Sep 7, 2019

Author Member

These are breaking. I deleted these tests because the rules for turkish and lithuanian changed completely, so I opted to cover those cases with new tests checking the exemplar plural rules. Testing each locale's categorization is different from what these tests are doing.

@ljharb

This comment has been minimized.

Copy link
Collaborator

commented Sep 7, 2019

It would probably be ideal to have some sort of option, to switch between airbnb internal classifications, and CLDR classifications.

@schleyfox

This comment has been minimized.

Copy link
Member Author

commented Sep 7, 2019

I'd agree, though it's worth noting that what we currently have implemented is not entirely in line with CLDR and is far from complete (and I'm not going to make it complete).

I suppose I could expose plural types/mappings as an option that can be passed to the constructor and then take my marbles and go home.

@ljharb

This comment has been minimized.

Copy link
Collaborator

commented Sep 7, 2019

That approach would allow the source of truth to be internal, in airbnb-i18n. Your call.

@schleyfox

This comment has been minimized.

Copy link
Member Author

commented Sep 9, 2019

Closing in favor of #138

@schleyfox schleyfox closed this Sep 9, 2019

@ljharb ljharb deleted the rework-pluralization-rules branch Sep 10, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.