Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework locale specific pluralization rules #137

Closed
wants to merge 1 commit into from

Conversation

schleyfox
Copy link
Contributor

@schleyfox schleyfox commented Sep 6, 2019

I regenerated pluralization rules according to internal sources that are
used in our non-JavaScript environments. This is needed to ensure
consistency between translations regardless of platform. The plural rule
functions were automatically generated from a Java source, so they are
somewhat more verbose and less hand-factored.

Names of plural types are purely mnemonic and do not imply a specific
relationship between two languages, only that their pluralType functions
would be the same.

Polyglot supports locales that we don't use internally that were
contributed by the community. I've ported those over to maintain
compatibility. There were several places where exact locales were
specified. I changed those to be based on language-only when doing so
would not change the result. Behavior for pt-BR is preserved, as it does
have a different pluralization rule than pt.

There were a couple areas of disagreement between sources. In these
cases, I went with what we were using internally, which tends to be close to
https://developer.mozilla.org/en-US/docs/Mozilla/Localization/Localization_and_Plurals

The specific changes in this PR are

  1. tr from germanLike to chineseLike
  2. lt rule changed
  3. tl from frenchLike to tagalog

Additionally, per our records, fa would be chineseLike instead of
germanLike, but as we do not yet support this language, I have left it
as is.

I also generated tests based on some other pluralization specs we had in a different implementation.

/cc @mereskin-zz

I regenerated pluralization rules according to internal sources that are
used in our non-JavaScript environments. This is needed to ensure
consistency between translations regardless of platform. The plural rule
functions were automatically generated from a Java source, so they are
somewhat more verbose and less hand-factored.

Names of plural types are purely mnemonic and do not imply a specific
relationship between two languages, only that their pluralType functions
would be the same.

Polyglot supports locales that we don't use internally that were
contributed by the community. I've ported those over to maintain
compatibility. There were several places where exact locales were
specified. I changed those to be based on language-only when doing so
would not change the result. Behavior for pt-BR is preserved, as it does
have a different pluralization rule than pt.

There were a couple areas of disagreement between sources. In these
cases, I went with what we were using internally, which tends to be close to
https://developer.mozilla.org/en-US/docs/Mozilla/Localization/Localization_and_Plurals

The specific changes in this PR are

1. tr from germanLike to chineseLike
2. lt rule changed

Additionally, per our records, fa would be chineseLike instead of
germanLike, but as we do not yet support this language, I have left it
as is.
@ljharb
Copy link
Collaborator

ljharb commented Sep 7, 2019

We should be sticking to CLDR as the source of truth; what's used internally that deviates from that?


it('pluralizes in Turkish', function () {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can these tests be kept? deleting tests makes it harder to know if it's a breaking change or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are breaking. I deleted these tests because the rules for turkish and lithuanian changed completely, so I opted to cover those cases with new tests checking the exemplar plural rules. Testing each locale's categorization is different from what these tests are doing.

Copy link
Contributor Author

@schleyfox schleyfox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working off an internal list reviewed by our linguists. Most things do match up with CLDR, but there are a couple deviations from either CLDR or what's implemented in polyglot. Specifically

  1. CLDR (and polyglot prior to this PR) has Turkish as germanLike , while we internally have it as chineseLike, which matches the Mozilla definition
  2. Polyglot has Tagalog as frenchLike, but CLDR (and us internally) have a tagalog specific rule
  3. CLDR (and Polyglot) have one rule for Lithuanian while Mozilla (and us internally) use a completely different one , however I'm double checking which we want to use, as I have some indication that we meant to use the CLDR form
  4. CLDR has Azerbaijani using germanLike while our linguists indicated that it should use chineseLike with the note that this appeared to be a bug in CLDR.

I did not audit the categorizations that did not change to see whether those were in compliance with CLDR or some other source. Additionally, our pluralization does not support decimals, which leads to some mismatches with CLDR definitions for number of forms (in czechLike and lithuanian and possibly others)

Conceptually, I agree that it would be good to use CLDR as the source of truth, but our internal review recommended some deviations.


it('pluralizes in Turkish', function () {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are breaking. I deleted these tests because the rules for turkish and lithuanian changed completely, so I opted to cover those cases with new tests checking the exemplar plural rules. Testing each locale's categorization is different from what these tests are doing.

@ljharb
Copy link
Collaborator

ljharb commented Sep 7, 2019

It would probably be ideal to have some sort of option, to switch between airbnb internal classifications, and CLDR classifications.

@schleyfox
Copy link
Contributor Author

I'd agree, though it's worth noting that what we currently have implemented is not entirely in line with CLDR and is far from complete (and I'm not going to make it complete).

I suppose I could expose plural types/mappings as an option that can be passed to the constructor and then take my marbles and go home.

@ljharb
Copy link
Collaborator

ljharb commented Sep 7, 2019

That approach would allow the source of truth to be internal, in airbnb-i18n. Your call.

@schleyfox
Copy link
Contributor Author

Closing in favor of #138

@schleyfox schleyfox closed this Sep 9, 2019
@ljharb ljharb deleted the rework-pluralization-rules branch September 10, 2019 02:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants