Expand internationalisation logic to add pluralisation #2804

vanitabarrett · 2022-08-22T12:57:51Z

What

Expand our internationalisation logic to handle pluralisation. This will allow users to be able to translate hardcoded strings in the character count component.

For example: “You have 1 character remaining” vs “You have 20 characters remaining”

Needs to handle languages other than English, that may have different plural forms, see https://cldr.unicode.org/index/cldr-spec/plural-rules

Why

Some components will need to be able to switch between different strings based on the value of something, e.g: the character count component will need to switch the count message based on the count (see example above)

Who needs to work on this

Developers

Who needs to review this

Developers

Done when

Internationalisation logic expanded to introduce support for pluralisation + unit tests
As noted in feedback, ensure there is only one key for 'zero’ translations
Internationalisation logic expanded to introduce support for number formatting, where relevant + tests
Nicely handles when a required plural form for a language is not provided, e.g: a clear error message (see related comment from pre-release testing)

romaricpascal · 2022-09-12T10:06:49Z

As noted in feedback, ensure there is only one key for 'zero’ translations

A quick thought as I read the feedback: isn't this point more linked to the character count component itself (where the meaning of under_limit_zero and over_limit_zero are the same) that the pluralisation of translations itself? It feels more up to the component to check "oh, I've got zero, so I use that special key". If that's the case, it's maybe more part of #2805/#2701.

romaricpascal · 2022-09-13T11:12:03Z

So, I tried to wrap my head around what would need to happen based on the spike and it seems like it's a two part update:

A. I18n constructor function to receive both locale and translations, so it can decide how to pick pluralised form:

The shape of the options passed to the constructor in the spike allowed for both to be passed, but no longer in the current implementation. Was that to provide the minimal API that'd make the tests pass or because a different direction is taken? (the locale option does not appear in the final comment for the Javascript and Nunjucks API
The default locale was pulled from the document's lang in the spike. Should that be kept? This may cause confusion if the locale in the doc doesn't match the language of the translation strings and it feels safer to be explicit by passing a locale alongside the strings. I may have missed some comments with a decision on this, though
Do we need other options? I was thinking specifically about something to let people set the function for picking plural rules, in case we missed a locale. Again, this may be something that was already discussed and I missed

Feels like this can be implemented on its own ahead ot the changes to the t method (see below).

B. Then there's the meaty bit of updating I18n.t(key, {count: 123, otherPart:...}) and adding relevant tests:

Compute the suffix to append to the requested key for lookin up the translated string. That suffix depends on count and the locale set in the I18n object.
In the spike, it uses Intl.PluralRules and falls back to two maps for mapping:
- BCP 47/RFC 5646 locale codes to groups of shared ways to handle plural rules,
- then said groups to functions converting the count to a suffix that'll be appended to the key.
Two questions sprung to mind:
a. What's the reason for rolling our own, vs. polyfilling? Was it to avoid the weight of another polyfill (15+KB for this one, with a few dependencies for it as well, judging by polyfill-library, not sure if included or not)?
b. Kind of tied to A.3. : The maps load all rules for all languages. Were there discussions for loading only what's needed that I may have missed?
Lookup the corresponding translation based on that key, showing an explicit error if the key is missing
Finally, interpolate the translation with whatever data is provided (count included). The issue mentions " Internationalisation logic expanded to introduce support for number formatting, where relevant + tests", but:
a. I couldn't find any discussion on that topic specifically, nor example in the spike (is it a different spike maybe?)
b. This feels orthogonal to pluralising so maybe worth its own issue: people may want to format numbers in regular translations (eg. to show a price with 2 digit decimals).

36degrees · 2022-09-13T12:21:28Z

So, I tried to wrap my head around what would need to happen based on the spike and it seems like it's a two part update:

A. I18n constructor function to receive both locale and translations, so it can decide how to pick pluralised form:

The shape of the options passed to the constructor in the spike allowed for both to be passed, but no longer in the current implementation. Was that to provide the minimal API that'd make the tests pass or because a different direction is taken? (the locale option does not appear in the final comment for the Javascript and Nunjucks API

The latter, I believe – there's no need for a locale option for the functionality implemented so far, so we haven't added it.

I think the intent was to include locale in the options object, rather than add a new argument?

The default locale was pulled from the document's lang in the spike. Should that be kept? This may cause confusion if the locale in the doc doesn't match the language of the translation strings and it feels safer to be explicit by passing a locale alongside the strings. I may have missed some comments with a decision on this, though

I think there are a few routes we could go down:

calculate the lang for the $module by recursively looking for a lang attribute, starting with the $module itself and up to and including the root element
calculate the lang for the $module by recursively looking for a lang attribute, starting with the $module itself and up to and including the root element, but allow this to be overridden by config
explicitly require it to be set in config
explicitly require it to be set in config (but we have to 'fall back' to something, either the document or just hardcoded 'en'?)

With the first two options I think this might mean the component class working out the locale and passing it to the i18n object? Unless we pass the $module itself so that the i18n class can recurse through the parent elements?

I'm not 100% sure about the second option as it'd mean allowing for both a lang and a data-locale attribute on the $module? 🤔

Do we need other options? I was thinking specifically about something to let people set the function for picking plural rules, in case we missed a locale. Again, this may be something that was already discussed and I missed

I'd suggest we can add this in the future if we find there's a need for it.

Feels like this can be implemented on its own ahead ot the changes to the t method (see below).

B. Then there's the meaty bit of updating I18n.t(key, {count: 123, otherPart:...}) and adding relevant tests:

Compute the suffix to append to the requested key for lookin up the translated string. That suffix depends on count and the locale set in the I18n object.
In the spike, it uses Intl.PluralRules and falls back to two maps for mapping:

BCP 47/RFC 5646 locale codes to groups of shared ways to handle plural rules,

then said groups to functions converting the count to a suffix that'll be appended to the key.
Two questions sprung to mind:
a. What's the reason for rolling our own, vs. polyfilling? Was it to avoid the weight of another polyfill (15+KB for this one, with a few dependencies for it as well, judging by polyfill-library, not sure if included or not)?
b. Kind of tied to A.3. : The maps load all rules for all languages. Were there discussions for loading only what's needed that I may have missed?

Suspect @querkmachine is the best person to answer these ones!

Lookup the corresponding translation based on that key, showing an explicit error if the key is missing

Finally, interpolate the translation with whatever data is provided (count included). The issue mentions " Internationalisation logic expanded to introduce support for number formatting, where relevant + tests", but:
a. I couldn't find any discussion on that topic specifically, nor example in the spike (is it a different spike maybe?)
b. This feels orthogonal to pluralising so maybe worth its own issue: people may want to format numbers in regular translations (eg. to show a price with 2 digit decimals).

I think number formatting should be tackled separately in #2568

romaricpascal · 2022-09-13T14:28:42Z

I think the intent was to include locale in the options object, rather than add a new argument?

I should have exemplified there for clarity, sorry. The spike was indeed receiving options as two keys in an object:

{
  locale: "en",
  translations: {
    key1: 'Translated string'
  }
}

In its current state, the constructor expects the translations straight away:

{
  key1: 'Translated string'
}

I was wondering if this was because of putting in just the code to make tests pass or another decision (as having locale + translations next to each other in an object looks a pretty good option)

I think there are a few routes we could go down:

calculate the lang for the $module by recursively looking for a lang attribute, starting with the $module itself and up to and including the root element

calculate the lang for the $module by recursively looking for a lang attribute, starting with the $module itself and up to and including the root element, but allow this to be overridden by config

explicitly require it to be set in config

explicitly require it to be set in config (but we have to 'fall back' to something, either the document or just hardcoded 'en'?)

With the first two options I think this might mean the component class working out the locale and passing it to the i18n object? Unless we pass the $module itself so that the i18n class can recurse through the parent elements?

I'm not 100% sure about the second option as it'd mean allowing for both a lang and a data-locale attribute on the $module? 🤔

At the I18n class level, it feels a bit weird to have to wrangle with the concept of a $module that's in the DOM and that you can use to look further up for a lang information. It feels more like the responsibility of the components themselves to gather the right arguments to pass to their I18n instance.

I like the idea of looking up the DOM for the closest [lang] and picking that as a fallback. It's a double edged sword, though, as it'd allow to set the value for multiple components at once, but at the same time, risk having components with translation strings in one language and their pluralisation rules in different locale by mistake. Is it complexity we need from the start though, or something that may be easier provided through a base class in v5.0?

Unrelated: All that talk about locale made me think we should likely consider it for the data attributes expected by the character count (added a comment to that effect on that issue).

I'd suggest we can add this in the future if we find there's a need for it.

Suspect @querkmachine is the best person to answer these ones!

I think number formatting should be tackled separately in #2568

👍🏻 x3 😄

36degrees · 2022-09-13T14:55:14Z

I was wondering if this was because of putting in just the code to make tests pass or another decision (as having locale + translations next to each other in an object looks a pretty good option)

Unsure! Possibly another one for @querkmachine when it's back tomorrow…

I like the idea of looking up the DOM for the closest [lang] and picking that as a fallback. It's a double edged sword, though, as it'd allow to set the value for multiple components at once, but at the same time, risk having components with translation strings in one language and their pluralisation rules in different locale by mistake. Is it complexity we need from the start though, or something that may be easier provided through a base class in v5.0?

All good points. As you suggest, maybe we should keep it simple and expect an explicit data-i18n.locale for now, and we can look at doing something 'cleverer' in 5.0?

If we were going to go down that route I think I'd prefer to hardcode 'en' as the fallback rather than looking at the document lang, purely to avoid encouraging users adding a lang attribute on the document if the whole document isn't translated (e.g. the header and footer are in English, but the contents of the main are translated). I might be overthinking it though…

querkmachine · 2022-09-14T08:54:15Z

For some background context: Part of my reasoning for including locale as an explicit option in the initial spike was because of the need to hardcode pluralisation rules for legacy IE.

I was envisaging a situation where a user might need to translate a component into a language which the hardcoded rules don't support; for example, Mongolian. However, Mongolian has the same plural forms as English, so a user could choose to use Mongolian translation strings but set the locale option to 'en' to use English language pluralisation logic.

Since then I've come to think that this functionality is probably extraneous and won't be needed by 98+% of users. It will also not be necessary when we remove JS support from IE, as the appropriate plural form information will already be available in evergreen browsers as Intl.PluralRules.

If we opt to keep it as a configuration option, I worry that it's more likely to cause confusion unless we make the intended use more explicit: such as splitting the option in two and naming them something like overridePluralRulesLocale and overrideNumberFormatLocale.

So, my thinking here was that, rather than allowing the user to provide a locale property as we did in the spikes, we would instead find and use the lang attribute of a parent element or from the document level.

My reasoning for this is that the lang attribute needs to be set anyway for accessibility compliance, both on the page-level and for any sections of the page that are in different languages to the rest of the page (WCAG criterion 3.1.1 (Level A) and 3.1.2 (Level AA)).

By making lang be the only way of doing this, we would be enforcing (or, at least, very heavily encouraging) users to comply with accessibility guidelines, and prevent issues with one locale being given in data attributes/JS configuration that isn't aligned with the actual language in use. Keeping it simple now, and only adding that complexity if users feel they need it, seems the better route to take.

My hypothesis is that doing it this way shouldn't add unnecessary code weight as we already include an Element.prototype.closest polyfill in Frontend, though that could always be wrong!

querkmachine · 2022-09-14T10:52:53Z

For the other questions I just noticed (cc: @romaricpascal)

a. What's the reason for rolling our own, vs. polyfilling? Was it to avoid the weight of another polyfill (15+KB for this one, with a few dependencies for it as well, judging by polyfill-library, not sure if included or not)?

Avoiding polyfills and their dependencies and simplicity of implementation. We don't actually need a lot of the PluralRules functionality at the moment, we only need to support positive integers (not negatives, decimals, and ranges as the spec supports).

b. Kind of tied to A.3. : The maps load all rules for all languages. Were there discussions for loading only what's needed that I may have missed?

I included a whole bunch of languages in the spike mainly for completion's sake (i.e. the work is already done and there if we need it). In practice, services are only provided in a few different languages—English, Welsh, and very occassionally Arabic—although GOV.UK content pages use many more. I imagined we would be able to cut down on the size of these if and when we had a better idea of what languages to prioritise.

We don't have any existing architecture for Frontend to conditionally load certain files, or having users load specific configuration files, which is why they're all bundled together for now. We've discussed using dynamic imports in the past, but this would be something we would introduce post-IE support, at which point we'd be removing the hardcoded plural rule maps anyway.

vanitabarrett added javascript localisation javascript squad labels Aug 22, 2022

vanitabarrett mentioned this issue Aug 22, 2022

Allow users to override hardcoded strings in our component JavaScript (internationalisation) #1708

Closed

26 tasks

vanitabarrett added this to Backlog 🗄 in Design System Sprint Board via automation Aug 22, 2022

kellylee-gds moved this from Backlog 🗄 to Sprint Backlog 🏃🏼‍♀️ in Design System Sprint Board Aug 24, 2022

36degrees added this to the [NEXT] milestone Sep 12, 2022

36degrees mentioned this issue Sep 13, 2022

Add ability to pass translation strings into character count component via data-attributes #2701

Closed

5 tasks

querkmachine self-assigned this Sep 14, 2022

querkmachine moved this from Sprint Backlog 🏃🏼‍♀️ to In progress 📝 in Design System Sprint Board Sep 14, 2022

querkmachine mentioned this issue Sep 15, 2022

Add pluralisation support to i18n functionality #2853

Merged

querkmachine moved this from In progress 📝 to Needs review 🔍 in Design System Sprint Board Sep 15, 2022

querkmachine closed this as completed in #2853 Sep 15, 2022

querkmachine moved this from Needs review 🔍 to Ready to release 🚀 in Design System Sprint Board Sep 15, 2022

36degrees mentioned this issue Sep 29, 2022

Allow CharacterCount component to receive i18n config via JS #2887

Merged

36degrees moved this from Ready to release 🚀 to Done 🏁 in Design System Sprint Board Nov 14, 2022

claireashworth mentioned this issue Feb 8, 2023

Supporting multiple languages alphagov/govuk-design-system-backlog#99

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expand internationalisation logic to add pluralisation #2804

Expand internationalisation logic to add pluralisation #2804

vanitabarrett commented Aug 22, 2022 •

edited by querkmachine

Loading

romaricpascal commented Sep 12, 2022

romaricpascal commented Sep 13, 2022 •

edited

Loading

36degrees commented Sep 13, 2022 •

edited

Loading

romaricpascal commented Sep 13, 2022

36degrees commented Sep 13, 2022

querkmachine commented Sep 14, 2022

querkmachine commented Sep 14, 2022 •

edited

Loading

Expand internationalisation logic to add pluralisation #2804

Expand internationalisation logic to add pluralisation #2804

Comments

vanitabarrett commented Aug 22, 2022 • edited by querkmachine Loading

What

Why

Who needs to work on this

Who needs to review this

Done when

romaricpascal commented Sep 12, 2022

romaricpascal commented Sep 13, 2022 • edited Loading

36degrees commented Sep 13, 2022 • edited Loading

romaricpascal commented Sep 13, 2022

36degrees commented Sep 13, 2022

querkmachine commented Sep 14, 2022

querkmachine commented Sep 14, 2022 • edited Loading

vanitabarrett commented Aug 22, 2022 •

edited by querkmachine

Loading

romaricpascal commented Sep 13, 2022 •

edited

Loading

36degrees commented Sep 13, 2022 •

edited

Loading

querkmachine commented Sep 14, 2022 •

edited

Loading