i18n: allow strings with duplicate message and descriptions #12723

brendankenny · 2021-06-29T23:40:40Z

I started looking into fatal collect-strings collisions after https://github.com/GoogleChrome/lighthouse/pull/12697/files#r658831618, when it became clear that lighthouse-core/audits/accessibility/aria-progressbar-name.js | description has been accidentally colliding with lighthouse-core/audits/accessibility/aria-treeitem-name.js | description for quite some time and we (and TC) haven't noticed.

It seems like something has changed with collisions. At least from what I can understand from inspecting some of the intermediate files in the pipeline, internally those two strings got deduped into a single message for translation (they share an ID which is hashed from the message and meaning), then properly duplicated again when dumped into our LHL format. Basically exactly what you'd hope for. From docs I've looked at now, it's actually encouraged that we allow colliding strings and only add a meaning if we really want the strings to be translated separately, so it's possible this was a bug in the handler that ingested our CTC format that was incidentally fixed in the last few years.
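For anyone following along, here is a rough sketch of the dedupe-then-duplicate behavior described above. It is not the real TC pipeline: `messageId` is a hypothetical stand-in hash (djb2) for whatever the pipeline actually uses, the helper names are made up for illustration, and the message text is a placeholder rather than the real audit description.

```javascript
// Hypothetical stand-in for the pipeline's real ID hash: djb2 over meaning + message.
function messageId(message, meaning = '') {
  const input = `${meaning}\u0000${message}`;
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    hash = ((hash * 33) ^ input.charCodeAt(i)) >>> 0;
  }
  return hash.toString(16);
}

// Collapse entries that share an ID into a single unit for translation.
function dedupeForTranslation(ctcStrings) {
  const byId = new Map();
  for (const [key, {message, meaning}] of Object.entries(ctcStrings)) {
    const id = messageId(message, meaning);
    if (!byId.has(id)) byId.set(id, {message, keys: []});
    byId.get(id).keys.push(key);
  }
  return byId;
}

// Fan each translated unit back out to every key that shared it.
function expandTranslations(byId, translate) {
  const out = {};
  for (const {message, keys} of byId.values()) {
    const translated = translate(message);
    for (const key of keys) out[key] = {message: translated};
  }
  return out;
}

// Placeholder message text; only the keys come from the real collision.
const strings = {
  'lighthouse-core/audits/accessibility/aria-progressbar-name.js | description':
    {message: 'Example shared text.'},
  'lighthouse-core/audits/accessibility/aria-treeitem-name.js | description':
    {message: 'Example shared text.'},
};
const units = dedupeForTranslation(strings);
const localized = expandTranslations(units, message => `[xx] ${message}`);
```

Here `units` holds a single entry for the two colliding keys, and `localized` has both keys again, each carrying the same translated message. Giving one of the entries a `meaning` would change its ID and split it into its own unit.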

To test this, I took four currently colliding accessibility messages and made their descriptions the same as well so they'd be full collisions. They appear to have gone through a full TC roundtrip once, but I'm running again just to be sure :)

If this all looks good, we can probably follow up by making a lot of the current collision list full duplicates (there's no reason each stack pack needs "WordPress"/"Drupal"/"Joomla" in its description when the string is just about using "HTML5 video"). Sometimes an explicit approach like @adamraine's in #12714 (comment) will make more sense when a lot of strings are explicitly being shared between files, but it's also nice to know that strings that collide by happenstance are no big deal.

brendankenny · 2021-06-29T23:43:01Z

lighthouse-core/scripts/i18n/collect-strings.js

@@ -598,54 +598,43 @@ function writeStringsToCtcFiles(locale, strings) {
 }

 /**
- * This function does three things:
+ * This function does two things:


sorry @patrickhulce, this would have been nice to know before your changes. And I've bungled up your nice cascade of checks a bit :)

patrickhulce · 2021-06-30

Haha, no worries. It's still more straightforward to read IMO so still a win :)

brendankenny · 2021-06-29T23:47:24Z

lighthouse-core/scripts/i18n/collect-strings.js

-      throw new Error(`Each strings' \`message\` or \`description\` must be different for the translation pipeline. The following keys did not have unique \`meaning\` values:\n\n${debugCollisionsMessage}`);
+    // We have duplicate messages with different descriptions. Disambiguate using `meaning` for TC.
+    for (const ctc of messageGroup) {
+      ctc.meaning = ctc.description;


There are a few ways to do this. For instance, if this messageGroup had three strings with one description and two strings with a different description, the three-string set could get no meaning while the two-string set did get a meaning.

However, since this is currently a hypothetical situation and meaning is hashed into the ID (not explicitly included in the messages), it doesn't matter a whole lot, and assigning a meaning to every string in any messageGroup with multiple descriptions is a lot simpler.
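A minimal sketch of the simpler rule, for illustration only. The grouping code and the `assignMeanings` name are assumptions rather than the actual contents of collect-strings.js; only the inner loop mirrors the diff above. Keys and strings are made up.

```javascript
// Assumed entry shape: {message, description, meaning?}, keyed by "file | name".
function assignMeanings(ctcStrings) {
  // Group entries by identical message text.
  const groups = new Map();
  for (const ctc of Object.values(ctcStrings)) {
    const group = groups.get(ctc.message) || [];
    group.push(ctc);
    groups.set(ctc.message, group);
  }

  for (const messageGroup of groups.values()) {
    const descriptions = new Set(messageGroup.map(ctc => ctc.description));
    // A single description means a unique string or full duplicates: nothing to disambiguate.
    if (descriptions.size <= 1) continue;
    // Duplicate messages with different descriptions: disambiguate using `meaning`.
    for (const ctc of messageGroup) {
      ctc.meaning = ctc.description;
    }
  }
  return ctcStrings;
}

// Hypothetical keys and strings.
const example = assignMeanings({
  'a.js | title': {message: 'Shared message', description: 'Description one'},
  'b.js | title': {message: 'Shared message', description: 'Description two'},
  'c.js | label': {message: 'Other message', description: 'Same description'},
  'd.js | label': {message: 'Other message', description: 'Same description'},
});
```

In this sketch the `a.js`/`b.js` pair gets a `meaning` because the descriptions differ, while the `c.js`/`d.js` pair is a full duplicate and is left alone. In the three-plus-two case, all five strings get a `meaning`, which is the simplification described above.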

brendankenny · 2021-06-29T23:48:27Z

lighthouse-core/scripts/i18n/collect-strings.js

  }

-  // We survived fatal collisions, now check that the known collisions match our known list.
-  const collidingMessages = allCollisions.map(collision => collision[1].message).sort();
+  // Check that the known collisions match our known list.


happy to bikeshed on "collisions" since it feels not as descriptive now

brendankenny · 2021-06-30T00:21:11Z

Second extract/dump was successful, so I think we're good here. I do have a lingering fear I'm missing something (some placeholder business or something? Feels like that should be handled with @example, though...)

brendankenny · 2021-07-02T03:07:19Z

thanks!

i18n: allow duplicate message/description pairs

dd7e5b6

brendankenny requested a review from a team as a code owner June 29, 2021 23:40

brendankenny requested review from connorjclark and removed request for a team June 29, 2021 23:40

google-cla bot added the cla: yes label Jun 29, 2021

devtools-bot assigned connorjclark Jun 29, 2021

devtools-bot added the waiting4reviewer label Jun 29, 2021

brendankenny commented Jun 29, 2021


brendankenny mentioned this pull request Jul 2, 2021

scripts(i18n): support es modules in collect-strings #12741

Merged

connorjclark approved these changes Jul 2, 2021


brendankenny added the land-when-ci-is-green label Jul 2, 2021

devtools-bot merged commit 776fc93 into master Jul 2, 2021

devtools-bot deleted the i18n-collisions branch July 2, 2021 03:09

devtools-bot removed the land-when-ci-is-green label Jul 2, 2021

brendankenny mentioned this pull request Oct 5, 2022

i18n: handle string placeholder collisions #14432

Merged


