Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i18n: allow strings with duplicate message and descriptions #12723

Merged
merged 1 commit into from
Jul 2, 2021

Conversation

brendankenny
Copy link
Member

I started looking into fatal collect-strings collisions after https://github.com/GoogleChrome/lighthouse/pull/12697/files#r658831618 when it became clear that that lighthouse-core/audits/accessibility/aria-progressbar-name.js | description has been accidentally colliding with lighthouse-core/audits/accessibility/aria-treeitem-name.js | description for quite some time and we (and TC) haven't noticed.

It seems like something has changed with collisions. At least from what I can understand from inspecting some of the intermediate files in the pipeline, internally those two strings got deduped into a single message for translation (they share an ID which is hashed from the message and meaning), then properly duplicated again when dumped into our LHL format. Basically exactly what you'd hope for. From docs I've looked at now, it's actually encouraged that we allow colliding strings and only add a meaning if we really want the strings to be translated separately, so it's possible this was a bug in the handler that ingested our CTC format that was incidentally fixed in the last few years.

To test this, I took four currently colliding accessibility messages and made their descriptions the same as well so they'd be full collisions. They appear to have gone through a full TC roundtrip once, but I'm running again just to be sure :)

If this all looks good, we can probably follow up with setting a lot of the current collision list be full duplicates (no reason each stack pack needs "WordPress"/"Drupal"/"Joomla" in their descriptions when the string is just about using "HTML5 video"). Sometimes an explicit approach like @adamraine's in #12714 (comment) will make more sense when a lot of strings are explicitly being shared between files, but it's also nice to know that strings that collide by happenstance are no big deal.

@brendankenny brendankenny requested a review from a team as a code owner June 29, 2021 23:40
@brendankenny brendankenny requested review from connorjclark and removed request for a team June 29, 2021 23:40
@google-cla google-cla bot added the cla: yes label Jun 29, 2021
@@ -598,54 +598,43 @@ function writeStringsToCtcFiles(locale, strings) {
}

/**
* This function does three things:
* This function does two things:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry @patrickhulce, this would have been nice to know before your changes. And I've bungled up your nice cascade of checks a bit :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, no worries. It's still more straightforward to read IMO so still a win :)

throw new Error(`Each strings' \`message\` or \`description\` must be different for the translation pipeline. The following keys did not have unique \`meaning\` values:\n\n${debugCollisionsMessage}`);
// We have duplicate messages with different descriptions. Disambiguate using `meaning` for TC.
for (const ctc of messageGroup) {
ctc.meaning = ctc.description;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few ways to do this. For instance, if this messageGroup had three strings with one description and two strings with a different description, the three-string set could get no meaning while the two-string set did get a meaning.

However since this is currently a hypothetical situation and meaning is hashed into the ID (not explicitly included in the messages), it doesn't matter a whole lot and assigning a meaning to any messageGroup with multiple descriptions is a lot simpler.

}

// We survived fatal collisions, now check that the known collisions match our known list.
const collidingMessages = allCollisions.map(collision => collision[1].message).sort();
// Check that the known collisions match our known list.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to bikeshed on "collisions" since it feels not as descriptive now

@brendankenny
Copy link
Member Author

Second extract/dump was successful, so I think we're good here. I do have a lingering fear I'm missing something (some placeholder business or something? Feels like that should be handled with @example, though...)

@brendankenny
Copy link
Member Author

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants