Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i18n: Implement a solution for interpolation of React elements in strings #9846

Open
aduth opened this issue Sep 12, 2018 · 12 comments

Comments

Projects
None yet
8 participants
@aduth
Copy link
Member

commented Sep 12, 2018

Problem: Most any sprintf implementation, including that of @wordpress/i18n, is expected to return a string value. In a React application, this can be problematic if the string contains HTML markup, as the HTML will be escaped, at least without using dangerouslySetInnerHTML, which is discouraged. This can sometimes lead to developers implementing workarounds using concatenation of partial localized strings (e.g. #5767 (comment)), which is not advisable as it cannot be translated accurately in many languages. This is also noted in the i18n for WordPress Developers guide:

Use format strings instead of string concatenation—sprintf(__('Replace %1$s with %2$s'), $a, $b); is always better than __('Replace ').$a.__(' with ').$b; .

https://codex.wordpress.org/I18n_for_WordPress_Developers#Best_Practices

Task: Find a way to allow for React elements to be inserted within localized strings safely.

Prior Art:

Additional Considerations:

  • We still need to be able to extract the string which would be surfaced to translators, presumably including placeholders where a JSXExpressionContainer occurs
  • There should be a reasonably-ergonomic offering for those developers who choose not to use Babel tooling

Brainstorming:

It would be nice if a developer could simply wrap their string and elements in a component which performs the localization:

<Interpolate>
	Check out this link to my <a href={ url }>website</a>.
</Interpolate>

There are, of course, complications:

  • How do we extract the string?
    • Presumably it would take the form Check out this link to my <a href="%s">website</a>. We'd need to find a way to associate the JSX expression to the placeholder (and vice-versa, inject the variable value where the placeholder occurs when rendering into the UI).
  • Pluralization / Context
  • Nested custom component elements
    • For example, what if the website link was a <WebsiteLink /> element?
  • Encouraging best practices to avoid extracted strings being indecipherable

It might be simpler to associate placeholders via a known limited subset of expression values allowed. This could also be used for linting enforcement of best-practices:

<Interpolate>
	{ ( getPlaceholder ) => (
		<Fragment>Check out this link to my <a href={ getPlaceholder( '%s' ) }>website</a>.</Fragment>
	) }
</Interpolate>

Aside: It is be feasible for a custom Babel plugin to transform an element like as written in the first snippet (which has arguably nicer semantics) to this latter format.

To address pluralization, perhaps having a special child component type mapping to the index of the plural form corresponding to the count (an advantage here is supporting >2 plural forms (correction below)):

<Interpolate count={ 2 }>
	<Interpolate.PluralForm>
		Check out this link to my <a href={ url }>website</a>.
	</Interpolate.PluralForm>
	<Interpolate.PluralForm>
		Check out this link to my <a href={ url }>websites</a>.
	</Interpolate.PluralForm>
</Interpolate>

cc @mcsf @jorgefilipecosta @swissspidy

@mcsf

This comment has been minimized.

Copy link
Contributor

commented Sep 13, 2018

We've talked about this in private, but noting that auditing our current strings would help us prioritize this task.

@ockham

This comment has been minimized.

Copy link
Contributor

commented Sep 18, 2018

It would be nice if a developer could simply wrap their string and elements in a component which performs the localization:

<Interpolate>
	Check out this link to my <a href={ url }>website</a>.
</Interpolate>

Agree 👍

I've also been thinking about a syntax like this for a while. Another benefit is that we could connect an Interpolate component to React context (for on-the-fly locale switching at last). (This however raises a question about how to deal with strings outside components.)

There are, of course, complications:

  • How do we extract the string?

    • Presumably it would take the form Check out this link to my <a href="%s">website</a>. We'd need to find a way to associate the JSX expression to the placeholder (and vice-versa, inject the variable value where the placeholder occurs when rendering into the UI).
  • Pluralization / Context

  • Nested custom component elements

    • For example, what if the website link was a <WebsiteLink /> element?
  • Encouraging best practices to avoid extracted strings being indecipherable

Shouldn't a regular JSX parser (as used by Babel and friends) be helpful here (and thus be feasible by a Babel plugin)? In particular, the task at hand shares some traits with shallow rendering, as provided by Enzyme or react-test-renderer, doesn't it? Would parsing the "top level" (no components expansion) be sufficient here, followed by concatenation of all text nodes' strings, with placeholders inserted for non-text nodes?

It might be simpler to associate placeholders via a known limited subset of expression values allowed. This could also be used for linting enforcement of best-practices:

<Interpolate>
	{ ( getPlaceholder ) => (
		<Fragment>Check out this link to my <a href={ getPlaceholder( '%s' ) }>website</a>.</Fragment>
	) }
</Interpolate>

Aside: It is be feasible for a custom Babel plugin to transform an element like as written in the first snippet (which has arguably nicer semantics) to this latter format.

That's reassuring, since the render prop syntax isn't exactly nice 😆

To address pluralization, perhaps having a special child component type mapping to the index of the plural form corresponding to the count (an advantage here is supporting >2 plural forms):

<Interpolate count={ 2 }>
	<Interpolate.PluralForm>
		Check out this link to my <a href={ url }>website</a>.
	</Interpolate.PluralForm>
	<Interpolate.PluralForm>
		Check out this link to my <a href={ url }>websites</a>.
	</Interpolate.PluralForm>
</Interpolate>

Bit verbose 🙁 but yeah, might be the tersest we can do.

@nerrad

This comment has been minimized.

Copy link
Contributor

commented Sep 18, 2018

This is a bit confusing for me:

<Interpolate count={ 2 }>
	<Interpolate.PluralForm>
		Check out this link to my <a href={ url }>website</a>.
	</Interpolate.PluralForm>
	<Interpolate.PluralForm>
		Check out this link to my <a href={ url }>websites</a>.
	</Interpolate.PluralForm>
</Interpolate>

I'm likely just misunderstanding things, but something like this might be a bit clearer?

<Interpolate count={ 2 }>
	<Interpolate.SingularForm>
		Check out this link to my <a href={ url }>website</a>.
	</Interpolate.SingularForm>
	<Interpolate.PluralForm>
		Check out this link to my <a href={ url }>websites</a>.
	</Interpolate.PluralForm>
</Interpolate>
@aduth

This comment has been minimized.

Copy link
Member Author

commented Sep 18, 2018

@nerrad I think this may have been a misunderstanding on my part around >2 plural forms. I believe you're correct that, at least as how we express in English, there would be only the singular and plural form to include in the code (corresponding to msgid and msgid_plural in gettext).

References:

@swissspidy

This comment has been minimized.

Copy link
Member

commented Sep 18, 2018

Yeah just singular + plural in the code needed..

@nerrad

This comment has been minimized.

Copy link
Contributor

commented Sep 18, 2018

I think we should also shoot for a goal where there is no html in the translation string (how feasible that is I don't know yet). Otherwise the string translation is inherently fragile.

@aduth

This comment has been minimized.

Copy link
Member Author

commented Sep 18, 2018

The tools for including markup in localized string exist in WordPress PHP, so it's assumed there will be a similar expectation in JavaScript.

Our usage in Gutenberg thus far has been extremely limited, which is why I've personally temporarily shelved this as an issue.

As one data point, @blowery had done a brief audit of component interpolation in Calypso and had found at least 800+ uses in the codebase. Embedding a, em, and strong tags are the most common amongst these.

@iandunn

This comment has been minimized.

Copy link
Member

commented May 2, 2019

@coreymckrill

This comment has been minimized.

@tn3rb

This comment has been minimized.

Copy link

commented May 2, 2019

re: singular && plural

please keep in mind that some languages use different rules and forms for pluralization, such as:

  • a rule for 0 things
  • a rule for 1 thing
  • a rule for 2 things
  • a rule for 3 things
  • a rule for everything else

Here's a page that discusses this: https://developer.mozilla.org/en-US/docs/Mozilla/Localization/Localization_and_Plurals

An example from that list is for Icelandic and Macedonian where there is one rule for any number that ends in 1, except for 11 (1, 21, 31, 41, 51, 61...) and another rule for everything else (0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...).

So a simple singular vs plural system is not going to work.

Ultimately it would be better to simply use the count to pick the corresponding translation and then have a default to cover all other cases.

I can't give a good real world example because I'm a dumb one language kinda guy, so please just use your imagination and pretend that these were valid in something other than english

<Interpolate count={ 2 }>
	<PluralRule count={ 0 }>
		no things for you
	</PluralRule>
	<PluralRule count={ 1 }>
		there is one thing
	</PluralRule>
	<PluralRule count={ 2 }>
		(in Yoda voice) two things there are
	</PluralRule>
	<PluralRule>
		there are %s things
	</PluralRule>
</Interpolate>

For the Icelandic and Macedonian example from the MDN docs, the Interpolate component would have to utilize some kind of rule callback to decide what translation to return.

(EDIT)
my apologies, I missed that >2 plural forms had been mentioned

@swissspidy

This comment has been minimized.

Copy link
Member

commented May 2, 2019

@iandunn @coreymckrill Very interesting, thanks for sharing!

Sounds like we should be able to provide a function like arrayTokenReplace to developers, and do the tokenSplit part automatically behind the scenes. In this case, usage would not be much different from sprintf() (yay for developers) but still allow using HTML for placeholder substitution.

@aduth What do you think of that? Would this something that could be even added to Tannin directly?

@aduth

This comment has been minimized.

Copy link
Member Author

commented May 3, 2019

Incoming stream of consciousness:

I like that the approach is not specifically tailored to the React, but rather to the generic problem of implementing substitution of non-string values. It does impose some challenge on the implementer to handle an array of parts; it just-so-happens to work pretty well† out of the box as a React children value, though if the goal is to be more generic, it's not immediately obvious what this might look like in other settings.

† Pretty well: React may complain about missing keys, which might need to be included as part of the abstraction.

Sounds like we should be able to provide a function like arrayTokenReplace to developers, and do the tokenSplit part automatically behind the scenes. [...] Would this something that could be even added to Tannin directly?

I'm not clear what's proposed to be included where. Functionally this sounds similar to a utility function like sprintf, where the return value may be an array instead of a string. In which case, something like tokenSplit is a common lower-level function, one which may be reasonable to include in tannin, or certainly @wordpress/i18n.

At which point, you could have (pending beautification):

sprintf = ( str, ...args ) => tokenSplit( str ).reduce( ( result, part, i ) => result + ( i % 2 ? args[ ( i - 1 ) / 2 ] : part ), '' );
Interpolate = ( props ) => tokenSplit( Children.toArray( props.children ).map( ( child, i ) => typeof child === 'string' ? child : '%' + i + '$s' ).join( '' ) ).reduce( ( result, part, i ) => [ ...result, ( i % 2 ? cloneElement( Children.toArray( props.children )[ ( i - 1 ) / 2 ], { key: i } ) : part ) ], [] );

Two things I note:

  • It requires some abuse of types, likely in using "string" type for elements
  • While this satisfies the runtime implementation, it still requires enhancements to string extraction to handle "Interpolate".

At this point, I re-read your comment and perhaps you're suggesting that this be put more on the developer to handle? So in their own renders, e.g.

function Accusation( props ) {
	const { who, where, weapon } = props;

	return <div>{ eprintf( __( 'I accuse %1$s in the %2$s with the %3$s!' ), <em>{ who }</em>, <em>{ where }</em>, <em>{ weapon }</em> ) }</div>;
}

The benefit here being that no changes are required to string extraction.

For both this and Interpolate above, I'm not sure we're doing much to help solve the problem of allowing for markup in an extracted string. As in the original comment:

Check out this link to my <a href="%s">website</a>.

As considered, it only works if we substitute the entirety of the link:

Check out this link to my %1$s.

...but then we lose translation (or at least context of the translation of) "website".

While I'm having difficulty coming up with a solution, I could imagine one or a sum of parts of:

  • Allowing as part of considering placeholders or token splitting: Different "types" which cover the opening and closing of an element, where the React implementation would manage recombination into an element.
  • Construct the markup for the generated string at build time (e.g. with a Babel transform) to "trick" (read: be compatible with) extractors
    • e.g. __( 'Check out this link to my <a href="%s">website</a>.' ) would appear to exist in the built code (not the source code), and somehow at runtime it's instead considered more like as if it were 'Check out this link to my %1$s.' or, as in the example with Interpolate, more like a proper element <Fragment>Check out this link to my <a href={ url }>website</a></Fragment>.

I think there may be iterative improvements with element substitutions we could find with something like the utilities mentioned in #9846 (comment) without needing to solve all the problems, but I do think we should have some general idea of what a broader solution might look like to have a better sense of how it all fits together.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.