Improve translatability of template and template part rich text HTML #9

bobbingwide · 2020-12-06T17:52:10Z

In issue #7 I demonstrated that it's possible to extract text from the Full Site Editing template and template part .html files. However, we noted that this solution suffered from the same problem as PHP code, in that translators didn't get full sentences to translate. For example, the <br> tag below splits one sentence into two separate strings:

<p>Thou hasn't seen <br> nothing yet</p>
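For illustration, here is a minimal PHP sketch, assuming only PHP's bundled DOM extension. extract_text_nodes() is a hypothetical helper, not code from this repository. It treats every text node as a separate string, which is a simplified model of recursing through inner tags. When run against the example above, it returns two fragments, "Thou hasn't seen" and "nothing yet", rather than one sentence.

```php
<?php
// Simplified model of per-text-node extraction (hypothetical helper, not the
// plugin's actual code): every non-blank text node becomes its own string.
function extract_text_nodes( string $html ): array {
    $doc = new DOMDocument();
    $doc->loadHTML( $html );   // libxml wraps the fragment in <html><body>
    $xpath = new DOMXPath( $doc );
    $strings = [];
    foreach ( $xpath->query( '//body//text()' ) as $text ) {
        $string = trim( $text->nodeValue );
        if ( $string !== '' ) {
            $strings[] = $string;
        }
    }
    return $strings;
}

print_r( extract_text_nodes( "<p>Thou hasn't seen <br> nothing yet</p>" ) );
```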

I wrote about the challenges of translating rich text content in "Localization of Full Site Editing themes".

Now I want to see if I can implement some of the proposals in that post.

The automatic translations into UK English (en_GB) and bbboing (bb_BB) work because the translation process doesn’t attempt to make any sense of the content being translated.

In my opinion, before we can finalize any solution for localizing the HTML, we’ll have to agree some ground rules for internationalization and extraction. The main problem areas that I have considered are:

1. Sentences broken by inner tags.
2. Assumptions associated with respecting leading and trailing blanks.
3. HTML tags’ attributes.
4. Text that should not be translated.
5. Gutenberg blocks’ text attributes that should be translated.
6. Providing contextual help.

Items 4, 5 and 6 can be supported by providing special tools in the block editor.

Multilingual support is Phase 4 of Gutenberg, so we can't realistically expect Gutenberg to provide an environment that translators can use in the short term. The best we can do is improve the extraction, translation and localization processes, giving translators the opportunity to alter markup when it makes sense to do so.

Note: Google’s automatic web page translator handles inner tags. It may not produce the best translation, but it is certainly easy to use. If we extract the translatable text in sensibly sized chunks, we could easily use Google's translation service to give the human translators a head start.

Requirements

- Extract rich text so as to retain as much context as possible.
- Allow translators a certain amount of free rein with regard to the sequence of nested HTML tags.
- Automatically apply the translations to produce the locale-specific versions of each template and template part.
- Do not depend on logic to respect whitespace in the original text.
- No need to prevent the translator from seeing text marked as translate="no".
- Do prevent the translator from translating Gutenberg block attributes marked as non-translatable.

Optionally,

Support automatic translation of untranslated text using Google's Cloud translation service.


bobbingwide · 2020-12-06T18:38:28Z

Proposal for extracting rich text.

- for each outer rich text tag found
    if it has inner tags
        extract text using the rich text route
    else
        extract translatable attributes and inner tags recursively (current solution)
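The "if it has inner tags" test could look something like the following PHP sketch. has_inner_tags() is a hypothetical name, and the sketch assumes that only element children count as inner tags, so a tag whose only child is a single #text node is not treated as rich text.

```php
<?php
// Decide which extraction route an outer rich text tag takes (hypothetical
// helper). Only element children count as inner tags; text and comment
// children do not.
function has_inner_tags( DOMElement $node ): bool {
    foreach ( $node->childNodes as $child ) {
        if ( $child->nodeType === XML_ELEMENT_NODE ) {
            return true;
        }
    }
    return false;
}

$doc = new DOMDocument();
$doc->loadHTML( "<p>Plain text</p><p>Thou hasn't seen <br> nothing yet</p>" );
foreach ( $doc->getElementsByTagName( 'p' ) as $p ) {
    $route = has_inner_tags( $p ) ? 'rich text route' : 'current solution';
    echo $p->textContent, ' => ', $route, "\n";
}
```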

rich text route - extract

- copy the tag and its inner tags to a new DOMDocument
- save as HTML
- strip the outer tag (and its attributes)
- add the result as the string to be translated
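The four extraction steps map fairly directly onto PHP's DOM API. Here is a minimal sketch, assuming PHP's DOM extension. extract_rich_text() is a hypothetical name, and the regular expressions assume that no attribute value on the outer tag contains a literal ">".

```php
<?php
// Sketch of the "rich text route - extract" steps (hypothetical helper name).
function extract_rich_text( DOMElement $node ): string {
    // 1. Copy the tag and its inner tags to a new DOMDocument.
    $doc  = new DOMDocument();
    $copy = $doc->importNode( $node, true );
    $doc->appendChild( $copy );
    // 2. Save it as HTML.
    $html = $doc->saveHTML( $copy );
    // 3. Strip the outer opening tag (with its attributes) and its closing tag.
    //    Assumes no attribute value contains a literal '>'.
    $html = preg_replace( '/^\s*<[^>]*>/', '', $html );
    $html = preg_replace( '/<\/[^>]*>\s*$/', '', $html );
    // 4. The remainder is the string to add for translation.
    return trim( $html );
}

$source = new DOMDocument();
$source->loadHTML( '<p class="intro">Thou hasn\'t seen <br> nothing yet</p>' );
$p = $source->getElementsByTagName( 'p' )->item( 0 );
echo extract_rich_text( $p ), "\n";
```

The translator then sees the whole sentence, with the <br> still in place, and can move it if the target language needs a different word order.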

Proposal for localization

- for each outer rich text tag
    if it has inner tags
        apply translations using the rich text route
    else
        apply as per current solution

rich text route - localize

- convert the translation to a DOMDocument
- replace the existing inner nodes with the translated content
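A possible PHP sketch of the localize route, assuming PHP's DOM extension. apply_rich_text_translation() is a hypothetical name. The meta tag is the commonly used way to make loadHTML() read the translation as UTF-8 instead of its ISO-8859-1 default. The outer tag and its attributes are left untouched and only its children are swapped.

```php
<?php
// Sketch of the "rich text route - localize" steps (hypothetical helper name).
function apply_rich_text_translation( DOMElement $node, string $translation ): void {
    // Convert the translation to a DOMDocument; the meta tag asks libxml to
    // read the string as UTF-8 rather than ISO-8859-1.
    $fragment = new DOMDocument();
    $fragment->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $translation . '</body></html>'
    );
    $body = $fragment->getElementsByTagName( 'body' )->item( 0 );

    // Remove the existing inner nodes, keeping the outer tag and its attributes.
    while ( $node->firstChild !== null ) {
        $node->removeChild( $node->firstChild );
    }
    // Copy the translated nodes into the template's own document.
    foreach ( $body->childNodes as $child ) {
        $node->appendChild( $node->ownerDocument->importNode( $child, true ) );
    }
}

$template = new DOMDocument();
$template->loadHTML( '<p class="intro">Thou hasn\'t seen <br> nothing yet</p>' );
$p = $template->getElementsByTagName( 'p' )->item( 0 );
apply_rich_text_translation( $p, "You ain't seen <br> nothing yet" );
echo $template->saveHTML( $p ), "\n";
```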

Q. Should we use the rich text route for each translation?


bobbingwide · 2020-12-09T09:46:31Z

A couple of things to fix:

- Need to add strong to the list of acceptable rich text tags.
- Need to trim strings to be translated.
- Need to remove carriage returns (\r) and line feeds (\n) from strings to be translated.
- (The biggie) Translation of rich text in list items was stopping after the first item with rich text had been translated.
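The trimming and line-break fixes could be combined in one helper. In this PHP sketch, normalize_translatable_string() is a hypothetical name. One assumption departs slightly from "remove": each line break, along with the whitespace around it, is replaced by a single space instead of being deleted. In HTML a newline acts as whitespace, so deleting it outright could glue two words together.

```php
<?php
// Hypothetical helper: normalise a string before adding it for translation.
// A line break plus any surrounding whitespace collapses to one space, then
// leading and trailing blanks are trimmed.
function normalize_translatable_string( string $string ): string {
    $string = preg_replace( '/\s*[\r\n]+\s*/', ' ', $string );
    return trim( $string );
}

echo normalize_translatable_string( "  Thou hasn't seen\r\n   <br> nothing yet\n" ), "\n";
```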

The last problem was fixed by changing the foreach loop to a for loop.
From

foreach ( $node->childNodes as $child_node ) {
   ...
   $this->extract_strings( $child_node );
}

To

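for ( $currentNode = 0; $currentNode < $node->childNodes->length; $currentNode++ ) {
   ...
   // Index the live childNodes list afresh on each pass, so a child replaced
   // during processing doesn't end the loop early.
   $this->extract_strings( $node->childNodes[$currentNode] );
}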

It seems that the replaceChild() call performed in DOM_string_updater::replace_node() messed up the current position in the foreach loop.
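The index-based pattern can be checked with a self-contained PHP sketch that assumes only PHP's DOM extension. replace_with_translation() is a hypothetical stand-in for DOM_string_updater::replace_node(), and strtoupper() stands in for looking up a translation. The loop re-reads length and item( $i ) on every pass, so replacing each <li> as it is visited still processes all three items.

```php
<?php
// Hypothetical stand-in for DOM_string_updater::replace_node(): swaps an <li>
// for a new <li> containing the "translated" text.
function replace_with_translation( DOMNode $old, string $text ): void {
    $new = $old->ownerDocument->createElement( 'li' );
    $new->appendChild( $old->ownerDocument->createTextNode( $text ) );
    $old->parentNode->replaceChild( $new, $old );
}

function translate_children( DOMNode $node ): void {
    // Re-evaluate length and item() each time round: the list is live, and a
    // replaced node is no longer linked into the tree.
    for ( $i = 0; $i < $node->childNodes->length; $i++ ) {
        $child = $node->childNodes->item( $i );
        if ( $child instanceof DOMElement ) {
            replace_with_translation( $child, strtoupper( $child->textContent ) );
        }
    }
}

$doc = new DOMDocument();
$doc->loadHTML( '<ul><li>one</li><li>two</li><li>three</li></ul>' );
$ul = $doc->getElementsByTagName( 'ul' )->item( 0 );
translate_children( $ul );
echo $doc->saveHTML( $ul ), "\n";
```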

bobbingwide added the enhancement label Dec 6, 2020

bobbingwide self-assigned this Dec 6, 2020

bobbingwide added a commit that referenced this issue Dec 8, 2020

Issue #9 - improve translatability - extract rich text strings

697cb36

bobbingwide added a commit that referenced this issue Dec 8, 2020

Issue #9 - refactor extractRichText() to use add_rich_text_string() in the update routine.

a938f6b

bobbingwide added a commit that referenced this issue Dec 8, 2020

Issue #9 strip new line characters from rich text strings. Don't treat a single #text child as rich text.

0d8ce7e

bobbingwide mentioned this issue Dec 8, 2020

Create localized versions of the templates and template parts bobbingwide/fizzie#46

Open

5 tasks

bobbingwide added a commit that referenced this issue Dec 9, 2020

Issue #9 - ensure more rich text strings are translated

e5f4391

bobbingwide mentioned this issue Dec 18, 2020

Update oik translatable strings to cater for translations not respecting leading or trailing blanks bobbingwide/oik#171

Closed
