Extract and translate placeholder texts #36983

hacodeorg · 2020-09-30T03:07:17Z

Task FND-1209:
This is the first step to enable translation of placeholder texts. We extract placeholder texts from .level files during the i18n sync-in step.

Since placeholder texts can be empty strings, binary numbers, or just several question marks, we only extract strings that contain at least 3 consecutive alphabetic characters.
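A minimal sketch of that filter follows. The pattern is the one quoted from bin/i18n/sync-in.rb in the review comments of this PR; the constant name, the helper name, and the sample strings are assumptions made for illustration.

```ruby
# Pattern quoted in the review of bin/i18n/sync-in.rb:
# at least 3 consecutive ASCII letters.
TRANSLATABLE_PATTERN = /[a-zA-Z]{3,}/

# Hypothetical helper name; returns true when the text is worth translating.
def translatable_placeholder?(text)
  !text.nil? && text.match?(TRANSLATABLE_PATTERN)
end

# Assumed sample inputs: empty, binary, question marks, and real text.
samples = ['', '0101', '???', 'Hi', 'Yummy!', "That's my computer!"]
kept = samples.select { |s| translatable_placeholder?(s) }
```

With these samples, only 'Yummy!' and "That's my computer!" are kept. A two-letter string such as 'Hi' is rejected along with the empty, numeric, and question-mark strings.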

Example

A puzzle with placeholder texts https://studio.code.org/s/coursee-2020/stage/9/puzzle/1:

Those texts are defined in a .level file:

code-dot-org/dashboard/config/scripts/levels/courseE_aboutme_1_2020.level

Lines 129 to 131 in 8e8171c

    
           <block type="text"> 
        
             <title name="TEXT">That's me! Rikki! I like to code, hangout with Thuy, and eat ice cream!</title> 
        
           </block>

After the sync-in step, placeholder texts are extracted to i18n/locales/source/course_content/2020/coursee-2020.json:

{
  "https://studio.code.org/s/coursee-2020/stage/9/puzzle/1": {
    "placeholder_texts": {
      "b63151482630edd1589a9ac24d107c49": "That's me! Rikki! I like to code, hangout with Thuy, and eat ice cream!",
      "810896b14fc6615f0c76628b9b6a727e": "That's my best friend Thuy! She's really good at sports!",
      "5d48ecdd8e3baeb99c6a8dcb7faa13dd": "Ice cream is my favorite treat! But I probably shouldn't eat it on the couch...",
      "a50ec512bdd98483a7cdcd88cbe11933": "That's my pet rabbit, Ms. Lolipop! I have no idea why I named her that!",
      "6f97975a0bda0c9219915161149cbb26": "That's my computer! I code on it ALL the time!",
      "1c6636ce086fef9305c6e3d25e37d5b2": "Here's a secret: Thuy is extremely ticklish!",
      "fce3b93eaa64a8869c47ba25daa8887f": "Yummy!",
      "41a18649f969a00ec0b2feba20db997f": "I think I like this color better on you, Ms. Lolipop!",
      "756afb811df2eb3ec88a9e98f3dcaa8d": "This computer can't handle my mad coding skills!"
    }
  }
}
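The keys in this file are MD5 digests of the placeholder strings. The sketch below shows one way such a structure could be built, assuming the digest is taken over the exact string content. The PR does not show any normalization or encoding step, so keys computed this way may not match the ones listed above. The helper name is hypothetical.

```ruby
require 'digest/md5'
require 'json'

# Hypothetical helper: maps each placeholder string to a key derived
# from the MD5 digest of its content (assumed: no normalization).
def placeholder_texts_by_md5(texts)
  texts.each_with_object({}) do |text, result|
    result[Digest::MD5.hexdigest(text)] = text
  end
end

texts = ['Yummy!', "That's my computer! I code on it ALL the time!"]
entry = {
  'https://studio.code.org/s/coursee-2020/stage/9/puzzle/1' => {
    'placeholder_texts' => placeholder_texts_by_md5(texts)
  }
}

puts JSON.pretty_generate(entry)
```

Because the key depends only on the content, running the extraction twice on an unchanged level yields the same keys.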

After the sync-down step, translations for placeholder texts are downloaded to i18n/locales/<locale>/course_content/2020/coursee-2020.json:

{
  "https://studio.code.org/s/coursee-2020/stage/9/puzzle/1": {
    "placeholder_texts": {
      "b63151482630edd1589a9ac24d107c49": "toi la rikki",
      "810896b14fc6615f0c76628b9b6a727e": "day la ban Thuy!",
      "5d48ecdd8e3baeb99c6a8dcb7faa13dd": "kem rat ngon....",
      "a50ec512bdd98483a7cdcd88cbe11933": "day la tho!",
      "6f97975a0bda0c9219915161149cbb26": "day la may tinh!",
      "1c6636ce086fef9305c6e3d25e37d5b2": "day la bi mat",
      "fce3b93eaa64a8869c47ba25daa8887f": "ngon!",
      "41a18649f969a00ec0b2feba20db997f": "mau nay cool!",
      "756afb811df2eb3ec88a9e98f3dcaa8d": "toi qua gioi"
    }
  }
}

After the sync-out step, the translations are distributed to dashboard/config/locales/placeholder_texts.<locale>.json.
Example of dashboard/config/locales/placeholder_texts.vi-VN.json:

{
  "vi-VN": {
    "data": {
      "placeholder_texts": {
        "courseE_aboutme_1_2020": {
          "b63151482630edd1589a9ac24d107c49": "toi la rikki",
          "810896b14fc6615f0c76628b9b6a727e": "day la ban Thuy!",
          "5d48ecdd8e3baeb99c6a8dcb7faa13dd": "kem rat ngon....",
          "a50ec512bdd98483a7cdcd88cbe11933": "day la tho!",
          "6f97975a0bda0c9219915161149cbb26": "day la may tinh!",
          "1c6636ce086fef9305c6e3d25e37d5b2": "day la bi mat",
          "fce3b93eaa64a8869c47ba25daa8887f": "ngon!",
          "41a18649f969a00ec0b2feba20db997f": "mau nay cool!",
          "756afb811df2eb3ec88a9e98f3dcaa8d": "toi qua gioi"
        }
      }
    }
  }
}
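The reshaping between the sync-down layout (keyed by level URL) and the sync-out layout (keyed by locale, then level name) can be sketched as follows. The helper name and the `level_names_by_url` lookup table are assumptions; the PR does not show how the real script resolves a URL to a level name.

```ruby
# Hypothetical helper: converts sync-down output into the sync-out layout.
# `level_names_by_url` is an assumed lookup table from level URL to level name.
def distribute_placeholder_texts(locale, translations_by_url, level_names_by_url)
  by_level = {}
  translations_by_url.each do |url, sections|
    texts = sections['placeholder_texts']
    # Skip levels that have no translated placeholder texts.
    next if texts.nil? || texts.empty?
    by_level[level_names_by_url.fetch(url)] = texts
  end
  {locale => {'data' => {'placeholder_texts' => by_level}}}
end

url = 'https://studio.code.org/s/coursee-2020/stage/9/puzzle/1'
sync_down = {
  url => {'placeholder_texts' => {'fce3b93eaa64a8869c47ba25daa8887f' => 'ngon!'}}
}
sync_out = distribute_placeholder_texts(
  'vi-VN', sync_down, {url => 'courseE_aboutme_1_2020'}
)
```

Here `sync_out` has the same nesting as the placeholder_texts.vi-VN.json example, reduced to a single string.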

Rendering the translations:

(Screenshots of the puzzle rendered in English and in Vietnamese.)

Testing story

Run bin/i18n/sync-in.rb to extract placeholder strings from the dashboard/config/scripts/levels/courseE_aboutme_1_2020.level file.
Manually create a sync-down output at i18n/locales/vi-VN/course_content/2020/coursee-2020.json.
Manually create /tmp/codeorg_changes.json, /tmp/codeorg-markdown_changes.json, /tmp/hour-of-code_changes.json with content.
Run bin/i18n/sync-out.rb to distribute translations to dashboard/config/locales/placeholder_texts.vi-VN.json.
Go to http://localhost-studio.code.org:3000/s/coursee-2020/stage/9/puzzle/1/lang/vi to see the translations in Vietnamese.

Reviewer Checklist:

Tests provide adequate coverage
Privacy and Security impacts have been assessed
Code is well-commented
New features are translatable or updates will not break translations
Relevant documentation has been added or updated
User impact is well-understood and desirable
Pull Request is labeled appropriately
Follow-up work items (including potential tech debt) are tracked and linked

Hamms · 2020-09-30T17:56:52Z

bin/i18n/sync-in.rb

+        next unless text_title&.content =~ /[a-zA-Z]{3,}/
+
+        # Use only alphanumeric characters in lower cases as string key
+        text_key = text_title.content.gsub(/[^a-zA-Z0-9_ ]/, '').split.join('_').downcase


I don't love the idea of inferring an identifier from the content of the string. When we've done things like this in the past, it ends up causing problems when the content changes and strings unexpectedly go missing, or when similar content is used in multiple places and the mapping ends up being non-unique.

Is there anything else we could use as a unique identifier here?

hacodeorg · 2020-10-01

How about using an MD5 hash? It will keep a 1:1 relationship between an ID and a string.
Another option is to use a combination of script id, level id and string position, such as script_11_level_399_str_1.
Did we use any of the above options in the past?
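To make the second option concrete, here is a minimal sketch of a position-based key. The helper name and the choice to number strings from 1 in document order are assumptions for illustration; the comment above only gives the example key script_11_level_399_str_1.

```ruby
# Hypothetical helper: builds a key from where a string lives,
# not from what the string says.
def positional_key(script_id, level_id, position)
  "script_#{script_id}_level_#{level_id}_str_#{position}"
end

# Assumed numbering: strings are counted from 1 in the order they appear.
texts = ['Yummy!', "That's my computer! I code on it ALL the time!"]
keys = texts.each_with_index.map { |_text, i| positional_key(11, 399, i + 1) }
```

A key like this survives edits to the wording, but it would point at a different string if blocks in the level were reordered or a new block were inserted before it.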

hacodeorg · 2020-10-01

We use string contents as IDs for function_definitions and behavior_names. Is that because those strings are usually short and contain only alphabetic characters?

code-dot-org/bin/i18n/sync-in.rb

Line 91 in 49a8ed6

i18n_strings['function_definitions'][name.content] = function_definition

code-dot-org/bin/i18n/sync-in.rb

Line 99 in 49a8ed6

i18n_strings['behavior_names'][name.content] = name.content if name

Hamms · 2020-10-01

It's because we were also unable to find a better option there. 🙃 Like I said, we've done this in the past but it's ended up being more fragile than we'd like.

An MD5 hash does address the issues of potential collisions, but we're still ending up with an identifier that's dependent on the content, rather than an identifier that can consistently identify content as it changes. That might be too much to ask for, though.

I'd love to see at least a mockup of the other end of this functionality; the code that's responsible for finding a translation given a block. I think that'll give us a better sense of which direction is best to go here.
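The trade-off discussed in this thread can be illustrated with a short sketch. `content_key` reproduces the key expression from the sync-in.rb excerpt quoted at the top of the thread; `md5_key` and the sample strings are assumptions for illustration.

```ruby
require 'digest/md5'

# Key expression from the first revision of this PR (quoted above).
def content_key(text)
  text.gsub(/[^a-zA-Z0-9_ ]/, '').split.join('_').downcase
end

# Hypothetical helper for the MD5 alternative.
def md5_key(text)
  Digest::MD5.hexdigest(text)
end

a = 'Yummy!'
b = 'Yummy?'

# Different strings collapse to the same content-derived key...
same_content_key = content_key(a) == content_key(b)
# ...while their MD5 keys stay distinct.
same_md5_key = md5_key(a) == md5_key(b)

# Any edit to the source string produces a new MD5 key, so a translation
# stored under the old key is no longer found.
key_survives_edit = md5_key('Yummy!!') == md5_key(a)
```

In this sketch `same_content_key` is true, while `same_md5_key` and `key_survives_edit` are both false. MD5 removes the collision but leaves the key dependent on the content.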

hacodeorg · 2020-10-02

The sync-out and rendering pieces of this functionality are shorter than I thought, so I added them to this PR.

The rendering piece still uses string content as ID for now, just so we can verify it can render translations correctly.

hacodeorg · 2020-10-05T18:02:54Z

Elijah and I discussed this PR further on Slack and decided to go with an MD5-key solution for now. We will explore a generalizable way to easily add unique, reproducible identifiers to XML (in this case, .level files).
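With MD5 keys, finding a translation for a block amounts to hashing the source text and looking the digest up under the level name. The sketch below assumes the translations hash mirrors the placeholder_texts.<locale>.json layout from the PR description and falls back to the source text when no translation exists. The function name and the fallback behavior are assumptions, not the PR's implementation.

```ruby
require 'digest/md5'

# Hypothetical lookup: `translations` mirrors the layout of
# dashboard/config/locales/placeholder_texts.<locale>.json.
def localized_placeholder_text(translations, locale, level_name, source_text)
  key = Digest::MD5.hexdigest(source_text)
  translations.dig(locale, 'data', 'placeholder_texts', level_name, key) || source_text
end

# The key is computed here rather than copied from the PR, so the example
# does not depend on how the real script derives its digests.
source = 'Yummy!'
translations = {
  'vi-VN' => {
    'data' => {
      'placeholder_texts' => {
        'courseE_aboutme_1_2020' => {Digest::MD5.hexdigest(source) => 'ngon!'}
      }
    }
  }
}

translated = localized_placeholder_text(translations, 'vi-VN', 'courseE_aboutme_1_2020', source)
fallback = localized_placeholder_text(translations, 'vi-VN', 'courseE_aboutme_1_2020', 'Untranslated text')
```

Here `translated` is 'ngon!' and `fallback` is the unchanged source string.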

Hamms

I'd love to see a test here! Specifically a test of the localized_text_blocks functionality in https://github.com/code-dot-org/code-dot-org/blob/staging/dashboard/test/models/blockly_test.rb

Otherwise, this looks great! Thanks for taking the time to dig into some options here.

hacodeorg · 2020-10-07T09:54:34Z

Thank you for the pointer. Test added.

hacodeorg requested review from Hamms and a team September 30, 2020 03:13

hacodeorg marked this pull request as ready for review September 30, 2020 04:12

Hamms reviewed Sep 30, 2020

hacodeorg changed the title from "Extract placeholder texts from .level files" to "Extract and translate placeholder texts" Oct 2, 2020

Hamms approved these changes Oct 6, 2020

hacodeorg added 4 commits October 7, 2020 12:15

Extract placeholder texts from level XML files

9048910

Use placeholder text translations

1becb7f

Use a MD5 hash of a placeholder string to be the string unique identifier.

12fbb93

Add test for localized_text_blocks

0124af5

hacodeorg force-pushed the ha/placeholder-text-sync-in branch from a06ae48 to 0124af5 Compare October 7, 2020 09:50

hacodeorg merged commit 8068b04 into staging Oct 7, 2020

hacodeorg deleted the ha/placeholder-text-sync-in branch October 7, 2020 17:10

hacodeorg mentioned this pull request Jan 18, 2021

Fix placeholder text translation #38618

Merged
