-
Notifications
You must be signed in to change notification settings - Fork 481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract and translate placeholder texts #36983
Conversation
bin/i18n/sync-in.rb
Outdated
next unless text_title&.content =~ /[a-zA-Z]{3,}/ | ||
|
||
# Use only alphanumeric characters in lower cases as string key | ||
text_key = text_title.content.gsub(/[^a-zA-Z0-9_ ]/, '').split.join('_').downcase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love the idea of inferring an identifier from the content of the string. When we've done things like this in the past, it ends up causing problems when the content changes and strings unexpectedly go missing, or when similar content is used in multiple places and the mapping ends up being non-unique.
Is there anything else we could use as a unique identifier here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using a MD5 hash? It will keep an 1:1 relationship between an ID and a string.
Another option is to use a combination of script id, level id and string position, such as script_11_level_399_str_1
.
Did we use any of the above options in the past?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use string contents as IDs for function_definitions
and behavior_names
, is that because those strings are usually short and contain only alphabetic characters?
code-dot-org/bin/i18n/sync-in.rb
Line 91 in 49a8ed6
i18n_strings['function_definitions'][name.content] = function_definition |
code-dot-org/bin/i18n/sync-in.rb
Line 99 in 49a8ed6
i18n_strings['behavior_names'][name.content] = name.content if name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because we were also unable to find a better option there. 🙃 Like I said, we've done this in the past but it's ended up being more fragile than we'd like.
An MD5 hash does address the issues of potential collisions, but we're still ending up with an identifier that's dependent on the content, rather than an identifier that can consistently identify content as it changes. That might be too much to ask for, though.
I'd love to see at least a mockup of the other end of this functionality; the code that's responsible for finding a translation given a block. I think that'll give us a better sense of which direction is best to go here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sync-out and rendering pieces of this functionality is shorter than I thought so I add them to this PR.
The rendering piece still uses string content as ID for now, just so we can verify it can render translations correctly.
Elijah and I discussed this PR further on Slack and decided to go with a MD5-key solution for now. We will explore a generalizable way to easily add unique, reproducible identifiers to XML (in this case .level file). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd love to see a test here! Specifically a test of the localized_text_blocks
functionality in https://github.com/code-dot-org/code-dot-org/blob/staging/dashboard/test/models/blockly_test.rb
Otherwise, this looks great! Thanks for taking the time to dig into some options here
a06ae48
to
0124af5
Compare
Thank you for the pointer. Test added. |
Task FND-1209:
This is the first step to enable translation of placeholder texts. We extract placeholder texts from
.level
files during the i18nsync-in
step.Since placeholder texts can be empty strings, binary numbers or just several question marks, we require the strings to have at least 3 consecutive alphabetic characters.
Example
A puzzle with placeholder texts https://studio.code.org/s/coursee-2020/stage/9/puzzle/1:
Those texts are defined in a .level file:
code-dot-org/dashboard/config/scripts/levels/courseE_aboutme_1_2020.level
Lines 129 to 131 in 8e8171c
After
sync-in
step, placeholder texts are extracted toi18n/locales/source/course_content/2020/coursee-2020.json
:After
sync-down
step, translations for placeholder texts are downloaded toi18n/locales/<locale>/course_content/2020/coursee-2020.json
.After
sync-out
step, the translations are distributed todashboard/config/locales/placeholder_texts.<locale>.json
.Example of
dashboard/config/locales/placeholder_texts.vi-VN.json
:Rendering the translations:
Testing story
bin/i18n/sync-in.rb
to extract placeholder strings from dashboard/config/scripts/levels/courseE_aboutme_1_2020.level file./tmp/codeorg_changes.json
,/tmp/codeorg-markdown_changes.json
,/tmp/hour-of-code_changes.json
with content.bin/i18n/sync-out.rb
to distribute translations to dashboard/config/locales/placeholder_texts.vi-VN.json.Reviewer Checklist: