Sync i18n without merge #23706

Hamms · 2018-07-13T22:49:58Z

Context

We have a desire in our system to ensure that when a source string is updated, we don't lose translation data for existing strings. Since most source string changes are minor tweaks like typo fixes or curriculum adjustments, this allows us to make those changes without a cost in translation saturation.

However! We are unfortunately doing this work twice; once as part of the sync up to Crowdin, which we do with the update_as_unapproved option (which just switches the translations from "approved" to "unapproved" when we update the source string, but doesn't actually discard them) and again during the sync out from Crowdin, where we use merge-translation.rb, in which we decline to update a given string if the translation data received from Crowdin is identical to the English source string.

Unfortunately, what this means is that when we update English source strings, any languages for which that string has not been translated will retain the old source string. This leads to quite a bit of badness, some of which is significant:

diff --git a/pegasus/cache/i18n/ar-SA.yml b/pegasus/cache/i18n/ar-SA.yml
index a0456a6a301..2d671ac8429 100644
--- a/pegasus/cache/i18n/ar-SA.yml
+++ b/pegasus/cache/i18n/ar-SA.yml
@@ -44,7 +44,7 @@ ar-SA:
   csedweek_og_description: الساعة البرنامجية هي حركة عالمية تصل الى عشرات الملايين
     من الطلاب على كافة الاعمار في اكثر من مءة <U+200F>وثمانين لغة وفِي كثر من خمسة وأربعين
     دولة.
-  csedweek_og_description_soon: The Hour of Code is coming December 4-10, 2017. Computer
+  csedweek_og_description_soon: The Hour of Code is coming December 3-9, 2018. Computer
     science is foundational for every student to learn.
   csedweek_og_description_here: The Hour of Code is here. Computer science is foundational
     for every student to learn.

Unfortunately, there are also a handful of places where we have translation data that only exists in these files:

diff --git a/pegasus/cache/i18n/ar-SA.yml b/pegasus/cache/i18n/ar-SA.yml
index a0456a6a301..2d671ac8429 100644
--- a/pegasus/cache/i18n/ar-SA.yml
+++ b/pegasus/cache/i18n/ar-SA.yml
@@ -673,8 +668,8 @@ ar-SA:
   volunteer_engineer_submission_field_email_placeholder: البريد الإلكتروني
   volunteer_engineer_submission_field_email_preference: هل يمكننا إرسال بريد إلكتروني
     إليك حول تحديثات دوراتنا أو الفرص المحلية أو أخبار علوم الكمبيوتر الأخرى؟
-  volunteer_engineer_submission_field_email_preference_privacy: "(لمزيد من التفاصيل
-    انظر سياسة الخصوصية الخاصة بنا)"
+  volunteer_engineer_submission_field_email_preference_privacy: "(See our privacy
+    policy)"
   volunteer_engineer_submission_field_email_preference_yes: نعم
   volunteer_engineer_submission_field_email_preference_no: لا
   volunteer_engineer_submission_final_paragraph: سيقع إدراجكم بالخريطة لتمكين المعلمين

Which means that by merging this, we will indeed lose some translation data.

Process

These changes were originally generated by running a sync with the following code change:

diff --git a/bin/i18n-codeorg/lib/merge-translation.rb b/bin/i18n-codeorg/lib/merge-translation.rb
index d87f13f0274..21089e96db4 100644
--- a/bin/i18n-codeorg/lib/merge-translation.rb
+++ b/bin/i18n-codeorg/lib/merge-translation.rb
@@ -14,10 +14,10 @@ def merge_translation_tree(en_translation, new_translation, prev_translation)
   if !new_translation.is_a?(Hash)
     # Leaf node, a translation.
     # New translation equals to English, and old translation already exists
-    if !prev_translation.nil? && new_translation == en_translation &&
-       new_translation != prev_translation
-      new_translation = prev_translation
-    end
+    #if !prev_translation.nil? && new_translation == en_translation &&
+    #   new_translation != prev_translation
+    #  new_translation = prev_translation
+    #end
     # Crowdin escapes carraige returns, so restore them here
     if new_translation.is_a?(String)
       new_translation = new_translation.gsub(/\\r/, "\r")

Changes were then added in commits that break everything up by content type. When reviewing this (massive) PR, I recommend checking commit-by-commit. The Instructions and Markdown Instructions commits are likely going to be too big to even view in github, but the rest of them should give you at least an overview of what kind of changes are going to happen.

I manually reviewed changes in a handful of files and documented my findings here. I was able to identify a couple of instances in which we loaded in translations directly into the project, and we're working on making sure all the data from those changes gets into Crowdin.

At Tanya's suggestion, I also specifically examined the Minecraft tutorials and /learn in our top ten languages. I found a total of two translations (one in Italian, one in German) which would be lost by these changes, but no more (also added those strings to Crowdin).

Conclusion

This PR is going to fix a lot of bad english strings in other locales throughout our system, and also cause us to lose some translation data. But the ratio of bad to good is strongly in favor of good, and with our spot-checks of critical/risky content, I'm convinced that we are unlikely to lose any high-profile translation data.

Next steps

Remove the merge-translations tool entirely from our merge
Update Crowdin settings so we only bother to sync-out translated strings, not English.

Hamms · 2018-07-14T00:50:19Z

Also going to spot-check the course 1-4 translations before merging

Hamms · 2018-07-17T00:46:01Z

single source of truth for translations

Hamms · 2018-07-17T01:05:45Z

figure out what's up with the outdated apps translations

Hamms · 2018-07-17T01:08:59Z

Investigate: Some strings are not appearing in crowdin

Hamms added 30 commits July 10, 2018 14:39

pegasus i18n updates

8d4c4a5

dashboard i18n updates - Hebrew

cc3fed4

dashboard i18n updates - Italian

c6d4fe4

dashboard i18n updates - Japanese

22a9a55

dashboard i18n updates - Portuguese, Brazilian

b52999b

dashboard i18n updates - Slovak

8c45bdc

dashboard i18n updates - Chinese Simplified

0b027bf

apps i18n updates

bfc3c44

pegasus sync out without merge

aa4be3b

dashboard unplugged sync out without merge

d3ad0ea

dashboard slides sync out without merge

1b36365

dashboard scripts sync out without merge

1d16391

dashboard markdown instructions sync out without merge

679cac1

dashboard instructions sync out without merge

e2db7a8

dashboard failure message overrides sync out without merge

c7520b7

dashboard dsls sync out without merge

3a1b278

dashboard devise sync out without merge

7ee5d28

dashboard data sync out without merge

e09258c

dashboard courses sync out without merge

cdba1e9

dashboard callouts sync out without merge

023b1a4

dashboard authored hints sync out without merge

ba8a9f8

dashboard base sync out without merge

2b26363

apps tutorial explorer sync out without merge

6fb4d54

apps artist sync out without merge

008d1f8

apps playlab sync out without merge

c1a5839

apps maze sync out without merge

4231dfe

apps jigsaw sync out without merge

aab3ff2

apps flappy sync out without merge

5399ac8

apps craft sync out without merge

2b29713

apps common sync out without merge

7c2a250

apps bounce sync out without merge

a8a4c60

Hamms requested a review from tanyaparker July 14, 2018 00:49

tanyaparker approved these changes Jul 16, 2018

View reviewed changes

Hamms merged commit 8ee3ce3 into staging Jul 16, 2018

Hamms deleted the test-no-merge-4 branch July 16, 2018 22:18

Hamms mentioned this pull request Jul 16, 2018

Remove merge operation from i18n sync #23730

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sync i18n without merge #23706

Sync i18n without merge #23706

Hamms commented Jul 13, 2018 •

edited

Hamms commented Jul 14, 2018

Hamms commented Jul 17, 2018

Hamms commented Jul 17, 2018

Hamms commented Jul 17, 2018

Sync i18n without merge #23706

Sync i18n without merge #23706

Conversation

Hamms commented Jul 13, 2018 • edited

Context

Process

Conclusion

Next steps

Hamms commented Jul 14, 2018

Hamms commented Jul 17, 2018

Hamms commented Jul 17, 2018

Hamms commented Jul 17, 2018

Hamms commented Jul 13, 2018 •

edited