Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync i18n without merge #23706

Merged
merged 31 commits into from Jul 16, 2018
Merged

Sync i18n without merge #23706

merged 31 commits into from Jul 16, 2018

Conversation

Hamms
Copy link
Contributor

@Hamms Hamms commented Jul 13, 2018

Context

We have a desire in our system to ensure that when a source string is updated, we don't lose translation data for existing strings. Since most source string changes are minor tweaks like typo fixes or curriculum adjustments, this allows us to make those changes without a cost in translation saturation.

However! We are unfortunately doing this work twice; once as part of the sync up to Crowdin, which we do with the update_as_unapproved option (which just switches the translations from "approved" to "unapproved" when we update the source string, but doesn't actually discard them) and again during the sync out from Crowdin, where we use merge-translation.rb, in which we decline to update a given string if the translation data received from Crowdin is identical to the English source string.

Unfortunately, what this means is that when we update English source strings, any languages for which that string has not been translated will retain the old source string. This leads to quite a bit of badness, some of which is significant:

diff --git a/pegasus/cache/i18n/ar-SA.yml b/pegasus/cache/i18n/ar-SA.yml
index a0456a6a301..2d671ac8429 100644
--- a/pegasus/cache/i18n/ar-SA.yml
+++ b/pegasus/cache/i18n/ar-SA.yml
@@ -44,7 +44,7 @@ ar-SA:
   csedweek_og_description: الساعة البرنامجية هي حركة عالمية تصل الى عشرات الملايين
     من الطلاب على كافة الاعمار في اكثر من مءة <U+200F>وثمانين لغة وفِي كثر من خمسة وأربعين
     دولة.
-  csedweek_og_description_soon: The Hour of Code is coming December 4-10, 2017. Computer
+  csedweek_og_description_soon: The Hour of Code is coming December 3-9, 2018. Computer
     science is foundational for every student to learn.
   csedweek_og_description_here: The Hour of Code is here. Computer science is foundational
     for every student to learn.

Unfortunately, there are also a handful of places where we have translation data that only exists in these files:

diff --git a/pegasus/cache/i18n/ar-SA.yml b/pegasus/cache/i18n/ar-SA.yml
index a0456a6a301..2d671ac8429 100644
--- a/pegasus/cache/i18n/ar-SA.yml
+++ b/pegasus/cache/i18n/ar-SA.yml
@@ -673,8 +668,8 @@ ar-SA:
   volunteer_engineer_submission_field_email_placeholder: البريد الإلكتروني
   volunteer_engineer_submission_field_email_preference: هل يمكننا إرسال بريد إلكتروني
     إليك حول تحديثات دوراتنا أو الفرص المحلية أو أخبار علوم الكمبيوتر الأخرى؟
-  volunteer_engineer_submission_field_email_preference_privacy: "(لمزيد من التفاصيل
-    انظر سياسة الخصوصية الخاصة بنا)"
+  volunteer_engineer_submission_field_email_preference_privacy: "(See our privacy
+    policy)"
   volunteer_engineer_submission_field_email_preference_yes: نعم
   volunteer_engineer_submission_field_email_preference_no: لا
   volunteer_engineer_submission_final_paragraph: سيقع إدراجكم بالخريطة لتمكين المعلمين

Which means that by merging this, we will indeed lose some translation data.

Process

These changes were originally generated by running a sync with the following code change:

diff --git a/bin/i18n-codeorg/lib/merge-translation.rb b/bin/i18n-codeorg/lib/merge-translation.rb
index d87f13f0274..21089e96db4 100644
--- a/bin/i18n-codeorg/lib/merge-translation.rb
+++ b/bin/i18n-codeorg/lib/merge-translation.rb
@@ -14,10 +14,10 @@ def merge_translation_tree(en_translation, new_translation, prev_translation)
   if !new_translation.is_a?(Hash)
     # Leaf node, a translation.
     # New translation equals to English, and old translation already exists
-    if !prev_translation.nil? && new_translation == en_translation &&
-       new_translation != prev_translation
-      new_translation = prev_translation
-    end
+    #if !prev_translation.nil? && new_translation == en_translation &&
+    #   new_translation != prev_translation
+    #  new_translation = prev_translation
+    #end
     # Crowdin escapes carraige returns, so restore them here
     if new_translation.is_a?(String)
       new_translation = new_translation.gsub(/\\r/, "\r")

Changes were then added in commits that break everything up by content type. When reviewing this (massive) PR, I recommend checking commit-by-commit. The Instructions and Markdown Instructions commits are likely going to be too big to even view in github, but the rest of them should give you at least an overview of what kind of changes are going to happen.

I manually reviewed changes in a handful of files and documented my findings here. I was able to identify a couple of instances in which we loaded in translations directly into the project, and we're working on making sure all the data from those changes gets into Crowdin.

At Tanya's suggestion, I also specifically examined the Minecraft tutorials and /learn in our top ten languages. I found a total of two translations (one in Italian, one in German) which would be lost by these changes, but no more (also added those strings to Crowdin).

Conclusion

This PR is going to fix a lot of bad english strings in other locales throughout our system, and also cause us to lose some translation data. But the ratio of bad to good is strongly in favor of good, and with our spot-checks of critical/risky content, I'm convinced that we are unlikely to lose any high-profile translation data.

Next steps

  • Remove the merge-translations tool entirely from our merge
  • Update Crowdin settings so we only bother to sync-out translated strings, not English.

@Hamms Hamms requested a review from tanyaparker July 14, 2018 00:49
@Hamms
Copy link
Contributor Author

Hamms commented Jul 14, 2018

Also going to spot-check the course 1-4 translations before merging

@Hamms Hamms merged commit 8ee3ce3 into staging Jul 16, 2018
@Hamms Hamms deleted the test-no-merge-4 branch July 16, 2018 22:18
@Hamms
Copy link
Contributor Author

Hamms commented Jul 17, 2018

@Hamms
Copy link
Contributor Author

Hamms commented Jul 17, 2018

@Hamms
Copy link
Contributor Author

Hamms commented Jul 17, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants