
Upload string translation status to Redshift #40738

Merged: 12 commits into staging, Aug 25, 2021

Conversation

daynew
Member

@daynew daynew commented May 24, 2021

We want a Redshift table we can query to tell whether a given string_key has been translated into a given language. This PR queries all the known string_keys, checks their translation status in every language, and then uploads that status to the analysis.i18n_string_translation_status Redshift table.

One thing to note: before we upload the translation status for a string_key, we first delete its existing data. This keeps the code simple by avoiding having to figure out whether our operation is an insert or an update; if we always delete the existing data first, adding the fresh data to Redshift is always an insert. An alternative is to follow the Redshift merge examples, but that complexity seems unnecessary for our use case.
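The delete-then-insert strategy can be sketched roughly as follows. This is an illustration only: the client object (anything responding to #exec with a SQL string) and the column names are assumptions, not the PR's actual helpers.

```ruby
# Minimal sketch of the delete-then-insert strategy. The client object
# and column names are illustrative assumptions.
TABLE = 'analysis.i18n_string_translation_status'.freeze

def upsert_translation_statuses(client, rows)
  # Delete any existing rows for the string_keys we are about to insert,
  # so the INSERT below can never produce duplicate rows.
  keys = rows.map {|r| "'#{r[:string_key]}'"}.join(', ')
  client.exec("DELETE FROM #{TABLE} WHERE string_key IN (#{keys})")

  # Build a multi-row VALUES clause and insert the fresh data.
  values = rows.map do |r|
    "('#{r[:string_key]}', '#{r[:locale]}', #{r[:translated]})"
  end.join(', ')
  client.exec("INSERT INTO #{TABLE} (string_key, locale, translated) VALUES #{values}")
end
```

Because the delete always runs first, the insert never has to distinguish new keys from existing ones.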

Links

Testing story

Ran locally and confirmed records were deleted and then re-inserted.

Output of running the script.


Redshift query showing how many rows were inserted


Sample Redshift data


Deployment strategy

Follow-up work

Privacy

Security

Caching

PR Checklist:

  • Tests provide adequate coverage
  • Privacy and Security impacts have been assessed
  • Code is well-commented
  • New features are translatable or updates will not break translations
  • Relevant documentation has been added or updated
  • User impact is well-understood and desirable
  • Pull Request is labeled appropriately
  • Follow-up work items (including potential tech debt) are tracked and linked

@daynew daynew requested a review from hacodeorg May 24, 2021 23:24
@daynew daynew requested a review from a team as a code owner May 24, 2021 23:24
@annaxuphoto
Contributor

The testing story is nice, but are there unit test cases we can add as well? Also, are there any cost concerns with deleting and re-inserting entries?

# Record the translation status for every string_key, e.g. "data.script.name.express-2020.title"
string_keys.each do |string_key|
# Guard against bad data: the database may contain an empty string_key
next if string_key.blank?
Contributor


If this ever happens, should we know about it? You could fire a Honeybadger error event and keep going, or let the script raise an exception, which will be reported to Honeybadger automatically.
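The report-and-continue variant of this suggestion could look like the sketch below. The reporter callable is injected here to keep the sketch self-contained and testable; in the codebase it would presumably be Honeybadger.notify, and the helper name and message are illustrative.

```ruby
# Sketch of the reviewer's suggestion: report blank keys instead of
# silently skipping them. The reporter is injected for testability;
# in production it would be the error tracker (e.g. Honeybadger.notify).
def filter_blank_keys(string_keys, reporter)
  string_keys.reject do |key|
    blank = key.to_s.strip.empty?
    reporter.call('Blank string_key found while updating translation status') if blank
    blank
  end
end
```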

Member Author


I don't think it's an issue worth distracting developers to resolve.

end
# Create the rows to be inserted in the analysis.i18n_string_translation_status Redshift table
# https://docs.aws.amazon.com/redshift/latest/dg/c_Examples_of_INSERT_30.html
values = translation_statuses.map do |status|
Contributor


How many rows do you expect translation_statuses to have (number of unique string keys in 90 days x number of locales)? If it is a significant number, are there any risks in keeping everything in memory (resource contention on production-daemon) or sending a long query string to Redshift?

Member Author


I don't know about the risks, but I do know that the query ran successfully. If this starts failing in the future, we can modify it to insert in batches. I'd like to keep it simple for now and make it more robust once we actually start hitting limitations.
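The batching fallback mentioned here could be sketched as below. The 1,000-row batch size is an arbitrary illustration (not a Redshift limit), and the client object, table, and column names are assumptions.

```ruby
# Sketch of the batching fallback: insert rows in fixed-size slices so
# no single query string grows unboundedly. Batch size, client, and
# column names are illustrative assumptions.
def insert_in_batches(client, table, rows, batch_size: 1000)
  rows.each_slice(batch_size) do |batch|
    values = batch.map do |r|
      "('#{r[:string_key]}', '#{r[:locale]}', #{r[:translated]})"
    end.join(', ')
    client.exec("INSERT INTO #{table} (string_key, locale, translated) VALUES #{values}")
  end
end
```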

# Delete the existing translation status rows before inserting them. We do this because Redshift
# doesn't have unique keys, so if we inserted data for a string_key which already exists, there
# would be two rows in the table for that string_key.
delete_translation_status(redshift_client, day_count)
insert_translation_status(redshift_client, day_count)
Contributor


What would happen if the deletion succeeds but the insertion fails? Can we recover?

Member Author


The recovery would be to run the script again. There isn't a backup, but I don't think that's a big deal because we just need the script to run again successfully; it will add back all the data that was deleted.
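An alternative to re-running the script (not part of this PR) would be to wrap the delete and insert in an explicit transaction so a failed insert rolls the delete back. Redshift supports BEGIN/COMMIT/ROLLBACK; the client object here is an assumption, and the two helpers are the ones this diff already calls.

```ruby
# Sketch: make delete + insert atomic with an explicit transaction so a
# failed insert rolls the deletion back. The client object is an
# illustrative assumption.
def replace_translation_status(redshift_client, day_count)
  redshift_client.exec('BEGIN')
  delete_translation_status(redshift_client, day_count)
  insert_translation_status(redshift_client, day_count)
  redshift_client.exec('COMMIT')
rescue StandardError
  redshift_client.exec('ROLLBACK')
  raise
end
```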

end

# This script is scheduled to run regularly on the i18n-dev server.
update_translation_status(90)
Contributor


Could you add an explanation for the value 90? It isn't clear to me why we have to go back 3 months if this script runs every day.

Member Author


Oh, I was doing this because the existing data is pretty old. I turned off the logging because we have some pending performance improvements assigned to me this sprint. I will make sure to check in a more reasonable value, like 7 days.

@daynew daynew force-pushed the fnd-1508-update-i18n-translation-status-redshift-table branch from 7cd1594 to 405efd5 on July 2, 2021 12:24
@daynew
Member Author

daynew commented Jul 2, 2021

The testing story is nice, but are there unit test cases we can add as well? Also, are there any cost concerns with deleting and re-inserting entries?

I have added some unit tests. Let me know if they match what you had in mind.

updater = I18n::TranslationStatus::Updater.new
updater.insert_translation_status(redshift, translation_service, locales, string_keys, current_time)
end
end
Contributor


Thanks for adding unit tests and figuring out mocks for Redshift! Could we add one more test for updating entries, checking that we delete the existing data and insert the new data correctly?

Member Author


I added a test!

Contributor

@annaxuphoto annaxuphoto left a comment


Thanks for accommodating all my requests!

@daynew daynew merged commit 1adc04e into staging Aug 25, 2021
@daynew daynew deleted the fnd-1508-update-i18n-translation-status-redshift-table branch August 25, 2021 17:19