Improve efficiency of "warm" updates for very large courses #5109

Open
ragesoss opened this issue Aug 10, 2022 · 0 comments

This example course has more than 1 million edits spread across several wikis, and more than 50 thousand Commons uploads: https://outreachdashboard.wmflabs.org/courses/Wikimedia_Belgium/Wiki_Loves_Heritage_Belgium_(2021)

A "cold" update (performed by importing the course to a database without any of the revisions or related data) takes a very long time (on the order of a day). A "warm" update (done by running UpdateCourseStats again after an update has been completed, so that there are no new revisions or other records that aren't already in the database) takes just under an hour on my machine.

The vast majority of updates in production are warm updates, at least to some degree, so making them more efficient would significantly reduce production load and/or improve update speed.

With the debug_updates flag set, we get logs showing when each step of the warm update completed:

 @end_time=Wed, 10 Aug 2022 18:35:37.833946750 UTC +00:00,
 @error_count=0,
 @full_update=false,
 @sentry_logs=
  {:start=>Wed, 10 Aug 2022 17:37:49.380506019 UTC +00:00,
   :revisions_imported=>Wed, 10 Aug 2022 17:41:28.702692768 UTC +00:00,
   :revision_scores_imported=>Wed, 10 Aug 2022 17:44:39.357237938 UTC +00:00,
   :uploads_imported=>Wed, 10 Aug 2022 18:11:27.268229550 UTC +00:00,
   :categories_updated=>Wed, 10 Aug 2022 18:11:27.281185546 UTC +00:00,
   :article_status_updated=>Wed, 10 Aug 2022 18:12:13.996681757 UTC +00:00,
   :average_pageviews_updated=>Wed, 10 Aug 2022 18:12:14.782478096 UTC +00:00,
   :articles_courses_updated=>Wed, 10 Aug 2022 18:28:53.849961217 UTC +00:00,
   :courses_users_updated=>Wed, 10 Aug 2022 18:34:33.211676875 UTC +00:00,
   :course_cache_updated=>Wed, 10 Aug 2022 18:34:50.161216936 UTC +00:00,
   :wikidata_stats_updated=>Wed, 10 Aug 2022 18:35:37.805343836 UTC +00:00},
 @sentry_tag_uuid="8ec5d8cf-b9ab-4bfc-a816-f77352578e0a",
 @start_time=Wed, 10 Aug 2022 17:37:49.379815055 UTC +00:00>

Time taken (in seconds):

  • revisions_imported: 219
  • revision_scores_imported: 191
  • uploads_imported: 1608
  • categories_updated: 0
  • article_status_updated: 47
  • average_pageviews_updated: 1
  • articles_courses_updated: 999
  • courses_users_updated: 339
  • course_cache_updated: 17
  • wikidata_stats_updated: 48
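
For reference, the figures above are the differences between consecutive timestamps in the log. A minimal sketch of that arithmetic, assuming the timestamps are copied out by hand (truncated to milliseconds here); `LOG` and `step_durations` are illustrative names, not part of the dashboard codebase:

```ruby
require 'time'

# Timestamps copied from the debug log above, truncated to milliseconds.
# LOG is a hypothetical constant for this sketch only.
LOG = {
  start:                     '2022-08-10 17:37:49.380 UTC',
  revisions_imported:        '2022-08-10 17:41:28.702 UTC',
  revision_scores_imported:  '2022-08-10 17:44:39.357 UTC',
  uploads_imported:          '2022-08-10 18:11:27.268 UTC',
  categories_updated:        '2022-08-10 18:11:27.281 UTC',
  article_status_updated:    '2022-08-10 18:12:13.996 UTC',
  average_pageviews_updated: '2022-08-10 18:12:14.782 UTC',
  articles_courses_updated:  '2022-08-10 18:28:53.849 UTC',
  courses_users_updated:     '2022-08-10 18:34:33.211 UTC',
  course_cache_updated:      '2022-08-10 18:34:50.161 UTC',
  wikidata_stats_updated:    '2022-08-10 18:35:37.805 UTC'
}.freeze

# Returns { step => whole seconds elapsed since the previous step finished }.
def step_durations(log)
  times = log.transform_values { |stamp| Time.parse(stamp) }
  times.each_cons(2).to_h do |(_, previous), (step, finished)|
    [step, (finished - previous).round]
  end
end

# Print the steps from slowest to fastest, with each step's share of the total.
durations = step_durations(LOG)
total = durations.values.sum
durations.sort_by { |_, seconds| -seconds }.each do |step, seconds|
  puts format('%-28s %5d s  %5.1f%%', step, seconds, 100.0 * seconds / total)
end
```

Sorted this way, uploads_imported, articles_courses_updated and courses_users_updated together account for 2946 of the 3469 seconds, so those three steps look like the most promising targets.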

How can we speed up the slow steps?
