Update level_sources gh-ost migration #41927

Merged
sureshc merged 2 commits into staging from update-level-sources-gh-ost-migration on Jan 3, 2022

sureshc commented Aug 6, 2021

Refresh the gh-ost level_sources migration (#30425), which we were never able to run successfully back in 2019. Modify it to run against the primary database rather than the replica, which no longer exists.

Links

Testing story

Executed in dry run mode:

2021-08-06 19:40:13 INFO starting gh-ost 1.1.2
2021-08-06 19:40:13 INFO Migrating `dashboard_production`.`level_sources`
2021-08-06 19:40:14 INFO inspector connection validated on db-production.code.org:3306
2021-08-06 19:40:14 INFO User has REPLICATION CLIENT, REPLICATION SLAVE privileges, and has ALL privileges on `dashboard_production`.*
2021-08-06 19:40:14 INFO binary logs validated on db-production.code.org:3306
2021-08-06 19:40:14 INFO Inspector initiated on ip-172-17-1-15:3306, version 5.7.12-log
2021-08-06 19:40:14 INFO Table found. Engine=InnoDB
2021-08-06 19:40:14 INFO Estimated number of rows via EXPLAIN: 1202720564
2021-08-06 19:40:14 INFO Master forced to be db-production.code.org:3306
2021-08-06 19:40:14 INFO log_slave_updates validated on db-production.code.org:3306
2021-08-06 19:40:14 INFO streamer connection validated on db-production.code.org:3306
2021-08-06 19:40:14 INFO Connecting binlog streamer at mysql-bin-changelog.018611:3508389
[2021/08/06 19:40:14] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql db-production.code.org 3306 db    false false <nil> false UTC true 0 0s 0s 0 false}
[2021/08/06 19:40:14] [info] binlogsyncer.go:354 begin to sync binlog from position (mysql-bin-changelog.018611, 3508389)
[2021/08/06 19:40:14] [info] binlogsyncer.go:203 register slave for master server db-production.code.org:3306
2021-08-06 19:40:14 INFO applier connection validated on db-production.code.org:3306
2021-08-06 19:40:14 INFO rotate to next log from mysql-bin-changelog.018611:0 to mysql-bin-changelog.018611
[2021/08/06 19:40:14] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.018611, 3508389)
2021-08-06 19:40:14 INFO applier connection validated on db-production.code.org:3306
2021-08-06 19:40:14 INFO will use time_zone='SYSTEM' on applier
2021-08-06 19:40:14 INFO Examining table structure on applier
2021-08-06 19:40:14 INFO Applier initiated on ip-172-17-1-15:3306, version 5.7.12-log
2021-08-06 19:40:14 INFO Dropping table `dashboard_production`.`_level_sources_ghc`
2021-08-06 19:40:14 INFO Table dropped
2021-08-06 19:40:14 INFO Creating changelog table `dashboard_production`.`_level_sources_ghc`
2021-08-06 19:40:14 INFO Changelog table created
2021-08-06 19:40:14 INFO Creating ghost table `dashboard_production`.`_level_sources_gho`
2021-08-06 19:40:14 INFO Ghost table created
2021-08-06 19:40:14 INFO Altering ghost table `dashboard_production`.`_level_sources_gho`
2021-08-06 19:40:14 INFO Ghost table altered
2021-08-06 19:40:14 INFO Altering ghost table AUTO_INCREMENT value `dashboard_production`.`_level_sources_gho`
2021-08-06 19:40:14 INFO Ghost table AUTO_INCREMENT altered
2021-08-06 19:40:14 INFO Intercepted changelog state GhostTableMigrated
2021-08-06 19:40:14 INFO Created postpone-cut-over-flag-file: /tmp/gh-ost.cutover
2021-08-06 19:40:14 INFO Waiting for ghost table to be migrated. Current lag is 0s
2021-08-06 19:40:14 INFO Handled changelog state GhostTableMigrated
2021-08-06 19:40:14 INFO Chosen shared unique key is PRIMARY
2021-08-06 19:40:14 INFO Shared columns are id,level_id,md5,data,created_at,updated_at,hidden
2021-08-06 19:40:14 INFO Listening on unix socket file: /tmp/gh-ost.dashboard_production.level_sources.sock
2021-08-06 19:40:14 INFO Migration min values: [1]
2021-08-06 19:40:14 INFO Migration max values: [1418354673]
2021-08-06 19:40:14 INFO Waiting for first throttle metrics to be collected
2021-08-06 19:40:14 INFO First throttle metrics collected
# Migrating `dashboard_production`.`level_sources`; Ghost table is `dashboard_production`.`_level_sources_gho`
# Migrating ip-172-17-1-15:3306; inspecting ip-172-17-1-15:3306; executing on production-daemon
# Migration started at Fri Aug 06 19:40:13 +0000 2021
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 100; max-load: Threads_running=30; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# postpone-cut-over-flag-file: /tmp/gh-ost.cutover [set]
# panic-flag-file: /tmp/gh-ost.panic.flag
# Serving on unix socket: /tmp/gh-ost.dashboard_production.level_sources.sock
2021-08-06 19:40:14 INFO Row copy complete
# Migrating `dashboard_production`.`level_sources`; Ghost table is `dashboard_production`.`_level_sources_gho`
# Migrating ip-172-17-1-15:3306; inspecting ip-172-17-1-15:3306; executing on production-daemon
# Migration started at Fri Aug 06 19:40:13 +0000 2021
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 100; max-load: Threads_running=30; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# postpone-cut-over-flag-file: /tmp/gh-ost.cutover [set]
# panic-flag-file: /tmp/gh-ost.panic.flag
# Serving on unix socket: /tmp/gh-ost.dashboard_production.level_sources.sock
Copy: 0/1202720564 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin-changelog.018611:3522449; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: N/A
Copy: 0/0 100.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin-changelog.018611:3522449; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: due
2021-08-06 19:40:14 INFO New table structure follows
CREATE TABLE `_level_sources_gho` (
  `id` bigint(11) unsigned NOT NULL,
  `level_id` int(11) DEFAULT NULL,
  `md5` varchar(32) COLLATE utf8_unicode_ci NOT NULL,
  `data` varchar(20000) COLLATE utf8_unicode_ci NOT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `hidden` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `index_level_sources_on_level_id_and_md5` (`level_id`,`md5`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
[2021/08/06 19:40:14] [info] binlogsyncer.go:164 syncer is closing...
[2021/08/06 19:40:14] [error] binlogstreamer.go:77 close sync with err: sync is been closing...
2021-08-06 19:40:14 INFO Closed streamer connection. err=<nil>
[2021/08/06 19:40:14] [info] binlogsyncer.go:179 syncer is closed
2021-08-06 19:40:14 INFO Dropping table `dashboard_production`.`_level_sources_ghc`
2021-08-06 19:40:14 INFO Table dropped
2021-08-06 19:40:14 INFO Dropping table `dashboard_production`.`_level_sources_gho`
2021-08-06 19:40:14 INFO Table dropped
2021-08-06 19:40:14 INFO Done migrating `dashboard_production`.`level_sources`
2021-08-06 19:40:14 INFO Removing socket file: /tmp/gh-ost.dashboard_production.level_sources.sock
2021-08-06 19:40:14 INFO Tearing down inspector
2021-08-06 19:40:14 INFO Tearing down applier
2021-08-06 19:40:14 INFO Tearing down streamer
2021-08-06 19:40:14 INFO Tearing down throttler
# Done
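
For a rough sense of scale, the figures in the dry-run output above can be combined: the row estimate from EXPLAIN (1202720564) divided by the chunk-size (1000) gives the approximate number of row-copy chunks a real run would process. This is back-of-the-envelope arithmetic only. The EXPLAIN count is an estimate, and `chunks_needed` is a hypothetical helper written for this illustration, not part of gh-ost.

```shell
#!/bin/sh
# Hypothetical helper: ceiling division of an estimated row count by the chunk size.
chunks_needed() {
  rows=$1
  chunk_size=$2
  echo $(( (rows + chunk_size - 1) / chunk_size ))
}

# Row estimate and chunk-size taken from the dry-run log above.
chunks_needed 1202720564 1000   # prints 1202721
```

Roughly 1.2 million chunks is consistent with a migration that was planned to run over an extended period inside a screen session.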

Deployment strategy

  1. DONE - Re-enable binary logging on the production cluster via "Temporarily Enable binary logging on production Aurora database cluster" (#44104). This requires a restart; see the deployment steps in that Pull Request.
  2. Disable Aurora binlog filtering on the production database cluster by setting the Parameter Group setting aurora_enable_repl_bin_log_filtering=0 (this is a dynamic configuration setting).
  3. Comment out the update_census_mapbox job on production-daemon, which runs ~9PM-Midnight PST each day, because it carries out all of its work in a single database transaction and impacts
  4. DROP the temporary table _level_sources_gho created by the last failed execution of this migration.
  5. This Pull Request hasn't been merged yet, so check out this file directly from the branch: git checkout origin/update-level-sources-gh-ost-migration -- bin/oneoff/gh-ost_migrations/level_sources.sh
  6. Execute this shell script in a screen session on production-daemon. As a dry run, it creates an empty new table by copying the existing table's structure, applies the schema change to the empty table, and then deletes the new table.
  7. Edit the deployed shell script to uncomment the --execute flag, then re-run the script for real in a screen session.
  8. Detach from the screen session and monitor database and system performance. Terminate the gh-ost migration if it is negatively impacting system performance.
  9. When gh-ost has finished copying all of the rows from the existing table, tell it to complete the cut-over by deleting the cutover flag file /tmp/gh-ost.cutover.
  10. Re-enable Aurora binlog filtering by setting aurora_enable_repl_bin_log_filtering=1.
  11. After we're certain the new table is working correctly, delete the old table.
  12. Delete this shell script file from production-daemon so that this change can be deployed when it is merged and released to production.
  13. Disable binary logging (requires restart).
  14. Scale down database instances from Hour of Code instance types to standard instance types.
  15. Re-enable the update_census_mapbox job.
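
The dry-run, execute, and cut-over steps above can be sketched as follows. This is a minimal illustration, not the contents of level_sources.sh. The host, database, table, chunk-size, max-lag-millis, dml-batch-size, max-load, and flag-file values come from the dry-run log; the use of --allow-on-master and --assume-master-host is an assumption inferred from the "Master forced to be" log line; credentials are omitted; and the --alter value is a placeholder because the actual ALTER clause is not shown in this description. `build_ghost_cmd` and `complete_cutover` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: prints a gh-ost command line instead of running it.

build_ghost_cmd() {
  # $1 = "execute" appends --execute; anything else leaves gh-ost in dry-run mode.
  mode=$1
  cmd="gh-ost"
  cmd="$cmd --host=db-production.code.org --port=3306"               # from log
  cmd="$cmd --database=dashboard_production --table=level_sources"   # from log
  cmd="$cmd --allow-on-master"                                       # assumption
  cmd="$cmd --assume-master-host=db-production.code.org:3306"        # assumption
  cmd="$cmd --chunk-size=1000 --max-lag-millis=1500 --dml-batch-size=100"  # from log
  cmd="$cmd --max-load=Threads_running=30"                           # from log
  cmd="$cmd --throttle-additional-flag-file=/tmp/gh-ost.throttle"    # from log
  cmd="$cmd --postpone-cut-over-flag-file=/tmp/gh-ost.cutover"       # from log
  cmd="$cmd --panic-flag-file=/tmp/gh-ost.panic.flag"                # from log
  cmd="$cmd --alter=ALTER_CLAUSE_PLACEHOLDER"                        # placeholder
  if [ "$mode" = "execute" ]; then
    cmd="$cmd --execute"
  fi
  printf '%s\n' "$cmd"
}

# Step 9: gh-ost postpones cut-over for as long as the flag file exists,
# so removing the file lets the cut-over proceed.
complete_cutover() {
  rm -f "${1:-/tmp/gh-ost.cutover}"
}

build_ghost_cmd dry-run   # step 6: no --execute, so nothing is migrated
build_ghost_cmd execute   # step 7: same command with --execute appended
```

While a migration is running, gh-ost also accepts interactive commands on its unix socket, for example `echo status | nc -U /tmp/gh-ost.dashboard_production.level_sources.sock`, which may help with the monitoring described in step 8.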

Follow-up work

Privacy

Security

Caching

PR Checklist:

  • Tests provide adequate coverage
  • Privacy and Security impacts have been assessed
  • Code is well-commented
  • New features are translatable or updates will not break translations
  • Relevant documentation has been added or updated
  • User impact is well-understood and desirable
  • Pull Request is labeled appropriately
  • Follow-up work items (including potential tech debt) are tracked and linked

sureshc changed the title from "Update level_sources migration" to "Update level_sources gh-ost migration" on Aug 6, 2021
jamescodeorg added the no-reminders label (use this label to hide this PR from Slack reminders) on Aug 19, 2021
sureshc requested review from a team and removed the request for wjordan on December 14, 2021 20:21
bencodeorg left a comment

LGTM. Are you doing this imminently? One missing last step, I think, would be a Rails migration to sync the other environments.

cat5inthecradle left a comment

Code looks good, process makes sense.

sureshc commented Jan 3, 2022

Migration completed 2022-01-02 ~7PM PST

sureshc merged commit 7037eb0 into staging on Jan 3, 2022
sureshc deleted the update-level-sources-gh-ost-migration branch on January 3, 2022 03:01