FIX: Optimize performance of UnescapeEventName migration on large sites (#284)

On large sites, the migration can take up to 4 hours, which is not
something we want. This commit adds the following optimisations:

1. Add a temporary partial index on notifications `WHERE notification_type IN (27, 28)`
1. Increase the batch size to 10000; the smaller the batch size, the more
times we have to scan the notifications table
1. Avoid starting the scan of the notifications table from id 1 and
instead start from the id of the first event notification
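A rough way to see why the batch size matters is to count how many times the loop in `up` queries the table. The Ruby sketch below replays the shape of that loop against an in-memory list of ids. It assumes the collapsed part of `notifications_query` filters on `id >= :start` (that part of the query is hidden in this diff view), and `count_queries` is a hypothetical helper, not part of the migration. `limit` plays the role of `MAX(id)` over event notifications, while `matching_ids` stands for only the rows the `LIKE '%&%'` filter would return.

```ruby
# Replays the shape of the migration's batching loop over an in-memory list of
# matching notification ids and counts how many queries it issues.
# ASSUMPTION: the collapsed WHERE clause of notifications_query behaves like
# "id >= :start", combined with ORDER BY id ASC and the LIMIT shown in the diff.
def count_queries(matching_ids, start:, limit:, batch_size:)
  sorted = matching_ids.sort
  queries = 0

  while start <= limit
    queries += 1
    max_seen = -1

    # Stand-in for DB.query(notifications_query, start: start)
    batch = sorted.select { |id| id >= start }.first(batch_size)
    batch.each do |id|
      max_seen = id if id > max_seen
      # The real migration unescapes topic_title and UPDATEs the row here.
    end

    # Advance the cursor, jumping past the highest id seen if that is further.
    start += batch_size
    start = max_seen + 1 if max_seen > start
  end

  queries
end

# One matching row at id 10, event notifications reaching id 5,000,000.
count_queries([10], start: 1, limit: 5_000_000, batch_size: 1_000)  # => 5000
count_queries([10], start: 1, limit: 5_000_000, batch_size: 10_000) # => 500
```

This only counts round trips. How expensive each round trip is depends on the partial index (point 1) and on where the scan begins (point 3), neither of which the sketch models.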

Follow-up to 2719b9e
tgxworld committed Jun 15, 2022
1 parent 53fb585 commit 7a0f360
Showing 1 changed file with 18 additions and 4 deletions.
db/post_migrate/20220613073844_unescape_event_name.rb: 22 changes (18 additions, 4 deletions)
@@ -3,10 +3,14 @@
class UnescapeEventName < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  TEMP_INDEX_NAME = "_temp_discourse_calendar_unescape_event_name_migration"

  def up
    # event notifications
    start = 1
    limit = DB.query_single("SELECT MAX(id) FROM notifications WHERE notification_type IN (27, 28)").first.to_i
    DB.exec("CREATE INDEX CONCURRENTLY #{TEMP_INDEX_NAME} ON notifications(id) WHERE notification_type IN (27, 28)")
    start, limit = DB.query_single("SELECT MIN(id), MAX(id) FROM notifications WHERE notification_type IN (27, 28)")

    return if !start

    notifications_query = <<~SQL
      SELECT id, data
@@ -16,27 +20,35 @@ def up
        notification_type IN (27, 28) AND
        data::json ->> 'topic_title' LIKE '%&%'
      ORDER BY id ASC
      LIMIT 1000
      LIMIT 10000
    SQL

    while true
      if start > limit
        break
      end

      max_seen = -1

      DB.query(notifications_query, start: start).each do |record|
        id = record.id

        if id > max_seen
          max_seen = id
        end

        data = JSON.parse(record.data)
        unescaped = CGI.unescapeHTML(data["topic_title"])
        next if unescaped == data["topic_title"]
        data["topic_title"] = unescaped

        DB.exec(<<~SQL, data: data.to_json, id: id)
          UPDATE notifications SET data = :data WHERE id = :id
        SQL
      end
      start += 1000

      start += 10000

      if max_seen > start
        start = max_seen + 1
      end
@@ -57,6 +69,8 @@ def up
        UPDATE discourse_post_event_events SET name = :unescaped_name WHERE id = :id
      SQL
    end
  ensure
    DB.exec("DROP INDEX IF EXISTS #{TEMP_INDEX_NAME}")
  end

  def down
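The `ensure` clause is what guarantees the temporary index is dropped, including on the early `return if !start` path, because Ruby runs a method's `ensure` clause when the method returns early or raises. A minimal sketch of that control flow follows. `FakeDB` and `up_sketch` are hypothetical stand-ins for Discourse's `DB` helper and the migration's `up`; `FakeDB` only records SQL strings, and its `query_single` returning `[min, max]` (both `nil` when there are no event notifications) is an assumption made for the sketch.

```ruby
# Hypothetical stand-in for the DB helper: records every statement it is given.
class FakeDB
  attr_reader :statements

  def initialize(min_id, max_id)
    @min_id = min_id
    @max_id = max_id
    @statements = []
  end

  def exec(sql)
    @statements << sql
  end

  # ASSUMPTION: returns [MIN(id), MAX(id)], both nil when no rows match.
  def query_single(_sql)
    [@min_id, @max_id]
  end
end

TEMP_INDEX_NAME = "_temp_discourse_calendar_unescape_event_name_migration"

# Mirrors the shape of the migration's `up`: create the temporary partial
# index, bail out early when there is nothing to migrate, always drop the index.
def up_sketch(db)
  db.exec("CREATE INDEX CONCURRENTLY #{TEMP_INDEX_NAME} ON notifications(id) WHERE notification_type IN (27, 28)")
  start, limit = db.query_single("SELECT MIN(id), MAX(id) FROM notifications WHERE notification_type IN (27, 28)")

  # Nothing to migrate: `return` still runs the `ensure` clause below.
  return :nothing_to_do if !start

  # Placeholder for the batched UPDATE loop.
  db.exec("-- process ids #{start}..#{limit} in batches")
  :migrated
ensure
  # Runs on normal completion, early return, or an exception.
  db.exec("DROP INDEX IF EXISTS #{TEMP_INDEX_NAME}")
end

db = FakeDB.new(nil, nil)
up_sketch(db)      # => :nothing_to_do
db.statements.size # => 2 (CREATE INDEX, then DROP INDEX)
```

Because `disable_ddl_transaction!` is set (which `CREATE INDEX CONCURRENTLY` requires), there is no surrounding transaction to roll back the index if the migration fails, which is why the cleanup is done explicitly.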
