New sharing_disabled backfill script #21681
Do we still have to consider validation, like in @caleybrock's earlier PR #21671?
I believe the
This is awesome, I'm definitely going to use this approach next time I have to do a big data migration. Good find!
🚀
user.update! sharing_disabled: true
num_students_updated += 1
puts "Updated #{num_students_updated} students so far." if num_students_updated % 100000 == 0
User.where('birthday IS NULL OR birthday > ?', min_birthday).in_batches(of: batch_size) do |where|
Seems like the variable should be named something like `users` instead of `where`.
or `batch`
values.each do |_id, properties|
  properties['sharing_disabled'] = true
end
User.import([:id, :properties], values, validate: false, on_duplicate_key_update: [:properties])
@islemaster what does `on_duplicate_key_update` do here?
The default mode for the `activerecord-import` gem is a bulk-insert, but here we want to do a bulk-update. The `on_duplicate_key_update` setting turns this into an upsert operation that will update the `properties` column when primary keys match (in this case `id`).
See also: On Duplicate Key Update on the activerecord-import wiki
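To make the upsert behavior described above concrete, here is a minimal plain-Ruby sketch of the semantics: insert a row when its key is new, otherwise overwrite only the listed columns. It models the behavior with an in-memory hash and is not how `activerecord-import` or the database implements it; the `upsert_rows` helper, its `update_columns:` parameter, and the sample `name` column are hypothetical names used only for illustration.

```ruby
# Plain-Ruby model of "insert, or update selected columns when the key exists".
# Illustrative only: `upsert_rows` is a hypothetical helper, not a gem API.
#
# table          - Hash of primary key => row Hash (stands in for the users table)
# columns        - column names, in the same order as each entry in `rows`
# rows           - Array of value Arrays, e.g. [[1, {...}], [2, {...}]]
# update_columns - columns to overwrite when the primary key already exists
def upsert_rows(table, columns, rows, update_columns:)
  rows.each do |values|
    incoming = columns.zip(values).to_h
    id = incoming.fetch(:id)
    if table.key?(id)
      # Key collision: overwrite only the listed columns, leave the rest alone.
      update_columns.each { |col| table[id][col] = incoming[col] }
    else
      # No existing row with this key: behaves like a normal insert.
      table[id] = incoming
    end
  end
  table
end

# Sample data (made up for this sketch).
users = {
  1 => { id: 1, name: 'a', properties: { 'sharing_disabled' => false } },
  2 => { id: 2, name: 'b', properties: {} }
}

# Build [id, properties] pairs with the flag set, then write them back in bulk.
values = users.map { |id, row| [id, row[:properties].merge('sharing_disabled' => true)] }
upsert_rows(users, [:id, :properties], values, update_columns: [:properties])
```

Because every `id` in `values` already exists, the call only rewrites `properties`; other columns such as `name` are left untouched, which is the property the backfill relies on.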
@joshlory See https://github.com/zdennis/activerecord-import/wiki/On-Duplicate-Key-Update
TL;DR: it's an upsert. When the key (`id`) already exists, update `properties`.
Uses `activerecord-import` to rapidly update a value in the `properties` JSON blob in our `users` table for roughly 35 million users, in batches of 10,000. Will ran some tests and found that this approach can update 10k rows in about 1.6 seconds, so we're estimating a little over 90 minutes to run this query (against 97% of our user rows!).

Next steps
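Before scheduling the run, the runtime estimate above can be sanity-checked with a little arithmetic. The sketch below uses only the approximate figures quoted in the description (about 35 million rows, batches of 10,000, about 1.6 seconds per batch); the variable names are illustrative, and real throughput on production may differ.

```ruby
# Approximate figures quoted in the PR description.
total_rows        = 35_000_000
batch_size        = 10_000
seconds_per_batch = 1.6

# Number of batches needed, rounding up for a partial final batch.
batches = (total_rows.to_f / batch_size).ceil

# Estimated wall-clock time if every batch takes about the measured time.
total_seconds = batches * seconds_per_batch
total_minutes = total_seconds / 60.0

puts "#{batches} batches, about #{total_minutes.round} minutes"
# prints "3500 batches, about 93 minutes"
```

That works out to 3,500 batches and roughly 93 minutes, consistent with the "a little over 90 minutes" estimate.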
We need to deploy this change so that the `activerecord-import` gem gets installed on production-console, then we'll manually run the oneoff script to update user rows.

Why
We're going with this approach (updating so many rows) instead of some other approaches we discussed where application logic is used to minimize the number of rows we need to update, because:
Awesomeness
@wjordan wrote:
We can't wait to see what sort of other speedups this permits...