Adding script to update time_spent to seconds from milliseconds #36398
Conversation
# time_spent should be transformed as such
# (time_spent.to_f/1000).ceil.to_i
UserLevel.where("time_spent > ?", 0).where.not(time_spent: nil).each do |user_level|
One way to mitigate the impact of running this query on such a big table would be to use a method like find_each, which can be chained onto a where scope and fetches the records in batches rather than trying to instantiate them all at once.
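A standalone sketch of the batching idea above. There is no Rails here, so fake hash rows and each_slice stand in for the model and for find_each's internal batching; in the app the real call would be UserLevel.where("time_spent > ?", 0).where.not(time_spent: nil).find_each { |user_level| ... }:

```ruby
# Illustration only: each_slice plays the role of find_each, which fetches
# rows in batches (1_000 by default) instead of loading the whole result set.
records = (1..25).map { |i| { id: i, time_spent: i * 500 } } # fake rows, in ms

records.each_slice(10) do |batch|      # stand-in for one find_each batch
  batch.each do |row|
    # same transformation as the migration: milliseconds -> seconds, rounded up
    row[:time_spent] = (row[:time_spent].to_f / 1000).ceil.to_i
  end
end
```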
Additionally, it might make sense to do this in a transaction so that if something goes wrong, you won't end up in a state where some records are in seconds and others are in milliseconds. A good example is probably this script, although you would want all of your updates to happen in one transaction. That might negate the benefits of using find_each, though, depending on how transactions work.
I like your approach! This seems like a great way to combine the transaction + batches :)
UserLevel.find_in_batches(start: 93_221_000, finish: 2_811_329_000, batch_size: 5000) do |user_level_slice|
  puts "PROCESSING: slice #{slice} with starting id #{user_level_slice.first.id}."
  ActiveRecord::Base.transaction do
If UserLevels could have been deleted, do we also need the ending id of each slice? And is there anything you want to capture if it fails, like which row it failed on? (Maybe this is captured in whatever error message you get by default.)
Ending id: I don't think we need it. It'll just be starting_id + 5000.
Row it failed on: Hmm, I thought about this for a bit. I'm super curious if you have additional thoughts on the matter. This is my first migration, so I don't have much experience to go on. My gut feeling: it might be useful as a learning experience for me, but for the migration itself, I don't think it would end up saving any time. Since the migration logic is so short and the slices are relatively small, the time spent writing error logging would outweigh the benefits of having that logging.
UserLevel.find_in_batches(start: 93_221_000, finish: 2_811_329_000, batch_size: 5000) do |user_level_slice|
  puts "PROCESSING: slice #{slice} with starting id #{user_level_slice.first.id}."
  ActiveRecord::Base.transaction do
I don't think we need to wrap each slice of 5000 rows in a single transaction. Can we log any row that we fail to update (rescue any error and log the id and the reason) and then move on to the next row?
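A pure-Ruby sketch of the rescue-and-log-per-row idea suggested above. Row is a hypothetical stand-in for UserLevel so the control flow can run without a database; in the real script the body would end with user_level.save(touch: false, validate: false):

```ruby
# Row stands in for UserLevel; the string time_spent on row 3 forces a
# failure so the rescue path is exercised.
Row = Struct.new(:id, :time_spent)

rows   = [Row.new(1, 1500), Row.new(2, nil), Row.new(3, "oops")]
failed = []

rows.each do |row|
  begin
    next unless row.time_spent && row.time_spent > 0  # row 3 raises here
    row.time_spent = (row.time_spent.to_f / 1000).ceil.to_i
  rescue => e
    failed << [row.id, e.message]  # log the id and the reason, then move on
  end
end
```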
    user_level.save(touch: false, validate: false)
  end
end
UserLevel.find_each(start: 93_221_000, finish: 2_811_329_000, batch_size: 5000) do |user_level|
Could we use find_each.with_index to track where we are and occasionally (every 5000 rows?) output our position, so we can monitor this long-running process similar to the way we were previously logging the start of each new slice?
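A standalone sketch of the progress-logging idea. A plain range stands in for the records here; in the app it would be UserLevel.find_each(...).with_index { |user_level, index| ... }:

```ruby
# Report our position every 5000 rows so a long-running pass is observable.
logged = []

(0...12_000).each.with_index do |_record, index|
  logged << index if (index % 5000).zero?  # in the app: puts "row #{index}"
end
```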
👍
end
if user_level.time_spent && user_level.time_spent > 0
  user_level.time_spent = (user_level.time_spent.to_f / 1000).ceil.to_i
  user_level.save(touch: false, validate: false)
The ActiveRecord update_column method might be appropriate to use here. We have callbacks on this model, so skipping them could improve performance and avoid unintended side effects.
https://apidock.com/rails/ActiveRecord/Persistence/update_column
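A toy stand-in contrasting save (which runs callbacks) with an update_column-style single-column write that skips them, mirroring the ActiveRecord behavior described above. FakeUserLevel is hypothetical and exists only so the contrast can run without Rails:

```ruby
# FakeUserLevel mimics the relevant slice of ActiveRecord behavior:
# save runs callbacks; update_column writes one column and skips them.
class FakeUserLevel
  attr_accessor :time_spent
  attr_reader :callbacks_run

  def initialize(time_spent)
    @time_spent = time_spent
    @callbacks_run = 0
  end

  def save
    @callbacks_run += 1  # where after_save hooks would fire
    true
  end

  def update_column(_name, value)
    @time_spent = value  # direct write, no callbacks or validations
    true
  end
end

user_level = FakeUserLevel.new(1500)
user_level.update_column(:time_spent, (user_level.time_spent.to_f / 1000).ceil)
```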
From running this on the adhoc, I'm seeing only the column we care about being updated (before/after screenshots). Similarly, entries that should not be updated have not been changed.
I know we're pretty committed to this solution, but wanted to note here a couple of other possible solutions ...

A single SQL statement that updates all 600K rows?

UPDATE user_levels
SET time_spent = time_spent / 1000
WHERE time_spent IS NOT NULL;

Or SELECT & export the
The time_spent field in the user_levels table has been recording time in milliseconds. This causes time_spent to max out at ~25 days due to MySQL's integer size. Instead we want to record time in seconds, which will allow time_spent to max out at ~68 years.
Recording time_spent was temporarily disabled by this PR so we can make this change on all existing UserLevel records.
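A back-of-envelope check of the two limits quoted above, assuming the column is a signed 32-bit MySQL INT (max 2_147_483_647):

```ruby
# Max value a signed 32-bit MySQL INT column can hold.
INT_MAX = 2_147_483_647

max_days_in_ms = INT_MAX / 1000.0 / 60 / 60 / 24  # stored as ms -> ~25 days
max_years_in_s = INT_MAX / 60.0 / 60 / 24 / 365   # stored as s  -> ~68 years
```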
Links
Testing story
Reviewer Checklist: