Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Script: Soft delete unused poste_urls #36071

Merged
merged 7 commits into from
Jul 31, 2020

Conversation

molly-moen
Copy link
Contributor

@molly-moen molly-moen commented Jul 29, 2020

PLC-937
Script to set all poste_urls to have a deleted_at value except those in the list in poste_url_constants.

Links

Testing story

I verified the script soft deletes all urls in poste_urls except those in the list. I ran the script against a sample table with about 2000 rows. Suresh ran an update against the whole table on a production clone and it took 1 minute 22 seconds.

Reviewer Checklist:

  • Tests provide adequate coverage
  • Code is well-commented
  • New features are translatable or updates will not break translations
  • Relevant documentation has been added or updated
  • User impact is well-understood and desirable
  • Pull Request is labeled appropriately
  • Follow-up work items (including potential tech debt) are tracked and linked

@molly-moen molly-moen requested review from sureshc and a team July 29, 2020 23:42
@bencodeorg
Copy link
Contributor

Is there opposition to just running:
UPDATE poste_urls SET deleted_at = NOW()
Then doing the update step (maybe do that via the script you have here)?

I do have a vague recollection of our production DB having a setting preventing updates without where clauses, but looks like that might be avoidable?

@molly-moen
Copy link
Contributor Author

Is there opposition to just running:
UPDATE poste_urls SET deleted_at = NOW()
Then doing the update step (maybe do that via the script you have here)?

I do have a vague recollection of our production DB having a setting preventing updates without where clauses, but looks like that might be avoidable?

@bencodeorg The opposition was the concern trying to update all 3.5 million rows in one shot could have some performance impacts, this way we do one at a time. Also the issue you noted about not being able to update without a where clause.


puts "Marked all urls as deleted..."

DB[:poste_urls].where(id: POSTE_URLS_TO_KEEP).update(deleted_at: nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a way to test this? I think Sequel will generate a long WHERE condition (id IN [x,y,z,d,.........]) which might generate a SQL statement that’s excessive in length and which might be inefficient to carry out the multi-row update. We might have to iterate over all of the URLs to that we want to keep and update each one individually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested locally (using the same list of ids) and it seemed to work fine, although it was against a smaller table. I can update to iterative if it's safer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as the list of IDs was the same, we’re good. We’re not writing to this table anymore, so even if the UPDATE of a long list of rows selected by ID is slow, we’re fine.

@sureshc sureshc self-requested a review July 31, 2020 22:12
@molly-moen molly-moen merged commit 3609751 into staging Jul 31, 2020
@molly-moen molly-moen deleted the molly/soft_delete_old_poste_urls branch July 31, 2020 22:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants