-
Notifications
You must be signed in to change notification settings - Fork 482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
User purge: Pegasus contacts and forms #22795
Conversation
7fc1591
to
9d0f12c
Compare
b7f41ce
to
36802cc
Compare
@jeremydstone I haven't fully digested this Pull Request and written up my comments. I'm not confident enough in my knowledge of Contact Rollups and Pardot to review this on my own. Got time to provide some feedback? |
# Though we have the DB tables in all environments, we only sync data from the production | ||
# environment with Pardot. | ||
if rack_env == :production | ||
pardot_ids = @pegasus_db[:contact_rollups]. | ||
pardot_ids = contact_rollups_recordset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this recordset, in the legitimate case, can only ever have one record? (A single userid or email address.) If so, can we enforce that? Either by just passing in a single Pardot ID to this function, or by throwing if recordset count > 1. Since this deletes data, we want to be really careful and have defense in depth. For example, right now a bug could get introduced in our calling code where we miss a where clause (similar to a recent bug that actually happened in prod), which could pass in a recordset of all contact rollups and delete them all in Pardot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. I'm concerned that in the situation where our records are malformed, we have some regulatory obligation to delete all rows that might contain PII for the user in question. In particular, this is going to get more complicated as we separate out the account-delete case from the email-delete case, which are going to follow different rules.
Maybe what needs to happen is a pre-flight validation system that doesn't delete anything unless all affected records appear to be well-formed. Then if everything looks good we proceed with deletes. If not, maybe the record goes into a queue for manual review - allowing an engineer to decide "oh, two contact rows, that's fine" vs "oops, we don't want to delete 10,000 contact rows."
Generally looks good. This deletes data so we need to think this through seven ways from Sunday! I'd like to make sure we've thought through defensiveness, diagnostic ability, and testing. See one comment where I think we can definitely improve defensiveness. Is there any other defensiveness/sanity checks we can add where we throw if something's not right? Any place where we can refuse to go on if it's going to affect more rows than we think reasonable/ We should add enough diagnostics/logging so we can see what's going on. Right now it looks like everything is just going to get deleted silently. We should log (to file on server or S3, Slack, somewhere) what's going on in detail so we can diagnose. If someone says, "Hey, my account X is no longer there," we should have the diagnostic ability to look in logs and see A) did we delete it and if so B) was it intentional or did the code misfire. We'll also have to think about how we can safely test the Pardot deletion in advance. We do have a separate Pardot test environment and we should probably rig that up. Otherwise what if some runaway code starts deleting unintended things in prod. |
Thanks for the detailed review! Since I put up this PR, the account deletion requirements have grown and changed out from under me. I'm working on a pretty significant rethink of our approach in this tech spec and I'll take some time today and capture the concerns you raised there - defensiveness, diagnostics, auditing, Pardot testing, etc. In the meantime I believe the changes and test coverage in this PR are a useful step in the right direction, mostly reusable under the new plan, and none of the affected code is being run in a production context right now. I'd like to merge it as a safe step forward, and not make further code changes until the tech spec is settled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as a step in the right direction per Brad's comments.
Continuing work from spec: Ensures we remove user data from pegasus tables
contacts
,contact_rollups
,forms
andform_geos
.I've had to take a somewhat different approach with the Pegasus tables, constructing Sequel queries in helpers instead of verifying desired changes through the Rails models.
Note: This is an incremental step forward, with changes to code that we're not actively using in production. There's a pretty significant rethink to how account deletion + Pardot works on the way (discussions have been going on in Slack) but I think there's still useful testing/scaffolding in this PR that I'll want to build on top of going forward.