Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User purge: Pegasus contacts and forms #22795

Merged
merged 12 commits into from
Jun 13, 2018
Merged

Conversation

islemaster
Copy link
Contributor

@islemaster islemaster commented May 31, 2018

Continuing work from spec: Ensures we remove user data from pegasus tables contacts, contact_rollups, forms and form_geos.

I've had to take a somewhat different approach with the Pegasus tables, constructing Sequel queries in helpers instead of verifying desired changes through the Rails models.

Note: This is an incremental step forward, with changes to code that we're not actively using in production. There's a pretty significant rethink to how account deletion + Pardot works on the way (discussions have been going on in Slack) but I think there's still useful testing/scaffolding in this PR that I'll want to build on top of going forward.

@islemaster islemaster force-pushed the user-purge-pegasus-contacts branch from 7fc1591 to 9d0f12c Compare May 31, 2018 18:30
@sureshc
Copy link
Contributor

sureshc commented Jun 7, 2018

@jeremydstone I haven't fully digested this Pull Request and written up my comments. I'm not confident enough in my knowledge of Contact Rollups and Pardot to review this on my own. Got time to provide some feedback?

# Though we have the DB tables in all environments, we only sync data from the production
# environment with Pardot.
if rack_env == :production
pardot_ids = @pegasus_db[:contact_rollups].
pardot_ids = contact_rollups_recordset.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this recordset, in the legitimate case, can only ever have one record? (A single userid or email address.) If so, can we enforce that? Either by just passing in a single Pardot ID to this function, or by throwing if recordset count > 1. Since this deletes data, we want to be really careful and have defense in depth. For example, right now a bug could get introduced in our calling code where we miss a where clause (similar to a recent bug that actually happened in prod), which could pass in a recordset of all contact rollups and delete them all in Pardot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. I'm concerned that in the situation where our records are malformed, we have some regulatory obligation to delete all rows that might contain PII for the user in question. In particular, this is going to get more complicated as we separate out the account-delete case from the email-delete case, which are going to follow different rules.

Maybe what needs to happen is a pre-flight validation system that doesn't delete anything unless all affected records appear to be well-formed. Then if everything looks good we proceed with deletes. If not, maybe the record goes into a queue for manual review - allowing an engineer to decide "oh, two contact rows, that's fine" vs "oops, we don't want to delete 10,000 contact rows."

@jeremydstone
Copy link

Generally looks good. This deletes data so we need to think this through seven ways from Sunday! I'd like to make sure we've thought through defensiveness, diagnostic ability, and testing. See one comment where I think we can definitely improve defensiveness. Is there any other defensiveness/sanity checks we can add where we throw if something's not right? Any place where we can refuse to go on if it's going to affect more rows than we think reasonable/

We should add enough diagnostics/logging so we can see what's going on. Right now it looks like everything is just going to get deleted silently. We should log (to file on server or S3, Slack, somewhere) what's going on in detail so we can diagnose. If someone says, "Hey, my account X is no longer there," we should have the diagnostic ability to look in logs and see A) did we delete it and if so B) was it intentional or did the code misfire.

We'll also have to think about how we can safely test the Pardot deletion in advance. We do have a separate Pardot test environment and we should probably rig that up. Otherwise what if some runaway code starts deleting unintended things in prod.

@islemaster
Copy link
Contributor Author

Thanks for the detailed review! Since I put up this PR, the account deletion requirements have grown and changed out from under me. I'm working on a pretty significant rethink of our approach in this tech spec and I'll take some time today and capture the concerns you raised there - defensiveness, diagnostics, auditing, Pardot testing, etc.

In the meantime I believe the changes and test coverage in this PR are a useful step in the right direction, mostly reusable under the new plan, and none of the affected code is being run in a production context right now. I'd like to merge it as a safe step forward, and not make further code changes until the tech spec is settled.

Copy link

@jeremydstone jeremydstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as a step in the right direction per Brad's comments.

@islemaster islemaster merged commit e07f40c into staging Jun 13, 2018
@islemaster islemaster deleted the user-purge-pegasus-contacts branch June 13, 2018 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants