RFC Deletion Process #57
@Bento007 - is this RFC in progress and not ready for community review?
* This introduces the ability to permanently destroy data from the DSS.
* There is no limit on the amount of data that can be rapidly permanently deleted.
* Permanent deletion does not occur until after the grace period has elapsed. This could
be problematic if files need to be removed sooner.
The wranglers say that the general expectation, based on other archiving services, is that for consent issues complete removal should happen within 24-48 hours. They are going to discuss this further at the wrangler F2F on Mon 4th Feb. For redaction not linked to consent, it's okay to take weeks.
Based on the presentations from Dave Bernick, from a FISMA compliance perspective deletion doesn't need to be that quick. That being said, from a reputational perspective (and considering that any physical deletion request will need to navigate a certain amount of bureaucracy before the technical request is made), having the final step take 24 to 48 hours seems like a good aim.
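To make the grace-period discussion concrete, here is a minimal sketch of the eligibility check a deletion daemon might run before physically deleting anything. The function name, the `GRACE_PERIOD` constant, and the 48-hour value are assumptions for illustration only; the RFC does not fix a value, and 24 to 48 hours is only the aim floated in this thread.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical value: the thread suggests 24 to 48 hours as an aim, not a decision.
GRACE_PERIOD = timedelta(hours=48)


def eligible_for_physical_deletion(requested_at: datetime,
                                   now: datetime,
                                   grace_period: timedelta = GRACE_PERIOD) -> bool:
    """Return True once the grace period since the deletion request has elapsed."""
    return now - requested_at >= grace_period


requested = datetime(2019, 2, 4, 12, 0, tzinfo=timezone.utc)
print(eligible_for_physical_deletion(requested, requested + timedelta(hours=24)))
print(eligible_for_physical_deletion(requested, requested + timedelta(hours=48)))
```

Passing `grace_period` as a parameter would let consent-related requests use a shorter window than other redactions, if the wranglers decide that distinction matters.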
Meant for my last review to be a "Request changes"

|Bundle.Version|admin|reason|AWS Deletion Markers (key, ID)|GCP Previous Generations (key, previous generation)|
|--------------|-----|------|------------------------------|---------------------------------------------------|
|1234-2134-3145|admin@email.com|consent|(file.obj, 1234), ...|(file.obj, 4321), ...|
Do I understand correctly that the deletion table is an implementation detail of the DSS, internal to DSS and not exposed in the API? I think it's OK and appropriate to list the details of it here, as long as that fact is called out. More generally, throughout this RFC, more clarity is necessary to separate the background information, the API for the deletion actions, and the implementation details of the deletion service infrastructure.
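For reference, one row of the deletion table quoted above could be modeled roughly like this. This is only a sketch of an internal record: the class and field names are made up here, and treating the marker IDs and generations as strings is an assumption, since the RFC excerpt does not specify types.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DeletionRecord:
    """One row of the internal deletion table (names are illustrative only)."""
    bundle_version: str
    admin: str
    reason: str
    # (object key, deletion marker ID) pairs recorded for AWS
    aws_deletion_markers: List[Tuple[str, str]] = field(default_factory=list)
    # (object key, previous generation) pairs recorded for GCP
    gcp_previous_generations: List[Tuple[str, str]] = field(default_factory=list)


# Values taken from the example row in the RFC table; only the first pair of each list is shown.
record = DeletionRecord(
    bundle_version="1234-2134-3145",
    admin="admin@email.com",
    reason="consent",
    aws_deletion_markers=[("file.obj", "1234")],
    gcp_previous_generations=[("file.obj", "4321")],
)
print(record.reason, len(record.aws_deletion_markers))
```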
### Drawbacks and Limitations [optional]

* This introduces the ability to permanently destroy data from the DSS.
* There is no limit on the amount of data that can be rapidly permanently deleted.
This can be addressed by a suite of safety checks that could throttle deletions in the deletion daemon based on size delta, object count delta, etc. and send alerts to the DCP operators.
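A rough sketch of what such a safety check could look like, assuming the daemon processes deletions in batches. Everything here is hypothetical: the `DeletionBatch` and `SafetyLimits` types, the threshold values, and the `alert` callback standing in for whatever mechanism notifies the DCP operators.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DeletionBatch:
    object_count: int  # number of objects the batch would delete
    total_bytes: int   # combined size of those objects


@dataclass
class SafetyLimits:
    # Hypothetical thresholds; real values would be set by the operators.
    max_objects: int = 1000
    max_bytes: int = 10 * 1024 ** 3


def batch_is_safe(batch: DeletionBatch,
                  limits: SafetyLimits,
                  alert: Callable[[str], None]) -> bool:
    """Return True if the batch may proceed; otherwise alert and hold it back."""
    violations: List[str] = []
    if batch.object_count > limits.max_objects:
        violations.append(
            f"object count delta {batch.object_count} exceeds {limits.max_objects}")
    if batch.total_bytes > limits.max_bytes:
        violations.append(
            f"size delta {batch.total_bytes} bytes exceeds {limits.max_bytes}")
    for message in violations:
        alert(message)
    return not violations


alerts: List[str] = []
small = DeletionBatch(object_count=10, total_bytes=1024)
large = DeletionBatch(object_count=5000, total_bytes=1024)
print(batch_is_safe(small, SafetyLimits(), alerts.append))
print(batch_is_safe(large, SafetyLimits(), alerts.append))
print(alerts)
```

A held batch would stay queued until an operator reviews the alert, so a mistaken bulk request cannot destroy data faster than someone can notice it.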
I don't think we should strive to build a system that relies on data wranglers passing ad hoc information to service operators, and service operators running maintenance scripts, for an operation as core to the data lifecycle as deletion. Such a system is error-prone, non-scalable, and stifles data operations by keeping data management tools out of the hands of data operators.
More generally, I think we should re-examine our architectural assumptions and principles here, and consider how this RFC corresponds to the core DCP principle of federated services with boundaries defined by public, documented HTTP APIs. This principle is key to our agility.
### Tombstones

Tombstones are markers placed in the DSS to indicate that previously existing data has been removed. Tombstones exist for
files and bundles stored in the DSS. Tombstones come in two varieties, versioned and unversioned tombstones. A
> Tombstones exist for files and bundles stored in the DSS.

This seems to contradict the earlier point about there only being physical deletion for files.
A user must have explicit permission to perform a **logical deletion** of a bundle. For a **physical deletion** of a
bundle, the user must have explicit permission to **physically delete** files and bundles.

(!) The deletion of a bundle does not handle the deletion of secondary analysis bundles referencing this bundle via the
@parthshahva are you considering this for the redaction design?
9445318 to 85a9bcb
Last call Feb. 22, 2019
March 15: Last call for oversight review