[SIP-27] Proposal for Paranoid Deletes #18386
Replies: 8 comments 4 replies
-
Issue-Label Bot is automatically applying the label Links: app homepage, dashboard and code for this bot. |
Beta Was this translation helpful? Give feedback.
-
@john-bodley are you thinking that the cascade deletes will also be soft/paranoid? In general I would be very in favor of moving to soft-deletes. |
Beta Was this translation helpful? Give feedback.
-
@willbarrett I think it depends on the entity type. For charts, dashboards, and datasources I think these should be soft-deletes, however cascade deletes are valid for removing columns/metrics once/if the datasource/table is hard deleted. I think there's merit in enumerating which tables should support soft and cascading deletes. |
Beta Was this translation helpful? Give feedback.
-
+1 for this proposal, we are thinking of implementing this @ dropbox as well either through APIs & some heuristics. It would be great to have 1st class support for this in Superset |
Beta Was this translation helpful? Give feedback.
-
+1 for this proposal
will either of you be interested in collaborating on this feature? @john-bodley @bkyryliuk? |
Beta Was this translation helpful? Give feedback.
-
Replying here @junlincc (although i'm not John or Bogdan 😛 ): I think the thought is to have the time between soft delete and hard delete be configured by the Superset admin. Note that the discussion here is more from the admin perspective than the user perspective. From the user's viewpoint, once something is soft deleted, they won't be able to find it in the Superset application (or maybe will only be able to find it in a trash can page). But if it's right after the deletion occurs, they could talk to the admin or self serve recover it from the trash can. But once the time between soft and hard delete expires, the entity is gone for good |
Beta Was this translation helpful? Give feedback.
-
+1 to @etr2460 point time to live should be configurable in the superset config or per database and yeah it should be set by admins As for implementation, it is unlikely that we would be able to commit to it in the next couple quarters. |
Beta Was this translation helpful? Give feedback.
-
thanks both! @etr2460 and @bkyryliuk
I'm not very familiar with the role permission in Superset tbh. By reading the documentation, seems like we should change the logic of can_delete, and add something like can_restore? Model & Action: models are entities like Dashboard, Slice, or User. Each model has a fixed set of permissions, like can_edit, can_show, can_delete, can_list, can_add, and so on. For example, you can allow a user to delete dashboards by adding can_delete on Dashboard entity to a role and granting this user that role. Please educate me or lmk if I missed anything 😅 |
Beta Was this translation helpful? Give feedback.
-
[SIP] Proposal for Paranoid Deletes
Motivation
At Airbnb we have a vast number of entities housed within Superset. Our deployment has tens of thousands of charts (both manually and procedurally generated), thousands of dashboards, and tens of thousands of registered datasources and tables (both physical and virtual).
In a recent analysis of a specific Druid NoSQL (native) cluster, from a sample of ~ 5k charts only 34% of charts rendered, i.e., returned a 200 status code from the
/supserset/slice_json
route.The following chart shows the renderability of charts as a function of last saved, which shows that a chart's viability often decays over time due to creep in the datasource metadata and the saved chart parameters.
Ideally we would like to have a mechanism to clean up obsolete resources (charts, dashboards, or datasources) in a somewhat paranoid manner, i.e., using soft deletes. This should help keep our deployment at a manageable size and improve the perceived reliability and quality (from a usability standpoint) of Superset assets.
Proposed Change
The proposed solution was originally mentioned by @etr2460 but I thought it was worthwhile formalizing this as a SIP. This borrows an idea from Ruby where we first soft delete records my marking them as deleted (with an associated timestamp) before performing a hard delete (deleting the record n-days later). Users could be prompted that their charts were being deleted and they can take corrective action to undelete it if they see fit.
There's actually a Python package sqla-paranoid which brings transparent soft deletes to SQLAlchemy which we could use or replicate. The TL;DR is this would add a
deleted_at
(ordeleted_on
for consistency) column which would track soft deleted records. Records which are soft deleted wouldn't show up in the CRUD views by default unless the filter was enabled (not unlike how SQL Lab Views are ignored by default in thetablemodelview
).Records could be marked using a hook, trigger, or cron as deletable based on various criterion using cascading context:
Charts
† Note this could leverage a cron (or similar) and be based on customizable rules, i.e., x of the last n-days.
Dashboards
Tables/Datasources
New or Changed Public Interfaces
We would need to updated the data model and leverage
sql-paranoid
(or similar) for enabling the soft-deletes. We would also need to update the FAB views to handle filtering/exclusion of soft deleted records. Finally we would need to implement triggers or similar to i) soft delete records, and ii) hard delete records.New dependencies
The only new dependency would be
sqla-paranoid
(no public license) if we decided not to write this ourself. Note the package only contains several hundred lines of codes.Migration Plan and Compatibility
We would need to update the schema to include the
deleted_at
(ordeleted_on
) column for certain tables. Note I think we only need this for charts, dashboards, and datasources (the cascade deletes should handle the cleanup of columns and metrics).Rejected Alternatives
None.
to: @etr2460 @mistercrunch @villebro @willbarrett
cc: @vylc
Beta Was this translation helpful? Give feedback.
All reactions