Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deleting groups is not scalable #10770

Closed
jdalsem opened this issue Feb 15, 2017 · 25 comments
Closed

Deleting groups is not scalable #10770

jdalsem opened this issue Feb 15, 2017 · 25 comments
Milestone

Comments

@jdalsem
Copy link
Member

jdalsem commented Feb 15, 2017

Currently when deleting content (and in particular groups) this will be done recursively as part of the action performed by the user that requests the delete.

Because deletes are (most of the time) recursive this causes issues where there is a lot of related content to be deleted. This issue is very apparent when deleting groups. When deleting a group all, content inside the group will also be deleted. So the following example could happen:

  1. Delete group
  2. Remove all pages in the group
  3. Remove all subpage from all those pages
  4. Remove all comments on all those subpages
  5. Remove all activity that mentions creation of subpages and comments
  6. Remove all metadata related to the entities
  7. Remove all relationships with all that content and the group

In active groups this could easily be thousands of records to be deleted. All entities/metadata/relationships will be individually deleted and will trigger events/hooks as needed.

It makes sense that this kind of approach will cause the deleting user to have to wait for a long time and eventually even time-out without knowing if the group will be deleted.

What kind of approach would make the delete action 'instant', or 'scheduled' without causing issues like still seeing the group or its content? But also leave all events/hooks working...

Would really like some suggestions @Elgg/core

@hypeJunction
Copy link
Contributor

Scheduled tasks API would be great. We currently rely on vroom for many of those tasks, but if the core had a built-in scheduling capability that would great.

@mrclay
Copy link
Member

mrclay commented Feb 17, 2017

What kind of approach would make the delete action 'instant', or 'scheduled' without causing issues like still seeing the group or its content? But also leave all events/hooks working

Theoretically this is what enabled should let us address. We run a few queries (no events) to rapidly disable all descendent content and schedule the delete task.

@jdalsem
Copy link
Member Author

jdalsem commented Feb 17, 2017

recursive disable is also not scalable and will (currently) also trigger events....

you could go for disable all where guid = <disabled_guid> or parent_guid = <disabled_guid> query on entities and tackle a delete in a backgroundtask/cron, but you will not know 'why' an entity is disable when performing the background task...

Maybe the easiest way is to add a 'deleted' column and use that for recursive deletes....

Bare in mind that there is no 'quick' query to delete all things in a group. (Think of comments... as they have not the group as its container)

@hypeJunction
Copy link
Contributor

I don't think adding columns is the way to go. We already have a queue table, we just need to add API to queue actions and perform them on cron.

@jdalsem
Copy link
Member Author

jdalsem commented Feb 17, 2017

that only solves removing the wait time from user... still need to have something for the time in between... or are you willing to accept the risk pages returning errors because entities are partially deleted?

With a delayed delete we also should provide some status notification to user that deletion has been finished

@hypeJunction
Copy link
Contributor

Ah, I see what you are saying. Not sure.

@mrclay
Copy link
Member

mrclay commented Feb 17, 2017

Strawman 1:

  • Without events, create "deleted_descendant_of" relationships with all the entity's descendants. This may take several queries working recursively, but should be pretty quick.
  • Without events, set the entity and all the related descendants to disabled using a couple queries and queue a task.
  • (optional) If the deleted entity is a user, begin the task by reassigning ownership of descendent groups to an admin user, disabling them, and removing the relationship.
  • In the task, regular recursive delete the entity with events.
  • If event handlers care about why the entity's current state is disabled, they can sniff for the "deleted_descendant_of" relationship.

@iionly
Copy link
Contributor

iionly commented Feb 17, 2017

Not sure if it makes any sense in this context... there's the banned state for user entities and groups can be restricted. Banned users can't login and non-members can't access restricted group content. Could a similar approach be used for "locking" entities while their container entities (or container of container entities) are in the process of deletion by a background job? For deletion of groups not even group members could access group pages anymore but being redirected instead and permission hooks could deny edit/delete/annotate (whatever...) permissions.

@mrclay
Copy link
Member

mrclay commented Feb 17, 2017

@iionly In the scenario I described above, the entity and all its descendants would be disabled in an atomic query, so no need to lock anything. Unless I'm misunderstanding you.

@iionly
Copy link
Contributor

iionly commented Feb 18, 2017

As I understand the approach you are suggestion it would still be necessary to (recursively) add some relationsship to every entity and to disable every entity contained by the container entity (e.g. group, user) that is to be deleted. I don't know how fast this can be done (without listening to hooks and events). But still I think there could be some race condition occuring where a user might do something (commenting, liking, add some content, joining a group etc.) while the deletion of the container entity is prepared. New data might not have the relationship added and might not get disabled before the actual delete process is starting as a background task. Could the addition of the relationship and disabling of entities already be done as a background task or would the user have to wait for that to finish? What if there's an error occuring during addition of relationships and disabling leaving this task unfinished. Could the deletions be safely done as a background job then?

My suggestion would be to add some flag only to the top-level container entity that is going to be deleted (user, group, or some other type of container entities where appropriate - Tidypics albums entities containing many images would come to my mind here).

Maybe "locking" was a bad choice of wording here. But the point is that there would be no relationship/metadata added to every child entity recursively. Instead of that there would be a check done in case a user tries to add some additional data to the container entity or to any child entity that is to be deleted (e.g. canEdit, canComment, canDelete and what else might matter here) would check if the container entity or any entity within the chain of container entities up to group or user would have the "locked_for_deletion" flag. If the flag is set, you can't add any new comment/annotation/metadata anymore (and might get an error message explaining why).

Though the "locked_for_deletion" flag only being set for the top-level container entity would make it necessary to always check the existence of such a flag. So, it might have a bit of a negative effect on performance. On the other hand there are permission checks necessary already anyway (canEdit, canComment, ...). So the overhead might not be too large.

@hypeJunction
Copy link
Contributor

One other option, which would also address a problem of accidental content removal, is to just add deleted column to the entities table. SQL queries would be updated to always exclude deleted entities. We could add admin option to purge deleted entities and files (we would probably need to update our API to not actually wipe owned files).

@mrclay
Copy link
Member

mrclay commented Feb 20, 2017

The deleted column would make deleting all the descendants very quick: First mark the entity deleted, then run this query until it stops affecting any rows.:

UPDATE entities
SET deleted = 1
WHERE deleted = 0
AND (
  container_guid IN (
    SELECT guid FROM entities WHERE deleted = 1
  ) OR owner_guid IN (
    SELECT guid FROM entities WHERE deleted = 1
  )
)

@hypeJunction
Copy link
Contributor

We still need to be triggering the events, no?

@mrclay
Copy link
Member

mrclay commented Feb 20, 2017

Argh, yes, and the task won't know why deleted = 1 in a descendant. Sooo... Maybe instead of deleted we use state (tinyint), so we can have discrete BEING_DELETED and DELETED states.

If the state is either the entity won't show up in queries, but BEING_DELETED entities will have the "delete" event fired, and if it fails we'll revert the state of all its descendants.

@mrclay
Copy link
Member

mrclay commented Feb 20, 2017

What happens now if, say, a child entity refuses to delete? Stranded entities with deleted owner/container is a separate issue.

@mrclay
Copy link
Member

mrclay commented Feb 20, 2017

Strawman 2:

  • Add state (TINYINT) column to entities. 0 = NORMAL, 1 = BEING_DELETED, 2 = DELETED.
  • Set the deleted entity to DELETED.
  • Without events, use a recursive query to mark all descendants as BEING_DELETED, effectively hiding them from the entity system.
  • If in an action, queue a task for the following:
  • Recursive delete the entity's children with events (if event fails, revert state to NORMAL)

@hypeJunction
Copy link
Contributor

Sounds feasible. Perhaps the same column could help with draft/published state - what we do now (see blogs) is too hacky

@mrclay
Copy link
Member

mrclay commented Feb 20, 2017

I can imagine needing to know whether a deleted entity was a draft or not. Maybe it should just be deleted_state. I'm in general hesitant to expose something like this to plugin API.

If access control were granular, things like drafts would easier to manage.

@jdalsem
Copy link
Member Author

jdalsem commented Feb 23, 2017

I like the STATE column... we could use it also for disabled states... it would remove the need for an extra where clause

@jdalsem
Copy link
Member Author

jdalsem commented Nov 24, 2017

a solution for this will not make it to Elgg 3.0

@jdalsem jdalsem removed this from the Elgg 3.0.x milestone Nov 24, 2017
@hypeJunction
Copy link
Contributor

I am proposing we stop doing this ridiculous clean up operation with all the evens fired and batch deletes. It's ok to trigger an event for entities, but all the related stuff has no use. We fire events for metadata, annotations, owned annotations, relationships, private settings. Because of that we can't just remove everything from the database in one go. If someone wants to do something with a deleted annotation when entity is deleted they can just listen to entity delete event.

@mrclay
Copy link
Member

mrclay commented Nov 27, 2017

I'd also vote to not call all these delete events for items that happen to be tied to a deleted entity. Do we have any use cases for these?

@iionly
Copy link
Contributor

iionly commented Nov 27, 2017

A theoretical use case could be the Userpoints plugin. I'm currently adding support for deletion of points awarded for annotation creation (for example liking/unliking - where unliking and then re-liking was a loophole for gaining userpoints). But the use case is theoretical because I've also rewritten the point deletion on deletion of entitities. If it works as intended, the deletion of the entity triggers the deletion of all userpoint entitities related to this entity directly including userpoint entities awarded for related annotations/metadata. So, this would be the approach suggested by @hypeJunction with the annotation/metadata/relationships deletion events not necessary. As the rewrite of the Userpoints plugin is still a WIP I can't say yet if it will work fine in the end (I'm also still worried about other scaling issues with adding support for point deletion on annotation deletion like for example deletion of a Tidypics album that could contain 1000s of images with even more comments and likes...).

Is there really no "easy" way to handle group deletion by an Elgg cron job? Like putting the group just in kind of a lock-down mode that blocks the access to any group pages or content when clicking on the delete button. If the users are kept out of the group without the possibility to modify any group content (including not commenting/liking for example in the river), the time necessary to delete everything with a cronjob wouldn't matter that much anymore. And maybe the same lockdown mechanism could also be useful in other situations (again: Tidypics albums as one example, or to allow for moderation of deletion by an admin, or locking down a user account instead of banning or deleting the account).

@Facyla
Copy link
Contributor

Facyla commented Nov 29, 2018

Could this issue be considered as a garbage collection task?

I mean, when an entity is deleting, the subsequent hooks apply only to the owned/contained entities (and may block their deletion), without blocking the first deleted entity <=> the deletion batch doesn't prevent the initial entity to be deleted because the deletion of one of its elements failed (which is the expected behaviour), and these are the time-consuming part of the process.
=> as the group deletion should not fail because some hook prevented one of its objects to be deleted, these deletions could be postponed.

Step 1 could become: delete the actual entity without recursivity, and forward the user with the result message, and only then launch the recursive part of the process.

If plugins block some objects deletion, then the user who should be notified by the hook is not necessarily the user who deleted the container, but probably more the owner of the content (eg. "your blog post has been transferred to your personal blog") - which would probably better fit actual notifications rather than site messages.

The main issue is the recursive hooks relating on the actual container data to perform their logic: but since whatever they do and respond does not block the container deletion process anyway, this does not seem serious enough to invalidate that approach.

Then cleanup could be performed by checking object entities with invalid containers.
By invalid containers, i mean containers whose GUID does not exist, which does not include disabled entities: these rather mean that entities that are archived / hidden / banned for some reason, but may be enabled again.
This is a big assumption, but if entities, relations, anything GUID-related have a non-existing GUID, they might be removed from the database by cron tasks, or (i just found that topic) added to the garbage collector as suggested in #10770

Finally there is the visibility issue: if we consider -in the same spirit- that an object entity with an invalid or also inaccessible container should not be displayed at all, then the getter functions could only check for an existing container at the lowest level, by ensuring that the guid exists in entities table.
The container validity check could happen later, when viewing the entity - ie. where the context is required -, and through the container permissions hook.

To give some more background on this, it think there are 2 main use cases when dealing with group deletion: the current logic behind deleting a group implies all its content should be destroyed too, and by content it means both group as container or group as owner.

  • for content owned by the group itself, that seems obvious
  • for content owned by an user inside a group, this may be an issue, as it means deleting user content without user approval nor information.

When some object has a group as container, group deletion may be handled so that the content owned by users is transferred to these users (ie. update container_guid to owner_guid + access_id to 0).
This would rather happen at the entity deletion hook level than at the group level, although subgroups deletion has a feature that allows to transfer content to parent group, and an old plugin used to allow to transfer content from a deleted user to another user.

Depending on the installation context, people tend to consider whether:

  • that the user content is the most valuable (and the container group is only a context),
  • or that the group is a whole and its content should not exist without it (some users happen to publish in it, but do not really retain ownership in that context). (in that case, i sometimes give ownership to some object to the group itself, rather than to the users, and adjust the members permissions accordingly).

I feel there is something to address here, which would make it easier to separate the recursivity by ownership, and by containership: the first is obvious, and recursivity could be handled in a straight-forward way, while the container is much more likely to be handled in various ways by plugin developers.

@jdalsem
Copy link
Member Author

jdalsem commented Apr 12, 2024

With the introduction of trashable entities in #14574 the impact of recursively deleting content has been partially offloaded to a background proces. However we acknowledge that this still is a (theoretical) problem. Real life issues are accepted for know as we have no clear / simple solution for this problem. As there will be no active work on remediating this issue i am closing this ticket

@jdalsem jdalsem closed this as not planned Won't fix, can't repro, duplicate, stale Apr 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

5 participants