Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Purging a post/comment/user/community does not remove the uploaded media despite the doc claiming it does. #3931

Closed
4 tasks done
Annoyedcrabby opened this issue Sep 2, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@Annoyedcrabby
Copy link

Requirements

  • Is this a bug report? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • Did you check to see if this issue already exists?
  • Is this only a single bug? Do not put multiple bugs in one issue.
  • Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.

Summary

Like the title said, purging a user or a comment or a post or a community does not delete the media posted from the database despite the doc claiming otherwise

GridArt_20230903_010957071

Steps to Reproduce

  1. Make a post with picture
  2. Open the image in new page
  3. Purge the post
  4. Refresh the said image, still exists.

Technical Details

N/A

Version

0.18.4

Lemmy Instance URL

Monyet.cc

@Annoyedcrabby Annoyedcrabby added the bug Something isn't working label Sep 2, 2023
@fhke
Copy link

fhke commented Sep 3, 2023

Looking at the codebase, it appears that image uploads to pict-rs are not directly tracked in the Lemmy database. This makes it challenging to purge the image content associated with a post/comment safely, because there is no guarantee that the user who created the post/comment associated with an image is the same user that uploaded the image.
A naive implementation of this functionality could ignore this limitation, but that could easily become a vector for abuse. For example, a user could create a comment linking to other users images then deliberately get banned or delete their comment, which would trigger the deletion of those users images.

A safer version of this functionality could be implemented as follows:

  1. Create a new database table (e.g. user_images) linking image URLs with the corresponding user ID & community ID.
  2. When an image is uploaded, a record is added to the user_images table containing the user ID & the image URL.

The behaviour for removing entities could then be updated as follows:

  • Comment - the body is scanned with a regular expression for links to hosted images. For each image, if the user ID matches the commenter's ID, the image is purged.
  • Post - look up the image URL in the user_images table. If the image exists in the table and the owner matches the commenter's ID, the image is purged.
  • Community - remove all images uploaded to the community.
  • User - remove all of the images uploaded by the user.

There are still some issues with this approach that I can see:

  1. If other users add the image to their post by URL, the link will break. This isn't necessarily negative; for illegal/offensive images it would be preferred behaviour. The same issue also exists for comments that include images from an external site.
  2. Images uploaded prior to the implementation of this functionality would not have a corresponding record in the user_images table, and as such would not be deleted if the owning post was deleted. I can't think of a safe way to retrospectively populate this table unfortunately.
  3. If the same image URL is used in multiple posts/comments by the same user, purging one of these posts/comments will delete the underlying image, breaking the links for all of the posts/comments. Same as number 1, in the case of illegal/offensive images this would likely be preferable. The current flow of image uploads makes it difficult to prevent this, as images are uploaded before the corresponding post/comment.
  4. For a purge of a highly active user or an entire community it's possible that the pict-rs service could become overwhelmed with deletion requests and crash, or the purge request could time out. For this reason it may be preferable to add a deleted column to the user_images table which is set to true when an image is deleted, and add a scheduled task to actually perform the deletion in pict-rs with rate limits.
  5. If an admin wants to delete an image that has been linked to from multiple posts/comments, they would have to locate the initial uploader of the image in order to delete it. For this reason, we may want to implement a force delete image(s) button on posts/comments that deletes associated images without checking ownership.
  6. There are various tricks that users could do to make pict-rs images difficult to purge, like using link shortening services.

I'm happy to work on this, but before I do it would be good to get a Lemmy maintainer's perspective on this proposal.

@Nutomic
Copy link
Member

Nutomic commented Nov 22, 2023

Make sure that you configured a pictrs api key both in pictrs settings and lemmy, otherwise purging will fail. Also make sure to force reload the image. With normal reload, it will appear to still exist due to caching.

I added some tests for purging in #4183 which show that its working.

@fhke Such a table was added in #3927. I suggest you test the functionality in 0.19 and then open a new issue if improvements are still needed.

@Nutomic Nutomic closed this as completed Nov 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants