-
-
Notifications
You must be signed in to change notification settings - Fork 867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add logging for pictrs uploads #3927
Conversation
crates/db_schema/src/schema.rs
Outdated
actor_id -> Varchar, | ||
last_refreshed_at -> Timestamptz, | ||
#[max_length = 255] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something went wrong here, these annotations should stay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird. this file was autogenerated by my diesel migration. do you guys manually maintain this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it works with autogeneration. Maybe you are using an older version of diesel-cli?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that was it. fixed
I think this makes a lot of sense, in fact I was thinking of implementing something similar myself. In the future we can also extend it so that posts, user avatars etc reference this images table instead of storing plain links. Though it looks like the image table isnt really being used now. I think it would make sense to add an endpoint |
Its currently mainly meant just for admins to use via the database. I didn't want to go too far implementing features on this PR. I already had a ton of trouble just getting this to work as is due to rust |
@Nutomic can you tell me wtf the sql format check is complaining about? I'm deving on windows so i can't run sh locally |
b574596
to
7b9e3a4
Compare
Try adding an empty line between the The check script could be changed to use
|
crates/routes/src/images.rs
Outdated
ImageUpload::create(&mut context.pool(), &form) | ||
.await | ||
.map_err(error::ErrorBadRequest)?; | ||
if let Some(uploaded_image) = image.first() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe do this for all images rather than just the first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We dont really need to handle upload of multiple images at once. Is there some way to disable that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.4 and earlier is hardcoded to 10
0.5 makes this value configurable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah the frontend breaks it up into individual calls with one image, but i suppose other frontends might not do that... Ill see if i can split it up
ffefb94
to
8102beb
Compare
pictrs_alias text NOT NULL, | ||
pictrs_delete_token text NOT NULL, | ||
published timestamptz DEFAULT now() NOT NULL | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, we should have added this a while ago.
@asonix Is there any way to ensure that this is the first / original uploader, rather than using the alias? It'd be nice if this table didn't store the aliases, but pictrs's official unique ID.
Then deletes can delete the image, not just an alias.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually do want to delete just the alias, especially if you are doing something like purgeuser, because we don't want to delete everything they have uploaded, which may point to images that others have uploaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makotech is right. You do want the alias here & not the internal unique ID. pict-rs intentionally does not expose a way to retrieve the internal ID via the alias. Plus, aliases are already unique.
When a user deletes a post containing an image, that alias
should be deleted, but the the image should not be purged. This would impact availability of that image in other posts that are not deleted.
Purge exists as an internal endpoint for use when an admin decides an image absolutely needs to be removed from all posts at the cost of breaking image links.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few use cases here tho if we allow a ton of duplicate images / aliases in the DB, counted as local_user uploads:
- Malicious images. If we delete only the alias link, then that image could still exist on the filesystem, if other people (or the same malicious actor) uploaded it several times as several different aliases. Because you have a ton of duplicate images, you don't know who the original uploader actually is, and whether to purge or just remove the alias.
- Non-malicious images. Having a ton of duplicate images linked to local_users means we don't know whether to purge or delete the alias again, because purging a user could remove safe images added by other people.
Shouldn't the point of this table, be to show who the original uploader is, not someone who possibly uploaded a duplicate?
The purge user function needs to know whether that person is the first / original uploader, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The internal purge endpoint will purge an image given any alias. There's no distinction in pict-rs itself between an "original" upload and a "duplicate" upload. They are both considered aliases of the same hash.
Caring about who is the original uploader vs who uploaded a duplicate doesn't make sense to me. Ultimately, if duplicates exist, it just means multiple people uploaded the same file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caring about who is the original uploader vs who uploaded a duplicate doesn't make sense to me. Ultimately, if duplicates exist, it just means multiple people uploaded the same file.
Because the main issue, that's the reason for this iirc, is purging users, and completely removing the images that they have originally uploaded. Lets say someone creates an account and :
- Uploads a few original malicious images.
- Uploads a few non-malicous copies of other peoples images (aliases)
When we do a user purge, how do we know we aren't purging non-malicious images that don't belong to them?
For that we need not secondary aliases, but to know who the original uploader is. And that means a unique pictrs id of some kind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. The point of this PR is to really log who is uploading what, and to provide a list of images to delete when purging a user.
Pictrs creates an alias for an image if the image sha256 matches, so as to prevent duplicates. If only one alias exists for an image, then deleting the alias will also delete the image.
There is a separate user story that can be made for actually purging an image (original image and all aliases), but we currently don't have that functionality in lemmy. For every other use case though, we don't want to purge images just purely based on the uploader, since they could be uploading normal valid images which might be duplicates of other valid images.
Edit: I refreshed page and didn't realize you guys made more comments. We don't need to know original uploader, we just need to delete aliases. If only one alias exists, the image is also deleted. the non-malicious images won't be deleted in this case, only an alias.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with makotech, if an admin is deleting a user, that user's images should be deleted
and not purged
. In the majority of cases this will result in the media being removed.
If another user has also uploaded the Bad Image then that other user can also be deleted, which will result in the image being deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go with yall then... personally I think a user purge should also purge (and not just remove aliases) for all the images they've uploaded, but I'll go with the consensus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would cause problems because another user might have uploaded the same image, so that would still be used. In fact this could be abused by a malicious user if he reuploads something like the site icon, then does something bad to get purged. In that case the site icon would get purged as well and its not what we want.
CREATE TABLE image_upload ( | ||
id serial PRIMARY KEY, | ||
local_user_id int REFERENCES local_user ON UPDATE CASCADE ON DELETE CASCADE NOT NULL, | ||
pictrs_alias text NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above, but this table should probably store not the aliases, but the unique internal pictrs ids, then this column made unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mentioned in other comment: this column can be made unique as-is. aliases are unqiue.
Looks good to me. In the future we should make more changes in this area, so that admins can view user uploads via api, and also reference this image_upload table instead of raw urls for avatars, post links etc. |
ec33560
to
035534e
Compare
2334149
to
4c93fda
Compare
pictrs_alias text NOT NULL, | ||
pictrs_delete_token text NOT NULL, | ||
published timestamptz DEFAULT now() NOT NULL | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go with yall then... personally I think a user purge should also purge (and not just remove aliases) for all the images they've uploaded, but I'll go with the consensus.
for upload in pictrs_uploads { | ||
delete_image_from_pictrs(&upload.pictrs_alias, &upload.pictrs_delete_token, &context) | ||
.await | ||
.ok(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will potentially be called for hundreds or even thousands of images. You could delete multiple in parallel with join_all, but not sure if pictrs handles that well. Or use a background thread with spawn_try_task
. cc @phiresky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would limit concurrency here to a reasonable number like 32.
Look into tokio's JoinSet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some other unavoidable loops in purging, this is pry good for now.
That won't be possible I think, because this will only log locally uploaded pics, whereas avatars and icons could be hosted on other instances. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks a ton @makotech222 !
Definitely wont be easy. I was thinking a |
Add new table, image_upload to store link between local_user_id and a pictrs image alias. Also update PurgeUser to take this into account.
Also fixed a bug in the url to the pictrs purge that was causing it to always fail.
Should allow for admins to keep track of who is uploading what and when.