Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logging for pictrs uploads #3927

Merged
merged 2 commits into from
Sep 6, 2023

Conversation

makotech222
Copy link
Contributor

Add new table, image_upload to store link between local_user_id and a pictrs image alias. Also update PurgeUser to take this into account.

Also fixed a bug in the url to the pictrs purge that was causing it to always fail.

Should allow for admins to keep track of who is uploading what and when.

actor_id -> Varchar,
last_refreshed_at -> Timestamptz,
#[max_length = 255]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something went wrong here, these annotations should stay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird. this file was autogenerated by my diesel migration. do you guys manually maintain this file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it works with autogeneration. Maybe you are using an older version of diesel-cli?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that was it. fixed

@Nutomic
Copy link
Member

Nutomic commented Sep 1, 2023

I think this makes a lot of sense, in fact I was thinking of implementing something similar myself. In the future we can also extend it so that posts, user avatars etc reference this images table instead of storing plain links.

Though it looks like the image table isnt really being used now. I think it would make sense to add an endpoint /api/v3/local_user/list which returns a paginated list of all local users. This list can be used by admins to see all user accounts and the images uploaded by each user.

@makotech222
Copy link
Contributor Author

Its currently mainly meant just for admins to use via the database. I didn't want to go too far implementing features on this PR. I already had a ton of trouble just getting this to work as is due to rust

@makotech222
Copy link
Contributor Author

@Nutomic can you tell me wtf the sql format check is complaining about? I'm deving on windows so i can't run sh locally

https://woodpecker.join-lemmy.org/repos/129/pipeline/2388/7

@drumlinish
Copy link
Contributor

Try adding an empty line between the CREATE INDEX and at the end of both files.

The check script could be changed to use diff -u so it would show it like this:

--- ./up.sql	2023-09-03 21:44:31.359940340 +0200
+++ tmp_pg_format.sql	2023-09-03 21:44:40.928126619 +0200
@@ -7,4 +7,6 @@
 );
 
 CREATE INDEX idx_image_upload_local_user_id ON image_upload (local_user_id);
+
 CREATE INDEX idx_image_upload_alias ON image_upload (pictrs_alias);
+
--- ./down.sql	2023-09-03 21:44:25.827832471 +0200
+++ tmp_pg_format.sql	2023-09-03 21:46:11.473872548 +0200
@@ -1 +1,2 @@
 DROP TABLE image_upload;
+

ImageUpload::create(&mut context.pool(), &form)
.await
.map_err(error::ErrorBadRequest)?;
if let Some(uploaded_image) = image.first() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe do this for all images rather than just the first

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dont really need to handle upload of multiple images at once. Is there some way to disable that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.4 and earlier is hardcoded to 10
0.5 makes this value configurable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah the frontend breaks it up into individual calls with one image, but i suppose other frontends might not do that... Ill see if i can split it up

@makotech222 makotech222 force-pushed the pictrs-log-user branch 3 times, most recently from ffefb94 to 8102beb Compare September 4, 2023 14:41
pictrs_alias text NOT NULL,
pictrs_delete_token text NOT NULL,
published timestamptz DEFAULT now() NOT NULL
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, we should have added this a while ago.

@asonix Is there any way to ensure that this is the first / original uploader, rather than using the alias? It'd be nice if this table didn't store the aliases, but pictrs's official unique ID.

Then deletes can delete the image, not just an alias.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually do want to delete just the alias, especially if you are doing something like purgeuser, because we don't want to delete everything they have uploaded, which may point to images that others have uploaded.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makotech is right. You do want the alias here & not the internal unique ID. pict-rs intentionally does not expose a way to retrieve the internal ID via the alias. Plus, aliases are already unique.

When a user deletes a post containing an image, that alias should be deleted, but the the image should not be purged. This would impact availability of that image in other posts that are not deleted.

Purge exists as an internal endpoint for use when an admin decides an image absolutely needs to be removed from all posts at the cost of breaking image links.

Copy link
Member

@dessalines dessalines Sep 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few use cases here tho if we allow a ton of duplicate images / aliases in the DB, counted as local_user uploads:

  • Malicious images. If we delete only the alias link, then that image could still exist on the filesystem, if other people (or the same malicious actor) uploaded it several times as several different aliases. Because you have a ton of duplicate images, you don't know who the original uploader actually is, and whether to purge or just remove the alias.
  • Non-malicious images. Having a ton of duplicate images linked to local_users means we don't know whether to purge or delete the alias again, because purging a user could remove safe images added by other people.

Shouldn't the point of this table, be to show who the original uploader is, not someone who possibly uploaded a duplicate?

The purge user function needs to know whether that person is the first / original uploader, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The internal purge endpoint will purge an image given any alias. There's no distinction in pict-rs itself between an "original" upload and a "duplicate" upload. They are both considered aliases of the same hash.

Caring about who is the original uploader vs who uploaded a duplicate doesn't make sense to me. Ultimately, if duplicates exist, it just means multiple people uploaded the same file.

Copy link
Member

@dessalines dessalines Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caring about who is the original uploader vs who uploaded a duplicate doesn't make sense to me. Ultimately, if duplicates exist, it just means multiple people uploaded the same file.

Because the main issue, that's the reason for this iirc, is purging users, and completely removing the images that they have originally uploaded. Lets say someone creates an account and :

  • Uploads a few original malicious images.
  • Uploads a few non-malicous copies of other peoples images (aliases)

When we do a user purge, how do we know we aren't purging non-malicious images that don't belong to them?

For that we need not secondary aliases, but to know who the original uploader is. And that means a unique pictrs id of some kind.

Copy link
Contributor Author

@makotech222 makotech222 Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. The point of this PR is to really log who is uploading what, and to provide a list of images to delete when purging a user.

Pictrs creates an alias for an image if the image sha256 matches, so as to prevent duplicates. If only one alias exists for an image, then deleting the alias will also delete the image.

There is a separate user story that can be made for actually purging an image (original image and all aliases), but we currently don't have that functionality in lemmy. For every other use case though, we don't want to purge images just purely based on the uploader, since they could be uploading normal valid images which might be duplicates of other valid images.

Edit: I refreshed page and didn't realize you guys made more comments. We don't need to know original uploader, we just need to delete aliases. If only one alias exists, the image is also deleted. the non-malicious images won't be deleted in this case, only an alias.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with makotech, if an admin is deleting a user, that user's images should be deleted and not purged. In the majority of cases this will result in the media being removed.

If another user has also uploaded the Bad Image then that other user can also be deleted, which will result in the image being deleted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go with yall then... personally I think a user purge should also purge (and not just remove aliases) for all the images they've uploaded, but I'll go with the consensus.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would cause problems because another user might have uploaded the same image, so that would still be used. In fact this could be abused by a malicious user if he reuploads something like the site icon, then does something bad to get purged. In that case the site icon would get purged as well and its not what we want.

CREATE TABLE image_upload (
id serial PRIMARY KEY,
local_user_id int REFERENCES local_user ON UPDATE CASCADE ON DELETE CASCADE NOT NULL,
pictrs_alias text NOT NULL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above, but this table should probably store not the aliases, but the unique internal pictrs ids, then this column made unique.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mentioned in other comment: this column can be made unique as-is. aliases are unqiue.

@Nutomic
Copy link
Member

Nutomic commented Sep 5, 2023

Looks good to me. In the future we should make more changes in this area, so that admins can view user uploads via api, and also reference this image_upload table instead of raw urls for avatars, post links etc.

pictrs_alias text NOT NULL,
pictrs_delete_token text NOT NULL,
published timestamptz DEFAULT now() NOT NULL
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go with yall then... personally I think a user purge should also purge (and not just remove aliases) for all the images they've uploaded, but I'll go with the consensus.

migrations/2023-08-31-205559_add_image_upload/up.sql Outdated Show resolved Hide resolved
crates/db_schema/src/impls/local_user.rs Outdated Show resolved Hide resolved
for upload in pictrs_uploads {
delete_image_from_pictrs(&upload.pictrs_alias, &upload.pictrs_delete_token, &context)
.await
.ok();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will potentially be called for hundreds or even thousands of images. You could delete multiple in parallel with join_all, but not sure if pictrs handles that well. Or use a background thread with spawn_try_task. cc @phiresky

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would limit concurrency here to a reasonable number like 32.

Look into tokio's JoinSet

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some other unavoidable loops in purging, this is pry good for now.

@dessalines
Copy link
Member

and also reference this image_upload table instead of raw urls for avatars, post links etc.

That won't be possible I think, because this will only log locally uploaded pics, whereas avatars and icons could be hosted on other instances.

Copy link
Member

@dessalines dessalines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks a ton @makotech222 !

@dessalines dessalines merged commit fe3ebea into LemmyNet:main Sep 6, 2023
1 check passed
@Nutomic
Copy link
Member

Nutomic commented Sep 6, 2023

That won't be possible I think, because this will only log locally uploaded pics, whereas avatars and icons could be hosted on other instances.

Definitely wont be easy. I was thinking a link table which can be used to deduplicate external links and hold the mime type. Maybe optionally linked to this image_upload table. Then we also need to handle links in markdown.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants