Rewrite images to use local proxy #4035

Merged
merged 55 commits into from Jan 25, 2024

Nutomic
Member

@Nutomic Nutomic commented Oct 11, 2023

I decided to play around with markdown handling to see how we can rewrite image and link URLs. Turns out it's pretty simple. So far it does this:

  • Add rel=nofollow to all links to discourage spammers. Note that this won't affect lemmy-ui or other frontends, as they do their own markdown rendering. It will only affect emails, RSS feeds and different federated platforms.
  • Rewrite all images to https://lemmy-alpha/api/v3/image_proxy?url={url} so that users don't connect directly to remote servers.
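
The image rewrite in the second bullet can be sketched roughly as follows. This is a minimal illustration rather than the code from this PR: the function names `percent_encode` and `proxy_image_url` and the `local_domain` parameter are assumptions, and only the route `/api/v3/image_proxy?url=` is taken from the description above. It uses only the standard library, whereas a real implementation would more likely rely on a URL crate.

```rust
/// Percent-encode every byte that is not an RFC 3986 "unreserved" character,
/// so the remote URL survives intact as a single query-parameter value.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Build the local proxy URL for a remote image.
/// `local_domain` is a hypothetical parameter (e.g. "lemmy-alpha").
fn proxy_image_url(local_domain: &str, remote_url: &str) -> String {
    format!(
        "https://{}/api/v3/image_proxy?url={}",
        local_domain,
        percent_encode(remote_url)
    )
}
```

Because the mapping is a pure function of the remote URL, two posts that link the same image end up with the same rewritten URL.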

This PR doesn't have any breaking changes, so it can be merged whenever.

Todo:

  • Testing
  • Use pictrs 0.5 for image proxying
  • Add config option for image_caching (disabled by default for now)
  • Need to handle image URLs differently, so that API clients don't have to manually rewrite them with the proxy URL (Rewrite image links in markdown markdown-it-rust/markdown-it#36)
  • Call markdown_rewrite_image_links() on all markdown before writing to db (api and apub)
  • Also rewrite images for avatars, banners etc
  • In image_proxy handler, check that image url corresponds to federated image so that it can't be abused as proxy for arbitrary purposes. This means keeping a db table of known remote images, and passing a db connection into markdown parser to write to the table. Remote avatars, banners etc also need to be written.
  • Rewrite markdown links in RSS feeds, emails to set rel=nofollow
  • Cleanup request.rs which generates link metadata and thumbnail
  • Store content type of post.url in database
  • Rename setting cache_remote_thumbnails to disable_external_link_previews
  • Rewrite post url with proxy
  • Federate post url as Activitypub Image based on mime type for better compatibility
  • Add a cache for proxied images with configurable size, stored on disk (or rather leave this to nginx?) -> leave to pictrs
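
The abuse check from the todo list (only proxy URLs that correspond to a known federated image) could look roughly like this. It is a sketch under stated assumptions: `KnownRemoteImages` and `ProxyError` are hypothetical names, and a `HashSet` stands in for the database table mentioned in the todo item.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory stand-in for the db table of known remote images.
#[derive(Default)]
struct KnownRemoteImages {
    urls: HashSet<String>,
}

/// Hypothetical error type returned when a proxy request is rejected.
#[derive(Debug, PartialEq)]
enum ProxyError {
    UnknownImage,
}

impl KnownRemoteImages {
    /// Record a remote image URL, e.g. while processing federated markdown,
    /// avatars or banners.
    fn insert(&mut self, url: &str) {
        self.urls.insert(url.to_string());
    }

    /// Decide whether the image_proxy handler may fetch `url`.
    /// Anything that was never recorded is refused.
    fn validate(&self, url: &str) -> Result<(), ProxyError> {
        if self.urls.contains(url) {
            Ok(())
        } else {
            Err(ProxyError::UnknownImage)
        }
    }
}
```

With this shape, the handler would call `validate` before fetching anything, so the endpoint cannot be used as an open proxy for arbitrary URLs.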

@Nutomic Nutomic changed the title from "Markdown link rule" to "Rewrite images to use local proxy, add rel=nofollow to links" Oct 13, 2023
url: String,
}

async fn image_proxy(
Member

@asonix would using the lemmy-backend to proxy picture requests be a better approach, or maybe waiting for pictrs v0.5, which has picture proxying, and using those pictrs routes? Any pros/cons to either?

Collaborator

proxying images through lemmy is fine, but it means the original server must be online and capable of serving the image for each request. using pict-rs' proxy method in 0.5 will cache the proxied image and reduce load on the original server

I think doing this initial proxy work now probably makes sense, and using pict-rs' proxying in the future can be made a lower priority (and give more time to folks to upgrade to 0.5 when it releases before you start depending on 0.5 endpoints)
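
The difference between a plain pass-through proxy and a caching one can be illustrated with a small sketch. This is not pict-rs code: `CachingProxy` is a hypothetical type, and the `fetch` closure stands in for an HTTP request to the original server. It only shows why caching means the origin is contacted once per image instead of once per request.

```rust
use std::collections::HashMap;

/// Hypothetical proxy that remembers fetched images.
/// `fetch` stands in for an HTTP request to the original server.
struct CachingProxy<F: FnMut(&str) -> Vec<u8>> {
    fetch: F,
    cache: HashMap<String, Vec<u8>>,
    /// Number of times the origin was actually contacted.
    origin_fetches: usize,
}

impl<F: FnMut(&str) -> Vec<u8>> CachingProxy<F> {
    fn new(fetch: F) -> Self {
        Self {
            fetch,
            cache: HashMap::new(),
            origin_fetches: 0,
        }
    }

    /// Return the image bytes, contacting the origin only on a cache miss.
    fn get(&mut self, url: &str) -> Vec<u8> {
        if let Some(bytes) = self.cache.get(url) {
            return bytes.clone();
        }
        let bytes = (self.fetch)(url);
        self.origin_fetches += 1;
        self.cache.insert(url.to_string(), bytes.clone());
        bytes
    }
}
```

A pass-through proxy corresponds to calling `fetch` on every request, which is why it requires the original server to be online each time.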

Member

In that case @Nutomic just add a TODO somewhere that we can remove this once pictrs 0.5 is released.

Member Author

Sounds good, adding comment.

Member Author

@Nutomic Nutomic Oct 26, 2023

Changed it to use pictrs for proxying. Also added a config option for image proxy which is disabled by default, with comment noting that pictrs 0.5 is required.

@dessalines
Member

dessalines commented Oct 23, 2023

One other concern I have that makes me think image proxying should probably be done in front ends: cross-posting.

If a link URL points to an image on instance1.tld/...pictrs/, then it's currently possible to see all its cross-posts across the lemmyverse. That might not work correctly if we rewrite all image links.

I.e., maybe instead of actually rewriting links, we just provide the image_proxy endpoint and let front ends use it.

@Nutomic
Member Author

Nutomic commented Oct 24, 2023

@dessalines If multiple posts have the same URL, those will all be rewritten to the identical image_proxy URL, so crossposts will still work. If this logic is handled in the frontend, then each frontend will have to reimplement the same logic, which doesn't make sense.

@@ -65,7 +65,15 @@ pub struct PictrsConfig {

/// Cache remote images
#[default(true)]
pub cache_remote_images: bool,
pub cache_remote_thumbnails: bool,
Member Author

This setting might be unnecessary now. However, the file request.rs where it's used is quite a mess; not sure when I can get around to cleaning it up (it needs an almost complete rewrite).

@Nutomic Nutomic changed the title from "Rewrite images to use local proxy, add rel=nofollow to links" to "Rewrite images to use local proxy" Oct 26, 2023
@Nutomic Nutomic marked this pull request as ready for review October 26, 2023 10:39
@Nutomic
Member Author

Nutomic commented Oct 26, 2023

Ready to review now.

Edit: Did some testing using a local docker setup and it's working as expected.


if !pictrs_config.cache_external_link_previews {
return Ok(
proxy_image_link(
Member

This block here is confusing now. Maybe we should remove cache_external_link_previews entirely now, and just leave image_proxy?

Member

@dessalines dessalines left a comment

The cache_external_link_previews and image_proxy seem like two bools that refer to the same thing now then.

I'm tracing down every usage of RemoteImage::create, and sometimes it uses the first setting, sometimes the 2nd. We should simplify and get rid of one of them.

@kroese
Contributor

kroese commented Jan 5, 2024

@dessalines The cache_external_link_previews setting has a somewhat confusing name. I wish it was called store_thumbnails or something like that, because that would make it much clearer what the setting does.

But it doesn't refer to the same thing. Its purpose is to specify whether you want to store thumbnails locally (in pict-rs) or just hotlink to external images. And the image_proxy setting specifies whether you want to use a proxy for external thumbnails.

In my case I don't want the images to be proxied and I also don't want to store them locally. So a single setting would not be enough.

@dessalines
Member

dessalines commented Jan 5, 2024

@kroese @Nutomic in that case they seem mutually exclusive, so we have 3 options and should use an enum rather than 2 bools:

  • Use the pictrs proxy for all external images
  • Save all external images (cache is a bad term anyway)
  • Directly link to all external images

Some names could be {UsePictrsProxy, SaveInPictrs, UseDirectLink}

@kroese
Contributor

kroese commented Jan 5, 2024

Yes they are exclusive, because the value of image_proxy has no meaning until you set cache_external_link_previews to false. So I agree an enum would make more sense.

Also, in the current release there is a bug: cache_external_link_previews is only applied to thumbnails coming from federated posts. But it should also apply to thumbnails generated from the og:image metadata for local posts. I hope this pull request solves that issue.
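
The enum proposal and the precedence between the two existing bools can be sketched as follows. The variant names are the ones suggested in the thread ({UsePictrsProxy, SaveInPictrs, UseDirectLink}), not necessarily those of the final config, and `image_mode_from_bools` is a hypothetical helper that only illustrates why image_proxy has no effect while cache_external_link_previews is true.

```rust
/// Variant names follow the suggestion made in this thread;
/// the merged config may use different names.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ImageMode {
    /// Use the pictrs proxy for all external images.
    UsePictrsProxy,
    /// Save external images locally in pictrs.
    SaveInPictrs,
    /// Link directly to external images.
    UseDirectLink,
}

/// Hypothetical helper mapping the two legacy booleans onto the enum.
/// When previews are stored locally, `image_proxy` is ignored.
fn image_mode_from_bools(cache_external_link_previews: bool, image_proxy: bool) -> ImageMode {
    match (cache_external_link_previews, image_proxy) {
        (true, _) => ImageMode::SaveInPictrs,
        (false, true) => ImageMode::UsePictrsProxy,
        (false, false) => ImageMode::UseDirectLink,
    }
}
```

Four bool combinations collapse to three distinct behaviours, which is the argument for a single three-valued setting.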

dessalines and others added 3 commits January 8, 2024 11:56
* Extracting opengraph_data to its own type.

* A few additions for markdown-link-rule.

---------

Co-authored-by: Nutomic <me@nutomic.com>
@Nutomic
Member Author

Nutomic commented Jan 8, 2024

Makes sense, I've converted it to an enum now. You can see it in config/defaults.hjson. For now I'm keeping the existing behaviour as default to avoid breaking changes in a minor version. But if the proxying works well, we might be able to remove StoreLinkPreviews in 0.20 and make ProxyAllImages the default.

@kroese
Contributor

kroese commented Jan 8, 2024

I still think the naming is not as clear as it could be. The setting is called image_mode but there are other types of images (inside comments, avatars, banners) to which the setting doesn't apply. Also values like StoreLinkPreviews are way more verbose than needed.

If it was my decision I would just call the setting thumbnails and the values store, link and proxy for example.

@Nutomic
Member Author

Nutomic commented Jan 8, 2024

The new image proxying from this PR applies to all images, including avatars, markdown embedded images etc. However, the old storing of preview images only applies to post URLs. So I think the naming makes sense.

Member

@dessalines dessalines left a comment

Nice! BTW pictrs v0.5.0 is released now.

Did you want this to go in the upcoming release, or leave it for later? My main concern is I want our release notes to give full instructions on how to upgrade pictrs.

@Nutomic
Member Author

Nutomic commented Jan 10, 2024

Should be fine to include it, as it doesn't change the main behaviour unless you add the config option. For pictrs we can simply link to the readme.

@Nutomic
Member Author

Nutomic commented Jan 10, 2024

Added cache_external_link_previews config option back in to avoid breaking changes.

@dessalines
Member

dessalines commented Jan 23, 2024

LGTM, but let's have @phiresky take a look before merging.

And when you can, please add some detailed instructions for LemmyNet/lemmy-ansible#213, because I'm sure this will cause some issues unless people know the proper way to upgrade pictrs seamlessly.

@dessalines dessalines merged commit e8a52d3 into main Jan 25, 2024
2 checks passed

6 participants