Fetch comments and votes (and more) from remote instances #2825

Closed
mdesmedt opened this issue Apr 22, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@mdesmedt

Is your proposal related to a problem?

I set up my own Lemmy instance. I remote subscribed to https://lemmy.ml/c/lemmy as a test. All of this works and I can see all the posts. However all the post scores are 1 and I see absolutely no comments. This is similar to the behavior of Mastodon (replies and favs are missing if not previously pushed) where it is also annoying, but it seems much more problematic for a social link aggregator where discovery and hopping between communities is more likely.

Describe the solution you'd like

Please let Lemmy instances fetch comments and scores (and more if applicable) from remote instances hosting the community or post to be displayed. This would go a long way to a unified experience which makes it appear like a monolithic app while still being on Fedi.

Describe alternatives you've considered

If it can't be done on demand, perhaps at least make Lemmy lazily fetch and synchronize comments and scores for subscribed communities, so that the local instance appears to be in sync with the remote instance.

Additional context

I hope Mastodon and other ActivityPub applications will also address these issues at some point.

@mdesmedt added the enhancement label Apr 22, 2023
@dessalines
Member

This is called "history fetching", and although we fetch a few recent posts, we don't fetch all historical content, and neither does Mastodon. That would require tons of network calls, and potentially blow up the size of your database.

I understand the concern, but Lemmy follows the standard of only getting content after you've subscribed.

@mdesmedt
Author

> This is called "history fetching", and although we fetch a few recent posts, we don't fetch all historical content, and neither does Mastodon. That would require tons of network calls, and potentially blow up the size of your database.
>
> I understand the concern, but Lemmy follows the standard of only getting content after you've subscribed.

Yeah, I understand why Lemmy (and Mastodon) do this, as opposed to systems like Matrix which do full replication, but I think a sane middle ground exists. If I enter https://lemmy.ml/c/lemmy in my search bar and it fetches the first page of 9 posts, I don't see why fetching comments & scores for these posts would trigger "tons of network calls". If the current protocol requires this (I am not an expert), perhaps the protocol could be extended to fetch the entire state of a post in a single call.

Microblogging is an ephemeral experience, so Mastodon sort of gets away with this (although I constantly have to "Open original page" to view replies to a boosted post, which still sucks). But if Lemmy wants to bring the rich conversational experience of Reddit to the Fediverse, I believe this aspect of the UI has to be, and could be, improved. Personally, I'll take a few more network calls and database entries.

@mdesmedt
Author

Fairly basic functionality, like linking to posts on various communities or to old posts from before you subscribed, will not work well unless something like this is implemented. Imagine something like "AskHistorians": if someone links to an earlier post from before you were subscribed, you get, at best, only the question and not the replies. "Best Of" style communities and crossposting would not work either.

Extra replication to view the state of entire posts pre-community-subscription would add more traffic and storage, but I think it would be well worth it to have a richer and more usable platform.

@dessalines
Member

> If I enter https://lemmy.ml/c/lemmy in my search bar and it fetches the first page of 9 posts, I don't see why fetching comments & scores for these posts would trigger "tons of network calls".

ActivityPub is based on a "push" methodology, i.e. pushing updates to your subscribers. Most projects like ours do serve an "outbox", but it is limited.

If you want an idea of why history fetching could be dangerous: imagine fetching the entire history of posts and comments for any given subreddit. This would take thousands of paged network calls, be gigabytes of data, and potentially millions of posts and comments.

You can cache / save any individual posts or comments by specifically searching for them from your instance.
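
To make the paging cost discussed above concrete, here is a minimal sketch of walking a paged outbox with an optional page cap. It assumes the outbox is shaped like an ActivityStreams `OrderedCollection` (`first`, `next`, `orderedItems`); the URL, the page size of 20, and the in-memory `build_fake_outbox` stand-in are made up for illustration and are not Lemmy's actual code or endpoints.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical outbox URL, used only for this illustration.
BASE = "https://remote.example/c/demo/outbox"


def build_fake_outbox(total_items: int, page_size: int) -> Dict[str, dict]:
    """Build an in-memory stand-in for a remote server: URL -> JSON document."""
    page_count = max(1, -(-total_items // page_size))  # ceiling division
    docs: Dict[str, dict] = {
        BASE: {"type": "OrderedCollection", "totalItems": total_items,
               "first": f"{BASE}?page=1"}
    }
    for page in range(1, page_count + 1):
        start = (page - 1) * page_size
        stop = min(start + page_size, total_items)
        doc = {"type": "OrderedCollectionPage",
               "orderedItems": [f"post-{i}" for i in range(start, stop)]}
        if page < page_count:
            doc["next"] = f"{BASE}?page={page + 1}"
        docs[f"{BASE}?page={page}"] = doc
    return docs


def fetch_collection(root_url: str,
                     fetch: Callable[[str], dict],
                     max_pages: Optional[int]) -> Tuple[List[str], int]:
    """Walk a paged collection; return (items, number of fetch calls made)."""
    root = fetch(root_url)
    calls = 1
    items: List[str] = []
    pages = 0
    next_url = root.get("first")
    while next_url is not None and (max_pages is None or pages < max_pages):
        page = fetch(next_url)
        calls += 1
        pages += 1
        items.extend(page.get("orderedItems", []))
        next_url = page.get("next")
    return items, calls


if __name__ == "__main__":
    docs = build_fake_outbox(total_items=100_000, page_size=20)
    _, full_calls = fetch_collection(BASE, docs.__getitem__, max_pages=None)
    _, capped_calls = fetch_collection(BASE, docs.__getitem__, max_pages=1)
    print("full history:", full_calls, "calls; first page only:", capped_calls, "calls")
```

With these made-up numbers, a full walk costs one call per page plus one for the collection itself, while capping at the first page costs two calls regardless of how large the history is.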

@mdesmedt
Author

> If you want an idea of why history fetching could be dangerous: imagine fetching the entire history of posts and comments for any given subreddit. This would take thousands of paged network calls, be gigabytes of data, and potentially millions of posts and comments.

Sure, absolutely, but that's not the use case I'm illustrating. For example, the top post on lemmy.ml currently is "Welcome Reddit refugees!" with 30 comments. If I were to view this post for the first time in my instance and it was not previously cached, fetching the score and the comments on this post should, ideally, be a single GET with (I would estimate) somewhat less data transferred than the HTML page viewed by the browser. This would also only need to be transferred once, because my local instance will cache the post just like it would if I were previously subscribed to the community.

I can imagine that if the Lemmyverse gets incredibly popular, this would cause a few more database entries on my local instance, but that's where limits on caches could come in (like Mastodon's limits on its media cache, which grows crazy fast). If a cached post expires it would have to be refetched, but again, how much extra load is this really, when a browser can request a post at any time anyway (which is what users currently have to do)?
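
The cache-limit idea can be sketched as a bounded cache that evicts the least recently used post and refetches entries older than a time-to-live. This is only an illustration: `PostCache`, its parameters, and the `fetch_remote` callback are hypothetical names, and neither Lemmy nor Mastodon is claimed to work this way.

```python
import time
from collections import OrderedDict
from typing import Callable, Tuple


class PostCache:
    """Bounded cache: evicts least recently used posts, refetches stale ones."""

    def __init__(self, max_entries: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: "OrderedDict[str, Tuple[float, dict]]" = OrderedDict()

    def get(self, url: str, fetch_remote: Callable[[str], dict]) -> dict:
        now = self._clock()
        hit = self._entries.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            self._entries.move_to_end(url)  # mark as recently used
            return hit[1]
        # Miss or expired entry: exactly one remote fetch for this post.
        post = fetch_remote(url)
        self._entries[url] = (now, post)
        self._entries.move_to_end(url)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # drop the least recently used
        return post

    def __len__(self) -> int:
        return len(self._entries)
```

The trade-off is the one described above: a smaller limit or shorter time-to-live keeps the local database small at the cost of occasionally refetching a post from the remote instance.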

Currently Lemmy only really works (as in: I see the content which is actually there without having to use a browser to go to a remote instance) if I'm engaging with posts in communities I have previously subscribed to. This is limited compared to the experience offered by the monolithic sites. It need not be.

@dessalines
Member

You can cache / save any individual posts or comments by specifically searching for them from your instance.

@dessalines closed this as not planned Apr 23, 2023
@mdesmedt
Author

Just to be clear: if I, or someone else, make a PR which lets users fetch remote posts through their own instance so that they look complete (with score and comments), unlike the current empty posts and incorrect scores, which IMO look broken, will you consider taking it? Or is sticking with the subscribe/push model a core principle, no matter what?

I'll reiterate that not implementing something like this will really limit how much Lemmy can replace sites like Reddit, where browsing to and interacting with random communities and posts just works. But it's your project...

@Nutomic
Member

Nutomic commented Apr 23, 2023

It's definitely possible to implement this with ActivityPub. The problem is that Lemmy currently does all fetches of remote data within a single API call. So if you fetch a new community, it already fetches the last 20 posts and also has to fetch the user profiles of all post authors. If comments and votes were also fetched, that would require fetching up to hundreds or thousands of user profiles within a single API request, which would be too slow. So it would require adding some kind of background queue to fetch remote data, or other optimizations. All in all, it's a lot of work for a relatively minor problem with available workarounds. If we had a team of 5 or 10 devs then we would definitely work on issues like this, but as it's only the two of us, we have to focus on issues which have higher impact for less effort.
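
As a rough sketch of the "background queue" idea mentioned above: the API request only enqueues the URLs it needs and returns, while worker threads fetch them later and skip URLs that were already requested. Lemmy itself is written in Rust and this is not its design; `BackgroundFetcher` and `fetch_remote` are hypothetical names, and a real implementation would also need retries, rate limiting, and persistence.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, Set


class BackgroundFetcher:
    """Fetch remote objects on worker threads, deduplicating by URL."""

    def __init__(self, fetch_remote: Callable[[str], dict], workers: int = 4) -> None:
        self._fetch_remote = fetch_remote
        self._queue: "queue.Queue[str]" = queue.Queue()
        self._seen: Set[str] = set()
        self._lock = threading.Lock()
        self.results: Dict[str, dict] = {}
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, urls: Iterable[str]) -> int:
        """Queue URLs not seen before and return immediately with the count queued."""
        queued = 0
        with self._lock:
            for url in urls:
                if url not in self._seen:
                    self._seen.add(url)
                    self._queue.put(url)
                    queued += 1
        return queued

    def _run(self) -> None:
        while True:
            url = self._queue.get()
            try:
                data = self._fetch_remote(url)
                with self._lock:
                    self.results[url] = data
            except Exception:
                # Simplification: forget the URL so a later enqueue can retry it.
                with self._lock:
                    self._seen.discard(url)
            finally:
                self._queue.task_done()

    def wait(self) -> None:
        """Block until every queued URL has been processed (useful in tests)."""
        self._queue.join()
```

Deduplication matters here because many comments and votes in one thread come from the same users, so each user profile only needs to be fetched once.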
