Hot sorting pulling up 2 year old posts with no comments #3428

Closed
4 tasks done
calculuschild opened this issue Jun 30, 2023 · 12 comments
Labels
bug Something isn't working

Comments

@calculuschild

calculuschild commented Jun 30, 2023

Requirements

  • Is this a bug report? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • Did you check to see if this issue already exists?
  • Is this only a single bug? Do not put multiple bugs in one issue.
  • Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.

Summary

Browsing "All" (from the Vlemmy instance), sorting by "hot", is pulling up 2-year-old posts with no comments and maybe one upvote. Seems like an error in the "hot" sorting algorithm.

Steps to Reproduce

  1. Filter by All
  2. Sort by Hot
  3. Scroll down maybe 5 items.
  4. Notice random old posts from Announcements @lemmy.ml showing up.

https://lemmy.ml/post/61856

[Screenshot: Screenshot_20230630_090331_Jerboa]

Technical Details

I don't have access to logs; I'm only using Lemmy as a client. However, I assume the sorting algorithm and post fetching both happen on the backend.

Version

0.18.0

Lemmy Instance URL

Vlemmy.net

calculuschild added the bug label Jun 30, 2023
@calculuschild
Author

This screenshot is through the Jerboa android app, but it also occurs on PC in the browser (both Chrome and Firefox).

@wpuckering

Having this same issue on my instance. Also running 0.18.0.

@Mutant

Mutant commented Jul 3, 2023

Just a guess, but could it be that #3131 fixed the issue with the initial hot_rank value for posts that are synced, but didn't fix it for posts that were previously synced before 0.18.0? @sunaurus?

@sunaurus
Collaborator

sunaurus commented Jul 3, 2023

That could be the case, but it would mean that the hot rank calculation on lemmy_server launch has failed. Could anybody here who is seeing this issue restart lemmy_server with RUST_LOG=info and check for errors during the initial hot rank calculation? The relevant logs will start with "Updating hot ranks for all history..."

By the way, I'm not seeing any of these old posts crop up on the front page of lemm.ee at all after #3131, even though we are certainly pulling in old posts all the time.

@Mutant

Mutant commented Jul 3, 2023

Yes, it seems to affect smaller servers more than larger ones.

e.g. this looks fine on lemmy.world: https://lemmy.world/c/linuxmemes?dataType=Post&page=1&sort=Hot

but on a smaller server, older posts are ranked high: https://lemmy.nz/c/linuxmemes@lemmy.world?dataType=Post&page=1&sort=Hot

@LemiSt24

LemiSt24 commented Jul 4, 2023

FWIW, I had the same problem on my very small instance, restarted Lemmy twice to take a look at the logs as @sunaurus suggested and now I don't get these old posts when sorting by "Hot" anymore 🥳

Edit: never mind, they came back after some time :( although not as many as before. And, unlike before, there are now recent posts below the age-old ones.

@Mutant

Mutant commented Jul 5, 2023

Yes, we've seen something similar on our instance. The log lines for the initial hot rank update do appear. Right after a restart there seem to be no "stale" posts in the hot listing. However, over time posts start to get a hot_rank of 1728, including posts older than a day (sometimes months old), and that ranking doesn't decrease.

So it seems to me as if #3131 hasn't completely fixed this issue.

@yads

yads commented Jul 9, 2023

This seems to be an issue with certain communities. For example programmerhumor@lemmy.ml. If I'm subscribed to that instance it overwhelms my Hot feed. Unsubbing makes my Hot feed normal.

@Mutant

Mutant commented Jul 9, 2023

Here's what I believe is happening:

  • Older posts (older than 1 day) get synced to an instance, e.g. when a user from that instance views them (and they are the first user to view that post).
  • When that happens, the older posts should be set to hot_rank=0 when they are written to the DB (which is what #3131, "Calculate initial hot_rank and hot_rank_active for posts and comments from other instances", does).
  • However, in some (but not all) cases, this is not happening, so these old posts are left with a hot_rank of 1728.
  • The hot_rank on those posts won't change until the server is restarted. At that point, they are updated to hot_rank 0.

This explains why this problem is more apparent on smaller servers: on large servers, users will "discover" all the old posts quickly, so there are fewer of them left to find since the last restart. But on smaller servers, after a server restart there are still old posts that have yet to be synced, so over time these will start polluting the hot rankings.

What I don't know is why some old posts seem to get stuck at 1728 and why some don't. It could be some transitory issue (e.g. deadlock), but then I would have thought a scheduled job would later correct the hot_rank? At any rate, there are enough old posts with a hot_rank of 1728 to effectively make the hot listing unusable on smaller servers about 12 hours after a restart.
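For reference, the value 1728 is not arbitrary if the hot rank follows the formula I recall from Lemmy's SQL `hot_rank` function: floor(10000 * log10(max(1, score + 3)) / (hours + 2)^1.8). That formula is an assumption here, not something confirmed in this thread, and the Python function below is only a stand-in for it with hypothetical scores and ages. Under that assumption, 1728 is exactly what a post with score 1 gets at age zero, which would fit old posts being ranked as if they had just been published:

```python
import math


def hot_rank(score: int, hours_since_published: float) -> int:
    """Stand-in for the assumed formula (not taken from this thread):
    floor(10000 * log10(max(1, score + 3)) / (hours + 2) ** 1.8)
    """
    return math.floor(
        10000 * math.log10(max(1, score + 3)) / (hours_since_published + 2) ** 1.8
    )


# A post with a single upvote, ranked as if it had just been published.
brand_new = hot_rank(score=1, hours_since_published=0)

# The same post ranked with its real age of roughly two years.
two_years_old = hot_rank(score=1, hours_since_published=2 * 365 * 24)

# A well-received post (hypothetical score of 50) that is 12 hours old.
popular_half_day = hot_rank(score=50, hours_since_published=12)

print(brand_new, two_years_old, popular_half_day)
```

If the formula is right, a post stuck at the age-zero rank outranks even a 12-hour-old post with a score of 50, which would match the hot listing degrading about half a day after a restart.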

@sunaurus
Collaborator

I found out why I wasn't able to reproduce this - I had an additional fix still active on lemm.ee. I should be able to disable that and reproduce it now, so I will try to submit a better fix soon!

@dessalines
Member

dessalines commented Jul 14, 2023

One other thing I'll note, that's an issue with our current sorts:

The code's order by is (featured desc, hot_rank desc), when in reality it should be (featured desc, hot_rank desc, published desc).

Small communities with few posts, especially, will have a ton of posts with hot_rank = 0, so those ties come back in no defined order.

I'll create a PR for this shortly.
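A minimal sketch of the tie-breaking effect, written in Python rather than the actual SQL. The posts, their field values, and the `sort_key` helper are all made up for illustration. It only shows that adding `published` as a third sort key gives the hot_rank = 0 posts a stable newest-first order:

```python
from datetime import datetime
from typing import NamedTuple


class Post(NamedTuple):
    name: str
    featured: bool
    hot_rank: int
    published: datetime


# Hypothetical posts; three of them tie on hot_rank = 0.
posts = [
    Post("old", False, 0, datetime(2021, 6, 1)),
    Post("recent", False, 0, datetime(2023, 7, 10)),
    Post("pinned", True, 0, datetime(2022, 1, 1)),
    Post("hot", False, 1728, datetime(2023, 7, 14)),
]


def sort_key(post: Post) -> tuple:
    # Mirrors (featured desc, hot_rank desc, published desc) once reversed.
    return (post.featured, post.hot_rank, post.published)


ordered = sorted(posts, key=sort_key, reverse=True)
names = [post.name for post in ordered]
print(names)
```

Without the third key, "recent" and "old" compare as equal, and a database is free to return them in either order.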

@Mutant

Mutant commented Jul 20, 2023

Does #3618 actually fix the underlying issue? It seems like it just adds a published date to the aggregates table, so it will address the issue mentioned by @dessalines. But it won't fix the problem of old posts erroneously getting a hot_rank value of 1728 (unless I'm missing something).

Edit: apologies, I did miss something, i.e. the earlier PR that fixes this. Thanks!

8 participants