Adding some recurring lemmy tasks. #1386
fn delete_olds(conn: &PgConnection) -> Result<usize, Error> {
  use lemmy_db_schema::schema::activity::dsl::*;
  diesel::delete(activity.filter(published.lt(now - 6.months()))).execute(conn)
I went with deleting activities older than 6 months.
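The retention rule can be sketched without a database. This is only an illustration, not the PR's diesel code: `Activity`, `RETENTION`, and `delete_old_activities` are hypothetical names, and six months is approximated as 180 days, whereas the real query lets Postgres evaluate the interval.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical stand-in for a row of the `activity` table.
#[derive(Debug, Clone, PartialEq)]
pub struct Activity {
    pub id: u32,
    pub published: SystemTime,
}

/// Assumption: six months approximated as 180 days.
pub const RETENTION: Duration = Duration::from_secs(180 * 24 * 60 * 60);

/// Drops every activity published before `now - RETENTION` and returns
/// how many rows were removed, like the `usize` returned by the delete.
pub fn delete_old_activities(activities: &mut Vec<Activity>, now: SystemTime) -> usize {
    let before = activities.len();
    // If the cutoff cannot be represented, nothing can be older than it,
    // so everything is kept.
    if let Some(cutoff) = now.checked_sub(RETENTION) {
        activities.retain(|a| a.published >= cutoff);
    }
    before - activities.len()
}
```

Passing `now` as a parameter, rather than reading the clock inside the function, keeps the cutoff logic easy to test.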
select c.creator_id from comment c
inner join user_ u on c.creator_id = u.id
where c.published > ('now'::timestamp - i::interval)
and u.local = true
For the site counts, it makes sense to use user.local = true, because these are the values on the-federation.info ... and they wanna know counts on your specific instance.
select count(*), community_id
from (
select c.creator_id, p.community_id from comment c
inner join post p on c.post_id = p.id
For community counts tho, I don't use a community.local or user.local filter, because federated users might be doing lots of things on your community. Or you might be viewing a non-local community on your instance, and you still want to know the active users.
For remote communities you are not gonna see the correct user count, only the number of users from your instance which are subscribed to that community. Only if you are viewing the community on the instance where it is hosted will you see the correct follower count this way. It would be better to use the follower count from /c/{community}/followers (although that can be manipulated).
"For remote communities you are not gonna see the correct user count, only the number of users from your instance which are subscribed to that community."
As soon as someone subscribes to that remote community, your instance should fetch users as they comment and post over there. So the counts shouldn't be too far off. I do wanna keep these as local DB queries rather than remote fetches, even if the numbers are a little off.
It will fetch the users, but I don't think they get inserted into the Followers table. That happens only on the instance where the community is, and where the user is. You can check a production DB to make sure.
The queries actually don't care about the followers table; they only care whether any users, subscribed or not, have made a post or comment.
Ah I see, then it should be mostly fine. The only problem is when no local user is subscribed to the community, or when local users have only been subscribed for a short time (e.g., if the first local user subscribed a week ago, the one month and six month statistics would be wrong). You might hide some info in those cases, or simply put a disclaimer that data might be inaccurate, and link to the original instance.
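The community counts discussed in this thread come down to "distinct users who posted or commented within a window, grouped by community", with no locality or follower filter. Here is a minimal in-memory sketch, assuming the intent is distinct creators; `Contribution`, `age_days`, and `active_users_per_community` are hypothetical names, not part of the Lemmy schema.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical flattened row: one post or comment, already joined
/// to the community it belongs to.
pub struct Contribution {
    pub creator_id: i32,
    pub community_id: i32,
    /// Age of the post or comment in days at query time.
    pub age_days: u32,
}

/// Counts distinct creators per community among contributions newer
/// than `window_days`. Local and federated users are treated alike,
/// and subscriptions are never consulted.
pub fn active_users_per_community(
    rows: &[Contribution],
    window_days: u32,
) -> HashMap<i32, usize> {
    let mut seen: HashMap<i32, HashSet<i32>> = HashMap::new();
    for row in rows.iter().filter(|r| r.age_days < window_days) {
        seen.entry(row.community_id)
            .or_default()
            .insert(row.creator_id);
    }
    seen.into_iter()
        .map(|(community, users)| (community, users.len()))
        .collect()
}
```

A user who contributes several times inside the window is counted once, and the caveat raised above still applies: contributions the local instance never fetched cannot appear in `rows`.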
let pool2 = pool.clone();
thread::spawn(move || {
  scheduled_tasks::setup(pool2);
});
I messed with things for a while, trying to do it via actix threads, but this seemed to be the only approach that creates a non-blocking / background thread.
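The clone-then-spawn pattern can be shown with the standard library alone. In this sketch `Pool` is a hypothetical stand-in for the real connection pool (assumed to be cheap to clone, like an `Arc`), and `setup` only records that it ran.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the connection pool. Clones share the
/// same underlying state, as clones of an `Arc` do.
#[derive(Clone, Default)]
pub struct Pool {
    pub log: Arc<Mutex<Vec<String>>>,
}

/// Stand-in for `scheduled_tasks::setup`: it only records that it ran.
pub fn setup(pool: Pool) {
    pool.log
        .lock()
        .unwrap()
        .push("scheduled tasks started".to_string());
}

/// Moves a clone of the pool onto a new OS thread so the caller
/// keeps its own handle and is not blocked by the setup.
pub fn spawn_scheduled_tasks(pool: &Pool) -> JoinHandle<()> {
    let pool2 = pool.clone();
    thread::spawn(move || setup(pool2))
}
```

The clone is needed because the `move` closure takes ownership of whatever it captures. A long-running server would normally not join the returned handle; it is returned here only so callers can wait on it when they need to.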
src/scheduled_tasks.rs (outdated)
// Reindex the aggregates tables every one hour
// This is necessary because hot_rank is actually a mutable function:
// https://dba.stackexchange.com/questions/284052/how-to-create-an-index-based-on-a-time-based-function-in-postgres?noredirect=1#comment555727_284052
Read this if you want to see some DBAs hate on me :). I can't find another way around this issue tho. The performance gain from adding an index (even one that degrades over time) is gigantic: 2 seconds vs 20 ms.
active_counts(&conn);
reindex_aggregates_tables(&conn);
scheduler.every(1.hour()).run(move || {
  active_counts(&conn);
  reindex_aggregates_tables(&conn);
});
Running the active counts and re-indexing every hour, and once right at startup.
let conn = pool.get().unwrap();
clear_old_activities(&conn);
scheduler.every(1.weeks()).run(move || {
  clear_old_activities(&conn);
});
Clearing out old activities every week (it deletes those older than 6 months).
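Both scheduled jobs follow the same shape: run once at startup, then again each time the interval elapses. The sketch below models that with a hand-driven clock so it stays deterministic. `Job`, `start`, and `tick` are hypothetical names and are not the API of the scheduler crate used in the PR.

```rust
use std::time::Duration;

/// Hypothetical minimal periodic job: runs once when started, then
/// once for every full `interval` that passes.
pub struct Job<F: FnMut()> {
    interval: Duration,
    since_last_run: Duration,
    task: F,
}

impl<F: FnMut()> Job<F> {
    /// Runs `task` immediately, then starts counting toward the next run.
    pub fn start(interval: Duration, mut task: F) -> Self {
        // A zero interval would make `tick` loop forever.
        assert!(!interval.is_zero(), "interval must be non-zero");
        task();
        Job {
            interval,
            since_last_run: Duration::ZERO,
            task,
        }
    }

    /// Advances the job's clock by `elapsed` and runs the task once
    /// for each full interval that has passed since the last run.
    pub fn tick(&mut self, elapsed: Duration) {
        self.since_last_run += elapsed;
        while self.since_last_run >= self.interval {
            self.since_last_run -= self.interval;
            (self.task)();
        }
    }
}
```

Under these assumptions, a job started with a one-hour interval has run once before any time passes, and runs a second time after `tick` has been fed a full hour in total.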
I think you should also write some documentation with details about how these stats are generated.
K done here: LemmyNet/lemmy-docs#14
community. Fixes "Create an active users in the last week query." (#1195)
Fixes "Indexes based on hot rank degrade over time due to postgres function mutability." (#1384)