Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding some recurring lemmy tasks. #1386

Merged
merged 3 commits into from
Jan 29, 2021
Merged

Adding some recurring lemmy tasks. #1386

merged 3 commits into from
Jan 29, 2021

Conversation

dessalines
Copy link
Member

- Add active users by day, week, month, and half year to site and
  community. Fixes #1195
- Periodically re-index the aggregates tables that use hot_rank.
  Fixes #1384
- Clear out old activities (> 6 months). Fixes #1133

fn delete_olds(conn: &PgConnection) -> Result<usize, Error> {
use lemmy_db_schema::schema::activity::dsl::*;
diesel::delete(activity.filter(published.lt(now - 6.months()))).execute(conn)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with deleting activities older than 6 months.

select c.creator_id from comment c
inner join user_ u on c.creator_id = u.id
where c.published > ('now'::timestamp - i::interval)
and u.local = true
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the site counts, it makes sense to use user.local = true, because these are the values on the-federation.info ... and they wanna know counts on your specific instance.

select count(*), community_id
from (
select c.creator_id, p.community_id from comment c
inner join post p on c.post_id = p.id
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For community counts tho, I don't use a community.local or user.local filter, because federated users might be doing lots of things on your community. Or you might be viewing a non-local community on your instance, and you still want to know the active users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For remote communities you are not gonna see the correct user count, only the number of users from your instance which are subscribed to that community. Only if you are viewing the community on the instance where it is hosted will you see the correct follower count this way. It would be better to use the follower count from /c/{community}/followers (although that can be manipulated).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For remote communities you are not gonna see the correct user count, only the number of users from your instance which are subscribed to that community.

As soon as someone subscribes to that remote community, your instance should fetch users as they comment and post over there. So the counts shouldn't be too far off. I do wanna keep these as local DB queries rather than remote fetches, even if the numbers are a little off.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will fetch the users, but I dont think they get inserted into the Followers table. That happens only on the instance where the community is, and where the user is. You can check a production DB to make sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The queries actually don't care about the followers table, they only care whether any users, subscribed or not, have made a post or comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, then it should be mostly fine. Only problem is when no local user is subscribed to the community, or when local users only have been subscribed for a short time (eg, first local user subscribed a week ago, then the one month and six month statistics would be wrong). You might hide some info in those cases, or simply put a disclaimer that data might be inaccurate, and link to the original instance.

Comment on lines +51 to +55
let pool2 = pool.clone();
thread::spawn(move || {
scheduled_tasks::setup(pool2);
});

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I messed with things for a while, trying to do it via actix threads, but this seemed to be the only one that creates a non-blocking / background thread.


// Reindex the aggregates tables every one hour
// This is necessary because hot_rank is actually a mutable function:
// https://dba.stackexchange.com/questions/284052/how-to-create-an-index-based-on-a-time-based-function-in-postgres?noredirect=1#comment555727_284052
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read this if you want to see some DBA's hate on me :) . I can't find another way around this issue tho. The performance gain from adding an index (even one that degrades over time), is gigantic: (2 seconds vs 20 ms).

Comment on lines +15 to +20
active_counts(&conn);
reindex_aggregates_tables(&conn);
scheduler.every(1.hour()).run(move || {
active_counts(&conn);
reindex_aggregates_tables(&conn);
});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running actives and re-indexing every hour, and right at the beginning.

Comment on lines +22 to +26
let conn = pool.get().unwrap();
clear_old_activities(&conn);
scheduler.every(1.weeks()).run(move || {
clear_old_activities(&conn);
});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running clearing out old activities every week (it deletes them older than 6 months)

@dessalines dessalines marked this pull request as ready for review January 28, 2021 14:10
@dessalines dessalines requested a review from Nutomic January 28, 2021 14:10
@Nutomic
Copy link
Member

Nutomic commented Jan 29, 2021

I think you should also write some documentation with details about how these stats are generated.

@dessalines
Copy link
Member Author

K done here: LemmyNet/lemmy-docs#14

@dessalines dessalines merged commit 0fd0279 into main Jan 29, 2021
@dessalines dessalines deleted the recurring_tasks branch April 7, 2021 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants