Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing high CPU usage on federation worker recheck + fix federation tests. Fixes #3958 #3960

Merged
merged 48 commits into from
Sep 21, 2023

Conversation

dessalines
Copy link
Member

No description provided.

@phiresky
Copy link
Collaborator

That index is great, I didn't see that because on my database I guess the person table is too small for it to matter.

The delays affect how long different parts of the federation take to update when new things are first seen (new subscription, new instance, ...). I'll add a bit more comments on that to the code I think.

My suggestion would be to change #[cfg(debug_assertions)] to make them settable via an environment variable which can then be set in start-federation-test.sh just like LEMMY_SYNCHRONOUS_FEDERATION=1 was.

Should I update this PR or make a new one?

@phiresky
Copy link
Collaborator

phiresky commented Sep 13, 2023

@dessalines I've updated this PR with a few additional comments and adding a LEMMY_TEST_FAST_FEDERATION env var instead of using debug_assertions.

@dessalines
Copy link
Member Author

Thanks! I'll test it locally now.

I'll check my postgres logs to see if there are any other repeated queries I should check out.

@dessalines
Copy link
Member Author

@phiresky thx! I'm still getting a lot of intermittent api_test failures, even when I run it locally. Sometimes it happens on comment.tsx, sometimes on private_message.tsx .

● Fetch in_reply_tos: A is unsubbed from B, B makes a post, and some embedded comments, A subs to B, B updates the lowest level comment, A fetches both the post and all the inreplyto comments for that post.

Could you take a look?

@phiresky
Copy link
Collaborator

phiresky commented Sep 15, 2023

Ah. The reason was the environment variable was set in the wrong file. I've changed it, it should be fixed now.

Edit: still some failure

@phiresky phiresky changed the title Fixing high CPU usage on federation worker recheck. Fixes #3958 Fixing high CPU usage on federation worker recheck + fix federation tests. Fixes #3958 Sep 18, 2023
@phiresky phiresky marked this pull request as draft September 20, 2023 12:32
@phiresky
Copy link
Collaborator

phiresky commented Sep 20, 2023

Ok. I've fixed a few things in the tests and two actual bugs:

  1. the community followers non-incremental update was implemented wrongly (not fetching all follows) which means after an hour of running the instance probably stops federating correctly
  2. there was a race condition in how the latest activity was found which means when the queue fetched from db but a new entry was currently being inserted, the new entry could be missed. (unlikely to actually happen in real life but happened fairly often in the tests).

This is in addition to the index addition and change that reduces CPU usage for low-usage instances.

Also added a "unrelated" change: I've changed the activity ids for announce activities to include the inner activity (/activity/announce/UUID to /activity/announce/create) for easier debugging)

@phiresky phiresky marked this pull request as ready for review September 20, 2023 16:07
@dessalines
Copy link
Member Author

Both checks passed, yay!

@phiresky
Copy link
Collaborator

Yes. We should merge this now. Hopefully nothing more should pop up, if it does I'd adress it in a separate PR.

@@ -324,6 +346,6 @@ pub async fn match_outgoing_activities(
}
}
};
spawn_try_task(fed_task);
fed_task.await?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you can just get rid of the async block above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i did try but would cause a fair amount of changes because it changes the behaviour of ? operator so it was a lot of changes i didn't want to do to make the PR noisier

@phiresky phiresky merged commit 24c98a7 into main Sep 21, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants