High CPU and network usage with 1000+ friends #2178

Open
alexbakker opened this issue Mar 25, 2022 · 2 comments
Labels: P3 (Low priority), performance (A code change that improves performance)

alexbakker commented Mar 25, 2022

I maintain EchoBot, a small service for Tox that allows users to test their audio and video. It runs version 0.2.17 of c-toxcore. We've recently started seeing very high CPU and network usage: 100% CPU on the Tox thread and around 28 Mbit/s of outbound network traffic, continuously. EchoBot has always been heavy on resources at startup, but it used to settle down eventually. Now that it has ~1400 friends, it no longer settles down.

I used JFreegman's netprof branch of toxcore to monitor the types of packets being sent. Here's a chart of the top packet types over the course of half a day:

[Chart not included: top packet types over half a day]

Toxcore sends roughly 7500 ONION_SEND_INITIAL packets per second, continuously. @JFreegman provided some patches to help nail down why toxcore never seems to back off from sending so many onion packets. He found that, in this case, toxcore regularly concludes that we're no longer connected to the Tox network and then resets the announcement run counter:

    if (onion_isconnected(onion_c)) {
        if (mono_time_is_timeout(onion_c->mono_time, onion_c->last_time_connected, ONION_CONNECTED_TIMEOUT)) {
            reset_friend_run_counts(onion_c);
        }
        onion_c->last_time_connected = mono_time_get(onion_c->mono_time);
        if (onion_c->onion_connected < ONION_CONNECTION_SECONDS * 2) {
            ++onion_c->onion_connected;
        }
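
To make the suspected feedback loop concrete, here is a minimal model of the excerpt's shape. It is not toxcore code: `Sketch_Client`, `sketch_tick`, and the 10-second timeout are hypothetical stand-ins, and bumping the run count once per tick is a simplification. The point is only that any gap in "connected" verdicts longer than the timeout throws away the accumulated run count, which would presumably restart the aggressive sending phase for every friend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value for illustration; not taken from toxcore. */
#define SKETCH_CONNECTED_TIMEOUT 10u /* seconds */

typedef struct Sketch_Client {
    uint64_t last_time_connected; /* last tick at which we looked connected */
    uint32_t friend_run_count;    /* stand-in for the per-friend run counters */
    uint32_t resets;              /* how often the counter was thrown away */
} Sketch_Client;

/* One iteration of the model. `now` is in seconds; `connected` is the
 * connectivity verdict for this tick. */
static void sketch_tick(Sketch_Client *c, uint64_t now, bool connected)
{
    assert(c != NULL);

    if (!connected) {
        return; /* nothing is updated while we look disconnected */
    }

    /* Same shape as the excerpt: a long enough gap since the last
     * "connected" tick resets the run count. */
    if (now - c->last_time_connected >= SKETCH_CONNECTED_TIMEOUT) {
        c->friend_run_count = 0;
        ++c->resets;
    }

    c->last_time_connected = now;
    ++c->friend_run_count; /* simplification: one completed run per tick */
}
```

In this model, five connected ticks followed by a 15-second gap and one more connected tick leave `resets` at 1 and `friend_run_count` back at 1. If the connectivity verdict flaps like that regularly, the counter never gets high enough for sending to back off.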

I think there are possibly two things to do here:

  • Find out why toxcore thinks we're no longer connected to the Tox network (maybe it's right?)
  • Determine whether the observed network traffic is expected. Even with 1400 friends, ~7500 packets per second seems a bit excessive.
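
For a sense of scale, the figures quoted above can be turned into per-friend numbers. This is only back-of-the-envelope arithmetic on the reported values (~7500 packets/s, ~1400 friends, ~28 Mbit/s); the function names are made up for this sketch, and the packet-size figure is an upper bound that assumes all of the measured traffic consists of these onion packets, which the report does not claim.

```c
#include <assert.h>

/* Figures quoted in the report. */
#define OBSERVED_PACKETS_PER_SEC 7500.0
#define OBSERVED_FRIENDS         1400.0
#define OBSERVED_MBIT_PER_SEC    28.0

/* Average packets per second attributable to each friend. */
static double packets_per_friend_per_sec(double pps, double friends)
{
    assert(friends > 0.0);
    return pps / friends;
}

/* Upper bound on the average packet size in bytes, under the assumption
 * that ALL of the measured traffic is made up of these packets. */
static double max_avg_packet_bytes(double mbit_per_sec, double pps)
{
    assert(pps > 0.0);
    return mbit_per_sec * 1000000.0 / 8.0 / pps;
}
```

With the reported values this works out to roughly 5.4 packets per second per friend, and at most about 467 bytes per packet.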
@iphydf iphydf added P3 Low priority performance A code change that improves performance labels Mar 25, 2022
@iphydf iphydf added this to the v0.2.x milestone Mar 25, 2022
@emdee-is

Did this get addressed?

If not, could toxcore add a delay so that it doesn't try to reconnect more than x times per second, configurable at compile time or through an environment variable?
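
A rough sketch of what such a cap could look like, as a fixed one-second window limiter. Nothing here exists in toxcore: `SKETCH_MAX_RECONNECTS_PER_SEC`, `Reconnect_Limiter`, and the environment variable name passed to `limiter_configured_max` are all hypothetical, and where such a check would hook into the onion client is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical compile-time default; override with -DSKETCH_MAX_RECONNECTS_PER_SEC=n. */
#ifndef SKETCH_MAX_RECONNECTS_PER_SEC
#define SKETCH_MAX_RECONNECTS_PER_SEC 2
#endif

typedef struct Reconnect_Limiter {
    uint64_t window_start_ms; /* start of the current one-second window */
    uint32_t used;            /* attempts already granted in this window */
    uint32_t max_per_sec;     /* configured cap */
} Reconnect_Limiter;

/* Reads the cap from the named environment variable (a hypothetical knob),
 * falling back to the compile-time default on absence or bad input. */
static uint32_t limiter_configured_max(const char *env_name)
{
    const char *value = env_name != NULL ? getenv(env_name) : NULL;

    if (value != NULL) {
        char *end = NULL;
        const unsigned long parsed = strtoul(value, &end, 10);

        if (end != value && *end == '\0' && parsed > 0 && parsed <= 1000) {
            return (uint32_t)parsed;
        }
    }

    return SKETCH_MAX_RECONNECTS_PER_SEC;
}

static void limiter_init(Reconnect_Limiter *l, uint32_t max_per_sec)
{
    assert(l != NULL);
    l->window_start_ms = 0;
    l->used = 0;
    l->max_per_sec = max_per_sec;
}

/* Returns true if a reconnect attempt may proceed at time now_ms. */
static bool limiter_allow(Reconnect_Limiter *l, uint64_t now_ms)
{
    assert(l != NULL);

    if (now_ms - l->window_start_ms >= 1000) {
        l->window_start_ms = now_ms; /* start a fresh window */
        l->used = 0;
    }

    if (l->used >= l->max_per_sec) {
        return false;
    }

    ++l->used;
    return true;
}
```

A caller would wrap each reconnect attempt in `if (limiter_allow(&limiter, now_ms)) { ... }`. A cap like this bounds the symptom, but it would not explain why the connection state keeps flapping in the first place.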

AndyTOX commented Oct 14, 2022

Hi, how about a workaround until the issue gets properly fixed?
I would propose that EchoBot "forgets" a friend after an hour.
This way you never reach 1000+ concurrent users.
The most common EchoBot use cases are a first-time test and a re-test after hardware/environment changes.
A 1-hour "test window" would be fine in, I guess, 99% of all use cases. If not, just remove EchoBot and re-add it to get another hour. RFC.
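
The expiry idea could be sketched roughly as follows. This is not EchoBot code: `Friend_Entry`, `sketch_expire_friends`, and the callback type are hypothetical, and the bot is assumed to record a timestamp when each friend is added. In a real bot the callback would presumably wrap toxcore's friend removal call (`tox_friend_delete` in the public API).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FRIEND_TTL_SEC 3600u /* the proposed one-hour test window */

/* Hypothetical bookkeeping entry kept by the bot for each friend. */
typedef struct Friend_Entry {
    uint32_t friend_number;
    uint64_t added_at; /* seconds, same clock as `now` below */
    bool active;
} Friend_Entry;

/* Stand-in for whatever actually removes the friend. */
typedef void sketch_forget_cb(uint32_t friend_number, void *user_data);

/* Forgets every active friend whose test window has elapsed.
 * Returns the number of friends removed in this pass. */
static size_t sketch_expire_friends(Friend_Entry *entries, size_t count, uint64_t now,
                                    sketch_forget_cb *forget, void *user_data)
{
    size_t removed = 0;

    assert(entries != NULL || count == 0);

    for (size_t i = 0; i < count; ++i) {
        Friend_Entry *e = &entries[i];

        if (!e->active || now < e->added_at) {
            continue; /* already gone, or clock went backwards */
        }

        if (now - e->added_at >= SKETCH_FRIEND_TTL_SEC) {
            if (forget != NULL) {
                forget(e->friend_number, user_data);
            }

            e->active = false;
            ++removed;
        }
    }

    return removed;
}
```

Running a pass like this periodically from the bot's main loop would keep the friend count bounded by the number of users who added the bot within the last hour.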
