Save bandwidth by moderating onion pinging #542
Currently, a large majority of the traffic in the tox network is generated by
In part this is by fixing some bugs; probably most significantly, previously the nodes for friends were always considered timed out when we came to ping them, since the timeout for announce nodes was used for the check rather than that for friend nodes, leading to some extra traffic.
Then it implements two schemes for reducing unnecessary pinging. The first is to start to trust slightly announce nodes and paths which have been alive for a while, and not ping them so aggressively [...] with no discernable problems.
The second is to gradually reduce the rate of pinging we do to check on offline friends when they've been offline for some time. [...] With the constants as I've set them, the extreme case is that they stay offline for at least 1600 minutes, and then the request will take something like 5 minutes on average to get through.
This last should be the only change which could cause any problems for users.
Please note when testing this PR that you shouldn't expect to see a huge
The other potential problem I meant to mention: this PR reduces the sensitivity to being disconnected from the network, since this is detected by seeing if enough time passes without us receiving a response to an onion request, and now we send requests less regularly it can happen that we don't receive any responses for a while. I've increased the timeout from 19s to 75s, which is enough to make false positives very unlikely. If this is a problem, it wouldn't be too hard to add a mechanism to deliberately test the network if we receive nothing for say 30s.
* Monday, 2017-05-01 at 12:55 -0700 - SkyzohKey <firstname.lastname@example.org>:
Does this allow ToxCore to perform better on mobile devices? As I understand the PR, it would fix bandwidth issues on mobile, right?
It should help a lot, but tox will always use a fair amount of bandwidth. Making up numbers which might be way off, I guess this should get it down to around 2-3KBps in each direction when idling, and with more work (optimising DHT traffic, and making sure we only send good nodes in response to onion requests) I have the impression that as low as 1 could be feasible. I don't know if that's good enough for mobile users.
@mannol actually on mobile the "per second" does not matter when it comes to battery usage.
on mobile (in my opinion) only timeouts really matter. so that the CPU can go into deepsleep and save battery.
@mannol yeah, that's why we need at least 2 minutes total silence on Android to get some deep sleep.
tests have shown it does wonders for your battery life
I can haz dis data tooo?
Which commit[s] fixes this?
Reading through the code, what would you think about externalizing this logic a bit? E.g., could we make the decision in one function and do the work in another, instead of the current decide-and-act in the same functions?
Onion paths, so if one of the nodes goes down, the whole route is stale? And over what time frame did you do your testing?
Is that friend requests, or friend connections?
Tox isn't very adaptive to poor network conditions. If I'm on a bad connection, would this diff change making a connection from hard to impossible?
Which network? The onion network, or the DHT network?
What data are you using to get this estimate?
Yeah, as @mannol said, the delay in between packets is what helps mobile users the most: what will best help them is LONG delays between packets, not just fewer packets. That said, this will make it much easier to use tox on metered connections.
The DHT backend is what makes tox a good protocol. The problem is not that tox uses the DHT on mobile; the problem is that the DHT is dumb. A smart DHT protocol should be able to bootstrap, and find and connect to ALL online friends, in under 5s. Creating and validating onion routes is one of the things that makes this hard to do, so any optimization of onion routing is a good place to start. (Apart from dropping the onion for announce/friend finding.)
* Wednesday, 2017-05-10 at 23:03 -0700 - Gregory Mullen <email@example.com>:
> In part this is by fixing some bugs; probably most significantly, previously the nodes for friends were always considered timed out when we came to ping them, since the timeout for announce nodes was used for the check rather than that for friend nodes, leading to some extra traffic.

Which commit[s] fixes this?
commit f3d32c9 "Revise onion node time out calculation"
> Then it implements two schemes for reducing unnecessary pinging.

Reading through the code, what would you think about externalizing this logic a bit? E.g., could we make the decision in one function and do the work in another, instead of the current decide-and-act in the same functions?
Normally I would do that. But the current code seems to avoid defining functions which will only be called at one point in the code, so perhaps it's better to stick to that style.
> The first is to start to trust slightly announce nodes and paths which have been alive for a while, and not ping them so aggressively [...] with no discernable problems.

Onion paths, so if one of the nodes goes down, the whole route is stale?
Onion paths, yes. If we fail to get a response along a path, that path stops being considered stable, so we will quickly observe if it is consistently failing and needs to be replaced.
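To make the first scheme concrete, here is a minimal sketch of how a path might earn and lose "stable" status. The names and the 90s constant mirror the description above, but the struct and functions are invented for illustration; this is not the actual toxcore code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constant: how long (in seconds) a path must keep
 * working before we start to trust it and ping it less aggressively. */
#define PATH_STABLE_TIME 90

/* Hypothetical per-path bookkeeping. */
typedef struct {
    uint64_t creation_time; /* when the path was built */
    uint64_t last_success;  /* last time a response came back along it */
    bool stable;            /* do we currently trust this path? */
} Onion_Path_Status;

/* Called when a response arrives along the path: if the path is old
 * enough and still working, promote it to stable. */
static void path_response_ok(Onion_Path_Status *p, uint64_t now)
{
    p->last_success = now;
    if (now - p->creation_time >= PATH_STABLE_TIME) {
        p->stable = true;
    }
}

/* Called when a request along the path times out: stop trusting it,
 * so it gets pinged aggressively again and replaced if it keeps failing. */
static void path_response_failed(Onion_Path_Status *p)
{
    p->stable = false;
}
```

This captures the property described above: a single failed response is enough to demote a path, so a consistently failing route is observed quickly.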
> And over what time frame did you do your testing?
Various. Nodes and routes only take 90s to reach the "stable" state, so the time frame doesn't matter much for this. But generally, it would be great if people could test out this PR with their own situations and usage patterns, and confirm that it doesn't cause any problems in practice. Since all it does is reduce onion requests, there should be no network effects to worry about - if tox with this patch works properly while the network at large is using unpatched tox, it should work just as well if everyone makes the switch.
> The second is to gradually reduce the rate of pinging we do to check on offline friends when they've been offline for some time. [...] With the constants as I've set them, the extreme case is that they stay offline for at least 1600 minutes, and then the request will take something like 5 minutes on average to get through.

Is that friend requests, or friend connections?
Friend requests and friend searches. Friend connections are separate, and this PR doesn't touch them.
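As a rough illustration of the second scheme, the back-off might look something like the following sketch. The function name and all three constants are made up (the divisor is chosen so the interval tops out around the 1600-minute extreme case mentioned above); the real PR's constants and code differ.

```c
#include <stdint.h>

/* Illustrative constants, not the PR's actual values. */
#define SEARCH_INTERVAL_MIN 15     /* seconds: floor for recently-offline friends */
#define SEARCH_INTERVAL_MAX 600    /* seconds: cap, reached after ~1600 min offline */
#define SEARCH_BACKOFF_DIVISOR 160 /* controls how fast the interval grows */

/* Interval between search pings for a friend who has been offline
 * for offline_seconds: grows linearly with offline time, clamped. */
static uint64_t friend_search_interval(uint64_t offline_seconds)
{
    uint64_t interval = offline_seconds / SEARCH_BACKOFF_DIVISOR;

    if (interval < SEARCH_INTERVAL_MIN) {
        interval = SEARCH_INTERVAL_MIN;
    }
    if (interval > SEARCH_INTERVAL_MAX) {
        interval = SEARCH_INTERVAL_MAX;
    }
    return interval;
}
```

With these made-up numbers, a friend offline for 1600 minutes (96000s) is pinged every 600s, so a friend request sent to them waits a few minutes on average before the next search round, matching the order of magnitude quoted in the description.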
> The same delay applies to established friends who come online after being offline for a while and who somehow fail to find our announcement - that shouldn't happen, but possibly it sometimes does (and an attacker can cause it by poisoning our announcement neighbourhood).

Tox isn't very adaptive to poor network conditions. If I'm on a bad connection, would this diff change making a connection from hard to impossible?
The only negative interaction I can see is that in situations of severe packet loss which, with the current interval between requests, are not quite severe enough to prevent finding friend nodes, increasing the interval means we more often make requests to nodes which have disappeared - and this could nudge us over the edge to a situation where we never manage to find nodes close enough to the target friend. I expect this effect is quite small, but it's hard to quantify. This is still only an issue in the case that a friend comes online after we've been searching for them for some time, and they somehow fail to find our own announcement (which as far as I can see means deliberate poisoning or that they are suffering from similarly extreme packet loss).
> The other potential problem I meant to mention: this PR reduces the sensitivity to being disconnected from the network, since this is detected by seeing if enough time passes without us receiving a response to an onion request, and now we send requests less regularly it can happen that we don't receive any responses for a while. I've increased the timeout from 19s to 75s, which is enough to make false positives very unlikely. If this is a problem, it wouldn't be too hard to add a mechanism to deliberately test the network if we receive nothing for say 30s.

Which network? The onion network, or the DHT network?
I meant the onion. See onion_client.c:onion_isconnected.
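The check itself is simple. Here is a hedged sketch of the logic (illustrative names, not the real onion_client.c code), using the 75s timeout this PR introduces:

```c
#include <stdbool.h>
#include <stdint.h>

/* Timeout from this PR: raised from 19s to 75s because requests are
 * now sent less regularly, making long response gaps normal. */
#define ONION_CONNECTION_TIMEOUT 75

/* We consider ourselves connected to the onion network as long as
 * some onion response has arrived within the timeout window.
 * (Hypothetical signature; the real check lives in onion_client.c.) */
static bool onion_is_connected(uint64_t last_response_time, uint64_t now)
{
    return now - last_response_time <= ONION_CONNECTION_TIMEOUT;
}
```

The false-positive risk discussed above is exactly the boundary of this window: if legitimate responses can be spaced further apart than the timeout, the check wrongly reports a disconnect.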
> It should help a lot, but tox will always use a fair amount of bandwidth. Making up numbers which might be way off, I guess this should get it down to around 2-3KBps in each direction when idling,

What data are you using to get this estimate?
Observing (by packet sniffing and printf-debugging) my own node running with this patch, I observe sending ~0.5 onion packet requests per second for announcement. Once things are settled, traffic for friend searching is something like ~0.01 per offline friend, but rates are much higher at first. What this means on average depends on how many offline friends users have and how long their nodes stay online, but let's estimate the average total request rate as 1 packet per second.

Each request, assuming the packet doesn't get dropped anywhere along the way, generates 403+395+387+354+416+357+298+238 = 2848 bytes of traffic through the network (packet kinds 80-83, 8c-8e, 84; sizes including headers as reported by tcpdump). But in fact I observe that only around half of requests sent get a response, probably mostly due to the target being behind an evil NAT, making the average traffic per request 403+395+387+354 + ((416+357+298+238)/2) = 2193.5 bytes.

Now with the recent DHT fix, my node generates ~1.3 nodes requests per second, each costing 113+238 = 351 bytes. Other packet kinds seem to be negligible. So that adds up to (1.3*351 + 1*2193.5)/1024 = 2.6KBps in each direction.

This calculation assumed that everyone participates properly in the network, while in fact we have many TCP-only leeches and many nodes behind evil NATs (Restricted Cone or Symmetric) which make them unusable by the onion, and it doesn't take into account the lossy network conditions some nodes may operate under. So maybe a better estimate is ~5KBps in each direction for those fully participating in the onion. All very rough, and could be significantly off in either direction. A lower bound, considering only DHT and announce traffic, would be (1.3*351 + 0.5*2193.5)/1024 = 1.5KBps.
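The arithmetic above can be checked mechanically. This small C sketch just reproduces the quoted figures; all packet sizes and rates are the measured values from the text, and the function names are mine.

```c
/* One full onion request round trip: four forward hops plus four
 * return hops, sizes in bytes as reported by tcpdump. */
static double onion_request_bytes(void)
{
    return 403 + 395 + 387 + 354 + 416 + 357 + 298 + 238; /* 2848 */
}

/* Only about half of requests get a response (targets behind evil
 * NATs), so on average the return leg costs half as much. */
static double onion_request_bytes_avg(void)
{
    return 403 + 395 + 387 + 354 + (416 + 357 + 298 + 238) / 2.0; /* 2193.5 */
}

/* Idle traffic estimate in KBps: DHT nodes requests (113+238 bytes
 * each) plus onion requests, at the given rates in packets/second. */
static double idle_kbps(double dht_rate, double onion_rate)
{
    return (dht_rate * (113 + 238) + onion_rate * onion_request_bytes_avg()) / 1024.0;
}
```

Plugging in the measured rates, `idle_kbps(1.3, 1.0)` gives the ~2.6KBps central estimate and `idle_kbps(1.3, 0.5)` the ~1.5KBps lower bound from the text.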
> and with more work (optimising DHT traffic, and making sure we only send good nodes in response to onion requests) I have the impression that as low as 1 could be feasible. I don't know if that's good enough for mobile users.

Yeah, as @mannol said. Delay in between is what helps mobile users the most: what will best help them is LONG delays between packets, not just fewer packets.
We could arrange longish delays between requests, though it involves a tradeoff against risking losing our announcement. But if we're to participate fairly in the network rather than leeching off it, we must respond promptly to any packets coming in. So we oughtn't expect to be able to sleep and yet remain announced.
Reviewed 1 of 2 files at r1, 1 of 1 files at r5.