Save bandwidth by moderating onion pinging #542
Does this allow ToxCore to perform better on mobile devices? As I understand this PR, it would fix bandwidth issues on mobile, right?
* Monday, 2017-05-01 at 12:55 -0700 - SkyzohKey <notifications@github.com>:
> Does this allow ToxCore to perform better on mobile devices? As
> I understand this PR, it would fix bandwidth issues on mobile, right?
It should help a lot, but tox will always use a fair amount of
bandwidth. Making up numbers which might be way off, I guess this should
get it down to around 2-3KBps in each direction when idling, and with
more work (optimising DHT traffic, and making sure we only send good
nodes in response to onion requests) I have the impression that as low as
1KBps could be feasible. I don't know if that's good enough for mobile
users.
The problem with mobile is the 'per second' notation. It's not so much about the amount of traffic you send but rather how frequently you send it. Mobile devices shouldn't be a part of the DHT at all.
@mannol actually, on mobile the "per second" does not matter when it comes to battery usage. On mobile (in my opinion) only timeouts really matter, so that the CPU can go into deep sleep and save battery.
@zoff99 the CPU is woken/prevented from sleeping every time a packet is received on the radio.
@mannol yeah, that's why we need at least 2 minutes of total silence on Android to get some deep sleep. Tests have shown it does wonders for your battery life.
The problem is: when in a DHT, the device has to communicate with the DHT by responding to DHT requests. That's why mobile devices shouldn't be a part of the DHT at all.
If in the near future 80% of all tox devices are mobile devices (without being part of the DHT), will the DHT still work as expected?
Probably.
I can haz dis data tooo?
Yeah, as @mannol said. Delay in between is what helps mobile users the most. The goal that will best help mobile users will be LONG delays between packets, not just fewer packets. That said, this will make it much easier to use on metered connections. The DHT backend is what makes tox a good protocol. The problem is not that tox uses the DHT on mobile; the problem is that the DHT is dumb. A smart DHT protocol should be able to bootstrap, and find and connect to ALL online friends in under 5s. Creating and validating onion routes is one of the things that makes this hard to do, so any optimization of onion routing is a good place to start (apart from dropping the onion for announce/friend finding).
* Wednesday, 2017-05-10 at 23:03 -0700 - Gregory Mullen <notifications@github.com>:
>> In part this is by fixing some bugs; probably most significantly,
>> previously the nodes for friends were always considered timed out when
>> we came to ping them, since the timeout for announce nodes was used
>> for the check rather than that for friend nodes, leading to some extra
>> traffic.
> Which commit[s] fix this?
commit f3d32c9
"Revise onion node time out calculation"
>> Then it implements two schemes for reducing unnecessary pinging.
> Reading through the code, what would you think about externalizing this
> logic a bit? E.g., can we make the decision in one function, and do the
> work in another, instead of the current decide+act in the same
> functions.
Normally I would do that. But the current code seems to avoid defining
functions which will only be called at one point in the code, so perhaps
it's better to stick to that style.
>> The first is to start to slightly trust announce nodes and paths which
>> have been alive for a while, and not ping them so aggressively [...]
>> with no discernible problems.
> Onion paths, so if one of the nodes goes down, the whole route is stale?
Onion paths, yes. If we fail to get a response along a path, that path
stops being considered stable, so we will quickly observe if it is
consistently failing and needs to be replaced.
> And what time frame did you do your testing over?
Various. Nodes and routes only take 90s to reach the "stable" state, so
the time frame doesn't matter much for this.
But generally, it would be great if people could test out this PR with
their own situations and usage patterns, and confirm that it doesn't
cause any problems in practice. Since all it does is reduce onion
requests, there should be no network effects to worry about - if tox
with this patch works properly while the network at large is using
unpatched tox, it should work just as well if everyone makes the switch.
>> The second is to gradually reduce the rate of pinging we do to check
>> on offline friends when they've been offline for some time. [...] With
>> the constants as I've set them, the extreme case is that they stay
>> offline for at least 1600 minutes, and then the request will take
>> something like 5 minutes on average to get through.
> Is that friend requests, or friend connections?
Friend requests and friend searches. Friend connections are separate,
and this PR doesn't touch them.
>> The same delay applies to established friends who come online after
>> being offline for a while and who somehow fail to find our
>> announcement - that shouldn't happen, but possibly it sometimes does
>> (and an attacker can cause it by poisoning our announcement
>> neighbourhood).
> Tox isn't very adaptive to poor network conditions; if I'm on a bad
> connection, would this diff change it from hard to make a connection
> to impossible?
The only negative interaction I can see is that in situations of severe
packet loss which, with the current interval between requests, are not
quite severe enough to prevent finding friend nodes, increasing the
interval means we more often make requests to nodes which have
disappeared - and this could nudge us over the edge to a situation where
we never manage to find nodes close enough to the target friend.
I expect this effect is quite small, but it's hard to quantify. This is
still only an issue in the case that a friend comes online after we've
been searching for them for some time, and they somehow fail to find our
own announcement (which as far as I can see means deliberate poisoning
or that they are suffering from similarly extreme packet loss).
>> The other potential problem I meant to mention: this PR reduces the
>> sensitivity to being disconnected from the network, since this is
>> detected by seeing if enough time passes without us receiving
>> a response to an onion request, and now we send requests less
>> regularly it can happen that we don't receive any responses for
>> a while. I've increased the timeout from 19s to 75s, which is enough
>> to make false positives very unlikely. If this is a problem, it
>> wouldn't be too hard to add a mechanism to deliberately test the
>> network if we receive nothing for say 30s.
> Which network? The onion network, or the DHT network?
I meant the onion. See onion_client.c:onion_isconnected.
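A minimal sketch of the kind of check described here, assuming a single timestamp for the last onion response and whole-second monotonic times. `Onion_Conn_State`, `onion_seems_connected()` and `should_probe_network()` are hypothetical names, not toxcore's `onion_isconnected()`; the 75s figure is the new timeout from this PR, while the 30s probe threshold is only the optional mechanism floated above, which this PR does not implement.

```c
#include <stdbool.h>
#include <stdint.h>

/* 75s is the new no-response timeout from this PR; 30s is only the optional
 * probe threshold suggested above (hypothetical, not part of the PR). */
#define ONION_CONNECTION_TIMEOUT_S 75
#define ONION_PROBE_AFTER_S 30

/* Hypothetical state: time (seconds, monotonic) of the last onion response. */
typedef struct {
    uint64_t last_response_time;
} Onion_Conn_State;

/* Connected while some onion response arrived within the timeout.
 * Assumes now >= last_response_time (monotonic clock). */
static bool onion_seems_connected(const Onion_Conn_State *s, uint64_t now)
{
    return now - s->last_response_time < ONION_CONNECTION_TIMEOUT_S;
}

/* Send an extra request purely to test the network once we've been quiet
 * for a while, but before the connection would be declared lost. */
static bool should_probe_network(const Onion_Conn_State *s, uint64_t now)
{
    return onion_seems_connected(s, now)
           && now - s->last_response_time >= ONION_PROBE_AFTER_S;
}
```

With the last response at t=100, the node still counts as connected at t=174 but not at t=175, and from t=130 onwards it would send a test request.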
>> It should help a lot, but tox will always use a fair amount of
>> bandwidth. Making up numbers which might be way off, I guess this
>> should get it down to around 2-3KBps in each direction when idling,
> What data are you using to get this estimate?
Observing (by packet sniffing and printf-debugging) my own node running
with this patch, I see it sending ~0.5 onion requests per second for
announcement. Once things are settled, traffic for friend searching is
something like ~0.01 requests per second per offline friend, but rates are
much higher at first. What this means on average depends on how many
offline friends users have and how long their nodes stay online, but let's
estimate the average total request rate as 1 packet per second. Each request,
assuming the packet doesn't get dropped anywhere along the way,
generates
403+395+387+354+416+357+298+238=2848 bytes
(packet kinds 80-83,8c-8e,84; sizes including headers as reported by
tcpdump) of traffic through the network. But in fact I observe that only
around half of requests sent get a response, probably mostly due to the
target being behind an evil NAT, making the average traffic per request
403+395+387+354 + ((416+357+298+238)/2)=2193.5 bytes
Now with the recent DHT fix, my node generates ~1.3 nodes requests per
second, each costing 113+238=351 bytes. Other packet kinds seem to be
negligible. So that adds up to
(1.3*351 + 1*2193.5)/1024 = 2.6KBps
in each direction.
Now this calculation assumed that everyone participates properly in the
network, while in fact we have many TCP-only leeches and many behind
evil NATs which make them unusable by the onion (Restricted Cone or
Symmetric), and it doesn't take into account lossy network conditions
some nodes may operate under. So maybe a better estimate is ~5KBps in
each direction for those fully participating in the onion.
All very rough, could be significantly off in either direction. A lower
bound, considering only DHT and announce traffic, would be
(1.3*351 + 0.5*2193.5)/1024 = 1.5KBps.
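The arithmetic above, written out as a small C helper so the assumed rates and response fraction can be varied. The packet sizes are the tcpdump figures quoted above; `estimate_kbps()` and the other helpers are hypothetical names for illustration only.

```c
/* Sizes in bytes (including headers, as reported by tcpdump) quoted above.
 * The first four are packet kinds 0x80-0x83, the last four 0x8c-0x8e and 0x84. */
static const int REQUEST_LEG_BYTES[4] = {403, 395, 387, 354};
static const int RESPONSE_LEG_BYTES[4] = {416, 357, 298, 238};

static int sum4(const int v[4])
{
    return v[0] + v[1] + v[2] + v[3];
}

/* Average bytes per onion request when only response_fraction of requests
 * get a response: the request leg is always paid, the response leg only sometimes. */
static double bytes_per_onion_request(double response_fraction)
{
    return sum4(REQUEST_LEG_BYTES) + response_fraction * sum4(RESPONSE_LEG_BYTES);
}

/* Estimated traffic in KBps in each direction. */
static double estimate_kbps(double dht_requests_per_s, double bytes_per_dht_request,
                            double onion_requests_per_s, double response_fraction)
{
    double dht = dht_requests_per_s * bytes_per_dht_request;
    double onion = onion_requests_per_s * bytes_per_onion_request(response_fraction);
    return (dht + onion) / 1024.0;
}
```

`estimate_kbps(1.3, 351.0, 1.0, 0.5)` gives about 2.59 (the ~2.6KBps figure), and `estimate_kbps(1.3, 351.0, 0.5, 0.5)` about 1.52 (the lower bound).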
>> and with more work (optimising DHT traffic, and making sure we only
>> send good nodes in response to onion requests) I have the impression
>> that as low as 1KBps could be feasible. I don't know if that's good
>> enough for mobile users.
> Yeah, as @mannol said. Delay in between is what helps mobile users the
> most. The goal that will best help mobile users will be LONG delays
> between packets, not just fewer packets.
We could arrange longish delays between requests, though it involves
a tradeoff against risking losing our announcement. But if we're to
participate fairly in the network rather than leeching off it, we must
respond promptly to any packets coming in. So we oughtn't expect to be
able to sleep and yet remain announced.
Reviewed 1 of 2 files at r1, 1 of 1 files at r5.
toxcore/onion_client.h, line 82 at r3:
Reviewed 2 of 2 files at r6.
ff15ec3 to 5dd2557
Currently, a large majority of the traffic in the tox network is generated by
pinging onion nodes with Onion Request packets, which a node does to keep
itself announced and to check for friends coming online. This PR significantly
reduces the rate at which such requests are sent.
In part this is by fixing some bugs; probably most significantly, previously
the nodes for friends were always considered timed out when we came to ping
them, since the timeout for announce nodes was used for the check rather than
that for friend nodes, leading to some extra traffic.
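A sketch of that bug, assuming friend nodes are re-pinged at an interval longer than the announce-node timeout. All constant and function names below are hypothetical and the values are made up for illustration, not toxcore's actual ones; the point is only that checking a friend node against the shorter announce timeout makes it look timed out every time we come to ping it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration only: what matters is that friend
 * nodes are pinged less often than the announce-node timeout. */
#define ANNOUNCE_NODE_TIMEOUT_S 10
#define FRIEND_NODE_TIMEOUT_S 30
#define FRIEND_PING_INTERVAL_S 15

typedef enum { NODE_KIND_ANNOUNCE, NODE_KIND_FRIEND } Node_Kind;

static uint64_t node_timeout(Node_Kind kind)
{
    return kind == NODE_KIND_ANNOUNCE ? ANNOUNCE_NODE_TIMEOUT_S : FRIEND_NODE_TIMEOUT_S;
}

/* Buggy check: every node is measured against the announce-node timeout. */
static bool node_timed_out_buggy(uint64_t last_response, uint64_t now)
{
    return now - last_response >= ANNOUNCE_NODE_TIMEOUT_S;
}

/* Fixed check: use the timeout for the kind of node being checked. */
static bool node_timed_out(Node_Kind kind, uint64_t last_response, uint64_t now)
{
    return now - last_response >= node_timeout(kind);
}
```

With these values a friend node last heard from at t=0 is due for a ping at t=15; the buggy check already reports it timed out, while the fixed check does not.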
Then it implements two schemes for reducing unnecessary pinging. The first is
to start to slightly trust announce nodes and paths which have been alive for a
while, and not ping them so aggressively, while reverting to the old behaviour
at the first sign of failure. Based on my testing, this reduces announce
traffic by around a factor of 4, from around 2 packets per second to ~0.5, with
no discernible problems.
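A minimal sketch of the first scheme, assuming a node or path counts as stable once it has gone 90s (the figure given in the discussion) without a missed response, and that any missed response resets that clock. The ping intervals and all names are hypothetical; the real logic in toxcore may track this differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* 90s to become stable is from the discussion; the ping intervals are hypothetical. */
#define TIME_TO_STABLE_S 90
#define UNSTABLE_PING_INTERVAL_S 3
#define STABLE_PING_INTERVAL_S 15

typedef struct {
    uint64_t trusted_since; /* when the node/path was added or last missed a response */
} Onion_Peer_State;

static void peer_init(Onion_Peer_State *p, uint64_t now)
{
    p->trusted_since = now;
}

/* Any missed response reverts to the old, aggressive pinging. */
static void peer_note_missed_response(Onion_Peer_State *p, uint64_t now)
{
    p->trusted_since = now;
}

static bool peer_is_stable(const Onion_Peer_State *p, uint64_t now)
{
    return now - p->trusted_since >= TIME_TO_STABLE_S;
}

/* Stable nodes/paths are pinged less often; unstable ones as before. */
static uint64_t peer_ping_interval(const Onion_Peer_State *p, uint64_t now)
{
    return peer_is_stable(p, now) ? STABLE_PING_INTERVAL_S : UNSTABLE_PING_INTERVAL_S;
}
```

Resetting the clock on failure means a flaky node or path has to earn trust again before its pings are relaxed.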
The second is to gradually reduce the rate of pinging we do to check on offline
friends when they've been offline for some time. Since some users typically
have many offline friends, this should result in a large reduction in traffic.
This last should be the only change which could cause any problems for users.
The principal effect is on friend requests: if we make a friend request
to an offline node, and it remains offline for some time while we stay online,
then when it comes online it may take longer for it to receive the friend
request. With the constants as I've set them, the extreme case is that they
stay offline for at least 1600 minutes, and then the request will take
something like 5 minutes on average to get through. The same delay applies to
established friends who come online after being offline for a while and who
somehow fail to find our announcement - that shouldn't happen, but possibly it
sometimes does (and an attacker can cause it by poisoning our announcement
neighbourhood).
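A sketch of the second scheme as a capped linear backoff. The formula and constants are assumptions for illustration, chosen so that the 600s cap is reached after 1600 minutes offline; they are not the PR's actual constants. If a friend comes online at a uniformly random moment, the expected wait until our next search is half the interval, so a 600s cap gives about 5 minutes, the same order as the figure described above.

```c
#include <stdint.h>

/* Hypothetical constants: the cap is reached after 1600 minutes offline
 * (96000s / 160 = 600s). Not the PR's actual values or formula. */
#define FRIEND_SEARCH_MIN_INTERVAL_S 5
#define FRIEND_SEARCH_MAX_INTERVAL_S 600
#define FRIEND_SEARCH_BACKOFF_DIVISOR 160

/* Seconds between searches for a friend who has been offline for offline_for_s:
 * grows linearly with time offline, clamped to [MIN, MAX]. */
static uint64_t friend_search_interval(uint64_t offline_for_s)
{
    uint64_t interval = offline_for_s / FRIEND_SEARCH_BACKOFF_DIVISOR;
    if (interval < FRIEND_SEARCH_MIN_INTERVAL_S) {
        interval = FRIEND_SEARCH_MIN_INTERVAL_S;
    }
    if (interval > FRIEND_SEARCH_MAX_INTERVAL_S) {
        interval = FRIEND_SEARCH_MAX_INTERVAL_S;
    }
    return interval;
}
```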
Please note when testing this PR that you shouldn't expect to see a huge
bandwidth reduction - what you should see if you packet-sniff is a reduction
(after your node has been up for a while) in Onion Request packets (first byte
0x80), but it needs a large proportion of the network to adopt it before we'll
start to see actual significant bandwidth reduction.
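A sketch of the check suggested here: counting Onion Request packets by their first byte (0x80) in captured UDP payloads. Capturing and extracting the payloads (for example with tcpdump) is left out, and `Udp_Payload` and `count_onion_requests()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* First byte of an Onion Request packet, as given above. */
#define ONION_REQUEST_FIRST_BYTE 0x80

/* Hypothetical view of one captured UDP payload. */
typedef struct {
    const uint8_t *data;
    size_t len;
} Udp_Payload;

/* Count payloads whose first byte marks an Onion Request; empty payloads are skipped. */
static size_t count_onion_requests(const Udp_Payload *payloads, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (payloads[i].len > 0 && payloads[i].data[0] == ONION_REQUEST_FIRST_BYTE) {
            ++count;
        }
    }
    return count;
}
```

Comparing this count per minute before and after your node has been up for a while should show the reduction, even while overall bandwidth barely changes.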