Save bandwidth by avoiding superfluous Nodes Requests to peers already on the Close List #511
Conversation
I did some primitive testing, and I'm sad to say that this patch doesn't seem to lead to any immediately discernible reduction in bandwidth. Whether this is because the scenario I sketched isn't actually occurring, or just because it's being drowned out by all the other sources of bandwidth use, I don't know. |
Reviewed 1 of 1 files at r1. toxcore/DHT.c, line 1025 at r1 (raw file):
So `LCLIENT_LENGTH` is okay, but you set index to `LCLIENT_LENGTH - 1`? Did you mean to use `>=` in the if-check or is this all intentional? Comments from Reviewable |
Is the "close list" equivalent of the k-bucket list in Kademlia? If so, the behaviour you are complaining about might actually be the desired behaviour and you fixing it would make DHT less reliable. In Kademlia, each k-bucket contains at most k nodes and is sorted by the most recently seen time. You update the k-bucket list every time you receive any DHT reply from any node, putting that node on the top (or bottom, depends on how you implemented it) of the list, since it's the most recently seen node in that bucket. Of course, if the node is already somewhere in the k-bucket, you remove it before adoing it to the top, as you don't want duplicates in the k-bucket. The property of k-buckets being always sorted by the most recently seen time is key to Kademlia DHT's reliability, because once a k-bucket is full and you add a new node to it, the least recently seen node is removed from the k-bucket, i.e. the bottom most node is removed. This makes it so that k-bucket contains k nodes which are very likely to be online. If you change the code in such a way that you don't put a recently seen node at the top of the k-bucket if it's already in it, you violate the property of the k-bucket being sorted by the most recently seen time, which allows for the exact opposite behaviour to happen -- the bottom most node might in fact be one of the most recently seen nodes, which should be on the top of the k-bucket instead, and since the bottom most node is removed when you try to insert a new node in a full k-bucket, because the algorithm assumes that the bottom most node is the least recently seen node, it would unknowingly remove the most recently node. That said, I'm only familiar with Kademlia DHT, not Tox DHT, but Tox DHT is supposedly based on Kademlia DHT so this is likely to still apply. |
* Thursday, 2017-03-23 at 07:58 -0700 - Robin Lindén <notifications@github.com>:
Reviewed 1 of 1 files at r1.
Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.
---
*[toxcore/DHT.c, line 1025 at r1](https://reviewable.io:443/reviews/toktok/c-toxcore/511#-Kfveq4VV6Tj0gSbGrLa:-Kfveq4VV6Tj0gSbGrLb:b42aezb) ([raw file](https://github.com/toktok/c-toxcore/blob/2f172769d11b03488ea278dc34c0c421b211b759/toxcore/DHT.c#L1025)):*
> ```C
>
> if (index > LCLIENT_LENGTH) {
>     index = LCLIENT_LENGTH - 1;
> ```
So `LCLIENT_LENGTH` is okay, but you set index to `LCLIENT_LENGTH - 1`? Did you mean to use `>=` in the if-check or is this all intentional?
Good catch!
I copied this from add_to_close(), but indeed it should be '>=' both
there and here, because in neither case is index=LCLIENT_LENGTH a good
idea.
Fixed in both functions.
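For clarity, the corrected clamp reads as follows (just a sketch of the comparison itself, not the surrounding add_to_close() / is_pk_in_close_list() code):

```C
/* index selects a slot, so LCLIENT_LENGTH itself is already out of
 * bounds and must be clamped too -- hence >= rather than >. */
if (index >= LCLIENT_LENGTH) {
    index = LCLIENT_LENGTH - 1;
}
```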
|
* Thursday, 2017-03-23 at 09:47 -0700 - nurupo <notifications@github.com>:
Is the "close list" equivalent of the k-bucket list in Kademlia?
Yes.
If so, the behaviour you are complaining about might actually be the
desired behaviour and you fixing it would make DHT less reliable. In
Kademlia, each k-bucket contains at most k nodes and is sorted by the
most recently seen time.
Tox doesn't order the buckets at all. Instead, it "pings" every node in
each bucket every 60 seconds by sending it a getnodes request, and if
the node replies, a timestamp on the node is updated. If a node fails to
reply to requests for long enough (122s), it's marked as "bad", which
effectively means it's removed from the bucket.
The PR doesn't interfere with this "pinging".
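Roughly, and with illustrative names rather than the actual toxcore identifiers, that liveness rule amounts to:

```C
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define PING_INTERVAL    60   /* seconds between getnodes "pings" of a known node */
#define BAD_NODE_TIMEOUT 122  /* seconds without a reply before a node counts as "bad" */

/* Illustrative bucket entry: toxcore's real Client_data carries much more. */
typedef struct {
    uint8_t public_key[32];
    time_t  last_pinged;   /* when we last sent it a getnodes request */
    time_t  last_replied;  /* when it last answered one */
} CloseNode;

/* Checked periodically (e.g. from do_DHT()). */
static bool node_needs_ping(const CloseNode *n, time_t now)
{
    return now - n->last_pinged >= PING_INTERVAL;
}

static bool node_is_bad(const CloseNode *n, time_t now)
{
    return now - n->last_replied > BAD_NODE_TIMEOUT;
}
```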
This frequent "pinging" is expensive, so it might well be that in the
future we want to move to something more like the Kademlia system. So it
might be worth thinking through how this PR would affect such a system:
You update the k-bucket list every time you receive any DHT reply from
any node, putting that node on the top (or bottom, depends on how you
implemented it) of the list, since it's the most recently seen node in
that bucket. Of course, if the node is already somewhere in the
k-bucket, you remove it before adding it to the top, as you don't want
duplicates in the k-bucket.
This PR is not changing that. When we receive a reply to a getnodes
request, we update the node we got the reply from; this PR doesn't
affect that. What it affects is what we do with the nodes the reply
tells us about. Currently we immediately send a getnodes request to
each of those nodes, searching for ourself, unless the corresponding
bucket is full; this PR prevents sending such a request when the node is
already in the bucket.
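Concretely, the guarded send looks something like the following sketch. The names are illustrative: is_pk_in_close_list() is the helper named in the commit, while the other identifiers are stand-ins rather than the exact DHT.c API.

```C
/* For each node reported in a getnodes response... */
for (uint32_t i = 0; i < num_nodes; ++i) {
    const Node_format *n = &nodes[i];

    /* ...only search for ourself via it if it could enter the close
     * list AND is not already on it (the check added by this PR). */
    if (node_can_be_added_to_close(dht, n->public_key)
            && !is_pk_in_close_list(dht, n->public_key, n->ip_port)) {
        getnodes(dht, n->ip_port, n->public_key, dht->self_public_key);
    }
}
```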
The only possible interaction with a least-recently-seen policy is that the
current behaviour would prompt nodes which show up in a getnodes response to
declare that they're still alive. But given the
pathological behaviour I described in the original PR text, I don't
think this can be a good method for doing that. The Kademlia paper
describes an entirely different method for refreshing buckets when usual
traffic isn't enough, by doing a search for a random ID in the bucket's
range.
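As a sketch of that refresh technique, assuming the convention that bucket i holds IDs which share their first i bits with ours and differ in bit i (this is not toxcore code, the caller would then run a normal node search for the generated ID, and a real implementation would use a proper CSPRNG rather than rand()):

```C
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ID_BYTES 32

/* Build a random ID that falls into bucket `i`: copy the first i bits
 * from self_id, force bit i to differ, and randomise the rest. */
static void random_id_in_bucket(const uint8_t *self_id, unsigned i, uint8_t *out)
{
    for (size_t b = 0; b < ID_BYTES; ++b) {
        out[b] = (uint8_t)rand();   /* placeholder randomness */
    }

    /* Copy the shared prefix: whole bytes first, then the leftover bits. */
    memcpy(out, self_id, i / 8);

    if (i % 8 != 0) {
        const uint8_t mask = (uint8_t)(0xFF << (8 - i % 8)); /* high bits of that byte */
        out[i / 8] = (uint8_t)((self_id[i / 8] & mask) | (out[i / 8] & (uint8_t)~mask));
    }

    /* Bit i must differ from ours so the ID lands in bucket i. */
    const uint8_t bit = (uint8_t)(0x80 >> (i % 8));

    if (self_id[i / 8] & bit) {
        out[i / 8] &= (uint8_t)~bit;
    } else {
        out[i / 8] |= bit;
    }
}
```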
|
Reviewed 1 of 1 files at r2. Comments from Reviewable |
Please enable the checkbox "Allow edits from maintainers." on the bottom right. |
force-pushed from e7c2f48 to b8ac109
@nurupo are you happy with this PR? |
let me take a look too |
@JFreegman has a feature branch with a packet count/size; did you, or can you, use that to see whether this changes the packet counts in other ways? Review status: 0 of 1 files reviewed at latest revision, all discussions resolved, all commit checks successful. Comments from Reviewable |
From my understanding, that's not how Tox sorts its close list, unless I misunderstand something. It ONLY replaces nodes that have timed out. Comments from Reviewable |
If they're already in the close list, we honestly don't need to interact with them at all*: assuming other, closer nodes send us a getnodes request, we don't need to ask them for close nodes anymore*. *many restrictions apply, and I'm not advocating we actually do this. Review status: 0 of 1 files reviewed at latest revision, all discussions resolved, all commit checks successful. Comments from Reviewable |
@irungentoo would you like to take a look at this to make sure it makes sense? |
This looks fine. |
I did a little imprecise testing, and I'm observing roughly a quartering of |
fix index bounds check in add_to_close() and is_pk_in_close_list(); add TODO to write test for the bug fixed by this commit
force-pushed from b8ac109 to 2474f43
Currently, when we receive a response to a getnodes request, for each node in
the response we check if it could be added to the close list, and if so we send
the node a getnodes request; if it replies, we add it to the close list.
We do the same for each friend's list, but for those we first check whether
the node is already on the friend's list, and don't send a getnodes request if
it is.
This change adds a corresponding check for the close list, such that we only
send a getnodes request if the node can be added to the close list and is not
already on the close list.
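The membership test itself (is_pk_in_close_list() in the commit) amounts to scanning the close list for the public key. A rough sketch, with assumed field names rather than the real Client_data layout (which also carries per-IPv4/IPv6 state, timestamps, and so on):

```C
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PUBLIC_KEY_BYTES 32

/* Illustrative close-list entry. */
typedef struct {
    uint8_t public_key[PUBLIC_KEY_BYTES];
} Entry;

static bool pk_in_list(const Entry *list, size_t length, const uint8_t *public_key)
{
    for (size_t i = 0; i < length; ++i) {
        if (memcmp(list[i].public_key, public_key, PUBLIC_KEY_BYTES) == 0) {
            return true;
        }
    }

    return false;
}
```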
The original behaviour could, and as far as I can see typically should be
expected to, lead to getnodes loops resulting in lots of unnecessary traffic.
This can happen as follows: suppose A, B, and C are all close to each other in
the DHT, and all know each other, and A sends B a getnodes request searching
for A (as happens at least every 60 seconds). B's response will likely include
C. It's quite likely that the part of A's close list (the "k-bucket") into
which C falls will not be full, i.e. that A does not know 8 nodes at that
distance. Then even though A already knows C, when A receives B's response A
will send a getnodes to C. Then C's response is likely to include B, in which
case (if that bucket is also not full) A will send another getnodes to B. So we
obtain a tight loop, with the only delays being network delays and the delay
between calls to do_DHT().
I don't see that there can be any downside to adding this check, but I'm not
entirely confident that I couldn't be missing something subtle, so please do
review carefully.