Save bandwidth by avoiding superfluous Nodes Requests to peers already on the Close List #511

Merged
merged 1 commit into TokTok:master on Apr 29, 2017

Conversation

7 participants

zugz commented Mar 12, 2017

Currently, when we receive a response to a getnodes request, for each node in
the response we check if it could be added to the close list, and if so we send
the node a getnodes request; if it replies, we add it to the close list.

We do the same for each friends list, but for them we first check whether the
node is already on the friends list, and don't send a getnodes request if it
is.

This change adds a corresponding check for the close list, such that we only
send a getnodes request if the node can be added to the close list and is not
already on the close list.
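The gating logic the patch describes can be sketched roughly as follows. This is a minimal illustration, not toxcore's actual code: `should_ping_node` and `is_pk_in_list` are hypothetical helper names, and the constants are assumed.

```c
#include <stdbool.h>
#include <string.h>

#define CRYPTO_PUBLIC_KEY_SIZE 32 /* assumed key size */

typedef struct {
    unsigned char public_key[CRYPTO_PUBLIC_KEY_SIZE];
} Client_data;

/* Hypothetical helper: true if pk already occupies a slot in the list. */
static bool is_pk_in_list(const Client_data *list, unsigned len,
                          const unsigned char *pk)
{
    for (unsigned i = 0; i < len; ++i) {
        if (memcmp(list[i].public_key, pk, CRYPTO_PUBLIC_KEY_SIZE) == 0) {
            return true;
        }
    }
    return false;
}

/* Sketch of the check this PR adds: only send a getnodes request to a
 * node from a response if it could be added to the close list AND is
 * not already on it. */
static bool should_ping_node(const Client_data *close_list, unsigned len,
                             const unsigned char *pk, bool can_add_to_close)
{
    return can_add_to_close && !is_pk_in_list(close_list, len, pk);
}
```

With this check in place, the A-B-C scenario below stops looping: A already knows C, so receiving C in B's response no longer triggers a fresh getnodes request.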

The original behaviour could, and as far as I can see typically should be expected to, lead to getnodes loops resulting in lots of unnecessary traffic.
This can happen as follows: suppose A, B, and C are all close to each other in
the DHT, and all know each other, and A sends B a getnodes request searching
for A (as happens at least every 60 seconds). B's response will likely include
C. It's quite likely that the part of A's close list (the "k-bucket") into
which C falls will not be full, i.e. that A does not know 8 nodes at that
distance. Then even though A already knows C, when A receives B's response A
will send a getnodes to C. Then C's response is likely to include B, in which
case (if that bucket is also not full) A will send another getnodes to B. So we
obtain a tight loop, with the only delays being network delays and the delay
between calls to do_DHT().

I don't see that there can be any downside to adding this check, but I'm not
entirely confident that I couldn't be missing something subtle, so please do
review carefully.


This change is Reviewable


zugz commented Mar 12, 2017

I did some primitive testing, and I'm sad to say that this patch doesn't seem to lead to any immediately discernible reduction in bandwidth. Whether this is because the scenario I sketched isn't actually occurring, or just because it's being drowned out by all the other sources of bandwidth use, I don't know.

robinlinden commented Mar 23, 2017

Reviewed 1 of 1 files at r1.
Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


toxcore/DHT.c, line 1025 at r1 (raw file):

    if (index > LCLIENT_LENGTH) {
        index = LCLIENT_LENGTH - 1;
    }

So LCLIENT_LENGTH is okay, but you set index to LCLIENT_LENGTH - 1? Did you mean to use >= in the if-check or is this all intentional?
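(For reference, the commit message below says the bounds check was indeed fixed: with `>`, `index == LCLIENT_LENGTH` slips through the check and indexes one element past the last bucket. A minimal sketch of the clamped version, with the constant's value assumed purely for illustration:)

```c
#define LCLIENT_LENGTH 32 /* number of close-list buckets; value assumed */

/* Clamp a bucket index so it stays within [0, LCLIENT_LENGTH - 1].
 * The original `>` comparison let index == LCLIENT_LENGTH through,
 * which would index one element past the end of the list. */
static unsigned clamp_bucket_index(unsigned index)
{
    if (index >= LCLIENT_LENGTH) {
        index = LCLIENT_LENGTH - 1;
    }
    return index;
}
```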


Comments from Reviewable


nurupo commented Mar 23, 2017

Is the "close list" equivalent to the k-bucket list in Kademlia? If so, the behaviour you are complaining about might actually be the desired behaviour, and fixing it would make the DHT less reliable.

In Kademlia, each k-bucket contains at most k nodes and is sorted by most recently seen time. You update the k-bucket list every time you receive any DHT reply from any node, putting that node at the top (or bottom, depending on how you implemented it) of the list, since it's the most recently seen node in that bucket. Of course, if the node is already somewhere in the k-bucket, you remove it before adding it to the top, as you don't want duplicates in the k-bucket.

The property of k-buckets always being sorted by most recently seen time is key to Kademlia DHT's reliability: once a k-bucket is full and you add a new node to it, the least recently seen node, i.e. the bottom-most node, is removed. This makes it so that a k-bucket contains k nodes which are very likely to be online. If you change the code so that you don't put a recently seen node at the top of the k-bucket when it's already in it, you violate the property of the k-bucket being sorted by most recently seen time, which allows the exact opposite behaviour to happen: the bottom-most node might in fact be one of the most recently seen nodes, which should be at the top of the k-bucket instead. Since the bottom-most node is removed when you try to insert a new node into a full k-bucket (the algorithm assumes the bottom-most node is the least recently seen one), it would unknowingly remove the most recently seen node.

That said, I'm only familiar with Kademlia DHT, not Tox DHT, but Tox DHT is supposedly based on Kademlia DHT so this is likely to still apply.
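(The Kademlia update rule described above, move a node we just heard from to the front and evict from the back only when full, can be sketched like this. This is a toy illustration, not toxcore code; the struct, names, and sizes are made up.)

```c
#include <string.h>

#define K 8        /* Kademlia bucket capacity (k) */
#define ID_SIZE 32 /* assumed node-ID size */

typedef struct {
    unsigned char ids[K][ID_SIZE]; /* ids[0] = most recently seen */
    unsigned len;
} KBucket;

/* Record that we just heard from `id`: move it to the front; if the
 * bucket is full, drop the least recently seen node (the back). */
static void kbucket_touch(KBucket *b, const unsigned char *id)
{
    /* If already present, remove it first to avoid duplicates. */
    for (unsigned i = 0; i < b->len; ++i) {
        if (memcmp(b->ids[i], id, ID_SIZE) == 0) {
            memmove(b->ids[i], b->ids[i + 1],
                    (size_t)(b->len - i - 1) * ID_SIZE);
            --b->len;
            break;
        }
    }
    /* Full bucket: evict the least recently seen (last) entry. */
    if (b->len == K) {
        --b->len;
    }
    /* Shift everything back one slot and insert at the front. */
    memmove(b->ids[1], b->ids[0], (size_t)b->len * ID_SIZE);
    memcpy(b->ids[0], id, ID_SIZE);
    ++b->len;
}
```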

zugz commented Mar 23, 2017

zugz commented Mar 23, 2017


robinlinden commented Mar 25, 2017

:lgtm_strong:


Reviewed 1 of 1 files at r2.
Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.


Comments from Reviewable

iphydf modified the milestone: v0.1.8 (Mar 26, 2017)


iphydf commented Mar 26, 2017

Please enable the checkbox "Allow edits from maintainers." on the bottom right.

zugz changed the title from "check if already in close list in ping_node_from_getnodes_ok()" to "Save bandwith by avoiding sending superfluous Nodes Requests to peers already on the Close List" (Apr 1, 2017)

iphydf changed the title from "Save bandwith by avoiding sending superfluous Nodes Requests to peers already on the Close List" to "Save bandwidth by avoiding superfluous Nodes Requests to peers already on the Close List" (Apr 1, 2017)


iphydf commented Apr 1, 2017

@nurupo are you happy with this PR?


GrayHatter commented Apr 7, 2017

let me take a look too


GrayHatter commented Apr 8, 2017

I did some primitive testing, and I'm sad to say that this patch doesn't seem to lead to any immediately discernible reduction in bandwidth. Whether this is because the scenario I sketched isn't actually occurring, or just because it's being drowned out by all the other sources of bandwidth use, I don't know.

@JFreegman has a feature branch with packet counts/sizes; did you, or could you, use that to see if it's changing the packet counts in other ways?


Review status: 0 of 1 files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable


GrayHatter commented Apr 8, 2017

That said, I'm only familiar with Kademlia DHT, not Tox DHT, but Tox DHT is supposedly based on Kademlia DHT so this is likely to still apply.

From my understanding, that's not how Tox sorts its close list. Unless I misunderstand something, it ONLY replaces nodes that have timed out.


Comments from Reviewable


GrayHatter commented Apr 8, 2017

:lgtm: afaict, @zugz is correct about everything. All this pull does is prevent the DHT from adding nodes that already exist in the close list to the to_bootstrap list, which IIRC would only then attempt to add them to the close list.

If they're already in the close list, we honestly don't need to interact with them at all*; assuming other, closer nodes send us getnodes requests, we don't need to ask them for close nodes anymore*.

*many restrictions apply, and I'm not advocating we actually do this.


Review status: 0 of 1 files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable


iphydf commented Apr 9, 2017

@irungentoo would you like to take a look at this to make sure it makes sense?


irungentoo commented Apr 10, 2017

This looks fine.

iphydf modified the milestones: v0.1.8, v0.1.9 (Apr 12, 2017)


zugz commented Apr 28, 2017

I did a little imprecise testing, and I'm observing roughly a quartering of
the number of getnodes requests sent during normal DHT operation once this
patch is applied (from about 8 per second to about 2). So this should indeed
help with our bandwidth problem.

check if already in close list in ping_node_from_getnodes_ok()
fix index bounds check in add_to_close() and is_pk_in_close_list()

add TODO to write a test for the bug fixed by this commit

robinlinden merged commit 2474f43 into TokTok:master (Apr 29, 2017)

4 of 5 checks passed

ci/circleci: CircleCI is running your tests
code-review/reviewable: 1/1 LGTMs, 1 stale
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
coverage/coveralls: Coverage increased (+0.2%) to 74.469%