Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[addrman, net] Ensure tried collisions resolve, and allow feeler connections to existing outbound netgroups #15486

Merged
merged 4 commits into from Mar 9, 2019

Conversation

Projects
None yet
9 participants
@sdaftuar
Copy link
Member

commented Feb 26, 2019

The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.

This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (m_tried_collisions) would never be drained.

Also, ensure that all entries don't linger in m_tried_collisions by evicting an old entry if its collisions is unresolved after 40 minutes.

@sdaftuar

This comment has been minimized.

Copy link
Member Author

commented Feb 26, 2019

I think this should resolve #15484, but it's not clear to me if this is entirely sufficient to protect against the class of error, or if we should add additional logic to ResolveCollisions() to ensure that collisions get eventually resolved.

@DrahtBot

This comment has been minimized.

Copy link
Contributor

commented Feb 26, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

@sdaftuar

This comment has been minimized.

Copy link
Member Author

commented Feb 26, 2019

Updated with a commit that tries to fix the underlying problem in addrman, by doing eviction anyway after 20 minutes (to prevent m_tried_collisions from never being drained).

If it looks good, I'll update the OP of this PR before merge.

@fanquake fanquake added the P2P label Feb 26, 2019

@gmaxwell

This comment has been minimized.

Copy link
Member

commented Feb 26, 2019

This bugfix should probably get tagged for 0.18.

@fanquake fanquake added this to the 0.18.0 milestone Feb 26, 2019

[net] feeler connections can be made to outbound peers in same netgroup
Fixes a bug where feelers could be stuck trying to resolve a collision in the
tried table that is to an address in the same netgroup as an existing outbound peer.

Thanks to Muoi Tran for the original bug report and detailed debug logs to track
this down.

@sdaftuar sdaftuar force-pushed the sdaftuar:2019-02-addrman-collisions branch from 2410d33 to 1b0621b Feb 27, 2019

Show resolved Hide resolved src/addrman.h Outdated
[addrman] Ensure collisions eventually get resolved
After 40 minutes, time out a test-before-evict entry and just evict without
testing. Otherwise, if we were unable to test an entry for some reason, we
might break using feelers altogether.

@sdaftuar sdaftuar force-pushed the sdaftuar:2019-02-addrman-collisions branch from 1b0621b to f71fdda Feb 27, 2019

@sipa

This comment has been minimized.

Copy link
Member

commented Feb 28, 2019

utACK f71fdda

Show resolved Hide resolved src/addrman.cpp Outdated
@Sjors

This comment has been minimized.

Copy link
Member

commented Mar 1, 2019

Useful background info for context:

@Sjors

Sjors approved these changes Mar 1, 2019

Copy link
Member

left a comment

utACK f71fdda, but would be nice to have a test that demonstrates the original bug.

As @sipa suggested a few years ago, let's put a link to that paper in a comment somewhere.

Show resolved Hide resolved src/addrman.cpp
Show resolved Hide resolved src/addrman.cpp Outdated
Show resolved Hide resolved src/net.cpp Outdated
Show resolved Hide resolved src/net.cpp
@sdaftuar

This comment has been minimized.

Copy link
Member Author

commented Mar 1, 2019

Addressed nits.

@sdaftuar sdaftuar changed the title [net] Allow feeler connections to go to the same netgroup as existing outbound peers [addrman, net] Ensure tried collisions resolve, and allow feeler connections to existing outbound netgroups Mar 1, 2019

@gmaxwell

This comment has been minimized.

Copy link
Member

commented Mar 1, 2019

utACK

@Sjors

This comment has been minimized.

Copy link
Member

commented Mar 2, 2019

To clarify what I brought up inline:

bitcoin/src/addrman.cpp

Lines 240 to 253 in 20e6ea2

// Will moving this address into tried evict another entry?
if (test_before_evict && (vvTried[tried_bucket][tried_bucket_pos] != -1)) {
// Output the entry we'd be colliding with, for debugging purposes
auto colliding_entry = mapInfo.find(vvTried[tried_bucket][tried_bucket_pos]);
LogPrint(BCLog::ADDRMAN, "Collision inserting element into tried table (%s), moving %s to m_tried_collisions=%d\n", colliding_entry != mapInfo.end() ? colliding_entry->second.ToString() : "", addr.ToString(), m_tried_collisions.size());
if (m_tried_collisions.size() < ADDRMAN_SET_TRIED_COLLISION_SIZE) {
m_tried_collisions.insert(nId);
}
} else {
LogPrint(BCLog::ADDRMAN, "Moving %s to tried\n", addr.ToString());
// move nId to the tried tables
MakeTried(info, nId);
}

The log statement says we're moving it to m_tried_collisions, but that's not true if that's full. Hence my suggestion to move the log statement inside the if statement (line 245).

For the case m_tried_collisions is full @sdaftuar wrote:

I don't think it's quite as simple as "when in doubt, don't evict". I think in the case of a collision we bias ourselves towards not evicting, unless the peer we might evict can't be connected to via feeler connection.

My thought is that if m_tried_collisions is ever full, it seems like that is a scenario where we may be under attack; so not replacing things in our tried table (which we assume to have some good peers in it) is safe, even if not optimal. Maybe @EthanHeilman or @gmaxwell has a better intuition for how this works though.

And @EthanHeilman replied:

I agree with your intuition.

We bias toward not evicting. The assumption is that if the tried table is contains many evil IPs we have already lost and won't hear about good IPs. Thus, we assume the tried has many good IPs and that they should not be evicted unless a particular IP in tried is offline and thus no longer provides security.

But a few lines down it seems we're doing the opposite, namely bias ourselves towards evicting:

bitcoin/src/addrman.cpp

Lines 572 to 579 in 20e6ea2

} else if (GetAdjustedTime() - info_new.nLastSuccess > ADDRMAN_TEST_WINDOW) {
// If the collision hasn't resolved in some reasonable amount of time,
// just evict the old entry -- we must not be able to
// connect to it for some reason.
LogPrint(BCLog::ADDRMAN, "Unable to test; replacing %s with %s in tried table anyway\n", info_old.ToString(), info_new.ToString());
Good_(info_new, false, GetAdjustedTime());
erase_collision = true;
}

We don't know anything about this new entry, because we couldn't connect, so it seems to me it would be more consistent to keep the old entry.

It's still much better than getting stuck of course, so utACK 20e6ea2.

@gmaxwell

This comment has been minimized.

Copy link
Member

commented Mar 2, 2019

We don't know anything about this new entry,

When we insert the new entry in tried in the first place, it's because we've actually tried it. When inserting it we may find that it causes us to want to evict an old tried entry (which has also been tried, but not as recently). What the code is supposed to do (post try before evict) is to test the old tried entry before evicting it, in order to give it a second chance to stay around iff its still connectible.

@Sjors

This comment has been minimized.

Copy link
Member

commented Mar 2, 2019

Ah, I think I get it now... since we don't specifically track when things were added to m_tried_collisions, the check info_new.nLastSuccess > ADDRMAN_TEST_WINDOW is just the way we determine when the entry was added to the new table, which coincides with the time we last heard from it, i.e. when it last sent us a version message. And that's also when it's added to m_tried_collisions.

If we receive a version message:

if (strCommand == NetMsgType::VERSION) {

From an outbound node:

if (!pfrom->fInbound)

That's when we call MarkAddressGood in net.cpp which in turn calls CAddrMan::Good_, which adds it to m_tried_collisions.

@sdaftuar

This comment has been minimized.

Copy link
Member Author

commented Mar 2, 2019

Ah, I think I get it now... since we don't specifically track when things were added to m_tried_collisions, the check info_new.nLastSuccess > ADDRMAN_TEST_WINDOW is just the way we determine when the entry was added to the new table

That's not necessarily when the entry was added to the new table -- we can do that when learning about an address, for instance when relayed to us via an addr message from another peer. But otherwise I think your description is correct (in particular, that should be the time that we attempted to put it in the tried table but discovered the collision).

@naumenkogs

This comment has been minimized.

Copy link
Contributor

commented Mar 4, 2019

utACK

@sdaftuar

This comment has been minimized.

Copy link
Member Author

commented Mar 6, 2019

I think this is ready to go -- anything left that I missed?

@laanwj

This comment has been minimized.

Copy link
Member

commented Mar 9, 2019

utACK 20e6ea2

@laanwj laanwj merged commit 20e6ea2 into bitcoin:master Mar 9, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request Mar 9, 2019

Merge #15486: [addrman, net] Ensure tried collisions resolve, and all…
…ow feeler connections to existing outbound netgroups

20e6ea2 [addrman] Improve collision logging and address nits (Suhas Daftuar)
f71fdda [addrman] Ensure collisions eventually get resolved (Suhas Daftuar)
4991e3c [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar)
4d83401 [addrman] Improve tried table collision logging (Suhas Daftuar)

Pull request description:

  The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.

  This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained.

  Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes.

Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6

fanquake added a commit to fanquake/bitcoin that referenced this pull request Mar 9, 2019

fanquake added a commit to fanquake/bitcoin that referenced this pull request Mar 9, 2019

[net] feeler connections can be made to outbound peers in same netgroup
Fixes a bug where feelers could be stuck trying to resolve a collision in the
tried table that is to an address in the same netgroup as an existing outbound peer.

Thanks to Muoi Tran for the original bug report and detailed debug logs to track
this down.

Github-Pull: bitcoin#15486
Rebased-From: 4991e3c

fanquake added a commit to fanquake/bitcoin that referenced this pull request Mar 9, 2019

[addrman] Ensure collisions eventually get resolved
After 40 minutes, time out a test-before-evict entry and just evict without
testing. Otherwise, if we were unable to test an entry for some reason, we
might break using feelers altogether.

Github-Pull: bitcoin#15486
Rebased-From: f71fdda

fanquake added a commit to fanquake/bitcoin that referenced this pull request Mar 9, 2019

@fanquake fanquake removed the Needs backport label Mar 9, 2019

@fanquake

This comment has been minimized.

Copy link
Member

commented Mar 9, 2019

Backported in #15563.

Done by @laanwj in f810f14...b80dedb.

laanwj added a commit that referenced this pull request Mar 9, 2019

[addrman] Improve tried table collision logging
Github-Pull: #15486
Rebased-From: 4d83401
Tree-SHA512: c7843191c470d8b3298d4375632ddbcfd26358d21100bcbffa8d40498782f9dfef08cfb7b3c040525ef912b27ddd593322b60f8f21e16e2ae9bae66433ce807e

laanwj added a commit that referenced this pull request Mar 9, 2019

[net] feeler connections can be made to outbound peers in same netgroup
Fixes a bug where feelers could be stuck trying to resolve a collision in the
tried table that is to an address in the same netgroup as an existing outbound peer.

Thanks to Muoi Tran for the original bug report and detailed debug logs to track
this down.

Github-Pull: #15486
Rebased-From: 4991e3c
Tree-SHA512: 2752c9909d55ff63b9143168033d0d3952effba7f911181919eb62291edf822ec54fffbb20e35b5c5f8cb1092d75c496665da00139dbebe5ce402cbea3ad87c5

laanwj added a commit that referenced this pull request Mar 9, 2019

[addrman] Ensure collisions eventually get resolved
After 40 minutes, time out a test-before-evict entry and just evict without
testing. Otherwise, if we were unable to test an entry for some reason, we
might break using feelers altogether.

Github-Pull: #15486
Rebased-From: f71fdda
Tree-SHA512: 66c61a9f030b1666e98e4fe66dd6cfc5f1d03b2a1ec01567b195903db6e4412ac778f4464ee9ef35ae6faa1ab7e4b18ef7ecb9a7ced29e6494046990aebf7b76

laanwj added a commit that referenced this pull request Mar 9, 2019

[addrman] Improve collision logging and address nits
Github-Pull: #15486
Rebased-From: 20e6ea2
Tree-SHA512: 73b37a98cf4d9ede1d39c30d178d63a5a6865dba6cb7a9f33bd1e03445acb708b3007c7cde991b5de96a407262adda23279fe7a1d2ed31b0b5a33b2e97aaba9b

HashUnlimited pushed a commit to HashUnlimited/chaincoin that referenced this pull request Mar 10, 2019

[addrman] Improve tried table collision logging
Github-Pull: bitcoin#15486
Rebased-From: 4d83401
Tree-SHA512: c7843191c470d8b3298d4375632ddbcfd26358d21100bcbffa8d40498782f9dfef08cfb7b3c040525ef912b27ddd593322b60f8f21e16e2ae9bae66433ce807e

HashUnlimited pushed a commit to HashUnlimited/chaincoin that referenced this pull request Mar 10, 2019

[net] feeler connections can be made to outbound peers in same netgroup
Fixes a bug where feelers could be stuck trying to resolve a collision in the
tried table that is to an address in the same netgroup as an existing outbound peer.

Thanks to Muoi Tran for the original bug report and detailed debug logs to track
this down.

Github-Pull: bitcoin#15486
Rebased-From: 4991e3c
Tree-SHA512: 2752c9909d55ff63b9143168033d0d3952effba7f911181919eb62291edf822ec54fffbb20e35b5c5f8cb1092d75c496665da00139dbebe5ce402cbea3ad87c5

HashUnlimited pushed a commit to HashUnlimited/chaincoin that referenced this pull request Mar 10, 2019

[addrman] Ensure collisions eventually get resolved
After 40 minutes, time out a test-before-evict entry and just evict without
testing. Otherwise, if we were unable to test an entry for some reason, we
might break using feelers altogether.

Github-Pull: bitcoin#15486
Rebased-From: f71fdda
Tree-SHA512: 66c61a9f030b1666e98e4fe66dd6cfc5f1d03b2a1ec01567b195903db6e4412ac778f4464ee9ef35ae6faa1ab7e4b18ef7ecb9a7ced29e6494046990aebf7b76

HashUnlimited pushed a commit to HashUnlimited/chaincoin that referenced this pull request Mar 10, 2019

[addrman] Improve collision logging and address nits
Github-Pull: bitcoin#15486
Rebased-From: 20e6ea2
Tree-SHA512: 73b37a98cf4d9ede1d39c30d178d63a5a6865dba6cb7a9f33bd1e03445acb708b3007c7cde991b5de96a407262adda23279fe7a1d2ed31b0b5a33b2e97aaba9b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.