Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[addrman, net] Ensure tried collisions resolve, and allow feeler connections to existing outbound netgroups #15486

Merged
merged 4 commits into from Mar 9, 2019

Conversation

@sdaftuar
Copy link
Member

@sdaftuar sdaftuar commented Feb 26, 2019

The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.

This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (m_tried_collisions) would never be drained.

Also, ensure that all entries don't linger in m_tried_collisions by evicting an old entry if its collisions is unresolved after 40 minutes.

@sdaftuar
Copy link
Member Author

@sdaftuar sdaftuar commented Feb 26, 2019

I think this should resolve #15484, but it's not clear to me if this is entirely sufficient to protect against the class of error, or if we should add additional logic to ResolveCollisions() to ensure that collisions get eventually resolved.

@DrahtBot
Copy link
Contributor

@DrahtBot DrahtBot commented Feb 26, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

@sdaftuar
Copy link
Member Author

@sdaftuar sdaftuar commented Feb 26, 2019

Updated with a commit that tries to fix the underlying problem in addrman, by doing eviction anyway after 20 minutes (to prevent m_tried_collisions from never being drained).

If it looks good, I'll update the OP of this PR before merge.

@fanquake fanquake added the P2P label Feb 26, 2019
@gmaxwell
Copy link
Contributor

@gmaxwell gmaxwell commented Feb 26, 2019

This bugfix should probably get tagged for 0.18.

@fanquake fanquake added this to the 0.18.0 milestone Feb 26, 2019
Fixes a bug where feelers could be stuck trying to resolve a collision in the
tried table that is to an address in the same netgroup as an existing outbound peer.

Thanks to Muoi Tran for the original bug report and detailed debug logs to track
this down.
@sdaftuar sdaftuar force-pushed the 2019-02-addrman-collisions branch from 2410d33 to 1b0621b Feb 27, 2019
src/addrman.h Outdated Show resolved Hide resolved
After 40 minutes, time out a test-before-evict entry and just evict without
testing. Otherwise, if we were unable to test an entry for some reason, we
might break using feelers altogether.
@sdaftuar sdaftuar force-pushed the 2019-02-addrman-collisions branch from 1b0621b to f71fdda Feb 27, 2019
@sipa
Copy link
Member

@sipa sipa commented Feb 28, 2019

utACK f71fdda

src/addrman.cpp Outdated Show resolved Hide resolved
@Sjors
Copy link
Member

@Sjors Sjors commented Mar 1, 2019

Useful background info for context:

Sjors
Sjors approved these changes Mar 1, 2019
Copy link
Member

@Sjors Sjors left a comment

utACK f71fdda, but would be nice to have a test that demonstrates the original bug.

As @sipa suggested a few years ago, let's put a link to that paper in a comment somewhere.

src/addrman.cpp Show resolved Hide resolved
src/addrman.cpp Outdated Show resolved Hide resolved
src/net.cpp Outdated Show resolved Hide resolved
src/net.cpp Show resolved Hide resolved
@sdaftuar
Copy link
Member Author

@sdaftuar sdaftuar commented Mar 1, 2019

Addressed nits.

@sdaftuar sdaftuar changed the title [net] Allow feeler connections to go to the same netgroup as existing outbound peers [addrman, net] Ensure tried collisions resolve, and allow feeler connections to existing outbound netgroups Mar 1, 2019
@gmaxwell
Copy link
Contributor

@gmaxwell gmaxwell commented Mar 1, 2019

utACK

@Sjors
Copy link
Member

@Sjors Sjors commented Mar 2, 2019

To clarify what I brought up inline:

bitcoin/src/addrman.cpp

Lines 240 to 253 in 20e6ea2

// Will moving this address into tried evict another entry?
if (test_before_evict && (vvTried[tried_bucket][tried_bucket_pos] != -1)) {
// Output the entry we'd be colliding with, for debugging purposes
auto colliding_entry = mapInfo.find(vvTried[tried_bucket][tried_bucket_pos]);
LogPrint(BCLog::ADDRMAN, "Collision inserting element into tried table (%s), moving %s to m_tried_collisions=%d\n", colliding_entry != mapInfo.end() ? colliding_entry->second.ToString() : "", addr.ToString(), m_tried_collisions.size());
if (m_tried_collisions.size() < ADDRMAN_SET_TRIED_COLLISION_SIZE) {
m_tried_collisions.insert(nId);
}
} else {
LogPrint(BCLog::ADDRMAN, "Moving %s to tried\n", addr.ToString());
// move nId to the tried tables
MakeTried(info, nId);
}

The log statement says we're moving it to m_tried_collisions, but that's not true if that's full. Hence my suggestion to move the log statement inside the if statement (line 245).

For the case m_tried_collisions is full @sdaftuar wrote:

I don't think it's quite as simple as "when in doubt, don't evict". I think in the case of a collision we bias ourselves towards not evicting, unless the peer we might evict can't be connected to via feeler connection.

My thought is that if m_tried_collisions is ever full, it seems like that is a scenario where we may be under attack; so not replacing things in our tried table (which we assume to have some good peers in it) is safe, even if not optimal. Maybe @EthanHeilman or @gmaxwell has a better intuition for how this works though.

And @EthanHeilman replied:

I agree with your intuition.

We bias toward not evicting. The assumption is that if the tried table is contains many evil IPs we have already lost and won't hear about good IPs. Thus, we assume the tried has many good IPs and that they should not be evicted unless a particular IP in tried is offline and thus no longer provides security.

But a few lines down it seems we're doing the opposite, namely bias ourselves towards evicting:

bitcoin/src/addrman.cpp

Lines 572 to 579 in 20e6ea2

} else if (GetAdjustedTime() - info_new.nLastSuccess > ADDRMAN_TEST_WINDOW) {
// If the collision hasn't resolved in some reasonable amount of time,
// just evict the old entry -- we must not be able to
// connect to it for some reason.
LogPrint(BCLog::ADDRMAN, "Unable to test; replacing %s with %s in tried table anyway\n", info_old.ToString(), info_new.ToString());
Good_(info_new, false, GetAdjustedTime());
erase_collision = true;
}

We don't know anything about this new entry, because we couldn't connect, so it seems to me it would be more consistent to keep the old entry.

It's still much better than getting stuck of course, so utACK 20e6ea2.

@gmaxwell
Copy link
Contributor

@gmaxwell gmaxwell commented Mar 2, 2019

We don't know anything about this new entry,

When we insert the new entry in tried in the first place, it's because we've actually tried it. When inserting it we may find that it causes us to want to evict an old tried entry (which has also been tried, but not as recently). What the code is supposed to do (post try before evict) is to test the old tried entry before evicting it, in order to give it a second chance to stay around iff its still connectible.

@Sjors
Copy link
Member

@Sjors Sjors commented Mar 2, 2019

Ah, I think I get it now... since we don't specifically track when things were added to m_tried_collisions, the check info_new.nLastSuccess > ADDRMAN_TEST_WINDOW is just the way we determine when the entry was added to the new table, which coincides with the time we last heard from it, i.e. when it last sent us a version message. And that's also when it's added to m_tried_collisions.

If we receive a version message:

if (strCommand == NetMsgType::VERSION) {

From an outbound node:

if (!pfrom->fInbound)

That's when we call MarkAddressGood in net.cpp which in turn calls CAddrMan::Good_, which adds it to m_tried_collisions.

@sdaftuar
Copy link
Member Author

@sdaftuar sdaftuar commented Mar 2, 2019

Ah, I think I get it now... since we don't specifically track when things were added to m_tried_collisions, the check info_new.nLastSuccess > ADDRMAN_TEST_WINDOW is just the way we determine when the entry was added to the new table

That's not necessarily when the entry was added to the new table -- we can do that when learning about an address, for instance when relayed to us via an addr message from another peer. But otherwise I think your description is correct (in particular, that should be the time that we attempted to put it in the tried table but discovered the collision).

@naumenkogs
Copy link
Member

@naumenkogs naumenkogs commented Mar 4, 2019

utACK

@sdaftuar
Copy link
Member Author

@sdaftuar sdaftuar commented Mar 6, 2019

I think this is ready to go -- anything left that I missed?

@laanwj
Copy link
Member

@laanwj laanwj commented Mar 9, 2019

utACK 20e6ea2

@laanwj laanwj merged commit 20e6ea2 into bitcoin:master Mar 9, 2019
2 checks passed
laanwj added a commit that referenced this issue Mar 9, 2019
…ow feeler connections to existing outbound netgroups

20e6ea2 [addrman] Improve collision logging and address nits (Suhas Daftuar)
f71fdda [addrman] Ensure collisions eventually get resolved (Suhas Daftuar)
4991e3c [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar)
4d83401 [addrman] Improve tried table collision logging (Suhas Daftuar)

Pull request description:

  The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.

  This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained.

  Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes.

Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6
fanquake added a commit to fanquake/bitcoin that referenced this issue Mar 9, 2019
fanquake added a commit to fanquake/bitcoin that referenced this issue Mar 9, 2019
Fixes a bug where feelers could be stuck trying to resolve a collision in the
tried table that is to an address in the same netgroup as an existing outbound peer.

Thanks to Muoi Tran for the original bug report and detailed debug logs to track
this down.

Github-Pull: bitcoin#15486
Rebased-From: 4991e3c
fanquake added a commit to fanquake/bitcoin that referenced this issue Mar 9, 2019
After 40 minutes, time out a test-before-evict entry and just evict without
testing. Otherwise, if we were unable to test an entry for some reason, we
might break using feelers altogether.

Github-Pull: bitcoin#15486
Rebased-From: f71fdda
fanquake added a commit to fanquake/bitcoin that referenced this issue Mar 9, 2019
@fanquake
Copy link
Member

@fanquake fanquake commented Mar 9, 2019

Backported in #15563.

Done by @laanwj in f810f14...b80dedb.

laanwj added a commit that referenced this issue Mar 9, 2019
Github-Pull: #15486
Rebased-From: 4d83401
Tree-SHA512: c7843191c470d8b3298d4375632ddbcfd26358d21100bcbffa8d40498782f9dfef08cfb7b3c040525ef912b27ddd593322b60f8f21e16e2ae9bae66433ce807e
laanwj added a commit that referenced this issue Mar 9, 2019
Fixes a bug where feelers could be stuck trying to resolve a collision in the
tried table that is to an address in the same netgroup as an existing outbound peer.

Thanks to Muoi Tran for the original bug report and detailed debug logs to track
this down.

Github-Pull: #15486
Rebased-From: 4991e3c
Tree-SHA512: 2752c9909d55ff63b9143168033d0d3952effba7f911181919eb62291edf822ec54fffbb20e35b5c5f8cb1092d75c496665da00139dbebe5ce402cbea3ad87c5
laanwj added a commit that referenced this issue Mar 9, 2019
After 40 minutes, time out a test-before-evict entry and just evict without
testing. Otherwise, if we were unable to test an entry for some reason, we
might break using feelers altogether.

Github-Pull: #15486
Rebased-From: f71fdda
Tree-SHA512: 66c61a9f030b1666e98e4fe66dd6cfc5f1d03b2a1ec01567b195903db6e4412ac778f4464ee9ef35ae6faa1ab7e4b18ef7ecb9a7ced29e6494046990aebf7b76
laanwj added a commit that referenced this issue Mar 9, 2019
Github-Pull: #15486
Rebased-From: 20e6ea2
Tree-SHA512: 73b37a98cf4d9ede1d39c30d178d63a5a6865dba6cb7a9f33bd1e03445acb708b3007c7cde991b5de96a407262adda23279fe7a1d2ed31b0b5a33b2e97aaba9b
HashUnlimited pushed a commit to HashUnlimited/chaincoin that referenced this issue Mar 10, 2019
Github-Pull: bitcoin#15486
Rebased-From: 4d83401
Tree-SHA512: c7843191c470d8b3298d4375632ddbcfd26358d21100bcbffa8d40498782f9dfef08cfb7b3c040525ef912b27ddd593322b60f8f21e16e2ae9bae66433ce807e
HashUnlimited pushed a commit to HashUnlimited/chaincoin that referenced this issue Mar 10, 2019
Fixes a bug where feelers could be stuck trying to resolve a collision in the
tried table that is to an address in the same netgroup as an existing outbound peer.

Thanks to Muoi Tran for the original bug report and detailed debug logs to track
this down.

Github-Pull: bitcoin#15486
Rebased-From: 4991e3c
Tree-SHA512: 2752c9909d55ff63b9143168033d0d3952effba7f911181919eb62291edf822ec54fffbb20e35b5c5f8cb1092d75c496665da00139dbebe5ce402cbea3ad87c5
HashUnlimited pushed a commit to HashUnlimited/chaincoin that referenced this issue Mar 10, 2019
After 40 minutes, time out a test-before-evict entry and just evict without
testing. Otherwise, if we were unable to test an entry for some reason, we
might break using feelers altogether.

Github-Pull: bitcoin#15486
Rebased-From: f71fdda
Tree-SHA512: 66c61a9f030b1666e98e4fe66dd6cfc5f1d03b2a1ec01567b195903db6e4412ac778f4464ee9ef35ae6faa1ab7e4b18ef7ecb9a7ced29e6494046990aebf7b76
HashUnlimited pushed a commit to HashUnlimited/chaincoin that referenced this issue Mar 10, 2019
Github-Pull: bitcoin#15486
Rebased-From: 20e6ea2
Tree-SHA512: 73b37a98cf4d9ede1d39c30d178d63a5a6865dba6cb7a9f33bd1e03445acb708b3007c7cde991b5de96a407262adda23279fe7a1d2ed31b0b5a33b2e97aaba9b
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this issue Jun 18, 2020
…ow feeler connections to existing outbound netgroups

Summary:
20e6ea259b222b10f066f22695a5f56c52071f63 [addrman] Improve collision logging and address nits (Suhas Daftuar)
f71fdda3bc2e7acd2a8b74e882364866b8b0f55b [addrman] Ensure collisions eventually get resolved (Suhas Daftuar)
4991e3c813c9848d3b3957ea3ad433f02fca9e81 [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar)
4d834018e368c3481a5421891395f64aa9002185 [addrman] Improve tried table collision logging (Suhas Daftuar)

Pull request description:

  The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.

  This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained.

  Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes.

Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6

Backport of Core [[bitcoin/bitcoin#15486 | PR15486]]

This PR was in response to bitcoin/bitcoin#15484, but I did not find the issue mentioned [[ bitcoin/bitcoin#15484 (comment) | here ]] in my debug log while running master.

Test Plan:
  ninja
  ninja check-all

Reviewers: O1 Bitcoin ABC, #bitcoin_abc, jasonbcox, deadalnix

Reviewed By: O1 Bitcoin ABC, #bitcoin_abc, jasonbcox, deadalnix

Subscribers: jasonbcox

Differential Revision: https://reviews.bitcoinabc.org/D6489
ftrader added a commit to bitcoin-cash-node/bitcoin-cash-node that referenced this issue Aug 17, 2020
…ow feeler connections to existing outbound netgroups

Summary:
20e6ea259b222b10f066f22695a5f56c52071f63 [addrman] Improve collision logging and address nits (Suhas Daftuar)
f71fdda3bc2e7acd2a8b74e882364866b8b0f55b [addrman] Ensure collisions eventually get resolved (Suhas Daftuar)
4991e3c813c9848d3b3957ea3ad433f02fca9e81 [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar)
4d834018e368c3481a5421891395f64aa9002185 [addrman] Improve tried table collision logging (Suhas Daftuar)

Pull request description:

  The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.

  This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained.

  Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes.

Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6

Backport of Core [[bitcoin/bitcoin#15486 | PR15486]]

This PR was in response to bitcoin/bitcoin#15484, but I did not find the issue mentioned [[ bitcoin/bitcoin#15484 (comment) | here ]] in my debug log while running master.

Test Plan:
  ninja
  ninja check-all

Reviewers: O1 Bitcoin ABC, #bitcoin_abc, jasonbcox, deadalnix

Reviewed By: O1 Bitcoin ABC, #bitcoin_abc, jasonbcox, deadalnix

Subscribers: jasonbcox

Differential Revision: https://reviews.bitcoinabc.org/D6489
random-zebra added a commit to PIVX-Project/PIVX that referenced this issue Apr 8, 2021
…eler connections to existing outbound netgroups

ccb6a4a [addrman] Improve collision logging and address nits (Suhas Daftuar)
aa3c805 [addrman] Ensure collisions eventually get resolved (Suhas Daftuar)
efd1d6a [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar)
afcffc4 [addrman] Improve tried table collision logging (Suhas Daftuar)

Pull request description:

  Two pretty direct back ports for the address manager.
  Coming from bitcoin#13115 and bitcoin#15486.

ACKs for top commit:
  random-zebra:
    rebase utACK ccb6a4a, and merging...

Tree-SHA512: 89d42b3164f92fcee69bc981c7e0887cab09820088732297dbcadbf4ccfc3257e9564ef2b394205f3153c8a03e4c0a0e8145b1c291e8cd980dc06b0ffa65a28c
PastaPastaPasta added a commit to PastaPastaPasta/dash that referenced this issue Jun 25, 2021
…and allow feeler connections to existing outbound netgroups

20e6ea2 [addrman] Improve collision logging and address nits (Suhas Daftuar)
f71fdda [addrman] Ensure collisions eventually get resolved (Suhas Daftuar)
4991e3c [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar)
4d83401 [addrman] Improve tried table collision logging (Suhas Daftuar)

Pull request description:

  The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.

  This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained.

  Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes.

Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6
PastaPastaPasta added a commit to PastaPastaPasta/dash that referenced this issue Jun 25, 2021
…and allow feeler connections to existing outbound netgroups

20e6ea2 [addrman] Improve collision logging and address nits (Suhas Daftuar)
f71fdda [addrman] Ensure collisions eventually get resolved (Suhas Daftuar)
4991e3c [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar)
4d83401 [addrman] Improve tried table collision logging (Suhas Daftuar)

Pull request description:

  The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.

  This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained.

  Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes.

Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6

# Conflicts:
#	src/addrman.cpp
#	src/net.cpp
PastaPastaPasta added a commit to PastaPastaPasta/dash that referenced this issue Jun 25, 2021
…and allow feeler connections to existing outbound netgroups

20e6ea2 [addrman] Improve collision logging and address nits (Suhas Daftuar)
f71fdda [addrman] Ensure collisions eventually get resolved (Suhas Daftuar)
4991e3c [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar)
4d83401 [addrman] Improve tried table collision logging (Suhas Daftuar)

Pull request description:

  The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.

  This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained.

  Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes.

Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6
PastaPastaPasta added a commit to PastaPastaPasta/dash that referenced this issue Jun 26, 2021
…and allow feeler connections to existing outbound netgroups

20e6ea2 [addrman] Improve collision logging and address nits (Suhas Daftuar)
f71fdda [addrman] Ensure collisions eventually get resolved (Suhas Daftuar)
4991e3c [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar)
4d83401 [addrman] Improve tried table collision logging (Suhas Daftuar)

Pull request description:

  The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.

  This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained.

  Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes.

Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

9 participants