p2p: peer connection bug fixes #28248

jonatack · 2023-08-09T20:46:20Z

This pull fixes several peer connection bugs in our p2p code, along with the logging that uncovered them:

Fix detection of inbound peer connections in GetAddedNodeInfo.
Fix addnode CJDNS peers not detected in GetAddedNodeInfo, causing ThreadOpenAddedConnections to continually retry to connect to them and RPC getaddednodeinfo incorrectly showing them as not connected.
Fix ThreadOpenConnections not detecting inbound CJDNS connections and making automatic outbound connections to them.
Fix detection of already connected peers in AlreadyConnectedToAddress().
Fix detection of already connected peers when making outbound connections in ConnectNode.
Do not accept inbound connections in CreateNodeFromAcceptedSocket from I2P peers we're already connected to, as building I2P tunnels is expensive.
Fix making automatic outbound connections in ThreadOpenConnections to addnode peers, in order not to allocate our limited outbound slots to them and to ensure addnode connections benefit from their intended protections. Our addnode logic usually connects the addnode peers before the automatic outbound logic does, but not always, as a connection race can occur (see the commit message for further details and mainnet examples). When an addnode peer is connected as an automatic outbound peer and is the only connection we have to a network, it can be protected by our new outbound network-specific eviction logic and persist in the "wrong role". Fix these issues by checking if the selected address is an addnode peer in our automatic outbound connection logic.

Update the p2p logging with the improvements that allowed seeing/understanding/debugging the current behavior. Please see the commit messages for details.

Simplify MaybePickPreferredNetwork to return std::optional, make it a const class method, and add Clang thread-safety analysis annotation and related assertions.

DrahtBot · 2023-08-09T20:46:24Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	brunoerg
Stale ACK	vasild

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#28155 (net: improves addnode / m_added_nodes logic by sr-gi)
#27114 (p2p: Allow whitelisting outgoing connections by brunoerg)
#27052 (test: rpc: add last block announcement time to getpeerinfo result by LarryRuane)
#26859 (fuzz: extend ConsumeNetAddr() to return I2P and CJDNS addresses by vasild)
#25832 (tracing: network connection tracepoints by 0xB10C)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

dergoegge · 2023-08-10T10:03:44Z

The title "p2p: outbound network diversity improvements" makes it sound like you are improving on the logic for making outbound connections but really you are refactoring and adding additional logging. Would you mind making the title more accurate?

Please see the commit messages for details.

The PR description is added to the merge commit, so in my opinion it makes sense to have at least a summary of what the PR does in the description.

From our docs:

The body of the pull request should contain sufficient description of what the patch does, and even more importantly, why, with 
justification and reasoning. You should include references to any discussions (for example, other issues or mailing list
discussions).

jonatack · 2023-08-10T21:52:33Z

@dergoegge updated

ajtowns · 2023-08-11T05:51:05Z

This mostly seems like pointless refactoring (changing a switch to an || of two functions that themselves have switches). Doesn't seem worth the review effort to me.

jonatack · 2023-08-14T04:12:30Z

Updated to propose fixes for the issues observed with the improved logging.

src/net.cpp

jonatack · 2023-08-15T20:16:18Z

Updated for @vasild's review feedback to change m_added_nodes from a vector to an unordered set (3f4a1f0), which simplifies and speeds up the AddNode() and RemoveAddedNode() methods along with the commit that follows it, p2p: do not make automatic outbound connections to addnode peers (048e1af).

src/net.cpp

maflcko · 2023-11-01T19:56:31Z

Maybe mark as draft while CI is red?

jonatack · 2023-11-01T20:07:31Z

Done. Note that only the latest push is red. I can push a version that is green, but have been rebasing on #28155 that I reckon will be merged soon, and adding missing test coverage. Concept ACKs on fixing the bugs would be encouraging 😃

jonatack · 2023-11-01T20:15:25Z

Note also, that item 7 in the PR description involves a regression in v26.x.

fanquake · 2023-11-01T20:48:37Z

Note also, that item 7 in the PR description involves a regression in v26.x.

What is the regression? Which change caused it?

mzumsande · 2023-11-01T21:18:38Z

Concept ACKs on fixing the bugs would be encouraging 😃

I think that some of the commits are straightforward (e.g. the missing MaybeFlipIPv6toCJDNS calls or the logging improvements), whereas others are fixing some non-ideal behavior in some cases, but have the risk of introducing new non-ideal behavior in other cases because the fix comes with downsides of its own.

For example, if it isn't possible to distinguish situations in which two connections involving the same IP are controlled by the same entity or different entities, any solution would be imperfect, either allowing duplicate connections or preventing legitimate behavior (that is possibly widely used in local setups). So it's unclear to me if the status quo on these is actually a bug or an intended best-effort solution / the result of a compromise.

Maybe some of the straightforward bugfixes could be split out into another PR?

jonatack · 2023-11-01T21:20:02Z

Reported in https://github.com/bitcoin/bitcoin/pull/27213/files#r1291926369 and described in 21f8426, there is an interaction between the new network-specific automatic outbound logic and the addnode list: the former doesn't account for the latter.

Notable changes
===============

P2P and network changes
-----------------------

- Nodes with multiple reachable networks will actively try to have at least one
  outbound connection to each network. This improves individual resistance to
  eclipse attacks and network level resistance to partition attacks. Users no
  longer need to perform active measures to ensure being connected to multiple
  enabled networks. (#27213)

Node operators running over multiple networks may be likely to use -addnode to ensure connection to each of them (I do). They would be the ones most likely to be affected by the change with their addnode peers being allotted to rare outbound slots instead. It makes sense to exclude addnode peers from the automatic outbound logic, either in general or for the new network-specific one.

jonatack · 2023-11-01T21:25:26Z

Maybe some of the straightforward bugfixes could be split out into another PR?

Yes -- looking at doing that, or dropping here for now any that aren't relatively clear.

src/net.cpp

DrahtBot · 2023-11-08T13:21:48Z

🐙 This pull request conflicts with the target branch and needs rebase.

mzumsande · 2023-11-15T15:02:52Z

Note also, that item 7 in the PR description involves a regression in v26.x.

I don't think that this can be called a regression:

There is nothing new about the root problem (making automatic connections to addnode peers in rare cases) in v26.
It can happen that such a peer could now be protected as the only one of its network, but to determine if it's a regression this needs to be compared with the status before.
Before we did protection-by-network, we would only ever get into the situation with an extra-outbound peer if we had a stale tip (30 minutes). Unless we are eclipsed or something weird is going on, this happens naturally a few times per day on average.
If we are in that situation, it is not very probable that we'd evict the outbound peer that we'd rather have as an addnode peer. First of all, we'd usually just evict the new extra-peer (unless it provides a new block, which should happen very rarely, especially when we have other high-bandwidth OB peers at the same time). But even in the rare cases, we often would not evict the specific peer to solve the addnode problem, that peer might be one of the 4 protected peers, or we might just pick another one.

I'd guess you could probably have waited for months to actually evict a mis-placed peer that way. So I'd say that the stale-tip eviction was never a functioning mechanism to resolve these kind of addnode problems (and never meant to be), and as a result, #27213 was not a regression - in practice, it might arguably be even an improvement instead of a regression, because in those cases where the peer in question is not the only one from its network, we'll get into the situation where we have an extra peer to evict more frequently than before.

jonatack · 2023-11-15T15:15:17Z

@mzumsande The improved logging allows distinguishing when this is due to the new protection logic, and with occasional exceptions, it is.

mzumsande · 2023-11-15T15:31:00Z

@mzumsande The improved logging allows distinguishing when this is due to the new protection logic, and with occasional exceptions, it is.

That's not my point. My point (see above) is that in earlier versions, the extra-outbound eviction (which would only happen in the stale-tip case then) was not a working or reliable mechanism to evict these peers, so there is no regression. Or did I miss something in my above explanation / do you have logs that pre-26 the stale-tip outbound eviction could somehow resolve this problem reliably?

jonatack · 2023-11-15T15:42:06Z

Empirically, the regression is that we're actively adding addnode peers as outbound ones, and keeping them there. Since a couple of years I run a node with all three of our privacy networks and addnode entries for each, along with clearnet, and I didn't see my addnode peers being connected to as non-manual outbounds. This is new behavior for my node.

Aside, I'm adding unit tests for each change right now, and these are finding additional issues.

mzumsande · 2023-11-15T16:26:13Z

Empirically, the regression is that we're actively adding addnode peers as outbound ones, and keeping them there.

Oh, I thought you were talking about a race at startup - since the extra outbound should only kick in after 5 minutes, I think that should be enough to make connections to the addnode peers.

I think what might be happening later is that an addnode peer disconnects us, and if it was the only one from its network then there could be a race between picking another network-related outbound peer (which could be the peer that just disconnected us if addrman would select it) and reconnecting to the peer that evicted us with addnode thread. Could that be what you are seeing?

jonatack · 2023-11-15T17:30:11Z

@mzumsande I think so, yes, along with the other races mentioned in the commit message 21f8426. I'll add the case you describe to it (thanks!)

jonatack · 2023-11-15T17:33:41Z

I'd be interested to hear anyone's thoughts on this, from that commit message:

Finally, there does not seem to be a reason to make block-relay or short-lived feeler connections to addnode peers, as the addnode logic will ensure we connect to them if they are up, within the addnode connection limit.

This may not be optimal if one were to add many addnode entries, i.e. more than 8 that are online. I don't know if people do that. Maybe it would be prudent to allow feelers to addnodes.

mzumsande · 2023-11-15T19:25:20Z

Maybe it would be prudent to allow feelers to addnodes.

I think it's fine to skip feelers for addnode peers. When we would skip these addresses for automatic connection, I don't see what making a feeler connection to them would achieve, considering that the point of feelers is to have better options for future automatic connections.

DrahtBot · 2024-02-13T01:16:38Z

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

jonatack · 2024-04-08T16:19:52Z

Will continue work on these bugfixes. Closing to be able to re-open it.

d0b0474 test: add GetAddedNodeInfo() CJDNS regression unit test (Jon Atack) 684da97 p2p, bugfix: detect addnode cjdns peers in GetAddedNodeInfo() (Jon Atack) Pull request description: Addnode peers connected to us via the cjdns network are currently not detected by `CConnman::GetAddedNodeInfo()`, i.e. `fConnected` is always false. This causes the following issues: - RPC `getaddednodeinfo` incorrectly shows them as not connected - `CConnman::ThreadOpenAddedConnections()` continually retries to connect them Fix the issue and add a unit regression test. Extracted from #28248. Suggest running the test with: `./src/test/test_bitcoin -t net_peer_connection_tests -l test_suite` ACKs for top commit: mzumsande: utACK d0b0474 brunoerg: crACK d0b0474 pinheadmz: ACK d0b0474 Tree-SHA512: a4d81425f79558f5792585611f3fe8ab999b82144daeed5c3ec619861c69add934c2b2afdad24c8488a0ade94f5ce8112f5555d60a1ce913d4f5a1cf5dbba55a

DrahtBot added the P2P label Aug 9, 2023

jonatack marked this pull request as ready for review August 10, 2023 00:56

jonatack changed the title ~~p2p: outbound network diversity improvements~~ p2p: outbound network diversity refactoring and logging improvements Aug 10, 2023

jonatack changed the title ~~p2p: outbound network diversity refactoring and logging improvements~~ p2p: outbound logic, logging and refactoring improvements Aug 12, 2023

jonatack force-pushed the 2023-08-network-diversity branch from c5c38a7 to 21b0895 Compare August 12, 2023 00:18

jonatack mentioned this pull request Aug 12, 2023

p2p: Diversify automatic outbound connections with respect to networks #27213

Merged

This was referenced Aug 12, 2023

Use shared_ptr for CNode inside CConnman #28222

Closed

net: improves addnode / m_added_nodes logic #28155

Merged

p2p: Allow whitelisting manual connections #27114

Merged

refactor: Continue moving application data from CNode to Peer #26621

Closed

jonatack changed the title ~~p2p: outbound logic, logging and refactoring improvements~~ p2p: bugfixes, logic and logging improvements Aug 13, 2023

jonatack force-pushed the 2023-08-network-diversity branch 2 times, most recently from 428169f to 875d7b0 Compare August 14, 2023 00:03

jonatack force-pushed the 2023-08-network-diversity branch from 875d7b0 to 4b9774d Compare August 14, 2023 04:18

vasild reviewed Aug 14, 2023

View reviewed changes

src/net.cpp Outdated Show resolved Hide resolved

DrahtBot mentioned this pull request Aug 15, 2023

test: rpc: add last block announcement time to getpeerinfo result #27052

Open

jonatack force-pushed the 2023-08-network-diversity branch from 4b9774d to a411cb5 Compare August 15, 2023 17:27

DrahtBot added the CI failed label Aug 15, 2023

DrahtBot mentioned this pull request Aug 16, 2023

Handle CJDNS from LookupSubNet() #27071

Merged

vasild reviewed Aug 16, 2023

View reviewed changes

src/net.cpp Outdated Show resolved Hide resolved

src/net.cpp Outdated Show resolved Hide resolved

src/net.cpp Outdated Show resolved Hide resolved

src/net.cpp Outdated Show resolved Hide resolved

DrahtBot mentioned this pull request Aug 24, 2023

BIP324 integration #28331

Merged

jonatack force-pushed the 2023-08-network-diversity branch 2 times, most recently from abc2507 to 7019c23 Compare September 6, 2023 06:20

DrahtBot mentioned this pull request Sep 6, 2023

tracing: network connection tracepoints #25832

Open

jonatack mentioned this pull request Oct 29, 2023

Test calling addnode more than once (and MAX_ADDNODE_CONNECTIONS) #28635

Open

DrahtBot mentioned this pull request Oct 31, 2023

fuzz: extend ConsumeNetAddr() to return I2P and CJDNS addresses #26859

Merged

jonatack mentioned this pull request Oct 31, 2023

Evicting and filling attack for linking multiple network addresses #28760

Open

jonatack marked this pull request as draft November 1, 2023 20:07

dergoegge reviewed Nov 5, 2023

View reviewed changes

src/net.cpp Show resolved Hide resolved

DrahtBot added the Needs rebase label Nov 8, 2023

jonatack mentioned this pull request Nov 14, 2023

[26.x] Changes for rc3 #28872

Merged

jonatack mentioned this pull request Nov 16, 2023

p2p: do not make automatic outbound connections to addnode peers #28895

Merged

jonatack closed this Apr 8, 2024

jonatack mentioned this pull request May 11, 2024

p2p: detect addnode cjdns peers in GetAddedNodeInfo() #30085

Merged

bitcoin deleted a comment from Waseem650 May 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

p2p: peer connection bug fixes #28248

p2p: peer connection bug fixes #28248

jonatack commented Aug 9, 2023 •

edited

Loading

DrahtBot commented Aug 9, 2023 •

edited

Loading

dergoegge commented Aug 10, 2023

jonatack commented Aug 10, 2023

ajtowns commented Aug 11, 2023

jonatack commented Aug 14, 2023

jonatack commented Aug 15, 2023

maflcko commented Nov 1, 2023

jonatack commented Nov 1, 2023

jonatack commented Nov 1, 2023

fanquake commented Nov 1, 2023

mzumsande commented Nov 1, 2023 •

edited

Loading

jonatack commented Nov 1, 2023

jonatack commented Nov 1, 2023

DrahtBot commented Nov 8, 2023

mzumsande commented Nov 15, 2023 •

edited

Loading

jonatack commented Nov 15, 2023

mzumsande commented Nov 15, 2023

jonatack commented Nov 15, 2023

mzumsande commented Nov 15, 2023 •

edited

Loading

jonatack commented Nov 15, 2023

jonatack commented Nov 15, 2023 •

edited

Loading

mzumsande commented Nov 15, 2023

DrahtBot commented Feb 13, 2024

jonatack commented Apr 8, 2024

p2p: peer connection bug fixes #28248

p2p: peer connection bug fixes #28248

Conversation

jonatack commented Aug 9, 2023 • edited Loading

DrahtBot commented Aug 9, 2023 • edited Loading

Code Coverage

Reviews

Conflicts

dergoegge commented Aug 10, 2023

jonatack commented Aug 10, 2023

ajtowns commented Aug 11, 2023

jonatack commented Aug 14, 2023

jonatack commented Aug 15, 2023

maflcko commented Nov 1, 2023

jonatack commented Nov 1, 2023

jonatack commented Nov 1, 2023

fanquake commented Nov 1, 2023

mzumsande commented Nov 1, 2023 • edited Loading

jonatack commented Nov 1, 2023

jonatack commented Nov 1, 2023

DrahtBot commented Nov 8, 2023

mzumsande commented Nov 15, 2023 • edited Loading

jonatack commented Nov 15, 2023

mzumsande commented Nov 15, 2023

jonatack commented Nov 15, 2023

mzumsande commented Nov 15, 2023 • edited Loading

jonatack commented Nov 15, 2023

jonatack commented Nov 15, 2023 • edited Loading

mzumsande commented Nov 15, 2023

DrahtBot commented Feb 13, 2024

jonatack commented Apr 8, 2024

jonatack commented Aug 9, 2023 •

edited

Loading

DrahtBot commented Aug 9, 2023 •

edited

Loading

mzumsande commented Nov 1, 2023 •

edited

Loading

mzumsande commented Nov 15, 2023 •

edited

Loading

mzumsande commented Nov 15, 2023 •

edited

Loading

jonatack commented Nov 15, 2023 •

edited

Loading