p2p: peer connection bug fixes #28248
The title "p2p: outbound network diversity improvements" makes it sound like you are improving on the logic for making outbound connections but really you are refactoring and adding additional logging. Would you mind making the title more accurate?
The PR description is added to the merge commit, so in my opinion it makes sense to have at least a summary of what the PR does in the description. From our docs:
@dergoegge updated
This mostly seems like pointless refactoring (changing a
Force-pushed from c5c38a7 to 21b0895.
Force-pushed from 428169f to 875d7b0.
Updated to propose fixes for the issues observed with the improved logging.
Force-pushed from 875d7b0 to 4b9774d.
Force-pushed from 4b9774d to a411cb5.
Force-pushed from abc2507 to 7019c23.
Maybe mark as draft while CI is red?
Done. Note that only the latest push is red. I can push a version that is green, but I have been rebasing on #28155, which I reckon will be merged soon, and adding missing test coverage. Concept ACKs on fixing the bugs would be encouraging 😃
Note also that item 7 in the PR description involves a regression in v26.x.
What is the regression? Which change caused it?
I think that some of the commits are straightforward (e.g. the missing For example, if it isn't possible to distinguish situations in which two connections involving the same IP are controlled by the same entity or by different entities, any solution would be imperfect, either allowing duplicate connections or preventing legitimate behavior (which is possibly widely used in local setups). So it's unclear to me whether the status quo on these is actually a bug or an intended best-effort solution / the result of a compromise. Maybe some of the straightforward bugfixes could be split out into another PR?
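To make the trade-off above concrete, here is a minimal sketch under stated assumptions: `PeerAddr`, `AlreadyConnectedByHost` and `AlreadyConnectedByHostAndPort` are hypothetical names, not Bitcoin Core's `AlreadyConnectedToAddress()`. It only contrasts matching on the IP alone with matching on IP and port.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a peer address: host plus port.
// (Bitcoin Core's CService/CNetAddr are much richer; this models only the comparison.)
struct PeerAddr {
    std::string host;
    uint16_t port;
};

// Policy A: any existing connection to the same host counts as a duplicate.
// Prevents one node from taking several slots via different ports, but also
// blocks distinct nodes that share one IP (e.g. several nodes in a local setup).
bool AlreadyConnectedByHost(const std::vector<PeerAddr>& connected, const PeerAddr& candidate)
{
    return std::any_of(connected.begin(), connected.end(),
                       [&](const PeerAddr& p) { return p.host == candidate.host; });
}

// Policy B: only an exact host:port match counts as a duplicate.
// Allows distinct nodes behind one IP, but lets one entity occupy several slots.
bool AlreadyConnectedByHostAndPort(const std::vector<PeerAddr>& connected, const PeerAddr& candidate)
{
    return std::any_of(connected.begin(), connected.end(), [&](const PeerAddr& p) {
        return p.host == candidate.host && p.port == candidate.port;
    });
}
```

With two local nodes on 127.0.0.1 listening on different ports, the host-only policy treats the second one as a duplicate while the host-and-port policy allows it, which is the either/or outcome described in the comment.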
As reported in https://github.com/bitcoin/bitcoin/pull/27213/files#r1291926369 and described in 21f8426, there is an interaction between the new network-specific automatic outbound logic and the addnode list: the former doesn't account for the latter. From the release notes:

Notable changes
===============

P2P and network changes
-----------------------

- Nodes with multiple reachable networks will actively try to have at least one
outbound connection to each network. This improves individual resistance to
eclipse attacks and network level resistance to partition attacks. Users no
longer need to perform active measures to ensure being connected to multiple
enabled networks. (#27213)

Node operators running over multiple networks are likely to use -addnode to ensure a connection to each of them (I do). They would be the ones most likely to be affected by the change, with their addnode peers being allotted to rare outbound slots instead. It makes sense to exclude addnode peers from the automatic outbound logic, either in general or for the new network-specific one.
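A minimal sketch of the exclusion proposed above, assuming addnode entries and candidate addresses have already been normalized to comparable `host:port` strings (real address matching is more involved). `AddedNode`, `IsAddedNode` and `SelectAutomaticOutbound` are hypothetical names, not Bitcoin Core functions.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical addnode entry; host_port is assumed to be already normalized,
// e.g. "203.0.113.5:8333".
struct AddedNode {
    std::string host_port;
};

// True if the candidate address is on the addnode list.
bool IsAddedNode(const std::vector<AddedNode>& added_nodes, const std::string& candidate)
{
    for (const auto& node : added_nodes) {
        if (node.host_port == candidate) return true;
    }
    return false;
}

// Sketch of the automatic outbound selection: skip any candidate that is also
// an addnode entry, so the addnode logic (with its own slots and protections)
// stays responsible for it. Returns the first acceptable candidate, if any.
std::optional<std::string> SelectAutomaticOutbound(const std::vector<std::string>& candidates,
                                                   const std::vector<AddedNode>& added_nodes)
{
    for (const auto& candidate : candidates) {
        if (IsAddedNode(added_nodes, candidate)) continue; // leave it to the addnode logic
        return candidate;
    }
    return std::nullopt;
}
```

Whether the check applies to all automatic outbound connections or only to the network-specific ones is the design choice raised in the comment; the sketch applies it to every candidate.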
Yes -- looking at doing that, or for now dropping from this PR any fixes that aren't relatively clear.
🐙 This pull request conflicts with the target branch and needs rebase.
I don't think that this can be called a regression:
I'd guess you could probably have waited for months to actually evict a misplaced peer that way. So I'd say that the stale-tip eviction was never a functioning mechanism to resolve this kind of addnode problem (and was never meant to be), and as a result, #27213 was not a regression. In practice, it might arguably even be an improvement, because in those cases where the peer in question is not the only one from its network, we will more frequently get into the situation where we have an extra peer to evict.
@mzumsande The improved logging allows distinguishing when this is due to the new protection logic, and with occasional exceptions, it is.
That's not my point. My point (see above) is that in earlier versions, the extra-outbound eviction (which would only happen in the stale-tip case then) was not a working or reliable mechanism to evict these peers, so there is no regression. Or did I miss something in my above explanation / do you have logs showing that, pre-26, the stale-tip outbound eviction could somehow resolve this problem reliably?
Empirically, the regression is that we're actively adding addnode peers as outbound ones, and keeping them there. For a couple of years I have run a node with all three of our privacy networks and addnode entries for each, along with clearnet, and I didn't see my addnode peers being connected to as non-manual outbounds. This is new behavior for my node. As an aside, I'm adding unit tests for each change right now, and these are finding additional issues.
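As a rough model of why a misplaced peer can be kept, the sketch below assumes an eviction rule that picks the automatic outbound peer that least recently provided a block, but never one that is our only outbound peer on its network. `OutboundPeer`, `PickOutboundToEvict` and this `Network` enum are hypothetical simplifications, not the actual eviction code in Bitcoin Core, which weighs other criteria too.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <vector>

// Illustrative stand-in for the networks a node may reach.
enum class Network { IPV4, IPV6, ONION, I2P, CJDNS };

// Hypothetical, simplified model of an automatic outbound peer.
struct OutboundPeer {
    int id;
    Network net;
    long long last_block_time; // when the peer last gave us a new block (smaller = older)
};

// Sketch of network-aware eviction: choose the peer with the oldest
// last_block_time, skipping any peer that is the sole outbound on its network.
// A peer in the "wrong role" that is alone on its network is never chosen.
std::optional<int> PickOutboundToEvict(const std::vector<OutboundPeer>& peers)
{
    std::map<Network, int> per_network;
    for (const auto& p : peers) ++per_network[p.net];

    std::optional<int> worst;
    long long worst_time = 0;
    for (const auto& p : peers) {
        if (per_network[p.net] <= 1) continue; // protected: only peer on this network
        if (!worst || p.last_block_time < worst_time) {
            worst = p.id;
            worst_time = p.last_block_time;
        }
    }
    return worst;
}
```

In this model, an I2P peer with the oldest block time is still never evicted because it is alone on its network; if that peer were an addnode entry connected as an automatic outbound, it would stay in that role.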
Oh, I thought you were talking about a race at startup. Since the extra outbound should only kick in after 5 minutes, I think that should be enough to make connections to the addnode peers. I think what might be happening later is that an addnode peer disconnects us, and if it was the only one from its network, there could be a race between picking another network-related outbound peer (which could be the peer that just disconnected us, if addrman selects it) and the addnode thread reconnecting to the peer that evicted us. Could that be what you are seeing?
@mzumsande I think so, yes, along with the other races mentioned in the commit message 21f8426. I'll add the case you describe to it (thanks!)
I'd be interested to hear anyone's thoughts on this passage from that commit message:

Finally, there does not seem to be a reason to make block-relay or short-lived feeler connections to addnode peers, as the addnode logic will ensure we connect to them if they are up, within the addnode connection limit.

This may not be optimal if one were to add many addnode entries, i.e. more than 8 that are online. I don't know if people do that. Maybe it would be prudent to allow feelers to addnodes.
I think it's fine to skip feelers for addnode peers. When we would skip these addresses for automatic connections, I don't see what making a feeler connection to them would achieve, considering that the point of feelers is to have better options for future automatic connections.
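The rule being discussed can be sketched as a single predicate, assuming a hypothetical `ConnType` enum whose names are modeled on, but not taken from, Bitcoin Core's connection types: automatic connection types, feelers included, skip addnode addresses, and only the manual (addnode) path connects to them.

```cpp
#include <cassert>

// Illustrative connection types (hypothetical enum for this sketch).
enum class ConnType { OUTBOUND_FULL_RELAY, BLOCK_RELAY, FEELER, MANUAL };

// Returns whether a connection of the given type may be opened to an address.
// Addresses on the addnode list are reserved for the manual (addnode) path,
// since the addnode logic already connects to them when they are online.
bool MayConnectToAddress(ConnType type, bool is_addnode_entry)
{
    if (!is_addnode_entry) return true;
    return type == ConnType::MANUAL;
}
```

Allowing feelers to addnode entries, as floated in the comment, would amount to also permitting `ConnType::FEELER` in the final check.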
⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?
Will continue work on these bugfixes. Closing to be able to re-open it.
d0b0474 test: add GetAddedNodeInfo() CJDNS regression unit test (Jon Atack)
684da97 p2p, bugfix: detect addnode cjdns peers in GetAddedNodeInfo() (Jon Atack)

Pull request description:

Addnode peers connected to us via the cjdns network are currently not detected by `CConnman::GetAddedNodeInfo()`, i.e. `fConnected` is always false. This causes the following issues:

- RPC `getaddednodeinfo` incorrectly shows them as not connected
- `CConnman::ThreadOpenAddedConnections()` continually retries to connect them

Fix the issue and add a unit regression test. Extracted from #28248.

Suggest running the test with:

`./src/test/test_bitcoin -t net_peer_connection_tests -l test_suite`

ACKs for top commit:
mzumsande: utACK d0b0474
brunoerg: crACK d0b0474
pinheadmz: ACK d0b0474

Tree-SHA512: a4d81425f79558f5792585611f3fe8ab999b82144daeed5c3ec619861c69add934c2b2afdad24c8488a0ade94f5ce8112f5555d60a1ce913d4f5a1cf5dbba55a
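One way the mismatch described in this commit message can arise, sketched under assumptions: CJDNS addresses use the fc00::/8 range, so a purely numeric parse of an added-node string cannot tell them apart from IPv6 and tags them as IPv6, while the connected peer's address is tagged CJDNS, and the two never compare equal. `Addr` and `MaybeRetagAsCjdns` are hypothetical simplifications, not the code in 684da97.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative network tags for this sketch only.
enum class Network { IPV6, CJDNS };

// Hypothetical, simplified address: a network tag plus 16 raw bytes.
// Two addresses are equal only if both the tag and the bytes match.
struct Addr {
    Network net;
    std::array<uint8_t, 16> bytes;
    bool operator==(const Addr& o) const { return net == o.net && bytes == o.bytes; }
};

// Re-tag an IPv6 address in fc00::/8 as CJDNS before comparing it with the
// addresses of connected peers (assumes CJDNS is a reachable network here).
Addr MaybeRetagAsCjdns(Addr a)
{
    if (a.net == Network::IPV6 && a.bytes[0] == 0xfc) a.net = Network::CJDNS;
    return a;
}
```

Without the re-tagging step, an added-node entry parsed as IPv6 never matches the CJDNS-tagged peer, which is consistent with `fConnected` always being false in the description above.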
This pull fixes several peer connection bugs in our p2p code, along with the logging that uncovered them:

1. Fix detection of inbound peer connections in `GetAddedNodeInfo`.
2. Fix addnode CJDNS peers not being detected in `GetAddedNodeInfo`, which caused `ThreadOpenAddedConnections` to continually retry connecting to them and RPC `getaddednodeinfo` to incorrectly show them as not connected.
3. Fix `ThreadOpenConnections` not detecting inbound CJDNS connections and making automatic outbound connections to them.
4. Fix detection of already connected peers in `AlreadyConnectedToAddress()`.
5. Fix detection of already connected peers when making outbound connections in `ConnectNode`.
6. Do not accept inbound connections in `CreateNodeFromAcceptedSocket` from I2P peers we're already connected to, as building I2P tunnels is expensive.
7. Fix `ThreadOpenConnections` making automatic outbound connections to addnode peers, so as not to allocate our limited outbound slots to them and to ensure addnode connections benefit from their intended protections. Our addnode logic usually connects to the addnode peers before the automatic outbound logic does, but not always, as a connection race can occur (see the commit message for further details and mainnet examples). When an addnode peer is connected as an automatic outbound peer and is the only connection we have to a network, it can be protected by our new outbound network-specific eviction logic and persist in the "wrong role". Fix these issues by checking whether the selected address is an addnode peer in our automatic outbound connection logic.
8. Update the p2p logging with the improvements that allowed seeing, understanding and debugging the current behavior. Please see the commit messages for details.
9. Simplify `MaybePickPreferredNetwork` to return `std::optional`, make it a const class method, and add a Clang thread-safety analysis annotation and related assertions.
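Item 9 can be illustrated with a hypothetical, simplified class: `ConnManSketch` is not Bitcoin Core's `CConnman`, and its selection criterion is reduced to "reachable and no outbound connection yet". It shows the shape of the change, returning `std::optional<Network>` from a const method rather than, say, a bool plus an out-parameter, with a `mutable` mutex so the const method can still lock. The Clang thread-safety annotation and assertions mentioned in item 9 are omitted to keep the sketch self-contained.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <optional>
#include <set>
#include <utility>

// Illustrative stand-in for the networks a node may reach.
enum class Network { IPV4, IPV6, ONION, I2P, CJDNS };

// Hypothetical, simplified connection manager used only to show the signature.
class ConnManSketch {
public:
    ConnManSketch(std::set<Network> reachable, std::map<Network, int> outbound_counts)
        : m_reachable{std::move(reachable)}, m_outbound_counts{std::move(outbound_counts)} {}

    // Returns a reachable network we have no outbound connection to, if any.
    // const: picking a network does not modify the manager's state.
    std::optional<Network> MaybePickPreferredNetwork() const
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        for (Network net : m_reachable) {
            const auto it = m_outbound_counts.find(net);
            if (it == m_outbound_counts.end() || it->second == 0) return net;
        }
        return std::nullopt;
    }

private:
    mutable std::mutex m_mutex; // mutable so the const method can lock it
    std::set<Network> m_reachable;
    std::map<Network, int> m_outbound_counts;
};
```

A caller can then write `if (const auto net{connman.MaybePickPreferredNetwork()}) { ... }`, getting the "found" check and the value in one expression.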