p2p: attempt to fill full outbound connection slots with peers that support tx relay #28538
base: master
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Code Coverage: For detailed information about the code coverage, see the test coverage report.
Reviews: See the guideline for information on the review process. If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts: No conflicts as of last run.
Force-pushed from b5f7e09 to 32f044d
Concept ACK.
I think this measure makes perfect sense; it's simple and directly addresses the issues mentioned.
I would be fine with restricting this (as suggested by the OP message) too.
Force-pushed from 32f044d to 0034a8f
src/net_processing.cpp
Outdated
// Don't waste any of our full outbound slots on peers that don't want tx relay (because they might be in -blocksonly mode)
if (!fRelay && !m_opts.ignore_incoming_txs && pfrom.IsFullOutboundConn())
{
    LogPrint(BCLog::NET, "peer=%d (full-relay outbound) does not participate in transaction relay; disconnecting\n", pfrom.GetId());
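The `fRelay` flag this check reads is the optional last byte of the peer's `version` message (BIP 37); if the peer omits it, tx relay defaults to on. A minimal Python sketch of that interpretation (the function name and the convention of passing only the trailing bytes are illustrative assumptions, not Bitcoin Core code):

```python
def peer_wants_tx_relay(tail: bytes) -> bool:
    """Interpret the optional fRelay byte at the end of a version
    message payload (BIP 37). An absent byte means the peer accepts
    transaction announcements; a 0x00 byte means it does not."""
    if len(tail) == 0:
        return True  # flag omitted: tx relay defaults to enabled
    return tail[0] != 0x00

# A peer sending fRelay=0x00 (e.g. a -blocksonly node) opts out:
assert peer_wants_tx_relay(b"\x00") is False
assert peer_wants_tx_relay(b"\x01") is True
assert peer_wants_tx_relay(b"") is True
```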
Rather than doing this immediately upon connection (ie, after receiving their VERSION message), would it be better to do it only when we have all our outbound slots filled up (ie, as part of `EvictExtraOutboundPeers`)?

Also, doing the check repeatedly would mean that we could keep blocks-only peers while we're doing IBD, only dropping them when we exit IBD and start caring about tx relay.
> Rather than doing this immediately upon connection (ie, after receiving their VERSION message), would it be better to do it only when we have all our outbound slots filled up (ie, as part of `EvictExtraOutboundPeers`)?

I like that idea. If for some strange reason we weren't able to find any other peers, we wouldn't end up disconnecting the only ones we have. But I think we would want to start evicting if we are at the limit (so in `CheckForStaleTipAndEvictPeers`), not just if we have exceeded it (`EvictExtraOutboundPeers`).

> Also, doing the check repeatedly would mean that we could keep blocks-only peers while we're doing IBD, only dropping them when we exit IBD and start caring about tx relay.

Do we really want that? The main reason to run in `-blocksonly` mode is to save traffic. I would assume that most nodes wouldn't enable listening anyway, but for those that do (maybe not even on purpose), maybe it would be nicer not to use them for IBD?
> start evicting if we are at the limit (so in `CheckForStaleTipAndEvictPeers`), not just if we have exceeded it (`EvictExtraOutboundPeers`).

I agree.

> Do we really want that? The main reason to run in `-blocksonly` mode is to save traffic. I would assume that most nodes wouldn't enable listening anyway, but those that do (maybe not even on purpose), maybe it would be nicer not to use them for IBD?

Not sure about this. You seem to care specifically about those users who run `-blocksonly` and forget to disable inbounds. Perhaps this is better communicated through documentation and logging. I think we should leave them a chance to strengthen the block-relaying network.
+1 to having the logic in eviction instead of connection

RE IBD: it seems odd / logically inconsistent to run a node in `-blocksonly` and enable inbounds, so I don't think it's worth overly optimizing for that use case. If we want to improve the user experience, we could have a soft param interaction between the two (independent of these changes).
> it seems odd / logically inconsistent to run a node in `-blocksonly` and enable inbounds, so I don't think it's worth overly optimizing for the use case

Maybe there is a use case for wanting to help the network by providing inbound capacity while still minimizing traffic? Although then it often doesn't make a lot of sense to run a non-pruned node, unless you also need the entire chain for selfish purposes (e.g. `txindex`).

In any case, I have included the suggestion to not evict while in IBD in the latest push!
> Do we really want that? The main reason to run in `-blocksonly` mode is to save traffic.

In that case you'd be better off using `-maxuploadtarget`, either instead of or in addition to `-blocksonly`?

> it seems odd / logically inconsistent to run a node in `-blocksonly` and enable inbounds,

I think it would make sense to expect that if 100% (or 99.9%) of nodes were `-blocksonly`, they would still manage to propagate blocks sensibly, bring up new nodes efficiently, etc. In theory, with this PR we would also be able to propagate txs successfully in the 99.9% `-blocksonly` case.
concept ACK
Force-pushed from 0034a8f to 831ed49
I've now changed the approach per @ajtowns' suggestion to disconnect during eviction.
concept/approach ACK
@@ -673,6 +673,7 @@ def add_p2p_connection(self, p2p_conn, *, wait_for_verack=True, **kwargs):
    def add_outbound_p2p_connection(self, p2p_conn, *, wait_for_verack=True, p2p_idx, connection_type="outbound-full-relay", **kwargs):
        """Add an outbound p2p connection from node. Must be an
        "outbound-full-relay", "block-relay-only", "addr-fetch" or "feeler" connection.
        The txrelay arg determines whether our peer will signal support for txrelay in their version message.
831ed49: seems like this should be in the `peer_connect_send_version` function docstring?
Removed the doc, since it's hidden in the `kwargs` anyway (it wasn't when I added it in an earlier revision). Within `peer_connect_send_version` it's kind of self-explanatory; I had added it here to avoid confusion about which side of the connection `txrelay` refers to.
Concept ACK, just some nits here and there
src/net_processing.cpp
Outdated
m_connman.ForEachNode([&node_to_evict](CNode* node) {
    if (!node->IsFullOutboundConn() || node->m_relays_txs) return;
    if (!node_to_evict.has_value() || *node_to_evict < node->GetId()) {
        node_to_evict = node->GetId();
    }
});
What if there is, at least, a node already flagged to be disconnected? Shouldn't we skip this altogether? Otherwise, it could be the case that we end up evicting more nodes than we really need to.
Nvm, `ForEachNode` already checks `NodeFullyConnected`.
Yes, that's why I didn't include the check here (the similar checks a few lines above aren't necessary either)
`Assume` on `fDisconnect` could be useful here too, just in case `ForEachNode` gets updated.
Updating `ForEachNode` to no longer check `NodeFullyConnected` seems pretty intrusive, and all callers would need to be carefully checked, so imo it doesn't seem super valuable to repeat the check here. But the cost of a redundant check is also pretty low, so it doesn't seem like a big deal either way.
I also don't see that as a real risk, but I do see the problem that whoever reads this code (or decides to use `ForEachNode` somewhere else) could easily get confused, especially since some callers check again for `fDisconnect`.

I added a refactor commit renaming `ForEachNode()` to `ForEachFullyConnectedNode()` and removing the duplicate checks / comments. What do you think about that approach?
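For illustration, the invariant the rename advertises can be modeled in a few lines of Python (a hedged sketch, not the actual `CConnman` code; the dict-based node representation is an assumption for this example):

```python
def for_each_fully_connected_node(nodes, func):
    """Apply func only to nodes that have completed the version
    handshake and are not already marked for disconnection --
    the guarantee the renamed helper makes explicit."""
    for node in nodes:
        if node["successfully_connected"] and not node["disconnect"]:
            func(node)

nodes = [
    {"id": 1, "successfully_connected": True,  "disconnect": False},
    {"id": 2, "successfully_connected": False, "disconnect": False},  # still handshaking
    {"id": 3, "successfully_connected": True,  "disconnect": True},   # already evicted
]
visited = []
for_each_fully_connected_node(nodes, lambda n: visited.append(n["id"]))
assert visited == [1]
```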
To be clear, I suggested literally having `Assume(fDisconnect)`, not an extra `if` check (that would actually make the code worse).

Even if you refactor it into `ForEachFullyConnectedNode`, I think this `Assume(fDisconnect)` is kinda useful, because who knows what exactly `ForEachFullyConnectedNode` means :)

But I'm fine either way, either you rename or add `Assume` (both are ideal i think :))
> I added a refactor commit renaming ForEachNode() to ForEachFullyConnectedNode() and removing the duplicate checks / comments.

I like it, I find it clear.
Force-pushed from 831ed49 to a361ac4
Pushed to address feedback. I also removed a sub-test with a block-relay-only peer: there isn't really a reason to think that peer would be disconnected, and it didn't make much sense in that form because we weren't at any connection limit.
Force-pushed from a361ac4 to cd7c42e
blocksonly_peer1.peer_disconnect()

self.log.info("Check that we don't evict full-relay outbound connections to blocksonly peers if we are blocksonly ourselves")
self.restart_node(0, ["-blocksonly", "-maxconnections=1"])
Not sure why anyone would use `maxconnections=1` in practice, but then if their firewall allows them to find only one blocksonly node (and they are not in blocksonly), the behavior might be confusing: they will be frequently disconnected from this peer.

Perhaps this could be logged if `maxconnections=1`.
I agree that it makes no sense; if you want just one connection you should pick a node you trust and use `-connect`. Using automatic connections for that means eclipsing yourself - maybe we should warn about that in general?

That being said, this PR does log disconnection of `-blocksonly` peers, although only in `BCLog::NET` - do you suggest making it unconditional based on `-maxconnections`?
The idea was to log if maxconnections is set to `1`, so at startup (when the argument is parsed). I don't insist though.
imo users should get an error if they set `-maxconnections` to less than 8, or at least a "this is unsafe. are you SURE?!". but that's probably getting off topic here :)
I'd agree with adding this warning, but yeah, this PR is not the best place for it.
Force-pushed from cd7c42e to 840a022
code looks good to me. left a comment about the test that seems worth fixing, then I'm ready to ACK the patch.
trying to think about any potential negative network effects that could occur, but so far it seems fine to disconnect OB full relay peers that signal they don't want to receive transactions.
Force-pushed from 840a022 to bbb01b1
840a022 to d636e38: ToDo: address #28538 (comment)
Force-pushed from bbb01b1 to d636e38
ACK d636e38
Force-pushed from d636e38 to 2a17ac3
…bound maximum

...unless we are in -blocksonly mode ourselves or in IBD. If we had too many outbound peers not participating in tx relay, we would run into problems getting our own transactions relayed. This would also be bad for privacy and fee estimation. We evict non-txrelay peers only when all full outbound slots are filled, so that we keep peers in hypothetical situations where -blocksonly peers are the only ones available. Idea for this approach by ajtowns.
Force-pushed from 2a17ac3 to ad48667
ACK ad48667
// evict the newest of them (highest Node Id) so we can fill the slot with a tx-relaying outbound peer
// This is not done if we are in -blocksonly mode ourselves or still in IBD,
// because then we don't care if our peer supports tx relay.
else if (full_outbound_delta == 0 && !m_opts.ignore_incoming_txs && !m_chainman.IsInitialBlockDownload()) {
Not super clear why we won't do this if `full_outbound_delta > 0`?

Maybe a comment explaining why this could be possible in the first place would help (probably above the first if block). Even now it's hard to tell whether it's possible to be over the limit most of the time, let alone if we add some new complexity for being over the limit more frequently (for some security reason or whatever).
Hmm, I think the main reason was to avoid interaction with the existing extra-full-outbound eviction. Which raises the question of whether there should be any interaction: should nodes that are protected from that (either because `m_protect` is true or because they are the only ones for their network) also be made exempt from this new disconnection for not supporting tx relay?

I'm not really sure about that: it would be unfortunate if the non-tx-relaying outbound peer were the only one with the best chain, but how realistic would such a scenario be?
> not super clear why we won't do this if full_outbound_delta > 0?

> Which raises the question whether there should be any interaction?

I think this would create a weird interaction. The current approach of this PR tries to make the 8 full-outbound connections relay transactions; given we already have blocks-only connections, we should not waste full-outbound slots for this purpose. However, our full-outbound connection protection criteria are based on whether peers have provided valid blocks with enough accumulated work, and have nothing to do with how good they are at providing transactions (there is also the recently added network diversity criterion, more about that later).

This raises the question of what we value most: having 8 full-outbound peers relaying transactions, or our protected peers (and under what criteria). If the former is valued most, we could merge this under `full_outbound_delta > 0` but pick a node without taking the protection criteria into account (which raises the question of how useful the protection is to start with). If the latter is prioritized, then we exclude the protected peers first and run the eviction over the remainder. This slightly changes the goal of the PR, given we may not achieve 8 transaction-relaying outbound peers. Finally, there are the additional peers protected for network diversity. Depending on how this is treated, we may end up with an even smaller number of full-outbounds relaying transactions (8 - 4 - n, where n represents the number of networks we support).
Yes, I agree that there are colliding goals. My thinking is as follows:

1. The `m_protect` protection attempts to prevent pathological situations where we wouldn't have enough peers with the honest best chain (that shouldn't happen normally even without the protection, but would be very serious if it did).
2. The network-specific protection aims to keep us connected to all networks we support, to help interconnection and also to reduce the risk of eclipse attacks. Ideally, that means relaying both blocks and transactions from one network to another.
3. The added code attempts to make sure we have as many full-outbound peers actually participating in tx relay as possible, mostly for selfish reasons (e.g. privacy, fee estimation).

These goals conflict. I'm pretty confident that 3) should take precedence over 2), because the network-specific protection isn't really about a particular peer; we just want any connection to that network, and would be happy to exchange a current peer that doesn't support tx relay for a tx-relaying peer from the same network.

I'm kind of torn between 1) and 3). It would be easy to exempt non-tx-relaying `m_protect` peers from disconnection, but that would mean that we could end up with up to 4 of these peers permanently.
> It would be easy to exempt non-tx-relaying m_protect peers from disconnection, but that would mean that we could end up with up to 4 of these peers permanently.

The commit message where `m_protect` was introduced says "we pick 4 of our outbound peers and do not subject them to this logic, to be more conservative. We don't wish to permit temporary network issues (or an attacker) to excessively disrupt network topology."

Wondering how someone could disrupt network topology with this, and whether it's still relevant.
blocksonly_peer.sync_with_ping()
blocksonly_peer.peer_disconnect()

self.log.info("Check that we evict full-relay outbound connections to blocksonly peers at the full outbound limit")
You could expand this test by adding some irrelevant peers (e.g. ADDRFETCH), but no big deal.
I tried it, but it turned out to be annoying because in the test we simulate being at the full outbound limit by artificially running with a lower `-maxconnections`. If we now also add an unrelated peer like ADDRFETCH, the test fails because we don't have enough slots in `semOutbound`.
// marked for disconnection
if (!pnode->IsFullOutboundConn() || pnode->fDisconnect) return;
// Only consider outbound-full-relay peers
if (!pnode->IsFullOutboundConn()) return;
CNodeState *state = State(pnode->GetId());
if (state == nullptr) return; // shouldn't be possible, but just in case
Might update this too: `if (!Assume(state)) return;` and drop the comment, if you feel like it :)
I think I'd prefer to not do it here, for my taste it's not related enough to the core of this PR.
Rename ForEachNode to ForEachFullyConnectedNode because it has an inbuilt check that only executes the passed function for fully connected nodes (i.e. if fSuccessfullyConnected is set and fDisconnect isn't). This means callers don't need to worry about that, so remove those duplicate checks and related comments.
Force-pushed from ad48667 to 8f07458
Updated to address feedback. I would love to hear more opinions on the protection issue discussed in the thread following #28538 (comment)
ACK 8f07458. Useful to fill outbound slots with peers we can relay transactions with; I liked the approach in this PR of swapping away blocks-only peers in full-relay outbound slots only after the maximum number of full-relay outbound connections is reached.
# Have one blocksonly peer that is not being disconnected
blocksonly_peer1 = self.nodes[0].add_outbound_p2p_connection(P2PInterface(), p2p_idx=1, txrelay=False)
blocksonly_peer2 = self.nodes[0].add_outbound_p2p_connection(P2PInterface(), p2p_idx=2, txrelay=False)
self.nodes[0].mockscheduler(46)
68769ff: nit if you retouch.

```suggestion
with self.nodes[0].assert_debug_log(expected_msgs=["disconnecting full outbound peer not participating in tx relay: peer=1"]):
    self.nodes[0].mockscheduler(46)
blocksonly_peer2.wait_for_disconnect()
```
Thanks for the review! We had an offline conversation about this today (FYI @sdaftuar @sipa @sr-gi). Some takeaways:
I think I'll look into reworking the PR and will put it in draft until that has happened.
🤔 There hasn't been much activity lately and the CI seems to be failing. If no one reviewed the current pull request by commit hash, a rebase can be considered. While the CI failure may be a false positive, the CI hasn't been running for some time, so there may be a real issue hiding as well. A rebase triggers the latest CI and makes sure that no silent merge conflicts have snuck in.
As described in issues #16418 and #28371, it is possible that we could end up with an insufficient number of outbound peers relaying our transactions. Having fewer outbound peers support tx relay could also have implications for privacy and fee estimation.

While #28488 suggests measures based on comparing fee filters / mempool sizes, there is also the simpler issue of peers that tell us they don't want transactions, which is determined by the `fRelay` field in the `version` message. If such a peer runs Bitcoin Core, that usually means it's running in `-blocksonly` mode and accepting inbound connections. The status quo is that we will not avoid these peers and just not send them any transactions.

This PR proposes not to waste any of our 8 full-relay outbound slots on these peers (unless we are in `-blocksonly` mode ourselves). Using them as outbound peers for block-relay-only connections is fine though and not impacted.

As was suggested by ajtowns below, we don't disconnect during version processing, but later during the regularly scheduled `EvictExtraOutboundPeers()` task, and only if we are at the maximum of full-outbound peers.

If reviewers think that this proposal is too aggressive, an alternative solution (with a little more complexity) would be to restrict the maximum number of outbound connections to `-blocksonly` peers to 1 or 2. Although I currently think that it's ok to be selective with outbound connections, so I didn't implement that right away.
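The selection rule described above (evict the newest non-tx-relaying peer, only once every full-outbound slot is filled, and never while in IBD or `-blocksonly` mode) can be sketched roughly as follows. This is a simplified Python model under those stated assumptions, not the actual net_processing code; the `Peer` type and function name are invented for illustration:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Peer:
    node_id: int          # newer connections get higher ids
    full_outbound: bool   # outbound-full-relay connection type
    relays_txs: bool      # fRelay from the peer's version message

def pick_non_relay_peer_to_evict(peers: List[Peer], max_full_outbound: int,
                                 in_ibd: bool, blocksonly: bool) -> Optional[int]:
    """Return the id of the newest full-outbound peer that opted out of
    tx relay, but only when all full-outbound slots are occupied; skip
    eviction entirely during IBD or when we ignore incoming txs."""
    if in_ibd or blocksonly:
        return None
    full_outbound = [p for p in peers if p.full_outbound]
    if len(full_outbound) < max_full_outbound:
        return None  # free slots left: keep even non-relaying peers
    candidates = [p.node_id for p in full_outbound if not p.relays_txs]
    return max(candidates) if candidates else None

peers = [Peer(1, True, True), Peer(2, True, False), Peer(3, True, False)]
assert pick_non_relay_peer_to_evict(peers, 3, False, False) == 3     # at limit: evict newest
assert pick_non_relay_peer_to_evict(peers, 4, False, False) is None  # slot still free
assert pick_non_relay_peer_to_evict(peers, 3, True, False) is None   # in IBD: keep them
```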