-
Notifications
You must be signed in to change notification settings - Fork 35.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Addrman: test-before-evict bugfix and improvements for block-relay-only peers #20187
Conversation
I'm still trying to verify that the bug being fixed relating to test-before-evict is in fact a bug (and I'm not just misunderstanding the logic); but if my understanding is correct I think we should fix all these issues in the next release. |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Concept ACK. The former issues are worth addressing. |
I was able to verify the test-before-evict bug by modifying the tried table to be very small, making collisions with existing peers likely. The patch here appears to fix it as well. |
266d22a
to
e921c5d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK.
An alternative approach could be to just periodically refresh the info.nLastSuccess
for all active connections. If I'm understanding the addrman code correctly, as long as we're refreshing that value every 4 hours, we'll never evict that peer.
Adding a StillGood()
(with a better name) method to CAddrMan which simply updates nLastSuccess
and calling it either globally or per-peer every three hours would avoid adding more logic to the already-too-complicated ThreadOpenConnections()
I think the best thing would be if addrman just detected the collision with an existing peer immediately, and avoided the whole tried-collision logic itself. If we had a good way to do that (by changing the interface between addrman and connman) then I think that could make sense. Short of addrman being smart enough to avoid this problem, having |
df1015d
to
0245de9
Compare
I disagree, but perhaps it's just a matter of taste. To me, making addrman always aware of existing connections and keeping |
We could add an interface to addrman so that it explicitly keeps track of currently connected peers, by notifying it when we successfully make new connections and again when a peer disconnects, and then use that to prevent tried collisions with current peers from ever making it into the tried-collision list for later resolution. |
Yes, that seems like another reasonable approach. To be clear, I'm not NACKing the approach in this PR. I wanted to offer an alternative suggestion, but what you're doing here seems fine, and this actually cleans up and clarifies some of the code. Thanks for being so responsive to review. I plan to do another pass of the latest changes tomorrow. |
@@ -837,7 +837,8 @@ void PeerManager::ReattemptInitialBroadcast(CScheduler& scheduler) const | |||
scheduler.scheduleFromNow([&] { ReattemptInitialBroadcast(scheduler); }, delta); | |||
} | |||
|
|||
void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) { | |||
void PeerManager::FinalizeNode(const CNode& node, bool& fUpdateConnectionTime) { | |||
NodeId nodeid = node.GetId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if you have to retouch the branch, you could make this const.
If you squashed the last commit (Eliminate unnecessary parameter from FinalizeNode()) with the first commit (Avoid calling CAddrMan::Connected() on block-relay-only peer addresses), or even just moved it to be the first commit, you'd avoid having to touch other lines of code (e.g. in denialofservice_tests.cpp) multiple times. |
utACK 430c8b6 I think moving the last commit to the beginning would make this easier for other reviewers. Very happy to reACK if you decide to do that while you just have my ACK. |
Connected() updates the time we serve in addr messages, so avoid leaking block-relay-only peer connections by avoiding these calls.
Being able to invoke Good() is important for address management (new vs tried table, tried table eviction via test-before-evict). We mitigate potential information leaks by not calling Connected() on these peer addresses.
Outbound peer logic prevents connecting to addresses that we're already connected to, so prevent inadvertent eviction of current peers via test-before-evict by checking this condition and marking current peer's addresses as Good(). Co-authored-by: John Newbery <john@johnnewbery.com>
430c8b6
to
16d9bfc
Compare
Thanks for the review @jnewbery -- I went ahead and squashed that last commit into the first one, so that the |
utACK 16d9bfc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested ACK 16d9bfc
addrman.Good(addr); | ||
// Select a new table address for our feeler instead. | ||
addr = addrman.Select(true); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented this here for my own understanding while parsing the changed logic in 16d9bfc.
- }
+ } // Otherwise, keep the selected addr if we have a tried
+ // collision we're not already connected to.
I haven't looked at the code here, but just tried to verify the test-before-evict bug you mention. I believe it is indeed there, and will result in the conncted node being forgotten. Consider the following scenario:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 16d9bfc
|
||
// SelectTriedCollision returns an invalid address if it is empty. | ||
if (!fFeeler || !addr.IsValid()) { | ||
addr = addrman.Select(fFeeler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very happy to see this rewritten. The logic where we always called SelectTriedCollision
and then throw out the result if !fFeeler
was very confusing (and inefficient...).
approach ACK - I've read through the code, understand the proposed changes, and think they make sense. the update to only thing holding me back from a code review ACK is wanting to better understand the full implications of also- the travis failure is bizarre ! https://travis-ci.org/github/bitcoin/bitcoin/jobs/739321630 update: its been restarted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review ACK 16d9bfc.
AFAIU this privacy fix it's a worthy one as it masks the addr of our hidden peers among the wider set of all our known addrs. After this PR, an adversary observing our addr-relay traffic shouldn't be able to dissociate them.
@@ -854,7 +855,8 @@ void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) { | |||
if (state->fSyncStarted) | |||
nSyncStarted--; | |||
|
|||
if (misbehavior == 0 && state->fCurrentlyConnected) { | |||
if (misbehavior == 0 && state->fCurrentlyConnected && !node.IsBlockOnlyConn()) { | |||
// Note: we avoid changing visible addrman state for block-relay-only peers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would mention the reference to anchors you made in PR description. Otherwise, I don't think this change is that much valuable if we had only non-persistent block-relay peers. I don't see how learning a past connection would significantly help an attacker to disrupt them further in the future, CAddrInfo.nTime
isn't considered by AddrMan::Select
?
Note, maybe we should rename AddrMan::Connected
to HasBeenConnected
, current comment "Mark an entry as currently-connected to" is confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would mention the reference to anchors you made in PR description. Otherwise, I don't think this change is that much valuable if we had only non-persistent block-relay peers. I don't see how learning a past connection would significantly help an attacker to disrupt them further in the future, CAddrInfo.nTime isn't considered by AddrMan::Select ?
Perhaps it doesn't, but I think if we left addrman truly unchanged by block-relay-only peers then that would definitely mean that we can't leak topology information via addrman. I reversed my view on whether we should call Good()
for these peers after realizing we didn't have machinery in place to prevent addrman eviction and it seemed like that could happen too easily (and actually more easily for long-lived block-relay-only peer connections, because their timestamps could never update), but if we did come up with an auxiliary data structure for remembering block-relay-only peers (perhaps, just by making the anchor connections data structure more robust to temporarily losing connectivity) then I might argue again that we should switch the way we do things in the future to try to make these connections not interact with addrman whatsoever. You could imagine for instance having a separate, smaller "block-relay-addrman" that samples from the main addrman whenever it needs more addresses but otherwise has its own eviction rules and never shares its state through any externally probable interface.
So philosophically I still think no interaction should be our default way of thinking, and an exception to that is being proposed in this PR because I don't have a better alternative right now. But I would hate for someone to later argue that it's somehow safe to make these addrman calls because we don't know of an information leak -- the point is to isolate this part of the p2p protocol so there can't be an information leak, even one we haven't thought of. (Leaving this call to Connected()
on these peer addresses in the first place was a big oversight in #15759, in my opinion.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So philosophically I still think no interaction should be our default way of thinking, and an exception to that is being proposed in this PR because I don't have a better alternative right now. But I would hate for someone to later argue that it's somehow safe to make these addrman calls because we don't know of an information leak
Right, I concede fixing the information leak in itself is valuable even if it's not currently exploitable by an attacker to interfere with our non-persistent block-relay peer selection. This one might change in the future and turn this leak as a more exploitable, without the timestamp relation being recollected.
I agree it's a more temporary fix waiting to remove timestamps from addr messages as discussed during last p2p meeting. Going further, even assuming a separate, smaller "block-relay-addrman", I'm still concerned about the block-relay peers not participating in addr-relay compared to other peers, this fact which might observable in itself to break the hiddeness property of block-relay peers. E.g I was thinking about an adversary tweaking nServices to create a unique fingerprint and observe the absence of propagation on a presumed block-relay link to victim node. I guess isolating addr-relay on its own would make us all more confident, though it's a more complex, long-term change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code-Review ACK 16d9bfc.
I'm not entirely convinced that calling Good
for blocks-only peers leaks strictly no information at all (e.g. it updates internal state such as nLastSuccess
, which influence IsTerrible()
, which influences GetAddr
Response selection), but I couldn't think of a way to use that, especially after GetAddr caching (18991).
@@ -854,7 +855,8 @@ void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) { | |||
if (state->fSyncStarted) | |||
nSyncStarted--; | |||
|
|||
if (misbehavior == 0 && state->fCurrentlyConnected) { | |||
if (misbehavior == 0 && state->fCurrentlyConnected && !node.IsBlockOnlyConn()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line confused me into thinking that addrman::Connected()
could also be called for inbound peers on disconnection - which would be not ideal, because then our peer might be leaking the information about our long-term blocks-only connections by updating nTime
on their side of the connection.
However, if I read the code correctly, fCurrentlyConnected
, despite its name, actually rather tracks whether it is an sucessfully connected outbound connection, so that this is not an issue (although keeping a flag for that does not seem necessary).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're exactly right. See #20187 (comment).
It'd be great to consolidate the pfrom.IsInboundConn()
check here too. That could easily be done as a follow up, eliminating fCurrentlyConnected
entirely (and using fSuccessfullyConnected
to check that the version handshake is complete).
@mzumsande I agree that it's unlikely we're leaking no information by calling However if we were to structure the addrman differently in the future so that eviction isn't a concern (as I mention here #20187 (comment)), I might suggest that we reverse course again and avoid any interaction between block-relay-only peers and addrman at all. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code review ACK 16d9bfc
- agree with the prior conversation that we can't have 100% guarantee we're not leaking any information, but that not evicting the peers is more important. & I spent some time and can't figure out any concrete way to exploit any information leak, so I'm comfortable with these changes
thoughts for future improvements:
- also saw some mentions along these lines, but I think the code around
fUpdateConnectionTime
,fCurrentlyConnected
, and comment or function name ofCAddrMan::Connected
could be simplified and made easier to parse.Connected
is only invoked when stopping or disconnecting the nodes, but the comment says "Mark an entry as currently-connected-to."fUpdateConnectionTime
is used as an in/out param whenFinalizeNode
could just directly callConnected
. but to reiterate, these are all tangential to this PR. just thoughts that came up as I was trying to understand relevant code paths.
Summary: PR description: > This PR does two things: > > - Block-relay-only interaction with addrman. > > - Calling `CAddrMan::Connected()` on an address that was a block-relay-only peer causes the time we report in `addr` messages containing that peer to be updated; particularly now that we use anchor connections with a our block-relay-only peers, this risks leaking information about those peers. So, stop this. > - Avoiding calling `CAddrMan::Good()` on block-relay-only peer addresses causes the addrman logic around maintaining the new and tried table to be less good, and in particular makes it so that block-relay-only peer addresses are more likely to be evicted from the addrman (for no good reason I can think of). So, mark those addresses as good when we connect. > > - Fix test-before-evict bug. There's a bug where if we get a collision in the tried table with an existing address that is one of our current peers, and the connection is long-lived enough, then `SelectTriedCollisions()` might return that existing peer address to us as a test-before-evict connection candidate. However, our logic for new outbound connections would later prevent us from actually making a connection; the result would be that when we get a collision with a long-lived current peer, that peer's address is likely to get evicted from the tried table. Fix this by checking to see if a test-before-evict candidate is a peer we're currently connected to, and if so, mark it as `Good()`. Commit description: > `Connected()` updates the time we serve in addr messages, so avoid leaking block-relay-only peer connections by avoiding these calls. This is a backport of [[bitcoin/bitcoin#20187 | core#20187]] [1/4] bitcoin/bitcoin@daf5553 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D10694
Summary: This is a backport of [[bitcoin/bitcoin#20187 | core#20187]] [2/4] bitcoin/bitcoin@e8b215a Depends on D10694 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D10695
Summary: Being able to invoke `Good()` is important for address management (new vs tried table, tried table eviction via test-before-evict). We mitigate potential information leaks by not calling `Connected()` on these peer addresses. This is a backport of [[bitcoin/bitcoin#20187 | core#20187]] [3/4] bitcoin/bitcoin@4fe338a Depends on D10695 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D10696
Summary: Outbound peer logic prevents connecting to addresses that we're already connected to, so prevent inadvertent eviction of current peers via test-before-evict by checking this condition and marking current peer's addresses as `Good()`. Co-authored-by: John Newbery <john@johnnewbery.com> This is a backport of [[bitcoin/bitcoin#20187 | core#20187]] [4/4] bitcoin/bitcoin@16d9bfc Depends on D10696 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D10697
This PR does two things:
Block-relay-only interaction with addrman.
CAddrMan::Connected()
on an address that was a block-relay-only peer causes the time we report inaddr
messages containing that peer to be updated; particularly now that we use anchor connections with a our block-relay-only peers, this risks leaking information about those peers. So, stop this.CAddrMan::Good()
on block-relay-only peer addresses causes the addrman logic around maintaining the new and tried table to be less good, and in particular makes it so that block-relay-only peer addresses are more likely to be evicted from the addrman (for no good reason I can think of). So, mark those addresses as good when we connect.Fix test-before-evict bug. There's a bug where if we get a collision in the tried table with an existing address that is one of our current peers, and the connection is long-lived enough, then
SelectTriedCollisions()
might return that existing peer address to us as a test-before-evict connection candidate. However, our logic for new outbound connections would later prevent us from actually making a connection; the result would be that when we get a collision with a long-lived current peer, that peer's address is likely to get evicted from the tried table. Fix this by checking to see if a test-before-evict candidate is a peer we're currently connected to, and if so, mark it asGood()
.