Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addrman: test-before-evict bugfix and improvements for block-relay-only peers #20187

Merged
merged 4 commits into from
Nov 3, 2020

Conversation

sdaftuar
Copy link
Member

@sdaftuar sdaftuar commented Oct 19, 2020

This PR does two things:

  • Block-relay-only interaction with addrman.

    • Calling CAddrMan::Connected() on an address that was a block-relay-only peer causes the time we report in addr messages containing that peer to be updated; particularly now that we use anchor connections with a our block-relay-only peers, this risks leaking information about those peers. So, stop this.
    • Avoiding calling CAddrMan::Good() on block-relay-only peer addresses causes the addrman logic around maintaining the new and tried table to be less good, and in particular makes it so that block-relay-only peer addresses are more likely to be evicted from the addrman (for no good reason I can think of). So, mark those addresses as good when we connect.
  • Fix test-before-evict bug. There's a bug where if we get a collision in the tried table with an existing address that is one of our current peers, and the connection is long-lived enough, then SelectTriedCollisions() might return that existing peer address to us as a test-before-evict connection candidate. However, our logic for new outbound connections would later prevent us from actually making a connection; the result would be that when we get a collision with a long-lived current peer, that peer's address is likely to get evicted from the tried table. Fix this by checking to see if a test-before-evict candidate is a peer we're currently connected to, and if so, mark it as Good().

@sdaftuar
Copy link
Member Author

I'm still trying to verify that the bug being fixed relating to test-before-evict is in fact a bug (and I'm not just misunderstanding the logic); but if my understanding is correct I think we should fix all these issues in the next release.

@DrahtBot
Copy link
Contributor

DrahtBot commented Oct 19, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@naumenkogs
Copy link
Member

Concept ACK.

The former issues are worth addressing.
Seems like the latter bug is real, even though the chance of hitting it is rather low.

@sdaftuar
Copy link
Member Author

I was able to verify the test-before-evict bug by modifying the tried table to be very small, making collisions with existing peers likely. The patch here appears to fix it as well.

src/net.cpp Outdated Show resolved Hide resolved
src/net_processing.cpp Show resolved Hide resolved
src/net.h Outdated Show resolved Hide resolved
src/net.cpp Outdated Show resolved Hide resolved
src/net.cpp Outdated Show resolved Hide resolved
src/net_processing.cpp Outdated Show resolved Hide resolved
src/net.cpp Show resolved Hide resolved
src/net.cpp Outdated Show resolved Hide resolved
src/net.cpp Show resolved Hide resolved
@sdaftuar sdaftuar force-pushed the 2020-10-addrman-block-relay branch 2 times, most recently from 266d22a to e921c5d Compare October 26, 2020 16:51
Copy link
Contributor

@jnewbery jnewbery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.

An alternative approach could be to just periodically refresh the info.nLastSuccess for all active connections. If I'm understanding the addrman code correctly, as long as we're refreshing that value every 4 hours, we'll never evict that peer.

Adding a StillGood() (with a better name) method to CAddrMan which simply updates nLastSuccess and calling it either globally or per-peer every three hours would avoid adding more logic to the already-too-complicated ThreadOpenConnections()

src/net_processing.cpp Outdated Show resolved Hide resolved
src/net_processing.cpp Outdated Show resolved Hide resolved
@sdaftuar
Copy link
Member Author

An alternative approach could be to just periodically refresh the info.nLastSuccess for all active connections. If I'm understanding the addrman code correctly, as long as we're refreshing that value every 4 hours, we'll never evict that peer.

Adding a StillGood() (with a better name) method to CAddrMan which simply updates nLastSuccess and calling it either globally or per-peer every three hours would avoid adding more logic to the already-too-complicated ThreadOpenConnections()

I think the best thing would be if addrman just detected the collision with an existing peer immediately, and avoided the whole tried-collision logic itself. If we had a good way to do that (by changing the interface between addrman and connman) then I think that could make sense.

Short of addrman being smart enough to avoid this problem, having net.cpp detect this condition and fix it directly seems next simplest to me; relying on Good() or a StillGood() being called at the right times to avoid this problem as a consequence of how and when those variables get updated just seems unnecessarily complex...? The next time someone wonders if it's possible for an existing peer to be evicted from the tried table, I think it'll be much easier to reason about my proposed change than the alternative you're suggesting.

@sdaftuar sdaftuar force-pushed the 2020-10-addrman-block-relay branch 2 times, most recently from df1015d to 0245de9 Compare October 26, 2020 18:30
@jnewbery
Copy link
Contributor

relying on Good() or a StillGood() being called at the right times to avoid this problem as a consequence of how and when those variables get updated just seems unnecessarily complex...? The next time someone wonders if it's possible for an existing peer to be evicted from the tried table, I think it'll be much easier to reason about my proposed change than the alternative you're suggesting.

I disagree, but perhaps it's just a matter of taste. To me, making addrman always aware of existing connections and keeping nLastSuccess fresh by periodic notifications from net_processing or net seems much cleaner than only finding out at the point that we attempt to reconnect.

@sdaftuar
Copy link
Member Author

sdaftuar commented Oct 26, 2020

I disagree, but perhaps it's just a matter of taste. To me, making addrman always aware of existing connections and keeping nLastSuccess fresh by periodic notifications from net_processing or net seems much cleaner than only finding out at the point that we attempt to reconnect.

We could add an interface to addrman so that it explicitly keeps track of currently connected peers, by notifying it when we successfully make new connections and again when a peer disconnects, and then use that to prevent tried collisions with current peers from ever making it into the tried-collision list for later resolution.

@jnewbery
Copy link
Contributor

We could add an interface to addrman so that it explicitly keeps track of currently connected peers, by notifying it when we successfully make new connections and again when a peer disconnects, and then use that to prevent tried collisions with current peers from ever making it into the tried-collision list for later resolution.

Yes, that seems like another reasonable approach.

To be clear, I'm not NACKing the approach in this PR. I wanted to offer an alternative suggestion, but what you're doing here seems fine, and this actually cleans up and clarifies some of the code.

Thanks for being so responsive to review. I plan to do another pass of the latest changes tomorrow.

@@ -837,7 +837,8 @@ void PeerManager::ReattemptInitialBroadcast(CScheduler& scheduler) const
scheduler.scheduleFromNow([&] { ReattemptInitialBroadcast(scheduler); }, delta);
}

void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
void PeerManager::FinalizeNode(const CNode& node, bool& fUpdateConnectionTime) {
NodeId nodeid = node.GetId();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if you have to retouch the branch, you could make this const.

src/net.cpp Show resolved Hide resolved
@jnewbery
Copy link
Contributor

If you squashed the last commit (Eliminate unnecessary parameter from FinalizeNode()) with the first commit (Avoid calling CAddrMan::Connected() on block-relay-only peer addresses), or even just moved it to be the first commit, you'd avoid having to touch other lines of code (e.g. in denialofservice_tests.cpp) multiple times.

@jnewbery
Copy link
Contributor

utACK 430c8b6

I think moving the last commit to the beginning would make this easier for other reviewers. Very happy to reACK if you decide to do that while you just have my ACK.

sdaftuar and others added 4 commits October 27, 2020 11:14
Connected() updates the time we serve in addr messages, so avoid leaking
block-relay-only peer connections by avoiding these calls.
Being able to invoke Good() is important for address management (new vs tried
table, tried table eviction via test-before-evict). We mitigate potential
information leaks by not calling Connected() on these peer addresses.
Outbound peer logic prevents connecting to addresses that we're already
connected to, so prevent inadvertent eviction of current peers via
test-before-evict by checking this condition and marking current peer's
addresses as Good().

Co-authored-by: John Newbery <john@johnnewbery.com>
@sdaftuar
Copy link
Member Author

Thanks for the review @jnewbery -- I went ahead and squashed that last commit into the first one, so that the FinalizeNode interface changes are all together now.

@jnewbery
Copy link
Contributor

utACK 16d9bfc

Copy link
Contributor

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested ACK 16d9bfc

addrman.Good(addr);
// Select a new table address for our feeler instead.
addr = addrman.Select(true);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented this here for my own understanding while parsing the changed logic in 16d9bfc.

-                }
+                } // Otherwise, keep the selected addr if we have a tried
+                  // collision we're not already connected to.

@sipa
Copy link
Member

sipa commented Oct 28, 2020

I haven't looked at the code here, but just tried to verify the test-before-evict bug you mention. I believe it is indeed there, and will result in the conncted node being forgotten. Consider the following scenario:

  • Addrman has a tried entry A and a new entry B that would collide with A if moved to the tried table. There is a (long term) ongoining outbound connection to A currently.
  • An outbound connection to B succeeds (VERSION is received) as a non-block-only connection, calling Good(B). Due to the collision, B is not marked tried, but instead recorded in the m_tried_collisions set.
  • ConnectionType::FEELER connections during this state have a high chance of returning A from SelectTriedCollision() now. However, OpenNetworkConnection on those will fail, as an existing connection to A exists. No addrman state changes result from this (neither nLastSuccess or nLastTry are updated).
  • The first time ResolveCollisions is called at least 4 hours after the connection to A was opened, and after successfully connecting to B, it will see all connections to A as over ADDRMAN_REPLACEMENT_HOURS old. Assuming the last connection to B at this point is also at least 40 minutes ago (which seems likely), B will take A's place in the tried table.

Copy link
Member

@sipa sipa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 16d9bfc


// SelectTriedCollision returns an invalid address if it is empty.
if (!fFeeler || !addr.IsValid()) {
addr = addrman.Select(fFeeler);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very happy to see this rewritten. The logic where we always called SelectTriedCollision and then throw out the result if !fFeeler was very confusing (and inefficient...).

@amitiuttarwar
Copy link
Contributor

amitiuttarwar commented Oct 29, 2020

approach ACK - I've read through the code, understand the proposed changes, and think they make sense.

the update to Connected looks good & I agree that the test-before-evict logic was bugged & this patch fixes it. I also appreciate the refactoring to ThreadOpenConnections, I've looked at that code before but wasn't able to fully retain the pieces to understand the big picture, but with this patch its both fixed and makes sense to me now.

only thing holding me back from a code review ACK is wanting to better understand the full implications of Good to verify that there are no other problems by calling it for block-relay-only connections in both places. I think we are good 😏 but I'm going to continue digging to fully convince myself.

also- the travis failure is bizarre ! https://travis-ci.org/github/bitcoin/bitcoin/jobs/739321630
no error, no logs, nothing. doesn't even look like a timeout!

update: its been restarted

Copy link
Member

@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review ACK 16d9bfc.

AFAIU this privacy fix it's a worthy one as it masks the addr of our hidden peers among the wider set of all our known addrs. After this PR, an adversary observing our addr-relay traffic shouldn't be able to dissociate them.

@@ -854,7 +855,8 @@ void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
if (state->fSyncStarted)
nSyncStarted--;

if (misbehavior == 0 && state->fCurrentlyConnected) {
if (misbehavior == 0 && state->fCurrentlyConnected && !node.IsBlockOnlyConn()) {
// Note: we avoid changing visible addrman state for block-relay-only peers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would mention the reference to anchors you made in PR description. Otherwise, I don't think this change is that much valuable if we had only non-persistent block-relay peers. I don't see how learning a past connection would significantly help an attacker to disrupt them further in the future, CAddrInfo.nTime isn't considered by AddrMan::Select ?

Note, maybe we should rename AddrMan::Connected to HasBeenConnected, current comment "Mark an entry as currently-connected to" is confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would mention the reference to anchors you made in PR description. Otherwise, I don't think this change is that much valuable if we had only non-persistent block-relay peers. I don't see how learning a past connection would significantly help an attacker to disrupt them further in the future, CAddrInfo.nTime isn't considered by AddrMan::Select ?

Perhaps it doesn't, but I think if we left addrman truly unchanged by block-relay-only peers then that would definitely mean that we can't leak topology information via addrman. I reversed my view on whether we should call Good() for these peers after realizing we didn't have machinery in place to prevent addrman eviction and it seemed like that could happen too easily (and actually more easily for long-lived block-relay-only peer connections, because their timestamps could never update), but if we did come up with an auxiliary data structure for remembering block-relay-only peers (perhaps, just by making the anchor connections data structure more robust to temporarily losing connectivity) then I might argue again that we should switch the way we do things in the future to try to make these connections not interact with addrman whatsoever. You could imagine for instance having a separate, smaller "block-relay-addrman" that samples from the main addrman whenever it needs more addresses but otherwise has its own eviction rules and never shares its state through any externally probable interface.

So philosophically I still think no interaction should be our default way of thinking, and an exception to that is being proposed in this PR because I don't have a better alternative right now. But I would hate for someone to later argue that it's somehow safe to make these addrman calls because we don't know of an information leak -- the point is to isolate this part of the p2p protocol so there can't be an information leak, even one we haven't thought of. (Leaving this call to Connected() on these peer addresses in the first place was a big oversight in #15759, in my opinion.)

Copy link
Member

@ariard ariard Nov 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So philosophically I still think no interaction should be our default way of thinking, and an exception to that is being proposed in this PR because I don't have a better alternative right now. But I would hate for someone to later argue that it's somehow safe to make these addrman calls because we don't know of an information leak

Right, I concede fixing the information leak in itself is valuable even if it's not currently exploitable by an attacker to interfere with our non-persistent block-relay peer selection. This one might change in the future and turn this leak as a more exploitable, without the timestamp relation being recollected.

I agree it's a more temporary fix waiting to remove timestamps from addr messages as discussed during last p2p meeting. Going further, even assuming a separate, smaller "block-relay-addrman", I'm still concerned about the block-relay peers not participating in addr-relay compared to other peers, this fact which might observable in itself to break the hiddeness property of block-relay peers. E.g I was thinking about an adversary tweaking nServices to create a unique fingerprint and observe the absence of propagation on a presumed block-relay link to victim node. I guess isolating addr-relay on its own would make us all more confident, though it's a more complex, long-term change.

src/net.cpp Show resolved Hide resolved
Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code-Review ACK 16d9bfc.

I'm not entirely convinced that calling Good for blocks-only peers leaks strictly no information at all (e.g. it updates internal state such as nLastSuccess, which influence IsTerrible(), which influences GetAddr Response selection), but I couldn't think of a way to use that, especially after GetAddr caching (18991).

@@ -854,7 +855,8 @@ void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
if (state->fSyncStarted)
nSyncStarted--;

if (misbehavior == 0 && state->fCurrentlyConnected) {
if (misbehavior == 0 && state->fCurrentlyConnected && !node.IsBlockOnlyConn()) {
Copy link
Contributor

@mzumsande mzumsande Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line confused me into thinking that addrman::Connected() could also be called for inbound peers on disconnection - which would be not ideal, because then our peer might be leaking the information about our long-term blocks-only connections by updating nTime on their side of the connection.

However, if I read the code correctly, fCurrentlyConnected, despite its name, actually rather tracks whether it is an sucessfully connected outbound connection, so that this is not an issue (although keeping a flag for that does not seem necessary).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're exactly right. See #20187 (comment).

It'd be great to consolidate the pfrom.IsInboundConn() check here too. That could easily be done as a follow up, eliminating fCurrentlyConnected entirely (and using fSuccessfullyConnected to check that the version handshake is complete).

@sdaftuar
Copy link
Member Author

I'm not entirely convinced that calling Good for blocks-only peers leaks strictly no information at all (e.g. it updates internal state such as nLastSuccess, which influence IsTerrible(), which influences GetAddr Response selection), but I couldn't think of a way to use that, especially after GetAddr caching (18991).

@mzumsande I agree that it's unlikely we're leaking no information by calling Good(); my previous approach in #15759 was to try to avoid calling any addrman functions for block-relay-only peers, but my conclusion is that addrman-eviction is a worse outcome than the effects of calling Good() (which I estimate as reducing the anonymity set of our block-relay-only peers to the set of addresses in our tried table -- hopefully it's not meaningfully worse than that).

However if we were to structure the addrman differently in the future so that eviction isn't a concern (as I mention here #20187 (comment)), I might suggest that we reverse course again and avoid any interaction between block-relay-only peers and addrman at all.

Copy link
Contributor

@amitiuttarwar amitiuttarwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code review ACK 16d9bfc

  • agree with the prior conversation that we can't have 100% guarantee we're not leaking any information, but that not evicting the peers is more important. & I spent some time and can't figure out any concrete way to exploit any information leak, so I'm comfortable with these changes

thoughts for future improvements:

  • also saw some mentions along these lines, but I think the code around fUpdateConnectionTime, fCurrentlyConnected, and comment or function name of CAddrMan::Connected could be simplified and made easier to parse. Connected is only invoked when stopping or disconnecting the nodes, but the comment says "Mark an entry as currently-connected-to." fUpdateConnectionTime is used as an in/out param when FinalizeNode could just directly call Connected. but to reiterate, these are all tangential to this PR. just thoughts that came up as I was trying to understand relevant code paths.

src/net.cpp Show resolved Hide resolved
@fanquake fanquake merged commit 8387f83 into bitcoin:master Nov 3, 2020
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Nov 3, 2020
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Nov 3, 2020
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Dec 21, 2021
Summary:
PR description:
> This PR does two things:
>
>  - Block-relay-only interaction with addrman.
>
>      - Calling `CAddrMan::Connected()` on an address that was a block-relay-only peer causes the time we report in `addr` messages containing that peer to be updated; particularly now that we use anchor connections with a our block-relay-only peers, this risks leaking information about those peers. So, stop this.
>      - Avoiding calling `CAddrMan::Good()` on block-relay-only peer addresses causes the addrman logic around maintaining the new and tried table to be less good, and in particular makes it so that block-relay-only peer addresses are more likely to be evicted from the addrman (for no good reason I can think of). So, mark those addresses as good when we connect.
>
>  - Fix test-before-evict bug. There's a bug where if we get a collision in the tried table with an existing address that is one of our current peers, and the connection is long-lived enough, then `SelectTriedCollisions()` might return that existing peer address to us as a test-before-evict connection candidate. However, our logic for new outbound connections would later prevent us from actually making a connection; the result would be that when we get a collision with a long-lived current peer, that peer's address is likely to get evicted from the tried table. Fix this by checking to see if a test-before-evict candidate is a peer we're currently connected to, and if so, mark it as `Good()`.

Commit description:
> `Connected()` updates the time we serve in addr messages, so avoid leaking block-relay-only peer connections by avoiding these calls.

This is a backport of [[bitcoin/bitcoin#20187 | core#20187]] [1/4]
bitcoin/bitcoin@daf5553

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D10694
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Dec 21, 2021
Summary:
This is a backport of [[bitcoin/bitcoin#20187 | core#20187]] [2/4]
bitcoin/bitcoin@e8b215a

Depends on D10694

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D10695
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Dec 21, 2021
Summary:
Being able to invoke `Good()` is important for address management (new vs tried
table, tried table eviction via test-before-evict). We mitigate potential
information leaks by not calling `Connected()` on these peer addresses.

This is a backport of [[bitcoin/bitcoin#20187 | core#20187]] [3/4]
bitcoin/bitcoin@4fe338a

Depends on D10695

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D10696
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Dec 21, 2021
Summary:
Outbound peer logic prevents connecting to addresses that we're already
connected to, so prevent inadvertent eviction of current peers via
test-before-evict by checking this condition and marking current peer's
addresses as `Good()`.

Co-authored-by: John Newbery <john@johnnewbery.com>

This is a backport of [[bitcoin/bitcoin#20187 | core#20187]] [4/4]
bitcoin/bitcoin@16d9bfc

Depends on D10696

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D10697
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet