Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: Add new permission forceinbound to evict a random unprotected connection if all slots are otherwise full #27600

Closed
wants to merge 7 commits into from

Conversation

pinheadmz
Copy link
Member

@pinheadmz pinheadmz commented May 8, 2023

Closes #8798

Use case: I run a full node that accepts inbound connections and have a whitebind setting so my personal light client can always connect, even when maxconnections (and particularly all inbound slots) is already full.

Currently when connections are full, if we receive an inbound peer request, we look for a current connection to evict so the new peer can have a slot. To find an evict-able peer we go through all our peers and "protect" multiple categories of peers, then we evict the "worst" peer that is left unprotected. If there are no peers left to evict, the inbound connection is denied.

With this PR, if the inbound connection has forceinbound permission we start the eviction process by first protecting all noban and outbound connections, then selecting one of the remaining peers (if any) at random. Then we loop through all our current connections, removing protected peers from the evict-able list. If we end up protecting all our remaining connections, the randomly chosen peer is evicted.

forceinbound implies noban permission.

All outbound and noban connections remain protected from eviction.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 8, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK mzumsande
Stale ACK LarryRuane, stickies-v, willcl-ark

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #28463 (p2p: Increase inbound capacity for block-relay only connections by mzumsande)
  • #27581 (net: Continuous ASMap health check by fjahr)
  • #27114 (p2p: Allow whitelisting outgoing connections by brunoerg)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@pinheadmz pinheadmz changed the title Make peer eviction slightly more aggresive to make room for whitelisted inbound connections Allow inbound whitebind connections to more aggresivey evict peers when slots are full May 8, 2023
@DrahtBot DrahtBot changed the title Allow inbound whitebind connections to more aggresivey evict peers when slots are full net: Allow inbound whitebind connections to more aggresivey evict peers when slots are full May 9, 2023
@DrahtBot DrahtBot added the P2P label May 9, 2023
Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, makes sense to prioritize whitelisted peers. Will review more in-depth soon.

src/node/eviction.cpp Outdated Show resolved Hide resolved
@pinheadmz pinheadmz changed the title net: Allow inbound whitebind connections to more aggresivey evict peers when slots are full net: Allow inbound whitebind connections to more aggressively evict peers when slots are full May 12, 2023
src/node/eviction.cpp Outdated Show resolved Hide resolved
src/node/eviction.cpp Outdated Show resolved Hide resolved
src/node/eviction.cpp Outdated Show resolved Hide resolved
src/net.cpp Show resolved Hide resolved
src/node/eviction.cpp Outdated Show resolved Hide resolved
src/net.h Show resolved Hide resolved
Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

As I suggested in the review club, an alternative, more simple approach would be to just pick a random peer after removing NoBan and outbound connections when in force-mode. Then, if at the end of the usual eviction algorithm, we don't have a evict-able peer, we would evict the random one instead.
This would make the code simpler (no need to change EraseLastKElements or keep track of last), and I don't really see a major downside:

  • The EraseLastKElements eviction criteria don't really seem to be sorted by priority anyway right now (because order doesn't matter so far), so evicting a peer that would've been protected by later calls doesn't seem better than evicting one that would've been protected by earlier ones.
  • Whitebind No-Ban peers are more trusted to begin with, but if they maliciously wanted to take over all inbound slots by repeatedly connecting to us on the whitebind address, they can do that anyway, whether the eviction is random or whether the current approach of the PR is used. So it's not clear to me what the extra benefit of the current approach over random eviction is.

@pinheadmz
Copy link
Member Author

pinheadmz commented May 17, 2023

@stickies-v @mzumsande thanks for your review and feedback, I refactored the PR so we evict a random peer when forced if all other peers are protected. It is MUCH simpler ;-)

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Light approach ACK c826187

I think this random selection approach suggested by mzumsande is a more elegant approach. "Light" in approach ACK because I need to build more confidence that this doesn't introduce new attack vectors.

src/node/eviction.cpp Outdated Show resolved Hide resolved
src/node/eviction.cpp Outdated Show resolved Hide resolved
src/node/eviction.cpp Outdated Show resolved Hide resolved
src/node/eviction.cpp Outdated Show resolved Hide resolved
src/node/eviction.h Show resolved Hide resolved
src/net.h Outdated Show resolved Hide resolved
src/net.h Outdated Show resolved Hide resolved
@pinheadmz
Copy link
Member Author

Thanks @stickies-v nits addressed! 🙏

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 96b513f, but I think the first commit should be removed from this PR, I think it's an accident?

I think the code is good and suits the intent of the PR well. My only concern is about the potential footgun introduced, even if it is relatively mild. Anyone running bitcoind with whitebind=0.0.0.0:<port> will now be vulnerable to having all their inbound non-whitelisted slots taken over by an attacker that has figured out what the whitelisted port is. It doesn't seem like a huge concern, given that:

  • it only affects inbound connections, which we already assume to be more susceptible to these kinds of attacks, and
  • it does require the user to manually whitebind to 0.0.0.0

but I just wanted to flag it here anyway in case it is actually more serious than I see it to be.

Also, this seems like behaviour change that would benefit from a mention in the release notes?

Comment on lines 188 to 189
std::optional<NodeId> force_evict;
if (vEvictionCandidates.size() > 0 && force) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if vEvictionCandidates is empty, there's no point (I think, couldn't see any side effects in EraseLastKElements or ProtectEvictionCandidatesByRatio) executing the next lines so might as well return early? Unless you think this makes the code less clear?

Suggested change
std::optional<NodeId> force_evict;
if (vEvictionCandidates.size() > 0 && force) {
if (vEvictionCandidates.empty()) return std::nullopt;
std::optional<NodeId> force_evict;
if (force) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, although "if empty, return now" logic would make sense after each step in this function (why bother calling EraseLastKElements() with an empty array?) even without this PR. One thing we could do is add that logic to the beginning of EraseLastKElements() itself, but I dunno how much time is really wasted in there (sort, min, erase ... ?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had similar considerations to e.g. wrap the EraseLastKElements call in a lambda fn, but it's probably not worth the LoC change.

I think this case is slightly different in that it better highlights that if there are no inbound NoBan peers, the result is always std::nullopt.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, i'll add it. one line i think in this case will improve readability

@@ -122,6 +122,36 @@ def run_test(self):
self.log.debug("{} protected peers: {}".format(len(protected_peers), protected_peers))
assert evicted_peers[0] not in protected_peers

self.log.info("Test that whitebind inbounds get extra eviction power")
# Only allow 1 inbound connection, but set whitebind
self.restart_node(0, extra_args=['-maxconnections=12', '-whitebind=0.0.0.0:30201'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you mentioned in the review club, whitelisting just based on ports enables an attacker that discovered which port(s) are whitelisted to take over all (non-whitelisted) inbound connections slots.

Therefore, I think we have to be careful not to accidentally lead people to this footgun. Binding to 127.0.0.1 seems much more prudent to not set a bad example.

Suggested change
self.restart_node(0, extra_args=['-maxconnections=12', '-whitebind=0.0.0.0:30201'])
self.restart_node(0, extra_args=['-maxconnections=12', '-whitebind=127.0.0.1:30201'])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated test and also included warning in release note

pinheadmz added a commit to pinheadmz/bitcoin that referenced this pull request May 24, 2023
@pinheadmz pinheadmz requested a review from stickies-v May 24, 2023 19:14
permissions will now be more likely to connect even if `maxconnections` is
reached and the node's inbound slots are full. Because this is achieved by
more aggressively finding a current connection to evict, users should take
care only to set these permissions on local ports that attackers can not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "on local ports" is unnecessarily restrictive/scary. I think it's still completely fine to whitelist remote addresses, you mostly just want to avoid ranges, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mzumsande what's your opinion on this? Is this PR a big enough change to warrant this kind of warning?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure, I've never looked into these options much and don't know about best practices - I think that if you use -whitebind with non-local addresses, you'd at least have to make sure that that address is not self-advertised. I guess that this applies more to -whitebind than -whitelist, so that the preferred approach in case of a non-local connection would be to use -whitelist?
fyi @vasild, do you have an opinion on this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, whitelist is harder to attack than whitebind because the attacker would have to spoof their origin IP repeatedly to fill up your inbounds. If a user whitebind's a port and an attacker figures out that port numnber, they can trivially evict all your other inbounds

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with what you say above - whitebind on publicly accessible address:port with this new permission sounds bad. Similar with whitelist - if a range is used.

This PR currently expands the semantic of the noban permission. This will affect existent setups that already use it. Would it make sense to introduce a new permisson, separate from noban? I mean - now if somebody is running -whitebind=noban@publicaddr:port then a bad actor could cause harm on master, but even more harm with this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting idea, so we could add NetPermissionFlags::NoBanForce or something. The danger will still be present for users of the permission, but existing users of NoBan wouldn't have to worry. And since NoBan is a default of whitebind it'll require more user attention anyway to use the more dangerous option.

In that case, the release note should still warn the user about using NoBanForce (or whatever we call it) but that warning can just be general, like, keep an eye on your netinfo if you do this. As opposed to just recommending only setting local addresses with whitebind

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vasild

This PR currently expands the semantic of the noban permission

Could you please elaborate on this point? I thought that with this PR, noban nodes are still completely protected (they are removed from the eviction candidates list before the random node is chosen). But it's certain that I'm missing something. Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LarryRuane you are correct but what we are changing (in the current state of this branch) is that noban nodes can now force disconnection of other peers. Since many users may already have noban set, maybe even for a large range of IPs, this branch would introduce a new vulnerability they may not be aware of. Because of that I think it does make more sense to specify a new permission so users can narrow the attack surface

@LarryRuane
Copy link
Contributor

Please consider editing the description slightly, if my understanding is correct.

With this PR, if the inbound connection is on our whitelist we start the eviction process by selecting a random unprotected peer.

At this point in the process, where we choose a random peer, we're considering all (inbound, not noban) peers, is that correct? They are all (other than outbound or noban) "unprotected" because we haven't protected any of them yet. So maybe this should say, "... selecting a random peer" or "... selecting a random inbound peer" -- because we may randomly choose a peer that ends up being protected, or not protected. That would make this PR easier for me to understand.

I do think the updated algorithm is very elegant (great suggestion, @mzumsande). It chooses a random peer, which may end up being erased from the eviction candidate list, or may not, but we don't care either way; we evict it if we need to, all the better if it actually did not get protected (but, again, we don't care).

By the way, I'm going to highlight this PR in the Optech newsletter next week, that's why I'm particularly interested in making sure I understand it.

Copy link
Contributor

@LarryRuane LarryRuane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK fa78fc5
except for possible attack vectors (for example, NoBanForce) that I'm not really qualified to comment on. I may try to contribute to that discussion after I've learned more. Nice PR!

src/node/eviction.cpp Outdated Show resolved Hide resolved
permissions will now be more likely to connect even if `maxconnections` is
reached and the node's inbound slots are full. Because this is achieved by
more aggressively finding a current connection to evict, users should take
care only to set these permissions on local ports that attackers can not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vasild

This PR currently expands the semantic of the noban permission

Could you please elaborate on this point? I thought that with this PR, noban nodes are still completely protected (they are removed from the eviction candidates list before the random node is chosen). But it's certain that I'm missing something. Thanks.

@@ -204,7 +215,7 @@ void ProtectEvictionCandidatesByRatio(std::vector<NodeEvictionCandidate>& evicti
// or disadvantaged characteristics.
ProtectEvictionCandidatesByRatio(vEvictionCandidates);

if (vEvictionCandidates.empty()) return std::nullopt;
if (vEvictionCandidates.empty()) return force_evict;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps insert a comment before this line // This may still return std::nullopt

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest expanding this comment to explain what exactly happened.

*/
[[nodiscard]] std::optional<NodeId> SelectNodeToEvict(std::vector<NodeEvictionCandidate>&& vEvictionCandidates);
[[nodiscard]] std::optional<NodeId> SelectNodeToEvict(std::vector<NodeEvictionCandidate>&& vEvictionCandidates, bool force = false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of default arguments; consider making this not have a default. It makes sense if there are many existing calls to a function and you don't want to touch them all, but there's only one call to this function in production code. You would have to touch about 6 call sites in test code, but that's not too bad. The reason I'm not a fan of default arguments is that they hide something that may be helpful to see when just looking at the call site. (You might think, "Oh, it's possible that eviction can be 'forced', what does that mean?) But this is an optional suggestion, fine if you leave it as-is, just something to consider.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I like this, will do

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still remains default, no?

@DrahtBot DrahtBot requested a review from stickies-v June 10, 2023 18:20
@LarryRuane
Copy link
Contributor

Even though not needed for this PR, you may want to consider adding a commit to remove the unnecessary templating. It's perfectly reasonable not to make this change in this PR, but I thought I'd document it just in case; I do think it makes the code easier to understand.

--- a/src/node/eviction.cpp
+++ b/src/node/eviction.cpp
@@ -74,9 +74,10 @@ struct CompareNodeNetworkTime {
 };
 
 //! Sort an array by the specified comparator, then erase the last K elements where predicate is true.
-template <typename T, typename Comparator>
 static void EraseLastKElements(
-    std::vector<T>& elements, Comparator comparator, size_t k,
+    std::vector<NodeEvictionCandidate>& elements,
+    std::function<bool(const NodeEvictionCandidate&, const NodeEvictionCandidate&)> comparator,
+    size_t k,
     std::function<bool(const NodeEvictionCandidate&)> predicate = [](const NodeEvictionCandidate& n) { return true; })
 {
     std::sort(elements.begin(), elements.end(), comparator);

@pinheadmz
Copy link
Member Author

@LarryRuane thanks for reviewing. I actually had removed the templating in the very first version of this branch but after changing the approach I just left it alone. The function is written like a generic utility function, even though we don't currently use it for any other vectors. So I think the function could go as written into a util.cpp or something, in a future PR ?

@pinheadmz
Copy link
Member Author

Thanks for the review Gleb, I also rebased on master to fix conflicts

Copy link
Member

@naumenkogs naumenkogs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you missed the two comments:
one
two

@@ -226,6 +226,8 @@ class CNodeStats
TransportProtocolType m_transport_type;
/** BIP324 session id string in hex, if any. */
std::string m_session_id;
/** whether this peer forced its connection by evicting another */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8c20268

So forced actually means two things to you:

  1. Before connecting — something that might evict another peer
  2. After connecting — something that has evicted another peer

I think this is confusing and better to use different words. RPC can expose "force_evicted" flag (to cover the latter case and apply to the Connection), while "forceInbound" can stay in the network/potential-connection context (the former case).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, in the previous commit 7586802 you assign forced = true; even for non-forceInbound peers.

So basically if a node had 8 connections which are non-forceInbound but still did eviction, a real forceInbound connection won't be able to join (see how you count nForced). Is that intended? Seems like a bug to me.

luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Nov 10, 2023
Accomplished by adding a bool argument `force` to SelectNodeToEvict()

Github-Pull: bitcoin#27600
Rebased-From: 0c0f2a2
luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Nov 10, 2023
Only inbound nodes with this permission set will call
`SelectNodeToEvict()` with force=true, so when connections are full
there is an increased liklihood of opening a slot for the new inbound.
Extends NoBan permission.

Github-Pull: bitcoin#27600
Rebased-From: 99399b3
luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Nov 10, 2023
luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Nov 10, 2023
luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Nov 10, 2023
luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Nov 10, 2023
@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 6, 2023

🐙 This pull request conflicts with the target branch and needs rebase.

@sr-gi
Copy link
Member

sr-gi commented Jan 4, 2024

I feel like this makes sense conceptually, but I have similar concerns to @stickies-v @mzumsande and @naumenkogs with it.

If we could approach this without the need to introduce an additional permission I'd be happier, since it seems a big change for a narrow use case with a potential workaround by just accepting more than 38 total connections (plus N for the nobans). I also wonder how this plays out with #27114, given this would be an in only permission.

Given this only triggers under really specific conditions, can we not just prioritize our whitelisted peer under those conditions? The addition of new whitelisted peers can take priority over the ones we protect for "diversity criteria" as long as we do not have enough candidates to evict a peer, that is: after protecting the current noban and outbound peers, if the number of remaining peers is smaller than the number of peers we will protect for diversity criteria, we just pick one of the remaining at random and remove them. All this limited to a maximum as you are already doing with MAX_FORCED_INBOUND_CONNECTIONS.

For this to trigger, your number of noban connections needs to be too high, and/or, your max number of connections is too low. If we are concerned that this could backfire, as @vasild mentioned, we could even gatekeep it under a global flag so it does not affect any of the existing noban connections.

Retropex pushed a commit to Retropex/bitcoin that referenced this pull request Mar 28, 2024
Accomplished by adding a bool argument `force` to SelectNodeToEvict()

Github-Pull: bitcoin#27600
Rebased-From: 0c0f2a2
Retropex pushed a commit to Retropex/bitcoin that referenced this pull request Mar 28, 2024
Only inbound nodes with this permission set will call
`SelectNodeToEvict()` with force=true, so when connections are full
there is an increased liklihood of opening a slot for the new inbound.
Extends NoBan permission.

Github-Pull: bitcoin#27600
Rebased-From: 99399b3
Retropex pushed a commit to Retropex/bitcoin that referenced this pull request Mar 28, 2024
Retropex pushed a commit to Retropex/bitcoin that referenced this pull request Mar 28, 2024
Retropex pushed a commit to Retropex/bitcoin that referenced this pull request Mar 28, 2024
Retropex pushed a commit to Retropex/bitcoin that referenced this pull request Mar 28, 2024
@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 2, 2024

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@achow101 achow101 closed this Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

whiteconnections should be re-added