-
Notifications
You must be signed in to change notification settings - Fork 35.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net: Add new permission forceinbound
to evict a random unprotected connection if all slots are otherwise full
#27600
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, makes sense to prioritize whitelisted peers. Will review more in-depth soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
As I suggested in the review club, an alternative, more simple approach would be to just pick a random peer after removing NoBan and outbound connections when in force-mode. Then, if at the end of the usual eviction algorithm, we don't have a evict-able peer, we would evict the random one instead.
This would make the code simpler (no need to change EraseLastKElements
or keep track of last
), and I don't really see a major downside:
- The
EraseLastKElements
eviction criteria don't really seem to be sorted by priority anyway right now (because order doesn't matter so far), so evicting a peer that would've been protected by later calls doesn't seem better than evicting one that would've been protected by earlier ones. - Whitebind No-Ban peers are more trusted to begin with, but if they maliciously wanted to take over all inbound slots by repeatedly connecting to us on the whitebind address, they can do that anyway, whether the eviction is random or whether the current approach of the PR is used. So it's not clear to me what the extra benefit of the current approach over random eviction is.
e71d495
to
c826187
Compare
@stickies-v @mzumsande thanks for your review and feedback, I refactored the PR so we evict a random peer when forced if all other peers are protected. It is MUCH simpler ;-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Light approach ACK c826187
I think this random selection approach suggested by mzumsande is a more elegant approach. "Light" in approach ACK because I need to build more confidence that this doesn't introduce new attack vectors.
c826187
to
2ab1ed6
Compare
Thanks @stickies-v nits addressed! 🙏 |
2ab1ed6
to
96b513f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 96b513f, but I think the first commit should be removed from this PR, I think it's an accident?
I think the code is good and suits the intent of the PR well. My only concern is about the potential footgun introduced, even if it is relatively mild. Anyone running bitcoind with whitebind=0.0.0.0:<port>
will now be vulnerable to having all their inbound non-whitelisted slots taken over by an attacker that has figured out what the whitelisted port is. It doesn't seem like a huge concern, given that:
- it only affects inbound connections, which we already assume to be more susceptible to these kinds of attacks, and
- it does require the user to manually whitebind to
0.0.0.0
but I just wanted to flag it here anyway in case it is actually more serious than I see it to be.
Also, this seems like behaviour change that would benefit from a mention in the release notes?
src/node/eviction.cpp
Outdated
std::optional<NodeId> force_evict; | ||
if (vEvictionCandidates.size() > 0 && force) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if vEvictionCandidates
is empty, there's no point (I think, couldn't see any side effects in EraseLastKElements
or ProtectEvictionCandidatesByRatio
) executing the next lines so might as well return early? Unless you think this makes the code less clear?
std::optional<NodeId> force_evict; | |
if (vEvictionCandidates.size() > 0 && force) { | |
if (vEvictionCandidates.empty()) return std::nullopt; | |
std::optional<NodeId> force_evict; | |
if (force) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, although "if empty, return now" logic would make sense after each step in this function (why bother calling EraseLastKElements()
with an empty array?) even without this PR. One thing we could do is add that logic to the beginning of EraseLastKElements()
itself, but I dunno how much time is really wasted in there (sort
, min
, erase
... ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had similar considerations to e.g. wrap the EraseLastKElements
call in a lambda fn, but it's probably not worth the LoC change.
I think this case is slightly different in that it better highlights that if there are no inbound NoBan peers, the result is always std::nullopt
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, i'll add it. one line i think in this case will improve readability
test/functional/p2p_eviction.py
Outdated
@@ -122,6 +122,36 @@ def run_test(self): | |||
self.log.debug("{} protected peers: {}".format(len(protected_peers), protected_peers)) | |||
assert evicted_peers[0] not in protected_peers | |||
|
|||
self.log.info("Test that whitebind inbounds get extra eviction power") | |||
# Only allow 1 inbound connection, but set whitebind | |||
self.restart_node(0, extra_args=['-maxconnections=12', '-whitebind=0.0.0.0:30201']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you mentioned in the review club, whitelisting just based on ports enables an attacker that discovered which port(s) are whitelisted to take over all (non-whitelisted) inbound connections slots.
Therefore, I think we have to be careful not to accidentally lead people to this footgun. Binding to 127.0.0.1
seems much more prudent to not set a bad example.
self.restart_node(0, extra_args=['-maxconnections=12', '-whitebind=0.0.0.0:30201']) | |
self.restart_node(0, extra_args=['-maxconnections=12', '-whitebind=127.0.0.1:30201']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated test and also included warning in release note
96b513f
to
fa78fc5
Compare
doc/release-notes-27600.md
Outdated
permissions will now be more likely to connect even if `maxconnections` is | ||
reached and the node's inbound slots are full. Because this is achieved by | ||
more aggressively finding a current connection to evict, users should take | ||
care only to set these permissions on local ports that attackers can not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "on local ports" is unnecessarily restrictive/scary. I think it's still completely fine to whitelist remote addresses, you mostly just want to avoid ranges, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mzumsande what's your opinion on this? Is this PR a big enough change to warrant this kind of warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really sure, I've never looked into these options much and don't know about best practices - I think that if you use -whitebind
with non-local addresses, you'd at least have to make sure that that address is not self-advertised. I guess that this applies more to -whitebind
than -whitelist
, so that the preferred approach in case of a non-local connection would be to use -whitelist
?
fyi @vasild, do you have an opinion on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, whitelist
is harder to attack than whitebind
because the attacker would have to spoof their origin IP repeatedly to fill up your inbounds. If a user whitebind
's a port and an attacker figures out that port numnber, they can trivially evict all your other inbounds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with what you say above - whitebind
on publicly accessible address:port with this new permission sounds bad. Similar with whitelist
- if a range is used.
This PR currently expands the semantic of the noban
permission. This will affect existent setups that already use it. Would it make sense to introduce a new permisson, separate from noban
? I mean - now if somebody is running -whitebind=noban@publicaddr:port
then a bad actor could cause harm on master
, but even more harm with this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting idea, so we could add NetPermissionFlags::NoBanForce
or something. The danger will still be present for users of the permission, but existing users of NoBan
wouldn't have to worry. And since NoBan
is a default of whitebind
it'll require more user attention anyway to use the more dangerous option.
In that case, the release note should still warn the user about using NoBanForce
(or whatever we call it) but that warning can just be general, like, keep an eye on your netinfo if you do this. As opposed to just recommending only setting local addresses with whitebind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR currently expands the semantic of the
noban
permission
Could you please elaborate on this point? I thought that with this PR, noban
nodes are still completely protected (they are removed from the eviction candidates list before the random node is chosen). But it's certain that I'm missing something. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LarryRuane you are correct but what we are changing (in the current state of this branch) is that noban
nodes can now force disconnection of other peers. Since many users may already have noban
set, maybe even for a large range of IPs, this branch would introduce a new vulnerability they may not be aware of. Because of that I think it does make more sense to specify a new permission so users can narrow the attack surface
Please consider editing the description slightly, if my understanding is correct.
At this point in the process, where we choose a random peer, we're considering all (inbound, not noban) peers, is that correct? They are all (other than outbound or noban) "unprotected" because we haven't protected any of them yet. So maybe this should say, "... selecting a random peer" or "... selecting a random inbound peer" -- because we may randomly choose a peer that ends up being protected, or not protected. That would make this PR easier for me to understand. I do think the updated algorithm is very elegant (great suggestion, @mzumsande). It chooses a random peer, which may end up being erased from the eviction candidate list, or may not, but we don't care either way; we evict it if we need to, all the better if it actually did not get protected (but, again, we don't care). By the way, I'm going to highlight this PR in the Optech newsletter next week, that's why I'm particularly interested in making sure I understand it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK fa78fc5
except for possible attack vectors (for example, NoBanForce
) that I'm not really qualified to comment on. I may try to contribute to that discussion after I've learned more. Nice PR!
doc/release-notes-27600.md
Outdated
permissions will now be more likely to connect even if `maxconnections` is | ||
reached and the node's inbound slots are full. Because this is achieved by | ||
more aggressively finding a current connection to evict, users should take | ||
care only to set these permissions on local ports that attackers can not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR currently expands the semantic of the
noban
permission
Could you please elaborate on this point? I thought that with this PR, noban
nodes are still completely protected (they are removed from the eviction candidates list before the random node is chosen). But it's certain that I'm missing something. Thanks.
@@ -204,7 +215,7 @@ void ProtectEvictionCandidatesByRatio(std::vector<NodeEvictionCandidate>& evicti | |||
// or disadvantaged characteristics. | |||
ProtectEvictionCandidatesByRatio(vEvictionCandidates); | |||
|
|||
if (vEvictionCandidates.empty()) return std::nullopt; | |||
if (vEvictionCandidates.empty()) return force_evict; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps insert a comment before this line // This may still return std::nullopt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest expanding this comment to explain what exactly happened.
*/ | ||
[[nodiscard]] std::optional<NodeId> SelectNodeToEvict(std::vector<NodeEvictionCandidate>&& vEvictionCandidates); | ||
[[nodiscard]] std::optional<NodeId> SelectNodeToEvict(std::vector<NodeEvictionCandidate>&& vEvictionCandidates, bool force = false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of default arguments; consider making this not have a default. It makes sense if there are many existing calls to a function and you don't want to touch them all, but there's only one call to this function in production code. You would have to touch about 6 call sites in test code, but that's not too bad. The reason I'm not a fan of default arguments is that they hide something that may be helpful to see when just looking at the call site. (You might think, "Oh, it's possible that eviction can be 'forced', what does that mean?) But this is an optional suggestion, fine if you leave it as-is, just something to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I like this, will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still remains default, no?
Even though not needed for this PR, you may want to consider adding a commit to remove the unnecessary templating. It's perfectly reasonable not to make this change in this PR, but I thought I'd document it just in case; I do think it makes the code easier to understand.
|
@LarryRuane thanks for reviewing. I actually had removed the templating in the very first version of this branch but after changing the approach I just left it alone. The function is written like a generic utility function, even though we don't currently use it for any other vectors. So I think the function could go as written into a util.cpp or something, in a future PR ? |
311902f
to
115b728
Compare
Thanks for the review Gleb, I also rebased on master to fix conflicts |
115b728
to
8c20268
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -226,6 +226,8 @@ class CNodeStats | |||
TransportProtocolType m_transport_type; | |||
/** BIP324 session id string in hex, if any. */ | |||
std::string m_session_id; | |||
/** whether this peer forced its connection by evicting another */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So forced actually means two things to you:
- Before connecting — something that might evict another peer
- After connecting — something that has evicted another peer
I think this is confusing and better to use different words. RPC can expose "force_evicted" flag (to cover the latter case and apply to the Connection
), while "forceInbound" can stay in the network/potential-connection context (the former case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently, in the previous commit 7586802 you assign forced = true;
even for non-forceInbound
peers.
So basically if a node had 8 connections which are non-forceInbound
but still did eviction, a real forceInbound
connection won't be able to join (see how you count nForced). Is that intended? Seems like a bug to me.
Accomplished by adding a bool argument `force` to SelectNodeToEvict() Github-Pull: bitcoin#27600 Rebased-From: 0c0f2a2
Only inbound nodes with this permission set will call `SelectNodeToEvict()` with force=true, so when connections are full there is an increased liklihood of opening a slot for the new inbound. Extends NoBan permission. Github-Pull: bitcoin#27600 Rebased-From: 99399b3
Github-Pull: bitcoin#27600 Rebased-From: 8bc2030
… full Github-Pull: bitcoin#27600 Rebased-From: 6b6bcaf
Github-Pull: bitcoin#27600 Rebased-From: 7586802
Github-Pull: bitcoin#27600 Rebased-From: 8c20268
🐙 This pull request conflicts with the target branch and needs rebase. |
I feel like this makes sense conceptually, but I have similar concerns to @stickies-v @mzumsande and @naumenkogs with it. If we could approach this without the need to introduce an additional permission I'd be happier, since it seems a big change for a narrow use case with a potential workaround by just accepting more than 38 total connections (plus Given this only triggers under really specific conditions, can we not just prioritize our whitelisted peer under those conditions? The addition of new whitelisted peers can take priority over the ones we protect for "diversity criteria" as long as we do not have enough candidates to evict a peer, that is: after protecting the current For this to trigger, your number of |
Accomplished by adding a bool argument `force` to SelectNodeToEvict() Github-Pull: bitcoin#27600 Rebased-From: 0c0f2a2
Only inbound nodes with this permission set will call `SelectNodeToEvict()` with force=true, so when connections are full there is an increased liklihood of opening a slot for the new inbound. Extends NoBan permission. Github-Pull: bitcoin#27600 Rebased-From: 99399b3
Github-Pull: bitcoin#27600 Rebased-From: 8bc2030
… full Github-Pull: bitcoin#27600 Rebased-From: 6b6bcaf
Github-Pull: bitcoin#27600 Rebased-From: 7586802
Github-Pull: bitcoin#27600 Rebased-From: 8c20268
⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
Closes #8798
Use case: I run a full node that accepts inbound connections and have a
whitebind
setting so my personal light client can always connect, even whenmaxconnections
(and particularly all inbound slots) is already full.Currently when connections are full, if we receive an inbound peer request, we look for a current connection to evict so the new peer can have a slot. To find an evict-able peer we go through all our peers and "protect" multiple categories of peers, then we evict the "worst" peer that is left unprotected. If there are no peers left to evict, the inbound connection is denied.
With this PR, if the inbound connection has
forceinbound
permission we start the eviction process by first protecting allnoban
and outbound connections, then selecting one of the remaining peers (if any) at random. Then we loop through all our current connections, removing protected peers from the evict-able list. If we end up protecting all our remaining connections, the randomly chosen peer is evicted.forceinbound
impliesnoban
permission.All outbound and
noban
connections remain protected from eviction.