Connection slot exhaustion DoS mitigation #6374

Merged
merged 15 commits into from Sep 3, 2015

Conversation

Projects
None yet
7 participants
@pstratem
Contributor

pstratem commented Jul 4, 2015

Connection slots are a limited resource which can be the target of DoS attacks.

This issue was introduced in 2011 by 5a3e82f.

In mitigating this issue it is important to take steps to avoid network partitioning.

I have taken the approach of protecting connections with various properties from eviction.

Of the nodes still available for eviction the most recently connected node from the CNetAddr with the most connections is selected and evicted.

The largest class of protected connections is those which have been connected for the longest time.
The intent is to maintain the strong bias towards these connections which exists today and thus avoid any risk of network partition.

@laanwj laanwj added the P2P label Jul 5, 2015

@pstratem

This comment has been minimized.

Show comment
Hide comment
@pstratem

pstratem Jul 5, 2015

Contributor

This has been tested by setting up a node and then connected to it from the same source ip in a loop.

Additional testing is needed around multiple source ips.

Contributor

pstratem commented Jul 5, 2015

This has been tested by setting up a node and then connected to it from the same source ip in a loop.

Additional testing is needed around multiple source ips.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Jul 5, 2015

Member

Concept ACK

This does need extensive testing in various scenarios - e.g. what happens with Tor hidden service connections, which all appear to come from one IP (localhost) address.

Member

laanwj commented Jul 5, 2015

Concept ACK

This does need extensive testing in various scenarios - e.g. what happens with Tor hidden service connections, which all appear to come from one IP (localhost) address.

@pstratem

This comment has been minimized.

Show comment
Hide comment
@pstratem

pstratem Jul 5, 2015

Contributor

Tested with multiple inbound connections from 128+ source ips.

Long lived connections were stable and the newer connections dropped.

Contributor

pstratem commented Jul 5, 2015

Tested with multiple inbound connections from 128+ source ips.

Long lived connections were stable and the newer connections dropped.

@petertodd

This comment has been minimized.

Show comment
Hide comment
@petertodd

petertodd Jul 6, 2015

Contributor

@pstratem How did you actually test that?

We could make good use of automated scripts to make such testing relocatable.

Contributor

petertodd commented Jul 6, 2015

@pstratem How did you actually test that?

We could make good use of automated scripts to make such testing relocatable.

@pstratem

This comment has been minimized.

Show comment
Hide comment
@pstratem

pstratem Jul 6, 2015

Contributor

@petertodd python script that connects to the node in a loop running on top of torify and me sitting there hitting "new identity" in vidalia a bunch until i had unique ips connecting...

not exactly an automated process

Contributor

pstratem commented Jul 6, 2015

@petertodd python script that connects to the node in a loop running on top of torify and me sitting there hitting "new identity" in vidalia a bunch until i had unique ips connecting...

not exactly an automated process

@pstratem

This comment has been minimized.

Show comment
Hide comment
@pstratem

pstratem Jul 12, 2015

Contributor

@laanwj Missed the second part of your comment.

This wont ever evict localhost connections, so inbound connections to a hidden service wont ever be disconnected by this.

That's actually not optimal, but unfortunately getting info on inbound hidden service connections requires interfacing with tors control port.

That's definitely out of scope for this patch set.

Contributor

pstratem commented Jul 12, 2015

@laanwj Missed the second part of your comment.

This wont ever evict localhost connections, so inbound connections to a hidden service wont ever be disconnected by this.

That's actually not optimal, but unfortunately getting info on inbound hidden service connections requires interfacing with tors control port.

That's definitely out of scope for this patch set.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Jul 28, 2015

Member

@pstratem Absolutely - I wasn't implying that you'd have to interact with Tor's control port in this pull, just that it's a requirement that it didn't make the current situation worse.

Needs a trivial rebase in net.cpp due to #5288.

Member

laanwj commented Jul 28, 2015

@pstratem Absolutely - I wasn't implying that you'd have to interact with Tor's control port in this pull, just that it's a requirement that it didn't make the current situation worse.

Needs a trivial rebase in net.cpp due to #5288.

@laanwj

View changes

src/net.cpp
CloseSocket(hSocket);
+ delete pnode;

This comment has been minimized.

@laanwj

laanwj Aug 3, 2015

Member

Deleting CNodes happens in the "Delete disconnected nodes" loop, which first makes sure that no one is using the node anymore. Adding a delete pnode here seems like a danger for race conditions?

@laanwj

laanwj Aug 3, 2015

Member

Deleting CNodes happens in the "Delete disconnected nodes" loop, which first makes sure that no one is using the node anymore. Adding a delete pnode here seems like a danger for race conditions?

This comment has been minimized.

@pstratem

pstratem Aug 3, 2015

Contributor

This is deleting the CNode which was created at 984 and which was not added to vNodes.

Not deleting it here would be a memory leak.

@pstratem

pstratem Aug 3, 2015

Contributor

This is deleting the CNode which was created at 984 and which was not added to vNodes.

Not deleting it here would be a memory leak.

This comment has been minimized.

@laanwj

laanwj Aug 7, 2015

Member

Bah.
I wonder if we could use e.g. boost::scoped_ptr or auto_ptr to avoid this. It's too easy to get memory leaks with manual deallocation along a subset of code paths, especially when factoring in exceptions.

@laanwj

laanwj Aug 7, 2015

Member

Bah.
I wonder if we could use e.g. boost::scoped_ptr or auto_ptr to avoid this. It's too easy to get memory leaks with manual deallocation along a subset of code paths, especially when factoring in exceptions.

This comment has been minimized.

@pstratem

pstratem Aug 7, 2015

Contributor

Alternatively I could just add it to vNodes and rely on the normal cleanup logic.

@pstratem

pstratem Aug 7, 2015

Contributor

Alternatively I could just add it to vNodes and rely on the normal cleanup logic.

@TheBlueMatt

This comment has been minimized.

Show comment
Hide comment
@TheBlueMatt

TheBlueMatt Aug 13, 2015

Contributor

Spoke with @pstratem about this for a bit and we agreed this should be refactored so that the connection acceptance stuff is in a separate function. Also after this change the nWhiteConnections variable is set in init but goes unused.

Contributor

TheBlueMatt commented Aug 13, 2015

Spoke with @pstratem about this for a bit and we agreed this should be refactored so that the connection acceptance stuff is in a separate function. Also after this change the nWhiteConnections variable is set in init but goes unused.

@pstratem

This comment has been minimized.

Show comment
Hide comment
@pstratem

pstratem Aug 14, 2015

Contributor

@TheBlueMatt I've simply removed the whiteconnections option in favor of protecting any inbound whitelisted connection. This is safe as outbound connections are never disconnected.

Contributor

pstratem commented Aug 14, 2015

@TheBlueMatt I've simply removed the whiteconnections option in favor of protecting any inbound whitelisted connection. This is safe as outbound connections are never disconnected.

@Diapolo

View changes

src/net.cpp
+ BOOST_FOREACH(CNode *node, vEvictionCandidates) {
+ mapAddrCounts[node->addr].push_back(node);
+
+ if (mapAddrCounts[node->addr].size() > nMostConnections) {

This comment has been minimized.

@Diapolo

Diapolo Aug 14, 2015

size_t to unsigned int is safe? You could use size_t for nMostConnections also.

@Diapolo

Diapolo Aug 14, 2015

size_t to unsigned int is safe? You could use size_t for nMostConnections also.

This comment has been minimized.

@pstratem

pstratem Aug 14, 2015

Contributor

This is almost certainly safe because the values are heavily constrained, however it's not an issue to switch to size_t so I will

@pstratem

pstratem Aug 14, 2015

Contributor

This is almost certainly safe because the values are heavily constrained, however it's not an issue to switch to size_t so I will

This comment has been minimized.

@dcousens

dcousens Aug 15, 2015

Contributor

@Diapolo aren't there warnings for this?

@dcousens

dcousens Aug 15, 2015

Contributor

@Diapolo aren't there warnings for this?

This comment has been minimized.

@pstratem

pstratem Aug 15, 2015

Contributor

@dcousens they're both unsigned, so no

@pstratem

pstratem Aug 15, 2015

Contributor

@dcousens they're both unsigned, so no

@Diapolo

View changes

src/net.cpp
+ }
+ }
+
+ // Protect connections with certain characteristics

This comment has been minimized.

@Diapolo

Diapolo Aug 14, 2015

This should be a more detailed comment IMHO.

@Diapolo

Diapolo Aug 14, 2015

This should be a more detailed comment IMHO.

This comment has been minimized.

@TheBlueMatt

TheBlueMatt Aug 20, 2015

Contributor

Yes, please add a comment above each sort/erase block that explains what/why you are protecting that group.

@TheBlueMatt

TheBlueMatt Aug 20, 2015

Contributor

Yes, please add a comment above each sort/erase block that explains what/why you are protecting that group.

This comment has been minimized.

@pstratem

pstratem Aug 20, 2015

Contributor

Comments added

@pstratem

pstratem Aug 20, 2015

Contributor

Comments added

+ if (!addr.SetSockAddr((const struct sockaddr*)&sockaddr))
+ LogPrintf("Warning: Unknown socket family\n");
+
+ bool whitelisted = hListenSocket.whitelisted || CNode::IsWhitelistedRange(addr);

This comment has been minimized.

@Diapolo

Diapolo Aug 14, 2015

You seem to use this bool first at line 916, why not move the check there? At first look I had the impression it belongs to the LOCK block.

@Diapolo

Diapolo Aug 14, 2015

You seem to use this bool first at line 916, why not move the check there? At first look I had the impression it belongs to the LOCK block.

This comment has been minimized.

@dcousens

dcousens Aug 14, 2015

Contributor

I also had this impression, though that is a common theme with several of the variables (nInbound, nMaxInbound). It'd be nice to see them declared where they are used, rather than C99 style at the top of the function.

edit: Though to be fair, this code is simply copied, so this is not @pstratem's fault.

@dcousens

dcousens Aug 14, 2015

Contributor

I also had this impression, though that is a common theme with several of the variables (nInbound, nMaxInbound). It'd be nice to see them declared where they are used, rather than C99 style at the top of the function.

edit: Though to be fair, this code is simply copied, so this is not @pstratem's fault.

This comment has been minimized.

@pstratem

pstratem Aug 14, 2015

Contributor

This is all copied directly from what was there before.

I can clean up the logic a bit more later, but for now it's nice that the diff is minimal.

@pstratem

pstratem Aug 14, 2015

Contributor

This is all copied directly from what was there before.

I can clean up the logic a bit more later, but for now it's nice that the diff is minimal.

This comment has been minimized.

@dcousens

dcousens Aug 15, 2015

Contributor

Agreed on the clean diff, @pstratem could you put a TODO in there (or make an issue?) though such that this isn't just forgotten after this though?

@dcousens

dcousens Aug 15, 2015

Contributor

Agreed on the clean diff, @pstratem could you put a TODO in there (or make an issue?) though such that this isn't just forgotten after this though?

+ nInbound++;
+ }
+
+ if (hSocket == INVALID_SOCKET)

This comment has been minimized.

@Diapolo

Diapolo Aug 14, 2015

Is this the check for the accept call? Would be cleaner if it was below the call. You also wouldn't need the hSocket != INVALID_SOCKET check at the top.

@Diapolo

Diapolo Aug 14, 2015

Is this the check for the accept call? Would be cleaner if it was below the call. You also wouldn't need the hSocket != INVALID_SOCKET check at the top.

This comment has been minimized.

@pstratem

pstratem Aug 14, 2015

Contributor

I agree, but I think that's for a separate pull request.

@pstratem

pstratem Aug 14, 2015

Contributor

I agree, but I think that's for a separate pull request.

This comment has been minimized.

@dcousens

dcousens Aug 15, 2015

Contributor

Do we file these as issues or TODOs?

@dcousens

dcousens Aug 15, 2015

Contributor

Do we file these as issues or TODOs?

This comment has been minimized.

@pstratem

pstratem Aug 15, 2015

Contributor

I would say file them as a single issue. something like "clean up logic/code in AcceptConnection"

@pstratem

pstratem Aug 15, 2015

Contributor

I would say file them as a single issue. something like "clean up logic/code in AcceptConnection"

@@ -4522,6 +4522,7 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv,
if (pingUsecTime > 0) {
// Successful ping time measurement, replace previous
pfrom->nPingUsecTime = pingUsecTime;
+ pfrom->nMinPingUsecTime = std::min(pfrom->nMinPingUsecTime, pingUsecTime);

This comment has been minimized.

@Diapolo

Diapolo Aug 14, 2015

Is this interresting for getpeerinfo?

@Diapolo

Diapolo Aug 14, 2015

Is this interresting for getpeerinfo?

This comment has been minimized.

@pstratem

pstratem Aug 14, 2015

Contributor

Probably, but again separate pr

@pstratem

pstratem Aug 14, 2015

Contributor

Probably, but again separate pr

@dcousens

This comment has been minimized.

Show comment
Hide comment
@dcousens

dcousens Aug 14, 2015

Contributor

Concept ACK

Contributor

dcousens commented Aug 14, 2015

Concept ACK

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Aug 17, 2015

Member

I think i'm missing a thing: how does this change make whiteconnections redundant? There is no information in the commit message.

Member

laanwj commented Aug 17, 2015

I think i'm missing a thing: how does this change make whiteconnections redundant? There is no information in the commit message.

@pstratem

This comment has been minimized.

Show comment
Hide comment
@pstratem

pstratem Aug 19, 2015

Contributor

@laanwj All whitelisted connections are protected. Which makes a parameter for protecting a specific number of whitelisted connections redundant.

Contributor

pstratem commented Aug 19, 2015

@laanwj All whitelisted connections are protected. Which makes a parameter for protecting a specific number of whitelisted connections redundant.

@TheBlueMatt

View changes

src/net.cpp
+ continue;
+ if (node->addr.IsLocal())
+ continue;
+ vEvictionCandidates.push_back(node);

This comment has been minimized.

@TheBlueMatt

TheBlueMatt Aug 20, 2015

Contributor

Dont you need to increment the use count on the node here?

@TheBlueMatt

TheBlueMatt Aug 20, 2015

Contributor

Dont you need to increment the use count on the node here?

@TheBlueMatt

View changes

src/net.cpp
+ // Protect connections with certain characteristics
+ static CompareNetGroupKeyed comparerNetGroupKeyed;
+ std::sort(vEvictionCandidates.begin(), vEvictionCandidates.end(), comparerNetGroupKeyed);
+ vEvictionCandidates.erase(vEvictionCandidates.end() - std::min(4, static_cast<int>(vEvictionCandidates.size())), vEvictionCandidates.end());

This comment has been minimized.

@TheBlueMatt

TheBlueMatt Aug 20, 2015

Contributor

These constants (4, 8, 64) really need to be a function of your configured maximum connections.

@TheBlueMatt

TheBlueMatt Aug 20, 2015

Contributor

These constants (4, 8, 64) really need to be a function of your configured maximum connections.

@TheBlueMatt

View changes

src/net.cpp
+ // Protect the 64 nodes which have been connected the longest.
+ // This replicates the existing implicit behavior.
+ std::sort(vEvictionCandidates.begin(), vEvictionCandidates.end(), ReverseCompareNodeTimeConnected);
+ vEvictionCandidates.erase(vEvictionCandidates.end() - std::min(64, static_cast<int>(vEvictionCandidates.size())), vEvictionCandidates.end());

This comment has been minimized.

@TheBlueMatt

TheBlueMatt Aug 21, 2015

Contributor

Seems my comment was lost in a rebase, can you change the constants (4, 8, 64) to be some multiple of max connections?

@TheBlueMatt

TheBlueMatt Aug 21, 2015

Contributor

Seems my comment was lost in a rebase, can you change the constants (4, 8, 64) to be some multiple of max connections?

This comment has been minimized.

@pstratem

pstratem Aug 21, 2015

Contributor

Hmm not sure that makes sense actually.

The goal is to prevent a sybil attack, constants are probably best for that.

@pstratem

pstratem Aug 21, 2015

Contributor

Hmm not sure that makes sense actually.

The goal is to prevent a sybil attack, constants are probably best for that.

@TheBlueMatt

This comment has been minimized.

Show comment
Hide comment
@TheBlueMatt

TheBlueMatt Aug 24, 2015

Contributor

utACK (may want to squash a few of the last commits, but doesnt matter).

Contributor

TheBlueMatt commented Aug 24, 2015

utACK (may want to squash a few of the last commits, but doesnt matter).

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Aug 25, 2015

Member

@pstratem Fair enough.

Member

laanwj commented Aug 25, 2015

@pstratem Fair enough.

@laanwj

View changes

src/net.cpp
+ ~CNodeRef() {_pnode->Release();}
+
+ CNode& operator *() const {return *_pnode;};
+ CNode* operator ->() const {return _pnode;};

This comment has been minimized.

@laanwj

laanwj Aug 25, 2015

Member

This class needs an explicit copy constructor and assignment operator to be safe to use inside STL containers:

    CNodeRef& operator =(const CNodeRef& other)
    {
        if (this != &other) {
            _pnode->Release();
            _pnode = other._pnode;
            _pnode->AddRef();
        }
        return *this;
    }
    CNodeRef(const CNodeRef& other):
        _pnode(other._pnode)
    {
        _pnode->AddRef();
    }

Otherwise, copying it won't properly update reference counts.

@laanwj

laanwj Aug 25, 2015

Member

This class needs an explicit copy constructor and assignment operator to be safe to use inside STL containers:

    CNodeRef& operator =(const CNodeRef& other)
    {
        if (this != &other) {
            _pnode->Release();
            _pnode = other._pnode;
            _pnode->AddRef();
        }
        return *this;
    }
    CNodeRef(const CNodeRef& other):
        _pnode(other._pnode)
    {
        _pnode->AddRef();
    }

Otherwise, copying it won't properly update reference counts.

@pstratem

This comment has been minimized.

Show comment
Hide comment
@pstratem

pstratem Aug 25, 2015

Contributor

@sipa you're commenting on a commit from ~13 days ago, most of those issues have been fixed since

Contributor

pstratem commented Aug 25, 2015

@sipa you're commenting on a commit from ~13 days ago, most of those issues have been fixed since

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Aug 25, 2015

Member

@pstratem I'm going through the commits one by one.

Member

sipa commented Aug 25, 2015

@pstratem I'm going through the commits one by one.

@pstratem

This comment has been minimized.

Show comment
Hide comment
@pstratem

pstratem Aug 25, 2015

Contributor

@sipa ah ok then

Contributor

pstratem commented Aug 25, 2015

@sipa ah ok then

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Aug 25, 2015

Member

Concept ACK. I think that calling AddRef/Release without holding cs_vNodes should not be done.

I think the biases can be improved still - for example by computing a score per node based on ping time, and then penalizing the scores of nodes from the same netgroup if there are multiple. But that can be done later.

Member

sipa commented Aug 25, 2015

Concept ACK. I think that calling AddRef/Release without holding cs_vNodes should not be done.

I think the biases can be improved still - for example by computing a score per node based on ping time, and then penalizing the scores of nodes from the same netgroup if there are multiple. But that can be done later.

@pstratem

This comment has been minimized.

Show comment
Hide comment
@pstratem

pstratem Aug 26, 2015

Contributor

@sipa this is definitely intended mostly as a framework, the initial rules are an improvement but certainly not intended to be the final set of rules.

Contributor

pstratem commented Aug 26, 2015

@sipa this is definitely intended mostly as a framework, the initial rules are an improvement but certainly not intended to be the final set of rules.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Sep 3, 2015

Member

utACK

Member

laanwj commented Sep 3, 2015

utACK

@laanwj laanwj merged commit 027de94 into bitcoin:master Sep 3, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request Sep 3, 2015

Merge pull request #6374
027de94 Use network group instead of CNetAddr in final pass to select node to disconnect (Patrick Strateman)
000c18a Fix comment (Patrick Strateman)
fed3094 Acquire cs_vNodes before changing refrence counts (Patrick Strateman)
69ee1aa CNodeRef copy constructor and assignment operator (Patrick Strateman)
dc81dd0 Return false early if vEvictionCandidates is empty (Patrick Strateman)
17f3533 Better support for nodes with non-standard nMaxConnections (Patrick Strateman)
1317cd1 RAII wrapper for CNode* (Patrick Strateman)
df23937 Add comments to AttemptToEvictConnection (Patrick Strateman)
a8f6e45 Remove redundant whiteconnections option (Patrick Strateman)
b105ba3 Prefer to disconnect peers in favor of whitelisted peers (Patrick Strateman)
2c70153 AttemptToEvictConnection (Patrick Strateman)
4bac601 Record nMinPingUsecTime (Patrick Strateman)
ae037b7 Refactor: Move failure conditions to the top of AcceptConnection (Patrick Strateman)
1ef4817 Refactor: Bail early in AcceptConnection (Patrick Strateman)
541a1dd Refactor: AcceptConnection (Patrick Strateman)

@bitcartel bitcartel referenced this pull request in zcash/zcash Aug 14, 2016

Open

Merge upstream anti DoS patches #1251

str4d added a commit to str4d/zcash that referenced this pull request Jul 13, 2017

Re-organize -maxconnections option handling
Zcash: Was "Added -whiteconnections=<n> option" from bitcoin/bitcoin#5288. The
option was later removed in bitcoin/bitcoin#6374 which we merged in #1258. This
commit contains the difference between the two.

@str4d str4d referenced this pull request in zcash/zcash Jul 14, 2017

Open

Bitcoin 0.12 P2P/Net PRs 1 #2534

str4d added a commit to str4d/zcash that referenced this pull request Nov 9, 2017

Re-organize -maxconnections option handling
Zcash: Was "Added -whiteconnections=<n> option" from bitcoin/bitcoin#5288. The
option was later removed in bitcoin/bitcoin#6374 which we merged in #1258. This
commit contains the difference between the two.

str4d added a commit to str4d/zcash that referenced this pull request Apr 5, 2018

Re-organize -maxconnections option handling
Zcash: Was "Added -whiteconnections=<n> option" from bitcoin/bitcoin#5288. The
option was later removed in bitcoin/bitcoin#6374 which we merged in #1258. This
commit contains the difference between the two.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment