Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p2p: Try to preserve outbound block-relay-only connections during restart #17428

Merged
merged 7 commits into from
Oct 15, 2020

Conversation

hebasto
Copy link
Member

@hebasto hebasto commented Nov 9, 2019

This is an implementation of #17326:

This PR prevents a type of eclipse attack when an attacker exploits a victim node restart to force it to connect to new, probably adversarial, peers.

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 9, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@laanwj
Copy link
Member

laanwj commented Nov 10, 2019

Don't think this should get in the way of implementing something like this, but have to note here, as it's a significant change of behavior from before: reconnecting to peers "aggressively", was, up to now, the reserve of spy nodes and mass-connectors and such. For normal nodes the chance of making the same outgoing connection twice was pretty low. Historically this has been one of the ways used to detect these.

@gmaxwell
Copy link
Contributor

It would be really good to limit it so that an honest node won't attempt to connect twice to the same peer through this mechanism for some prudent amount of time... like two minutes.

I don't think it would cause any harm to the intent of this mechanism to not bring them up as fast as possible, and it would avoid tripping up detection of aggressive mass connectors.

@sdaftuar
Copy link
Member

Not sure if I’m misreading this code (I haven’t tested it), but it looks like on restart we would only attempt to connect once to each anchor, and then it gets removed from consideration. Does that satisfy the concern about rapid reconnects? I guess if a user rapidly restarted their node, we could see rapid connections to the same peers, but that seems pretty weird.

Anyway I just skimmed the discussion in #17326. I think Greg or Matt commented that we shouldn’t do this with all our block-relay-only connections, but since we only have 2 at the moment, perhaps it’s fine for now. And then in the future if we expand to say 8 block-relay peers we might still cap the anchors at 2. Does that reflect the thinking of others as well?

src/net.cpp Outdated Show resolved Hide resolved
@hebasto
Copy link
Member Author

hebasto commented Nov 21, 2019

@sdaftuar

Thank you for your review. Your comment has been addressed.

Also, an additional safety check added to "p2p: Integrate ReadAnchors() into CConnman()" commit in the latest push.

Copy link
Contributor

@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK on the idea but see my concerns in #17326 (comment).

IMO before to go forward we need to agree on a) the scope of anchors: full-relay, block-relay, both, b) anchors selection & lifecyle, i.e after how much time do we select them or based on any metrics, do we prune them from anchors.dat, do we keep a bigger file and select randomly from it

src/net.cpp Outdated
m_anchors = ReadAnchors(m_anchors_db_path);
if (m_anchors.size() > m_max_outbound_block_relay) {
// Keep our policy safe.
m_anchors.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why clear the whole instead of picking randomly m_max_outbound_block_relay among the anchors set? If one of our most recent anchors isn't reliable we should be allowed to pick up among older anchors to still of the counter-measure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The worst case is assumed. If condition m_anchors.size() > m_max_outbound_block_relay is satisfied, we could suppose that the anchors.dat file is wrong or corrupted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No you're melting 2 different problems, tracking the anchors set and connecting to peers among this one. I don't understand why reaching the m_max_outbound_block_relay means we have to delete our whole repository of maybe-good anchors peers.

In the scenario you're aiming, if victim restarts and the 2 anchors peers are honest but unreliable, you're going to drop them and pick maybe new outbound peers from attacker-provided ones. You can avoid this if you have a bigger set of potential anchor peers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logic implemented in this PR is minimal, and I think that makes sense for keeping it simple to reason about.

with this implementation, its possible for an attacker to poison the anchors and execute an eclipse attack, its just increases the cost of doing so.

until we develop more robust logic around ensuring our anchors are not malicious, I don't think it makes sense to give priority to all historic anchors. I think it could sometimes be beneficial but other times degrade security, eg. if you historically were connected to a malicious peer but now your addrman is clear.

src/net.cpp Outdated Show resolved Hide resolved
src/net.cpp Outdated
@@ -1805,11 +1807,24 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
if (nTries > 100)
break;

CAddrInfo addr = addrman.SelectTriedCollision();
CAddress addr;
if (!m_anchors.empty() && (nOutboundBlockRelay < m_max_outbound_block_relay) && !fFeeler) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May we check here also nOutboundFullRelay, and so keep same connection order, i.e full-relay then block-only or is there any advantage to invert order ?

Copy link
Member Author

@hebasto hebasto Jan 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order is not inverted. The first target "anchors" is added, and was not present before, i.e., the new order is "anchors" then "full-relay" then "block-relay-only".
If anchors would not the first, they could be connected as other type nodes, and anchoring could not happen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But anchors nodes are block-only, so now you have block-only-as-anchors, full-relay, block-only-classics. You can still test addresses returned from addrman against m_anchors andon continue in case of equality to avoid the problem you're describing, but don't think that changes something, do we favor first-connected nodes in anyway ?

@@ -2088,6 +2088,9 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
pfrom->nVersion.load(), pfrom->nStartingHeight,
pfrom->GetId(), (fLogIPs ? strprintf(", peeraddr=%s", pfrom->addr.ToString()) : ""),
pfrom->m_tx_relay == nullptr ? "block-relay" : "full-relay");
if (!pfrom->m_tx_relay) {
DumpAnchors(connman->m_anchors_db_path, connman->GetBlockRelayNodeAddresses());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about waiting some time like 24hours or one block announcement before to qualify node as anchor? That would require at least some good behavior from it, that said and may favor too much well-reliable nodes and decay network-wide connectivity..

nit: you may use scheduler for DumpAnchors like we do for DumpAddresses, that would avoid file syscalls while holding cs_main

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of moving this onto the scheduler thread, but not a huge deal given this only happens once per blocks-only connection.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you may use scheduler for DumpAnchors like we do for DumpAddresses, that would avoid file syscalls while holding cs_main

I like the idea of moving this onto the scheduler thread, but not a huge deal given this only happens once per blocks-only connection.

Agree that using CScheduler.scheduleFromNow() for DumpAnchors is an improvement. Let's leave it for follow ups.

@laanwj laanwj added this to the 0.20.0 milestone Jan 20, 2020
@laanwj
Copy link
Member

laanwj commented Jan 20, 2020

Added 0.20 milestone as this mitigates a security issue so it would be good to have.

Copy link
Contributor

@jamesob jamesob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d405145 (jamesob/ackr/17428.1.hebasto.p2p_try_to_preserve_outb)

Code looks fine, but seeing some strange behavior locally.

Built, started bitcoind, and noted the current blocks-only peers:

$ ./src/bitcoin-cli -datadir=/data2/bitcoin getpeerinfo | jq '.[] | select(.relaytxes==false) | .addr'

"185.217.241.142:8333"
"2.203.128.83:8333"

Verified the creation of anchors.dat:

$ du -hs /data2/bitcoin/anchors.dat

4.0K    /data2/bitcoin/anchors.dat

Shutdown bitcoind, restarted, checked blocks-only peers:

$ ./src/bitcoin-cli -datadir=/data2/bitcoin getpeerinfo | jq '.[] | select(.relaytxes==false) | .addr'

"2.203.128.83:8333"
"95.216.198.89:8333"

One of the peers has persisted from the first run, but the second is different.
Unexpectedly, my logs don't indicate the anchor connections were ever reused - the
only occurrence of anchor in debug.log is when flushing to anchors.dat.

Show platform data

Tested on Linux-4.15.0-47-generic-x86_64-with-Ubuntu-18.04-bionic

Configured with ./configure LDFLAGS=-L/home/james/src/bitcoin/db4/lib/ CPPFLAGS=-I/home/james/src/bitcoin/db4/include/ CXXFLAGS=-std=c++11 -Wthread-safety-analysis -Werror -Wthread-safety-analysis -Werror --enable-wallet --enable-debug --with-daemon CXX=/usr/bin/clang++-4.0 CC=/usr/bin/clang-4.0 PYTHONPATH= --disable-shared --with-pic --enable-benchmark=no --with-bignum=no --enable-module-recovery --disable-jni --disable-shared --with-pic --enable-benchmark=no --with-bignum=no --enable-module-recovery --disable-jni --no-create --no-recursion

Compiled with /usr/bin/ccache /usr/bin/clang++-4.0 -std=c++11 -mavx -mavx2 -std=c++11 -Wthread-safety-analysis -Werror -Wthread-safety-analysis -Werror -O0 -g3 -ftrapv  -Wstack-protector -fstack-protector-all -msse4 -msha -msse4.1 -msse4.2  i

Compiler version: clang version 4.0.1-10 (tags/RELEASE_401/final)

src/addrdb.cpp Outdated
{
int64_t start = GetTimeMillis();
SerializeFileDB("anchors", anchors_db_path, anchors);
LogPrintf("Flushed %d anchor outbound peer addresses to anchors.dat in %d ms\n", anchors.size(), GetTimeMillis() - start);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use logging/timer.h if you care to. (example)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Done in the latest push.

@@ -2088,6 +2088,9 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
pfrom->nVersion.load(), pfrom->nStartingHeight,
pfrom->GetId(), (fLogIPs ? strprintf(", peeraddr=%s", pfrom->addr.ToString()) : ""),
pfrom->m_tx_relay == nullptr ? "block-relay" : "full-relay");
if (!pfrom->m_tx_relay) {
DumpAnchors(connman->m_anchors_db_path, connman->GetBlockRelayNodeAddresses());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of moving this onto the scheduler thread, but not a huge deal given this only happens once per blocks-only connection.

@hebasto
Copy link
Member Author

hebasto commented Jan 26, 2020

@ariard

The only idea of this PR is to make a node to try restore outbound block-relay-only connections as they were just before node restart. It is expected that this change in node behavior mitigates restart-based eclipse attacks without significant other tradeoffs.

IMO before to go forward we need to agree on a) the scope of anchors: full-relay, block-relay, both...

Anchoring as a new procedure, or anchors as "seemingly safe" nodes are out of the scope of this PR.

b) anchors selection & lifecyle, i.e after how much time do we select them or based on any metrics, do we prune them from anchors.dat, do we keep a bigger file and select randomly from it

With an idea to make a node to try restore outbound block-relay-only connections as they were just before node restart, anchors are the last known outbound block-relay-only connections only.

#17428 (comment)

What do you think about waiting some time like 24hours or one block announcement before to qualify node as anchor? That would require at least some good behavior from it, that said and may favor too much well-reliable nodes and decay network-wide connectivity..

In this PR anchors are considered as the last known outbound block-relay-only connections, not as "presumable safe" ones.

@hebasto
Copy link
Member Author

hebasto commented Jan 26, 2020

Updated d405145 -> f2b4cfd (pr17428.02 -> pr17428.03, compare).

@ariard @jamesob

Thank you for your reviews. Your comments have been addressed.

@jamesob

... seeing some strange behavior locally.

Added additional logging when connecting to an anchor fails.

@hebasto
Copy link
Member Author

hebasto commented Jan 26, 2020

Pushed a fixup. Ready for testing and (re)reviewing now.

Copy link
Contributor

@jamesob jamesob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 37671dd (jamesob/ackr/17428.2.hebasto.p2p_try_to_preserve_outb)

Built locally, verified that anchors were connected to on restart using the same test as above.

Show platform data

Tested on Linux-4.15.0-47-generic-x86_64-with-Ubuntu-18.04-bionic

Configured with ./configure LDFLAGS=-L/home/james/src/bitcoin/db4/lib/ CPPFLAGS=-I/home/james/src/bitcoin/db4/include/ CXXFLAGS=-std=c++11 -Wthread-safety-analysis -Werror -Wthread-safety-analysis -Werror --enable-wallet --enable-debug --with-daemon CXX=/usr/bin/clang++-4.0 CC=/usr/bin/clang-4.0 PYTHONPATH= --disable-shared --with-pic --enable-benchmark=no --with-bignum=no --enable-module-recovery --disable-jni --disable-shared --with-pic --enable-benchmark=no --with-bignum=no --enable-module-recovery --disable-jni --no-create --no-recursion

Compiled with /usr/bin/ccache /usr/bin/clang++-4.0 -std=c++11 -mavx -mavx2 -std=c++11 -Wthread-safety-analysis -Werror -Wthread-safety-analysis -Werror -O0 -g3 -ftrapv  -Wstack-protector -fstack-protector-all -msse4 -msha -msse4.1 -msse4.2  i

Compiler version: clang version 4.0.1-10 (tags/RELEASE_401/final)

@luke-jr
Copy link
Member

luke-jr commented Jan 29, 2020

Couldn't there be a risk here, that if one of the anchor peers is malicious and knows a DoS vulnerability, this ensures they can keep repeatedly crashing your node, even if you setup a watchdog to restart it?

@hebasto
Copy link
Member Author

hebasto commented Jan 29, 2020

Rebased after #16702 has been merged.

@hebasto
Copy link
Member Author

hebasto commented Jan 29, 2020

@luke-jr

Couldn't there be a risk here, that if one of the anchor peers is malicious and knows a DoS vulnerability, this ensures they can keep repeatedly crashing your node, even if you setup a watchdog to restart it?

Could manual deletion of the anchors.dat file be a remedy? See: #17428 (comment)

@ariard
Copy link
Contributor

ariard commented Jan 30, 2020

@hebasto

Concept ACK, reading your comment I do understand that this PR is really-scoped, so don't mind #17326 (comment) and #17428 (comment) but would happily review future work to make anchors better (sorry for the noise)

A note explaining the exact attack scenario mitigated near to m_anchors or somewhere else would be cool to avoid future reviewers relying on wrong-scoped security assumptions.

Goign to test PR soon.

@gmaxwell
Copy link
Contributor

@luke-jr oy. uh. forget them if you uncleanly exit?

@hebasto
Copy link
Member Author

hebasto commented Feb 5, 2020

Rebased after #18023 has been merged.

@hebasto
Copy link
Member Author

hebasto commented Oct 9, 2020

Rebased d77ae88 -> a490d07 (pr17428.31 -> pr17428.32) due to the conflict with #20076.

@jnewbery
Copy link
Contributor

jnewbery commented Oct 9, 2020

code review ACK a490d07

@laanwj
Copy link
Member

laanwj commented Oct 15, 2020

Code review ACK a490d07

@laanwj laanwj merged commit 9855422 into bitcoin:master Oct 15, 2020
@hebasto hebasto deleted the 20191109-anchors branch October 15, 2020 18:47
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Oct 16, 2020
…onnections during restart

a490d07 doc: Add anchors.dat to files.md (Hennadii Stepanov)
0a85e5a p2p: Try to connect to anchors once (Hennadii Stepanov)
5543c7a p2p: Fix off-by-one error in fetching address loop (Hennadii Stepanov)
4170b46 p2p: Integrate DumpAnchors() and ReadAnchors() into CConnman (Hennadii Stepanov)
bad16af p2p: Add CConnman::GetCurrentBlockRelayOnlyConns() (Hennadii Stepanov)
c29272a p2p: Add ReadAnchors() (Hennadii Stepanov)
567008d p2p: Add DumpAnchors() (Hennadii Stepanov)

Pull request description:

  This is an implementation of bitcoin#17326:
  - all (currently 2) outbound block-relay-only connections (bitcoin#15759) are dumped to `anchors.dat` file
  - on restart a node tries to connect to the addresses from `anchors.dat`

  This PR prevents a type of eclipse attack when an attacker exploits a victim node restart to force it to connect to new, probably adversarial, peers.

ACKs for top commit:
  jnewbery:
    code review ACK a490d07
  laanwj:
    Code review ACK a490d07

Tree-SHA512: 0f5098a3882f2814be1aa21de308cd09e6654f4e7054b79f3cfeaf26bc02b814ca271497ed00018d199ee596a8cb9b126acee8b666a29e225b08eb2a49b02ddd
@maflcko
Copy link
Member

maflcko commented Jan 8, 2021

@hebasto
Copy link
Member Author

hebasto commented Jan 8, 2021

fanquake added a commit that referenced this pull request May 24, 2021
fe3d17d net: ignore block-relay-only peers when skipping DNS seed (Anthony Towns)

Pull request description:

  Since #17428 bitcoind will attempt to reconnect to two block-relay-only anchors before doing any other outbound connections. When determining whether to use DNS seeds, it will currently see these two peers and decide "we're connected to the p2p network, so no need to lookup DNS" -- but block-relay-only peers don't do address relay, so if your address book is full of invalid addresses (apart from your anchors) this behaviour will prevent you from recovering from that situation.

  This patch changes it so that it only skips use of DNS seeds when there are two full-outbound peers, not just block-relay-only peers.

ACKs for top commit:
  Sjors:
    utACK fe3d17d
  amitiuttarwar:
    ACK fe3d17d, this impacts the very common case where we stop/start a node, persisting anchors & have a non-empty addrman (although, to be clear, wouldn't be particularly problematic in the common cases where the addrman has valid addresses)
  mzumsande:
    ACK fe3d17d
  jonatack:
    ACK fe3d17d
  prayank23:
    tACK fe3d17d

Tree-SHA512: 9814b0d84321d7f45b5013eb40c420a0dd93bf9430f5ef12dce50d1912a18d5de2070d890a8c6fe737a3329b31059b823bc660b432d5ba21f02881dc1d951e94
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Nov 24, 2021
Summary:
> This is an implementation of #17326:
>
>   -     all (currently 2) outbound block-relay-only connections are dumped to anchors.dat file
>   -     on restart a node tries to connect to the addresses from anchors.dat

This PR prevents a type of eclipse attack when an attacker exploits a victim node restart to force it to connect to new, probably adversarial, peers.

This is a backport of [[bitcoin/bitcoin#17428 | core#17428]] [1/7]
bitcoin/bitcoin@567008d

Test Plan: `ninja`

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D10507
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Nov 24, 2021
Summary:
This is a backport of [[bitcoin/bitcoin#17428 | core#17428]] [2/7]
bitcoin/bitcoin@c29272a

Depends on D10507

Test Plan: `ninja`

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D10508
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Nov 24, 2021
Summary:
This is a backport of [[bitcoin/bitcoin#17428 | core#17428]] [3/7]
bitcoin/bitcoin@bad16af

Depends on D10508

Test Plan: `ninja`

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D10509
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Nov 24, 2021
Summary:
This is a backport of [[bitcoin/bitcoin#17428 | core#17428]] [4/7]
bitcoin/bitcoin@4170b46

Depends on D10509

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Subscribers: majcosta

Differential Revision: https://reviews.bitcoinabc.org/D10510
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Nov 24, 2021
Summary:
This is a move-only commit.

This is a backport of [[bitcoin/bitcoin#17428 | core#17428]] [5/7]
bitcoin/bitcoin@5543c7a

Depends on D10510

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D10511
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Nov 24, 2021
Summary:
This is a backport of [[bitcoin/bitcoin#17428 | core#17428]] [6/7]
bitcoin/bitcoin@0a85e5a

Depends on D10511

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D10512
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Nov 24, 2021
Summary:
This is a backport of [[bitcoin/bitcoin#17428 | core#17428]] [7/7]
bitcoin/bitcoin@a490d07

Depends on D10512

Test Plan: proofreading

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D10513
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.