-
Notifications
You must be signed in to change notification settings - Fork 36.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
p2p: Try to preserve outbound block-relay-only connections during restart #17428
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Don't think this should get in the way of implementing something like this, but have to note here, as it's a significant change of behavior from before: reconnecting to peers "aggressively", was, up to now, the reserve of spy nodes and mass-connectors and such. For normal nodes the chance of making the same outgoing connection twice was pretty low. Historically this has been one of the ways used to detect these. |
It would be really good to limit it so that an honest node won't attempt to connect twice to the same peer through this mechanism for some prudent amount of time... like two minutes. I don't think it would cause any harm to the intent of this mechanism to not bring them up as fast as possible, and it would avoid tripping up detection of aggressive mass connectors. |
Not sure if I’m misreading this code (I haven’t tested it), but it looks like on restart we would only attempt to connect once to each anchor, and then it gets removed from consideration. Does that satisfy the concern about rapid reconnects? I guess if a user rapidly restarted their node, we could see rapid connections to the same peers, but that seems pretty weird. Anyway I just skimmed the discussion in #17326. I think Greg or Matt commented that we shouldn’t do this with all our block-relay-only connections, but since we only have 2 at the moment, perhaps it’s fine for now. And then in the future if we expand to say 8 block-relay peers we might still cap the anchors at 2. Does that reflect the thinking of others as well? |
99d6f6f
to
d405145
Compare
Thank you for your review. Your comment has been addressed. Also, an additional safety check added to "p2p: Integrate ReadAnchors() into CConnman()" commit in the latest push. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK on the idea but see my concerns in #17326 (comment).
IMO before to go forward we need to agree on a) the scope of anchors: full-relay, block-relay, both, b) anchors selection & lifecyle, i.e after how much time do we select them or based on any metrics, do we prune them from anchors.dat
, do we keep a bigger file and select randomly from it
src/net.cpp
Outdated
m_anchors = ReadAnchors(m_anchors_db_path); | ||
if (m_anchors.size() > m_max_outbound_block_relay) { | ||
// Keep our policy safe. | ||
m_anchors.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why clear the whole instead of picking randomly m_max_outbound_block_relay
among the anchors set? If one of our most recent anchors isn't reliable we should be allowed to pick up among older anchors to still of the counter-measure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The worst case is assumed. If condition m_anchors.size() > m_max_outbound_block_relay
is satisfied, we could suppose that the anchors.dat
file is wrong or corrupted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No you're melting 2 different problems, tracking the anchors set and connecting to peers among this one. I don't understand why reaching the m_max_outbound_block_relay
means we have to delete our whole repository of maybe-good anchors peers.
In the scenario you're aiming, if victim restarts and the 2 anchors peers are honest but unreliable, you're going to drop them and pick maybe new outbound peers from attacker-provided ones. You can avoid this if you have a bigger set of potential anchor peers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logic implemented in this PR is minimal, and I think that makes sense for keeping it simple to reason about.
with this implementation, its possible for an attacker to poison the anchors and execute an eclipse attack, its just increases the cost of doing so.
until we develop more robust logic around ensuring our anchors are not malicious, I don't think it makes sense to give priority to all historic anchors. I think it could sometimes be beneficial but other times degrade security, eg. if you historically were connected to a malicious peer but now your addrman is clear.
src/net.cpp
Outdated
@@ -1805,11 +1807,24 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect) | |||
if (nTries > 100) | |||
break; | |||
|
|||
CAddrInfo addr = addrman.SelectTriedCollision(); | |||
CAddress addr; | |||
if (!m_anchors.empty() && (nOutboundBlockRelay < m_max_outbound_block_relay) && !fFeeler) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May we check here also nOutboundFullRelay
, and so keep same connection order, i.e full-relay then block-only or is there any advantage to invert order ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order is not inverted. The first target "anchors" is added, and was not present before, i.e., the new order is "anchors" then "full-relay" then "block-relay-only".
If anchors would not the first, they could be connected as other type nodes, and anchoring could not happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But anchors nodes are block-only, so now you have block-only-as-anchors, full-relay, block-only-classics. You can still test addresses returned from addrman
against m_anchors
andon continue in case of equality to avoid the problem you're describing, but don't think that changes something, do we favor first-connected nodes in anyway ?
src/net_processing.cpp
Outdated
@@ -2088,6 +2088,9 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr | |||
pfrom->nVersion.load(), pfrom->nStartingHeight, | |||
pfrom->GetId(), (fLogIPs ? strprintf(", peeraddr=%s", pfrom->addr.ToString()) : ""), | |||
pfrom->m_tx_relay == nullptr ? "block-relay" : "full-relay"); | |||
if (!pfrom->m_tx_relay) { | |||
DumpAnchors(connman->m_anchors_db_path, connman->GetBlockRelayNodeAddresses()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about waiting some time like 24hours or one block announcement before to qualify node as anchor? That would require at least some good behavior from it, that said and may favor too much well-reliable nodes and decay network-wide connectivity..
nit: you may use scheduler for DumpAnchors like we do for DumpAddresses, that would avoid file syscalls while holding cs_main
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of moving this onto the scheduler thread, but not a huge deal given this only happens once per blocks-only connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you may use scheduler for DumpAnchors like we do for DumpAddresses, that would avoid file syscalls while holding cs_main
I like the idea of moving this onto the scheduler thread, but not a huge deal given this only happens once per blocks-only connection.
Agree that using CScheduler.scheduleFromNow()
for DumpAnchors
is an improvement. Let's leave it for follow ups.
Added 0.20 milestone as this mitigates a security issue so it would be good to have. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d405145 (jamesob/ackr/17428.1.hebasto.p2p_try_to_preserve_outb
)
Code looks fine, but seeing some strange behavior locally.
Built, started bitcoind, and noted the current blocks-only peers:
$ ./src/bitcoin-cli -datadir=/data2/bitcoin getpeerinfo | jq '.[] | select(.relaytxes==false) | .addr'
"185.217.241.142:8333"
"2.203.128.83:8333"
Verified the creation of anchors.dat:
$ du -hs /data2/bitcoin/anchors.dat
4.0K /data2/bitcoin/anchors.dat
Shutdown bitcoind, restarted, checked blocks-only peers:
$ ./src/bitcoin-cli -datadir=/data2/bitcoin getpeerinfo | jq '.[] | select(.relaytxes==false) | .addr'
"2.203.128.83:8333"
"95.216.198.89:8333"
One of the peers has persisted from the first run, but the second is different.
Unexpectedly, my logs don't indicate the anchor connections were ever reused - the
only occurrence of anchor
in debug.log is when flushing to anchors.dat
.
Show platform data
Tested on Linux-4.15.0-47-generic-x86_64-with-Ubuntu-18.04-bionic
Configured with ./configure LDFLAGS=-L/home/james/src/bitcoin/db4/lib/ CPPFLAGS=-I/home/james/src/bitcoin/db4/include/ CXXFLAGS=-std=c++11 -Wthread-safety-analysis -Werror -Wthread-safety-analysis -Werror --enable-wallet --enable-debug --with-daemon CXX=/usr/bin/clang++-4.0 CC=/usr/bin/clang-4.0 PYTHONPATH= --disable-shared --with-pic --enable-benchmark=no --with-bignum=no --enable-module-recovery --disable-jni --disable-shared --with-pic --enable-benchmark=no --with-bignum=no --enable-module-recovery --disable-jni --no-create --no-recursion
Compiled with /usr/bin/ccache /usr/bin/clang++-4.0 -std=c++11 -mavx -mavx2 -std=c++11 -Wthread-safety-analysis -Werror -Wthread-safety-analysis -Werror -O0 -g3 -ftrapv -Wstack-protector -fstack-protector-all -msse4 -msha -msse4.1 -msse4.2 i
Compiler version: clang version 4.0.1-10 (tags/RELEASE_401/final)
src/addrdb.cpp
Outdated
{ | ||
int64_t start = GetTimeMillis(); | ||
SerializeFileDB("anchors", anchors_db_path, anchors); | ||
LogPrintf("Flushed %d anchor outbound peer addresses to anchors.dat in %d ms\n", anchors.size(), GetTimeMillis() - start); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can use logging/timer.h
if you care to. (example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Done in the latest push.
src/net_processing.cpp
Outdated
@@ -2088,6 +2088,9 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr | |||
pfrom->nVersion.load(), pfrom->nStartingHeight, | |||
pfrom->GetId(), (fLogIPs ? strprintf(", peeraddr=%s", pfrom->addr.ToString()) : ""), | |||
pfrom->m_tx_relay == nullptr ? "block-relay" : "full-relay"); | |||
if (!pfrom->m_tx_relay) { | |||
DumpAnchors(connman->m_anchors_db_path, connman->GetBlockRelayNodeAddresses()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of moving this onto the scheduler thread, but not a huge deal given this only happens once per blocks-only connection.
The only idea of this PR is to make a node to try restore outbound block-relay-only connections as they were just before node restart. It is expected that this change in node behavior mitigates restart-based eclipse attacks without significant other tradeoffs.
Anchoring as a new procedure, or anchors as "seemingly safe" nodes are out of the scope of this PR.
With an idea to make a node to try restore outbound block-relay-only connections as they were just before node restart, anchors are the last known outbound block-relay-only connections only.
In this PR anchors are considered as the last known outbound block-relay-only connections, not as "presumable safe" ones. |
d405145
to
f2b4cfd
Compare
Updated d405145 -> f2b4cfd (pr17428.02 -> pr17428.03, compare). Thank you for your reviews. Your comments have been addressed.
Added additional logging when connecting to an anchor fails. |
f2b4cfd
to
37671dd
Compare
Pushed a fixup. Ready for testing and (re)reviewing now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 37671dd (jamesob/ackr/17428.2.hebasto.p2p_try_to_preserve_outb
)
Built locally, verified that anchors were connected to on restart using the same test as above.
Show platform data
Tested on Linux-4.15.0-47-generic-x86_64-with-Ubuntu-18.04-bionic
Configured with ./configure LDFLAGS=-L/home/james/src/bitcoin/db4/lib/ CPPFLAGS=-I/home/james/src/bitcoin/db4/include/ CXXFLAGS=-std=c++11 -Wthread-safety-analysis -Werror -Wthread-safety-analysis -Werror --enable-wallet --enable-debug --with-daemon CXX=/usr/bin/clang++-4.0 CC=/usr/bin/clang-4.0 PYTHONPATH= --disable-shared --with-pic --enable-benchmark=no --with-bignum=no --enable-module-recovery --disable-jni --disable-shared --with-pic --enable-benchmark=no --with-bignum=no --enable-module-recovery --disable-jni --no-create --no-recursion
Compiled with /usr/bin/ccache /usr/bin/clang++-4.0 -std=c++11 -mavx -mavx2 -std=c++11 -Wthread-safety-analysis -Werror -Wthread-safety-analysis -Werror -O0 -g3 -ftrapv -Wstack-protector -fstack-protector-all -msse4 -msha -msse4.1 -msse4.2 i
Compiler version: clang version 4.0.1-10 (tags/RELEASE_401/final)
Couldn't there be a risk here, that if one of the anchor peers is malicious and knows a DoS vulnerability, this ensures they can keep repeatedly crashing your node, even if you setup a watchdog to restart it? |
37671dd
to
a2822de
Compare
Rebased after #16702 has been merged. |
|
Concept ACK, reading your comment I do understand that this PR is really-scoped, so don't mind #17326 (comment) and #17428 (comment) but would happily review future work to make anchors better (sorry for the noise) A note explaining the exact attack scenario mitigated near to m_anchors or somewhere else would be cool to avoid future reviewers relying on wrong-scoped security assumptions. Goign to test PR soon. |
@luke-jr oy. uh. forget them if you uncleanly exit? |
a2822de
to
78b2f95
Compare
Rebased after #18023 has been merged. |
This is a move-only commit.
d77ae88
to
a490d07
Compare
Rebased d77ae88 -> a490d07 (pr17428.31 -> pr17428.32) due to the conflict with #20076. |
code review ACK a490d07 |
Code review ACK a490d07 |
…onnections during restart a490d07 doc: Add anchors.dat to files.md (Hennadii Stepanov) 0a85e5a p2p: Try to connect to anchors once (Hennadii Stepanov) 5543c7a p2p: Fix off-by-one error in fetching address loop (Hennadii Stepanov) 4170b46 p2p: Integrate DumpAnchors() and ReadAnchors() into CConnman (Hennadii Stepanov) bad16af p2p: Add CConnman::GetCurrentBlockRelayOnlyConns() (Hennadii Stepanov) c29272a p2p: Add ReadAnchors() (Hennadii Stepanov) 567008d p2p: Add DumpAnchors() (Hennadii Stepanov) Pull request description: This is an implementation of bitcoin#17326: - all (currently 2) outbound block-relay-only connections (bitcoin#15759) are dumped to `anchors.dat` file - on restart a node tries to connect to the addresses from `anchors.dat` This PR prevents a type of eclipse attack when an attacker exploits a victim node restart to force it to connect to new, probably adversarial, peers. ACKs for top commit: jnewbery: code review ACK a490d07 laanwj: Code review ACK a490d07 Tree-SHA512: 0f5098a3882f2814be1aa21de308cd09e6654f4e7054b79f3cfeaf26bc02b814ca271497ed00018d199ee596a8cb9b126acee8b666a29e225b08eb2a49b02ddd
There is a TODO to write release notes for this: https://github.com/bitcoin-core/bitcoin-devwiki/wiki/0.21.0-Release-Notes-Draft#p2p-and-network-changes |
Done. |
fe3d17d net: ignore block-relay-only peers when skipping DNS seed (Anthony Towns) Pull request description: Since #17428 bitcoind will attempt to reconnect to two block-relay-only anchors before doing any other outbound connections. When determining whether to use DNS seeds, it will currently see these two peers and decide "we're connected to the p2p network, so no need to lookup DNS" -- but block-relay-only peers don't do address relay, so if your address book is full of invalid addresses (apart from your anchors) this behaviour will prevent you from recovering from that situation. This patch changes it so that it only skips use of DNS seeds when there are two full-outbound peers, not just block-relay-only peers. ACKs for top commit: Sjors: utACK fe3d17d amitiuttarwar: ACK fe3d17d, this impacts the very common case where we stop/start a node, persisting anchors & have a non-empty addrman (although, to be clear, wouldn't be particularly problematic in the common cases where the addrman has valid addresses) mzumsande: ACK fe3d17d jonatack: ACK fe3d17d prayank23: tACK fe3d17d Tree-SHA512: 9814b0d84321d7f45b5013eb40c420a0dd93bf9430f5ef12dce50d1912a18d5de2070d890a8c6fe737a3329b31059b823bc660b432d5ba21f02881dc1d951e94
Summary: > This is an implementation of #17326: > > - all (currently 2) outbound block-relay-only connections are dumped to anchors.dat file > - on restart a node tries to connect to the addresses from anchors.dat This PR prevents a type of eclipse attack when an attacker exploits a victim node restart to force it to connect to new, probably adversarial, peers. This is a backport of [[bitcoin/bitcoin#17428 | core#17428]] [1/7] bitcoin/bitcoin@567008d Test Plan: `ninja` Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Differential Revision: https://reviews.bitcoinabc.org/D10507
Summary: This is a backport of [[bitcoin/bitcoin#17428 | core#17428]] [2/7] bitcoin/bitcoin@c29272a Depends on D10507 Test Plan: `ninja` Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Differential Revision: https://reviews.bitcoinabc.org/D10508
Summary: This is a backport of [[bitcoin/bitcoin#17428 | core#17428]] [3/7] bitcoin/bitcoin@bad16af Depends on D10508 Test Plan: `ninja` Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Differential Revision: https://reviews.bitcoinabc.org/D10509
Summary: This is a backport of [[bitcoin/bitcoin#17428 | core#17428]] [4/7] bitcoin/bitcoin@4170b46 Depends on D10509 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Subscribers: majcosta Differential Revision: https://reviews.bitcoinabc.org/D10510
Summary: This is a move-only commit. This is a backport of [[bitcoin/bitcoin#17428 | core#17428]] [5/7] bitcoin/bitcoin@5543c7a Depends on D10510 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Differential Revision: https://reviews.bitcoinabc.org/D10511
Summary: This is a backport of [[bitcoin/bitcoin#17428 | core#17428]] [6/7] bitcoin/bitcoin@0a85e5a Depends on D10511 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Differential Revision: https://reviews.bitcoinabc.org/D10512
Summary: This is a backport of [[bitcoin/bitcoin#17428 | core#17428]] [7/7] bitcoin/bitcoin@a490d07 Depends on D10512 Test Plan: proofreading Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Differential Revision: https://reviews.bitcoinabc.org/D10513
This is an implementation of #17326:
anchors.dat
fileanchors.dat
This PR prevents a type of eclipse attack when an attacker exploits a victim node restart to force it to connect to new, probably adversarial, peers.