Added test-before-evict discipline in Addrman, feeler connections. #6355

Closed
wants to merge 1 commit into from

EthanHeilman commented Jun 29, 2015

These changes implement countermeasures 3 (test-before-evict) and 4 (feeler connections) suggested in our paper: "Eclipse Attacks on Bitcoin’s Peer-to-Peer Network".

Design:

The primary change is the creation of a feeler connection thread. Every 2 minutes this thread launches one feeler connection, which raises the default maximum number of outgoing connections from 8 to 9. Feeler connections are very short-lived: they exist only to test whether a remote host is online, and they disconnect as soon as the tested host is verified to be running bitcoind. The feeler thread pulls the addresses to test from two sources:

Source 1. Tried table collisions.
A collision occurs when an address, addr1, is being moved to the tried table from the new table, but maps to a position in the tried table which already contains an address, addr2. This change ensures that during a collision, addr1 is not inserted into tried but is instead inserted into a buffer. The to-be-evicted address, addr2, is then tested by the feeler thread. If addr2 is found to be online, we remove addr1 from the buffer and addr2 is not evicted. If, on the other hand, addr2 is found to be offline, it is replaced by addr1.

Source 2. The new table.
If the feeler thread has no tried table collisions to be tested, it selects an address from the new table. It does this to grow the number of fresh (recently online) addresses in the tried table.
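
The two sources described above can be sketched roughly as follows. This is a simplified model and not the code in the patch: `TriedModel`, `MoveToTried`, `NextFeelerTarget` and `ResolveOldest` are hypothetical names, tried positions are plain integers, and the new-table candidate is assumed to be chosen by the caller.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <utility>

// Simplified model; all names here are hypothetical, not taken from the patch.
struct TriedModel {
    std::map<int, std::string> slots;                 // tried position -> address
    std::deque<std::pair<int, std::string> > pending; // (position, addr1) awaiting a test

    // Try to move addr into tried at pos. On a collision the current occupant
    // stays put and addr is parked in the buffer. Returns true if addr is in tried.
    bool MoveToTried(int pos, const std::string& addr) {
        std::map<int, std::string>::iterator it = slots.find(pos);
        if (it == slots.end()) { slots[pos] = addr; return true; }
        if (it->second == addr) return true;
        pending.push_back(std::make_pair(pos, addr));
        return false;
    }

    // Source 1 first (the occupant of the oldest collision), otherwise
    // Source 2 (a candidate from the new table, supplied by the caller).
    std::string NextFeelerTarget(const std::string& new_candidate) const {
        if (pending.empty()) return new_candidate;
        return slots.at(pending.front().first);
    }

    // Apply the feeler result for the oldest collision.
    void ResolveOldest(bool occupant_online) {
        assert(!pending.empty());
        std::pair<int, std::string> c = pending.front();
        pending.pop_front();
        // addr2 offline: addr1 replaces it. addr2 online: addr1 is just dropped.
        if (!occupant_online) slots[c.first] = c.second;
    }
};
```

In this model a caller would ask `NextFeelerTarget` for an address every 2 minutes, connect to it briefly, and pass the outcome to `ResolveOldest` whenever the target came from a collision.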

Advantages:

  • In our paper we sample several peer lists. We found that a large percentage of addresses in tried tables are stale IP addresses (the lowest was 72 percent stale, the highest was 95 percent stale), which increases the risk of eclipse attacks. This change remedies this by ensuring that the tried table grows quickly and contains many recently online addresses. Countermeasure 4 (feeler connections) strengthens countermeasure 3 (test-before-evict).
  • Another small side advantage is that, as no more than ten addresses can be in the test buffer at once, and addresses are only cleared one at a time from the test buffer, an attacker is forced to wait at least two minutes to insert a new address into tried after filling up the test buffer. This rate limits an attacker attempting to launch an eclipse attack.
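
As a rough illustration of the rate limit in the last point, the sketch below computes an upper bound on how many evictions can be forced through the test buffer in a given window. It assumes, hypothetically, that every feeler connection is spent on a collision test and that each test ends with the occupant being replaced; `MaxCollisionReplacements` is an illustrative name, not a function in the patch.

```cpp
#include <cassert>

// Upper bound on evictions via the test buffer within a time window, assuming
// every feeler connection (one per interval) resolves exactly one collision.
long MaxCollisionReplacements(long window_minutes, long feeler_interval_minutes = 2) {
    assert(feeler_interval_minutes > 0);
    return window_minutes / feeler_interval_minutes;
}
```

With the 2 minute interval from the description, this bound is 30 replacements per hour.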

See our paper for a full analysis of the benefits of these countermeasures.

Risk mitigation:

  • To prevent this functionality from being used as a DoS vector, we limit the number of addresses which are to be tested to ten. If we have more than ten addresses to test, we drop new addresses being added to tried if they would evict an address. Since the feeler thread only creates one new connection every 2 minutes the additional network overhead is limited.
  • An address in tried gains immunity from tests for 4 hours after it has been tested or successfully connected to.
  • To avoid issues of synchronization, the feeler thread sleeps for between 0 and 3 seconds prior to making a connection.
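
A minimal sketch of these three limits, under stated assumptions: the constants come from the description above, while the names (`MAX_PENDING_TESTS`, `TEST_IMMUNITY_SECONDS`, `ShouldQueueFeelerTest`, `FeelerDelayMs`) are hypothetical. The description does not say in which order the checks are applied; in either case the new address is not inserted, so the sketch collapses both cases into a single `false` result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Values taken from the description; the names are hypothetical.
const std::size_t MAX_PENDING_TESTS = 10;
const std::int64_t TEST_IMMUNITY_SECONDS = 4 * 60 * 60;

// Decide whether a collision should queue the occupant for a feeler test.
// Returns false if the occupant is still immune or the test buffer is full.
bool ShouldQueueFeelerTest(std::size_t pending_tests, std::int64_t now,
                           std::int64_t occupant_last_checked) {
    assert(now >= occupant_last_checked);
    if (now - occupant_last_checked < TEST_IMMUNITY_SECONDS) return false;
    if (pending_tests >= MAX_PENDING_TESTS) return false;
    return true;
}

// Random delay in milliseconds, between 0 and 3000 inclusive, applied before
// a feeler connection is made.
int FeelerDelayMs(std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(0, 3000);
    return dist(rng);
}
```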

Tests:

We ran an instance with our changes for two days against attack code we developed in-house to induce many collisions in the tried table. Under these conditions we used valgrind to look for memory leaks.
See output of the test here:

e0@ubuntu:~/bitcoin-fork/src$ valgrind ./bitcoind -debug -printtoconsole -testnet > output.txt
==1918== Memcheck, a memory error detector
==1918== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==1918== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==1918== Command: ./bitcoind -debug -printtoconsole -testnet
==1918==
^C==1918==
==1918== HEAP SUMMARY:
==1918==     in use at exit: 3,864 bytes in 16 blocks
==1918==   total heap usage: 224,281,816 allocs, 224,281,800 frees, 32,337,629,302 bytes allocated
==1918==
==1918== LEAK SUMMARY:
==1918==    definitely lost: 0 bytes in 0 blocks
==1918==    indirectly lost: 0 bytes in 0 blocks
==1918==      possibly lost: 304 bytes in 1 blocks
==1918==    still reachable: 3,560 bytes in 15 blocks
==1918==         suppressed: 0 bytes in 0 blocks
==1918== Rerun with --leak-check=full to see details of leaked memory
==1918==
==1918== For counts of detected and suppressed errors, rerun with: -v
==1918== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

As we have made some cosmetic code changes since this test was run we are rerunning this test and will update this pull request when it is finished. We are launching several test nodes.

If you want to test my code without simulating a large number of incoming connections, you need to generate collisions that trigger feeler connections. The way to do this is to reduce the number of buckets in tried to 1. (That way, every address inserted into tried has a high probability, at least p=1/64, of being a collision.)

addrman_tests.cpp contains unit tests for the code I added to addrman. As a small side note, because addrman bucket placement depends on a randomly chosen seed (nKey), I needed to create a method to set this seed to a known value so that the unit tests would be deterministic. This method is only available during the addrman unit tests.
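
The idea of pinning the seed for tests can be illustrated with the sketch below. It is not addrman's code: `BucketMapper` and `SetKeyForTesting` are hypothetical names, and the FNV-1a hash is used purely for illustration in place of whatever keyed hash addrman applies to nKey.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string>

// Hypothetical stand-in for keyed bucket placement.
class BucketMapper {
public:
    // In normal operation the key is random, so positions differ between nodes.
    BucketMapper() : key_(std::random_device()()) {}

    // Test-only hook: pin the key so that positions are reproducible.
    void SetKeyForTesting(std::uint64_t key) { key_ = key; }

    // Map an address to one of nslots positions. FNV-1a is used purely for
    // illustration; it is NOT the hash addrman uses.
    int Position(const std::string& addr, int nslots) const {
        assert(nslots > 0);
        std::uint64_t h = 14695981039346656037ULL ^ key_;
        for (std::string::size_type i = 0; i < addr.size(); ++i) {
            h ^= static_cast<unsigned char>(addr[i]);
            h *= 1099511628211ULL;
        }
        return static_cast<int>(h % static_cast<std::uint64_t>(nslots));
    }

private:
    std::uint64_t key_;
};
```

Two mappers given the same key place every address at the same position, which is what lets a unit test construct a collision on purpose.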

Changes addrman to use the test-before-evict discipline in which an
address that is to be evicted from the tried table is first tested and,
if it is still online, it is not evicted.

Creates a new thread which tests if addresses are online or offline by
briefly connecting to them. These short lived connections are referred
to as feeler connections. Feeler connections have two purposes:
First, to increase the number of addresses in tried, by selecting and
connecting to addresses in new. Second, to implement the testing stage
of the test-before-evict discipline.

Adds tests to provide test coverage for these changes.

This change was suggested as Countermeasures 3 and 4 in
Eclipse Attacks on Bitcoin’s Peer-to-Peer Network, Ethan Heilman,
Alison Kendler, Aviv Zohar, Sharon Goldberg. ePrint Archive Report
2015/263. March 2015.
@laanwj laanwj added the P2P label Jun 30, 2015
@EthanHeilman

Contributor Author

EthanHeilman commented Jul 13, 2015

We ran a node with these changes from July 9th to July 13th. We attempted to connect to our node 1375078 times (not all connections succeeded due to connection exhaustion) from 16384 distinct IP addresses (256 IPs per group, 64 groups using the unallocated prefix 249/8). Both the output file and the valgrind results are nominal.

Valgrind output:

e0@ubuntu:~/bitcoin/src$ valgrind ./bitcoind -testnet -debug -printtoconsole > output.txt
==53495== Memcheck, a memory error detector
==53495== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==53495== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==53495== Command: ./bitcoind -testnet -debug -printtoconsole
==53495== 
==53495== 
==53495== HEAP SUMMARY:
==53495==     in use at exit: 5,976 bytes in 18 blocks
==53495==   total heap usage: 1,094,555,059 allocs, 1,094,555,041 frees, 202,228,428,605 bytes allocated
==53495== 
==53495== LEAK SUMMARY:
==53495==    definitely lost: 0 bytes in 0 blocks
==53495==    indirectly lost: 0 bytes in 0 blocks
==53495==      possibly lost: 304 bytes in 1 blocks
==53495==    still reachable: 5,672 bytes in 17 blocks
==53495==         suppressed: 0 bytes in 0 blocks
==53495== Rerun with --leak-check=full to see details of leaked memory
==53495== 
==53495== For counts of detected and suppressed errors, rerun with: -v
==53495== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Excerpts from the output:

We are connecting from 249/8 and running on testnet. We can see here bitcoin swapping an offline address for a recently connected address.

trying connection 243.7.23.180:18333 lastseen=371333.3hrs
received: inv (37 bytes) peer=378634
got inv: tx 4fa9bb2a4f2022cca3652f1a376a28080e7dcebfc28256a13ce3f117c1deb752  have peer=378634
sending: inv (37 bytes) peer=453392
sending: inv (37 bytes) peer=515854
sending: inv (37 bytes) peer=555025
--
sending: inv (73 bytes) peer=1290674
sending: inv (37 bytes) peer=1288871
sending: inv (37 bytes) peer=1288619
received: ping (8 bytes) peer=515854
sending: pong (8 bytes) peer=515854
connection to 243.7.23.180:18333 timeout
sending: inv (37 bytes) peer=453392
sending: inv (73 bytes) peer=1289172
sending: inv (37 bytes) peer=1289071
received: ping (8 bytes) peer=566065
sending: pong (8 bytes) peer=566065
--
sending: inv (37 bytes) peer=1290273
received: ping (8 bytes) peer=515854
sending: pong (8 bytes) peer=515854
sending: inv (37 bytes) peer=1288971
sending: inv (37 bytes) peer=1289923
Swapping 249.48.23.239:18333 for 243.7.23.180:18333 in tried table
Moving 249.48.23.239:18333 to tried
@jgarzik

Contributor

jgarzik commented Sep 16, 2015

concept ACK

@laanwj

Member

laanwj commented May 5, 2016

What is the status here?
Needs rebase, and more review/testing.

@EthanHeilman

Contributor Author

EthanHeilman commented May 5, 2016

@laanwj I've been auditing/fuzzing the existing network code and slowly adding unit tests for net/addrman to establish a behavior baseline and make unit testing of this feature easier. See #6720, #7212, #7291, #7696

I'm currently planning on breaking this commit into two commits (feeler connections and test-before-evict) and testing them independently. My current roadmap is:

  1. Late June: push out a cleaned-up version of feeler connections, spin up some test nodes.
  2. Early July: post results from test nodes.
  3. July-August: depending on how things go, I'd like to push a cleaned up version of test-before-evict.

I have found some other minor bugs in the networking code, and I'm trying to figure out whether I should prioritize them over this.

@@ -55,6 +58,7 @@ using namespace std;

namespace {
const int MAX_OUTBOUND_CONNECTIONS = 8;
const int MAX_FEELER_CONNECTIONS = 1;

@rebroad

rebroad Aug 6, 2016

Contributor

This number seems to only effectively increase MAX_OUTBOUND_CONNECTIONS as it is not used other than adding a number to this constant.

@sipa

Member

sipa commented Aug 25, 2016

Rebase now that #8282 is merged?

@laanwj

Member

laanwj commented Sep 9, 2016

Is this still relevant after #8282?

@sipa

Member

sipa commented Sep 9, 2016

Yes, #8282 only implements feeler connections, not test-before-evict.

@TheBlueMatt

Contributor

TheBlueMatt commented Oct 28, 2016

Needs rebase since Aug 25 :(

@EthanHeilman

Contributor Author

EthanHeilman commented Oct 28, 2016

@TheBlueMatt I have a new version of this change in which test-before-evict is broken out separately from feelers, since feelers is already in 13.1. I haven't created a pull request for it because I'm hunting for typos and mistakes. It should be out either today or Monday.

EthanHeilman@e7157a0

@laanwj

Member

laanwj commented Nov 2, 2016

Closing in favor of #9037

@laanwj laanwj closed this Nov 2, 2016
laanwj added a commit that referenced this pull request Mar 6, 2018
e68172e Add test-before-evict discipline to addrman (Ethan Heilman)

Pull request description:

  This change implements countermeasure 3 (test-before-evict) suggested in our paper: ["Eclipse Attacks on Bitcoin’s Peer-to-Peer Network"](http://cs-people.bu.edu/heilman/eclipse/).
  # Design:

  A collision occurs when an address, addr1, is being moved to the tried table from the new table, but maps to a position in the tried table which already contains an address (addr2). The current behavior is that addr1 would evict addr2 from the tried table.

  This change ensures that during a collision, addr1 is not inserted into tried but is instead inserted into a buffer (setTriedCollisions). The to-be-evicted address, addr2, is then tested by [a feeler connection](#8282). If addr2 is found to be online, we remove addr1 from the buffer and addr2 is not evicted. If, on the other hand, addr2 is found to be offline, it is replaced by addr1.

  An additional small advantage of this change is that, as no more than ten addresses can be in the test buffer at once, and addresses are only cleared one at a time from the test buffer (at 2 minute intervals), an attacker is forced to wait at least two minutes to insert a new address into tried after filling up the test buffer. This rate limits an attacker attempting to launch an eclipse attack.
  # Risk mitigation:
  - To prevent this functionality from being used as a DoS vector, we limit the number of addresses which are to be tested to ten. If we have more than ten addresses to test, we drop new addresses being added to tried if they would evict an address. Since the feeler thread only creates one new connection every 2 minutes the additional network overhead is limited.
  - An address in tried gains immunity from tests for 4 hours after it has been tested or successfully connected to.
  # Tests:

  This change includes additional addrman unittests which test this behavior.

  I ran an instance of this change with a much smaller tried table (2 buckets of 64 addresses) so that collisions were much more likely and observed evictions.

  ```
  2016-10-27 07:20:26 Swapping 208.12.64.252:8333 for 68.62.95.247:8333 in tried table
  2016-10-27 07:20:26 Moving 208.12.64.252:8333 to tried
  ```

  I documented tests we ran against similar earlier versions of this change in #6355.
  # Security Benefit

  This was originally posted in PR #8282; see [this comment for full details](#8282 (comment)).

  To determine the security benefit of these larger numbers of IPs in the tried table I modeled the attack presented in [Eclipse Attacks on Bitcoin’s Peer-to-Peer Network](https://eprint.iacr.org/2015/263).

  ![attackergraph40000-10-1000short-line](https://cloud.githubusercontent.com/assets/274814/17366828/372af458-595b-11e6-81e5-2c9f97282305.png)

  **Default node:** 595 attacker IPs for ~50% attack success.
  **Default node + test-before-evict:** 620 attacker IPs for ~50% attack success.
  **Feeler node:** 5540 attacker IPs for ~50% attack success.
  **Feeler node + test-before-evict:** 8600 attacker IPs for ~50% attack success.

  The node running feeler connections has 10 times as many online IP addresses in its tried table, making an attack roughly 10 times harder (i.e. requiring an attacker to control roughly 10 times as many IP addresses in different /16s). Adding test-before-evict increases the resistance of the node by roughly an additional 3000 attacker IP addresses.

  Below I graph the attack over even greater attacker resources (i.e. more attacker-controlled IP addresses). Note that test-before-evict maintains some security far longer, even against an attacker with 50,000 IPs. If this node had a larger tried table, test-before-evict could greatly boost a node's resistance to eclipse attacks.

  ![attacker graph long view](https://cloud.githubusercontent.com/assets/274814/17367108/96f46d64-595c-11e6-91cd-edba160598e7.png)

Tree-SHA512: fdad4d26aadeaad9bcdc71929b3eb4e1f855b3ee3541fbfbe25dca8d7d0a1667815402db0cb4319db6bd3fcd32d67b5bbc0e12045c4252d62d6239b7d77c4395