Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

randomize GETDATA(tx) request order and introduce bias toward outbound #14897

Merged
merged 1 commit into from Feb 8, 2019

Conversation

@naumenkogs
Copy link
Contributor

commented Dec 8, 2018

This code makes executing two particular (and potentially other) attacks harder.

InvBlock

This behavior was described well here (page 11).

Per current implementation, if node A receives INV (tx) from node B, node A sends GETDATA to B and waits for TX message back.

Node A is likely to receive more INVs (regarding the same tx) from other peers. But node A would not send another GETDATA unless it does not hear TX back from node B for next 2 minutes (to save bandwidth)

Thus, if B is a malicious node, it can prevent node A from getting the transaction (even if all A’s peers have it) for 2 minutes.

This behavior seems to be an inherent limitation of the current P2P relay protocol, and I don’t see how it can be fundamentally changed (I can see workarounds which involve rewriting a lot of P2P code though).

What does this PR fix?

The attacks I’m looking at involve preventing A from learning the transaction for 2*N minutes. To do that, an attacker has to spin up N nodes and send N INVs simultaneously to node A (then InvBlocks will be queued with an interval of 2 minutes according to current implementation)

More precisely, 2 scenarios I’m looking at are:

  1. An attacker censors a particular transaction. By performing InvBlock from different nodes, an attacker can execute a network-wide censorship of a particular transaction (or all transactions). The earlier an attacker founds the transaction he wants to censor, the easier it is to perform an attack. As it was pointed out by @gwillen, this is even more dangerous in the case of lightning, where transactions are known in advance.
  2. Topology inference described in papers 1, 2 involve network-wide InvBlock. This fix would not mitigate this type of inference, but I believe it will make it more expensive to perform (an attacker would have to create more transactions and perform more rounds to learn the topology, the second paper itself notes that InvBlock isolation is important for the attack).

How does it work

This PR introduces bias toward outbound connections (they have higher priority when a node chooses from whom it should request a transaction) and randomizes the order.
As per @gmaxwell suggestion, GETDATA requests queue is created after processing all incoming messages from all nodes.

After this fix, if the incoming messages were [I1, I2, I3, O1, O2, O3, O4], the queue for GETDATA may look like [O2, O1, O3, O4, I1, I3, I2, ….].

If {I1, I2, I3} were significantly earlier (but the difference is less than TX_TIMEOUT=60 s) than others, the queue for GETDATA may look like [I2, O2, O1, O3, O4, I1, I3, ….].

Other comments:

  1. This mitigation works better if the connectivity is higher (especially outbound, because it would be less likely that 2 GETDATAs for inbound malicious nodes queued together)

@fanquake fanquake added the P2P label Dec 8, 2018

@gmaxwell

This comment has been minimized.

Copy link
Contributor

commented Dec 8, 2018

I'm strongly in support of doing something about this.

But instead, why not do something like: for each tx keep a next-time and ordered data structure, sorted on ( outbound, nonce) where nonce is just selected at random when the inv is processed.

Then after each run of the message handling loop, after processing messages from all peers. go through the txn and for every tx with a next-time in the past, pop the best element from the list and query its peer, then update the next time.

This way there is even a decent chance that you do not fetch the txn from the first to offer it to you, you fetch in random order, and fetch from outpeers first if there are any outstanding offer.

@naumenkogs

This comment has been minimized.

Copy link
Contributor Author

commented Dec 8, 2018

@gmaxwell My implementation was supposed to have minimum intrusion in other parts of the code, but now when you suggested an alternative it seems not much code too. I'll update my PR soon.

@DrahtBot

This comment has been minimized.

Copy link
Contributor

commented Dec 8, 2018

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #15141 (Rewrite DoS interface between validation and net_processing by sdaftuar)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@gmaxwell

This comment has been minimized.

Copy link
Contributor

commented Dec 8, 2018

Awesome, if you find that implementing something else actually does turn out to be complicated, please feel free to loop back. I asked in part because it didn't seem like it would be much more complicated, but the devil is in the details. I think what you propose here sounds better than nothing, but I think we can probably bypass the half-measure. :)

@naumenkogs naumenkogs changed the title bias towards outbound connections for askFor(tx) randomize GETDATA(tx) request order and introduce bias toward outbound Dec 9, 2018

@naumenkogs naumenkogs force-pushed the naumenkogs:master branch 2 times, most recently Dec 9, 2018

Show resolved Hide resolved src/net.cpp Outdated
@jonasschnelli

This comment has been minimized.

Copy link
Member

commented Dec 9, 2018

Nice work!
Concept ACK.

Show resolved Hide resolved src/net.h Outdated
Show resolved Hide resolved src/net.cpp Outdated
Show resolved Hide resolved src/net.cpp Outdated
Show resolved Hide resolved src/net.cpp Outdated
Show resolved Hide resolved src/net.cpp Outdated
@sdaftuar
Copy link
Member

left a comment

Concept ACK, this would be a nice improvement to make.

However I think it would be better if we could move more, if not all, of this logic to net_processing.cpp. I left some comments below explaining my thinking.

Show resolved Hide resolved src/net.cpp Outdated
Show resolved Hide resolved src/net.cpp Outdated
Show resolved Hide resolved src/net.cpp Outdated
Show resolved Hide resolved src/net.cpp Outdated
Show resolved Hide resolved src/net_processing.cpp Outdated
Show resolved Hide resolved src/net.h Outdated
Show resolved Hide resolved src/net.cpp Outdated
Show resolved Hide resolved src/net.h Outdated
Show resolved Hide resolved src/net.h Outdated
Show resolved Hide resolved src/net.cpp Outdated
Show resolved Hide resolved src/net.cpp

@naumenkogs naumenkogs force-pushed the naumenkogs:master branch 2 times, most recently Jan 14, 2019

@naumenkogs

This comment has been minimized.

Copy link
Contributor Author

commented Jan 14, 2019

I've updated the implementation in a DoS-resistant way.

Well, the new code is heavily based on the latest @sdaftuar work (mostly his implementation with my review and couple minor edits), but we decided to use this PR.

@DrahtBot DrahtBot removed the Needs rebase label Jan 14, 2019

@laanwj laanwj added this to Blockers in High-priority for review Jan 17, 2019

@ryanofsky

This comment has been minimized.

Copy link
Contributor

commented Jan 18, 2019

I think I might be missing something about when the code in SendMessages() runs. But how is this change randomizing which peer a transaction gets requested from?

It seems like the first request is scheduled based on inv time, and goes to the peer that announced first (with 1 second penalty for an incoming peer):

https://github.com/bitcoin/bitcoin/blob/2f9abf746b3555ca8b47e8d0122bf4252396a7ff/src/net_processing.cpp#L669-L674

And then subsequent requests all get scheduled at the same time based on the first request (again with penalty for incoming peers):

https://github.com/bitcoin/bitcoin/blob/2f9abf746b3555ca8b47e8d0122bf4252396a7ff/src/net_processing.cpp#L3865-L3867

So the peer chosen will depend on the order of SendMessages() calls, which appears to be fixed, not random.

Very possible I am missing something here, though.

As for the PR, I need to look more closely at the code, but in general it looks very good and the comments are great. It would be good to add a commit description and also credit Suhas if it's true this is "heavily based" on his work (#14897 (comment)). I think if you add a

Co-authored-by: Suhas Daftuar <sdaftuar@gmail.com>

line to the end of your commit message it will list him as a coauthor in github.

@naumenkogs

This comment has been minimized.

Copy link
Contributor Author

commented Jan 18, 2019

@ryanofsky

It would not randomize for outbound peers, because the request would be sent right after processMessage for a given peer. We might add a delay to randomize this one too, but under current threat model (sybil attacks through outbound are very unlikely) I don't see a real reason.

And yes, I should modify the github comment to credit Suhas, thanks for the idea.

@ryanofsky

This comment has been minimized.

Copy link
Contributor

commented Jan 18, 2019

It would not randomize for outbound peers, because the request would be sent right after processMessage for a given peer

This is only true for the first request, right? After the first request, assuming it timed out, all the outbound peers would be scheduled to make the next request at the same time (last_request_time + MAX_GETDATA_TX_DELAY), and since SendMessages() seems to be called on peers in a stable order, which peer requested first would be fixed. This seems to contradict the PR description which says "This PR introduces bias toward outbound connections (they have higher priority when a node chooses from whom it should request a transaction) and randomizes the order." Maybe you just need to drop the part about randomizing the order.

I think I also need to look into how SendMessages() is called. It seems like if it were called in a predictable way, like once a second on all peers in a fixed order, then this code would just keep resending the transaction request to the first outbound peer after the initial request timed out, and never try any different peers, because every peer would schedule their next request at the exact same time, and the same peer would always be called first to actually send the request.

@naumenkogs

This comment has been minimized.

Copy link
Contributor Author

commented Jan 18, 2019

@ryanofsky I would rather randomize it for outbound as well, since the point you're making seems correct and it's no good.

It is called in a fixed order. There is a chance of randomization due to non-uniform processing time per request, but this processing time is way less than 1 second, so the chance is very low.

After talking to you, I think we should add PoissonNext() to both outbound and inbound delays (a new delay every time), because this would only make it stronger and won't bring any bad implications. What do you think?

@ryanofsky

This comment has been minimized.

Copy link
Contributor

commented Jan 18, 2019

Adding PoissonNext() seems like a reasonable way to randomize & rotate among peers, but I am far from an expert on this code and am just trying to understand it myself, so hopefully others will weigh in.

@jamesob
Copy link
Member

left a comment

Concept ACK. The documentation is awesome and in general GETDATA(tx) logic is much easier to reason about after this changeset.

Show resolved Hide resolved src/net_processing.cpp
Show resolved Hide resolved src/net_processing.cpp
Show resolved Hide resolved src/net_processing.cpp Outdated
@jamesob

This comment has been minimized.

Copy link
Member

commented Feb 4, 2019

utACK modulo the unnecessary ProcessMessage() param.

@naumenkogs naumenkogs force-pushed the naumenkogs:master branch from 1442d29 to acf0532 Feb 4, 2019

@jamesob

This comment has been minimized.

Copy link
Member

commented Feb 4, 2019

utACK acf0532

It'd be nice if someone packaged up @sdaftuar's functional test for a follow-up PR.

@sipa
Copy link
Member

left a comment

utACK, but looks like it needs rebasing on the logger changes in #15266.

Show resolved Hide resolved src/net_processing.cpp
Show resolved Hide resolved src/net_processing.cpp Outdated
Show resolved Hide resolved src/net_processing.cpp Outdated
Show resolved Hide resolved src/net_processing.cpp Outdated
Show resolved Hide resolved src/net_processing.cpp Outdated
Show resolved Hide resolved src/net_processing.cpp Outdated
@MarcoFalke
Copy link
Member

left a comment

Concept ACK acf0532

I really like the refactoring that renames the variable names and adds documentation. That makes it a lot easier to understand the code (or at least less confusing).

I will review and test later this week, but initial tests suggest that a remote can fill up the tx state without bounds. (I stopped my node when m_tx_process_time.size()==1359900)

Show resolved Hide resolved src/net_processing.cpp Outdated
Show resolved Hide resolved src/net_processing.cpp Outdated
Show resolved Hide resolved src/net_processing.cpp
Show resolved Hide resolved src/net_processing.cpp Outdated

@naumenkogs naumenkogs force-pushed the naumenkogs:master branch from acf0532 to 4282e14 Feb 5, 2019

@laanwj

This comment has been minimized.

Copy link
Member

commented Feb 6, 2019

@naumenkogs Unless you'd like to be credited as User in the release notes, you'll want to change the author name on your commit:

Author: User <...@gmail.com>
@MarcoFalke

This comment has been minimized.

Copy link
Member

commented Feb 6, 2019

utACK 4282e14

@sipa

This comment has been minimized.

Copy link
Member

commented Feb 6, 2019

utACK 4282e14

One thing for future consideration: it seems to me that filterInventoryKnown and m_tx_announced have a partial overlapping function, which makes me wonder if they can be merged, or at least have inv announcement test both.

You may want to take @laanwj's comment into account about crediting of your commit.

Change in transaction pull scheduling to prevent InvBlock-related att…
…acks

Co-authored-by: Suhas Daftuar <sdaftuar@gmail.com>

@naumenkogs naumenkogs force-pushed the naumenkogs:master branch from 4282e14 to 1cff3d6 Feb 7, 2019

@naumenkogs

This comment has been minimized.

Copy link
Contributor Author

commented Feb 7, 2019

My last change is just to update author's name.

@MarcoFalke

This comment has been minimized.

Copy link
Member

commented Feb 7, 2019

no-code-changes ACK 1cff3d6

@sipa

This comment has been minimized.

Copy link
Member

commented Feb 7, 2019

no-code-changes ACK 1cff3d6

@sipa sipa merged commit 1cff3d6 into bitcoin:master Feb 8, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

sipa added a commit that referenced this pull request Feb 8, 2019

Merge #14897: randomize GETDATA(tx) request order and introduce bias …
…toward outbound

1cff3d6 Change in transaction pull scheduling to prevent InvBlock-related attacks (Gleb Naumenko)

Pull request description:

  This code makes executing two particular (and potentially other) attacks harder.

  ### InvBlock
  This behavior was described well [here](https://www.cs.umd.edu/projects/coinscope/coinscope.pdf) (page 11).

  Per current implementation, if node A receives _INV_ (tx) from node B, node A sends _GETDATA_ to B and waits for _TX_ message back.

  Node A is likely to receive more _INVs_ (regarding the same tx) from other peers. But node A would not send another _GETDATA_ unless it does not hear _TX_ back from node B for next 2 minutes (to save bandwidth)

  Thus, if B is a malicious node, it can prevent node A from getting the transaction (even if all A’s peers have it) for 2 minutes.

  This behavior seems to be an inherent limitation of the current P2P relay protocol, and I don’t see how it can be fundamentally changed (I can see workarounds which involve rewriting a lot of P2P code though).

  ### What does this PR fix?

  The attacks I’m looking at involve preventing A from learning the transaction for 2*N minutes. To do that, an attacker has to spin up N nodes and send N _INVs_ simultaneously to node A (then InvBlocks will be queued with an interval of 2 minutes according to current implementation)

  More precisely, 2 scenarios I’m looking at are:
  1. An attacker censors a particular transaction. By performing InvBlock from different nodes, an attacker can execute a network-wide censorship of a particular transaction (or all transactions). The earlier an attacker founds the transaction he wants to censor, the easier it is to perform an attack. As it was pointed out by @gwillen, this is even more dangerous in the case of lightning, where transactions are known in advance.
  2. Topology inference described in papers [1](https://www.cs.umd.edu/projects/coinscope/coinscope.pdf), [2](https://arxiv.org/pdf/1812.00942.pdf) involve network-wide InvBlock. This fix would not mitigate this type of inference, but I believe it will make it more expensive to perform (an attacker would have to create more transactions and perform more rounds to learn the topology, the second paper itself notes that InvBlock isolation is important for the attack).

  ### How does it work
  This PR introduces bias toward outbound connections (they have higher priority when a node chooses from whom it should request a transaction) and randomizes the order.
  As per @gmaxwell suggestion, GETDATA requests queue is created after processing all incoming messages from all nodes.

  After this fix, if the incoming messages were [I1, I2, I3, O1, O2, O3, O4], the queue for _GETDATA_ may look like [O2, O1, O3, O4, I1, I3, I2, ….].

  If {I1, I2, I3} were significantly earlier (but the difference is less than TX_TIMEOUT=60 s) than others, the queue for _GETDATA_ may look like [I2, O2, O1, O3, O4, I1, I3, ….].

  ### Other comments:
  1. This mitigation works better if the connectivity is higher (especially outbound, because it would be less likely that 2 _GETDATAs_ for inbound malicious nodes queued together)

Tree-SHA512: 2ad1e80c3c7e16ff0f2d1160aa7d9a5eaae88baa88467f156b987fe2a387f767a41e11507d7f99ea02ab75e89ab93b6a278d138cb1054f1aaa2df336e9b2ca6a

@fanquake fanquake removed this from Blockers in High-priority for review Feb 8, 2019

peer_download_state.m_tx_announced.insert(txid);

int64_t process_time;
int64_t last_request_time = GetTxRequestTime(txid);

This comment has been minimized.

Copy link
@MarcoFalke

MarcoFalke Feb 11, 2019

Member

style-nit: Could be const

// If this transaction was last requested more than 1 minute ago,
// then request.
int64_t last_request_time = GetTxRequestTime(inv.hash);
if (last_request_time <= nNow - GETDATA_TX_INTERVAL) {

This comment has been minimized.

Copy link
@MarcoFalke

MarcoFalke Feb 11, 2019

Member

style-nit: Could inline GetTxRequestTime here to avoid the named symbol last_request_time.

// this announcement
return;
}
peer_download_state.m_tx_announced.insert(txid);

This comment has been minimized.

Copy link
@MarcoFalke

MarcoFalke Feb 11, 2019

Member

style-nit: Could make this if (!peer_download_state.m_tx_announced.insert(txid).second) return; and remove the count() above to avoid two lookups. (This will be in line with the previous code in AskFor)

HashUnlimited added a commit to HashUnlimited/chaincoin that referenced this pull request Feb 20, 2019

[doc] clarify getdata limit after bitcoin#14897
GETDATA is limited to blocks and transactions now and can't be used for other non-block data

MarcoFalke added a commit that referenced this pull request Apr 18, 2019

Merge #15839: [0.18] Revert GetData randomization change (#14897)
8602d8b Revert "Change in transaction pull scheduling to prevent InvBlock-related attacks" (Suhas Daftuar)

Pull request description:

  This is for 0.18, not master -- I propose we revert the getdata change for the 0.18 release, rather than try to continue patching it up.  It seems like we've turned up several additional bugs that slipped through initial review (see #15776, #15834), and given the potential severe consequences of these bugs I think it'd make more sense for us to delay releasing this code until 0.19.

  Since the bugfix PRs are getting review, I think we can leave #14897 in master, but we can separately discuss if it should be reverted in master as well if anyone thinks that would be more appropriate.

ACKs for commit 8602d8:

Tree-SHA512: 0389cefc1bc74ac47f87866cf3a4dcfe202740a1baa3db074137e0aa5859672d74a50a34ccdb7cf43b3a3c99ce91e4ba1fb512d04d1d383d4cc184a8ada5543f

MarcoFalke added a commit that referenced this pull request Jun 12, 2019

Merge #15834: Fix transaction relay bugs introduced in #14897 and exp…
…ire transactions from peer in-flight map

308b767 Fix bug around transaction requests (Suhas Daftuar)
f635a3b Expire old entries from the in-flight tx map (Suhas Daftuar)
e32e084 Remove NOTFOUND transactions from in-flight data structures (Suhas Daftuar)
23163b7 Add an explicit memory bound to m_tx_process_time (Suhas Daftuar)
218697b Improve NOTFOUND comment (Suhas Daftuar)

Pull request description:

  #14897 introduced several bugs that could lead to a node no longer requesting transactions from one or more of its peers.  Credit to ajtowns for originally reporting many of these bugs along with an originally proposed fix in #15776.

  This PR does a few things:

  - Fix a bug in NOTFOUND processing, where the in-flight map for a peer was keeping transactions it shouldn't

  - Eliminate the possibility of a memory attack on the CNodeState `m_tx_process_time` data structure by explicitly bounding its size

  - Remove entries from a peer's in-flight map after 10 minutes, so that we should always eventually resume transaction requests even if there are other bugs like the NOTFOUND one

  - Fix a bug relating to the coordination of request times when multiple peers announce the same transaction

  The expiry mechanism added here is something we'll likely want to remove in the future, but is belt-and-suspenders for now to try to ensure we don't have other bugs that could lead to transaction relay failing due to some unforeseen conditions.

ACKs for commit 308b76:
  ajtowns:
    utACK 308b767
  morcos:
    light ACK 308b767
  laanwj:
    Code review ACK 308b767
  jonatack:
    Light ACK 308b767.
  jamesob:
    ACK 308b767
  MarcoFalke:
    ACK 308b767 (Tested two of the three bugs this pull fixes, see comment above)
  jamesob:
    Concept ACK 308b767
  MarcoFalke:
    ACK 308b767

Tree-SHA512: 8865dca5294447859d95655e8699085643db60c22f0719e76e961651a1398251bc932494b68932e33f68d4f6084579ab3bed7d0e7dd4ac6c362590eaf9414eda

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jun 12, 2019

sidhujag
Merge bitcoin#15834: Fix transaction relay bugs introduced in bitcoin…
…#14897 and exp…  …ire transactions from peer in-flight map

308b767 Fix bug around transaction requests (Suhas Daftuar)
f635a3b Expire old entries from the in-flight tx map (Suhas Daftuar)
e32e084 Remove NOTFOUND transactions from in-flight data structures (Suhas Daftuar)
23163b7 Add an explicit memory bound to m_tx_process_time (Suhas Daftuar)
218697b Improve NOTFOUND comment (Suhas Daftuar)

Pull request description:

  bitcoin#14897 introduced several bugs that could lead to a node no longer requesting transactions from one or more of its peers.  Credit to ajtowns for originally reporting many of these bugs along with an originally proposed fix in bitcoin#15776.

  This PR does a few things:

  - Fix a bug in NOTFOUND processing, where the in-flight map for a peer was keeping transactions it shouldn't

  - Eliminate the possibility of a memory attack on the CNodeState `m_tx_process_time` data structure by explicitly bounding its size

  - Remove entries from a peer's in-flight map after 10 minutes, so that we should always eventually resume transaction requests even if there are other bugs like the NOTFOUND one

  - Fix a bug relating to the coordination of request times when multiple peers announce the same transaction

  The expiry mechanism added here is something we'll likely want to remove in the future, but is belt-and-suspenders for now to try to ensure we don't have other bugs that could lead to transaction relay failing due to some unforeseen conditions.

ACKs for commit 308b76:
  ajtowns:
    utACK 308b767
  morcos:
    light ACK 308b767
  laanwj:
    Code review ACK 308b767
  jonatack:
    Light ACK 308b767.
  jamesob:
    ACK 308b767
  MarcoFalke:
    ACK 308b767 (Tested two of the three bugs this pull fixes, see comment above)
  jamesob:
    Concept ACK 308b767
  MarcoFalke:
    ACK 308b767

Tree-SHA512: 8865dca5294447859d95655e8699085643db60c22f0719e76e961651a1398251bc932494b68932e33f68d4f6084579ab3bed7d0e7dd4ac6c362590eaf9414eda

MarcoFalke added a commit that referenced this pull request Jun 16, 2019

Merge #16196: doc: Add release notes for 14897 & 15834
fa55dd8 doc: Add release notes for 14897 & 15834 (MarcoFalke)

Pull request description:

  #14897 & #15834

ACKs for commit fa55dd:
  fanquake:
    ACK fa55dd8

Tree-SHA512: 301742191f3d0e9383c6fe455d18d1e153168728e75dd29b7d0a0246af1cf024cc8199e82a42d74b3e6f5b556831763e0170ed0cb7b3082c7e0c57b05a5776db
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.