There are cases where blocks get pushed to the chain::database out-of-order #247
I think this has to be handled at the p2p level. The p2p code relies on the backend to tell it whether a block is valid, so it can kick peers that send invalid blocks. Since validation is done by the backend, the p2p code needs to ask the backend "Is this block valid?" As we discovered earlier this week when designing the solution to #237, witnesses' block signing keys might suddenly change, so you cannot validate the witness signature on any descendant of the head block beyond its immediate child. Handling this at the fork DB level (alternative 1 above) would mean the backend accepts blocks it cannot validate, which breaks the p2p logic -- a malicious peer would be able to invent huge numbers of invalid blocks with bogus previous block hashes that will never connect, and you'd put them all in your fork DB.

I think what you need to do instead is: if you receive an unconnected block, place it into a lookaside buffer maintained by the p2p code, indexed by previous block ID. Whenever you connect a new block, the p2p code should check the lookaside buffer for unconnected blocks whose previous block ID matches the new head block ID, and push all such blocks.

If a block from the lookaside buffer does not validate, the consequences should propagate back to the peer (i.e. the peer should be kicked for offering a bogus block). So each lookaside entry needs to have the ID of the peer it came from. Indeed it should also be indexed on that peer ID, so the entries for a peer can be removed when the peer disconnects. Since it's an in-memory container with multiple indexes, it seems that a Boost multi-index container would be a good fit. You might be able to get away with only removing blocks for disconnected peers by deferring removal until a "cleanup pass" that removes them when the container becomes full (i.e. the number of entries exceeds some maximum).
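A minimal sketch of such a lookaside buffer, with hypothetical types standing in for the real ones (`std::string` block IDs and `int` peer IDs instead of graphene's `block_id_type` and peer handles):

```cpp
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of the lookaside buffer described above. Entries are
// indexed by previous-block ID so they can be pushed when their parent
// connects, and carry the originating peer ID so a peer's entries can be
// purged when it disconnects or misbehaves.
struct lookaside_buffer {
    struct entry { std::string block_id; std::string prev_id; int peer_id; };

    std::multimap<std::string, entry> by_prev; // prev block ID -> entry

    void add(const std::string& block_id, const std::string& prev_id, int peer_id) {
        by_prev.insert({prev_id, entry{block_id, prev_id, peer_id}});
    }

    // When a new block connects, remove and return every buffered block
    // whose previous-block ID matches the new head block ID.
    std::vector<entry> take_children(const std::string& new_head_id) {
        std::vector<entry> out;
        auto range = by_prev.equal_range(new_head_id);
        for (auto it = range.first; it != range.second; ++it)
            out.push_back(it->second);
        by_prev.erase(range.first, range.second);
        return out;
    }

    // Drop everything a disconnecting (or kicked) peer contributed.
    void remove_peer(int peer_id) {
        for (auto it = by_prev.begin(); it != by_prev.end(); )
            it = (it->second.peer_id == peer_id) ? by_prev.erase(it) : std::next(it);
    }
};
```

A `boost::multi_index_container` with indexes on both `prev_id` and `peer_id` would avoid the linear scan in `remove_peer`; the single `multimap` here just keeps the sketch short.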
It happened on my witness node in puppies' testnet. p2p.log attached: https://drive.google.com/open?id=0B3xrm70jSHn4NWNFTDBJLTJHZ00 At first it's syncing (p2p.log line 15185):
Then my witness produced 71843 (default.log)
then stopped syncing (p2p.log line 15329):
It recovered after 12 minutes (p2p.log line 41458):
continued syncing (p2p.log line 47464):
tried to produce a block but failed (default.log):
Then got stuck for good (p2p.log line 47709):
//Edit: link of log file updated.
I agree somewhat with what @theoreticalbts has to say, but he goes too far in pushing this off on the network code. We currently have a vulnerability where an attacking node could produce fake blocks that do link to existing blocks, and those blocks would get pushed into the fork_database.

We also know several things:
- The active witnesses can only change once per maintenance interval, which means that regardless of the content of any individual block, we know the valid set of witnesses for 1000s of blocks into the future 99% of the time (assuming a 1 day maintenance interval).
- We know exactly when the next maintenance block will be.
- With a high degree of probability, the set of witnesses is unlikely to change by much from day to day.
- Blocks cannot be produced "in the future".

Given that information, the network layer would be filtering all blocks that are "in the future", and the fork_database can easily restrict unlinked blocks to those signed by the current set of active witnesses. This in turn limits the potential "attackers" to the current set of active witnesses. We also know that the same witness must have at least half a round between their slots. With this information we can simply limit the unlinked blocks we cache to at most 1 block per active witness per block number. With this limitation in place it would require a witness to "double-sign" before it would even be theoretically possible for a node to get "stuck". A double-signing witness is something we would love to detect anyway, so that they could automatically be fired.

My ultimate conclusion is that this is something for the chain database / fork database to resolve, and not something for the networking code to resolve. The networking code is already complex enough, and it is far easier to reason about a set of blocks outside that code.
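The proposed cap can be sketched in a few lines (hypothetical names; integer witness IDs stand in for the real witness_id_type): at most one cached unlinked block per active witness per block number, so a second block for an already-used slot is refused as evidence of double-signing.

```cpp
#include <cstdint>
#include <set>
#include <utility>

// Sketch of the proposed limit on cached unlinked blocks: at most one
// block per active witness per block number. All names are illustrative.
struct unlinked_cache_limiter {
    std::set<std::pair<uint32_t, int>> seen; // (block_num, witness_id)

    // Returns true if this (block_num, witness) slot was free. A second
    // block for the same slot indicates double-signing and is refused;
    // the real implementation could also flag the witness for firing.
    bool try_cache(uint32_t block_num, int witness_id) {
        return seen.insert({block_num, witness_id}).second;
    }
};
```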
With the changes described above we have one last corner case: A new witness is elected after a maintenance block and a node receives their block prior to receiving the maintenance block. If this were to happen then the block would be "rejected" and the peer that provided the block would be disconnected. The network will have marked that it already received and rejected the block and would not fetch it again leaving the user stuck. The solution is:
Under this algorithm the network layer would catch the errors and punish any nodes that accumulate too many errors pushing blocks around the maintenance interval.
There is another corner case with the proposed algorithm of only allowing nodes to push blocks from the current set of active witnesses. If there is a fork which branches off prior to a maintenance block, and the maintenance block changes the set of active witnesses, then the user would be unable to accept legitimate blocks from the new fork. In this case the user could get "stuck". This means that syncing to a fork that branches prior to a maintenance block that changes the active set of witnesses would become impossible.
If the scenario described above were to occur, the minority-fork witnesses would get "stuck" and be unable to produce blocks after less than 1 hour. If the split was close enough, then even the majority fork would get stuck after slightly longer than 1 hour if the minority fork did not switch over.

We are left with the challenge of only accepting blocks older than a maintenance interval IF they fill a missing slot AND they link to the current chain by hash. With this restriction an attacking node could construct a fake chain that "fills" each of the missing slots from the main chain. Each node only gets to submit a SINGLE history, so it would mean an attack of MISSED_BLOCKS_IN_MAIN_CHAIN * NUMBER_OF_CONNECTED_ATTACKING_PEERS blocks, where MISSED_BLOCKS_IN_MAIN_CHAIN only includes blocks missed in the last 1024 blocks. This means that there is an upper bound on the amount of memory an attacker could consume of 1024 * CONNECTED_ATTACKING_PEERS * MAX_BLOCK_SIZE. This is a reasonable number and probably an acceptable attack surface.
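As a worked illustration of those two bounds (all input values below are hypothetical examples, not protocol constants):

```cpp
#include <cstdint>

// Illustrative computation of the attack-surface bounds discussed above.
// An attacker is limited to one history per connected peer.
uint64_t max_bogus_blocks(uint64_t missed_in_main_chain, uint64_t attacking_peers) {
    return missed_in_main_chain * attacking_peers;
}

// Memory bound: missed blocks are only counted within the last 1024 blocks,
// so at most 1024 cached blocks per attacking peer, each of bounded size.
uint64_t max_memory_bytes(uint64_t attacking_peers, uint64_t max_block_size) {
    return 1024 * attacking_peers * max_block_size;
}
```

For example, with 10 missed slots in the last 1024 blocks and 20 attacking peers, an attacker could submit at most 200 bogus blocks.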
This issue needs to be divided into "two parts".
The fork_database only allows blocks within head - 1024 < ALLOWED_BLOCK_NUM < head + 32, and an attacking peer can only present a single chain, so the maximum damage an attacker could do would be to produce 1056 bogus blocks even if no other protections were in place. We are already protected by the timestamps, which cannot be from the future. We can also be protected by the maximum undo history, which says that any block older than the maximum undo history is unreachable. So if we have ~99% witness participation, the undo history will be very small and therefore the attack surface will be even smaller (just a few blocks at most).
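The acceptance window can be expressed as a small predicate (a hypothetical helper, not the actual fork_database code):

```cpp
#include <cstdint>

// A block number is only eligible for the fork database when it satisfies
// head - 1024 < num < head + 32 (both bounds strict, per the rule above).
bool in_fork_db_window(uint32_t head, uint32_t num) {
    // Guard against unsigned underflow near the start of the chain.
    bool above_low = (head <= 1024) || (num > head - 1024);
    return above_low && num < head + 32;
}
```

With both bounds strict, the window spans at most 1024 + 32 = 1056 block numbers, matching the damage bound above.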
- With this change the fork database now caches blocks that do not link, and attempts to link the unlinked blocks after every new block is successfully linked.
- With this change the fork database only tracks blocks as far back as the undo history allows. This significantly reduces memory usage when there is high witness participation.
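The relinking behavior in the first point can be sketched as follows (hypothetical types; the real fork_database works with full signed blocks, and "genesis" stands in for an already-known parent):

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Sketch of a fork database that caches unlinked blocks and retries
// linking them whenever a new block links successfully.
struct fork_db_sketch {
    std::set<std::string> linked;                     // IDs of linked blocks
    std::multimap<std::string, std::string> unlinked; // prev_id -> block_id

    // Push a block; returns the IDs linked as a result (possibly several,
    // if previously cached blocks now connect through this one).
    std::vector<std::string> push(const std::string& id, const std::string& prev) {
        std::vector<std::string> newly_linked;
        if (prev != "genesis" && !linked.count(prev)) {
            unlinked.insert({prev, id}); // parent unknown: cache for later
            return newly_linked;
        }
        link(id, newly_linked);
        return newly_linked;
    }

private:
    void link(const std::string& id, std::vector<std::string>& out) {
        linked.insert(id);
        out.push_back(id);
        // Any cached blocks whose parent just linked can now link too.
        auto range = unlinked.equal_range(id);
        std::vector<std::string> kids;
        for (auto it = range.first; it != range.second; ++it)
            kids.push_back(it->second);
        unlinked.erase(range.first, range.second);
        for (auto& k : kids) link(k, out);
    }
};
```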
I haven't read the patch yet, but just from the comments above, I have not found anywhere that you consider the fact that a witness can change its block signing key. In the discussion for #237, @nathanhourt and I briefly considered the possibility of making block signing key updates deferred (i.e. not take effect until some time after being published), but this breaks revocation, which is an important use case. The set of active signing keys is what matters for validation, and the set of active signing keys can change at any time.
Also, the assumption that "an attacking peer can only present a single chain" may not be sound. That question can be broken into two parts:
My answer to (1) is "I think so, because it would be easier to implement it that way, but I haven't read the code" and my answer to (2) is "Yes." Consider the case where a non-malicious peer ("Alice") is on a minority fork; she will broadcast blocks from that minority fork to Bob. If Alice later switches to the majority fork, then she needs to broadcast blocks from the majority fork to Bob -- and some of those blocks may have the same height as blocks she's already broadcast.
Relevant commit eeeab17
With the new commit, my witness node got stuck again, with the same symptoms. It never received another block with the same block number as the block my node generated that wasn't included in the major chain.
In addition, it looks like it doesn't sync after being restarted without --resync-blockchain (same behavior with --replay-blockchain).
Seems endless.
If started with --resync-blockchain, the witness doesn't produce blocks.
My observer node got stuck as well. The behavior is different. Logs:
p2p.log and default.log of observer node uploaded.
//Edit: links of log files updated.
Observer node is stuck at:
If you're still getting stuck with the new commit, I think this must be a bug in the P2P code that can't be worked around in the blockchain processing code. @emfrias might be able to help. |
I'm pretty sure this has been fixed now. |
https://bitsharestalk.org/index.php/topic,17962.msg229944.html#msg229944
The fork_database::push_block method requires that blocks are submitted in order by the network.
There appear to be some cases where blocks are pushed out of order, causing syncing to halt. Our options are to:
This may or may not resolve the issues discussed in the bitsharestalk thread.