Fix fork cause 5 #402
Comments
Nice write-up! I like the fact that the order of nodes, the number of nodes, and the nodes' IPs are written by the end user and remain private information. Monitoring and rebuilds would still be needed from time to time, but the principal source of forks would be gone. Not sure about the feasibility and robustness, but it's definitely a step in the right direction.
I grabbed a commit ID from 2017-01-24 and assume you're referring to this code: https://github.com/LiskHQ/lisk/blob/b743cf562cbe538302520c72a57fe98f1faf64b9/modules/blocks.js#L1356, which is now in https://github.com/LiskHQ/lisk/blob/bcf5d6b8733cc9b0fc2be0e368af6d592314eb3d/modules/blocks/process.js#L196. I don't see what the issue with this resolution strategy is. For every two blocks A, B with A.id != B.id this defines a winner: the winner is first in the list [A, B] after sorting by the tuple (timestamp, id), where (timestamp, id) cannot be equal for both blocks.
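The tie-breaking rule described in that comment can be sketched as a small comparator. This is a simplified illustration, not the actual Lisk code; the `timestamp` and `id` field names follow the block schema discussed above:

```javascript
// Pick the winning block between two competing blocks at the same height.
// The winner is the block that sorts first by the tuple (timestamp, id):
// the lower timestamp wins, and on equal timestamps the lexicographically
// smaller id wins. Since A.id !== B.id, a winner always exists.
function resolveFork(a, b) {
  if (a.timestamp !== b.timestamp) {
    return a.timestamp < b.timestamp ? a : b;
  }
  return a.id < b.id ? a : b;
}

// Example: same timestamp, so the ids break the tie.
const blockA = { id: '111', timestamp: 100 };
const blockB = { id: '222', timestamp: 100 };
console.log(resolveFork(blockA, blockB).id); // '111'
```

Because the rule is a total order over distinct blocks, every node that sees both candidates picks the same winner; the problem raised in the reply below is that not every node sees both.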
The issue, as always, is that not all nodes see all versions of a block at a given height that get broadcast. You can't choose between two blocks if you only ever receive one.
Makes sense, thanks! I found this issue where the situation is described beautifully. But why was it closed? Regarding communication between multiple nodes of the same delegate: I don't think it is good to assume that node A will have the chance to tell the next node that it cannot forge right now. Think of networking issues or power outages. If node A crashes, node B may not even receive a socket close. In those cases, a delegate would prefer forging twice rather than not forging.
The issue with double forging still exists in 1.0.0 on testnet, at least in situations with full blocks, when many transactions are pushed to the network. In situations with low or no traffic, I also saw that only one of my three activated nodes actually forged the block. The assumption of @MaciejBaj was that the first node to forge the block broadcast it quickly enough, before nodes 2 and 3 started trying to forge that block as well.
Are you sure that this is the case? When there are very few transactions in the pool, all forging nodes produce the exact same block.
If a node forges a block before the others, it's entirely possible for it to broadcast before the others do. It's a matter of the event loop and the timing of when the nodes try to forge. Time drift is also a factor: a node that is "earlier" in time forges sooner and broadcasts sooner.
I'm sure that only one node forged the block, when all had forging activated and all were in sync. At least, I saw the log message "Forged new block id ..." on only one node, and the other nodes received that new block. There was no fork and no "Multiple forging". I don't know why.
This will be resolved by the new BFT protocol, #3555.
Fixing fork cause 5 -- Double Forging
What is double forging?
Double forging occurs when two or more nodes on the network have the same delegate(s) enabled to forge. This has the potential to create multiple blocks with the same parent id but a different blockId. While in most circumstances the blocks are the same, there are scenarios where the blockId can differ. In these scenarios we see chaos across the network, with many nodes receiving and removing the blocks as they are broadcast, due to the code for recovering from fork 5. There has been some code implemented to resolve this issue (found here: https://github.com/LiskHQ/lisk/blob/development/modules/blocks.js#L1356), but it can be considered unreliable and a band-aid for the greater issue that can still occur. See issue: #327
Why does it happen?
In the current Lisk architecture, a node will attempt to forge for every delegate enabled on it when that delegate's slot occurs. This poses a number of dangers.
What can be done to fix it?
One step to take in order to fix the issue of double forging is to disallow failover during a slot. This decision would be somewhat unpopular, as delegates would no longer be able to forge every block, which could slow down the network. Since that is a step backwards from current stability and performance, we need to look at other options.
Another proposed solution (from @4miners) was to force nodes to register on the network when forging is enabled, with some hashed identifying information, so that only a small set of nodes could forge. This suggestion is aimed at preventing malicious users from forging on a large number of nodes. It doesn't really help solve the root cause of fork 5 either.
So what's the real solution?
The real solution is quite simple. In the `config.json` it should be defined that there are *Forging Peers*: peers that are forging with the same key. Under this new architecture, one node would act as the primary forging node, and all other nodes would qualify as failover nodes, in order of priority. When it comes time to generate a block for the delegate slot, the primary attempts to do so. If it fails due to consensus, it emits a signed message to the defined secondary node, saying that it has failed and is passing the baton. The secondary node verifies this message and attempts to forge during the slot. This continues on down the line until either the block is forged or no node forges a block. (The above would be best implemented with websockets, so that there's always a connection between these nodes.)
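The baton-passing above can be sketched as a walk over the ordered peer list. The helpers `tryForge` and `passBaton` are hypothetical stand-ins for real forging and networking code; only the control flow is the point here:

```javascript
// Walk the ordered list of forging peers for one slot. Each node tries to
// forge; on failure it passes a signed "baton" message to the next peer.
// `helpers.tryForge` and `helpers.passBaton` are hypothetical stand-ins.
async function forgeSlot(peers, helpers) {
  for (const peer of peers) {
    const forged = await helpers.tryForge(peer);
    if (forged) {
      return peer; // block produced; stop the chain of failovers
    }
    // Signal the next peer that this node failed and is passing
    // the baton for the remainder of the slot.
    await helpers.passBaton(peer);
  }
  return null; // no node managed to forge in this slot
}

// Example: pretend only the secondary succeeds.
forgeSlot(['primary', 'secondary', 'tertiary'], {
  tryForge: async (peer) => peer === 'secondary',
  passBaton: async () => {},
}).then((winner) => console.log(winner)); // 'secondary'
```

Note that at most one peer forges per slot, which is exactly the invariant that prevents fork cause 5 in the non-malicious case.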
Example config.json
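The original post does not include the example itself, so the snippet below is a purely hypothetical sketch of what a forging-peers list could look like; the key names (`forgingPeers`, `priority`) are invented for this illustration:

```json
{
  "forging": {
    "force": false,
    "secret": ["..."],
    "forgingPeers": [
      { "ip": "10.0.0.1", "port": 8000, "priority": 0 },
      { "ip": "10.0.0.2", "port": 8000, "priority": 1 },
      { "ip": "10.0.0.3", "port": 8000, "priority": 2 }
    ]
  }
}
```

Priority 0 would be the primary forging node; higher numbers are failover nodes, tried in order.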
In order to enforce this standard, all delegates will need a minimum of two instances running in order to forge at all, whether on multiple nodes or vertically stacked with changes to certain `config.json` params. Requiring multiple nodes will be perceived as unpopular by newcomers, but in the current state it is considered normal by the majority of delegates. This strategy will resolve non-malicious fork cause 5 situations, but it only increases the difficulty of performing a double-forging attack; it does not prevent it.
So how do we fix that?
This solution is also pretty elegant. We extend the concept of hashing delegate information, except in this case the elected master node (priority 0) generates a hash and a signed message containing the list of nodes that are allowed to forge with its key. This data would need to be stored in a new table.
In order to prevent an attack that spams updates to this information, we also save a timestamp. Updates should be allowed no more often than once per round, after the first entry, which is given a free pass.
These entries must be propagated throughout the network to prevent malicious attacks. However, since only the secret-key owner (and their nodes) can decrypt these messages, there is no risk to delegate security from exposed IP addresses.
The method for updating the network is difficult, since there's no transaction type that covers this, and this data does not have a specific need to be inserted directly into the blockchain. I will leave this end of the discussion open to debate, as I am unsure myself.