Fix fork cause 5 #402
Comments
Nice write-up! I like the fact that the order of nodes, the number of nodes, and the nodes' IPs are written by the end user and remain private information. Monitoring and rebuilds would still be needed from time to time, but the principal source of forks would be gone. Not sure about the feasibility and robustness, but it's definitely a step in the right direction.
I grabbed a commit ID from 2017-01-24 and assume you're referring to this code: https://github.com/LiskHQ/lisk/blob/b743cf562cbe538302520c72a57fe98f1faf64b9/modules/blocks.js#L1356, which is now in https://github.com/LiskHQ/lisk/blob/bcf5d6b8733cc9b0fc2be0e368af6d592314eb3d/modules/blocks/process.js#L196. I don't see what the issue with this resolution strategy is. For every two blocks A, B with A.id != B.id this defines a winner: the winner is first in the list [A, B] after sorting by the tuple (timestamp, id), where (timestamp, id) cannot be equal for both blocks.
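The tie-breaking rule described in that comment can be sketched as a small comparator. This is a simplified illustration, not the actual Lisk code; the `timestamp` and `id` field names follow the block schema discussed above:

```javascript
// Pick the winning block between two competing blocks at the same height.
// The winner is the block that sorts first by the tuple (timestamp, id):
// the lower timestamp wins, and on equal timestamps the lexicographically
// smaller id wins. Since A.id !== B.id, a winner always exists.
function resolveFork(a, b) {
  if (a.timestamp !== b.timestamp) {
    return a.timestamp < b.timestamp ? a : b;
  }
  return a.id < b.id ? a : b;
}

// Example: same timestamp, so the ids break the tie.
const blockA = { id: '111', timestamp: 100 };
const blockB = { id: '222', timestamp: 100 };
console.log(resolveFork(blockA, blockB).id); // '111'
```

Because the rule is a total order over distinct blocks, every node that sees both candidates picks the same winner; the problem raised in the reply below is that not every node sees both.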
The issue, as always, is that not all nodes see all versions of a block at a given height that get broadcast. You can't choose between two blocks if you only ever receive one.
Makes sense, thanks! I found this issue where the situation is described beautifully. But why was it closed? Regarding communication between multiple nodes of the same delegate: I don't think it is good to assume that node A will have the chance to tell the next node that it cannot forge right now. Think of networking issues or power outages. If node A crashes, node B may not even receive a socket close. In those cases, a delegate would prefer forging twice rather than not forging.
The issue with double forging still exists in 1.0.0 on testnet, at least in situations with full blocks, when many transactions are pushed to the network. In situations with low or no traffic, I also saw that only one of my three activated nodes actually forged the block. The assumption of @MaciejBaj was that the first node to forge the block broadcast it quickly enough, before nodes 2 and 3 started trying to forge that block as well.
Are you sure that this is the case? When there are very few transactions in the pool, all forging nodes produce the exact same block.
If a node forges a block before the others, it's entirely possible for it to broadcast before the others do. It's a matter of the event loop and the timing of when the nodes try to forge. Time drift is also a factor: a node that is "earlier" in time forges sooner and broadcasts sooner.
I'm sure that only one node forged the block, when all had forging activated and all were in sync. At least, I saw the log message "Forged new block id ..." on only one node, and the other nodes received that new block. There was no fork and no "Multiple forging". I don't know why.
This will be resolved by the new BFT protocol, #3555.
Fixing fork cause 5 -- Double Forging
What is double forging?
Double forging occurs when two or more nodes on the network have the same delegate(s) enabled to forge. This has the potential to create multiple blocks with the same parent id but a different blockId. While in most circumstances the blocks are the same, there are scenarios where the blockId can differ. In these scenarios we see chaos across the network, with many nodes receiving and removing the blocks as they are broadcast, due to the code for recovering from fork 5. There has been some code implemented to resolve this issue (found here: https://github.com/LiskHQ/lisk/blob/development/modules/blocks.js#L1356), but it can be considered unreliable and a band-aid for the greater issue that can still occur. See issue: #327
Why does it happen?
In the current Lisk architecture, a node will attempt to forge for every delegate enabled on it when that delegate's slot occurs. This poses a number of dangers.
What can be done to fix it?
One step to take in order to fix the issue of double forging is to disallow failover during a slot. This decision would be somewhat unpopular, as delegates would no longer be able to forge every block, which could slow down the network. Since that is a step backwards from current stability and performance, we need to look at other options.
Another proposed solution (from @4miners) was to force nodes to register on the network when forging is enabled, with some hashed identifying information, so that only a small set of nodes could forge. This suggestion is aimed at preventing malicious users from forging on a large number of nodes. It doesn't really help solve the root cause of fork 5 either.
So what's the real solution?
The real solution is quite simple. In the `config.json` it should be defined that there are *Forging Peers*: peers that are forging with the same key. Under this new architecture, one node would act as the primary forging node, and all other nodes would qualify as failover nodes, in order of priority. When it comes time to generate a block for the delegate slot, the primary attempts to do so. If it fails due to consensus, it emits a signed message to the defined secondary node, saying that it has failed and is passing the baton. The secondary node verifies this message and attempts to forge during the slot. This continues on down the line until either the block is forged or no node forges a block. (The above would be best implemented with websockets, so that there's always a connection between these nodes.)
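The baton-passing above can be sketched as a walk over the ordered peer list. The helpers `tryForge` and `passBaton` are hypothetical stand-ins for real forging and networking code; only the control flow is the point here:

```javascript
// Walk the ordered list of forging peers for one slot. Each node tries to
// forge; on failure it passes a signed "baton" message to the next peer.
// `helpers.tryForge` and `helpers.passBaton` are hypothetical stand-ins.
async function forgeSlot(peers, helpers) {
  for (const peer of peers) {
    const forged = await helpers.tryForge(peer);
    if (forged) {
      return peer; // block produced; stop the chain of failovers
    }
    // Signal the next peer that this node failed and is passing
    // the baton for the remainder of the slot.
    await helpers.passBaton(peer);
  }
  return null; // no node managed to forge in this slot
}

// Example: pretend only the secondary succeeds.
forgeSlot(['primary', 'secondary', 'tertiary'], {
  tryForge: async (peer) => peer === 'secondary',
  passBaton: async () => {},
}).then((winner) => console.log(winner)); // 'secondary'
```

Note that at most one peer forges per slot, which is exactly the invariant that prevents fork cause 5 in the non-malicious case.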
Example config.json
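The original post does not include the example itself, so the snippet below is a purely hypothetical sketch of what a forging-peers list could look like; the key names (`forgingPeers`, `priority`) are invented for this illustration:

```json
{
  "forging": {
    "force": false,
    "secret": ["..."],
    "forgingPeers": [
      { "ip": "10.0.0.1", "port": 8000, "priority": 0 },
      { "ip": "10.0.0.2", "port": 8000, "priority": 1 },
      { "ip": "10.0.0.3", "port": 8000, "priority": 2 }
    ]
  }
}
```

Priority 0 would be the primary forging node; higher numbers are failover nodes, tried in order.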
In order to enforce this standard, all delegates will need a minimum of two instances running in order to forge at all, whether on multiple nodes or vertically stacked with changes to certain `config.json` params. Requiring multiple nodes will be perceived as unpopular by newcomers, but in the current state it is considered normal by the majority of delegates. This strategy will resolve non-malicious fork cause 5 situations, but it only increases the difficulty of performing a double-forging attack; it does not prevent it.
So how do we fix that?
This solution is also pretty elegant. We extend the concept of hashing delegate information, except in this case the elected master node (priority 0) generates a hash and a signed message containing the list of nodes that are allowed to forge with its key. This data would need to be stored in a new table.
In order to prevent an attack that spams updates to this information, we also save a timestamp. Updates should be allowed no more often than once per round, after the first entry, which is given a free pass.
These entries must be propagated throughout the network to prevent malicious attacks. However, since only the secret-key owner (and their nodes) can decrypt these messages, there is no risk to delegate security from exposed IP addresses.
The method for updating the network is difficult, since there's no transaction type that covers this, and this data does not have a specific need to be inserted directly into the blockchain. I will leave this end of the discussion open to debate, as I am unsure myself.