This repository has been archived by the owner on Jun 11, 2024. It is now read-only.

Fix fork cause 5 #402

Closed
Isabello opened this issue Jan 22, 2017 · 9 comments

Comments

@Isabello
Contributor

Isabello commented Jan 22, 2017

Fixing fork cause 5 -- Double Forging

What is double forging?

Double forging occurs when two or more nodes on the network have the same delegate(s) enabled to forge. This has the potential to create multiple blocks with the same parent id but different blockIds. While in most circumstances the blocks are identical, there are scenarios where the blockId can differ. In these scenarios we see chaos across the network, with many nodes receiving and removing the blocks as they are broadcast, due to the code for recovering from fork 5. Some code has been implemented to resolve this issue (found here: https://github.com/LiskHQ/lisk/blob/development/modules/blocks.js#L1356), but it can be considered unreliable and a bandaid for the greater issue, which can still occur. See issue: #327

Why it happens.

In the current Lisk architecture a node will attempt to forge for every delegate enabled on it when that delegate's slot occurs. This poses several dangers, listed below:

  1. Non-malicious double forging. An end user is running a failover script, or reconfiguring a delegate in config.json, and the failover happens within a slot, causing two blocks to be forged.
  2. A malicious user double (or more) forges in a round by intentionally enabling forging on multiple nodes.

What can be done to fix it?

One step to take in order to fix the issue of double forging is to disallow failover during a slot. This decision would be somewhat unpopular, as delegates would no longer be able to forge every block, potentially slowing down the network. Since that is a step backwards from current stability and performance, we need to look at other options.

Another proposed solution (from @4miners ) was to force nodes to register on the network when forging is enabled, with some hashed identifying information, so that only a small set of nodes could forge. This suggestion is aimed at preventing malicious users from forging on a large number of nodes. It doesn't really help solve the root cause of fork 5 either.

So what's the real solution?

The real solution is quite simple. The config.json should define a set of Forging Peers: peers that are forging with the same key. Under this new architecture one node would act as the primary forging node. All other nodes would qualify as failover nodes, in order of priority. When it comes time to generate a block for the delegate's slot, the primary attempts to do so. If it fails due to consensus, it emits a signed message to the defined secondary node saying that it has failed and is passing the baton. The secondary node verifies this message and attempts to forge during the slot. This continues on down the line until either the block is forged, or no node forges a block.

(The above would be best implemented with websockets so that there's always a connection between these nodes)

Example config.json

  "forging": {
        "force": false,
        "secret": [],
        "access": {
            "whiteList": [
                "127.0.0.1"
            ]
        }
        "peers": {
          "list": [
              {
                  "ip": "192.168.1.1",
                  "port": 8000
                  "priority": 0   // I forge first!
              },
              {
                  "ip": "192.168.1.2",
                  "port": 8000
                  "priority": 1 // I forge if the node above me fails!
              },
              {
                  "ip": "192.168.1.3",
                  "port": 8000
                  "priority": 2  // I forge if everyone else fails!
              }
          ]
        }
    },
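A node loading this config would need to validate and order the peer list before using it for failover. A small sketch, with validation rules assumed rather than specified anywhere:

```javascript
// Sketch: validate the proposed forging.peers.list and order it for
// failover. The shape follows the example config above; the rule that
// priorities must be unique is an assumption.
function orderForgingPeers(config) {
  const list = (config.forging.peers && config.forging.peers.list) || [];
  const priorities = list.map(p => p.priority);
  if (new Set(priorities).size !== priorities.length) {
    throw new Error('Duplicate forging peer priorities');
  }
  // Lowest priority value forges first.
  return list.slice().sort((a, b) => a.priority - b.priority);
}
```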

In order to enforce this standard, all delegates will need a minimum of two instances running, whether on multiple nodes or vertically stacked with changes to certain config.json params, in order to forge at all. Requiring multiple nodes will be perceived as unpopular by newcomers, but it is already considered normal by the majority of delegates. This strategy will resolve non-malicious cause 5 situations, but it only increases the difficulty of performing a double forging attack, it does not prevent it.

So how do we fix that??

This solution is also pretty elegant. We extend the concept of hashing delegate information, except in this case the elected master node (priority 0) will generate a hash and signed message containing a list of nodes that are allowed to forge with its key. This data would need to be stored in a new table.

Table "public.forging_hashes"
Column       | Type         | Modifiers
---------------+-----------------------+-----------
publicKey    | bytea        | 
hashed_nodes | bytea? text? | 
timestamp    | integer      | not null

To prevent an attack that spams updates to this information, we also save a timestamp. Updates should be allowed no more often than once per round, with the exception of the first entry, which is given a free pass.
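The once-per-round rule could be checked as follows. The round duration constant is an assumption based on the mainnet parameters of the time (101 delegate slots of 10 seconds each).

```javascript
// Sketch of the once-per-round update limit for forging_hashes entries.
// 101 slots * 10 seconds per slot is assumed from mainnet parameters.
const ROUND_DURATION = 101 * 10; // seconds

function canUpdateForgingHashes(lastTimestamp, newTimestamp) {
  if (lastTimestamp === null) {
    return true; // the first entry gets a free pass
  }
  return newTimestamp - lastTimestamp >= ROUND_DURATION;
}
```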

These entries must be propagated throughout the network to prevent malicious attacks. However, since only the secret key owner (and their nodes) can decrypt these messages, there is no risk of exposing delegates' IP addresses.

The method for updating the network is difficult, since there's no transaction type that covers this, and this data does not have a specific need to be inserted directly into the blockchain. I will leave this end of the discussion open to debate, as I am unsure myself.

@Gr33nDrag0n69

Nice write up! I like the fact that the order of nodes, number of nodes, and node IPs are written by the end user and stay private. Monitoring and rebuilds would still be needed from time to time, but the principal source of forks would be gone. Not sure about the feasibility and robustness, but it's definitely a step in the right direction.

@karmacoma karmacoma added this to the Version 1.0.0 milestone Apr 7, 2017
@karmacoma karmacoma added the hard label Apr 7, 2017
@karmacoma karmacoma changed the title Fix Fork Cause 5 root cause Fix fork cause 5 Apr 10, 2017
@karmacoma karmacoma removed this from the Version 1.0.0 milestone Apr 21, 2017
@webmaster128
Contributor

webmaster128 commented Mar 29, 2018

There has been some code implemented to resolve this issue (found here: https://github.com/LiskHQ/lisk/blob/development/modules/blocks.js#L1356), it can also be considered unreliable and a bandaid for the greater issue that can still occur.

I grabbed a commit ID from 2017-01-24 and assume you're referring to this code: https://github.com/LiskHQ/lisk/blob/b743cf562cbe538302520c72a57fe98f1faf64b9/modules/blocks.js#L1356 which is now in https://github.com/LiskHQ/lisk/blob/bcf5d6b8733cc9b0fc2be0e368af6d592314eb3d/modules/blocks/process.js#L196

I don't see what the issue with this resolution strategy is. For every two blocks A, B with A.id != B.id this defines a winner. The winner is first in the list [A, B] after sorting by the tuple (timestamp, id) where (timestamp, id) cannot be equal for both blocks.
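The tie-break described above can be sketched as follows. Field names and the lexicographic id comparison are assumptions for illustration, not the actual Lisk implementation.

```javascript
// Sketch of the fork 5 resolution strategy described above: between two
// competing blocks A and B with the same parent, the winner is the block
// that sorts first by the (timestamp, id) tuple.
function pickWinner(a, b) {
  if (a.timestamp !== b.timestamp) {
    return a.timestamp < b.timestamp ? a : b;
  }
  // Timestamps equal: fall back to the id, which cannot also be equal.
  return a.id < b.id ? a : b;
}
```

Since two distinct blocks can never share both timestamp and id, this comparison is always decisive.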

@Isabello
Contributor Author

Isabello commented Mar 29, 2018 via email

@webmaster128
Contributor

Makes sense, thanks! I found this issue where the situation is described beautifully. But why was it closed?

Regarding communication between multiple nodes of the same delegate: I don't think it is good to assume that node A will have the chance to tell the next node that it cannot forge right now. Think of networking issues or power outages. If node A crashes, node B may not even receive a socket close. In those cases, a delegate would prefer forging twice instead of not forging.

@simonmorgenthaler

The issue with double forging still exists in 1.0.0 on testnet, at least in situations with full blocks, when many transactions are pushed to the network. In situations with low or no traffic, I also saw cases where only one of my three activated nodes actually forged the block. The assumption of @MaciejBaj was that the first node to forge the block broadcast it quickly enough, before nodes 2 and 3 started trying to forge that block as well.

@webmaster128
Contributor

In situations with low or no traffic, I also saw the situation that only one of my three activated nodes actually forged the block.

Are you sure that this is the case? When there are very few transactions in the pool, all forging nodes produce the exact same block.

@Isabello
Contributor Author

Isabello commented Jul 6, 2018 via email

@simonmorgenthaler

I'm sure that only one node forged the block, when all had forging activated and all were in sync. At least I saw the log message "Forged new block id ..." on only one node. And the other nodes received that new block. And there was no fork and no "Multiple forging". I don't know why.
Do you have another idea why this could happen, or what I should verify?

@shuse2 shuse2 removed the *hard label Apr 15, 2019
@shuse2
Collaborator

shuse2 commented Jul 29, 2019

This will be resolved by the new BFT protocol #3555

@shuse2 shuse2 closed this as completed Jul 29, 2019