Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crab grandpa finality issues #419

Closed
hackfisher opened this issue Apr 9, 2020 · 9 comments · Fixed by #428
Closed

Crab grandpa finality issues #419

hackfisher opened this issue Apr 9, 2020 · 9 comments · Fixed by #428

Comments

@hackfisher
Copy link
Contributor

hackfisher commented Apr 9, 2020

In v0.5.0 devnet testing, the finalized block stall on zero. Here is some analysis about the case:

Reproduce the steps:

  1. There are 4 validators configed in genesis session with session keys, including grandpa key(ed25519), two of them get online intime with right keys before the on_session_change, but the other two didn't. Because there are only 2/4 (< 2/3) validators gossiping the grandpa, so the blocks in genesis didn't get finalized.

  2. After several session, other validators start bond and start validating, and the grandpa authorities set get changed, even though there might be > 2/3 validators online now, the blocks in later session didn't get finalized because there are previous blocks(those in the genesis session) remain unfinalized.

Notice the assumption here:
Blocks in one session must be finalized by the grandpa keys set of that session, cannot finalized by the keys set in latest session, this assumption need to be confirmed. Because in p2p networks, you cannot make sure what happens to the other offline branch and its successor key sets which could be different. The branches might just fork due to network connections and can not see each other, from decentralized perspective, you can not determine which branch should get finalized.

For the case in crab v0.5.0 devnet, the two genesis offline validators start validating later, but used different grandpa keys by setting rotate_keys using rpc call. Although they have same validator id with genesis config, but because the grandpa key has changed, so I guess that changed grandpa key will not work to help reach finality consensus for genesis session blocks.

Maybe we can make a experiment by requesting the two offline validator to change their grandpa keys back to the keys in genesis config and see whether it will helps?

Currently, substrate lacks diagnostic methods except -lafg option, but some works is already in progress to help improve this.

Related:
paritytech/substrate#4921

@hackfisher
Copy link
Contributor Author

The re-org issues #390 might increase difficulty for grandpa process, but it should not be the root cause since almost all the validators still have babe probabilistic consensus.

@hackfisher
Copy link
Contributor Author

hackfisher commented Apr 9, 2020

Additional info: validators have enough peers and directly connect to each other currently.

@hackfisher
Copy link
Contributor Author

I'm running one of the validators and adding trace -lafg, this validator is not in the genesis session config.

2020-04-09 05:05:36.476 main-tokio- TRACE afg  Polling round 1, state = State { prevote_ghost: None, finalized: None, estimate: None, completable: false }, step = Some(Prevoted)
2020-04-09 05:05:37.577 main-tokio- TRACE afg  Polling round 1, state = State { prevote_ghost: None, finalized: None, estimate: None, completable: false }, step = Some(Prevoted)
2020-04-09 05:05:38.677 main-tokio- TRACE afg  Polling round 1, state = State { prevote_ghost: None, finalized: None, estimate: None, completable: false }, step = Some(Prevoted)
2020-04-09 05:05:39.777 main-tokio- TRACE afg  Polling round 1, state = State { prevote_ghost: None, finalized: None, estimate: None, completable: false }, step = Some(Prevoted)

@AurevoirXavier
Copy link
Member

AurevoirXavier commented Apr 9, 2020

make sure the keys are correct.

  1. remove all node's keystore
  2. rpc_insert again
  3. restart the nodes

I think all the blocks can be finalized. we had done that in icefrog.

@hackfisher
Copy link
Contributor Author

  1. remove all node's keystore
  2. rpc_insert again
  3. restart the nodes

make sure the keys are correct.

We could do some operations hours later to verify this:

Maybe we can make a experiment by requesting the two offline validator to change their grandpa keys back to the keys in genesis config and see whether it will helps?

@hackfisher hackfisher added this to To do in Crab Network Tasks via automation Apr 9, 2020
@hackfisher hackfisher added the U-Question [Uncategorized] Further information/discussion is requested label Apr 9, 2020
@hackfisher
Copy link
Contributor Author

hackfisher commented Apr 9, 2020

  1. remove all node's keystore
  2. rpc_insert again
  3. restart the nodes

make sure the keys are correct.

We could do some operations hours later to verify this:

Maybe we can make a experiment by requesting the two offline validator to change their grandpa keys back to the keys in genesis config and see whether it will helps?

Some updates for the experiments:

  1. Two validator who was offline during genesis session get to validating, and set the right grandpa keys.
  2. After that, the finalized block get to reach 39 from 0, following log was reported from one of the genesis validator:
1|darwinia  | 2020-04-09 15:05:14.995 main-tokio- INFO sc_finality_grandpa::environment  Applying GRANDPA set change to new set [(Public(0000000000000000000000000000000000000000000000000000000000000000 (5C4hrfjw...)), 1), (Public(6a282c7674945c039a9289b702376ae168e8b67c9ed320054e2a019015f236fd (5ETtsEtn...)), 1), (Public(0000000000000000000000000000000000000000000000000000000000000000 (5C4hrfjw...)), 1), (Public(0000000000000000000000000000000000000000000000000000000000000000 (5C4hrfjw...)), 1), (Public(0000000000000000000000000000000000000000000000000000000000000000 (5C4hrfjw...)), 1), (Public(70fa82107e81f20bb4e5b059f4ac800d55aafcff9e918e000899569b4f207976 (5Ecqdt4n...)), 1)]
1|darwinia  | 2020-04-09 15:05:14.997 main-tokio- DEBUG afg  Xavier: Starting new voter with set ID 1
  1. The fianlized block is stoping on 39 now, which should still in genesis era before other validator join in.

  2. There might be too much grandpa voting and pre-voting/set-change now in the network, and there are trivial forks happen. Non-genesis validators has be request for help and go offline, to help the 4 genesis validators reach grandpa consensus ASAP.

@hackfisher
Copy link
Contributor Author

hackfisher commented Apr 9, 2020

Offline genesis validators get slashed on block 21, but the authorities in the consensus log is strange with 4 extra 0x0 account keys:

"011800000000000000000000000000000000000000000000000000000000000000000100000000000000b4f7f03bebc56ebe96bc52ea5ed3159d45a0ce3a8d7f082983c33ef1332747470100000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000584ea8f083c3a9038d57acc5229ab4d790ab6132921d5edc5fae1be4ed89ec1f010000000000000002e802f4a6e28a6684ed09822a58d87a4ad2d38f53d91fd28165147da800244e"

that is:

0118
00000000000000000000000000000000000000000000000000000000000000000100000000000000
b4f7f03bebc56ebe96bc52ea5ed3159d45a0ce3a8d7f082983c33ef1332747470100000000000000
00000000000000000000000000000000000000000000000000000000000000000100000000000000
00000000000000000000000000000000000000000000000000000000000000000100000000000000
00000000000000000000000000000000000000000000000000000000000000000100000000000000
584ea8f083c3a9038d57acc5229ab4d790ab6132921d5edc5fae1be4ed89ec1f0100000000000000

same offences/slash things happen to block 39 and triggered grandpa NewAuthorities event, this is the grandpa keys:


[ { "AuthorityId": "0x0000000000000000000000000000000000000000000000000000000000000000", "weight": 1 }, { "AuthorityId": "0x6a282c7674945c039a9289b702376ae168e8b67c9ed320054e2a019015f236fd", "weight": 1 }, { "AuthorityId": "0x0000000000000000000000000000000000000000000000000000000000000000", "weight": 1 }, { "AuthorityId": "0x0000000000000000000000000000000000000000000000000000000000000000", "weight": 1 }, { "AuthorityId": "0x0000000000000000000000000000000000000000000000000000000000000000", "weight": 1 }, { "AuthorityId": "0x70fa82107e81f20bb4e5b059f4ac800d55aafcff9e918e000899569b4f207976", "weight": 1 } ]

@hackfisher
Copy link
Contributor Author

Remove the chain data and resync with -lafg:

Following log repeatedly showing:

2020-04-10 09:02:51.142 import-queue-worker-0 DEBUG afg  Inserting potential standard set change signaled at block (13718, 0x9412b2533c5bf811b8f52f038e302aeb1975ba22f5efb8ad2d5df4c200062282) (delayed by 0 blocks).
2020-04-10 09:02:51.142 import-queue-worker-0 DEBUG afg  There are now 1 alternatives for the next pending standard change (roots), and a total of 154 pending standard changes (across all forks).
2020-04-10 09:02:51.145 import-queue-worker-0 TRACE afg  Imported unjustified block #13718 that enacts authority set change, waiting for finality for enactment.

@AurevoirXavier AurevoirXavier removed the U-Question [Uncategorized] Further information/discussion is requested label Apr 16, 2020
@AurevoirXavier AurevoirXavier linked a pull request Apr 16, 2020 that will close this issue
@AurevoirXavier
Copy link
Member

Fixed in 0286604

Crab Network Tasks automation moved this from To do to Done Apr 16, 2020
boundless-forest pushed a commit that referenced this issue Aug 1, 2023
* prioritize by gas price directly

* fix test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

2 participants