Gossamer block finalization stalls on a cross-client dev net #2623

Closed
EclesioMeloJunior opened this issue Jun 28, 2022 · 4 comments
Labels: Epic (Issue used to track development status of a complex feature, aggregates several issues)

@EclesioMeloJunior
Member

Describe the bug

  • It is currently possible to start a Gossamer node A and connect it to two Substrate-based nodes B and C. The problem is that Gossamer node A starts building blocks on a fork at some block height, and from that point on the network doesn’t reach consensus.

Gossamer node A, Substrate node B, and Substrate node C each produce a block 79 with a different hash, but Substrate nodes B and C reorg the chain and keep the block 79 produced by Substrate node B.

Gossamer node A

2022-06-28T16:43:46-04:00 INFO built block 79 with hash 0xbbd7688615da934f69ce7f826e2ba7dcaefa4a0a710a7eb66533f9334cdd23bc, state root 0xeba013eb9f8f289e40821176499be8865b9c681b334a85bf745064ce2b3614c9, epoch 1 and slot 414112256	babe.go:L541	pkg=babe

Substrate node B

🔖 Pre-sealed block for proposal at 79. Hash now 0xaef4bbd6c5684ca660644bf5b971d59c32f9c3986f50fa8b099d9f5b12eb2d16, previously 0x296c5a5eb64c1ff8086e9c1bf411e8636aa004572ebb19950b4ce57df4bf0208.
2022-06-28 16:43:44 ✨ Imported #79 (0xaef4…2d16)
2022-06-28 16:43:44 ✨ Imported #79 (0xd816…f5bb)
2022-06-28 16:43:44 💤 Idle (1 peers), best: #79 (0xaef4…2d16), finalized #19 (0x5064…be79)

Substrate node C

2022-06-28 16:43:44 🔖 Pre-sealed block for proposal at 79. Hash now 0xd8167cfa2e35b8456cd3e4787aab4f088987bcc6e48ab61c2d65a498bf8ef5bb, previously 0x158ff0155ba8c3401828b9d2bb63fd89337fe444585dad8c07e9b64b9b1d307c.
2022-06-28 16:43:44 ✨ Imported #79 (0xd816…f5bb)
2022-06-28 16:43:44 ♻️  Reorg on #79,0xd816…f5bb to #79,0xaef4…2d16, common ancestor #78,0xdc0a…096b
2022-06-28 16:43:44 ✨ Imported #79 (0xaef4…2d16)
2022-06-28 16:43:46 💤 Idle (2 peers), best: #79 (0xaef4…2d16), finalized #19 (0x5064…be79), ⬇ 0.8kiB/s ⬆ 0.5kiB/s
2022-06-28 16:43:48 ✨ Imported #80 (0x2df4…2b75)
  • We should apply the same chain reorg rule as substrate does and avoid producing forks.

To Reproduce

Steps to reproduce the behavior:

  1. Set up a three-node network (1 Gossamer, 2 Substrate). Use https://github.com/ChainSafe/substrate-node-template to build the runtime and the Substrate nodes.
  2. Start the Gossamer node as Alice and the Substrate nodes as Bob and Charlie.
  3. They should start producing and finalizing blocks, but at some point the Gossamer node will start a fork.
  4. You can observe the forks by connecting one Substrate node to polkadot-js.
@danforbes changed the title from "Gossamer block finalization stales on a cross-client dev net" to "Gossamer block finalization stalls on a cross-client dev net" on Jul 7, 2022
@EclesioMeloJunior self-assigned this on Jul 11, 2022
@EclesioMeloJunior
Member Author

I noticed that Substrate only casts its vote after receiving a neighbor message from its peers, whereas we define a prevote without waiting for the peers. In addition, we should only send a pre-commit message after receiving enough prevote messages. Currently, Gossamer only sleeps for s.interval before sending pre-commit messages.

Another point is that we should send a neighbor message when we start GRANDPA; otherwise Substrate will not send us any vote information.

https://matrix.to/#/!oZltgdfyakVMtEAWCI:web3.foundation/$bxs0GCJoeBHgstn8_rFfbN06id9ZUlxb0ErhCyFUq_k?via=web3.foundation&via=matrix.org&via=matrix.parity.io
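
The idea of sending a pre-commit only after enough prevotes have arrived, instead of sleeping for a fixed interval, could be sketched roughly as follows. This is a minimal illustration, not Gossamer's actual code: `prevoteTracker`, `threshold`, and the shortened authority IDs are hypothetical, and the tracker only counts distinct voters, whereas real GRANDPA needs a supermajority on a specific block (counting votes for its descendants).

```go
package main

import (
	"fmt"
	"sync"
)

// threshold returns the smallest vote count that is strictly more than
// two thirds of n voters (for n = 3 this is 3, matching "votes needed 3").
func threshold(n int) int {
	return 2*n/3 + 1
}

// prevoteTracker is a hypothetical helper that signals once enough
// distinct authorities have prevoted.
type prevoteTracker struct {
	mu      sync.Mutex
	needed  int
	voters  map[string]struct{}
	reached chan struct{} // closed exactly once when the threshold is hit
}

func newPrevoteTracker(numAuthorities int) *prevoteTracker {
	return &prevoteTracker{
		needed:  threshold(numAuthorities),
		voters:  make(map[string]struct{}),
		reached: make(chan struct{}),
	}
}

// add records a prevote from authorityID; duplicate votes are ignored.
func (t *prevoteTracker) add(authorityID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, seen := t.voters[authorityID]; seen {
		return
	}
	t.voters[authorityID] = struct{}{}
	if len(t.voters) == t.needed {
		close(t.reached)
	}
}

func main() {
	t := newPrevoteTracker(3)

	// Simulate prevotes arriving from the network.
	go func() {
		for _, id := range []string{"0x88dc", "0x4396", "0xd17c"} {
			t.add(id)
		}
	}()

	// Block until the threshold is reached instead of sleeping.
	<-t.reached
	fmt.Println("threshold reached, safe to send pre-commit")
}
```

In a real round the wait on `reached` would presumably be combined with a timeout or round-change signal in a `select`, so that a round without enough prevotes does not block forever.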

@EclesioMeloJunior
Member Author

After more investigation, I found out that Substrate is using a different protocol ID for GRANDPA message exchange: /{genesis_hash}/grandpa/1. Once I changed the protocol ID, I was able to see vote messages.
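
For illustration, a protocol ID of the form /{genesis_hash}/grandpa/1 could be built like this. The function name `grandpaProtocolID` is hypothetical, and the sketch assumes the genesis hash is rendered as lower-case hex without a `0x` prefix; that encoding detail is not stated in this issue and should be checked against what the Substrate peers actually advertise.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// grandpaProtocolID builds "/{genesis_hash}/grandpa/1" from the raw
// 32-byte genesis hash.
// Assumption: the hash is hex-encoded in lower case with no "0x" prefix.
func grandpaProtocolID(genesisHash [32]byte) string {
	return fmt.Sprintf("/%s/grandpa/1", hex.EncodeToString(genesisHash[:]))
}

func main() {
	// Placeholder genesis hash, not taken from any real chain.
	var genesis [32]byte
	genesis[0] = 0xab

	fmt.Println(grandpaProtocolID(genesis))
}
```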

@EclesioMeloJunior
Member Author

Gossamer was able to finalize blocks for 2 rounds. In the third round I noticed the following behavior:

  • We sent a prevote for block number 7
sending pre-vote message hash=0x1e6eb4f1383d56973a755677aa360d7b55543227758e1b8e225e129347d7bb12 number=7...
  • Then I received prevote messages from Alice (Auth ID: 0x88dc...) and Charlie (Auth ID: 0x4396...), but for block number 5 (hash 0xd35dc...)
TRCE handling grandpa message: &{3 0 stage=prevote hash=0xd35dccec8ced73b2e12551cec35821752c7f83d555d5e404483d28eb85f13be4 number=5 authorityID=0x88dc3417d5058ec4b4503e0c12ea1a0a89be200fe98922423d4334014fa6b0ee}	message_handler.go:L44	pkg=grandpa

TRCE handling grandpa message: &{3 0 stage=prevote hash=0xd35dccec8ced73b2e12551cec35821752c7f83d555d5e404483d28eb85f13be4 number=5 authorityID=0x439660b36c6c03afafca027b910b4fecf99801834c62a5e6006f27d978de234f}	message_handler.go:L44	pkg=grandpa
  • So we got 3 votes for block number 5, and then we sent a precommit message for block number 5
WARN validated vote message hash=0xd35dccec8ced73b2e12551cec35821752c7f83d555d5e404483d28eb85f13be4 number=5 from 0x439660b36c6c03afafca027b910b4fecf99801834c62a5e6006f27d978de234f, round 3, subround 0, prevote count 3, precommit count 0, votes needed 3	vote_message.go:L69	pkg=grandpa

DBUG sending pre-commit message hash=0xd35dccec8ced73b2e12551cec35821752c7f83d555d5e404483d28eb85f13be4 number=5...
  • And now something weird happens: we sent a prevote for block number 7 in the middle of the precommit phase
TRCE sent message: &{3 0 stage=prevote hash=0x1e6eb4f1383d56973a755677aa360d7b55543227758e1b8e225e129347d7bb12 number=7 authorityID=0xd17c2d7823ebf260fd138f2d7e27d114c0145d968b5ff5006125f2414fadae69}	 network.go:L178	pkg=grandpa
  • Right after sending this wrong message, we stopped receiving prevote/precommit messages from the Substrate peers

A possible solution: currently, Gossamer spins up two goroutines, one to send prevote messages and the other to send precommit messages. Those goroutines only stop once the round is completable, but we should make sure that we stop sending prevote messages once the prevote phase ends.
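
One way to express that constraint is to gate every send on the current phase of the round, so a prevote sender that is still running cannot emit anything after the round has moved to precommit. The following is only a sketch under that assumption; `roundState`, `advance`, and `sendIfIn` are hypothetical names, not Gossamer's API.

```go
package main

import (
	"fmt"
	"sync"
)

type phase int

const (
	prevotePhase phase = iota
	precommitPhase
	roundDone
)

// roundState is a hypothetical holder for the current phase of a round.
type roundState struct {
	mu    sync.Mutex
	phase phase
}

// advance moves the round forward; it never moves back to an earlier phase.
func (r *roundState) advance(next phase) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if next > r.phase {
		r.phase = next
	}
}

// sendIfIn runs send only while the round is in phase p. The check and the
// send happen under the same lock, so a phase change cannot slip in between.
func (r *roundState) sendIfIn(p phase, send func()) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.phase != p {
		return false
	}
	send()
	return true
}

func main() {
	r := &roundState{}
	var sent []string

	r.sendIfIn(prevotePhase, func() { sent = append(sent, "prevote #7") })
	r.advance(precommitPhase)
	// A late prevote, like the one seen in the logs, is now dropped.
	r.sendIfIn(prevotePhase, func() { sent = append(sent, "prevote #7 (late)") })
	r.sendIfIn(precommitPhase, func() { sent = append(sent, "precommit #5") })

	fmt.Println(sent) // prints [prevote #7 precommit #5]
}
```

Holding the lock during `send` keeps the sketch simple; if sending can block on the network, cancelling the prevote goroutine with a `context.Context` when the phase changes may be a better fit.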

@kishansagathiya
Contributor

kishansagathiya commented Jul 18, 2022

> After more investigation, I found out that Substrate is using a different protocol ID for GRANDPA message exchange: /{genesis_hash}/grandpa/1. Once I changed the protocol ID, I was able to see vote messages.

Can you post some relevant logs for this, like a dump of the Substrate stdout and the Gossamer stdout?

@EclesioMeloJunior added the Epic label (Issue used to track development status of a complex feature, aggregates several issues) on Jul 26, 2022