Chain Stopped producing blocks #1142
Comments
I am seeing the same issue, and it has happened to me twice. I was initially running a PoS ECDSA chain on v0.6.1, and it ran well for almost 3 million blocks before hitting this issue. I then updated to v0.6.3 and restarted the chain from scratch with a new genesis block. This time it ran for about 150k blocks before running into the same issue again. |
Hi @NicoDFS Please fill out the complete support template. We also require the debug logs from before and after the chain halted.
[ Subject of the issue ]
Description
Describe your issue in as much detail as possible here.
Your environment
Steps to reproduce
Expected behavior
Logs
Provide us with debug logs from all of your validators by setting logging to
Proposed solution
If you have an idea on how to fix this issue, please write it down here, so we can begin discussing it
@topdefi please raise a new GitHub issue including the same details. Thank you! |
Your environment
The version of the Polygon Edge.
The branch that causes this issue.
Locally or Cloud hosted (which provider).
Please confirm if the validators are running under containerized environment (K8s, Docker, etc.).
Provide us with commands that you used to start your validators.
This should be all the missing info from the template. The logs are in the zip file I attached. |
Hi @ivanbozic21 is there anything else you need? |
I have been trying everything I can. I've stopped and started the nodes a few times, and on 2 occasions I got this error:
I had to remove the metadata and snapshot files from the consensus folder to get the node to start.
Which commands triggered the issue, if any.
Expected behavior
|
Hi @NicoDFS Please provide us with the logs of your Edge validators, as well as a copy of your genesis file so we can inspect it closely. Also, please confirm whether the chain is now running after the restart. |
Hi @ivanbozic21 The chain is not running. I've restarted and rebooted multiple times, and it has been down since Sunday.
Logs
Genesis
|
@NicoDFS But please send us the complete logs from when the validators were started. |
Closing this due to inactivity. |
I am facing the same issue @laviniat1996 |
Chain stopped producing blocks; updating to 0.6.3 did not fully solve it.
Description
The client was running a polygon-edge network on v0.6.1. They added community validators, but the majority of them never voted and some went offline completely. After that, transfers of native tokens from one address to another were going through, but the tx hash could not be found in the explorer. Most contract calls showed on the explorer as validated, but others did not.
About 1 day later, the chain went into a loop where all project-controlled validators (the initial 4 from launch) kept looking for peers that were offline. This caused the chain to slow down and eventually stop producing blocks altogether.
BTW, at this same time, the tests I was running on testnet using the exact same version had zero issues; everything worked fine. The only differences on mainnet were the addition of the validators run by community members and the large number of peers that were unreachable.
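To illustrate why validators that never vote or go offline can stall a chain like this, here is a minimal sketch of the quorum arithmetic. It assumes the quorum rule ceil(2n/3) described for IBFT 2.0; whether a given Polygon Edge version uses exactly this rule is an assumption, and the validator counts in `main` (4 project validators plus 5 community validators) are purely illustrative, since the issue does not state how many community validators were added.

```go
package main

import "fmt"

// quorum returns the number of votes needed to commit a block for a
// validator set of size n, assuming the IBFT 2.0 style rule ceil(2n/3).
// This rule is an assumption, not taken from the Polygon Edge source.
func quorum(n int) int {
	if n <= 0 {
		return 0
	}
	return (2*n + 2) / 3 // integer form of ceil(2n/3)
}

// canProduceBlocks reports whether enough validators are online and voting
// to reach the assumed quorum for a validator set of the given size.
func canProduceBlocks(totalValidators, onlineValidators int) bool {
	return onlineValidators >= quorum(totalValidators)
}

func main() {
	// Illustrative numbers only: 4 project validators, 5 hypothetical
	// community validators that are in the set but never vote.
	total, online := 9, 4
	fmt.Printf("validators=%d quorum=%d online=%d canProduce=%v\n",
		total, quorum(total), online, canProduceBlocks(total, online))
}
```

Under this assumption, every validator that stays in the set without voting still raises the number of votes required, so the 4 original validators alone would no longer be enough once the set grows large enough.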
So after this I updated testnet to the latest v0.6.3, and it is running as expected. The project told its community to shut down all nodes, and any validators that were not project controlled were voted to be dropped in preparation for the mainnet update. I stopped all project-controlled nodes and then ran the updated v0.6.3 (the same as what is running on testnet).
Updating to 0.6.3 did stop the loop of looking for peers that were not online, but the chain has yet to produce a block in over 12 hours of running the update. Outputs are showing rounds starting and then timing out like this:
Debug log output looks like this over and over again:
This network currently has 5 validators, 2 of which are boot nodes, plus 3 RPC nodes and 1 node for the explorer, all running the updated version. However, there are still a few peers running the old version.
As you can see, there are more peers than there should be. Those rogue peers are running the old version and never got shut down. I was not able to find a command in the docs to block or ban a peer, so I tried to block the IPs at the firewall, but 3 of them are still showing as peers.
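As a rough way to pin down which connected peers are the unexpected ones, the comparison can be sketched as a set difference between the peer IDs the operator expects and the peer IDs a node reports. This is only a sketch: the function name `unexpectedPeers` and the peer IDs are hypothetical placeholders, and how the list of connected peers is obtained from a node is deliberately left out.

```go
package main

import (
	"fmt"
	"sort"
)

// unexpectedPeers returns the connected peer IDs that are not in the
// expected set, de-duplicated and sorted. Both inputs are plain strings;
// the IDs used in main are hypothetical placeholders.
func unexpectedPeers(expected, connected []string) []string {
	known := make(map[string]struct{}, len(expected))
	for _, id := range expected {
		known[id] = struct{}{}
	}

	seen := make(map[string]struct{})
	rogue := []string{}
	for _, id := range connected {
		if _, ok := known[id]; ok {
			continue // an operator-controlled node
		}
		if _, dup := seen[id]; dup {
			continue // already reported
		}
		seen[id] = struct{}{}
		rogue = append(rogue, id)
	}
	sort.Strings(rogue)
	return rogue
}

func main() {
	expected := []string{"validator-1", "validator-2", "rpc-1"}
	connected := []string{"rpc-1", "old-peer-b", "validator-1", "old-peer-a"}
	fmt.Println(unexpectedPeers(expected, connected))
}
```

The resulting list only identifies the extra peers; blocking them still has to happen elsewhere, for example at the firewall as described above.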
Your environment
All node servers are running Ubuntu 20.04, hosted on cloud VPS services. None are running with Docker; all are built from source.
The chain is live and has had active users since late October 2022, and it has not been producing blocks for 2 days as of this posting. The chain is running in PoA with IBFT and BLS. No changes were made to the genesis file for the update.
Proposed solution
Not sure, but maybe a ban peer function, since it looks like any old peers still connected with old versions are stopping block production? It seems like an easy way for a bad actor to halt a chain: set up some peers, wait, and then purposely not update. If I were to switch from BLS to ECDSA, or even from PoA to PoS, would that force the other peers with the original genesis off the network? Thank you.