Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chain Stopped producing blocks #1142

Closed
NicoDFS opened this issue Jan 18, 2023 · 12 comments
Closed

Chain Stopped producing blocks #1142

NicoDFS opened this issue Jan 18, 2023 · 12 comments
Assignees

Comments

@NicoDFS
Copy link

NicoDFS commented Jan 18, 2023

Chain Stopped producing blocks updating to 0.6.3 did not fully solve.

Description

Clinet was running a polygon-edge network on v0.6.1 they added community validators but the majority of them never voted, some went offline completely. Then sending native tokens from 1 address to another where going through but the tx hash was not in the explorer or found, most contract calls showed on explorer as being validated but others did not.

About 1 day later The chain then went into a loop where all project controlled validators (the initial 4 from launch) kept looking for peers that were offline and this caused the chain to slow down and then eventually stop producing blocks all together.

BTW at this same time any tests i was running on testnet using the same exact version had zero issues all worked fine. The only difference in mainnet was the addition of the validators run by community members and the large amount of peers that were unreachable.

So after this I updated testnet to latest v0.6.3 and its running as expected. The project told their community to shut down all nodes, any validators that where not project controlled where voted to to be dropped in prep of mainnet update. I stopped all project control nodes and then ran the updated v0.6.3 (same as what is running on testnet).

Updating to 0.6.3 did stop the loop of looking for peers that were not online but it has yet to produce a block in over 12 hours of running the update. Outputs are showing rounds starting and then timing out like this:

2023-01-18T16:38:39.457Z [INFO]  chain.server.ibft.consensus: round timeout expired: round=0
2023-01-18T16:38:39.465Z [INFO]  chain.server.ibft.consensus: round started: round=1

Debug log output looks like this over and over again:

2023-01-18T16:36:30.651Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAm42yKEJA9xWEDpxFWoBiaCfnhx9UFWmnNYuuWrRRT2vxi
2023-01-18T16:36:30.651Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAm42yKEJA9xWEDpxFWoBiaCfnhx9UFWmnNYuuWrRRT2vxi
2023-01-18T16:36:30.887Z [DEBUG] chain.network.discovery: Found new near peers: peer=10
2023-01-18T16:36:35.652Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAm8Gojq3VRuJT4tdioQN4Zzy98qFepkH6yqysGz3nGcNd6
2023-01-18T16:36:35.652Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAm8Gojq3VRuJT4tdioQN4Zzy98qFepkH6yqysGz3nGcNd6
2023-01-18T16:36:35.725Z [DEBUG] chain.network.discovery: Found new near peers: peer=8
2023-01-18T16:36:40.652Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAm6RpA3KXPkPoGkJqgT2eMYcwqkFgmvPkHJjQ45uRnpwbm
2023-01-18T16:36:40.652Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAm6RpA3KXPkPoGkJqgT2eMYcwqkFgmvPkHJjQ45uRnpwbm
2023-01-18T16:36:40.915Z [DEBUG] chain.network.discovery: Found new near peers: peer=7

This network currently has 5 Validators, 2 of which are boot nodes, 3 RPC nodes and 1 node for explorer. all running the updated version. however there are still a few peers running old version

[PEERS LIST]
Number of peers: 11

[0]  = 16Uiu2HAmPoEyJjHbUM6txAoNDA7ae7sityzDAz93u1NWKjsMD8Do
[1]  = 16Uiu2HAmKsPxTJAzsB3fdrGtQJB78j5qecqkWj8VfhUuqySLWRe9
[2]  = 16Uiu2HAmHccd213AbXd7BrTxg4xBAbPKrMekTNqQsqGDVeeJwJjP
[3]  = 16Uiu2HAkzTsrGRdpzoPD8NRScgRyXmfrprMf1xC4NN7wHxT82EK1
[4]  = 16Uiu2HAmNg4u3Djo4k8nbnj8pNr1MiQ8oZebFqDF1RVW5pSJ7RBX
[5]  = 16Uiu2HAkuc5UsFM9cGmmUwxtg3QrMXTvqXgVtp5P2mHKZDre1Sfr
[6]  = 16Uiu2HAm6RpA3KXPkPoGkJqgT2eMYcwqkFgmvPkHJjQ45uRnpwbm
[7]  = 16Uiu2HAm42yKEJA9xWEDpxFWoBiaCfnhx9UFWmnNYuuWrRRT2vxi
[8]  = 16Uiu2HAkyknR29Z8xRxDtMQvsbmfnEPwLN2VAFzMFhSKrC7GdDzb
[9]  = 16Uiu2HAmH2XSK7dXAS8U74LvefAQCwSr95E8TUNg76SNdn8nrp2n
[10] = 16Uiu2HAm8Gojq3VRuJT4tdioQN4Zzy98qFepkH6yqysGz3nGcNd6

As you can see there are more peers than there should be, those rogue peers are running the old version and never got shutdown. I was not able to find a command in the docs to block or ban a peer so I tried to block the ip at the firewall but 3 of them are still showing as peers.

Your environment

All node servers are running Ubuntu 20.04 hosted on cloud vps services, none are running with docker all built from source.
The chain is live and has active users since late October 2022 and has producing blocks for 2 days from the time of this posting. The chain is running in PoA with IBFT and BLS. No changes where made to genesis file for update.

Proposed solution

Not sure but maybe a ban peer function since it looks like any old peers still connected with old versions are stopping block production? Seems like a good way a bad actor could set up some peers wait and then purposely not update to halt a chain.

If I were to run switch from BLS to ECDSA or even from PoA to PoS will that force the other peers with original genesis off the network? Thank you.

@topdefi
Copy link

topdefi commented Jan 18, 2023

I am getting the same issue. It happened to me twice. I was initially running a PoS ECDSA chain with v0.6.1 and it ran great for almost 3 million blocks before having that issue. So I updated the chain to v0.6.3 and restarted the chain from scratch with a new genesis block. This time it ran for about 150k blocks before running into the same issue again.

@ivanbozic21
Copy link

Hi @NicoDFS

Please fill out the complete support template, and we require the debug logs before, and after the chain was halted.

[ Subject of the issue ]

Description

Describe your issue in as much detail as possible here.

Your environment

  • OS and version.
  • The version of the Polygon Edge.
    (Confirm the version of your Polygon edge client by running the following command: polygon-edge version --grpc-address GRPC_ADDRESS)
  • The branch that causes this issue.
  • Locally or Cloud hosted (which provider).
  • Please confirm if the validators are running under containerized environment (K8s, Docker, etc.).

Steps to reproduce

  • Tell us how to reproduce this issue.
  • Where the issue is, if you know.
  • Which commands triggered the issue, if any.
  • Provide us with the content of your genesis file.
  • Provide us with commands that you used to start your validators.
  • Provide us with the peer list of each of your validators by running the following command: polygon-edge peers list --grpc-address GRPC_ADDRESS.
  • Is the chain producing blocks and serving customers atm?

Expected behavior

  • Tell us what should happen.
  • Tell us what happened instead.

Logs

Provide us with debug logs from all of your validators by setting logging to debug output with: server --log-level debug

Proposed solution

If you have an idea on how to fix this issue, please write it down here, so we can begin discussing it

@topdefi please raise the new github issue including the same details.

Thank you!

@ivanbozic21 ivanbozic21 self-assigned this Jan 19, 2023
@NicoDFS
Copy link
Author

NicoDFS commented Jan 19, 2023

LOGS.zip

Your environment

  • Ubuntu 20.04

The version of the Polygon Edge.

  • 0.6.3

The branch that causes this issue.

  • Used source code from release not a branch.

Locally or Cloud hosted (which provider).

  • Cloud VPS, Vultr

Please confirm if the validators are running under containerized environment (K8s, Docker, etc.).

  • Not using Docker, built from source.

Provide us with commands that you used to start your validators.

  • chain server --data-dir ./data-dir --chain genesis.json --libp2p 0.0.0.0:1478 --grpc-address 127.0.0.1:10000 --nat ip-address --seal --log-level debug

This should be all the missing info from the template, Logs are in the zip file I attached.

@NicoDFS
Copy link
Author

NicoDFS commented Jan 19, 2023

Hi @ivanbozic21 is there anything else you need?

@NicoDFS
Copy link
Author

NicoDFS commented Jan 19, 2023

I have been trying everything I can and so I've stopped and started the nodes a few time and on 2 occasions got this error:

2023-01-19T16:31:54.474Z [INFO] chain.blockchain: genesis: hash=0xb30c31c393b593b1101db67ee68c2c4eeb044645227fcf02e568dd10c0e92916
unexpected end of JSON input

I had to remove the metadata and snapshot files from the consensus folder to get the node to start.

Which commands triggered the issue, if any.

  • Ctrl C to stop node.
  • Restart Node with chain server --data-dir ./data-dir --chain genesis.json --libp2p 0.0.0.0:1478 --grpc-address 127.0.0.1:10000 --nat ip-address --seal --

Expected behavior
Tell us what should happen.

  • The node should start and find peers
    Tell us what happened instead.
  • Node did not start and gave me the error above.

@ivanbozic21
Copy link

Hi @NicoDFS

Please provide us with logs of your edge validators, also a copy of your genesis file so we can inspect it closely.

But please confirm if the chain is now running after the restart.

@NicoDFS
Copy link
Author

NicoDFS commented Jan 20, 2023

Hi @ivanbozic21 The chain is not running, I've restarted and rebooted multiple times and its still down since Sunday.

Logs

2023-01-20T12:48:18.400Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:48:23.183Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAm6RpA3KXPkPoGkJqgT2eMYcwqkFgmvPkHJjQ45uRnpwbm
2023-01-20T12:48:23.183Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAm6RpA3KXPkPoGkJqgT2eMYcwqkFgmvPkHJjQ45uRnpwbm
2023-01-20T12:48:23.431Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:48:28.183Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAmNg4u3Djo4k8nbnj8pNr1MiQ8oZebFqDF1RVW5pSJ7RBX
2023-01-20T12:48:28.183Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAmNg4u3Djo4k8nbnj8pNr1MiQ8oZebFqDF1RVW5pSJ7RBX
2023-01-20T12:48:28.399Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:48:33.183Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAkxcdj1KNakRqyXm49PQmsFLqFGh41MpXbqJNoSU2ssPWP
2023-01-20T12:48:33.183Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAkxcdj1KNakRqyXm49PQmsFLqFGh41MpXbqJNoSU2ssPWP
2023-01-20T12:48:33.191Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:48:38.182Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAkuc5UsFM9cGmmUwxtg3QrMXTvqXgVtp5P2mHKZDre1Sfr
2023-01-20T12:48:38.182Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAkuc5UsFM9cGmmUwxtg3QrMXTvqXgVtp5P2mHKZDre1Sfr
2023-01-20T12:48:38.197Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:48:43.183Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAmNg4u3Djo4k8nbnj8pNr1MiQ8oZebFqDF1RVW5pSJ7RBX
2023-01-20T12:48:43.183Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAmNg4u3Djo4k8nbnj8pNr1MiQ8oZebFqDF1RVW5pSJ7RBX
2023-01-20T12:48:43.400Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:48:48.183Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAkxcdj1KNakRqyXm49PQmsFLqFGh41MpXbqJNoSU2ssPWP
2023-01-20T12:48:48.183Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAkxcdj1KNakRqyXm49PQmsFLqFGh41MpXbqJNoSU2ssPWP
2023-01-20T12:48:48.192Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:48:53.182Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAm6RpA3KXPkPoGkJqgT2eMYcwqkFgmvPkHJjQ45uRnpwbm
2023-01-20T12:48:53.182Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAm6RpA3KXPkPoGkJqgT2eMYcwqkFgmvPkHJjQ45uRnpwbm
2023-01-20T12:48:53.430Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:48:58.183Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAkuc5UsFM9cGmmUwxtg3QrMXTvqXgVtp5P2mHKZDre1Sfr
2023-01-20T12:48:58.183Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAkuc5UsFM9cGmmUwxtg3QrMXTvqXgVtp5P2mHKZDre1Sfr
2023-01-20T12:48:58.185Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:49:03.155Z [DEBUG] chain.server.ibft: validator message received: type=ROUND_CHANGE height=2248797 round=6 addr=0x3CC9bd8931c96E1f4ABB60714fD4eE4989Fc24f7
2023-01-20T12:49:03.183Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAm6RpA3KXPkPoGkJqgT2eMYcwqkFgmvPkHJjQ45uRnpwbm
2023-01-20T12:49:03.183Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAm6RpA3KXPkPoGkJqgT2eMYcwqkFgmvPkHJjQ45uRnpwbm
2023-01-20T12:49:03.430Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:49:08.183Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAm42yKEJA9xWEDpxFWoBiaCfnhx9UFWmnNYuuWrRRT2vxi
2023-01-20T12:49:08.183Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAm42yKEJA9xWEDpxFWoBiaCfnhx9UFWmnNYuuWrRRT2vxi
2023-01-20T12:49:08.478Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:49:11.191Z [ERROR] chain.server.ibft.consensus: unable to build proposal
2023-01-20T12:49:11.191Z [INFO]  chain.server.ibft.consensus: round timeout expired: round=3
2023-01-20T12:49:11.193Z [INFO]  chain.server.ibft.consensus: round started: round=4
2023-01-20T12:49:11.194Z [DEBUG] chain.server.ibft: validator message received: type=ROUND_CHANGE height=2248797 round=4 addr=0xDD5217493685c376142f6eB26Ba012d6656f3EbE
2023-01-20T12:49:11.194Z [DEBUG] chain.server.ibft.consensus: enter: new round state
2023-01-20T12:49:13.182Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAkuc5UsFM9cGmmUwxtg3QrMXTvqXgVtp5P2mHKZDre1Sfr
2023-01-20T12:49:13.182Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAkuc5UsFM9cGmmUwxtg3QrMXTvqXgVtp5P2mHKZDre1Sfr
2023-01-20T12:49:13.188Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:49:18.183Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAkxcdj1KNakRqyXm49PQmsFLqFGh41MpXbqJNoSU2ssPWP
2023-01-20T12:49:18.183Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAkxcdj1KNakRqyXm49PQmsFLqFGh41MpXbqJNoSU2ssPWP
2023-01-20T12:49:18.193Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:49:23.183Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAmH2XSK7dXAS8U74LvefAQCwSr95E8TUNg76SNdn8nrp2n
2023-01-20T12:49:23.183Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAmH2XSK7dXAS8U74LvefAQCwSr95E8TUNg76SNdn8nrp2n
2023-01-20T12:49:23.263Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:49:28.182Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAm42yKEJA9xWEDpxFWoBiaCfnhx9UFWmnNYuuWrRRT2vxi
2023-01-20T12:49:28.182Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAm42yKEJA9xWEDpxFWoBiaCfnhx9UFWmnNYuuWrRRT2vxi
2023-01-20T12:49:28.477Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:49:33.183Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAm42yKEJA9xWEDpxFWoBiaCfnhx9UFWmnNYuuWrRRT2vxi
2023-01-20T12:49:33.183Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAm42yKEJA9xWEDpxFWoBiaCfnhx9UFWmnNYuuWrRRT2vxi
2023-01-20T12:49:33.478Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:49:38.183Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAmH2XSK7dXAS8U74LvefAQCwSr95E8TUNg76SNdn8nrp2n
2023-01-20T12:49:38.183Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAmH2XSK7dXAS8U74LvefAQCwSr95E8TUNg76SNdn8nrp2n
2023-01-20T12:49:38.262Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:49:43.183Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAkxcdj1KNakRqyXm49PQmsFLqFGh41MpXbqJNoSU2ssPWP
2023-01-20T12:49:43.183Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAkxcdj1KNakRqyXm49PQmsFLqFGh41MpXbqJNoSU2ssPWP
2023-01-20T12:49:43.200Z [DEBUG] chain.network.discovery: Found new near peers: peer=5
2023-01-20T12:49:48.183Z [DEBUG] chain.network.discovery: running regular peer discovery: peer=16Uiu2HAm6RpA3KXPkPoGkJqgT2eMYcwqkFgmvPkHJjQ45uRnpwbm
2023-01-20T12:49:48.183Z [DEBUG] chain.network.discovery: Querying a peer for near peers: peer=16Uiu2HAm6RpA3KXPkPoGkJqgT2eMYcwqkFgmvPkHJjQ45uRnpwbm
2023-01-20T12:49:48.430Z [DEBUG] chain.network.discovery: Found new near peers: peer=5

Genesis

{
    "name": "chain",
    "genesis": {
        "nonce": "0x0000000000000000",
        "timestamp": "0x0",
        "extraData": "0x0000000000000000000000000000000000000000000000000000000000000000f90127f90120f846943cc9bd8931c96e1f4abb60714fd4ee4989fc24f7b0ad07531dbfe53f5975cd2928c19a354847129e85070d53f5a08f0608b785ab559e498e4fa48a0313065c565fbd1ba809f84694dd5217493685c376142f6eb26ba012d6656f3ebeb0a5973bbe559e01a06b7af6d22b3c075966d19e680e1ddcdcaa55784d1dbc3fb7af05940fcb9bc99918523996145a7876f846944d2b668973615eafe5fdf714ce9225b7c228746cb0b6394fe18c80c9d01406658c327c8120caa763fd327b6706758bf53a5eba9fccde86fe5dabb2f999ec5e8654a81143fbf8469453fd7ca148ea849dfed902ece64d8e11f07e146cb0a9451e66d24b3f05347f889b5128b66d6245ceef138dbfb4b0e1514219bb67de3de90bfa5ac01b348c19c8984b51df4280c28080",
        "gasLimit": "0x3b9aca00",
        "difficulty": "0x1",
        "mixHash": "0x0000000000000000000000000000000000000000000000000000000000000000",
        "coinbase": "0x0000000000000000000000000000000000000000",
        "alloc": {
            "0x74De2b1ba0a10BF61a2733282dB9a3D60DeeBA23": {
                "balance": "0x84595161401484a0000000"
            }
        },
        "number": "0x0",
        "gasUsed": "0x70000",
        "parentHash": "0x0000000000000000000000000000000000000000000000000000000000000000"
    },
    "params": {
        "forks": {
            "homestead": 0,
            "byzantium": 0,
            "constantinople": 0,
            "petersburg": 0,
            "istanbul": 0,
            "EIP150": 0,
            "EIP158": 0,
            "EIP155": 0
        },
        "chainID": 3797,
        "engine": {
            "ibft": {
                "epochSize": 50,
                "type": "PoA",
                "validator_type": "bls"
            }
        },
        "blockGasTarget": 0
    },
    "bootnodes": [
        "/ip4/95.179.246.28/tcp/1478/p2p/16Uiu2HAkxcdj1KNakRqyXm49PQmsFLqFGh41MpXbqJNoSU2ssPWP",
        "/ip4/185.92.221.129/tcp/1478/p2p/16Uiu2HAkzTsrGRdpzoPD8NRScgRyXmfrprMf1xC4NN7wHxT82EK1"
    ]
}

@ivanbozic21
Copy link

ivanbozic21 commented Jan 26, 2023

@NicoDFS
Apology for the late response.

But please send us the complete logs from when the validators were started.
Including the logs of bootnodes as well.

@ivanbozic21
Copy link

@NicoDFS
Apology for the late response.

But please send us the complete logs from when the validators were started.
Including the logs of bootnodes as well.

@ivanbozic21
Copy link

@NicoDFS
Apology for the late response.

But please send us the complete logs from when the validators were started.
Including the logs of bootnodes as well.

@laviniat1996
Copy link
Contributor

Closing this due to inactivity.

@puneet166
Copy link

The same issue i am facing @laviniat1996

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants