Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

beaconchain sync stuck on mainnet, pyrmont, and prater #2949

Closed
q9f opened this issue Aug 12, 2021 · 6 comments · Fixed by #2955
Closed

beaconchain sync stuck on mainnet, pyrmont, and prater #2949

q9f opened this issue Aug 12, 2021 · 6 comments · Fixed by #2955
Assignees
Labels
prio-critical Drop everything to resolve this immediately. Examples: consensus bug, un-usable CLI

Comments

@q9f
Copy link
Contributor

q9f commented Aug 12, 2021

Describe the bug

Lodestar 0.28 does not sync Prater testnet. It get's stuck on epoch 126 / head https://prater.beaconcha.in/block/4095

Expected behavior

Lodestar 0.28 does sync Prater testnet. ;)

Steps to Reproduce

NODE_OPTIONS="--max_old_space_size=8192" lodestar beacon \
  --metrics.enabled \
  --eth1.providerUrl https://goerli.infura.io/v3/***SNAP*** \
  --rootDir /srv/chain/lodestar \
  --network "prater"

Screenshots

Screenshot at 2021-08-12 11-40-51

Aug-12 11:32:33.431 []                 info: Lodestar version=0.28.0 master 2c010b25, network=prater
Aug-12 11:32:33.642 [DB]               info: Connected to LevelDB database name=/srv/chain/lodestar/chain-db
Aug-12 11:32:41.468 []                 info: Initializing beacon state from anchor state slot=4000, epoch=125, stateRoot=0x683bca5fd958e3df7ef3eb0605b46ab22999dcac3e8e3addc407dbbb1b205f4d
Aug-12 11:33:01.616 [METRICS]          info: Starting metrics HTTP server port=8008
(node:2774722) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 uncaughtException listeners added to [process]. Use emitter.setMaxListeners() to increase limit
(Use `node --trace-warnings ...` to show where the warning was created
Aug-12 11:33:03.894 [NETWORK]          info: PeerId 16Uiu2HAm3e15g4BB56pfBFmmPFeq1Z7pR6n573SMYDQfqCiV2qTS, Multiaddrs /ip4/127.0.0.1/tcp/9000,/ip4/135.181.182.51/tcp/9000
Aug-12 11:33:03.895 []                 warn: Low peer count peers=0
Aug-12 11:33:03.895 []                 info: Searching for peers - peers: 0 - finalized: 123 0x311e…cbcc - head: 4000 0xb750…33e5 - clockSlot: 1021065
Aug-12 11:33:06.009 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 123 0x311e…cbcc - head: 4000 0xb750…33e5 - clockSlot: 1021065 - peers: 3
Aug-12 11:33:18.001 []                 info: Syncing - 1.6 days left - 7.14 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021066 - peers: 3
Aug-12 11:33:30.000 []                 info: Syncing - 2.8 days left - 4.14 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021067 - peers: 4
Aug-12 11:33:42.001 []                 info: Syncing - 4.3 days left - 2.74 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021068 - peers: 3
Aug-12 11:33:54.001 []                 info: Syncing - 6 days left - 1.96 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021069 - peers: 3
Aug-12 11:34:06.002 []                 info: Syncing - 8 days left - 1.48 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021070 - peers: 4
Aug-12 11:34:18.001 []                 info: Syncing - 10 days left - 1.16 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021071 - peers: 4
Aug-12 11:34:30.004 []                 info: Syncing - 13 days left - 0.932 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021072 - peers: 5
Aug-12 11:34:42.001 []                 info: Syncing - 15 days left - 0.768 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021073 - peers: 6
Aug-12 11:34:54.001 []                 info: Syncing - 27 days left - 0.432 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021074 - peers: 6
Aug-12 11:35:06.000 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021075 - peers: 8
Aug-12 11:35:18.201 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021076 - peers: 11
Aug-12 11:35:30.217 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021077 - peers: 14
Aug-12 11:35:42.524 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021078 - peers: 16
Aug-12 11:35:54.324 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021079 - peers: 16
Aug-12 11:36:06.476 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021080 - peers: 18
Aug-12 11:36:18.003 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021081 - peers: 18
Aug-12 11:36:29.999 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021082 - peers: 19
Aug-12 11:36:30.001 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021082 - peers: 19
Aug-12 11:36:42.000 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021083 - peers: 20
Aug-12 11:36:54.422 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021084 - peers: 20
Aug-12 11:37:06.001 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 126 0x8131…acb4 - head: 4095 0xb02b…cc80 - clockSlot: 1021085 - peers: 21

Desktop (please complete the following information):

  • OS: Archlinux kernel 5.10.11-arch1-1 #1 SMP PREEMPT Wed, 27 Jan 2021 13:53:16 +0000 x86_64 GNU/Linux
  • Version: v0.28.0
  • Branch: master
  • Commit hash: 2c010b25
  • Node: nvm use lts/fermium / Now using node v14.17.4 (npm v7.20.5)
@q9f
Copy link
Contributor Author

q9f commented Aug 12, 2021

Similar issue with Pyrmont actually. :(

Aug-12 11:59:37.001 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 320 0x198f…7f39 - head: 10334 0x20b0…52cc - clockSlot: 1921797 - peers: 30
Aug-12 11:59:49.150 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 320 0x198f…7f39 - head: 10334 0x20b0…52cc - clockSlot: 1921798 - peers: 26
Aug-12 12:00:01.001 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 320 0x198f…7f39 - head: 10334 0x20b0…52cc - clockSlot: 1921799 - peers: 25
Aug-12 12:00:13.001 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 320 0x198f…7f39 - head: 10334 0x20b0…52cc - clockSlot: 1921800 - peers: 25
Aug-12 12:00:25.000 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 320 0x198f…7f39 - head: 10334 0x20b0…52cc - clockSlot: 1921801 - peers: 26
Aug-12 12:00:37.000 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 320 0x198f…7f39 - head: 10334 0x20b0…52cc - clockSlot: 1921802 - peers: 26
Aug-12 12:00:49.245 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 320 0x198f…7f39 - head: 10334 0x20b0…52cc - clockSlot: 1921803 - peers: 26
Aug-12 12:01:01.001 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 320 0x198f…7f39 - head: 10334 0x20b0…52cc - clockSlot: 1921804 - peers: 29
Aug-12 12:01:13.000 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 320 0x198f…7f39 - head: 10334 0x20b0…52cc - clockSlot: 1921805 - peers: 29
Aug-12 12:01:25.001 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 320 0x198f…7f39 - head: 10334 0x20b0…52cc - clockSlot: 1921806 - peers: 27

https://pyrmont.beaconcha.in/block/10334

@q9f
Copy link
Contributor Author

q9f commented Aug 12, 2021

Mainnet ....

Aug-12 12:09:15.515 []                 info: Lodestar version=0.28.0 master 2c010b25, network=mainnet
Aug-12 12:09:15.589 [DB]               info: Connected to LevelDB database name=/srv/chain/lodestar/chain-db
Aug-12 12:09:16.806 []                 info: Initializing beacon state from anchor state slot=1344, epoch=42, stateRoot=0x1cc89c759c7f9b0a0a72a372084f5f5411fab1b447f15f752f4b1273c1e080b6
Aug-12 12:09:18.994 [METRICS]          info: Starting metrics HTTP server port=8008
(node:1737972) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 uncaughtException listeners added to [process]. Use emitter.setMaxListeners() to increase limit
(Use `node --trace-warnings ...` to show where the warning was created)
Aug-12 12:09:22.250 [NETWORK]          info: PeerId 16Uiu2HAmCfEjsSVivDfDauGEtiDcuYHFYidFCm5hzxt4Y4tU51QQ, Multiaddrs /ip4/127.0.0.1/tcp/9000,/ip4/135.181.182.54/tcp/9000
Aug-12 12:09:22.251 []                 warn: Low peer count peers=0
Aug-12 12:09:22.251 []                 info: Searching for peers - peers: 0 - finalized: 40 0x5f74…7330 - head: 1343 0x19d2…5114 - clockSlot: 1828244
Aug-12 12:09:29.001 []                 info: Syncing - 1.5 days left - 14.2 slots/s - finalized: 43 0x1c6b…71d1 - head: 1439 0xa35e…f52c - clockSlot: 1828245 - peers: 4
Aug-12 12:09:41.005 []                 info: Syncing - 4.7 days left - 4.52 slots/s - finalized: 43 0x1c6b…71d1 - head: 1439 0xa35e…f52c - clockSlot: 1828246 - peers: 4
Aug-12 12:09:53.066 []                 info: Syncing - 8.7 days left - 2.44 slots/s - finalized: 43 0x1c6b…71d1 - head: 1439 0xa35e…f52c - clockSlot: 1828247 - peers: 6
Aug-12 12:10:05.002 []                 info: Syncing - 13 days left - 1.57 slots/s - finalized: 43 0x1c6b…71d1 - head: 1439 0xa35e…f52c - clockSlot: 1828248 - peers: 9
Aug-12 12:10:17.002 []                 info: Syncing - 19 days left - 1.10 slots/s - finalized: 43 0x1c6b…71d1 - head: 1439 0xa35e…f52c - clockSlot: 1828249 - peers: 9
Aug-12 12:10:29.333 []                 info: Syncing - 26 days left - 0.819 slots/s - finalized: 43 0x1c6b…71d1 - head: 1439 0xa35e…f52c - clockSlot: 1828250 - peers: 13
Aug-12 12:10:41.010 []                 info: Syncing - 33 days left - 0.636 slots/s - finalized: 43 0x1c6b…71d1 - head: 1439 0xa35e…f52c - clockSlot: 1828251 - peers: 15
Aug-12 12:10:53.004 []                 info: Syncing - 42 days left - 0.509 slots/s - finalized: 43 0x1c6b…71d1 - head: 1439 0xa35e…f52c - clockSlot: 1828252 - peers: 16
Aug-12 12:11:05.002 []                 info: Syncing - 51 days left - 0.417 slots/s - finalized: 43 0x1c6b…71d1 - head: 1439 0xa35e…f52c - clockSlot: 1828253 - peers: 20
Aug-12 12:11:17.048 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 43 0x1c6b…71d1 - head: 1439 0xa35e…f52c - clockSlot: 1828254 - peers: 21
Aug-12 12:11:29.001 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 43 0x1c6b…71d1 - head: 1439 0xa35e…f52c - clockSlot: 1828255 - peers: 23
Aug-12 12:11:41.075 []                 info: Syncing - ? left - 0.00 slots/s - finalized: 43 0x1c6b…71d1 - head: 1439 0xa35e…f52c - clockSlot: 1828256 - peers: 26

@q9f
Copy link
Contributor Author

q9f commented Aug 12, 2021

Ok, I was resetting the databases and did fresh sync on all networks and they get stuck on the very same blocks again:

@q9f q9f changed the title prater sync stuck on head 4095 0xb02b…cc80 (epoch 126 0x8131…acb4) beaconchain sync stuck on mainnet, pyrmont, and prater Aug 12, 2021
@dapplion
Copy link
Contributor

Thanks for reporting! Could you dump here the debug logs? If block processing of sync blocks fails it will be visible there

@q9f
Copy link
Contributor Author

q9f commented Aug 12, 2021

I think I already identified it:

Aug-12 16:07:19.142 [CHAIN]         verbose: New chain head headSlot=1425, headRoot=0xab734a302f23b747e5f87d4e3c20f62a48f7f260db047dea107fd57399fe6b6c
Aug-12 16:07:19.176 [CHAIN]         verbose: Block processed slot=1426, root=0xfd61c648f9690512197501c1e7d467ca72de0ac923ccd3feed76644a6d29407d
Aug-12 16:07:19.177 [CHAIN]         verbose: New chain head headSlot=1426, headRoot=0xfd61c648f9690512197501c1e7d467ca72de0ac923ccd3feed76644a6d29407d
Aug-12 16:07:19.211 [CHAIN]         verbose: Block processed slot=1427, root=0x14e66dde7eaf4115b5ccb5e78c7377e6c2e9e05a96e0c17d6937df4ecab58434
Aug-12 16:07:19.212 [CHAIN]         verbose: New chain head headSlot=1427, headRoot=0x14e66dde7eaf4115b5ccb5e78c7377e6c2e9e05a96e0c17d6937df4ecab58434
Aug-12 16:07:19.246 [CHAIN]         verbose: Block processed slot=1428, root=0x0937f156137fefe2472e7a0a749dae46da28855c745fe33dfc30adb901ebde7a
Aug-12 16:07:19.247 [CHAIN]         verbose: New chain head headSlot=1428, headRoot=0x0937f156137fefe2472e7a0a749dae46da28855c745fe33dfc30adb901ebde7a
Aug-12 16:07:19.284 [CHAIN]         verbose: Block processed slot=1429, root=0x3b5195dc747737a572cab221bb8fcd9fea0daad3a4ff956106b174b46ac24555
Aug-12 16:07:19.285 [CHAIN]         verbose: New chain head headSlot=1429, headRoot=0x3b5195dc747737a572cab221bb8fcd9fea0daad3a4ff956106b174b46ac24555
Aug-12 16:07:19.320 [CHAIN]         verbose: Block processed slot=1431, root=0xedac6acc7f65396aa497db50638b5fe21fb44ec0465056a0caa30f4c27bbf98e
Aug-12 16:07:19.321 [CHAIN]         verbose: New chain head headSlot=1431, headRoot=0xedac6acc7f65396aa497db50638b5fe21fb44ec0465056a0caa30f4c27bbf98e
Aug-12 16:07:19.355 [CHAIN]         verbose: Block processed slot=1432, root=0xf0915c32c74c58d06752777d2f4d0a1dfe2dafa717210997504948b9848acd5a
Aug-12 16:07:19.356 [CHAIN]         verbose: New chain head headSlot=1432, headRoot=0xf0915c32c74c58d06752777d2f4d0a1dfe2dafa717210997504948b9848acd5a
Aug-12 16:07:19.392 [CHAIN]         verbose: Block processed slot=1433, root=0x70b94dc4a8ebf1df562510d172a4ab6b207ea58690d5ada7facdeba481b718a6
Aug-12 16:07:19.393 [CHAIN]         verbose: New chain head headSlot=1433, headRoot=0x70b94dc4a8ebf1df562510d172a4ab6b207ea58690d5ada7facdeba481b718a6
Aug-12 16:07:19.426 [CHAIN]         verbose: Block processed slot=1434, root=0xd24fb726512adac1dd79a6bd5a34529ce02141e3ffff3d4e03e80a4a2bd37899
Aug-12 16:07:19.427 [CHAIN]         verbose: New chain head headSlot=1434, headRoot=0xd24fb726512adac1dd79a6bd5a34529ce02141e3ffff3d4e03e80a4a2bd37899
Aug-12 16:07:19.462 [CHAIN]         verbose: Block processed slot=1437, root=0x05d3a948feba7df106f35b8829b660184eb736bddb08fcf259ea34afb0090160
Aug-12 16:07:19.465 [CHAIN]         verbose: New chain head headSlot=1437, headRoot=0x05d3a948feba7df106f35b8829b660184eb736bddb08fcf259ea34afb0090160
Aug-12 16:07:19.510 [CHAIN]         verbose: Block processed slot=1439, root=0xa35e75bf8bc3316d4c33fda7fadec35c363774467455def4591aa283511af52c
Aug-12 16:07:19.511 [CHAIN]         verbose: New chain head headSlot=1439, headRoot=0xa35e75bf8bc3316d4c33fda7fadec35c363774467455def4591aa283511af52c
Aug-12 16:07:19.776 [CHAIN]         verbose: Checkpoint processed epoch=45, root=0xa35e75bf8bc3316d4c33fda7fadec35c363774467455def4591aa283511af52c
Aug-12 16:07:19.777 [CHAIN]         verbose: Checkpoint justified epoch=44, root=0x9a790dd0acc27f40fa61d3e45393590d21a7a841ce7fa08d3d3c78b4b68a6e75
Aug-12 16:07:19.777 [CHAIN]         verbose: Checkpoint finalized epoch=43, root=0x1c6b0e854533f7824c8de6e8747567dd0737637f4170c08dd8437e18f7ba71d1
Aug-12 16:07:19.793 [SYNC]          verbose: Batch process error id=Finalized, startEpoch=44, status=Processing code=BLOCK_ERROR_BEACON_CHAIN_ERROR, error={"message":"Block proposer index does not match state proposer indexblockProposerIndex=16980 stateProposerIndex=13462"}
Error: Block proposer index does not match state proposer indexblockProposerIndex=16980 stateProposerIndex=13462
    at processBlockHeader (/home/user/lodestar/packages/beacon-state-transition/src/allForks/block/processBlockHeader.ts:21:11)
    at Object.processBlock [as phase0] (/home/user/lodestar/packages/beacon-state-transition/src/phase0/block/index.ts:31:3)
    at processBlock (/home/user/lodestar/packages/beacon-state-transition/src/allForks/stateTransition.ts:92:5)
    at Object.stateTransition (/home/user/lodestar/packages/beacon-state-transition/src/allForks/stateTransition.ts:61:3)
    at runStateTransition (/home/user/lodestar/packages/lodestar/src/chain/blocks/stateTransition.ts:83:30)
    at processBlocksInEpoch (/home/user/lodestar/packages/lodestar/src/chain/blocks/process.ts:108:18)
    at processChainSegment (/home/user/lodestar/packages/lodestar/src/chain/blocks/process.ts:77:7)
    at processChainSegmentJob (/home/user/lodestar/packages/lodestar/src/chain/blocks/processor.ts:151:3)
    at JobItemQueue.runJob (/home/user/lodestar/packages/lodestar/src/util/queue/itemQueue.ts:85:22)
Error: BLOCK_ERROR_BEACON_CHAIN_ERROR
    at processChainSegment (/home/user/lodestar/packages/lodestar/src/chain/blocks/process.ts:80:15)
    at processChainSegmentJob (/home/user/lodestar/packages/lodestar/src/chain/blocks/processor.ts:151:3)
    at JobItemQueue.runJob (/home/user/lodestar/packages/lodestar/src/util/queue/itemQueue.ts:85:22)

@dapplion dapplion added the prio-high Resolve issues as soon as possible. label Aug 12, 2021
@dapplion
Copy link
Contributor

Oh damn! I wonder how is that possible, given that we pass specs tests. @tuyennhv Could you take a look?

Error: Block proposer index does not match state proposer indexblockProposerIndex=16980 stateProposerIndex=13462
    at processBlockHeader (/home/user/lodestar/packages/beacon-state-transition/src/allForks/block/processBlockHeader.ts:21:11)
    at Object.processBlock [as phase0] (/home/user/lodestar/packages/beacon-state-transition/src/phase0/block/index.ts:31:3)
    at processBlock (/home/user/lodestar/packages/beacon-state-transition/src/allForks/stateTransition.ts:92:5)
    at Object.stateTransition (/home/user/lodestar/packages/beacon-state-transition/src/allForks/stateTransition.ts:61:3)

@twoeths twoeths self-assigned this Aug 13, 2021
@q9f q9f added mod6-statetransition prio-critical Drop everything to resolve this immediately. Examples: consensus bug, un-usable CLI and removed prio-high Resolve issues as soon as possible. labels Sep 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
prio-critical Drop everything to resolve this immediately. Examples: consensus bug, un-usable CLI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants