Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kurtosis <> Lodestar [merge-testnet] - Beacons diverge at/post the bellatrix(merge) epoch #3639

Closed
g11tech opened this issue Jan 19, 2022 · 3 comments · Fixed by #3641
Closed
Assignees

Comments

@g11tech
Copy link
Contributor

g11tech commented Jan 19, 2022

Kurtosis is a team doing merge audit of lodestar. They configured to run a 2 node lodestar network connected with ELs, however the forkmon would show that the lodestar nodes would diverge. This was consistent and reproducible.

Debug and fix the issue.

Expected: The nodes shouldn't diverge and the network should continue.

FYI: @dapplion

@g11tech g11tech self-assigned this Jan 19, 2022
@g11tech
Copy link
Contributor Author

g11tech commented Jan 19, 2022

UPDATE: figured out that the block by one node doesn't reach the other.

filtered log node 0

Kt4bTJXYqHFiok9mE8b8iLKRECvZ
Jan-18 21:45:57.271 [NETWORK]         debug: gossip - beacon_block - accept slot=61, root=0x9cdc8df022efe7d78330e50e96195edd1aa496e8ae3d0c3e3d09d83f38464371
Jan-18 21:46:03.002 []                 info: Synced - slot: 61 - head: 61 0x9cdc…4371 - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:46:09.229 [NETWORK]       verbose: Publish to topic topic=/eth2/ac437f43/beacon_block/ssz_snappy
Jan-18 21:46:14.980 []                 info: Synced - slot: 62 - head: 62 0xa561…0383 - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:46:15.001 []                 info: Synced - slot: 62 - head: 62 0xa561…0383 - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:46:21.248 [NETWORK]       verbose: Publish to topic topic=/eth2/ac437f43/beacon_block/ssz_snappy
Jan-18 21:46:27.001 []                 info: Synced - slot: 63 - head: 63 0x8f04…80f5 - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:46:38.979 []                 info: Synced - slot: 64 (skipped 1) - head: 63 0x8f04…80f5 - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:46:39.001 []                 info: Synced - slot: 64 (skipped 1) - head: 63 0x8f04…80f5 - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:46:45.406 [NETWORK]       verbose: Publish to topic topic=/eth2/d52d57b1/beacon_block/ssz_snappy
Jan-18 21:46:45.776 [NETWORK]       verbose: Publish to topic topic=/eth2/d52d57b1/beacon_block/ssz_snappy
Jan-18 21:46:51.003 []                 info: Synced - slot: 65 - head: 65 0x62bf…6e78 - execution: valid(0x50d9…d3e1) - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:47:03.001 []                 info: Synced - slot: 66 (skipped 1) - head: 65 0x62bf…6e78 - execution: valid(0x50d9…d3e1) - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:47:14.979 []                 info: Synced - slot: 67 (skipped 2) - head: 65 0x62bf…6e78 - execution: valid(0x50d9…d3e1) - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:47:15.000 []                 info: Synced - slot: 67 (skipped 2) - head: 65 0x62bf…6e78 - execution: valid(0x50d9…d3e1) - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:47:21.418 [NETWORK]       verbose: Publish to topic topic=/eth2/d52d57b1/beacon_block/ssz_snappy
Jan-18 21:47:27.002 []                 info: Synced - slot: 68 - head: 68 0x63b6…2cda - execution: valid(0xee37…05d3) - finalized: 0x0000…0000:0 - peers: 1

filtered log node: 1

4b0b9647ce90383
Jan-18 21:46:14.980 []                 info: Synced - slot: 62 - head: 62 0xa561…0383 - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:46:15.001 []                 info: Synced - slot: 62 - head: 62 0xa561…0383 - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:46:21.253 [NETWORK]       verbose: Received gossip block slot=63, root=0x8f04…80f5, curentSlot=63, peerId=16Uiu2HAkwExZk1VQs8gCgqS8HNhoNsPP3t6qBfQU1iEstRVz9zNA
Jan-18 21:46:21.261 [NETWORK]         debug: gossip - beacon_block - accept slot=63, root=0x8f04ca923e9dee7203cd06a8c043194066fe9a137138ef24875f558b9a0380f5
Jan-18 21:46:27.001 []                 info: Synced - slot: 63 - head: 63 0x8f04…80f5 - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:46:33.596 [NETWORK]       verbose: Publish to topic topic=/eth2/d52d57b1/beacon_block/ssz_snappy
Jan-18 21:46:38.978 []                 info: Synced - slot: 64 - head: 64 0x58cb…77e3 - execution: valid(0xaadf…49e1) - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:46:39.001 []                 info: Synced - slot: 64 - head: 64 0x58cb…77e3 - execution: valid(0xaadf…49e1) - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:46:51.003 []                 info: Synced - slot: 65 (skipped 1) - head: 64 0x58cb…77e3 - execution: valid(0xaadf…49e1) - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:46:57.582 [NETWORK]       verbose: Publish to topic topic=/eth2/d52d57b1/beacon_block/ssz_snappy
Jan-18 21:47:03.001 []                 info: Synced - slot: 66 - head: 66 0xf947…1422 - execution: valid(0x81d9…35f3) - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:47:09.610 [NETWORK]       verbose: Publish to topic topic=/eth2/d52d57b1/beacon_block/ssz_snappy
Jan-18 21:47:14.999 []                 info: Synced - slot: 67 - head: 67 0x08ec…0325 - execution: valid(0xf856…951d) - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:47:15.000 []                 info: Synced - slot: 67 - head: 67 0x08ec…0325 - execution: valid(0xf856…951d) - finalized: 0x0000…0000:0 - peers: 1
Jan-18 21:47:27.002 []                 info: Synced - slot: 68 (skipped 1) - head: 67 0x08ec…0325 - execution: valid(0xf856…951d) - finalized: 0x0000…0000:0 - peers: 1

As is clear by the above the published block by node 0 is not reaching node 1, and vice versa.

@g11tech
Copy link
Contributor Author

g11tech commented Jan 19, 2022

UPDATE:
On futher digging through realized while phase 0 ( topic=/eth2/2c25c978/beacon_block/ssz_snapp) and altair (topic=/eth2/ac437f43/beacon_block/ssz_snappy) beacon block topics were subscribed to, the topic block that was being published for bellatrix:
Publish to topic topic=/eth2/d52d57b1/beacon_block/ssz_snappy
wasn't in the subscribing log of any of the nodes.
node 0

$ cat eth2__user-service__cl-client-0-beacon_1642541534.log | grep Subs | grep block
Jan-18 21:33:45.005 [NETWORK]       verbose: Subscribe to gossipsub topic topic=/eth2/2c25c978/beacon_block/ssz_snappy
Jan-18 21:33:45.088 [NETWORK]       verbose: Subscribe to gossipsub topic topic=/eth2/ac437f43/beacon_block/ssz_snappy

node 1:

$ cat eth2__user-service__cl-client-1-beacon_1642541570.log | grep Subs | grep block
Jan-18 21:33:45.033 [NETWORK]       verbose: Subscribe to gossipsub topic topic=/eth2/2c25c978/beacon_block/ssz_snappy
Jan-18 21:33:45.132 [NETWORK]       verbose: Subscribe to gossipsub topic topic=/eth2/ac437f43/beacon_block/ssz_snappy

Hence the nodes were sending blocks on the correct topics but none of them subscribed to it

@g11tech
Copy link
Contributor Author

g11tech commented Jan 19, 2022

On further digging through realized that the the forks were lined up at 0-> phase0, 1 -> altair, 2-> bellatrix. phase 0 and altair were subscribed at/before epoch 0, but the bellatrix fork subscription was never happening.

On further digging, figured out that getActiveForks being used to subscribe new forks on onEpoch in the network, wouldn't even get bellatrix till epoch 4. because the getActiveForks shifts the epoch back by 3 to calculate current and next epoch, so till epoch 3, it would only give phase0, and altair.

However our bellatrix epoch is at 2. So hence the bug!!!

@g11tech g11tech changed the title Kurtosis <> lodestar interop: The network diverges at/post the bellatrix(merge) epoch Kurtosis devnet sims: Lodestar beacons diverge at/post the bellatrix(merge) epoch Jan 19, 2022
@g11tech g11tech changed the title Kurtosis devnet sims: Lodestar beacons diverge at/post the bellatrix(merge) epoch Kurtosis <> Lodestar [merge-testnet] - Beacons diverge at/post the bellatrix(merge) epoch Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant