Fix computed db checkpoint for weak subjectivity checks #4575

Merged
g11tech merged 4 commits into unstable from g11tech/fix-state-checkpoint on Sep 21, 2022

Conversation

g11tech
Contributor

@g11tech g11tech commented Sep 20, 2022

Motivation
The checkpoint computed from the db state for weak subjectivity checks is wrongly based on the epoch of its latestBlockHeader's slot.
This causes a mismatch when comparing epochs in isWithinWeakSubjectivityPeriod and can leave a restarted node stuck if the slot of the last written state's latestBlockHeader doesn't align with the state's slot.

This PR fixes the calculation of a state's checkpoint epoch.
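
The mismatch can be illustrated with a minimal sketch. The state values below (a state at slot 64 whose latest block header is from slot 62) are hypothetical, and `SketchState` and the local `computeEpochAtSlot` are simplified stand-ins for illustration, not Lodestar's actual types or helpers.

```typescript
// Illustrative constant and types only; not the Lodestar definitions.
const SLOTS_PER_EPOCH = 32;

interface SketchState {
  slot: number;
  latestBlockHeader: {slot: number};
}

// Epoch that contains a given slot (floor division).
function computeEpochAtSlot(slot: number): number {
  return Math.floor(slot / SLOTS_PER_EPOCH);
}

// Hypothetical state: advanced to slot 64, but the last block it saw was at slot 62.
const state: SketchState = {slot: 64, latestBlockHeader: {slot: 62}};

// Epoch derived from the latest block header's slot (the old behaviour described above).
const epochFromHeader = computeEpochAtSlot(state.latestBlockHeader.slot); // 1

// Epoch derived from the state's own slot.
const epochFromState = computeEpochAtSlot(state.slot); // 2

// An equality check between the two epochs fails for this state.
const epochsMatch = epochFromHeader === epochFromState; // false
```

With these assumed values the header-based epoch is 1 while the state-based epoch is 2, so any check that expects the two to be equal would reject the state.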

@g11tech g11tech requested a review from a team as a code owner September 20, 2022 20:00
@g11tech g11tech enabled auto-merge (squash) September 20, 2022 20:00
@g11tech g11tech added the prio-critical label (Drop everything to resolve this immediately. Examples: consensus bug, un-usable CLI) Sep 20, 2022
@github-actions
Contributor

github-actions bot commented Sep 20, 2022

Performance Report

✔️ no performance regression detected

Full benchmark results
| Benchmark suite | Current: 03f53e3 | Previous: 23c88a2 | Ratio |
|---|---|---|---|
| getPubkeys - index2pubkey - req 1000 vs - 250000 vc | 2.4942 ms/op | 2.0604 ms/op | 1.21 |
| getPubkeys - validatorsArr - req 1000 vs - 250000 vc | 72.365 us/op | 71.151 us/op | 1.02 |
| BLS verify - blst-native | 2.1727 ms/op | 1.8546 ms/op | 1.17 |
| BLS verifyMultipleSignatures 3 - blst-native | 4.4819 ms/op | 3.7980 ms/op | 1.18 |
| BLS verifyMultipleSignatures 8 - blst-native | 9.6862 ms/op | 8.1792 ms/op | 1.18 |
| BLS verifyMultipleSignatures 32 - blst-native | 35.208 ms/op | 29.639 ms/op | 1.19 |
| BLS aggregatePubkeys 32 - blst-native | 46.567 us/op | 39.095 us/op | 1.19 |
| BLS aggregatePubkeys 128 - blst-native | 181.95 us/op | 153.09 us/op | 1.19 |
| getAttestationsForBlock | 85.025 ms/op | 84.051 ms/op | 1.01 |
| isKnown best case - 1 super set check | 480.00 ns/op | 421.00 ns/op | 1.14 |
| isKnown normal case - 2 super set checks | 471.00 ns/op | 414.00 ns/op | 1.14 |
| isKnown worse case - 16 super set checks | 473.00 ns/op | 414.00 ns/op | 1.14 |
| CheckpointStateCache - add get delete | 9.3460 us/op | 8.7240 us/op | 1.07 |
| validate gossip signedAggregateAndProof - struct | 5.0482 ms/op | 4.2690 ms/op | 1.18 |
| validate gossip attestation - struct | 2.3654 ms/op | 2.0367 ms/op | 1.16 |
| pickEth1Vote - no votes | 2.2173 ms/op | 2.2480 ms/op | 0.99 |
| pickEth1Vote - max votes | 22.251 ms/op | 22.259 ms/op | 1.00 |
| pickEth1Vote - Eth1Data hashTreeRoot value x2048 | 13.329 ms/op | 11.250 ms/op | 1.18 |
| pickEth1Vote - Eth1Data hashTreeRoot tree x2048 | 21.929 ms/op | 20.846 ms/op | 1.05 |
| pickEth1Vote - Eth1Data fastSerialize value x2048 | 1.5030 ms/op | 1.5746 ms/op | 0.95 |
| pickEth1Vote - Eth1Data fastSerialize tree x2048 | 15.617 ms/op | 14.993 ms/op | 1.04 |
| bytes32 toHexString | 1.1330 us/op | 1.1390 us/op | 0.99 |
| bytes32 Buffer.toString(hex) | 781.00 ns/op | 768.00 ns/op | 1.02 |
| bytes32 Buffer.toString(hex) from Uint8Array | 1.0070 us/op | 1.0580 us/op | 0.95 |
| bytes32 Buffer.toString(hex) + 0x | 797.00 ns/op | 770.00 ns/op | 1.04 |
| Object access 1 prop | 0.40500 ns/op | 0.39500 ns/op | 1.03 |
| Map access 1 prop | 0.31600 ns/op | 0.29700 ns/op | 1.06 |
| Object get x1000 | 11.291 ns/op | 17.459 ns/op | 0.65 |
| Map get x1000 | 0.93900 ns/op | 1.0030 ns/op | 0.94 |
| Object set x1000 | 87.519 ns/op | 127.39 ns/op | 0.69 |
| Map set x1000 | 58.523 ns/op | 76.522 ns/op | 0.76 |
| Return object 10000 times | 0.43800 ns/op | 0.36850 ns/op | 1.19 |
| Throw Error 10000 times | 6.0168 us/op | 5.8947 us/op | 1.02 |
| enrSubnets - fastDeserialize 64 bits | 2.9640 us/op | 2.9960 us/op | 0.99 |
| enrSubnets - ssz BitVector 64 bits | 807.00 ns/op | 807.00 ns/op | 1.00 |
| enrSubnets - fastDeserialize 4 bits | 418.00 ns/op | 432.00 ns/op | 0.97 |
| enrSubnets - ssz BitVector 4 bits | 827.00 ns/op | 832.00 ns/op | 0.99 |
| prioritizePeers score -10:0 att 32-0.1 sync 2-0 | 91.180 us/op | 96.091 us/op | 0.95 |
| prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 | 129.22 us/op | 121.64 us/op | 1.06 |
| prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 | 229.28 us/op | 222.95 us/op | 1.03 |
| prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 | 338.19 us/op | 470.23 us/op | 0.72 |
| prioritizePeers score 0:0 att 64-1 sync 4-1 | 408.15 us/op | 458.06 us/op | 0.89 |
| RateTracker 1000000 limit, 1 obj count per request | 190.75 ns/op | 216.32 ns/op | 0.88 |
| RateTracker 1000000 limit, 2 obj count per request | 142.56 ns/op | 164.97 ns/op | 0.86 |
| RateTracker 1000000 limit, 4 obj count per request | 117.92 ns/op | 141.18 ns/op | 0.84 |
| RateTracker 1000000 limit, 8 obj count per request | 106.40 ns/op | 128.93 ns/op | 0.83 |
| RateTracker with prune | 4.8740 us/op | 4.6220 us/op | 1.05 |
| array of 16000 items push then shift | 51.613 us/op | 3.1871 us/op | 16.19 |
| LinkedList of 16000 items push then shift | 13.063 ns/op | 17.335 ns/op | 0.75 |
| array of 16000 items push then pop | 220.97 ns/op | 242.40 ns/op | 0.91 |
| LinkedList of 16000 items push then pop | 12.345 ns/op | 16.212 ns/op | 0.76 |
| array of 24000 items push then shift | 77.374 us/op | 4.5593 us/op | 16.97 |
| LinkedList of 24000 items push then shift | 13.878 ns/op | 20.541 ns/op | 0.68 |
| array of 24000 items push then pop | 205.41 ns/op | 208.01 ns/op | 0.99 |
| LinkedList of 24000 items push then pop | 12.428 ns/op | 17.809 ns/op | 0.70 |
| intersect bitArray bitLen 8 | 10.696 ns/op | 11.527 ns/op | 0.93 |
| intersect array and set length 8 | 159.52 ns/op | 170.97 ns/op | 0.93 |
| intersect bitArray bitLen 128 | 55.674 ns/op | 72.058 ns/op | 0.77 |
| intersect array and set length 128 | 1.9443 us/op | 2.2608 us/op | 0.86 |
| Buffer.concat 32 items | 2.1480 ns/op | 2.2880 ns/op | 0.94 |
| pass gossip attestations to forkchoice per slot | 2.9247 ms/op | 3.1978 ms/op | 0.91 |
| computeDeltas | 3.3370 ms/op | 3.7758 ms/op | 0.88 |
| computeProposerBoostScoreFromBalances | 804.00 us/op | 920.99 us/op | 0.87 |
| altair processAttestation - 250000 vs - 7PWei normalcase | 4.1827 ms/op | 3.5687 ms/op | 1.17 |
| altair processAttestation - 250000 vs - 7PWei worstcase | 6.6359 ms/op | 5.5283 ms/op | 1.20 |
| altair processAttestation - setStatus - 1/6 committees join | 194.46 us/op | 228.22 us/op | 0.85 |
| altair processAttestation - setStatus - 1/3 committees join | 369.85 us/op | 433.97 us/op | 0.85 |
| altair processAttestation - setStatus - 1/2 committees join | 532.63 us/op | 600.28 us/op | 0.89 |
| altair processAttestation - setStatus - 2/3 committees join | 685.66 us/op | 780.11 us/op | 0.88 |
| altair processAttestation - setStatus - 4/5 committees join | 972.69 us/op | 1.0798 ms/op | 0.90 |
| altair processAttestation - setStatus - 100% committees join | 1.1660 ms/op | 1.2569 ms/op | 0.93 |
| altair processBlock - 250000 vs - 7PWei normalcase | 25.269 ms/op | 26.836 ms/op | 0.94 |
| altair processBlock - 250000 vs - 7PWei normalcase hashState | 40.148 ms/op | 41.469 ms/op | 0.97 |
| altair processBlock - 250000 vs - 7PWei worstcase | 83.911 ms/op | 74.445 ms/op | 1.13 |
| altair processBlock - 250000 vs - 7PWei worstcase hashState | 113.89 ms/op | 107.87 ms/op | 1.06 |
| phase0 processBlock - 250000 vs - 7PWei normalcase | 3.6172 ms/op | 3.2938 ms/op | 1.10 |
| phase0 processBlock - 250000 vs - 7PWei worstcase | 51.815 ms/op | 44.811 ms/op | 1.16 |
| altair processEth1Data - 250000 vs - 7PWei normalcase | 899.19 us/op | 801.99 us/op | 1.12 |
| Tree 40 250000 create | 1.0254 s/op | 799.60 ms/op | 1.28 |
| Tree 40 250000 get(125000) | 238.81 ns/op | 297.20 ns/op | 0.80 |
| Tree 40 250000 set(125000) | 3.0223 us/op | 2.6614 us/op | 1.14 |
| Tree 40 250000 toArray() | 28.506 ms/op | 31.269 ms/op | 0.91 |
| Tree 40 250000 iterate all - toArray() + loop | 28.530 ms/op | 32.550 ms/op | 0.88 |
| Tree 40 250000 iterate all - get(i) | 112.14 ms/op | 111.52 ms/op | 1.01 |
| MutableVector 250000 create | 14.479 ms/op | 17.243 ms/op | 0.84 |
| MutableVector 250000 get(125000) | 10.657 ns/op | 13.052 ns/op | 0.82 |
| MutableVector 250000 set(125000) | 698.01 ns/op | 602.98 ns/op | 1.16 |
| MutableVector 250000 toArray() | 6.6804 ms/op | 7.3221 ms/op | 0.91 |
| MutableVector 250000 iterate all - toArray() + loop | 6.8815 ms/op | 7.4857 ms/op | 0.92 |
| MutableVector 250000 iterate all - get(i) | 2.9973 ms/op | 3.2902 ms/op | 0.91 |
| Array 250000 create | 6.7692 ms/op | 6.7995 ms/op | 1.00 |
| Array 250000 clone - spread | 3.6782 ms/op | 3.8292 ms/op | 0.96 |
| Array 250000 get(125000) | 1.6470 ns/op | 1.5980 ns/op | 1.03 |
| Array 250000 set(125000) | 1.5400 ns/op | 1.5820 ns/op | 0.97 |
| Array 250000 iterate all - loop | 150.98 us/op | 170.14 us/op | 0.89 |
| effectiveBalanceIncrements clone Uint8Array 300000 | 63.278 us/op | 86.281 us/op | 0.73 |
| effectiveBalanceIncrements clone MutableVector 300000 | 1.0590 us/op | 1.2740 us/op | 0.83 |
| effectiveBalanceIncrements rw all Uint8Array 300000 | 248.83 us/op | 252.51 us/op | 0.99 |
| effectiveBalanceIncrements rw all MutableVector 300000 | 190.17 ms/op | 210.26 ms/op | 0.90 |
| phase0 afterProcessEpoch - 250000 vs - 7PWei | 192.54 ms/op | 190.07 ms/op | 1.01 |
| phase0 beforeProcessEpoch - 250000 vs - 7PWei | 78.314 ms/op | 68.193 ms/op | 1.15 |
| altair processEpoch - mainnet_e81889 | 549.15 ms/op | 576.83 ms/op | 0.95 |
| mainnet_e81889 - altair beforeProcessEpoch | 120.06 ms/op | 146.86 ms/op | 0.82 |
| mainnet_e81889 - altair processJustificationAndFinalization | 16.752 us/op | 20.372 us/op | 0.82 |
| mainnet_e81889 - altair processInactivityUpdates | 9.0451 ms/op | 11.130 ms/op | 0.81 |
| mainnet_e81889 - altair processRewardsAndPenalties | 80.528 ms/op | 89.532 ms/op | 0.90 |
| mainnet_e81889 - altair processRegistryUpdates | 2.6030 us/op | 3.1250 us/op | 0.83 |
| mainnet_e81889 - altair processSlashings | 578.00 ns/op | 931.00 ns/op | 0.62 |
| mainnet_e81889 - altair processEth1DataReset | 731.00 ns/op | 1.2560 us/op | 0.58 |
| mainnet_e81889 - altair processEffectiveBalanceUpdates | 2.2802 ms/op | 2.4092 ms/op | 0.95 |
| mainnet_e81889 - altair processSlashingsReset | 4.1990 us/op | 6.5040 us/op | 0.65 |
| mainnet_e81889 - altair processRandaoMixesReset | 4.1870 us/op | 8.6490 us/op | 0.48 |
| mainnet_e81889 - altair processHistoricalRootsUpdate | 734.00 ns/op | 1.0740 us/op | 0.68 |
| mainnet_e81889 - altair processParticipationFlagUpdates | 2.0570 us/op | 3.3820 us/op | 0.61 |
| mainnet_e81889 - altair processSyncCommitteeUpdates | 578.00 ns/op | 873.00 ns/op | 0.66 |
| mainnet_e81889 - altair afterProcessEpoch | 201.46 ms/op | 219.83 ms/op | 0.92 |
| phase0 processEpoch - mainnet_e58758 | 496.94 ms/op | 579.29 ms/op | 0.86 |
| mainnet_e58758 - phase0 beforeProcessEpoch | 192.66 ms/op | 241.16 ms/op | 0.80 |
| mainnet_e58758 - phase0 processJustificationAndFinalization | 16.630 us/op | 22.975 us/op | 0.72 |
| mainnet_e58758 - phase0 processRewardsAndPenalties | 125.80 ms/op | 115.20 ms/op | 1.09 |
| mainnet_e58758 - phase0 processRegistryUpdates | 8.3280 us/op | 18.042 us/op | 0.46 |
| mainnet_e58758 - phase0 processSlashings | 608.00 ns/op | 1.3010 us/op | 0.47 |
| mainnet_e58758 - phase0 processEth1DataReset | 653.00 ns/op | 1.2290 us/op | 0.53 |
| mainnet_e58758 - phase0 processEffectiveBalanceUpdates | 1.8775 ms/op | 2.4125 ms/op | 0.78 |
| mainnet_e58758 - phase0 processSlashingsReset | 3.5580 us/op | 6.8000 us/op | 0.52 |
| mainnet_e58758 - phase0 processRandaoMixesReset | 4.0490 us/op | 10.777 us/op | 0.38 |
| mainnet_e58758 - phase0 processHistoricalRootsUpdate | 687.00 ns/op | 1.5230 us/op | 0.45 |
| mainnet_e58758 - phase0 processParticipationRecordUpdates | 3.4500 us/op | 6.5810 us/op | 0.52 |
| mainnet_e58758 - phase0 afterProcessEpoch | 165.40 ms/op | 160.41 ms/op | 1.03 |
| phase0 processEffectiveBalanceUpdates - 250000 normalcase | 1.9944 ms/op | 2.7040 ms/op | 0.74 |
| phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 | 2.3041 ms/op | 3.1710 ms/op | 0.73 |
| altair processInactivityUpdates - 250000 normalcase | 41.057 ms/op | 38.377 ms/op | 1.07 |
| altair processInactivityUpdates - 250000 worstcase | 34.441 ms/op | 38.871 ms/op | 0.89 |
| phase0 processRegistryUpdates - 250000 normalcase | 6.3740 us/op | 14.076 us/op | 0.45 |
| phase0 processRegistryUpdates - 250000 badcase_full_deposits | 370.24 us/op | 555.57 us/op | 0.67 |
| phase0 processRegistryUpdates - 250000 worstcase 0.5 | 176.20 ms/op | 199.65 ms/op | 0.88 |
| altair processRewardsAndPenalties - 250000 normalcase | 79.040 ms/op | 121.40 ms/op | 0.65 |
| altair processRewardsAndPenalties - 250000 worstcase | 99.294 ms/op | 86.339 ms/op | 1.15 |
| phase0 getAttestationDeltas - 250000 normalcase | 14.554 ms/op | 13.091 ms/op | 1.11 |
| phase0 getAttestationDeltas - 250000 worstcase | 14.072 ms/op | 13.783 ms/op | 1.02 |
| phase0 processSlashings - 250000 worstcase | 5.1826 ms/op | 5.3748 ms/op | 0.96 |
| altair processSyncCommitteeUpdates - 250000 | 301.36 ms/op | 281.07 ms/op | 1.07 |
| BeaconState.hashTreeRoot - No change | 541.00 ns/op | 490.00 ns/op | 1.10 |
| BeaconState.hashTreeRoot - 1 full validator | 66.004 us/op | 61.573 us/op | 1.07 |
| BeaconState.hashTreeRoot - 32 full validator | 745.75 us/op | 601.03 us/op | 1.24 |
| BeaconState.hashTreeRoot - 512 full validator | 7.8148 ms/op | 7.4795 ms/op | 1.04 |
| BeaconState.hashTreeRoot - 1 validator.effectiveBalance | 91.822 us/op | 77.032 us/op | 1.19 |
| BeaconState.hashTreeRoot - 32 validator.effectiveBalance | 1.3296 ms/op | 1.1545 ms/op | 1.15 |
| BeaconState.hashTreeRoot - 512 validator.effectiveBalance | 17.846 ms/op | 14.628 ms/op | 1.22 |
| BeaconState.hashTreeRoot - 1 balances | 68.442 us/op | 58.966 us/op | 1.16 |
| BeaconState.hashTreeRoot - 32 balances | 620.26 us/op | 548.39 us/op | 1.13 |
| BeaconState.hashTreeRoot - 512 balances | 6.6709 ms/op | 5.6681 ms/op | 1.18 |
| BeaconState.hashTreeRoot - 250000 balances | 100.89 ms/op | 94.590 ms/op | 1.07 |
| aggregationBits - 2048 els - zipIndexesInBitList | 29.221 us/op | 32.542 us/op | 0.90 |
| regular array get 100000 times | 61.040 us/op | 67.444 us/op | 0.91 |
| wrappedArray get 100000 times | 61.071 us/op | 67.452 us/op | 0.91 |
| arrayWithProxy get 100000 times | 28.471 ms/op | 28.863 ms/op | 0.99 |
| ssz.Root.equals | 498.00 ns/op | 493.00 ns/op | 1.01 |
| byteArrayEquals | 487.00 ns/op | 494.00 ns/op | 0.99 |
| shuffle list - 16384 els | 11.585 ms/op | 11.295 ms/op | 1.03 |
| shuffle list - 250000 els | 169.31 ms/op | 165.94 ms/op | 1.02 |
| processSlot - 1 slots | 13.334 us/op | 14.391 us/op | 0.93 |
| processSlot - 32 slots | 2.0138 ms/op | 1.7270 ms/op | 1.17 |
| getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei | 396.57 us/op | 339.57 us/op | 1.17 |
| getCommitteeAssignments - req 1 vs - 250000 vc | 5.4586 ms/op | 5.3422 ms/op | 1.02 |
| getCommitteeAssignments - req 100 vs - 250000 vc | 7.9962 ms/op | 7.3427 ms/op | 1.09 |
| getCommitteeAssignments - req 1000 vs - 250000 vc | 8.5947 ms/op | 7.7428 ms/op | 1.11 |
| RootCache.getBlockRootAtSlot - 250000 vs - 7PWei | 8.3000 ns/op | 10.270 ns/op | 0.81 |
| state getBlockRootAtSlot - 250000 vs - 7PWei | 1.1965 us/op | 1.1568 us/op | 1.03 |
| computeProposers - vc 250000 | 17.904 ms/op | 16.974 ms/op | 1.05 |
| computeEpochShuffling - vc 250000 | 172.84 ms/op | 170.33 ms/op | 1.01 |
| getNextSyncCommittee - vc 250000 | 297.86 ms/op | 284.01 ms/op | 1.05 |

by benchmarkbot/action

@@ -111,7 +111,7 @@ export function isWithinWeakSubjectivityPeriod(
   wsState: BeaconStateAllForks,
   wsCheckpoint: Checkpoint
 ): boolean {
-  const wsStateEpoch = computeEpochAtSlot(wsState.slot);
+  const wsStateEpoch = Math.ceil(wsState.slot / SLOTS_PER_EPOCH);
Contributor

@g11tech do we really need this for the fix? It's quite confusing here, so either leave this change out or add some comments to clarify.

-    epoch: computeEpochAtSlot(state.latestBlockHeader.slot),
+    // the correct checkpoint is based on state's slot, its latestBlockHeader's slot's epoch can be
+    // behind the state
+    epoch: Math.ceil(state.slot / SLOTS_PER_EPOCH),
Contributor

Using state.slot makes sense, though I think computeEpochAtSlot(state.slot) is enough.

If SLOTS_PER_EPOCH = 32 and state.slot = 40, this function returns an epoch of 2, which does not sound right. Otherwise, please add some comments to avoid the confusion.

Also note that there is another getCheckpointFromState function defined somewhere else, which expects the real checkpoint state (state.slot % 32 = 0).

Contributor Author

My reasoning: if the state slot is past some epoch boundary (ideally it should be on the boundary) and it is a checkpoint state, it should be for the next epoch (epoch+1) only, right? It is possible that the latestBlockHeader is from a slot > epoch*32, and hence the checkpoint root might belong to epoch+1.
Also this: https://github.com/ChainSafe/lodestar/blob/unstable/packages/beacon-node/src/chain/initState.ts#L226
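
To make the two positions in this thread concrete, the sketch below compares floor-based and ceil-based epoch computation for a few slots. `epochFloor` and `epochCeil` are hypothetical names introduced only for illustration; the floor variant is assumed to mirror what computeEpochAtSlot does, and SLOTS_PER_EPOCH = 32 is taken from the reviewer's example.

```typescript
// Value taken from the reviewer's example above.
const SLOTS_PER_EPOCH = 32;

// Epoch that contains the slot (assumed to mirror computeEpochAtSlot).
function epochFloor(slot: number): number {
  return Math.floor(slot / SLOTS_PER_EPOCH);
}

// Epoch of the first epoch boundary at or after the slot (the Math.ceil variant in the diff).
function epochCeil(slot: number): number {
  return Math.ceil(slot / SLOTS_PER_EPOCH);
}

// Compare both variants for a boundary slot and a few non-boundary slots.
const comparison = [64, 40, 63, 65].map((slot) => ({
  slot,
  floor: epochFloor(slot),
  ceil: epochCeil(slot),
}));

for (const row of comparison) {
  console.log(`slot=${row.slot} floor=${row.floor} ceil=${row.ceil}`);
}
// slot=64 floor=2 ceil=2
// slot=40 floor=1 ceil=2
// slot=63 floor=1 ceil=2
// slot=65 floor=2 ceil=3
```

The two variants agree only on epoch boundaries. For any other slot the ceil variant points at the next epoch, which is the behaviour the author argues for and the reviewer questions with the slot 40 example.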

@g11tech g11tech merged commit fd8e335 into unstable Sep 21, 2022
@g11tech g11tech deleted the g11tech/fix-state-checkpoint branch September 21, 2022 11:08