Faster fastMsgIdFn using xxhash #4649

Merged
wemeetagain merged 6 commits into unstable from tuyen/dapplion/gossip-fast-msg-id on Oct 7, 2022

Conversation

twoeths
Contributor

@twoeths twoeths commented Oct 7, 2022

Motivation

  • We want a faster fastMsgIdFn(); it turns out to have a big impact on the I/O lag issue

Description

Closes #4603
Could close #4600, because the submitPoolAttestations request time is significantly faster with this change
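
To illustrate the trade-off behind the change, here is a minimal sketch, not the code from this PR. It assumes a fast message ID only needs to be a cheap, well-distributed fingerprint of the raw payload. `RawMessage`, `msgIdSha256`, `msgIdFast`, and `fnv1a32` are hypothetical names, the truncation to 8 bytes is an assumption, and 32-bit FNV-1a is used purely as a dependency-free stand-in for a non-cryptographic hash; the PR itself uses xxhash.

```typescript
import {createHash} from "node:crypto";

/** Hypothetical message shape: only the raw payload matters for the ID. */
interface RawMessage {
  data?: Uint8Array;
}

/** Baseline: hex of the first 8 bytes of SHA-256 over the payload (assumed ID format). */
function msgIdSha256(msg: RawMessage): string {
  if (!msg.data) return "0000000000000000";
  return createHash("sha256").update(msg.data).digest("hex").slice(0, 16);
}

/** Stand-in non-cryptographic hash: 32-bit FNV-1a. This is NOT xxhash. */
function fnv1a32(data: Uint8Array): number {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < data.length; i++) {
    hash ^= data[i];
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, modulo 2^32
  }
  return hash >>> 0; // convert to an unsigned 32-bit integer
}

/** Fast variant: same interface, but backed by the cheaper hash. */
function msgIdFast(msg: RawMessage): string {
  if (!msg.data) return "00000000";
  return fnv1a32(msg.data).toString(16).padStart(8, "0");
}

const payload = new Uint8Array(200).fill(7);
console.log(msgIdSha256({data: payload}), msgIdFast({data: payload}));
```

A non-cryptographic hash gives up collision resistance against an adversary in exchange for speed, so whether that is acceptable depends on how the ID is used downstream.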

TODO

  • monitor the metrics on feat1 group

@github-actions
Contributor

github-actions bot commented Oct 7, 2022

Performance Report

✔️ no performance regression detected

Full benchmark results
Benchmark suite Current: 04f82c0 Previous: ca27458 Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 2.5423 ms/op 1.7624 ms/op 1.44
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 75.813 us/op 64.241 us/op 1.18
BLS verify - blst-native 1.8556 ms/op 2.1652 ms/op 0.86
BLS verifyMultipleSignatures 3 - blst-native 3.8043 ms/op 4.4787 ms/op 0.85
BLS verifyMultipleSignatures 8 - blst-native 8.1872 ms/op 9.6628 ms/op 0.85
BLS verifyMultipleSignatures 32 - blst-native 29.698 ms/op 35.144 ms/op 0.85
BLS aggregatePubkeys 32 - blst-native 39.281 us/op 46.721 us/op 0.84
BLS aggregatePubkeys 128 - blst-native 153.14 us/op 182.19 us/op 0.84
getAttestationsForBlock 92.794 ms/op 75.866 ms/op 1.22
isKnown best case - 1 super set check 425.00 ns/op 491.00 ns/op 0.87
isKnown normal case - 2 super set checks 411.00 ns/op 479.00 ns/op 0.86
isKnown worse case - 16 super set checks 410.00 ns/op 480.00 ns/op 0.85
CheckpointStateCache - add get delete 9.0140 us/op 8.7250 us/op 1.03
validate gossip signedAggregateAndProof - struct 4.2638 ms/op 5.0209 ms/op 0.85
validate gossip attestation - struct 2.0367 ms/op 2.3771 ms/op 0.86
pickEth1Vote - no votes 2.2551 ms/op 2.1786 ms/op 1.04
pickEth1Vote - max votes 23.662 ms/op 18.038 ms/op 1.31
pickEth1Vote - Eth1Data hashTreeRoot value x2048 11.808 ms/op 12.287 ms/op 0.96
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 22.558 ms/op 19.746 ms/op 1.14
pickEth1Vote - Eth1Data fastSerialize value x2048 1.6088 ms/op 1.4400 ms/op 1.12
pickEth1Vote - Eth1Data fastSerialize tree x2048 15.834 ms/op 12.626 ms/op 1.25
bytes32 toHexString 1.1770 us/op 980.00 ns/op 1.20
bytes32 Buffer.toString(hex) 758.00 ns/op 739.00 ns/op 1.03
bytes32 Buffer.toString(hex) from Uint8Array 919.00 ns/op 998.00 ns/op 0.92
bytes32 Buffer.toString(hex) + 0x 734.00 ns/op 752.00 ns/op 0.98
Object access 1 prop 0.40000 ns/op 0.37000 ns/op 1.08
Map access 1 prop 0.29700 ns/op 0.30200 ns/op 0.98
Object get x1000 18.473 ns/op 11.894 ns/op 1.55
Map get x1000 1.0250 ns/op 0.95800 ns/op 1.07
Object set x1000 128.01 ns/op 71.139 ns/op 1.80
Map set x1000 74.795 ns/op 47.155 ns/op 1.59
Return object 10000 times 0.36660 ns/op 0.43350 ns/op 0.85
Throw Error 10000 times 6.1772 us/op 6.0625 us/op 1.02
fastMsgIdFn sha256 / 200 bytes 4.3070 us/op
fastMsgIdFn h32 xxhash / 200 bytes 559.00 ns/op
fastMsgIdFn h64 xxhash / 200 bytes 687.00 ns/op
fastMsgIdFn sha256 / 1000 bytes 13.350 us/op
fastMsgIdFn h32 xxhash / 1000 bytes 696.00 ns/op
fastMsgIdFn h64 xxhash / 1000 bytes 875.00 ns/op
fastMsgIdFn sha256 / 10000 bytes 113.03 us/op
fastMsgIdFn h32 xxhash / 10000 bytes 2.2630 us/op
fastMsgIdFn h64 xxhash / 10000 bytes 1.8040 us/op
enrSubnets - fastDeserialize 64 bits 2.8840 us/op 2.5040 us/op 1.15
enrSubnets - ssz BitVector 64 bits 763.00 ns/op 814.00 ns/op 0.94
enrSubnets - fastDeserialize 4 bits 429.00 ns/op 376.00 ns/op 1.14
enrSubnets - ssz BitVector 4 bits 819.00 ns/op 830.00 ns/op 0.99
prioritizePeers score -10:0 att 32-0.1 sync 2-0 102.10 us/op 81.436 us/op 1.25
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 140.28 us/op 123.28 us/op 1.14
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 254.28 us/op 206.14 us/op 1.23
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 393.60 us/op 334.37 us/op 1.18
prioritizePeers score 0:0 att 64-1 sync 4-1 467.83 us/op 404.53 us/op 1.16
RateTracker 1000000 limit, 1 obj count per request 215.40 ns/op 182.42 ns/op 1.18
RateTracker 1000000 limit, 2 obj count per request 164.98 ns/op 131.98 ns/op 1.25
RateTracker 1000000 limit, 4 obj count per request 139.50 ns/op 107.09 ns/op 1.30
RateTracker 1000000 limit, 8 obj count per request 124.33 ns/op 94.564 ns/op 1.31
RateTracker with prune 4.8590 us/op 3.8560 us/op 1.26
array of 16000 items push then shift 3.1579 us/op 51.590 us/op 0.06
LinkedList of 16000 items push then shift 19.067 ns/op 12.274 ns/op 1.55
array of 16000 items push then pop 262.06 ns/op 202.26 ns/op 1.30
LinkedList of 16000 items push then pop 17.676 ns/op 12.216 ns/op 1.45
array of 24000 items push then shift 4.6062 us/op 77.345 us/op 0.06
LinkedList of 24000 items push then shift 23.969 ns/op 12.314 ns/op 1.95
array of 24000 items push then pop 220.27 ns/op 192.82 ns/op 1.14
LinkedList of 24000 items push then pop 20.045 ns/op 12.189 ns/op 1.64
intersect bitArray bitLen 8 11.620 ns/op 10.958 ns/op 1.06
intersect array and set length 8 178.42 ns/op 134.67 ns/op 1.32
intersect bitArray bitLen 128 72.222 ns/op 57.218 ns/op 1.26
intersect array and set length 128 2.4137 us/op 1.8024 us/op 1.34
Buffer.concat 32 items 1.9540 ns/op 1.8030 ns/op 1.08
pass gossip attestations to forkchoice per slot 4.1182 ms/op 5.1841 ms/op 0.79
computeDeltas 5.9953 ms/op 4.8346 ms/op 1.24
computeProposerBoostScoreFromBalances 921.27 us/op 807.90 us/op 1.14
altair processAttestation - 250000 vs - 7PWei normalcase 4.2640 ms/op 3.3314 ms/op 1.28
altair processAttestation - 250000 vs - 7PWei worstcase 6.1838 ms/op 5.1085 ms/op 1.21
altair processAttestation - setStatus - 1/6 committees join 218.51 us/op 179.51 us/op 1.22
altair processAttestation - setStatus - 1/3 committees join 429.13 us/op 352.05 us/op 1.22
altair processAttestation - setStatus - 1/2 committees join 579.01 us/op 502.88 us/op 1.15
altair processAttestation - setStatus - 2/3 committees join 750.09 us/op 663.82 us/op 1.13
altair processAttestation - setStatus - 4/5 committees join 1.0292 ms/op 922.43 us/op 1.12
altair processAttestation - setStatus - 100% committees join 1.2274 ms/op 1.1121 ms/op 1.10
altair processBlock - 250000 vs - 7PWei normalcase 29.725 ms/op 25.206 ms/op 1.18
altair processBlock - 250000 vs - 7PWei normalcase hashState 41.495 ms/op 38.511 ms/op 1.08
altair processBlock - 250000 vs - 7PWei worstcase 84.768 ms/op 77.254 ms/op 1.10
altair processBlock - 250000 vs - 7PWei worstcase hashState 102.30 ms/op 102.99 ms/op 0.99
phase0 processBlock - 250000 vs - 7PWei normalcase 3.9904 ms/op 3.0701 ms/op 1.30
phase0 processBlock - 250000 vs - 7PWei worstcase 48.149 ms/op 50.249 ms/op 0.96
altair processEth1Data - 250000 vs - 7PWei normalcase 936.39 us/op 662.50 us/op 1.41
Tree 40 250000 create 916.17 ms/op 675.49 ms/op 1.36
Tree 40 250000 get(125000) 294.46 ns/op 262.31 ns/op 1.12
Tree 40 250000 set(125000) 2.7275 us/op 2.2658 us/op 1.20
Tree 40 250000 toArray() 33.745 ms/op 27.395 ms/op 1.23
Tree 40 250000 iterate all - toArray() + loop 34.137 ms/op 27.629 ms/op 1.24
Tree 40 250000 iterate all - get(i) 112.82 ms/op 109.44 ms/op 1.03
MutableVector 250000 create 16.579 ms/op 14.482 ms/op 1.14
MutableVector 250000 get(125000) 13.414 ns/op 11.140 ns/op 1.20
MutableVector 250000 set(125000) 706.95 ns/op 541.26 ns/op 1.31
MutableVector 250000 toArray() 7.7176 ms/op 5.9968 ms/op 1.29
MutableVector 250000 iterate all - toArray() + loop 7.8503 ms/op 6.0070 ms/op 1.31
MutableVector 250000 iterate all - get(i) 3.2823 ms/op 2.8564 ms/op 1.15
Array 250000 create 7.2739 ms/op 5.3576 ms/op 1.36
Array 250000 clone - spread 3.7890 ms/op 2.3122 ms/op 1.64
Array 250000 get(125000) 1.5330 ns/op 1.1330 ns/op 1.35
Array 250000 set(125000) 1.5350 ns/op 1.1300 ns/op 1.36
Array 250000 iterate all - loop 167.84 us/op 150.93 us/op 1.11
effectiveBalanceIncrements clone Uint8Array 300000 94.391 us/op 35.161 us/op 2.68
effectiveBalanceIncrements clone MutableVector 300000 1.1200 us/op 697.00 ns/op 1.61
effectiveBalanceIncrements rw all Uint8Array 300000 252.50 us/op 247.20 us/op 1.02
effectiveBalanceIncrements rw all MutableVector 300000 240.12 ms/op 128.79 ms/op 1.86
phase0 afterProcessEpoch - 250000 vs - 7PWei 191.13 ms/op 201.07 ms/op 0.95
phase0 beforeProcessEpoch - 250000 vs - 7PWei 87.629 ms/op 58.005 ms/op 1.51
altair processEpoch - mainnet_e81889 542.32 ms/op 559.89 ms/op 0.97
mainnet_e81889 - altair beforeProcessEpoch 170.17 ms/op 112.09 ms/op 1.52
mainnet_e81889 - altair processJustificationAndFinalization 31.313 us/op 17.082 us/op 1.83
mainnet_e81889 - altair processInactivityUpdates 12.061 ms/op 8.8775 ms/op 1.36
mainnet_e81889 - altair processRewardsAndPenalties 99.178 ms/op 77.482 ms/op 1.28
mainnet_e81889 - altair processRegistryUpdates 5.9500 us/op 2.6220 us/op 2.27
mainnet_e81889 - altair processSlashings 1.1280 us/op 617.00 ns/op 1.83
mainnet_e81889 - altair processEth1DataReset 1.3700 us/op 587.00 ns/op 2.33
mainnet_e81889 - altair processEffectiveBalanceUpdates 2.2746 ms/op 2.0588 ms/op 1.10
mainnet_e81889 - altair processSlashingsReset 9.0630 us/op 4.4700 us/op 2.03
mainnet_e81889 - altair processRandaoMixesReset 9.7590 us/op 4.7180 us/op 2.07
mainnet_e81889 - altair processHistoricalRootsUpdate 1.4570 us/op 693.00 ns/op 2.10
mainnet_e81889 - altair processParticipationFlagUpdates 6.8380 us/op 2.3080 us/op 2.96
mainnet_e81889 - altair processSyncCommitteeUpdates 1.2240 us/op 509.00 ns/op 2.40
mainnet_e81889 - altair afterProcessEpoch 200.50 ms/op 199.75 ms/op 1.00
phase0 processEpoch - mainnet_e58758 562.33 ms/op 480.85 ms/op 1.17
mainnet_e58758 - phase0 beforeProcessEpoch 253.72 ms/op 175.28 ms/op 1.45
mainnet_e58758 - phase0 processJustificationAndFinalization 27.751 us/op 16.114 us/op 1.72
mainnet_e58758 - phase0 processRewardsAndPenalties 82.103 ms/op 71.658 ms/op 1.15
mainnet_e58758 - phase0 processRegistryUpdates 15.875 us/op 8.3080 us/op 1.91
mainnet_e58758 - phase0 processSlashings 1.3550 us/op 545.00 ns/op 2.49
mainnet_e58758 - phase0 processEth1DataReset 1.2550 us/op 618.00 ns/op 2.03
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 2.0162 ms/op 1.6985 ms/op 1.19
mainnet_e58758 - phase0 processSlashingsReset 7.7450 us/op 3.4600 us/op 2.24
mainnet_e58758 - phase0 processRandaoMixesReset 9.6630 us/op 6.5510 us/op 1.48
mainnet_e58758 - phase0 processHistoricalRootsUpdate 1.5230 us/op 620.00 ns/op 2.46
mainnet_e58758 - phase0 processParticipationRecordUpdates 9.2490 us/op 3.3930 us/op 2.73
mainnet_e58758 - phase0 afterProcessEpoch 165.05 ms/op 164.23 ms/op 1.01
phase0 processEffectiveBalanceUpdates - 250000 normalcase 2.6894 ms/op 2.0384 ms/op 1.32
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 3.4639 ms/op 2.3031 ms/op 1.50
altair processInactivityUpdates - 250000 normalcase 43.818 ms/op 34.260 ms/op 1.28
altair processInactivityUpdates - 250000 worstcase 41.903 ms/op 41.118 ms/op 1.02
phase0 processRegistryUpdates - 250000 normalcase 12.883 us/op 7.2980 us/op 1.77
phase0 processRegistryUpdates - 250000 badcase_full_deposits 449.57 us/op 385.89 us/op 1.17
phase0 processRegistryUpdates - 250000 worstcase 0.5 252.13 ms/op 175.94 ms/op 1.43
altair processRewardsAndPenalties - 250000 normalcase 137.97 ms/op 74.677 ms/op 1.85
altair processRewardsAndPenalties - 250000 worstcase 90.865 ms/op 72.667 ms/op 1.25
phase0 getAttestationDeltas - 250000 normalcase 14.376 ms/op 12.011 ms/op 1.20
phase0 getAttestationDeltas - 250000 worstcase 14.707 ms/op 12.231 ms/op 1.20
phase0 processSlashings - 250000 worstcase 5.4441 ms/op 4.9218 ms/op 1.11
altair processSyncCommitteeUpdates - 250000 288.44 ms/op 289.22 ms/op 1.00
BeaconState.hashTreeRoot - No change 503.00 ns/op 560.00 ns/op 0.90
BeaconState.hashTreeRoot - 1 full validator 60.925 us/op 66.039 us/op 0.92
BeaconState.hashTreeRoot - 32 full validator 854.54 us/op 761.60 us/op 1.12
BeaconState.hashTreeRoot - 512 full validator 6.0243 ms/op 6.8300 ms/op 0.88
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 77.362 us/op 88.991 us/op 0.87
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 1.1918 ms/op 1.3021 ms/op 0.92
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 16.385 ms/op 17.995 ms/op 0.91
BeaconState.hashTreeRoot - 1 balances 64.138 us/op 67.072 us/op 0.96
BeaconState.hashTreeRoot - 32 balances 585.65 us/op 640.34 us/op 0.91
BeaconState.hashTreeRoot - 512 balances 6.0611 ms/op 6.2114 ms/op 0.98
BeaconState.hashTreeRoot - 250000 balances 100.04 ms/op 101.42 ms/op 0.99
aggregationBits - 2048 els - zipIndexesInBitList 43.710 us/op 28.342 us/op 1.54
regular array get 100000 times 67.551 us/op 60.530 us/op 1.12
wrappedArray get 100000 times 67.443 us/op 60.565 us/op 1.11
arrayWithProxy get 100000 times 29.323 ms/op 28.752 ms/op 1.02
ssz.Root.equals 571.00 ns/op 490.00 ns/op 1.17
byteArrayEquals 543.00 ns/op 483.00 ns/op 1.12
shuffle list - 16384 els 12.781 ms/op 11.354 ms/op 1.13
shuffle list - 250000 els 166.14 ms/op 168.52 ms/op 0.99
processSlot - 1 slots 11.771 us/op 13.849 us/op 0.85
processSlot - 32 slots 1.8049 ms/op 2.0069 ms/op 0.90
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 602.53 us/op 388.20 us/op 1.55
getCommitteeAssignments - req 1 vs - 250000 vc 5.3462 ms/op 5.3828 ms/op 0.99
getCommitteeAssignments - req 100 vs - 250000 vc 7.3242 ms/op 7.8650 ms/op 0.93
getCommitteeAssignments - req 1000 vs - 250000 vc 7.7942 ms/op 8.4669 ms/op 0.92
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 10.060 ns/op 9.0300 ns/op 1.11
state getBlockRootAtSlot - 250000 vs - 7PWei 1.2289 us/op 982.38 ns/op 1.25
computeProposers - vc 250000 18.016 ms/op 16.783 ms/op 1.07
computeEpochShuffling - vc 250000 171.69 ms/op 170.52 ms/op 1.01
getNextSyncCommittee - vc 250000 283.08 ms/op 280.33 ms/op 1.01

by benchmarkbot/action

@twoeths
Contributor Author

twoeths commented Oct 7, 2022

It has been 8 hours since I deployed this branch on feat1 (compared against feat2). Both are configured with after_block_delay_slot_fraction set to 3 so that attestations are submitted right at 1/3 of the slot, which is a very busy time and helps reproduce the missed attestation issue on Goerli.

  • feat1 (this branch)

Screen Shot 2022-10-07 at 21 39 03

  • feat2 (unstable)

Screen Shot 2022-10-07 at 21 39 38

  • feat1 (this branch) - submitPoolAttestations request time

Screen Shot 2022-10-07 at 21 40 31

  • feat2 (unstable) - submitPoolAttestations request time

Screen Shot 2022-10-07 at 21 41 03

all other metrics are the same

@twoeths twoeths marked this pull request as ready for review October 7, 2022 14:43
@twoeths twoeths requested a review from a team as a code owner October 7, 2022 14:43
@twoeths
Contributor Author

twoeths commented Oct 7, 2022

also posting the benchmark in this PR

network / gossip / fastMsgIdFn
    ✔ fastMsgIdFn sha256 / 200 bytes                                      232180.2 ops/s    4.307000 us/op        -      84581 runs  0.404 s
    ✔ fastMsgIdFn h32 xxhash / 200 bytes                                   1788909 ops/s    559.0000 ns/op        -    1215000 runs   1.11 s
    ✔ fastMsgIdFn h64 xxhash / 200 bytes                                   1455604 ops/s    687.0000 ns/op        -     908438 runs   1.01 s
    ✔ fastMsgIdFn sha256 / 1000 bytes                                     74906.37 ops/s    13.35000 us/op        -      57568 runs  0.808 s
    ✔ fastMsgIdFn h32 xxhash / 1000 bytes                                  1436782 ops/s    696.0000 ns/op        -     666281 runs  0.707 s
    ✔ fastMsgIdFn h64 xxhash / 1000 bytes                                  1142857 ops/s    875.0000 ns/op        -     731729 runs  0.914 s
    ✔ fastMsgIdFn sha256 / 10000 bytes                                    8846.896 ops/s    113.0340 us/op        -       1779 runs  0.316 s
    ✔ fastMsgIdFn h32 xxhash / 10000 bytes                                441891.3 ops/s    2.263000 us/op        -     109369 runs  0.303 s
    ✔ fastMsgIdFn h64 xxhash / 10000 bytes                                554323.7 ops/s    1.804000 us/op        -     275290 runs  0.610 s
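
For a quick read of these numbers, the following sketch derives the speedup of each xxhash variant over sha256 and shows how the ops/s column relates to the ns/op column. The timings are copied from the benchmark output above; `speedup` and `opsPerSecond` are hypothetical helper names introduced only for this illustration.

```typescript
/** One row per payload size; timings in ns/op, copied from the benchmark output. */
interface Timing {
  bytes: number;
  sha256: number;
  h32: number;
  h64: number;
}

const timings: Timing[] = [
  {bytes: 200, sha256: 4307, h32: 559, h64: 687},
  {bytes: 1000, sha256: 13350, h32: 696, h64: 875},
  {bytes: 10000, sha256: 113034, h32: 2263, h64: 1804},
];

/** How many times faster the candidate is than the baseline. */
function speedup(baselineNs: number, candidateNs: number): number {
  return baselineNs / candidateNs;
}

/** Convert a ns/op figure into ops/s. */
function opsPerSecond(nsPerOp: number): number {
  return 1e9 / nsPerOp;
}

timings.forEach((t) => {
  const h32 = speedup(t.sha256, t.h32).toFixed(1);
  const h64 = speedup(t.sha256, t.h64).toFixed(1);
  console.log(`${t.bytes} bytes: h32 is ${h32}x faster than sha256, h64 is ${h64}x faster`);
});
```

By this arithmetic the gap widens with payload size: roughly 6x to 8x at 200 bytes, 15x to 19x at 1000 bytes, and 50x to 63x at 10000 bytes, with h64 overtaking h32 only for the largest payload.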

Member

@wemeetagain wemeetagain left a comment

LGTM, great work

@wemeetagain wemeetagain merged commit 081afbb into unstable Oct 7, 2022
@wemeetagain wemeetagain deleted the tuyen/dapplion/gossip-fast-msg-id branch October 7, 2022 20:45
@philknows
Member

<3 thanks @tuyennhv !

@twoeths
Contributor Author

twoeths commented Oct 9, 2022

This PR only reduces fastMsgIdFn time; it does not help the I/O lag issue in #4600.
For some reason, the metrics on feat1 are better than on feat2 even when I deploy unstable to both of them.

Development

Successfully merging this pull request may close these issues.

Improve fastMsgIdFn
Missed attestation after v1.1.0
4 participants