feat: selectively use nodejs crypto for noise #5900

Merged
3 commits merged into unstable from cayman/node-noise-crypto on Sep 15, 2023


wemeetagain
Member

@wemeetagain wemeetagain commented Aug 22, 2023

Motivation

Description

  • Use Node.js sha256; benchmarks show that it is faster for all payload sizes except 64-byte payloads
  • For chacha20poly1305, use the AssemblyScript (`as`) implementation for smaller payloads and Node.js crypto for larger payloads
  • benchmark
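
The size-based selection described in the list above could be sketched roughly as follows. This is an illustration only, not the code from this PR: the `Aead` interface, the `selectAead` helper, and the 1200-byte cutoff are hypothetical, and the small-payload implementation is passed in as a parameter instead of being imported. The Node.js side uses the built-in `chacha20-poly1305` cipher from `node:crypto`.

```typescript
import {createCipheriv, createDecipheriv} from "node:crypto";

// Hypothetical interface shared by both ChaCha20-Poly1305 implementations.
interface Aead {
  seal(key: Uint8Array, nonce: Uint8Array, ad: Uint8Array, plaintext: Uint8Array): Uint8Array;
  open(key: Uint8Array, nonce: Uint8Array, ad: Uint8Array, sealed: Uint8Array): Uint8Array;
}

const TAG_LENGTH = 16; // Poly1305 tag size in bytes

// Implementation backed by Node.js crypto (32-byte key, 12-byte nonce).
const nodeAead: Aead = {
  seal(key, nonce, ad, plaintext) {
    const cipher = createCipheriv("chacha20-poly1305", key, nonce, {authTagLength: TAG_LENGTH});
    cipher.setAAD(ad, {plaintextLength: plaintext.byteLength});
    // Output layout: ciphertext followed by the 16-byte tag.
    return Buffer.concat([cipher.update(plaintext), cipher.final(), cipher.getAuthTag()]);
  },
  open(key, nonce, ad, sealed) {
    const body = sealed.subarray(0, sealed.byteLength - TAG_LENGTH);
    const tag = sealed.subarray(sealed.byteLength - TAG_LENGTH);
    const decipher = createDecipheriv("chacha20-poly1305", key, nonce, {authTagLength: TAG_LENGTH});
    decipher.setAAD(ad, {plaintextLength: body.byteLength});
    decipher.setAuthTag(tag);
    // final() throws if the tag does not verify.
    return Buffer.concat([decipher.update(body), decipher.final()]);
  },
};

// Hypothetical cutoff; the real value should come from benchmarks.
const NODE_CRYPTO_MIN_BYTES = 1200;

// Pick the implementation by payload size: `small` below the cutoff, `large` otherwise.
function selectAead(payloadLength: number, small: Aead, large: Aead = nodeAead): Aead {
  return payloadLength < NODE_CRYPTO_MIN_BYTES ? small : large;
}
```

A caller would pass the AssemblyScript-backed implementation as `small` and call `selectAead(plaintext.byteLength, small).seal(...)` per message, so the choice is made once per payload.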

@github-actions
Contributor

github-actions bot commented Aug 22, 2023

Performance Report

✔️ no performance regression detected

Full benchmark results
Benchmark suite | Current: badcf20 | Previous: 80c001b | Ratio (Current / Previous)
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 725.62 us/op 826.83 us/op 0.88
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 78.015 us/op 78.044 us/op 1.00
BLS verify - blst-native 1.3146 ms/op 1.2952 ms/op 1.01
BLS verifyMultipleSignatures 3 - blst-native 2.7905 ms/op 2.8022 ms/op 1.00
BLS verifyMultipleSignatures 8 - blst-native 6.0921 ms/op 6.1054 ms/op 1.00
BLS verifyMultipleSignatures 32 - blst-native 22.040 ms/op 21.977 ms/op 1.00
BLS verifyMultipleSignatures 64 - blst-native 43.440 ms/op 45.506 ms/op 0.95
BLS verifyMultipleSignatures 128 - blst-native 88.541 ms/op 86.207 ms/op 1.03
BLS deserializing 10000 signatures 904.60 ms/op 878.03 ms/op 1.03
BLS deserializing 100000 signatures 9.1873 s/op 8.8224 s/op 1.04
BLS verifyMultipleSignatures - same message - 3 - blst-native 1.3670 ms/op 1.3025 ms/op 1.05
BLS verifyMultipleSignatures - same message - 8 - blst-native 1.5395 ms/op 1.4612 ms/op 1.05
BLS verifyMultipleSignatures - same message - 32 - blst-native 2.3962 ms/op 2.2724 ms/op 1.05
BLS verifyMultipleSignatures - same message - 64 - blst-native 3.5040 ms/op 3.8077 ms/op 0.92
BLS verifyMultipleSignatures - same message - 128 - blst-native 5.7675 ms/op 6.4120 ms/op 0.90
BLS aggregatePubkeys 32 - blst-native 26.537 us/op 26.233 us/op 1.01
BLS aggregatePubkeys 128 - blst-native 107.64 us/op 98.584 us/op 1.09
getAttestationsForBlock 50.855 ms/op 40.019 ms/op 1.27
isKnown best case - 1 super set check 329.00 ns/op 292.00 ns/op 1.13
isKnown normal case - 2 super set checks 329.00 ns/op 267.00 ns/op 1.23
isKnown worse case - 16 super set checks 305.00 ns/op 262.00 ns/op 1.16
CheckpointStateCache - add get delete 6.3410 us/op 4.9620 us/op 1.28
validate api signedAggregateAndProof - struct 2.9336 ms/op 2.7490 ms/op 1.07
validate gossip signedAggregateAndProof - struct 2.8836 ms/op 2.6733 ms/op 1.08
validate gossip attestation - vc 640000 1.4129 ms/op 1.3437 ms/op 1.05
batch validate gossip attestation - vc 640000 - chunk 32 174.67 us/op 159.71 us/op 1.09
batch validate gossip attestation - vc 640000 - chunk 64 154.01 us/op 147.17 us/op 1.05
batch validate gossip attestation - vc 640000 - chunk 128 145.82 us/op 140.10 us/op 1.04
batch validate gossip attestation - vc 640000 - chunk 256 137.73 us/op 135.18 us/op 1.02
pickEth1Vote - no votes 1.3016 ms/op 1.2692 ms/op 1.03
pickEth1Vote - max votes 11.029 ms/op 8.6211 ms/op 1.28
pickEth1Vote - Eth1Data hashTreeRoot value x2048 21.451 ms/op 17.008 ms/op 1.26
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 28.372 ms/op 23.936 ms/op 1.19
pickEth1Vote - Eth1Data fastSerialize value x2048 683.90 us/op 580.23 us/op 1.18
pickEth1Vote - Eth1Data fastSerialize tree x2048 5.4295 ms/op 4.9360 ms/op 1.10
bytes32 toHexString 641.00 ns/op 478.00 ns/op 1.34
bytes32 Buffer.toString(hex) 318.00 ns/op 310.00 ns/op 1.03
bytes32 Buffer.toString(hex) from Uint8Array 498.00 ns/op 452.00 ns/op 1.10
bytes32 Buffer.toString(hex) + 0x 332.00 ns/op 282.00 ns/op 1.18
Object access 1 prop 0.18900 ns/op 0.16200 ns/op 1.17
Map access 1 prop 0.15900 ns/op 0.14900 ns/op 1.07
Object get x1000 8.5810 ns/op 7.9340 ns/op 1.08
Map get x1000 0.68800 ns/op 0.63600 ns/op 1.08
Object set x1000 84.203 ns/op 51.214 ns/op 1.64
Map set x1000 52.800 ns/op 38.616 ns/op 1.37
Return object 10000 times 0.28710 ns/op 0.25330 ns/op 1.13
Throw Error 10000 times 4.9189 us/op 3.8457 us/op 1.28
fastMsgIdFn sha256 / 200 bytes 3.5920 us/op 3.3340 us/op 1.08
fastMsgIdFn h32 xxhash / 200 bytes 318.00 ns/op 281.00 ns/op 1.13
fastMsgIdFn h64 xxhash / 200 bytes 380.00 ns/op 347.00 ns/op 1.10
fastMsgIdFn sha256 / 1000 bytes 12.193 us/op 11.633 us/op 1.05
fastMsgIdFn h32 xxhash / 1000 bytes 446.00 ns/op 399.00 ns/op 1.12
fastMsgIdFn h64 xxhash / 1000 bytes 452.00 ns/op 429.00 ns/op 1.05
fastMsgIdFn sha256 / 10000 bytes 108.23 us/op 106.25 us/op 1.02
fastMsgIdFn h32 xxhash / 10000 bytes 2.0440 us/op 1.9440 us/op 1.05
fastMsgIdFn h64 xxhash / 10000 bytes 1.4250 us/op 1.3370 us/op 1.07
send data - 1000 256B messages 21.156 ms/op
send data - 1000 512B messages 28.023 ms/op
send data - 1000 1024B messages 42.868 ms/op
send data - 1000 1200B messages 29.054 ms/op
send data - 1000 2048B messages 40.470 ms/op
send data - 1000 4096B messages 35.549 ms/op
send data - 1000 16384B messages 70.369 ms/op
send data - 1000 65536B messages 242.30 ms/op
enrSubnets - fastDeserialize 64 bits 1.2660 us/op 1.2030 us/op 1.05
enrSubnets - ssz BitVector 64 bits 443.00 ns/op 428.00 ns/op 1.04
enrSubnets - fastDeserialize 4 bits 186.00 ns/op 163.00 ns/op 1.14
enrSubnets - ssz BitVector 4 bits 454.00 ns/op 414.00 ns/op 1.10
prioritizePeers score -10:0 att 32-0.1 sync 2-0 111.00 us/op 104.90 us/op 1.06
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 126.65 us/op 129.47 us/op 0.98
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 164.27 us/op 163.19 us/op 1.01
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 293.91 us/op 284.65 us/op 1.03
prioritizePeers score 0:0 att 64-1 sync 4-1 351.45 us/op 344.36 us/op 1.02
array of 16000 items push then shift 1.6653 us/op 1.6224 us/op 1.03
LinkedList of 16000 items push then shift 9.1600 ns/op 9.0520 ns/op 1.01
array of 16000 items push then pop 80.368 ns/op 89.989 ns/op 0.89
LinkedList of 16000 items push then pop 8.9120 ns/op 8.8300 ns/op 1.01
array of 24000 items push then shift 2.4409 us/op 2.4184 us/op 1.01
LinkedList of 24000 items push then shift 9.0470 ns/op 9.1240 ns/op 0.99
array of 24000 items push then pop 84.824 ns/op 108.97 ns/op 0.78
LinkedList of 24000 items push then pop 8.7070 ns/op 8.9260 ns/op 0.98
intersect bitArray bitLen 8 6.8520 ns/op 6.7710 ns/op 1.01
intersect array and set length 8 54.065 ns/op 55.102 ns/op 0.98
intersect bitArray bitLen 128 31.943 ns/op 31.641 ns/op 1.01
intersect array and set length 128 747.06 ns/op 779.62 ns/op 0.96
bitArray.getTrueBitIndexes() bitLen 128 1.3780 us/op 1.4470 us/op 0.95
bitArray.getTrueBitIndexes() bitLen 248 2.3370 us/op 2.5400 us/op 0.92
bitArray.getTrueBitIndexes() bitLen 512 4.4890 us/op 5.1650 us/op 0.87
Buffer.concat 32 items 904.00 ns/op 960.00 ns/op 0.94
Uint8Array.set 32 items 1.5940 us/op 1.7980 us/op 0.89
Set add up to 64 items then delete first 4.2864 us/op 4.6916 us/op 0.91
OrderedSet add up to 64 items then delete first 5.3624 us/op 5.3950 us/op 0.99
Set add up to 64 items then delete last 4.5863 us/op 5.0045 us/op 0.92
OrderedSet add up to 64 items then delete last 5.6820 us/op 6.3464 us/op 0.90
Set add up to 64 items then delete middle 4.5504 us/op 4.9064 us/op 0.93
OrderedSet add up to 64 items then delete middle 6.9071 us/op 7.5993 us/op 0.91
Set add up to 128 items then delete first 9.4423 us/op 10.119 us/op 0.93
OrderedSet add up to 128 items then delete first 12.349 us/op 13.797 us/op 0.90
Set add up to 128 items then delete last 9.1800 us/op 9.8234 us/op 0.93
OrderedSet add up to 128 items then delete last 11.588 us/op 12.704 us/op 0.91
Set add up to 128 items then delete middle 8.9872 us/op 9.7341 us/op 0.92
OrderedSet add up to 128 items then delete middle 16.790 us/op 18.226 us/op 0.92
Set add up to 256 items then delete first 18.790 us/op 20.469 us/op 0.92
OrderedSet add up to 256 items then delete first 24.648 us/op 25.466 us/op 0.97
Set add up to 256 items then delete last 17.944 us/op 19.924 us/op 0.90
OrderedSet add up to 256 items then delete last 25.116 us/op 24.372 us/op 1.03
Set add up to 256 items then delete middle 19.504 us/op 19.083 us/op 1.02
OrderedSet add up to 256 items then delete middle 51.580 us/op 46.693 us/op 1.10
transfer serialized Status (84 B) 1.8720 us/op 1.7800 us/op 1.05
copy serialized Status (84 B) 1.5290 us/op 1.5860 us/op 0.96
transfer serialized SignedVoluntaryExit (112 B) 1.9150 us/op 1.8850 us/op 1.02
copy serialized SignedVoluntaryExit (112 B) 1.5730 us/op 1.6470 us/op 0.96
transfer serialized ProposerSlashing (416 B) 2.1300 us/op 2.1640 us/op 0.98
copy serialized ProposerSlashing (416 B) 2.0710 us/op 2.2650 us/op 0.91
transfer serialized Attestation (485 B) 2.1610 us/op 2.7200 us/op 0.79
copy serialized Attestation (485 B) 2.1390 us/op 2.9100 us/op 0.74
transfer serialized AttesterSlashing (33232 B) 2.5620 us/op 2.7460 us/op 0.93
copy serialized AttesterSlashing (33232 B) 7.5120 us/op 7.8540 us/op 0.96
transfer serialized Small SignedBeaconBlock (128000 B) 3.0100 us/op 2.8160 us/op 1.07
copy serialized Small SignedBeaconBlock (128000 B) 17.453 us/op 14.362 us/op 1.22
transfer serialized Avg SignedBeaconBlock (200000 B) 3.6530 us/op 3.1880 us/op 1.15
copy serialized Avg SignedBeaconBlock (200000 B) 19.837 us/op 23.747 us/op 0.84
transfer serialized BlobsSidecar (524380 B) 4.2650 us/op 3.2520 us/op 1.31
copy serialized BlobsSidecar (524380 B) 181.97 us/op 92.329 us/op 1.97
transfer serialized Big SignedBeaconBlock (1000000 B) 4.6280 us/op 3.5780 us/op 1.29
copy serialized Big SignedBeaconBlock (1000000 B) 240.23 us/op 360.78 us/op 0.67
pass gossip attestations to forkchoice per slot 3.9021 ms/op 3.8700 ms/op 1.01
forkChoice updateHead vc 100000 bc 64 eq 0 833.79 us/op 681.37 us/op 1.22
forkChoice updateHead vc 600000 bc 64 eq 0 6.9901 ms/op 4.5565 ms/op 1.53
forkChoice updateHead vc 1000000 bc 64 eq 0 8.0697 ms/op 8.2263 ms/op 0.98
forkChoice updateHead vc 600000 bc 320 eq 0 4.6849 ms/op 4.4566 ms/op 1.05
forkChoice updateHead vc 600000 bc 1200 eq 0 4.3999 ms/op 4.5478 ms/op 0.97
forkChoice updateHead vc 600000 bc 7200 eq 0 6.3719 ms/op 5.8192 ms/op 1.09
forkChoice updateHead vc 600000 bc 64 eq 1000 12.452 ms/op 11.974 ms/op 1.04
forkChoice updateHead vc 600000 bc 64 eq 10000 13.780 ms/op 12.927 ms/op 1.07
forkChoice updateHead vc 600000 bc 64 eq 300000 43.988 ms/op 16.684 ms/op 2.64
computeDeltas 500000 validators 300 proto nodes 7.0342 ms/op 6.4984 ms/op 1.08
computeDeltas 500000 validators 1200 proto nodes 6.3659 ms/op 6.2674 ms/op 1.02
computeDeltas 500000 validators 7200 proto nodes 6.2602 ms/op 6.3189 ms/op 0.99
computeDeltas 750000 validators 300 proto nodes 9.4101 ms/op 9.7302 ms/op 0.97
computeDeltas 750000 validators 1200 proto nodes 9.2896 ms/op 9.6109 ms/op 0.97
computeDeltas 750000 validators 7200 proto nodes 9.2377 ms/op 9.6803 ms/op 0.95
computeDeltas 1400000 validators 300 proto nodes 17.583 ms/op 18.168 ms/op 0.97
computeDeltas 1400000 validators 1200 proto nodes 17.364 ms/op 18.231 ms/op 0.95
computeDeltas 1400000 validators 7200 proto nodes 17.418 ms/op 17.955 ms/op 0.97
computeDeltas 2100000 validators 300 proto nodes 25.920 ms/op 27.389 ms/op 0.95
computeDeltas 2100000 validators 1200 proto nodes 26.232 ms/op 26.923 ms/op 0.97
computeDeltas 2100000 validators 7200 proto nodes 26.410 ms/op 26.805 ms/op 0.99
computeProposerBoostScoreFromBalances 500000 validators 3.2789 ms/op 3.2360 ms/op 1.01
computeProposerBoostScoreFromBalances 750000 validators 3.3318 ms/op 3.2414 ms/op 1.03
computeProposerBoostScoreFromBalances 1400000 validators 3.2941 ms/op 3.2422 ms/op 1.02
computeProposerBoostScoreFromBalances 2100000 validators 3.3068 ms/op 3.2674 ms/op 1.01
altair processAttestation - 250000 vs - 7PWei normalcase 2.4934 ms/op 3.0804 ms/op 0.81
altair processAttestation - 250000 vs - 7PWei worstcase 3.3056 ms/op 3.1786 ms/op 1.04
altair processAttestation - setStatus - 1/6 committees join 190.61 us/op 169.63 us/op 1.12
altair processAttestation - setStatus - 1/3 committees join 379.45 us/op 339.33 us/op 1.12
altair processAttestation - setStatus - 1/2 committees join 517.11 us/op 447.46 us/op 1.16
altair processAttestation - setStatus - 2/3 committees join 664.30 us/op 569.33 us/op 1.17
altair processAttestation - setStatus - 4/5 committees join 857.59 us/op 772.04 us/op 1.11
altair processAttestation - setStatus - 100% committees join 1.0122 ms/op 888.12 us/op 1.14
altair processBlock - 250000 vs - 7PWei normalcase 7.3190 ms/op 8.9171 ms/op 0.82
altair processBlock - 250000 vs - 7PWei normalcase hashState 31.084 ms/op 32.873 ms/op 0.95
altair processBlock - 250000 vs - 7PWei worstcase 38.495 ms/op 40.568 ms/op 0.95
altair processBlock - 250000 vs - 7PWei worstcase hashState 97.851 ms/op 88.037 ms/op 1.11
phase0 processBlock - 250000 vs - 7PWei normalcase 2.6414 ms/op 2.9373 ms/op 0.90
phase0 processBlock - 250000 vs - 7PWei worstcase 30.377 ms/op 31.095 ms/op 0.98
altair processEth1Data - 250000 vs - 7PWei normalcase 498.51 us/op 592.04 us/op 0.84
getExpectedWithdrawals 250000 eb:1,eth1:1,we:0,wn:0,smpl:15 13.458 us/op 11.834 us/op 1.14
getExpectedWithdrawals 250000 eb:0.95,eth1:0.1,we:0.05,wn:0,smpl:219 79.955 us/op 41.951 us/op 1.91
getExpectedWithdrawals 250000 eb:0.95,eth1:0.3,we:0.05,wn:0,smpl:42 26.720 us/op 27.519 us/op 0.97
getExpectedWithdrawals 250000 eb:0.95,eth1:0.7,we:0.05,wn:0,smpl:18 14.392 us/op 16.167 us/op 0.89
getExpectedWithdrawals 250000 eb:0.1,eth1:0.1,we:0,wn:0,smpl:1020 164.46 us/op 158.60 us/op 1.04
getExpectedWithdrawals 250000 eb:0.03,eth1:0.03,we:0,wn:0,smpl:11777 1.1185 ms/op 1.6127 ms/op 0.69
getExpectedWithdrawals 250000 eb:0.01,eth1:0.01,we:0,wn:0,smpl:16384 1.8845 ms/op 1.7815 ms/op 1.06
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,smpl:16384 1.7510 ms/op 1.8483 ms/op 0.95
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,nocache,smpl:16384 4.2356 ms/op 3.8874 ms/op 1.09
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,smpl:16384 2.6263 ms/op 2.4962 ms/op 1.05
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,nocache,smpl:16384 4.7233 ms/op 6.1052 ms/op 0.77
Tree 40 250000 create 343.53 ms/op 410.38 ms/op 0.84
Tree 40 250000 get(125000) 212.00 ns/op 215.83 ns/op 0.98
Tree 40 250000 set(125000) 848.57 ns/op 1.0764 us/op 0.79
Tree 40 250000 toArray() 20.146 ms/op 23.985 ms/op 0.84
Tree 40 250000 iterate all - toArray() + loop 21.350 ms/op 22.010 ms/op 0.97
Tree 40 250000 iterate all - get(i) 73.343 ms/op 73.569 ms/op 1.00
MutableVector 250000 create 12.328 ms/op 14.899 ms/op 0.83
MutableVector 250000 get(125000) 6.5780 ns/op 7.2460 ns/op 0.91
MutableVector 250000 set(125000) 270.82 ns/op 311.21 ns/op 0.87
MutableVector 250000 toArray() 3.6171 ms/op 4.0712 ms/op 0.89
MutableVector 250000 iterate all - toArray() + loop 3.7840 ms/op 4.1555 ms/op 0.91
MutableVector 250000 iterate all - get(i) 1.5699 ms/op 1.6079 ms/op 0.98
Array 250000 create 3.2543 ms/op 3.6012 ms/op 0.90
Array 250000 clone - spread 1.1172 ms/op 1.1439 ms/op 0.98
Array 250000 get(125000) 0.51900 ns/op 0.61100 ns/op 0.85
Array 250000 set(125000) 0.59200 ns/op 0.71500 ns/op 0.83
Array 250000 iterate all - loop 88.221 us/op 88.407 us/op 1.00
effectiveBalanceIncrements clone Uint8Array 300000 31.812 us/op 48.265 us/op 0.66
effectiveBalanceIncrements clone MutableVector 300000 273.00 ns/op 294.00 ns/op 0.93
effectiveBalanceIncrements rw all Uint8Array 300000 184.65 us/op 182.89 us/op 1.01
effectiveBalanceIncrements rw all MutableVector 300000 77.115 ms/op 92.827 ms/op 0.83
phase0 afterProcessEpoch - 250000 vs - 7PWei 114.77 ms/op 125.31 ms/op 0.92
phase0 beforeProcessEpoch - 250000 vs - 7PWei 42.434 ms/op 44.660 ms/op 0.95
altair processEpoch - mainnet_e81889 444.47 ms/op 483.13 ms/op 0.92
mainnet_e81889 - altair beforeProcessEpoch 66.241 ms/op 71.562 ms/op 0.93
mainnet_e81889 - altair processJustificationAndFinalization 17.213 us/op 20.719 us/op 0.83
mainnet_e81889 - altair processInactivityUpdates 6.5714 ms/op 9.7143 ms/op 0.68
mainnet_e81889 - altair processRewardsAndPenalties 68.001 ms/op 69.842 ms/op 0.97
mainnet_e81889 - altair processRegistryUpdates 3.0820 us/op 3.5720 us/op 0.86
mainnet_e81889 - altair processSlashings 629.00 ns/op 592.00 ns/op 1.06
mainnet_e81889 - altair processEth1DataReset 753.00 ns/op 1.8570 us/op 0.41
mainnet_e81889 - altair processEffectiveBalanceUpdates 1.3065 ms/op 1.2991 ms/op 1.01
mainnet_e81889 - altair processSlashingsReset 3.7150 us/op 5.6540 us/op 0.66
mainnet_e81889 - altair processRandaoMixesReset 7.5790 us/op 8.7170 us/op 0.87
mainnet_e81889 - altair processHistoricalRootsUpdate 731.00 ns/op 1.6220 us/op 0.45
mainnet_e81889 - altair processParticipationFlagUpdates 2.9680 us/op 3.5140 us/op 0.84
mainnet_e81889 - altair processSyncCommitteeUpdates 706.00 ns/op 461.00 ns/op 1.53
mainnet_e81889 - altair afterProcessEpoch 128.08 ms/op 130.79 ms/op 0.98
capella processEpoch - mainnet_e217614 1.5641 s/op 1.4751 s/op 1.06
mainnet_e217614 - capella beforeProcessEpoch 245.42 ms/op 243.60 ms/op 1.01
mainnet_e217614 - capella processJustificationAndFinalization 17.031 us/op 13.547 us/op 1.26
mainnet_e217614 - capella processInactivityUpdates 21.441 ms/op 19.425 ms/op 1.10
mainnet_e217614 - capella processRewardsAndPenalties 279.02 ms/op 267.01 ms/op 1.04
mainnet_e217614 - capella processRegistryUpdates 29.167 us/op 25.100 us/op 1.16
mainnet_e217614 - capella processSlashings 775.00 ns/op 1.0010 us/op 0.77
mainnet_e217614 - capella processEth1DataReset 749.00 ns/op 997.00 ns/op 0.75
mainnet_e217614 - capella processEffectiveBalanceUpdates 4.3189 ms/op 4.9883 ms/op 0.87
mainnet_e217614 - capella processSlashingsReset 4.5520 us/op 4.4970 us/op 1.01
mainnet_e217614 - capella processRandaoMixesReset 4.9720 us/op 7.0010 us/op 0.71
mainnet_e217614 - capella processHistoricalRootsUpdate 579.00 ns/op 1.0020 us/op 0.58
mainnet_e217614 - capella processParticipationFlagUpdates 1.7830 us/op 2.6010 us/op 0.69
mainnet_e217614 - capella afterProcessEpoch 302.23 ms/op 308.08 ms/op 0.98
phase0 processEpoch - mainnet_e58758 418.70 ms/op 472.98 ms/op 0.89
mainnet_e58758 - phase0 beforeProcessEpoch 127.53 ms/op 137.97 ms/op 0.92
mainnet_e58758 - phase0 processJustificationAndFinalization 16.812 us/op 14.728 us/op 1.14
mainnet_e58758 - phase0 processRewardsAndPenalties 49.390 ms/op 64.073 ms/op 0.77
mainnet_e58758 - phase0 processRegistryUpdates 10.123 us/op 8.8010 us/op 1.15
mainnet_e58758 - phase0 processSlashings 484.00 ns/op 442.00 ns/op 1.10
mainnet_e58758 - phase0 processEth1DataReset 528.00 ns/op 395.00 ns/op 1.34
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 1.0975 ms/op 1.3502 ms/op 0.81
mainnet_e58758 - phase0 processSlashingsReset 3.4550 us/op 2.1510 us/op 1.61
mainnet_e58758 - phase0 processRandaoMixesReset 6.4210 us/op 3.8620 us/op 1.66
mainnet_e58758 - phase0 processHistoricalRootsUpdate 589.00 ns/op 392.00 ns/op 1.50
mainnet_e58758 - phase0 processParticipationRecordUpdates 4.8660 us/op 3.9590 us/op 1.23
mainnet_e58758 - phase0 afterProcessEpoch 106.49 ms/op 98.021 ms/op 1.09
phase0 processEffectiveBalanceUpdates - 250000 normalcase 1.3352 ms/op 1.2183 ms/op 1.10
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 1.5010 ms/op 1.4434 ms/op 1.04
altair processInactivityUpdates - 250000 normalcase 21.560 ms/op 18.467 ms/op 1.17
altair processInactivityUpdates - 250000 worstcase 21.015 ms/op 17.805 ms/op 1.18
phase0 processRegistryUpdates - 250000 normalcase 12.243 us/op 8.0800 us/op 1.52
phase0 processRegistryUpdates - 250000 badcase_full_deposits 381.02 us/op 354.29 us/op 1.08
phase0 processRegistryUpdates - 250000 worstcase 0.5 132.96 ms/op 137.21 ms/op 0.97
altair processRewardsAndPenalties - 250000 normalcase 52.360 ms/op 64.043 ms/op 0.82
altair processRewardsAndPenalties - 250000 worstcase 59.516 ms/op 63.043 ms/op 0.94
phase0 getAttestationDeltas - 250000 normalcase 8.4965 ms/op 7.7017 ms/op 1.10
phase0 getAttestationDeltas - 250000 worstcase 8.6532 ms/op 7.9561 ms/op 1.09
phase0 processSlashings - 250000 worstcase 2.4830 ms/op 2.4566 ms/op 1.01
altair processSyncCommitteeUpdates - 250000 157.65 ms/op 149.66 ms/op 1.05
BeaconState.hashTreeRoot - No change 385.00 ns/op 303.00 ns/op 1.27
BeaconState.hashTreeRoot - 1 full validator 125.49 us/op 117.81 us/op 1.07
BeaconState.hashTreeRoot - 32 full validator 1.3137 ms/op 1.3607 ms/op 0.97
BeaconState.hashTreeRoot - 512 full validator 12.813 ms/op 17.552 ms/op 0.73
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 152.68 us/op 141.30 us/op 1.08
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 2.0821 ms/op 2.1121 ms/op 0.99
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 29.749 ms/op 28.832 ms/op 1.03
BeaconState.hashTreeRoot - 1 balances 156.44 us/op 174.26 us/op 0.90
BeaconState.hashTreeRoot - 32 balances 1.1015 ms/op 1.1055 ms/op 1.00
BeaconState.hashTreeRoot - 512 balances 9.4922 ms/op 10.065 ms/op 0.94
BeaconState.hashTreeRoot - 250000 balances 186.85 ms/op 197.07 ms/op 0.95
aggregationBits - 2048 els - zipIndexesInBitList 14.917 us/op 15.668 us/op 0.95
regular array get 100000 times 33.170 us/op 35.212 us/op 0.94
wrappedArray get 100000 times 33.143 us/op 33.061 us/op 1.00
arrayWithProxy get 100000 times 14.710 ms/op 14.357 ms/op 1.02
ssz.Root.equals 208.00 ns/op 239.00 ns/op 0.87
byteArrayEquals 208.00 ns/op 230.00 ns/op 0.90
shuffle list - 16384 els 6.9850 ms/op 7.1618 ms/op 0.98
shuffle list - 250000 els 102.32 ms/op 104.75 ms/op 0.98
processSlot - 1 slots 15.027 us/op 17.748 us/op 0.85
processSlot - 32 slots 2.7303 ms/op 3.5846 ms/op 0.76
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 53.946 ms/op 53.647 ms/op 1.01
getCommitteeAssignments - req 1 vs - 250000 vc 2.4958 ms/op 2.5316 ms/op 0.99
getCommitteeAssignments - req 100 vs - 250000 vc 3.6846 ms/op 3.7378 ms/op 0.99
getCommitteeAssignments - req 1000 vs - 250000 vc 4.0267 ms/op 4.0918 ms/op 0.98
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 4.4700 ns/op 4.6100 ns/op 0.97
state getBlockRootAtSlot - 250000 vs - 7PWei 493.42 ns/op 621.36 ns/op 0.79
computeProposers - vc 250000 8.8755 ms/op 9.4198 ms/op 0.94
computeEpochShuffling - vc 250000 102.32 ms/op 106.77 ms/op 0.96
getNextSyncCommittee - vc 250000 145.36 ms/op 158.60 ms/op 0.92
computeSigningRoot for AttestationData 20.149 us/op 23.395 us/op 0.86
hash AttestationData serialized data then Buffer.toString(base64) 2.3439 us/op 2.3391 us/op 1.00
toHexString serialized data 1.0638 us/op 1.0950 us/op 0.97
Buffer.toString(base64) 227.74 ns/op 223.14 ns/op 1.02

by benchmarkbot/action

@dapplion
Contributor

You should measure both CPU time and allocations; @tuyennhv's version recycles the input buffers to reduce allocations
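
One rough way to look at both dimensions at once is sketched below. It is a generic illustration, not @tuyennhv's code: `measure`, `copyAllocating`, and `copyRecycling` are hypothetical helpers. The heap delta from `process.memoryUsage().heapUsed` is only a coarse signal, since a garbage collection in the middle of a run can shrink it or make it negative.

```typescript
interface Measurement {
  cpuMicros: number; // user + system CPU time spent, in microseconds
  heapDeltaBytes: number; // change in heapUsed across the run (can be negative after a GC)
}

// Run `fn` repeatedly and report CPU time and heap growth for the whole loop.
function measure(fn: () => unknown, iterations: number): Measurement {
  const heapBefore = process.memoryUsage().heapUsed;
  const cpuBefore = process.cpuUsage();
  for (let i = 0; i < iterations; i++) {
    fn();
  }
  const cpu = process.cpuUsage(cpuBefore); // delta relative to cpuBefore
  const heapAfter = process.memoryUsage().heapUsed;
  return {cpuMicros: cpu.user + cpu.system, heapDeltaBytes: heapAfter - heapBefore};
}

const input = new Uint8Array(1024).fill(1);
const reused = new Uint8Array(1024);

// Allocates a fresh 1024-byte buffer on every call.
function copyAllocating(): Uint8Array {
  return input.slice();
}

// Writes into one preallocated buffer, so no new buffer is allocated per call.
function copyRecycling(): Uint8Array {
  reused.set(input);
  return reused;
}

console.log("allocating", measure(copyAllocating, 100_000));
console.log("recycling", measure(copyRecycling, 100_000));
```

For per-collection numbers, such as bytes reclaimed per scavenge, a GC trace or a heap profiler would be more reliable than this delta.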

@wemeetagain
Member Author

Before:

  network / noise / sendData
    ✔ send data - 1000 1024B messages                                     28.71387 ops/s    34.82637 ms/op        -         17 runs   1.84 s
    ✔ send data - 1000 4096B messages                                     10.38442 ops/s    96.29807 ms/op        -         12 runs   2.12 s
    ✔ send data - 1000 16384B messages                                    2.932366 ops/s    341.0215 ms/op        -         12 runs   5.12 s
    ✔ send data - 1000 65536B messages                                   0.9123945 ops/s    1.096017  s/op        -         12 runs   14.5 s

After:

  network / noise / sendData
    ✔ send data - 1000 1024B messages                                     28.77133 ops/s    34.75682 ms/op        -         21 runs   2.07 s
    ✔ send data - 1000 4096B messages                                     21.41420 ops/s    46.69798 ms/op        -         17 runs   1.96 s
    ✔ send data - 1000 16384B messages                                    11.46818 ops/s    87.19782 ms/op        -         17 runs   2.49 s
    ✔ send data - 1000 65536B messages                                    4.508977 ops/s    221.7798 ms/op        -         14 runs   4.16 s
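
To make the comparison easier to read, the snippet below computes the before/after ratio from the ms/op figures in the two runs above. The `results` array and the `speedup` helper are just illustrative names. From these figures the ratios come out to roughly 1.00x at 1024B, 2.06x at 4096B, 3.91x at 16384B, and 4.94x at 65536B.

```typescript
// ms/op values copied from the "Before" and "After" runs above.
const results = [
  {payloadBytes: 1024, beforeMs: 34.82637, afterMs: 34.75682},
  {payloadBytes: 4096, beforeMs: 96.29807, afterMs: 46.69798},
  {payloadBytes: 16384, beforeMs: 341.0215, afterMs: 87.19782},
  {payloadBytes: 65536, beforeMs: 1096.017, afterMs: 221.7798},
];

// Speedup > 1 means the "After" run is faster.
function speedup(beforeMs: number, afterMs: number): number {
  return beforeMs / afterMs;
}

for (const r of results) {
  console.log(`${r.payloadBytes}B: ${speedup(r.beforeMs, r.afterMs).toFixed(2)}x`);
}
```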

@wemeetagain wemeetagain marked this pull request as ready for review August 23, 2023 19:27
@wemeetagain wemeetagain requested a review from a team as a code owner August 23, 2023 19:27
@wemeetagain
Member Author

This roughly doubled the bytes that are GC'd, e.g. from ~800B per scavenge to ~1600B per scavenge

@twoeths
Contributor

twoeths commented Aug 24, 2023

after 3h of testing, there's a gap in Memory Usage and other panels in VM dashboard

Screenshot 2023-08-24 at 11 15 49

our node seems to have been suspended at that time

Screenshot 2023-08-24 at 11 16 31

@wemeetagain
Member Author

Deserves more testing. Since that blip, everything looks good.

@twoeths
Contributor

twoeths commented Sep 8, 2023

I deployed this branch to the beta-mainnet node for 4 days
network_thread_nodejs_crypto.cpuprofile.zip

the chacha20Poly1305Decrypt function takes 13.96% of CPU time, while in the v1.11.0 release (#5912 (comment)) it takes 7.47%

Screenshot 2023-09-08 at 14 43 43

GC time is also a bit higher, which is expected (5.58% vs 4.49%)

I also compared against another test mainnet node, which has more gossipsub bandwidth (11.5k rpc/s vs 8k rpc/s on the beta-mainnet node); there, chacha20Poly1305Decrypt takes only 8.86%

@twoeths
Contributor

twoeths commented Sep 8, 2023

Before:

  network / noise / sendData
    ✔ send data - 1000 1024B messages                                     28.71387 ops/s    34.82637 ms/op        -         17 runs   1.84 s
    ✔ send data - 1000 4096B messages                                     10.38442 ops/s    96.29807 ms/op        -         12 runs   2.12 s
    ✔ send data - 1000 16384B messages                                    2.932366 ops/s    341.0215 ms/op        -         12 runs   5.12 s
    ✔ send data - 1000 65536B messages                                   0.9123945 ops/s    1.096017  s/op        -         12 runs   14.5 s

After:

  network / noise / sendData
    ✔ send data - 1000 1024B messages                                     28.77133 ops/s    34.75682 ms/op        -         21 runs   2.07 s
    ✔ send data - 1000 4096B messages                                     21.41420 ops/s    46.69798 ms/op        -         17 runs   1.96 s
    ✔ send data - 1000 16384B messages                                    11.46818 ops/s    87.19782 ms/op        -         17 runs   2.49 s
    ✔ send data - 1000 65536B messages                                    4.508977 ops/s    221.7798 ms/op        -         14 runs   4.16 s

Node.js crypto tends to perform better with longer messages, while Lodestar mostly has messages of <512 bytes. I think we need to add a benchmark for that case @wemeetagain
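
A micro-benchmark covering the small-message range could look roughly like the sketch below. It only times the Node.js `chacha20-poly1305` cipher; the payload sizes, the run count, and the helper names (`encryptNode`, `benchNsPerOp`, `runBenchmark`) are assumptions for illustration, and a real comparison would run the AssemblyScript implementation through the same loop.

```typescript
import {createCipheriv, randomBytes} from "node:crypto";
import {performance} from "node:perf_hooks";

// Encrypt with the Node.js built-in cipher; returns ciphertext followed by the 16-byte tag.
function encryptNode(key: Uint8Array, nonce: Uint8Array, plaintext: Uint8Array): Uint8Array {
  const cipher = createCipheriv("chacha20-poly1305", key, nonce, {authTagLength: 16});
  return Buffer.concat([cipher.update(plaintext), cipher.final(), cipher.getAuthTag()]);
}

// Average wall-clock time per call, in nanoseconds.
function benchNsPerOp(fn: () => unknown, runs: number): number {
  const start = performance.now();
  for (let i = 0; i < runs; i++) {
    fn();
  }
  const elapsedMs = performance.now() - start;
  return (elapsedMs * 1e6) / runs;
}

// Hypothetical size selection, weighted toward the <512 byte range.
const PAYLOAD_SIZES = [64, 128, 256, 512, 1024, 4096];

function runBenchmark(runs = 1000): {size: number; nsPerOp: number}[] {
  const key = randomBytes(32);
  // A fixed nonce is acceptable only because this is a throwaway benchmark key.
  const nonce = randomBytes(12);
  return PAYLOAD_SIZES.map((size) => {
    const payload = randomBytes(size);
    return {size, nsPerOp: benchNsPerOp(() => encryptNode(key, nonce, payload), runs)};
  });
}

for (const {size, nsPerOp} of runBenchmark()) {
  console.log(`${size}B: ${nsPerOp.toFixed(0)} ns/op`);
}
```

Wall-clock timing like this is noisy for sub-microsecond operations, so a benchmark harness with warm-up and multiple runs would give more trustworthy numbers.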

@wemeetagain wemeetagain changed the title from "feat: use nodejs crypto for noise" to "feat: selectively use nodejs crypto for noise" Sep 11, 2023
@twoeths
Contributor

twoeths commented Sep 15, 2023

The latest profile shows 7.26% of CPU time, which is expected; GC is a little higher at 5.8%
Screenshot 2023-09-15 at 16 58 16

beta_mainnet_nodejs_crypto_Sep_15.cpuprofile.zip

the interesting part is that event loop lag is much better than on the stable mainnet node; I'm not sure if this only happens on the beta mainnet node

Screenshot 2023-09-15 at 16 59 27

gossipsub traffic (or mesh peers) is only a little less than on the stable mainnet node

@twoeths twoeths merged commit 8794025 into unstable Sep 15, 2023
14 checks passed
@twoeths twoeths deleted the cayman/node-noise-crypto branch September 15, 2023 10:06
@twoeths
Contributor

twoeths commented Sep 16, 2023

after merging to master, the event loop lag in the network worker is the same

Screenshot 2023-09-16 at 16 39 35

so either it's due to the way we deploy (service deployment vs docker), or somehow this metric depends on the node (in this case beta mainnet is better than unstable mainnet)

@wemeetagain
Member Author

🎉 This PR is included in v1.12.0 🎉
