Optimize backfill sync to efficiently use reqresp fetched block data #3669

Closed
wants to merge 5 commits into from

Conversation

g11tech
Contributor

@g11tech g11tech commented Jan 26, 2022

Motivation
As @tuyennhv pointed out after doing profiler runs, backfill sync could be optimized to use treebacked data from the sync range/block by root.

This PR changes the following:

  • Use the hashTreeRoot of the synced tree-backed block as its block root when verifying the parent/child relationship in verify block sequence
  • Use blockArchive's batch put binary with the tree-backed block's data
  • Add a hidden CLI option --sync.backfillBatchSize to specify the batch size for backfill sync
    Description

Closes #3657
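
As a rough illustration of the parent/child check described in the first bullet, here is a minimal sketch. It is not the PR's code: `BlockLike`, `verifyBackwardChain`, and the string-typed roots are hypothetical stand-ins, and `root()` stands in for a hashTreeRoot call that is assumed to be cheap because the tree already holds its hashes.

```typescript
// Hypothetical minimal block shape for illustration only (not Lodestar's types).
interface BlockLike {
  slot: number;
  parentRoot: string;
  // Stand-in for a tree-backed hashTreeRoot(); assumed cheap because the tree caches hashes.
  root(): string;
}

/**
 * Walks a batch of blocks (ordered by ascending slot) from newest to oldest.
 * The newest block's root must equal `expectedRoot`, and every older block's
 * root must equal the parentRoot of the block after it.
 * Returns the parentRoot of the oldest block, i.e. the root the next
 * (older) batch has to end with.
 */
function verifyBackwardChain(blocks: BlockLike[], expectedRoot: string): string {
  let expected = expectedRoot;
  for (let i = blocks.length - 1; i >= 0; i--) {
    const block = blocks[i];
    const root = block.root();
    if (root !== expected) {
      throw new Error(`Block at slot ${block.slot} has root ${root}, expected ${expected}`);
    }
    // The next older block must hash to this block's parentRoot.
    expected = block.parentRoot;
  }
  return expected;
}
```

The returned value can be fed in as `expectedRoot` for the next batch, so each block root is computed once per block.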

@codeclimate

codeclimate bot commented Jan 26, 2022

Code Climate has analyzed commit 7928f92 and detected 0 issues on this pull request.

View more on Code Climate.

@github-actions
Contributor

github-actions bot commented Jan 26, 2022

Performance Report

✔️ no performance regression detected

Full benchmark results
Benchmark suite Current: 3ecd574 Previous: 52032f8 Ratio
BeaconState.hashTreeRoot - No change 632.00 ns/op 459.00 ns/op 1.38
BeaconState.hashTreeRoot - 1 full validator 146.65 us/op 119.54 us/op 1.23
BeaconState.hashTreeRoot - 32 full validator 2.2193 ms/op 1.6142 ms/op 1.37
BeaconState.hashTreeRoot - 512 full validator 29.189 ms/op 21.199 ms/op 1.38
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 148.58 us/op 114.92 us/op 1.29
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 2.4174 ms/op 1.8303 ms/op 1.32
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 31.894 ms/op 24.288 ms/op 1.31
BeaconState.hashTreeRoot - 1 balances 108.14 us/op 90.823 us/op 1.19
BeaconState.hashTreeRoot - 32 balances 874.42 us/op 699.99 us/op 1.25
BeaconState.hashTreeRoot - 512 balances 8.3269 ms/op 6.4099 ms/op 1.30
BeaconState.hashTreeRoot - 250000 balances 154.29 ms/op 130.19 ms/op 1.19
processSlot - 1 slots 55.128 us/op 42.485 us/op 1.30
processSlot - 32 slots 3.3190 ms/op 2.7116 ms/op 1.22
getCommitteeAssignments - req 1 vs - 250000 vc 6.2915 ms/op 4.5003 ms/op 1.40
getCommitteeAssignments - req 100 vs - 250000 vc 8.7312 ms/op 6.3586 ms/op 1.37
getCommitteeAssignments - req 1000 vs - 250000 vc 9.4162 ms/op 6.6776 ms/op 1.41
computeProposers - vc 250000 24.151 ms/op 17.569 ms/op 1.37
computeEpochShuffling - vc 250000 216.77 ms/op 155.20 ms/op 1.40
getNextSyncCommittee - vc 250000 398.75 ms/op 287.40 ms/op 1.39
altair processAttestation - 250000 vs - 7PWei normalcase 62.361 ms/op 41.547 ms/op 1.50
altair processAttestation - 250000 vs - 7PWei worstcase 51.513 ms/op 41.234 ms/op 1.25
altair processAttestation - setStatus - 1/6 committees join 11.157 ms/op 8.2089 ms/op 1.36
altair processAttestation - setStatus - 1/3 committees join 23.840 ms/op 18.702 ms/op 1.27
altair processAttestation - setStatus - 1/2 committees join 36.733 ms/op 28.477 ms/op 1.29
altair processAttestation - setStatus - 2/3 committees join 48.837 ms/op 38.715 ms/op 1.26
altair processAttestation - setStatus - 4/5 committees join 57.705 ms/op 45.828 ms/op 1.26
altair processAttestation - setStatus - 100% committees join 73.015 ms/op 61.495 ms/op 1.19
altair processAttestation - updateEpochParticipants - 1/6 committees join 12.001 ms/op 9.6705 ms/op 1.24
altair processAttestation - updateEpochParticipants - 1/3 committees join 25.396 ms/op 20.499 ms/op 1.24
altair processAttestation - updateEpochParticipants - 1/2 committees join 33.657 ms/op 69.434 ms/op 0.48
altair processAttestation - updateEpochParticipants - 2/3 committees join 39.757 ms/op 24.073 ms/op 1.65
altair processAttestation - updateEpochParticipants - 4/5 committees join 34.734 ms/op 24.759 ms/op 1.40
altair processAttestation - updateEpochParticipants - 100% committees join 36.780 ms/op 28.344 ms/op 1.30
altair processAttestation - updateAllStatus 26.778 ms/op 19.884 ms/op 1.35
altair processBlock - 250000 vs - 7PWei normalcase 52.484 ms/op 45.081 ms/op 1.16
altair processBlock - 250000 vs - 7PWei worstcase 135.07 ms/op 113.09 ms/op 1.19
altair processEpoch - mainnet_e81889 1.3341 s/op 1.0446 s/op 1.28
mainnet_e81889 - altair beforeProcessEpoch 311.04 ms/op 270.21 ms/op 1.15
mainnet_e81889 - altair processJustificationAndFinalization 100.50 us/op 114.38 us/op 0.88
mainnet_e81889 - altair processInactivityUpdates 20.626 ms/op 15.352 ms/op 1.34
mainnet_e81889 - altair processRewardsAndPenalties 280.16 ms/op 245.52 ms/op 1.14
mainnet_e81889 - altair processRegistryUpdates 15.891 us/op 11.040 us/op 1.44
mainnet_e81889 - altair processSlashings 4.5470 us/op 2.5330 us/op 1.80
mainnet_e81889 - altair processEth1DataReset 3.7860 us/op 2.1130 us/op 1.79
mainnet_e81889 - altair processEffectiveBalanceUpdates 13.703 ms/op 12.072 ms/op 1.14
mainnet_e81889 - altair processSlashingsReset 19.564 us/op 17.163 us/op 1.14
mainnet_e81889 - altair processRandaoMixesReset 21.061 us/op 24.986 us/op 0.84
mainnet_e81889 - altair processHistoricalRootsUpdate 5.3090 us/op 2.4540 us/op 2.16
mainnet_e81889 - altair processParticipationFlagUpdates 115.28 ms/op 129.29 ms/op 0.89
mainnet_e81889 - altair processSyncCommitteeUpdates 3.4330 us/op 1.7710 us/op 1.94
mainnet_e81889 - altair afterProcessEpoch 258.03 ms/op 218.70 ms/op 1.18
altair processInactivityUpdates - 250000 normalcase 75.366 ms/op 59.047 ms/op 1.28
altair processInactivityUpdates - 250000 worstcase 76.656 ms/op 61.189 ms/op 1.25
altair processParticipationFlagUpdates - 250000 anycase 105.92 ms/op 87.069 ms/op 1.22
altair processRewardsAndPenalties - 250000 normalcase 282.07 ms/op 213.94 ms/op 1.32
altair processRewardsAndPenalties - 250000 worstcase 246.84 ms/op 242.88 ms/op 1.02
altair processSyncCommitteeUpdates - 250000 418.10 ms/op 303.21 ms/op 1.38
Tree 40 250000 create 916.60 ms/op 557.26 ms/op 1.64
Tree 40 250000 get(125000) 393.40 ns/op 272.30 ns/op 1.44
Tree 40 250000 set(125000) 2.4816 us/op 1.7373 us/op 1.43
Tree 40 250000 toArray() 46.567 ms/op 32.249 ms/op 1.44
Tree 40 250000 iterate all - toArray() + loop 53.635 ms/op 33.170 ms/op 1.62
Tree 40 250000 iterate all - get(i) 145.72 ms/op 99.432 ms/op 1.47
MutableVector 250000 create 24.008 ms/op 22.480 ms/op 1.07
MutableVector 250000 get(125000) 17.081 ns/op 11.539 ns/op 1.48
MutableVector 250000 set(125000) 538.90 ns/op 460.66 ns/op 1.17
MutableVector 250000 toArray() 10.052 ms/op 7.8415 ms/op 1.28
MutableVector 250000 iterate all - toArray() + loop 23.223 ms/op 7.0594 ms/op 3.29
MutableVector 250000 iterate all - get(i) 4.1264 ms/op 2.9306 ms/op 1.41
Array 250000 create 6.2166 ms/op 4.7171 ms/op 1.32
Array 250000 clone - spread 2.0731 ms/op 2.1283 ms/op 0.97
Array 250000 get(125000) 1.0460 ns/op 1.0280 ns/op 1.02
Array 250000 set(125000) 1.0400 ns/op 1.0180 ns/op 1.02
Array 250000 iterate all - loop 200.67 us/op 168.79 us/op 1.19
aggregationBits - 2048 els - readonlyValues 244.92 us/op 233.26 us/op 1.05
aggregationBits - 2048 els - zipIndexesInBitList 41.836 us/op 39.256 us/op 1.07
regular array get 100000 times 80.775 us/op 67.401 us/op 1.20
wrappedArray get 100000 times 80.687 us/op 67.405 us/op 1.20
arrayWithProxy get 100000 times 32.526 ms/op 28.384 ms/op 1.15
ssz.Root.equals 1.3990 us/op 1.0450 us/op 1.34
ssz.Root.equals with valueOf() 1.4960 us/op 1.2210 us/op 1.23
byteArrayEquals with valueOf() 1.4820 us/op 1.2450 us/op 1.19
phase0 processBlock - 250000 vs - 7PWei normalcase 12.137 ms/op 10.161 ms/op 1.19
phase0 processBlock - 250000 vs - 7PWei worstcase 88.955 ms/op 73.096 ms/op 1.22
phase0 afterProcessEpoch - 250000 vs - 7PWei 243.79 ms/op 202.06 ms/op 1.21
phase0 beforeProcessEpoch - 250000 vs - 7PWei 625.07 ms/op 552.49 ms/op 1.13
phase0 processEpoch - mainnet_e58758 974.57 ms/op 796.05 ms/op 1.22
mainnet_e58758 - phase0 beforeProcessEpoch 487.61 ms/op 400.57 ms/op 1.22
mainnet_e58758 - phase0 processJustificationAndFinalization 76.449 us/op 109.10 us/op 0.70
mainnet_e58758 - phase0 processRewardsAndPenalties 162.06 ms/op 139.01 ms/op 1.17
mainnet_e58758 - phase0 processRegistryUpdates 42.166 us/op 71.956 us/op 0.59
mainnet_e58758 - phase0 processSlashings 3.1350 us/op 2.6440 us/op 1.19
mainnet_e58758 - phase0 processEth1DataReset 2.9860 us/op 1.7650 us/op 1.69
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 12.142 ms/op 9.9046 ms/op 1.23
mainnet_e58758 - phase0 processSlashingsReset 12.134 us/op 15.071 us/op 0.81
mainnet_e58758 - phase0 processRandaoMixesReset 15.691 us/op 25.337 us/op 0.62
mainnet_e58758 - phase0 processHistoricalRootsUpdate 3.9010 us/op 2.8800 us/op 1.35
mainnet_e58758 - phase0 processParticipationRecordUpdates 12.983 us/op 17.851 us/op 0.73
mainnet_e58758 - phase0 afterProcessEpoch 214.99 ms/op 176.54 ms/op 1.22
phase0 processEffectiveBalanceUpdates - 250000 normalcase 12.763 ms/op 10.929 ms/op 1.17
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 1.5112 s/op 1.2208 s/op 1.24
phase0 processRegistryUpdates - 250000 normalcase 58.184 us/op 73.436 us/op 0.79
phase0 processRegistryUpdates - 250000 badcase_full_deposits 3.2927 ms/op 2.9192 ms/op 1.13
phase0 processRegistryUpdates - 250000 worstcase 0.5 1.9872 s/op 1.6337 s/op 1.22
phase0 getAttestationDeltas - 250000 normalcase 93.124 ms/op 85.672 ms/op 1.09
phase0 getAttestationDeltas - 250000 worstcase 94.772 ms/op 86.084 ms/op 1.10
phase0 processSlashings - 250000 worstcase 45.429 ms/op 33.164 ms/op 1.37
shuffle list - 16384 els 15.163 ms/op 12.414 ms/op 1.22
shuffle list - 250000 els 216.75 ms/op 178.82 ms/op 1.21
getEffectiveBalances - 250000 vs - 7PWei 12.021 ms/op 9.8422 ms/op 1.22
pass gossip attestations to forkchoice per slot 17.035 ms/op 18.253 ms/op 0.93
computeDeltas 4.5872 ms/op 3.2430 ms/op 1.41
computeProposerBoostScoreFromBalances 403.29 us/op 337.43 us/op 1.20
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 2.3421 ms/op 1.8950 ms/op 1.24
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 823.91 us/op 689.93 us/op 1.19
BLS verify - blst-native 2.2173 ms/op 1.8596 ms/op 1.19
BLS verifyMultipleSignatures 3 - blst-native 4.5675 ms/op 3.8180 ms/op 1.20
BLS verifyMultipleSignatures 8 - blst-native 9.8641 ms/op 8.2345 ms/op 1.20
BLS verifyMultipleSignatures 32 - blst-native 35.666 ms/op 29.884 ms/op 1.19
BLS aggregatePubkeys 32 - blst-native 46.988 us/op 39.940 us/op 1.18
BLS aggregatePubkeys 128 - blst-native 182.89 us/op 153.93 us/op 1.19
getAttestationsForBlock 105.70 ms/op 77.669 ms/op 1.36
CheckpointStateCache - add get delete 21.665 us/op 17.047 us/op 1.27
validate gossip signedAggregateAndProof - struct 5.3564 ms/op 4.4499 ms/op 1.20
validate gossip signedAggregateAndProof - treeBacked 5.2764 ms/op 4.3957 ms/op 1.20
validate gossip attestation - struct 2.4972 ms/op 2.0965 ms/op 1.19
validate gossip attestation - treeBacked 2.5167 ms/op 2.1146 ms/op 1.19
bytes32 toHexString 1.9250 us/op 1.5300 us/op 1.26
bytes32 Buffer.toString(hex) 849.00 ns/op 683.00 ns/op 1.24
bytes32 Buffer.toString(hex) from Uint8Array 1.1280 us/op 887.00 ns/op 1.27
bytes32 Buffer.toString(hex) + 0x 849.00 ns/op 677.00 ns/op 1.25
Object access 1 prop 0.37900 ns/op 0.31200 ns/op 1.21
Map access 1 prop 0.33000 ns/op 0.28800 ns/op 1.15
Object get x1000 20.805 ns/op 17.891 ns/op 1.16
Map get x1000 1.1450 ns/op 0.98300 ns/op 1.16
Object set x1000 128.17 ns/op 98.363 ns/op 1.30
Map set x1000 74.763 ns/op 58.384 ns/op 1.28
Return object 10000 times 0.44480 ns/op 0.36960 ns/op 1.20
Throw Error 10000 times 7.0932 us/op 5.9089 us/op 1.20
enrSubnets - fastDeserialize 64 bits 1.3960 us/op 1.1420 us/op 1.22
enrSubnets - ssz BitVector 64 bits 19.975 us/op 16.416 us/op 1.22
enrSubnets - fastDeserialize 4 bits 556.00 ns/op 402.00 ns/op 1.38
enrSubnets - ssz BitVector 4 bits 3.6320 us/op 2.7980 us/op 1.30
RateTracker 1000000 limit, 1 obj count per request 215.54 ns/op 172.55 ns/op 1.25
RateTracker 1000000 limit, 2 obj count per request 159.08 ns/op 127.50 ns/op 1.25
RateTracker 1000000 limit, 4 obj count per request 133.08 ns/op 104.74 ns/op 1.27
RateTracker 1000000 limit, 8 obj count per request 119.90 ns/op 93.562 ns/op 1.28
RateTracker with prune 4.4650 us/op 3.3710 us/op 1.32

by benchmarkbot/action

@codecov

codecov bot commented Jan 26, 2022

Codecov Report

Merging #3669 (7928f92) into master (a00ec5c) will increase coverage by 0.25%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3669      +/-   ##
==========================================
+ Coverage   37.13%   37.39%   +0.25%     
==========================================
  Files         321      322       +1     
  Lines        8706     8796      +90     
  Branches     1350     1369      +19     
==========================================
+ Hits         3233     3289      +56     
- Misses       5330     5365      +35     
+ Partials      143      142       -1     

const block = signedBlock.message as TreeBacked<allForks.BeaconBlock>;
return {
key: block.slot,
value: signedBlock.serialize(),
Contributor

What I meant in the issue is that we return both the binary and the signed block from beaconBlocksByRange and beaconBlocksByRoot. So instead of returning Promise<allForks.SignedBeaconBlock[]> in the network req/resp API, return something like

type CachedBytes<T> = {bytes: Uint8Array} & T;

async beaconBlocksByRange(
    peerId: PeerId,
    request: phase0.BeaconBlocksByRangeRequest
  ): Promise<CachedBytes<allForks.SignedBeaconBlock>[]>

and consume those cached bytes without having to call serialize() again in backfill sync and similar places

@dapplion do we have this type in ssz v2?
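
To make the suggestion above concrete, here is a minimal sketch of how a `CachedBytes<T>` value could be built and consumed. Only the `CachedBytes` type comes from the comment; `SignedBlockLike`, `withCachedBytes`, and `toArchiveEntry` are hypothetical names used for illustration, and the block shape is heavily simplified.

```typescript
// Type taken from the review comment.
type CachedBytes<T> = {bytes: Uint8Array} & T;

// Hypothetical, simplified stand-in for allForks.SignedBeaconBlock.
interface SignedBlockLike {
  message: {slot: number};
}

// Hypothetical helper: pair a decoded value with the wire bytes it was decoded from.
function withCachedBytes<T extends object>(value: T, bytes: Uint8Array): CachedBytes<T> {
  return {...value, bytes};
}

// Hypothetical consumer: build a db key/value pair without re-serializing the block.
function toArchiveEntry(block: CachedBytes<SignedBlockLike>): {key: number; value: Uint8Array} {
  return {key: block.message.slot, value: block.bytes};
}
```

The point of the design is that the bytes received over req/resp travel with the decoded block, so the persistence step reuses them instead of calling serialize() again. This sketch assumes the value is not mutated after decoding; otherwise the cached bytes would go stale.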

Contributor Author

yes, I was wondering about the same 🙂
This format should currently save on hashTreeRoot calcs, which amount to 9% for hashTreeRoot in verifyBlockSequence (#3657 (comment)), and maybe some 2-3% on serialize.

CachedBytes<allForks.SignedBeaconBlock> would be awesome ❤️ awaiting @dapplion's input regarding ssz v2.

Member

Right now, ssz v2 only (optionally) caches the hashTreeRoot (of non-tree-backed values); it doesn't cache the serialized bytes.

Contributor Author

Added the caching capability here, since it was discussed to be out of scope for ssz v2 👍

return (typeTree.createTreeBackedFromBytes(bytes) as unknown) as T;
const treeBacked = typeTree.createTreeBackedFromBytes(bytes) as unknown;
if (options?.cacheBytes) {
return (new Proxy({treeBacked, bytes}, cachedTreeBackedProxyHandler) as unknown) as T;
Contributor Author

Needed another proxy because the tree's proxy wouldn't allow assigning bytes without going through the tree setters. Maybe it can be removed after the ssz v2 merge.
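
For readers unfamiliar with the pattern, here is a minimal sketch of exposing an extra `bytes` property through a second Proxy without touching the wrapped object's own setters. It is a simplified variant, not the PR's `cachedTreeBackedProxyHandler`: it proxies the target directly rather than a `{treeBacked, bytes}` pair, and `WithBytes` and `attachBytes` are hypothetical names.

```typescript
type WithBytes<T extends object> = T & {readonly bytes: Uint8Array};

// Hypothetical helper: wrap `target` so that `bytes` is readable on the wrapper
// while every other property access is forwarded to the target unchanged.
function attachBytes<T extends object>(target: T, bytes: Uint8Array): WithBytes<T> {
  const handler: ProxyHandler<T> = {
    get(obj, prop) {
      if (prop === "bytes") return bytes;
      const value = Reflect.get(obj, prop, obj);
      // Bind methods to the real target so `this` inside them is not the wrapper.
      return typeof value === "function" ? value.bind(obj) : value;
    },
    has(obj, prop) {
      return prop === "bytes" || Reflect.has(obj, prop);
    },
  };
  return new Proxy(target, handler) as WithBytes<T>;
}
```

Because `bytes` is held in the closure and never written to the target, the target's own setters are never involved. Writes through the wrapper still reach the target, so a real implementation would need to decide whether to invalidate the cached bytes on mutation.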

@g11tech g11tech changed the title Optimize backfill sync to efficiently use treebacked block data Optimize backfill sync to efficiently use reqresp fetched block data Feb 1, 2022
@dapplion dapplion changed the base branch from master to unstable May 27, 2022 04:35
@dapplion dapplion dismissed a stale review May 27, 2022 04:35

The base branch was changed.

@dapplion dapplion requested a review from a team as a code owner May 27, 2022 04:35
@dapplion
Contributor

dapplion commented Sep 1, 2022

Closing for now since the network code has diverged. This optimization is good and should definitely be included in a future review of backfill sync.

@dapplion dapplion closed this Sep 1, 2022
Successfully merging this pull request may close these issues.

Skip serializing blocks when persisting to db
4 participants