Benchmark array-ish structures #2926

Merged
merged 6 commits into from
Aug 9, 2021


dapplion
Contributor

@dapplion dapplion commented Aug 8, 2021

Motivation

To properly optimize our beacon state transition performance and memory usage, we need to understand the tradeoffs of our different approaches.

Description

Add informational tests (not run in CI) with hardcoded results.
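As a rough illustration of what an informational iteration benchmark could look like, here is a minimal sketch. `timeIteration` is a hypothetical helper written for this example, not the benchmark tooling used in this PR, and `Date.now()` is only a coarse timer.

```typescript
// Hypothetical helper for illustration; not part of Lodestar or its benchmark tooling.
// Runs `iterate` several times and reports the average wall-clock time per run.
function timeIteration(
  label: string,
  iterate: () => number,
  runs = 5
): {label: string; msPerRun: number; checksum: number} {
  let checksum = 0;
  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    // Keep the result so the loop body cannot be discarded as dead code
    checksum = iterate();
  }
  const msPerRun = (Date.now() - start) / runs;
  return {label, msPerRun, checksum};
}

// 250000 matches the validator count used in the benchmark names in this thread
const n = 250000;
const arr: number[] = Array.from({length: n}, (_, i) => i);

const result = timeIteration("array for-loop", () => {
  let sum = 0;
  for (let i = 0; i < arr.length; i++) sum += arr[i];
  return sum;
});
```

The same helper could wrap iteration over any other structure, so that the per-run times are compared on identical data.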

Notable results:

  • Iterating an array is 10x faster than iterating a MutableVector
  • Iterating a MutableVector is 100x faster than iterating a Tree
  • Regular JS arrays of numbers take 8 bytes per element
  • A MutableVector of numbers takes 15 bytes per element
  • Cloning a MutableVector has a fixed cost of ~1000 bytes. Even with that initial cost, cloning a MutableVector to mutate a few elements is very memory efficient
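To illustrate why cloning a persistent structure to mutate a few elements can be cheap, here is a toy sketch of structural sharing. `ChunkedVector` is a hypothetical class written for this example and is not the MutableVector implementation; it only shows that an update can copy the touched chunk while reusing the others by reference.

```typescript
// Toy chunked vector, for illustration only. It is NOT the MutableVector implementation.
const CHUNK_SIZE = 32;

class ChunkedVector {
  private constructor(
    private readonly chunks: ReadonlyArray<ReadonlyArray<number>>,
    readonly length: number
  ) {}

  static from(values: number[]): ChunkedVector {
    const chunks: number[][] = [];
    for (let i = 0; i < values.length; i += CHUNK_SIZE) {
      chunks.push(values.slice(i, i + CHUNK_SIZE));
    }
    return new ChunkedVector(chunks, values.length);
  }

  get(index: number): number {
    return this.chunks[Math.floor(index / CHUNK_SIZE)][index % CHUNK_SIZE];
  }

  // Returns a new vector. Only the list of chunk references and the one
  // touched chunk are copied; every other chunk is shared with `this`.
  set(index: number, value: number): ChunkedVector {
    const chunkIndex = Math.floor(index / CHUNK_SIZE);
    const newChunk = this.chunks[chunkIndex].slice();
    newChunk[index % CHUNK_SIZE] = value;
    const newChunks = this.chunks.slice();
    newChunks[chunkIndex] = newChunk;
    return new ChunkedVector(newChunks, this.length);
  }

  // True if both vectors point at the same chunk object in memory
  sharesChunkWith(other: ChunkedVector, chunkIndex: number): boolean {
    return this.chunks[chunkIndex] === other.chunks[chunkIndex];
  }
}

const original = ChunkedVector.from(Array.from({length: 1000}, (_, i) => i));
const updated = original.set(5, -1);
```

In this toy version the list of chunk references is still copied in full on every `set`. Persistent vectors typically use a tree of nodes instead, so that only the path to the touched element is copied.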

@codeclimate

codeclimate bot commented Aug 8, 2021

Code Climate has analyzed commit 6aacd2f and detected 0 issues on this pull request.

View more on Code Climate.

@github-actions
Contributor

github-actions bot commented Aug 8, 2021

Performance Report

✔️ no performance regression detected

Full benchmark results
| Benchmark suite | Current: 13efffb | Previous: bea1304 | Ratio |
| --- | --- | --- | --- |
| getCommitteeAssignments - req 1000 vs - 250000 vc | 9.1524 ms/op | 7.7644 ms/op | 1.18 |
| epoch altair - 250000 vs - 7PWei - processInactivityUpdates | 2.4537 s/op | 2.7679 s/op | 0.89 |
| epoch altair - 250000 vs - 7PWei - processRewardsAndPenalties | 909.44 ms/op | 847.30 ms/op | 1.07 |
| epoch altair - 250000 vs - 7PWei - processParticipationFlagUpdates | 391.59 ms/op | 340.11 ms/op | 1.15 |
| Process block - 250000 vs - 7PWei - with 0 validator exit | 443.40 us/op | 479.49 us/op | 0.92 |
| Process block - 250000 vs - 7PWei - with 1 validator exit | 30.160 ms/op | 36.390 ms/op | 0.83 |
| Process block - 250000 vs - 7PWei - with 16 validator exits | 27.802 ms/op | 26.682 ms/op | 1.04 |
| epoch phase0 - 250000 vs - 7PWei - prepareEpochProcessState | 702.22 ms/op | 848.78 ms/op | 0.83 |
| epoch phase0 - 250000 vs - 7PWei - processRewardsAndPenalties | 437.91 ms/op | 572.64 ms/op | 0.76 |
| epoch phase0 - 250000 vs - 7PWei - processEffectiveBalanceUpdates | 109.40 ms/op | 132.96 ms/op | 0.82 |
| getAttestationDeltas - 250000 vs - 7PWei | 107.34 ms/op | 114.20 ms/op | 0.94 |
| processSlots - 250000 vs - 7PWei - 32 empty slots | 5.1582 s/op | 5.3798 s/op | 0.96 |
| shuffle list - 16384 els | 2.9532 ms/op | 1.8159 ms/op | 1.63 |
| shuffle list - 250000 els | 41.698 ms/op | 24.715 ms/op | 1.69 |
| getPubkeys - persistent - req 1000 vs - 250000 vc | 21.177 us/op | 18.185 us/op | 1.16 |
| BLS verify - blst-native | 2.0754 ms/op | 2.0260 ms/op | 1.02 |
| BLS verifyMultipleSignatures 3 - blst-native | 4.3593 ms/op | 4.1901 ms/op | 1.04 |
| BLS verifyMultipleSignatures 8 - blst-native | 9.4146 ms/op | 8.8411 ms/op | 1.06 |
| BLS verifyMultipleSignatures 32 - blst-native | 33.540 ms/op | 36.086 ms/op | 0.93 |
| BLS aggregatePubkeys 32 - blst-native | 47.899 us/op | 45.802 us/op | 1.05 |
| BLS aggregatePubkeys 128 - blst-native | 175.75 us/op | 174.99 us/op | 1.00 |
| getAttestationsForBlock | 140.13 ms/op | 87.886 ms/op | 1.59 |
| validate gossip signedAggregateAndProof - struct | 5.0982 ms/op | 7.4572 ms/op | 0.68 |
| validate gossip signedAggregateAndProof - treeBacked | 4.9912 ms/op | 5.3407 ms/op | 0.93 |
| validate gossip attestation - struct | 2.3002 ms/op | 2.3805 ms/op | 0.97 |
| validate gossip attestation - treeBacked | 2.4132 ms/op | 2.5641 ms/op | 0.94 |

by benchmarkbot/action

Contributor

@twoeths twoeths left a comment

Thanks for these statistics 👍. Since we'll have more and more validators, especially after the Merge, I suppose loop speed will become increasingly important and we want to take a scalable approach.

To deduplicate validator data, I suggest keeping only validator roots in the tree (i.e. validatorRoots: new ListType({elementType: Root, limit: VALIDATOR_REGISTRY_LIMIT})) and still keeping CachedValidatorList to get the best of both worlds: the hash, the loop, and access to validator properties. I'm not sure how serialize() works for CachedBeaconState, though.

What do you think @wemeetagain @dapplion ?
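A minimal sketch of the idea above, assuming a list of roots kept next to a plain array of validator structs. Every name here (`ValidatorStruct`, `ValidatorRegistrySketch`, `placeholderHash`) is hypothetical and is not the Lodestar or ssz API; `placeholderHash` is a stand-in and does not compute a real hash tree root.

```typescript
// Hypothetical types and names for illustration; not the Lodestar or ssz API.
type Root = Uint8Array;

interface ValidatorStruct {
  effectiveBalance: number;
  slashed: boolean;
}

class ValidatorRegistrySketch {
  // Roots would live in the tree; structs serve fast loops and property access
  private readonly roots: Root[] = [];
  private readonly structs: ValidatorStruct[] = [];

  constructor(private readonly hashValidator: (v: ValidatorStruct) => Root) {}

  push(v: ValidatorStruct): void {
    this.structs.push(v);
    this.roots.push(this.hashValidator(v));
  }

  // Every write updates both views so the root never goes stale
  update(index: number, v: ValidatorStruct): void {
    this.structs[index] = v;
    this.roots[index] = this.hashValidator(v);
  }

  get(index: number): ValidatorStruct {
    return this.structs[index];
  }

  getRoot(index: number): Root {
    return this.roots[index];
  }

  // Iteration touches only the plain array, never the roots
  totalEffectiveBalance(): number {
    let total = 0;
    for (let i = 0; i < this.structs.length; i++) {
      total += this.structs[i].effectiveBalance;
    }
    return total;
  }
}

// Stand-in for a real hash function: NOT a hash tree root, just deterministic bytes
const placeholderHash = (v: ValidatorStruct): Root => {
  const out = new Uint8Array(32);
  out[0] = v.slashed ? 1 : 0;
  out[1] = v.effectiveBalance % 256;
  return out;
};

const registry = new ValidatorRegistrySketch(placeholderHash);
registry.push({effectiveBalance: 32, slashed: false});
registry.push({effectiveBalance: 32, slashed: false});
registry.update(0, {effectiveBalance: 31, slashed: true});
```

This sketch does not address serialization or proof generation, which would need the full validator data rather than only the roots.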

@wemeetagain
Member

> To deduplicate validator data, I suggest keeping only validator roots in the tree

Definitely agree. I think the only question is how we should go about that.

You mentioned the tradeoff of storing the deserialized validators separately. Done naively, it breaks ssz serialization/deserialization (and proof generation).

Another approach may be to work within the ssz library to support hybrid tree-backed / struct-backed values. This could make it easier to maintain compatibility with the full range of ssz operations. The tradeoff is that it may be harder to customize or get the exact performance characteristics we want in Lodestar.

@dapplion dapplion merged commit 2a478d5 into master Aug 9, 2021
@dapplion dapplion deleted the dapplion/benchmark-arrays branch August 9, 2021 20:13

3 participants