feat: ECCVM witness generation optimisation #5211

Merged (6 commits) on Mar 18, 2024

@zac-williamson zac-williamson commented Mar 14, 2024

This PR modifies the witness generation code for the ECCVM circuit builder.

In our IVC benchmarks, the share of total work spent in ECCVMComposer::create_prover has dropped from 10% to less than 1%.

The key changes are multithreading witness generation and removing a substantial number of field inversions that we were performing unnecessarily. The inversions that are still needed are now performed more efficiently in a batch by calling field_t::batch_invert.
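For readers unfamiliar with batch inversion, the idea is usually Montgomery's trick: n field elements are inverted with a single field inversion plus roughly 3(n-1) multiplications. The following is a minimal sketch of that technique, not the barretenberg implementation. The toy field modulus `P`, and the helpers `mul_mod`, `invert` and this `batch_invert`, are hypothetical names chosen for illustration, and the sketch assumes every input is non-zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy prime field F_p with p = 2^31 - 1 (a hypothetical stand-in for the real field type).
constexpr std::uint64_t P = 2147483647ULL;

// Operands are reduced below 2^31, so the product fits in 64 bits.
std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b) { return ((a % P) * (b % P)) % P; }

// Single inversion via Fermat's little theorem: a^(p-2) mod p. This is the expensive step.
std::uint64_t invert(std::uint64_t a) {
    assert(a % P != 0);
    std::uint64_t result = 1;
    std::uint64_t base = a % P;
    std::uint64_t exponent = P - 2;
    while (exponent > 0) {
        if (exponent & 1) {
            result = mul_mod(result, base);
        }
        base = mul_mod(base, base);
        exponent >>= 1;
    }
    return result;
}

// Montgomery's trick: invert every element in place using exactly one call to invert().
void batch_invert(std::vector<std::uint64_t>& xs) {
    if (xs.empty()) {
        return;
    }
    // prefix[i] holds the product xs[0] * ... * xs[i-1] (empty product is 1).
    std::vector<std::uint64_t> prefix(xs.size());
    std::uint64_t acc = 1;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        assert(xs[i] % P != 0); // assumption of this sketch: no zero inputs
        prefix[i] = acc;
        acc = mul_mod(acc, xs[i]);
    }
    // inv starts as the inverse of the product of all elements.
    std::uint64_t inv = invert(acc);
    for (std::size_t i = xs.size(); i-- > 0;) {
        const std::uint64_t x = xs[i] % P;
        xs[i] = mul_mod(inv, prefix[i]); // (x0..xi)^-1 * (x0..x(i-1)) = xi^-1
        inv = mul_mod(inv, x);           // now the inverse of x0..x(i-1)
    }
}
```

Calling `batch_invert` on a vector such as `{2, 3, 5}` replaces each entry with its inverse modulo `P`, so multiplying each result by the original value gives 1. The saving comes from replacing n exponentiation-style inversions with one, which is why collecting inversions across rows before inverting tends to pay off in witness generation.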

Benchmarking lock created at ~/BENCHMARK_IN_PROGRESS.
client_ivc_bench                                                                                                                                                                  100%   15MB  47.2MB/s   00:00    
2024-03-18T10:50:07+00:00
Running ./client_ivc_bench
Run on (16 X 3631.57 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 1024 KiB (x8)
  L3 Unified 36608 KiB (x1)
Load Average: 1.16, 0.82, 0.33
--------------------------------------------------------------------------------
Benchmark                      Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------
ClientIVCBench/Full/6      23697 ms        18934 ms            1 Decider::construct_proof=1 Decider::construct_proof(t)=755.044M ECCVMComposer::compute_commitment_key=1 ECCVMComposer::compute_commitment_key(t)=3.77177M ECCVMComposer::compute_witness=1 ECCVMComposer::compute_witness(t)=129.434M ECCVMComposer::create_prover=1 ECCVMComposer::create_prover(t)=149.26M ECCVMComposer::create_proving_key=1 ECCVMComposer::create_proving_key(t)=15.833M ECCVMProver::construct_proof=1 ECCVMProver::construct_proof(t)=1.78177G Goblin::merge=11 Goblin::merge(t)=128.554M GoblinTranslatorCircuitBuilder::constructor=1 GoblinTranslatorCircuitBuilder::constructor(t)=58.2017M GoblinTranslatorComposer::create_prover=1 GoblinTranslatorComposer::create_prover(t)=121.617M GoblinTranslatorProver::construct_proof=1 GoblinTranslatorProver::construct_proof(t)=928.122M ProtoGalaxyProver_::accumulator_update_round=10 ProtoGalaxyProver_::accumulator_update_round(t)=727.574M ProtoGalaxyProver_::combiner_quotient_round=10 ProtoGalaxyProver_::combiner_quotient_round(t)=7.29332G ProtoGalaxyProver_::perturbator_round=10 ProtoGalaxyProver_::perturbator_round(t)=1.32753G ProtoGalaxyProver_::preparation_round=10 ProtoGalaxyProver_::preparation_round(t)=4.16456G ProtogalaxyProver::fold_instances=10 ProtogalaxyProver::fold_instances(t)=13.513G ProverInstance(Circuit&)=11 ProverInstance(Circuit&)(t)=1.96494G batch_mul_with_endomorphism=30 batch_mul_with_endomorphism(t)=567.025M commit=425 commit(t)=4.03553G compute_combiner=10 compute_combiner(t)=7.29114G compute_perturbator=9 compute_perturbator(t)=1.32717G compute_univariate=48 compute_univariate(t)=1.43152G construct_circuits=6 construct_circuits(t)=4.27911G
Benchmarking lock deleted.
client_ivc_bench.json                                                                                                                                                             100% 4027   130.8KB/s   00:00    
function                                        ms     % sum
construct_circuits(t)                         4279    18.12%
ProverInstance(Circuit&)(t)                   1965     8.32%
ProtogalaxyProver::fold_instances(t)         13513    57.21%
Decider::construct_proof(t)                    755     3.20%
ECCVMComposer::create_prover(t)                149     0.63%
GoblinTranslatorComposer::create_prover(t)     122     0.51%
ECCVMProver::construct_proof(t)               1782     7.54%
GoblinTranslatorProver::construct_proof(t)     928     3.93%
Goblin::merge(t)                               129     0.54%

Total time accounted for: 23621ms/23697ms = 99.68%

Major contributors:
function                                        ms    % sum
commit(t)                                     4036   17.08%
compute_combiner(t)                           7291   30.87%
compute_perturbator(t)                        1327    5.62%
compute_univariate(t)                         1432    6.06%

Breakdown of ECCVMProver::create_prover:
ECCVMComposer::compute_witness(t)              129    86.72%
ECCVMComposer::create_proving_key(t)            16    10.61%

Breakdown of ProtogalaxyProver::fold_instances:
ProtoGalaxyProver_::preparation_round(t)           4165    30.82%
ProtoGalaxyProver_::perturbator_round(t)           1328     9.82%
ProtoGalaxyProver_::combiner_quotient_round(t)     7293    53.97%
ProtoGalaxyProver_::accumulator_update_round(t)     728     5.38%
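
The percentages in the summary above can be reproduced from the raw counters. The sketch below assumes that the `(t)` counters are nanoseconds and that the "% sum" column is taken relative to the sum of the listed top-level entries rather than the wall-clock time; both are inferences from the numbers, not something the benchmark script states. The function names are hypothetical.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Percentage of `part` relative to `whole`, rounded to two decimal places.
double percent(double part, double whole) {
    return std::round(part / whole * 10000.0) / 100.0;
}

// Top-level (t) counters from the benchmark line, converted to milliseconds.
std::vector<double> top_level_ms() {
    return {
        4279.11, // construct_circuits
        1964.94, // ProverInstance(Circuit&)
        13513.0, // ProtogalaxyProver::fold_instances
        755.044, // Decider::construct_proof
        149.26,  // ECCVMComposer::create_prover
        121.617, // GoblinTranslatorComposer::create_prover
        1781.77, // ECCVMProver::construct_proof
        928.122, // GoblinTranslatorProver::construct_proof
        128.554, // Goblin::merge
    };
}

// Sum of the top-level entries, i.e. the "time accounted for".
double accounted_ms() {
    const std::vector<double> values = top_level_ms();
    return std::accumulate(values.begin(), values.end(), 0.0);
}
```

Under these assumptions, `accounted_ms()` rounds to 23621 ms, `percent(149.26, accounted_ms())` gives the 0.63% reported for ECCVMComposer::create_prover, and `percent(129.434, 149.26)` gives the 86.72% reported for compute_witness within it.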

AztecBot commented Mar 14, 2024

Benchmark results

Metrics with a significant change:

  • l2_block_processing_time_in_ms (32): 5,736 (+19%)
  • note_successful_decrypting_time_in_ms (32): 832 (+60%)
  • note_successful_decrypting_time_in_ms (64): 1,142 (+17%)
Detailed results

All benchmarks are run on txs on the Benchmarking contract on the repository. Each tx consists of a batch call to create_note and increment_balance, which guarantees that each tx has a private call, a nested private call, a public call, and a nested public call, as well as an emitted private note, an unencrypted log, and public storage read and write.

This benchmark source data is available in JSON format on S3 here.

Values are compared against data from master at commit 4d04a7e8 and shown if the difference exceeds 1%.

L2 block published to L1

Each column represents the number of txs on an L2 block published to L1.

| Metric | 8 txs | 32 txs | 64 txs |
| --- | --- | --- | --- |
| l1_rollup_calldata_size_in_bytes | 5,668 | 18,820 | 36,356 |
| l1_rollup_calldata_gas | 66,364 | 239,152 | 469,844 |
| l1_rollup_execution_gas | 659,687 | 941,736 | 1,318,251 |
| l2_block_processing_time_in_ms | 1,261 (-4%) | ⚠️ 5,736 (+19%) | 8,877 (-2%) |
| note_successful_decrypting_time_in_ms | 180 (+2%) | ⚠️ 832 (+60%) | ⚠️ 1,142 (+17%) |
| note_trial_decrypting_time_in_ms | 86.1 (+10%) | 51.8 (+46%) | 59.1 (-46%) |
| l2_block_building_time_in_ms | 18,292 (+1%) | 69,348 (+1%) | 137,003 (+1%) |
| l2_block_rollup_simulation_time_in_ms | 8,258 (+1%) | 29,388 (+2%) | 57,218 (+2%) |
| l2_block_public_tx_process_time_in_ms | 10,012 (+1%) | 39,898 (+1%) | 79,685 (+1%) |

L2 chain processing

Each column represents the number of blocks on the L2 chain where each block has 16 txs.

| Metric | 5 blocks | 10 blocks |
| --- | --- | --- |
| node_history_sync_time_in_ms | 13,983 (-3%) | 26,935 (-1%) |
| note_history_successful_decrypting_time_in_ms | 1,279 (+5%) | 2,494 (+3%) |
| note_history_trial_decrypting_time_in_ms | 103 (+69%) | 179 (+25%) |
| node_database_size_in_bytes | 19,071,056 | 35,741,776 |
| pxe_database_size_in_bytes | 29,859 | 59,414 |

Circuits stats

Stats on running time and I/O sizes collected for every circuit run across all benchmarks.

| Circuit | circuit_simulation_time_in_ms | circuit_input_size_in_bytes | circuit_output_size_in_bytes |
| --- | --- | --- | --- |
| private-kernel-init | 281 (+2%) | 44,366 | 28,244 |
| private-kernel-ordering | 215 | 52,868 | 14,326 |
| base-parity | 1,798 | 128 | 311 |
| base-rollup | 726 (+1%) | 165,787 | 925 |
| root-parity | 1,708 (+10%) | 1,244 | 311 |
| root-rollup | 68.2 | 4,487 | 789 |
| private-kernel-inner | 646 (+1%) | 73,771 | 28,244 |
| public-kernel-app-logic | 444 | 35,260 | 28,215 |
| public-kernel-tail | 172 (+1%) | 40,926 | 28,215 |
| merge-rollup | 8.70 (+6%) | 2,696 | 925 |

Tree insertion stats

The duration to insert a fixed batch of leaves into each tree type.

| Metric | 1 leaves | 16 leaves | 64 leaves | 128 leaves | 512 leaves | 1024 leaves | 2048 leaves | 4096 leaves | 32 leaves |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| batch_insert_into_append_only_tree_16_depth_ms | 10.0 (+1%) | 16.0 | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| batch_insert_into_append_only_tree_16_depth_hash_count | 16.8 | 31.6 | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| batch_insert_into_append_only_tree_16_depth_hash_ms | 0.585 (+1%) | 0.495 | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| batch_insert_into_append_only_tree_32_depth_ms | N/A | N/A | 45.6 | 72.3 | 230 | 444 (+1%) | 880 (-1%) | 1,720 (-1%) | N/A |
| batch_insert_into_append_only_tree_32_depth_hash_count | N/A | N/A | 96.0 | 159 | 543 | 1,055 | 2,079 | 4,127 | N/A |
| batch_insert_into_append_only_tree_32_depth_hash_ms | N/A | N/A | 0.469 | 0.446 | 0.419 | 0.416 (+1%) | 0.418 (-1%) | 0.412 (-1%) | N/A |
| batch_insert_into_indexed_tree_20_depth_ms | N/A | N/A | 53.6 (-2%) | 106 | 334 (-1%) | 658 (+1%) | 1,312 (-2%) | 2,594 (-1%) | N/A |
| batch_insert_into_indexed_tree_20_depth_hash_count | N/A | N/A | 104 | 207 | 691 | 1,363 | 2,707 | 5,395 | N/A |
| batch_insert_into_indexed_tree_20_depth_hash_ms | N/A | N/A | 0.477 (-2%) | 0.480 | 0.456 (-1%) | 0.454 (+1%) | 0.455 (-2%) | 0.452 (-1%) | N/A |
| batch_insert_into_indexed_tree_40_depth_ms | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 61.1 (+1%) |
| batch_insert_into_indexed_tree_40_depth_hash_count | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 109 |
| batch_insert_into_indexed_tree_40_depth_hash_ms | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 0.535 (+1%) |

Miscellaneous

Transaction sizes based on how many contract classes are registered in the tx.

| Metric | 0 registered classes |
| --- | --- |
| tx_size_in_bytes | 22,012 |

Transaction processing duration by data writes.

| Metric | 0 new note hashes | 1 new note hashes |
| --- | --- | --- |
| tx_pxe_processing_time_ms | 3,237 (-2%) | 1,750 (+1%) |

| Metric | 0 public data writes | 1 public data writes |
| --- | --- | --- |
| tx_sequencer_processing_time_ms | 12.2 (+6%) | 1,240 (+1%) |

@codygunton changed the title from "[feat] ECCVM witness generation optimisation" to "feat: ECCVM witness generation optimisation" on Mar 18, 2024

row.pc = pc;
msm_state.push_back(row);
} else {
if (j == num_rounds - 1) {
@codygunton codygunton Mar 18, 2024

This deep nesting is pretty ick / far from readable.

@codygunton codygunton left a comment

This is a large and complex PR that should have been 2-3 PRs with more documentation, but the entire ECCVM needs to be read from scratch anyway, so I'll approve and merge after having sanity-checked it for a while.

@codygunton codygunton merged commit 85ac726 into master Mar 18, 2024
97 of 98 checks passed
@codygunton codygunton deleted the zw/eccvm-witgen-optimisations branch March 18, 2024 21:52