Benchmark against other EVMs #7

lightclient · 2021-11-09T19:45:38Z

It would be cool to benchmark against other EVM implementations, especially evmone which AFAIK is currently the fastest EVM interpreter.

This would probably be a good benchmark for arithmetic: ethereum/evmone#320

rakita · 2021-11-10T00:56:30Z

This will be very useful, thank you lightclient!

On my laptop I am getting around 210-220ms; I didn't expect it to be that big. I will need to spin up perf to see if anything stands out.

rakita · 2021-11-14T13:04:28Z

It is a little bit faster now; I am getting ~110-120ms. The memory and stack that I took from sputnik were not optimized, and signed operations could probably be done better.

rakita · 2021-11-15T00:52:37Z

And now it is more in the range of ~85-95ms.

rakita · 2021-11-17T00:42:56Z

~75-80ms now, on my laptop.

rakita · 2021-11-23T15:47:49Z

sdiv looks like the next in line for optimization.

rakita · 2021-12-12T18:55:35Z

This is where the story becomes interesting, and evmone is really great. I added a few more optimizations: static gas is precalculated per gas_block and applied when needed, plus some other small tweaks, but div is still a big performance hit.
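To illustrate the gas_block idea mentioned above, here is a minimal sketch, not revm's actual code. It assumes a block ends at STOP, JUMP, or JUMPI and starts at a JUMPDEST, and it sums the standard static costs of a small subset of opcodes so an interpreter could charge one total on entering a block instead of charging per opcode. The names `static_gas` and `gas_blocks` and the opcode subset are hypothetical choices for the example.

```rust
/// Static gas for a small, illustrative subset of opcodes.
/// Anything outside this subset is treated as 0 in this sketch.
fn static_gas(op: u8) -> u64 {
    match op {
        0x00 => 0,               // STOP
        0x01 | 0x03 => 3,        // ADD, SUB
        0x02 | 0x04 | 0x05 => 5, // MUL, DIV, SDIV
        0x50 => 2,               // POP
        0x56 => 8,               // JUMP
        0x57 => 10,              // JUMPI
        0x5b => 1,               // JUMPDEST
        0x60..=0x7f => 3,        // PUSH1..PUSH32
        0x80..=0x8f => 3,        // DUP1..DUP16
        _ => 0,
    }
}

/// Returns `(start_pc, total_static_gas)` for every block in `code`.
fn gas_blocks(code: &[u8]) -> Vec<(usize, u64)> {
    let mut blocks = Vec::new();
    let (mut start, mut gas, mut pc) = (0usize, 0u64, 0usize);
    while pc < code.len() {
        let op = code[pc];
        // A JUMPDEST can be entered by a jump, so it opens a new block.
        if op == 0x5b && pc != start {
            blocks.push((start, gas));
            start = pc;
            gas = 0;
        }
        gas += static_gas(op);
        // Skip the immediate bytes that follow PUSH1..PUSH32.
        let imm = if (0x60..=0x7f).contains(&op) { (op - 0x5f) as usize } else { 0 };
        pc += 1 + imm;
        // STOP, JUMP and JUMPI close the current block.
        if matches!(op, 0x00 | 0x56 | 0x57) && pc < code.len() {
            blocks.push((start, gas));
            start = pc;
            gas = 0;
        }
    }
    // Flush the last block if any code remains unaccounted for.
    if start < code.len() {
        blocks.push((start, gas));
    }
    blocks
}
```

Dynamic costs (memory expansion, storage access and so on) cannot be precalculated this way and would still be charged when the opcode executes.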

It seems there is a big difference depending on whether I am running Windows or Linux. Windows is usually faster by ~8-10ms, and I am still unsure which part of the code is responsible for that. All measurements above were done on Windows.

The measurements below differ only in the implementation of the div and sdiv opcodes: here

With Parity u256 div I am getting ~68-72ms on Windows and ~77-80ms on Linux, and the flamegraph looks like this:

With zkp_u256 I got a boost: ~58-61ms on Windows and ~67-68ms on Linux.

zkp_u256 uses __udivti3 for the 2-by-1 word division: here. It is a lot faster even with an unneeded Option unwrap; I will remove that and measure again a bit later.
Parity u256 uses its own custom 2-by-1 division, which is slower; from the flamegraph it seems all the time is spent in this function: here

Parity u256 should probably just switch to u128, which would likely give better performance.
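As a sketch of what "switching to u128" means for a 2-by-1 word division, assuming 64-bit words: the two-word numerator is packed into a `u128` and divided with the native operator, which on common 64-bit targets is typically lowered to compiler runtime routines such as `__udivti3`. The function name mirrors `div_mod_word` from the discussion, but the body below is an illustration, not the code of either crate.

```rust
/// Divides the two-word value `hi * 2^64 + lo` by the single word `d`.
/// Returns `(quotient, remainder)`.
///
/// Precondition (as in schoolbook long division): `hi < d`, so the
/// quotient is guaranteed to fit in one 64-bit word.
fn div_mod_word(hi: u64, lo: u64, d: u64) -> (u64, u64) {
    debug_assert!(hi < d, "quotient would not fit in a single word");
    // Pack the two words into one 128-bit numerator.
    let n = ((hi as u128) << 64) | (lo as u128);
    let d = d as u128;
    // Native u128 division and remainder.
    ((n / d) as u64, (n % d) as u64)
}
```

In a multi-word long division this would be called once per quotient word, feeding the previous remainder back in as `hi`.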

evmone uses an optimized version that seems even faster than the embedded __udivti3, so there are more improvements to be made. The amazing Pawel gave us info on its speed: https://groups.google.com/g/llvm-dev/c/5PqUC4nB_DQ/m/DaCBItw4AAAJ

running on: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz

flamegraphs as svg if somebody wants to look in detail:
flamegraphs.zip

I feel like there are a lot of small improvements that can be made to optimize things, but we will see how big an impact they have.

rakita · 2021-12-12T19:25:53Z

Switching parity u256 div_mod_word to the zkp_u256 version gives me a good boost: ~64-66ms on Linux, which is even better than zkp_u256 itself.
link

The same result was obtained by simply uncommenting the div_mod_word code in parity u256.

rakita · 2021-12-13T11:09:04Z

~56-58ms on windows with improved parity u256.

The test is found in bin/revm-test/ and is executed with cargo run --release

rakita · 2021-12-14T21:35:19Z

My test was called only once per execution, and I would run it multiple times to get a range of timings. I changed that and introduced a loop, so the test is now executed 50 times. After a few iterations I am getting better times than on Windows:

elapsed: 53.666179ms
0: 65.588152ms
1: 63.255175ms
2: 57.723127ms
3: 56.212264ms
4: 53.734064ms
5: 53.121586ms
6: 53.089055ms
7: 53.133512ms
8: 53.082209ms
9: 53.090587ms
10: 53.045255ms
11: 53.880638ms
12: 53.16134ms
13: 52.969316ms
14: 53.033339ms
15: 53.167286ms
16: 53.091371ms
17: 53.054458ms
18: 53.067683ms
19: 53.243839ms
20: 53.085979ms
21: 53.122794ms
22: 53.06014ms
23: 53.123104ms
24: 53.072308ms
25: 53.119213ms
26: 53.072579ms
27: 53.094516ms
28: 53.139832ms
29: 53.038691ms
30: 53.094649ms
31: 53.293706ms
32: 52.844196ms
33: 51.876471ms
34: 52.991977ms
35: 53.015948ms
36: 53.241124ms
37: 52.784502ms
38: 52.94318ms
39: 52.920714ms
40: 52.792951ms
41: 53.023354ms
42: 53.096627ms
43: 53.086917ms
44: 52.479412ms
45: 52.817731ms
46: 53.05368ms
47: 52.982625ms
48: 53.16019ms
49: 53.135602ms
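A loop like the one described above could be sketched as follows. This is an assumption about the shape of such a harness, not the contents of bin/revm-test; `time_runs` and `print_timings` are hypothetical names, and the closure stands in for the actual benchmark body.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `iterations` times and records the wall-clock time of each run.
fn time_runs<F: FnMut()>(iterations: usize, mut f: F) -> Vec<Duration> {
    let mut timings = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        let start = Instant::now();
        f();
        timings.push(start.elapsed());
    }
    timings
}

/// Prints one "index: duration" line per run, using Duration's Debug
/// formatting (for example `53.666179ms`).
fn print_timings(timings: &[Duration]) {
    for (i, t) in timings.iter().enumerate() {
        println!("{}: {:?}", i, t);
    }
}
```

Calling `print_timings(&time_runs(50, || run_test()))`, where `run_test` is whatever executes the contract once, would give a listing in the same form as above. The first few iterations in that listing are visibly slower, which is why a single run per process gives a pessimistic number.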

And I am getting close to evmone:

advanced/total/snailtracer/benchmark        51468 us        51466 us           13 gas_rate=4.47271G/s gas_used=230.193M
baseline/total/snailtracer/benchmark        46800 us        46762 us           15 gas_rate=4.92267G/s gas_used=230.193M
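For reading the evmone lines above: the columns follow the usual Google Benchmark layout (wall time, CPU time, iteration count), and the reported gas_rate is consistent with gas_used divided by the CPU time. A small sketch of that arithmetic, with a hypothetical function name:

```rust
/// Gas throughput in Ggas/s, given the gas used by one run and the
/// time of that run in microseconds.
fn gas_rate_giga_per_sec(gas_used: f64, time_us: f64) -> f64 {
    let seconds = time_us * 1e-6;
    gas_used / seconds / 1e9
}
```

For the advanced interpreter, `gas_rate_giga_per_sec(230.193e6, 51466.0)` comes out at roughly 4.47, in line with the reported gas_rate=4.47271G/s.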

rakita · 2021-12-23T23:48:13Z

After binding intx directly I am getting even better results, comparable with evmone (the changes are on the intx branch):

mean: 48.905952ms
median: 48.82769ms
0: 49.88344ms
1: 50.16717ms
2: 47.413608ms
3: 48.678762ms
4: 48.776993ms
5: 48.747ms
6: 48.434196ms
7: 48.795624ms
8: 49.002815ms
9: 48.859757ms
10: 48.972574ms
11: 48.752764ms
12: 48.724995ms
13: 48.790919ms
14: 48.897968ms
15: 48.52337ms
16: 49.149537ms
17: 49.326058ms
18: 48.927653ms
19: 49.293851ms

And the flamegraph with that change looks like this (zipped SVG file: flamegraph.zip):

I will not merge intx into the main branch; the proper way would be to reimplement it in Rust. There are two issues about that for future improvements: #22 and #23

I feel like this is okay to close: revm got very close to evmone and the timings look good. There is probably some optimization that can be done on the Host part. evmone uses MockedHost for testing, while revm has only the standard Host impl and a mock Database (you can see from the flamegraph that sload takes a lot of time), but I will leave this for later. It was a fun ride.

aewc · 2022-09-15T07:04:22Z

Is there any clear documentation comparing the performance of REVM with other EVMs, especially parity EVM which is also based on Rust?

rakita · 2022-09-18T16:04:31Z

> Is there any clear documentation comparing the performance of REVM with other EVMs, especially parity EVM which is also based on Rust?

If you found one, please forward it to me.
In general, this issue is comparing revm with evmone, and there is this comparison with sputnikvm here: cassc/rust-evm-bench#2

flyq · 2023-01-29T07:09:11Z

Thanks, more info:
https://github.com/ziyadedher/evm-bench

rakita mentioned this issue Dec 12, 2021

U256 div measurement paritytech/parity-common#608

Closed

rakita mentioned this issue Dec 23, 2021

backport intx div #22

Closed

rakita closed this as completed Dec 23, 2021

rakita pushed a commit that referenced this issue Sep 22, 2023

Merge pull request #7 from anton-rs/refcell/upstream-sync

f97400f

chore: upstream sync

anonymousGiga added a commit to anonymousGiga/revm that referenced this issue Nov 21, 2023

fix error count of sload opcode (bluealloy#7)

47e867e

* fix error count of sload opcode * format code --------- Co-authored-by: anonymousGiga <cryptonymGong@gmail.com>
