Wasmer discovery and prototype #3542

Closed

piotr-dziubecki opened this issue Jan 3, 2023 · 1 comment

@piotr-dziubecki
No description provided.

@mpapierski
Collaborator

Research

  • Wasmer supports different compiler backends behind a single API (see the sketch after this list).
  • The Singlepass, Cranelift, and LLVM backends were examined.
  • Singlepass is used by multiple blockchains with smart contracts. The docs describe it as suitable for blockchain applications, NEAR protocol appears heavily invested in it, and more blockchains are following this path (e.g. CosmWasm).
  • Singlepass has the shortest compilation times at the expense of slightly slower execution. This is a good fit for untrusted code: we don't pay the unnecessary cost of an optimizer that could potentially be exploited, and we can't easily charge for the compilation phase.
  • I was unable to properly measure compilation times for the LLVM backend due to strange Wasm memory issues at runtime whose cause I couldn't find. Given that the backend is cumbersome to build (it requires a compiled LLVM, at least 12.0, and increases project compile times), I didn't dig deeper; it probably needs extra setup, but the docs aren't clear on how.
  • Integrating Wasmer was, however, complicated by the Send + Sync bounds of its API, which required a major refactor of the Runtime<R> object to satisfy those trait bounds.
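
A minimal sketch of the single-API point, assuming the wasmer 3.x-style Store/Module/Instance API; `instantiate` is a hypothetical helper, and only the engine construction differs between backends:

```rust
use wasmer::{imports, Instance, Module, Store};

// Hypothetical helper: the same Module/Instance calls work for any backend;
// only the engine passed to Store::new changes.
fn instantiate(wasm_bytes: &[u8], use_singlepass: bool) -> anyhow::Result<Instance> {
    let mut store = if use_singlepass {
        Store::new(wasmer_compiler_singlepass::Singlepass::default())
    } else {
        Store::new(wasmer_compiler_cranelift::Cranelift::default())
    };
    let module = Module::new(&store, wasm_bytes)?;
    let instance = Instance::new(&mut store, &module, &imports! {})?;
    Ok(instance)
}
```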

Testing procedure

  • All the integration tests are augmented so each exec_request can be timed, which gives roughly 1500 unique wasm executions. The total time of an EngineState::exec() call is measured.
  • Total exec() time is split into preprocess time (from receiving the wasm bytes up to invoking the wasm function) and invoke time (only the wasm invocation itself); a sketch of the split follows this list.
  • The wasm backend can be swapped at runtime with a chainspec config to avoid expensive recompilation, so the same code is run for each backend compared.
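
A sketch of the preprocess/invoke timing split described above; `preprocess`, `invoke`, and `PreparedModule` are stand-ins for the engine internals, not the actual casper-node symbols:

```rust
use std::time::{Duration, Instant};

struct PreparedModule; // stand-in for a backend-specific compiled module

fn preprocess(_wasm_bytes: &[u8]) -> PreparedModule {
    // validation + gas/stack-height injection + backend compilation go here
    PreparedModule
}

fn invoke(_module: &PreparedModule) {
    // calling the exported wasm entry point goes here
}

struct ExecTiming {
    preprocess: Duration, // wasm bytes -> instantiable module
    invoke: Duration,     // the wasm call itself
}

fn timed_exec(wasm_bytes: &[u8]) -> ExecTiming {
    let start = Instant::now();
    let module = preprocess(wasm_bytes);
    let preprocess = start.elapsed();

    let start = Instant::now();
    invoke(&module);
    let invoke = start.elapsed();

    ExecTiming { preprocess, invoke }
}
```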

The pipeline for the ModuleBytes case consists of:

  • Deserialize the module.
  • Modify the wasm module by performing validation steps, calling inject_gas_limiter, calling stack_height::inject_limiter, etc.
    • For the Wasmi case: create a wasmi::Module from the parity_wasm::elements::Module (cheap).
    • For the Wasmer case: serialize the instrumented parity_wasm::elements::Module (with gas calls, stack height, etc.) back to raw wasm, and parse that wasm again using wasmer::Module.
  • Caching is much simplified as a global "Cache<Bytes, Bytes>" object, where the key is the original bytes and the value is the preprocessed bytes (see the sketch below).
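
A hedged sketch of the Wasmer branch of this pipeline together with the simplified cache; `inject_gas_limiter` and `inject_stack_height_limiter` are stand-ins for the instrumentation passes above, and the plain Mutex<HashMap> stands in for the actual Cache type:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

type Bytes = Vec<u8>;

// Stand-ins for the instrumentation passes described above.
fn inject_gas_limiter(m: parity_wasm::elements::Module) -> parity_wasm::elements::Module { m }
fn inject_stack_height_limiter(m: parity_wasm::elements::Module) -> parity_wasm::elements::Module { m }

/// Preprocess raw wasm for the Wasmer backend, going through the global
/// "Cache<Bytes, Bytes>" (original bytes -> preprocessed bytes).
fn preprocess_wasmer(
    cache: &Mutex<HashMap<Bytes, Bytes>>,
    original: &[u8],
) -> anyhow::Result<Bytes> {
    if let Some(hit) = cache.lock().unwrap().get(original) {
        return Ok(hit.clone());
    }
    // Deserialize, instrument, then re-serialize so wasmer::Module can parse
    // the instrumented wasm again -- this round-trip is the extra cost noted
    // in the results below.
    let module: parity_wasm::elements::Module = parity_wasm::deserialize_buffer(original)?;
    let module = inject_stack_height_limiter(inject_gas_limiter(module));
    let preprocessed = parity_wasm::serialize(module)?;
    cache
        .lock()
        .unwrap()
        .insert(original.to_vec(), preprocessed.clone());
    Ok(preprocessed)
}
```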

Comparison of different backends

Comparison of results with the setup above:

  • Wasmi with caching is faster than wasmi without caching. This finding could potentially be taken outside this research and implemented to speed up the execution of repeatedly called deploys.

  • Wasmer + singlepass compiler with caching is faster than wasmi with caching

  • Wasmer without caching is slower than wasmi without caching, likely due to the preprocessing stage, as seen in the excerpt below:

    execution_results_interpreted.txt preprocess time 2.711912s
    execution_results_singlepass.txt preprocess time 11.105839s

    As noted above, the singlepass pipeline pays the extra cost of re-serializing the gas-injected parity_wasm::elements::Module.

Summary

We know the wasmer singlepass compiler with caching outperforms cached wasmi, but in the worst case (i.e. one-time-use wasm) it is slower, likely due to the expensive preprocess stage.

The angle I want to look at in #3575 is to check how much effort it would take to upgrade wasmi to the most recent version, and compare that. I also want to attempt to improve the preprocess time for wasmer by avoiding the extra wasm re-serialization cost, and instead look at https://docs.rs/wasmer-middlewares/latest/wasmer_middlewares/metering/struct.Metering.html, which not only doesn't require re-serialization but also keeps an internal global variable remaining_points rather than calling the host, which should be much faster when compiled to native code. A sketch of that approach follows.
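A minimal sketch of the Metering middleware approach, assuming a wasmer 3.x-style API; the flat per-operator cost function here is a placeholder, not a proposed gas schedule:

```rust
use std::sync::Arc;
use wasmer::{wasmparser::Operator, CompilerConfig, Store};
use wasmer_compiler_singlepass::Singlepass;
use wasmer_middlewares::Metering;

fn metered_store() -> Store {
    // Charge a flat cost of 1 per operator; real costs would come from the chainspec.
    let cost = |_operator: &Operator| -> u64 { 1 };
    let metering = Arc::new(Metering::new(1_000_000 /* initial points */, cost));

    let mut compiler = Singlepass::default();
    // The middleware instruments the module at compile time, so no wasm
    // re-serialization is needed, and the remaining points live in an internal
    // global instead of going through a host call on every charge.
    compiler.push_middleware(metering);
    Store::new(compiler)
}
```

If this pans out, wasmer_middlewares::metering::get_remaining_points / set_remaining_points can read and reset the internal remaining_points global between calls, which is how gas consumed would be reported back.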

My hope is to see similar gas costs for deploys while making the preprocess stage faster than wasmi's, achieving faster worst-case execution.

Raw results

# wasmi with caching vs wasmi
execution_results_interpreted.txt total wasm time 7.642741999999999s (7642742 micros)
execution_results_interpreted_cache.txt total wasm time 6.7128179999999995s (6712818 micros)
execution_results_interpreted_cache.txt is faster 1279 times
execution_results_interpreted.txt is faster 281 times
execution_results_interpreted.txt preprocess time 2.711912s
execution_results_interpreted_cache.txt preprocess time 1.8048719999999998s
execution_results_interpreted.txt invoke time 3.0501989999999997s
execution_results_interpreted_cache.txt invoke time 3.037845s
execution_results_interpreted.txt gas per sec highest 41073073.61702128
execution_results_interpreted.txt gas per sec lowest 0.0
execution_results_interpreted.txt gas per sec avg 3916004.117391211
execution_results_interpreted_cache.txt gas per sec highest 60326076.875
execution_results_interpreted_cache.txt gas per sec lowest 0.0
execution_results_interpreted_cache.txt gas per sec avg 4829253.206918649
# wasmi cache vs wasmer singlepass cache
execution_results_interpreted_cache.txt total wasm time 6.7128179999999995s (6712818 micros)
execution_results_singlepass_cache.txt total wasm time 4.366896s (4366896 micros)
execution_results_singlepass_cache.txt is faster 1411 times
execution_results_interpreted_cache.txt is faster 149
execution_results_interpreted_cache.txt preprocess time 1.8048719999999998s
execution_results_singlepass_cache.txt preprocess time 1.3111949999999999s
execution_results_interpreted_cache.txt invoke time 3.037845s
execution_results_singlepass_cache.txt invoke time 1.443193s
execution_results_interpreted_cache.txt gas per sec highest 60326076.875
execution_results_interpreted_cache.txt gas per sec lowest 0.0
execution_results_interpreted_cache.txt gas per sec avg 4829253.206918649
execution_results_singlepass_cache.txt gas per sec highest 27721350.258064516
execution_results_singlepass_cache.txt gas per sec lowest 0.0
execution_results_singlepass_cache.txt gas per sec avg 8545761.959139578
# wasmi no cache vs wasmer no cache
execution_results_interpreted.txt total wasm time 7.642741999999999s (7642742 micros)
execution_results_singlepass.txt total wasm time 17.622844999999998s (17622845 micros)
execution_results_singlepass.txt is faster 43 times
execution_results_interpreted.txt is faster 1518
execution_results_interpreted.txt preprocess time 2.711912s
execution_results_singlepass.txt preprocess time 11.105839s
execution_results_interpreted.txt invoke time 3.0501989999999997s
execution_results_singlepass.txt invoke time 3.0186859999999998s
execution_results_interpreted.txt gas per sec highest 41073073.61702128
execution_results_interpreted.txt gas per sec lowest 0.0
execution_results_interpreted.txt gas per sec avg 3916004.117391211
execution_results_singlepass.txt gas per sec highest 22406249.37789748
execution_results_singlepass.txt gas per sec lowest 0.0
execution_results_singlepass.txt gas per sec avg 1578317.7440372033
