Don't copy `VMBuiltinFunctionsArray` into each `VMContext` #3741
Conversation
This is another PR along the lines of "let's squeeze all possible performance we can out of instantiation". Before this PR we would copy, by value, the contents of `VMBuiltinFunctionsArray` into each `VMContext` allocated. This array of function pointers is modestly sized but growing over time as we add various intrinsics. Additionally it's the exact same for all `VMContext` allocations.

This PR attempts to speed up instantiation slightly by instead storing an indirection to the function array. This means that calling a builtin intrinsic is a tad slower, since it requires two loads instead of one (one to get the base pointer, another to get the actual address). Otherwise, though, `VMContext` initialization is now simply setting one pointer instead of doing a `memcpy` from one location to another.

With some macro magic this commit also replaces the previous implementation with one that's more `const`-friendly, which also gets us compile-time type-checks of libcalls as well as compile-time verification that all libcalls are defined.

Overall, as with bytecodealliance#3739, the win is very modest here. Locally I measured a speedup from 1.9us to 1.7us to instantiate an empty module with one function. While small at these scales, it's still a 10% improvement!
Nice!
crates/runtime/src/vmcontext.rs (Outdated)

```rust
for i in 0..ptrs.len() {
    debug_assert!(ptrs[i] != 0, "index {} is not initialized", i);
```

```rust
impl VMBuiltinFunctionsArray {
    pub fn new() -> &'static Self {
```
Rather than `new()`, can we call this `static_instance()` or something like that? As written it was somewhat concerning to read `VMBuiltinFunctionsArray::new()` in the initialization path -- it looked like an allocation of an owned thing.
Aha, looks like I can do one better and have `const INIT: VMBuiltinFunctionsArray = ...`!
```diff
     vmctx: *mut VMContext,
     delta: u64,
     memory_index: u32,
-) -> usize {
+) -> *mut u8 {
```
This function doesn't return an actual pointer, right?
Correct, this is just how the type-checking with the macro worked out.