Add shared memories #4187
Conversation
Force-pushed from d58ae3e to d74a422
Force-pushed from d74a422 to e9c4f5f
Label bot (cc @fitzgen, @peterhuene): this pull request has been labeled "cranelift", "cranelift:module", "wasmtime:api", "wasmtime:ref-types".
Force-pushed from f2ffbee to 5b765d5
I think this is some good progress! I realize this might already be getting to be somewhat of an unwieldy PR, so if you'd like, I think various things below can also be deferred to later. If you'd prefer to do that, I think it would be good to make an issue with the remaining work items and to make sure everything is reflected in it before landing this.

One other issue not mentioned below, though, that I want to be sure not to forget: all of Wasmtime's interactions with shared memories through intrinsics like `memory.init` are going to need to be updated for shared memories. I think in general Wasmtime needs to be very carefully audited for the rampant usage of `VMMemoryDefinition` and have that all locked down through some different abstraction in the case of shared or non-shared memory. I think a static distinction at the top level with `Memory` and `SharedMemory` will help with this (I commented below, but I think we may actually want to prevent a `SharedMemory` from being viewed as a `Memory`), but we'll still want to audit internals for implementations like `memory.init`, `memory.fill`, `memory.copy`, etc.
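To illustrate the kind of static distinction being suggested, here is a minimal, hypothetical Rust sketch (the `RuntimeMemory` enum and its fields are invented for illustration, not Wasmtime's actual types): internals that must match on the variant cannot accidentally treat a shared memory as an owned `VMMemoryDefinition`.

```rust
/// Simplified stand-in for the per-memory runtime metadata.
#[derive(Debug, Clone, Copy)]
struct VMMemoryDefinition {
    base: usize,
    current_length: usize,
}

/// Shared across threads; in a real implementation access would go through
/// an `Arc` plus synchronization, which this sketch elides.
struct SharedMemory {
    def: VMMemoryDefinition,
}

/// A top-level static distinction: code paths like `memory.fill` must
/// handle the shared case explicitly instead of reading raw fields.
enum RuntimeMemory {
    Owned(VMMemoryDefinition),
    Shared(SharedMemory),
}

impl RuntimeMemory {
    fn current_length(&self) -> usize {
        match self {
            // Owned by one instance: a plain read is fine.
            RuntimeMemory::Owned(def) => def.current_length,
            // Shared: the match forces callers to acknowledge this case.
            RuntimeMemory::Shared(s) => s.def.current_length,
        }
    }
}

fn main() {
    let owned = RuntimeMemory::Owned(VMMemoryDefinition { base: 0, current_length: 65536 });
    let shared = RuntimeMemory::Shared(SharedMemory {
        def: VMMemoryDefinition { base: 0, current_length: 131072 },
    });
    println!("{} {}", owned.current_length(), shared.current_length());
}
```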
Force-pushed from c6410bc to ec08f2c
Label bot (cc @peterhuene): this pull request has been labeled "wasmtime:c-api".
Force-pushed from 5aa957b to 15e23fb
These are some pieces I saw going through, but I think I'll want to do one more final pass this week before landing. Overall I think this is looking quite good and is basically ready to land; I've mostly just got some little things here and there.
```diff
 /// Return the indexed `VMMemoryDefinition`.
 fn memory(&self, index: DefinedMemoryIndex) -> VMMemoryDefinition {
-    unsafe { *self.memory_ptr(index) }
+    unsafe { VMMemoryDefinition::load(self.memory_ptr(index)) }
 }

 /// Set the indexed memory to `VMMemoryDefinition`.
```
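As a rough illustration of why a dedicated `load` replaces the plain dereference in the diff above, here is a hedged sketch (types and fields are simplified and hypothetical): the length field is read with an atomic load so that a concurrent `memory.grow` on another thread does not race with a non-atomic struct copy.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Simplified stand-in: the length may be concurrently updated by another
/// thread growing a shared memory, so it is stored atomically.
struct VMMemoryDefinition {
    base: *mut u8,
    current_length: AtomicUsize,
}

/// A by-value snapshot of the definition, safe to hand around.
struct VMMemoryDefinitionSnapshot {
    base: *mut u8,
    current_length: usize,
}

impl VMMemoryDefinition {
    /// Read each field individually; the length uses an atomic load so the
    /// read is well-defined even while another thread is growing the memory.
    unsafe fn load(ptr: *const VMMemoryDefinition) -> VMMemoryDefinitionSnapshot {
        let def = &*ptr;
        VMMemoryDefinitionSnapshot {
            base: def.base,
            current_length: def.current_length.load(Ordering::SeqCst),
        }
    }
}

fn main() {
    let def = VMMemoryDefinition {
        base: std::ptr::null_mut(),
        current_length: AtomicUsize::new(65536),
    };
    let snap = unsafe { VMMemoryDefinition::load(&def) };
    println!("{}", snap.current_length);
}
```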
Reading over this again, could this function perhaps be changed to take an `OwnedMemoryIndex`, since those are the only ones we should ever set?
I don't see it. `Instance::memory` is called by `Instance::get_memory`, which is used by `memory_copy`, `memory_fill`, etc. That functionality should still be present for shared memory, I think.
Sorry, I couldn't place this comment well because the PR doesn't actually cover the relevant line of code; I mean the `set_memory` function just below this comment.
Ok, see my comment here: it's tricky to re-calculate the owned index for every `memory.grow`.
I made a general pass over this and added a few questions below. I'll defer to @alexcrichton for final approval, but I think addressing some of the things below would help clarify and boost confidence a bit. Thanks!
crates/runtime/src/vmcontext.rs (outdated)
```rust
/// Return the current length of the [`VMMemoryDefinition`] by performing a
/// relaxed load. Concurrent growth of shared memory may alter the value of
/// this field in unexpected ways: do not use this function for situations
/// in which a precise length is needed.
```
This doc-comment seems a little unclear to me: if the result isn't exactly precise at all times, how might it differ? Is it always smaller or equal (because memory can only grow, but the value may be outdated)? Or something else? It seems we need a stronger statement here to rely on it at all.

I also wonder if this is necessary vs. a variant that uses `Ordering::Acquire` and makes a stronger claim? Or, in other words, can we justify why a "fast and imprecise" variant is needed?
Actually, that raises another question for me about atomic orderings, which sort of furthers my concerns about using `ptr::copy` for `memory.copy` and friends. If one thread witnesses that the shared memory has a particular size, I don't know what the guarantees are about witnessing other parts of memory in terms of memory orderings. For example, should this be `Acquire`? Should this be `SeqCst`? Do wasm modules otherwise need to `atomic.fence` before calling host functionality, or something like that?

I personally know very little about the atomic memory model for wasm and how it's all intended to work out, so there may not be a great answer here. That being said, it's almost certainly not wrong to fall back to `SeqCst`, so I might say we should do that to be maximally conservative?
I believe the reason @alexcrichton suggested this relaxed function was for all the cases where the length is retrieved but no synchronization is necessary (i.e., anything related to owned memory). I could certainly make this `SeqCst`, but that doesn't totally make sense for all the owned memory cases. Since, as @cfallin mentioned, `current_length()` can only differ by appearing smaller than what another thread sees, I will update the documentation comment accordingly. This function should be precise for any owned memories (regardless of what ordering is used) and will be safely imprecise, but under-estimating, for shared memories (at least until the spec changes to allow memory shrinking in some way).
One thing to consider here is the effect on the synchronization of this precise field, the current length, but there are also the effects of whether we can see the rest of the concurrent memory modifications or such. For example, if we read the current length at a particular value, what does that mean for reading other pieces of memory?

I'm not sure what the answer is, but I'm pretty sure it's affected by the memory orderings here. I don't know what the desired memory orderings are, so I would personally use `SeqCst` everywhere because I know it's not wrong. Then again, `SeqCst` can also be buggy because it provides too strong of an ordering, which means we can't relax it in the future when it turns out we're not supposed to have such a strong ordering.

Anyway, these are also theoretical concerns of mine and don't necessarily need any changes at this time.
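The trade-off being discussed can be sketched as follows (the `MemoryLength` wrapper is hypothetical; the real field lives inside `VMMemoryDefinition`): both orderings return the same value in a single-threaded test, but only the `SeqCst` variant participates in a total order with other `SeqCst` operations, while `Relaxed` guarantees only atomicity of the load itself and establishes no happens-before edge with a growing thread's other writes.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the shared memory's length field.
struct MemoryLength(AtomicUsize);

impl MemoryLength {
    /// Fast, imprecise read: may under-report for shared memories (growth
    /// only, so it never over-reports) and synchronizes nothing else.
    fn byte_size_relaxed(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }

    /// Conservative read: `SeqCst` is "not wrong", at the cost of making it
    /// harder to relax later if a weaker ordering turns out to suffice.
    fn byte_size_seqcst(&self) -> usize {
        self.0.load(Ordering::SeqCst)
    }
}

fn main() {
    let len = MemoryLength(AtomicUsize::new(65536));
    println!("{} {}", len.byte_size_relaxed(), len.byte_size_seqcst());
}
```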
```rust
// VMMemoryDefinition` to it and dereference that when
// atomically growing it.
let from_offset = self.offsets.vmctx_vmmemory_pointer(def_index);
let memory = func.create_global_value(ir::GlobalValueData::Load {
```
I think in the sense of "make sure we don't feed `current_length` to the heap-access machinery in Cranelift" safety, we could maybe make `current_length_offset` an `Option` here and return `None` for the shared case. Then, below, wherever we use it, unwrap it with "expected current-length field (always present on non-shared memories)" or something of the sort as an `expect` error?
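A minimal sketch of the suggested shape (the function signature and the offset value are made up for illustration): returning `None` for shared memories makes any code path that would feed the length offset to the heap-access machinery fail loudly instead of silently miscompiling.

```rust
/// Hypothetical: `None` for shared memories, so downstream code that
/// assumes a current-length field must unwrap explicitly.
fn current_length_offset(shared: bool) -> Option<u32> {
    if shared {
        None
    } else {
        Some(8) // 8 is a made-up field offset, for illustration only
    }
}

fn main() {
    // Non-shared case: the unwrap documents the invariant as an error string.
    let offset = current_length_offset(false)
        .expect("expected current-length field (always present on non-shared memories)");
    println!("{offset}");
}
```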
This change adds the ability to use shared memories in Wasmtime when the [threads proposal] is enabled. Shared memories are annotated as `shared` in the WebAssembly syntax, e.g., `(memory 1 1 shared)`, and are protected from concurrent access during `memory.size` and `memory.grow`.

[threads proposal]: https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md

In order to implement this in Wasmtime, there are two main cases to cover:

- a program may simply create a shared memory and possibly export it; this means that Wasmtime itself must be able to create shared memories
- a user may create a shared memory externally and pass it in as an import during instantiation; this is the case when the program contains code like `(import "env" "memory" (memory 1 1 shared))`; this case is handled by a new Wasmtime API type, `SharedMemory`

Because of the first case, this change allows any of the current memory-creation mechanisms to work as-is. Wasmtime can still create either static or dynamic memories in either on-demand or pooling modes, and any of these memories can be considered shared. When shared, the `Memory` runtime container will lock appropriately during `memory.size` and `memory.grow` operations; since all memories use this container, it is an ideal place for implementing the locking once and once only.

The second case is covered by the new `SharedMemory` structure. It uses the same `Mmap` allocation under the hood as non-shared memories, but allows the user to perform the allocation externally to Wasmtime and share the memory across threads (via an `Arc`). The pointer address to the actual memory is carefully wired through and owned by the `SharedMemory` structure itself.

This means that there are differing views of where to access the pointer (i.e., `VMMemoryDefinition`): for owned memories (the default), the `VMMemoryDefinition` is stored directly by the `VMContext`; in the `SharedMemory` case, however, the `VMContext` must point to this separate structure.

To ensure that the `VMContext` can always point to the correct `VMMemoryDefinition`, this change alters the `VMContext` structure. Since a `SharedMemory` owns its own `VMMemoryDefinition`, the `defined_memories` table in the `VMContext` becomes a sequence of pointers: in the shared memory case, they point to the `VMMemoryDefinition` owned by the `SharedMemory`, and in the owned memory case (i.e., not shared) they point to `VMMemoryDefinition`s stored in a new table, `owned_memories`.

This change adds an additional indirection (through the `*mut VMMemoryDefinition` pointer) that could add overhead. Using an imported memory as a proxy, we measured a 1-3% overhead of this approach on the `pulldown-cmark` benchmark. To avoid this, Cranelift-generated code will special-case the owned memory access (i.e., load a pointer directly to the `owned_memories` entry) for `memory.size` so that only shared memories (and imported memories, as before) incur the indirection cost.
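The layout change described in the commit message can be sketched roughly as follows (`VMContextSketch` and its fields are a hypothetical simplification, not Wasmtime's real `VMContext`): every defined memory is reached through a pointer, and owned memories store their actual definitions in a separate table.

```rust
/// Simplified stand-in for the per-memory runtime metadata.
struct VMMemoryDefinition {
    base: *mut u8,
    current_length: usize,
}

struct VMContextSketch {
    /// One pointer per defined memory. For shared memories this points at
    /// the definition owned by the `SharedMemory` structure; for owned
    /// memories it points into `owned_memories`. This extra hop is the
    /// indirection the commit message measures at 1-3% overhead.
    defined_memories: Vec<*mut VMMemoryDefinition>,
    /// Storage for the definitions of non-shared (owned) memories.
    owned_memories: Vec<Box<VMMemoryDefinition>>,
}

fn main() {
    let mut ctx = VMContextSketch {
        defined_memories: Vec::new(),
        owned_memories: Vec::new(),
    };
    // Register one owned memory: the definition lives in `owned_memories`,
    // and `defined_memories` holds a pointer to it (Box keeps it stable).
    let mut owned = Box::new(VMMemoryDefinition {
        base: std::ptr::null_mut(),
        current_length: 65536,
    });
    let ptr: *mut VMMemoryDefinition = &mut *owned;
    ctx.owned_memories.push(owned);
    ctx.defined_memories.push(ptr);

    // Generated code would follow the pointer to read the length.
    let len = unsafe { (*ctx.defined_memories[0]).current_length };
    println!("{len}");
}
```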
Force-pushed from 5975db1 to 14ce663
Ok thanks again for doing all this @abrown and being patient with reviews! I'm going to approve and merge this now. Further work is tracked on #4245 but I think this is a solid enough base to build on which should be workable enough for the next level of experimentation and feedback on the implementation.