(I'm a little late to the party here and not proposing immediate work on this, but I thought it'd still be useful to have an issue to collect discussion and use cases to potentially motivate work on this later.)
It might be a good idea, at least initially, to consider a limited version of the mappable sub-proposal that preserves full portability of wasm across a wide variety of hardware and OSes, similar to how the static-protection sub-proposal extracted a subset of the virtual sub-proposal that can be cheaply implemented without any hardware/OS support.
Specifically, WAMR has a really interesting "shared heaps" feature (1, 2, 3) that I think could serve as a guide for defining fully-portable mapping functionality. The links have more details, but the basic idea is:
- allow memories to opt in to having a dynamically-mappable region. As a wasm proposal, this opt-in could happen on the `memtype` (e.g., `(memory mappable ...)`)
- when `mappable` is set, a `limit` must also be set to something less than 2gb; the mappings added by `memory.map` (however defined) are added above the `limit`
- allow only a very-small finite (maybe just 1?) number of mappings in a particular memory at any one time
- in the same internal "vm context" structure that currently stores the linear memory (base, length) pair, add a (base, length) pair for each possible mapping (giving unused mappings a length of 0)
- compiled loads/stores first branch on the high bit, with the unset path using the regular linear memory (base, length) and the set path falling back to the mappable (base, length) pairs
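To make the scheme concrete, here's a minimal sketch of what that address translation could look like. All names (`vm_context`, `translate`, the single-slot `MAX_MAPPINGS`) are illustrative assumptions, not taken from WAMR or any real engine; a compiler would inline this logic rather than call a function.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical "vm context" layout: the regular linear memory's
 * (base, length) pair plus one (base, length) pair per possible
 * mapping slot; an unused slot has length 0. */
#define MAX_MAPPINGS 1

typedef struct {
    uint8_t *mem_base;
    uint32_t mem_length;                  /* the limit; < 2 GiB when mappable */
    uint8_t *map_base[MAX_MAPPINGS];
    uint32_t map_length[MAX_MAPPINGS];
} vm_context;

/* Per-access translation: branch on the high bit. The unset path uses
 * the regular linear memory; the set path falls back to the mapping
 * slot(s). Returns NULL where a real engine would trap out-of-bounds. */
static uint8_t *translate(vm_context *ctx, uint32_t addr) {
    if ((addr & 0x80000000u) == 0) {
        /* Regular linear-memory path. */
        if (addr >= ctx->mem_length) return NULL;
        return ctx->mem_base + addr;
    }
    /* Mapped-region path: the low 31 bits index into the mappings. */
    uint32_t off = addr & 0x7FFFFFFFu;
    for (int i = 0; i < MAX_MAPPINGS; i++) {
        if (off < ctx->map_length[i]) return ctx->map_base[i] + off;
        off -= ctx->map_length[i];
    }
    return NULL;
}
```

With `MAX_MAPPINGS` at 1 the set-bit path is a single bounds check, so the whole thing compiles down to the if/else diamond described below.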
In the worst case, the inline code is an if/else diamond, but it seems like it could be optimized in various ways, including folding the branch into any existing branching needed for memory64 or when full-sized guard pages aren't used. One can also imagine various HW tricks. Unfortunately I don't have specific performance data to cite here; if anyone from WAMR is able to bring any such measurements that would be much appreciated!
One really nice thing about this design is that it allows memory.map/memory.unmap to be really fast (inline loads/stores from the "vm context") instead of syscalls that require TLB flushes. That means that if a wasm module has multiple buffers to work on, it could quickly switch as needed without worrying about the switching cost. This approach also alleviates the systemic performance problems that otherwise arise from mmap and other virtual-memory-modifying syscalls due to "TLB shootdowns".
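Under that implementation strategy, `memory.map`/`memory.unmap` themselves reduce to plain stores into the vm context. A sketch, again with illustrative names (the `vm_context` struct here just restates the (base, length)-pairs layout described above; slot numbering and the exact semantics of `memory.map` are assumptions):

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define MAX_MAPPINGS 1

/* Same hypothetical vm-context layout as described in the text:
 * regular linear memory plus per-slot mapping (base, length) pairs. */
typedef struct {
    uint8_t *mem_base;
    uint32_t mem_length;
    uint8_t *map_base[MAX_MAPPINGS];
    uint32_t map_length[MAX_MAPPINGS];
} vm_context;

/* "memory.map": just two stores -- no mmap, no TLB flush. Fails if
 * the slot is already in use. */
static bool memory_map(vm_context *ctx, int slot,
                       uint8_t *host_buf, uint32_t len) {
    if (slot < 0 || slot >= MAX_MAPPINGS || ctx->map_length[slot] != 0)
        return false;
    ctx->map_base[slot] = host_buf;
    ctx->map_length[slot] = len;
    return true;
}

/* "memory.unmap": length 0 marks the slot unused again. */
static void memory_unmap(vm_context *ctx, int slot) {
    ctx->map_base[slot] = NULL;
    ctx->map_length[slot] = 0;
}
```

Because no virtual-memory state changes, switching a memory between host buffers is just this pair of calls per switch, which is what makes rapid remapping across multiple buffers cheap.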
An immediate problem this could solve in the browser is the age-old question of how to efficiently access an ArrayBuffer from wasm. And since many Web APIs (e.g., Web Audio) churn out new ArrayBuffers at a high frequency, being able to `memory.{map,unmap}` without an expensive syscall seems ideal. [Edit, added:] The key enabler here is that the above implementation strategy can map any memory at any address, without the memory having to have been specially prepared, e.g. by being backed by a file descriptor.
I realize that this can't solve every use case, but given the importance of portability to wasm, this seems like an attractive compromise worth considering, at least as a first step.
Thoughts?