Threads + memory growth + JS #1271
@lukewagner @littledan you expressed some interest in this. I'd be curious to see some quick numbers about how slow it is to create a new view for every memory access, and about calling into wasm for each memory access too.
Not knowing enough about the total design space, I really like having typed loads on the memory object itself.
@jgravelle-google Seems unlikely that ES would want that, since this functionality is already provided by …
This proposal sounds good to me, at a high level. This would be the first time that WebAssembly text format instruction names would show up in JavaScript, AFAIK. Bikeshedding on the load and store API: I'm wondering, what if we gave …

I don't understand the view API. Is this about making a TypedArray over the WebAssembly.Memory? Would this TypedArray be expected to remain "live" as memory grows?
Oh, that's an interesting point. WebAssembly is always little-endian, but TypedArrays are platform byte order. So using normal views is already going to be incorrect behavior.
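For illustration, the load direction could be polyfilled today with a DataView, which takes an explicit littleEndian flag. This is a hedged sketch; loadUint32LE is a hypothetical name, not part of the proposal or any shipped API:

```javascript
// Hypothetical sketch of a little-endian load over a WebAssembly.Memory,
// polyfilled with a DataView. The name loadUint32LE is illustrative only.
function loadUint32LE(memory, addr) {
  // DataView lets us request little-endian order explicitly, matching wasm
  // semantics; a plain TypedArray would use the platform's byte order.
  return new DataView(memory.buffer).getUint32(addr, /* littleEndian */ true);
}

const mem = new WebAssembly.Memory({ initial: 1 });
new Uint8Array(mem.buffer).set([0x78, 0x56, 0x34, 0x12], 0);
console.log(loadUint32LE(mem, 0).toString(16)); // "12345678"
```

Note that this allocates a DataView per call; a real implementation would cache the view and deal with the growth problem discussed below.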
I think they're meant to be normal TypedArrays, so their size won't change.
Yeah, good point, the view API here is less useful. I forgot about the typed array constructor with 3 parameters - and even without that it's not a huge win, I guess. It does seem kind of nice for completeness though.
Yeah, the idea was it would remain live exactly as typed arrays already do (no new features there).
Most overhead would probably be from the GC overhead of creating a new object on each load and store. It's hard to measure that directly since the overhead ends up in GC pauses, is affected by GC optimizations, etc. But this is the exact issue that led to the creation of GC-free entry points for WebGL in WebGL2, which are very useful. (It's potentially even more significant here, since there it saves creating a view per GL call, while here it could be a view per load and store.)
It is an unfortunate problem to work around. A couple of naive questions:
…
Oh, I didn't realize this. That makes the DataView-analogous API a reasonable interface on the JS side (as long as we're OK with the flip in endianness).
OK, in this case, I don't really understand the purpose of this part of the proposal--this seems like a way to get around a single access to the ArrayBuffer, but it still forces you to allocate a new TypedArray. It's easier to understand why you would want the load/store methods (since that's a bit of indirection and allocation avoided, and totally solves the growth synchronization problem).
You could do this, but it would only help if you make sure you return to the event loop.
Yes, that seems possible. But it's not trivial -- ultimately you want these to optimize down to a single (perhaps bounds-checked) load, but at least for v8 there's a lot of stuff the optimizer would have to remove to make that happen.
Ha, I see! The default for DataView is big-endian. Might be a bit confusing if we reuse the naming convention there, but it seems like a small risk to me.
It would also be interesting to see data comparing a non-growable-memory app (that just uses unchanging cached views) vs. a growable-memory version of the same app that polyfills the abovementioned …

In the common case, the buffer doesn't change, and so the overhead is just that of calling the …
Sure, here are some numbers. The polyfill and benchmark files are in this gist. The benchmark calls emscripten's AsciiToString function repeatedly, which is a function that does many memory reads, so in practice we replace the line
with
The latter is 2.9x slower on v8. As Luke requested, this app doesn't actually grow memory, so a new view is never created - the overhead comes from the function call and the check.
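For concreteness, the growth check being benchmarked can be sketched like this. The name GROWABLE_HEAP_LOAD_I8 follows Emscripten's naming convention, but this is an illustrative sketch, not the actual emitted code:

```javascript
// Illustrative sketch of a growth-checking load (not Emscripten's actual code).
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
let HEAPU8 = new Uint8Array(memory.buffer);

function GROWABLE_HEAP_LOAD_I8(ptr) {
  // After memory.grow(), the old ArrayBuffer is detached and memory.buffer
  // returns a fresh object, so an identity check detects growth.
  if (HEAPU8.buffer !== memory.buffer) {
    HEAPU8 = new Uint8Array(memory.buffer);
  }
  return HEAPU8[ptr];
}

HEAPU8[0] = 42;
memory.grow(1); // detaches the old buffer; memory contents are preserved
console.log(GROWABLE_HEAP_LOAD_I8(0)); // 42
```

Even when memory never grows, every access pays the comparison and the function call, which is where the measured overhead comes from.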
Thanks Alon, that's really helpful. So I think the next step would be to see what the performance was like if the …
By inlined, do you mean something like
?
With …
it's about the same speed, so, still 2.9x slower. (Which I think makes sense, as I assume it inlines eventually in such a hot loop.)
Oh, no, I mean the engine-implemented C++ accessor function for …
Related: #1210, which proposes an event handler.
Couldn't the event be dispatched synchronously?
Unfortunately, in the shared-memory case, the grow can happen completely racily (due to grow in another thread). |
(in wasm - no support in asm.js) Background: WebAssembly/design#1271

* Add GROWABLE_HEAP_* methods that replace each load and store in JS. These safely check if memory grew, and update the buffer and views if so. The JS optimizer is used to instrument the JS in this way, so it works for user JS as well.
* Show a warning about the slowness during compilation.
* Make sbrk etc. threadsafe.
* Rewrite brk using sbrk, to avoid writing two pieces of threadsafe code.
Do TypedArray .length accessors get inlined? I get much better performance if I build …

i.e., only check whether the buffer needs to grow if the program actually attempts to access out-of-bounds memory. I tested on Emscripten 1.38.31, building with:

On my system (with Chrome 74) it took about 8 seconds with the existing implementation of …
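The variant described above moves the growth check off the common path: refresh the view only when an access appears out of bounds. A rough sketch with hypothetical names; note that with non-shared memory a stale view is detached after grow(), so its length reads as 0 and any access falls into the slow path:

```javascript
// Sketch of the "check only on apparent out-of-bounds" variant (hypothetical).
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
let HEAPU8 = new Uint8Array(memory.buffer);

function GROWABLE_HEAP_LOAD_U8(ptr) {
  if (ptr >= HEAPU8.length) {                // fast path: one length compare
    HEAPU8 = new Uint8Array(memory.buffer);  // slow path: rebuild the view
  }
  return HEAPU8[ptr];
}

HEAPU8[10] = 7;
memory.grow(1); // the old view becomes detached (length 0), forcing a refresh
console.log(GROWABLE_HEAP_LOAD_U8(10));        // 7
console.log(GROWABLE_HEAP_LOAD_U8(65536 + 5)); // 0 (newly grown pages are zeroed)
```

The attraction is that a bare length comparison is the kind of check engines already emit for TypedArray accesses, so it has a better chance of being folded away than a buffer-identity check plus function call.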
emcc:WARNING: ignoring unsupported linker flag: when compiling opencv.js
Suppress the link-time warning emitted by Emscripten due to our usage of multi-threading together with dynamic memory growth:

warning: USE_PTHREADS + ALLOW_MEMORY_GROWTH may run non-wasm code slowly, see WebAssembly/design#1271

We took this warning into consideration and decided to trade the performance in favor of lower memory consumption. (In reality, the performance loss is likely small, since a big chunk of the run time is spent on interprocess communication.) So reduce the build log spam by silencing this warning, which was emitted many times - once for every target binary or static library. This change is a small improvement to the WebAssembly build support that is tracked by #177.
@kripken I was playing around with your benchmark, and I think I came up with an option to reduce the overhead. Instead of always relying on the array views, have a native function to call into which can cast the memory address; since this is on the wasm side, there is no need to check if the underlying buffer has changed. Here is the new benchmark.cpp:

#include <iostream>
#include <string>
#include <chrono>
#include <emscripten.h>
extern "C" {
EMSCRIPTEN_KEEPALIVE uint8_t view_uint8(void* address) { return reinterpret_cast<uint8_t*>(address)[0]; }
}
int main() {
auto start = std::chrono::steady_clock::now();
size_t total = 0;
for (const char* str : { "hello", "world", "!", "test", "ing", "strings", "many", "of", "them" }) {
total += EM_ASM_INT({
var total = 0;
var value;
for (var i = 0; i < 1000000; ++i) {
value = AsciiToString($0);
total += value.length;
}
console.log(value);
return total;
}, str);
}
auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now() - start).count() ;
std::cout << total << " in " << ms << " ms " << total/ms << " /ms \n";
}

NO_ALLOW_MEMORY_GROWTH:
~/emscripten/benchmark ((3.1.7))# em++ -Os -pthread -sNO_ALLOW_MEMORY_GROWTH -o benchmark.js benchmark.cpp
~/emscripten/benchmark ((3.1.7))# node --experimental-wasm-threads --experimental-wasm-bulk-memory benchmark.js
hello
world
!
test
ing
strings
many
of
them
35000000 in 293 ms 119453 /ms

ALLOW_MEMORY_GROWTH:
~/emscripten/benchmark ((3.1.7))# em++ -Os -pthread -sALLOW_MEMORY_GROWTH -o benchmark.js benchmark.cpp
em++: warning: USE_PTHREADS + ALLOW_MEMORY_GROWTH may run non-wasm code slowly, see https://github.com/WebAssembly/design/issues/1271 [-Wpthreads-mem-growth]
~/emscripten/benchmark ((3.1.7))# node --experimental-wasm-threads --experimental-wasm-bulk-memory benchmark.js
hello
world
!
test
ing
strings
many
of
them
35000000 in 1328 ms 26355 /ms

Swapping out AsciiToString with:

function AsciiToString(ptr){
var str="";
while(1) {
var ch=Module['_view_uint8'](ptr++>>0) >>> 0;
if(!ch) return str;
str+=String.fromCharCode(ch)
}
}

Results!
~/emscripten/benchmark ((3.1.7))# node --experimental-wasm-threads --experimental-wasm-bulk-memory benchmark.js
hello
world
!
test
ing
strings
many
of
them
35000000 in 408 ms 85784 /ms
@arsnyder16 Interesting, thanks. I'm impressed and surprised that calling into wasm is faster here. I would have guessed otherwise. But I guess optimizations on the JS side are hard to predict...
Unify the settings for single-threaded and multi-threaded builds; Qt now always enables heap growth by default. This means we don't have to reserve a large (1GB) fixed memory size, but can instead set the smaller (50 MB) initial memory size, like the single-threaded build does. Enabling threads + memory growth can potentially cause a performance regression when accessing heap memory from JavaScript (WebAssembly/design#1271). We leave it for the application to decide if this applies, and if the switch to fixed memory should be made.

Change-Id: I96988b072506456685086e55aca4007a146bd70f
Reviewed-by: Alexandru Croitor <alexandru.croitor@qt.io>
Reviewed-by: Mikołaj Boc <Mikolaj.Boc@qt.io>
Reviewed-by: Lorn Potter <lorn.potter@gmail.com>
@kripken Any progress on this issue? If it seems like this might not be addressed in the short term, I am wondering if my …
@arsnyder16 Let's move that topic to an issue on the emscripten tracker? (As this is the wasm design repo, and your idea is an implementation detail.) But in general it sounds interesting, if we see clear performance benefits.
@kripken Will do.
Hey! Is there any update on this? :) |
I hope it's ok to ask for current status every 6 months or so :) |
I think this is resolved by RAB/GSAB (https://github.com/tc39/proposal-resizablearraybuffer).
In the last meeting we briefly discussed this, and it was suggested that I write up more details of the problem and some possible solutions. I did that here:
https://gist.github.com/kripken/949eab99b7bc34f67c12140814d2b595
Feedback welcome!