
Breaking up large (20 MB+) .wasm files #1166

Open
camwest opened this issue Dec 14, 2017 · 23 comments

@camwest

camwest commented Dec 14, 2017

Hi folks,

I was asked by @jfbastien to post here based on a twitter conversation we had: https://twitter.com/jfbastien/status/941170112014327808

AutoCAD web is a flavor of AutoCAD which runs entirely in the browser thanks to WebAssembly. We took our core engine, removed everything we could to get it to the smallest possible size. The resulting wasm file is currently 29.6 MB. It's now in beta if you want to try it out: http://client.autocad360.com/

Problem

Our team has been working on this for a while now, and we expect many different users with slow internet connections to use the application. We want to optimize for first-time use as much as possible. We also realize that this represents one of the largest web applications out there.

Ideally, we'd break up the wasm file into smaller chunks, where the first chunks downloaded would only represent the minimum code necessary to display graphics, show the cursor, and let the user zoom and pan around while the commands and other modules are lazy loaded.

We don't currently have a good strategy for defining split points. The desktop variant of the product uses a virtual memory manager (VMM) and profile-guided optimization (PGO) to optimize startup time, and our hunch is that this is a good strategy for partitioning our code into "hot" and "cold" chunks. Manually defining split points would mean redesigning the core engine, which isn't feasible for us.

Tool based Solution

We think we can solve this problem by investing in internal tooling that extends PGO to emit two wasm files: a hot wasm file containing stubs which, when called, block until the cold wasm file is downloaded and started. Managing two wasm files in JavaScript feels like a pretty nasty hack since the boundary between the hot and cold wasm files would most likely involve crossing from wasm to js to wasm again.

Browser based Solution

It would be even better if we could work with the browser vendors and solve the problem by extending the design of WebAssembly to support this use case. I expect lots of larger applications would be able to take advantage of it.

For example, what if we could pipeline a PGO optimized wasm file so that it was downloading, compiling, instantiating, and executing in a streaming process. The process could also raise events when a cold stub was hit allowing us to design an experience around startup.

@camwest

camwest commented Dec 14, 2017

@szilvaa feel free to chip in if I missed anything.

@kmiller68

This certainly seems like a use case we should have an answer for. I don't have any outstanding starting points ATM for a proposal, unfortunately. I do have one comment on your "hot" to "cold" bridge paying the cost of Wasm->JS->Wasm every time, however.

I think if you made the bridge calls into indirect calls from a Table then you would only have to do a Wasm -> JS call once and it could stub its entry in the table with the "cold" Wasm function. At least in JSC, although I expect all engines do this, if we see a Wasm -> Wasm indirect call we will bypass the JS entrypoint code. Thus, only the first "cold" call will pay the Wasm->JS->Wasm boundary cost.
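The lazy-stub pattern described above can be sketched in plain JS, using an array to stand in for a `WebAssembly.Table` (all names here are hypothetical; a real setup would compile two wasm modules and patch the shared Table with `table.set()`):

```javascript
// Sketch of the self-replacing stub: the table slot starts as a stub that
// loads the "cold" code, swaps itself out, then forwards the call. Only the
// first call pays the loading/boundary cost.
const table = new Array(1); // stand-in for a WebAssembly.Table

let coldLoads = 0;
async function loadColdModule() {
  coldLoads++;
  // In the real scenario this would be something like:
  //   const { instance } = await WebAssembly.instantiateStreaming(fetch("cold.wasm"), imports);
  //   return instance.exports.expensiveOp;
  return (x) => x * 2; // stand-in for the cold wasm export
}

table[0] = async function stub(x) {
  const real = await loadColdModule();
  table[0] = real; // subsequent indirect calls skip the stub entirely
  return real(x);
};

async function callIndirect(index, arg) {
  return table[index](arg);
}
```

In a real Table-based setup the swap happens once per cold function, after which indirect calls go Wasm -> Wasm as kmiller68 describes.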

@jfbastien

jfbastien commented Dec 15, 2017

Twitter context, with @TheLarkInn saying that Webpack is looking at doing this 😁

Of course other tools doing it would be great.

One sad thing about inserting wasm->js->wasm calls where there used to only be wasm->wasm is that tools now need to re-write all i64 parameters to be i32 pairs.

@xtuc

xtuc commented Dec 15, 2017

@camwest you mentioned the streaming process. WebAssembly supports instantiation using a stream (probably your HTTP request). Have you tried this approach?
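For reference, the streaming API mentioned here is `WebAssembly.instantiateStreaming`, which compiles while bytes arrive instead of waiting for the full download. A minimal sketch with a raw-bytes fallback (the empty-but-valid module below exists only to make the sketch self-contained; a real call would pass `fetch("engine.wasm")`):

```javascript
// Load a wasm module, streaming when given a fetch Response/promise,
// otherwise falling back to instantiation from bytes already in memory.
async function loadWasm(source, imports = {}) {
  if (typeof Response !== "undefined" &&
      (source instanceof Response || source instanceof Promise)) {
    // Compilation overlaps the download rather than waiting for all bytes.
    const { instance } = await WebAssembly.instantiateStreaming(source, imports);
    return instance;
  }
  const { instance } = await WebAssembly.instantiate(source, imports);
  return instance;
}

// Smallest valid wasm module: magic number + version, no sections.
const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
```

Note this overlaps download and *compilation*; it does not by itself start executing code before the download completes, which is the extra step the OP is asking about.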

@alexp-sssup

In our Cheerp C++ compiler we are currently exploring a solution based on tagging functions/classes/namespaces at the C++ level (something like [[cheerp::hot]] or [[cheerp::first]], like we already do to automatically generate JS bridges for DOM interaction) to mark them for inclusion into a loader module. A PGO-based approach could also be considered.

@lukewagner

Agreed that it would be cool if tools could help here.

@camwest Is this 29.6mb before Content-Encoding:gzip? Assuming the usual 3x reduction we see with gzip, is the download time of the ~10mb compressed payload problematic, or is it more the subsequent compilation time you see in release browsers? If it's the latter, that should be improving significantly in the coming months, especially if you use the streaming compilation API so that compilation can overlap download.

@jfbastien

If it's the latter, that should be improving significantly in the coming months

Still, the fastest compilation will occur when we don't have to compile anything 😄

@szilvaa

szilvaa commented Dec 18, 2017

@lukewagner What we (I work with @camwest) are hoping is that streaming compilation can be further pipelined all the way to the "run" stage, so in the end we could start running wasm before it is fully downloaded. We expect this would further improve performance, provided that the code in the wasm is arranged (via PGO) such that the first bytes are the first to run.
This would mimic what the virtual memory manager does on desktop: when you load an executable the code is simply mapped into the virtual address space (no i/o). Then as the address space is used the VMM receives the page-fault and brings the code in page by page.

@lukewagner

@jfbastien Agreed.

@szilvaa I can see why that's attractive, but having an arbitrary synchronous wasm call block on the network seems to risk the app freezing (if the user does something outside the profiled path or if the network is extra slow) and is also at odds with the general non-blocking-io design of the web. Also, the network can be a lot slower than local i/o, so it's not quite so analogous to what native does when launching a local app.

@RyanLamansky

Running WASM code before the file is fully downloaded is not possible with the current design: the download process must check for the presence of a data section (which is after the code section) before the instance can be returned.

If the implementation had an option to disable the data section, then there's potential.

A solution where WASM exports could be directly connected to imports without bridging through JavaScript seems ideal to me. It provides efficient solutions to this problem (i.e., on-demand loading of rarely-used code) and enables 64-bit integers (and future types) to be directly transferred.

@szilvaa

szilvaa commented Dec 18, 2017

@RyanLamansky I don't quite understand how the WASM import/export mechanism avoids the problem that @lukewagner mentions above (i.e. arbitrary wasm call may be blocked on network I/O). The import dependency is either resolved at instantiation time in which case it really does not help at all with startup performance. Or it is resolved at runtime in which case the provider of the import may not be present yet so the call must block.

I think the only way to avoid "freezing" the UI thread is to run your WASM on a worker (which is what we do). This suggests that maybe this sort of pipelined code execution should only be available in a worker.

@TheLarkInn

cc @sokra for coverage. This scenario is something we'll likely discuss helping solve in webpack.

@sokra

sokra commented Dec 18, 2017

webpack will not help you with a big wasm file.

It supports Code Splitting with import() for wasm. So if you can (manually) split up your WASM into multiple pieces, you will be able to load these pieces on demand. It's an async call (Promise), so you probably need to handle this in your wasm (probably via some callback called from JS on Promise completion).

This probably requires you to restructure your native code, at least on these boundaries where you want to load on demand. It's a kind of distributed architecture: Multiple wasm components communicate async over JS.
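The import()-based boundary described above can be sketched in plain JS. The `.wasm` chunk path a webpack build would use is hypothetical here; the runnable version below loads a tiny inline `data:` module as a stand-in so the pattern is self-contained:

```javascript
// On-demand chunk boundary: the chunk is fetched via import() on first use
// and cached, so later calls reuse the already-loaded module. In a webpack
// build the specifier would be something like "./commands/render.wasm"
// (hypothetical), and webpack would instantiate the wasm for you.
let chunkPromise = null;
function loadRenderChunk(specifier) {
  if (!chunkPromise) chunkPromise = import(specifier);
  return chunkPromise;
}

async function runRenderCommand(specifier, args) {
  // import() is async, so callers on the wasm side need a callback/Promise
  // hand-off at this boundary, as sokra notes.
  const { render } = await loadRenderChunk(specifier);
  return render(args);
}
```

This is exactly the "multiple wasm components communicating async over JS" architecture: the boundary is a Promise, not a synchronous call.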

@lukewagner

@szilvaa Yeah, I can imagine a pure toolchain solution almost working in workers; the main limitation is effective lack of new WebAssembly.Module on Chrome which means no synchronous compilation to handle the "call before downloaded+compiled" case.

Regarding "We don't currently have a good strategy for defining split points." in the OP, have you considered a coarse-grained strategy of splitting the app up into an exe with asynchronously-loaded DLLs? IIUC, Emscripten provides support for dynamic linking (where the exe and each dll turn into a .wasm) and, now that we have Table, it should be pretty efficient.

@szilvaa

szilvaa commented Dec 19, 2017

@lukewagner Yes, of course, we have considered this. In fact, the code already has exe/dlls break-up on windows/osx but these boundaries are not on the hot vs. cold code boundary for our current web scenarios.

But let's say we have a hot.wasm and a cold.wasm.

As far as I understand we couldn't use an import Table in hot.wasm because these imports would have to be satisfied at instantiation time (which defeats our purpose here). So we would have to create some sort of custom thunking mechanism that allows the imports to be delay loaded (runtime-loaded). @sokra Is that what webpack has built or building? Can you link me to some more info?

@lukewagner

@szilvaa That's only a restriction for load time dynamic linking. Tables are fully mutable at runtime (via set()) and so the toolchain can implement dlopen after instantiation time. I'm not actually up to date on the state of toolchain support here, but I had thought it worked already.
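The runtime Table patching described above can be sketched directly. The bytes below are a hand-assembled module exporting `f: () -> i32` (returning 7), standing in for a lazily-loaded cold module; in a browser the load step would be `instantiateStreaming(fetch(url))`:

```javascript
// Hand-assembled wasm module exporting "f", a nullary function returning 7.
const coldBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,       // type section: () -> i32
  0x03, 0x02, 0x01, 0x00,                         // function 0 has type 0
  0x07, 0x05, 0x01, 0x01, 0x66, 0x00, 0x00,       // export "f" (func 0)
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x41, 0x07, 0x0b, // body: i32.const 7; end
]);

// A Table shared with the hot module; slots start empty and are patched
// after instantiation, dlopen-style, rather than at load time.
const table = new WebAssembly.Table({ element: "anyfunc", initial: 1 });

async function dlopenInto(table, index, bytes) {
  const { instance } = await WebAssembly.instantiate(bytes, {});
  table.set(index, instance.exports.f); // patch the slot with the cold export
}
```

Once patched, indirect calls through that slot are plain Wasm -> Wasm calls, with no JS thunk involved.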

@sokra

sokra commented Dec 19, 2017

@sokra Is that what webpack has built or building?

Yep, I imagined a JS bridge between two WASM modules with an async API in between. The bridge would use import() to on-demand-load the second WASM module on first use.

But @lukewagner's approach where JS only fills imports into a Table also sounds nice. I guess this results in faster WASM to WASM calls and you could use a sync interface.

@sokra

sokra commented Dec 19, 2017

Sync download + instantiation in WebWorkers doesn't look like a nice approach from a UX perspective to me. You basically block your complete native part while parts are downloaded.

@lukewagner

I guess this results in faster WASM to WASM calls and you could use a sync interface.

Yes, it should basically be the same call as a plain pointer-to-function call which is going to be a factor faster than thunking through JS.

Sync download + instantiation in WebWorkers doesn't look like a nice approach from a UX perspective to me. You basically block your complete native part while parts are downloaded.

So it sounds like the current impl of dlopen in Emscripten does synchronous wasm compilation for bytes that are supposed to have been preloaded in the filesystem image. So that's not suitable for your use case here. I think instead you would want a function that takes a URL and C callback and then does instantiateStreaming(fetch(url)) under the hood. If that sounds right, I'd suggest filing an Emscripten issue; @kripken said it wouldn't be hard to add if someone wanted it.

@ghost

ghost commented Jan 10, 2018

Managing two wasm files in JavaScript feels like a pretty nasty hack since the boundary between the hot and cold wasm files would most likely involve crossing from wasm to js to wasm again.

Rather than communicating through JavaScript, a better approach may be to reload the entire application: ultimately, the application is binary data that can be modified with string concatenation. You can download a "hot" wasm file and then a patch: the difference between the "hot" wasm file and the wasm file for the entire application. It should be possible to save the state from the hot application, and then restore it with the full application.

@qm3ster

qm3ster commented Sep 18, 2018

Or you could streaming-instantiate the new module in a new WebWorker, and also instantiate the existing module in that worker by passing its WebAssembly.Module via postMessage (structured clone).
This way they are in one context and can get optimized calls through a Table. And after you loaded your data into the new worker, just nuke the old one.

@awtcode

awtcode commented Oct 2, 2018

@qm3ster, but the application would still be running midway when the data is loaded into the new worker, so wouldn't we need to exit the application first?

@qm3ster

qm3ster commented Oct 2, 2018

@awtcode not before the application is finished loading in the new worker.
The old worker could still do things like rendering even as the new worker is loading the data, it should just avoid mutable operations.
