Linking
This page describes how asm.js emitted by Emscripten can be linked. The asm.js format is highly structured, which makes it easier to link, so linking support focuses on it. Note that asm.js code generation is off in unoptimized builds (-O0, or no -Ox specified), so you cannot link unoptimized code. (There is also some older, less recommended linking support for non-asm.js code; see the dlopen notes below.)
Two forms of linking are supported, static linking and dynamic code loading (dlopen), both of which are described below.
Emscripten has support for static linking of asm.js code, using emlink.py. emlink takes two compiled codebases and generates a combined codebase. This is very similar to static linking in general, but different in some ways, because we are linking JavaScript here, and specifically asm.js modules of code. A general overview of using emlink is as follows:
- Compile one codebase using -s MAIN_MODULE=1; this is the "main module" (see below).
- Compile another codebase using -s SIDE_MODULE=1; this is the "side module".
- Link them using emlink.py main.js side.js output.js
While this example talks about two modules being linked, it is possible to link several. The result of linking a main module with a side module is a main module, which can then be linked with another side module.
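The chaining described above can be sketched as a small Python helper (Python is used only because emlink.py is itself a Python script). This is a minimal sketch, not part of Emscripten: the function name, the side-module filenames and the intermediate linked_N.js names are hypothetical, and it assumes emlink.py can be invoked by that name. It only builds the command lines and does not run them.

```python
def emlink_chain(main_js, side_modules, output_js, emlink="emlink.py"):
    """Return the emlink command lines that fold several side modules into one main module.

    Each link step takes one main module and one side module and produces a
    new main module, so N side modules need N link steps.
    """
    if not side_modules:
        raise ValueError("need at least one side module to link")
    commands = []
    current_main = main_js
    for index, side_js in enumerate(side_modules):
        is_last = index == len(side_modules) - 1
        # Hypothetical naming: intermediate results are numbered,
        # and only the final step writes output_js.
        target = output_js if is_last else "linked_%d.js" % (index + 1)
        commands.append([emlink, current_main, side_js, target])
        # The result of a link is itself a main module, so it feeds the next step.
        current_main = target
    return commands


if __name__ == "__main__":
    for command in emlink_chain("main.js", ["side_a.js", "side_b.js"], "all.js"):
        print(" ".join(command))
```

Each returned list can be passed to subprocess.run as is. With a single side module the helper reduces to the one emlink.py invocation shown in the overview.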
It is important to note that static linking of JS using emlink generates suboptimal results. The best results will always be achieved when building all the code into one big bitcode file and compiling that as a whole into JavaScript, because that allows whole-program optimizations that can be very important. Static linking can be useful during development, however: if you just modify a small part of your project and want to rebuild it, linking the changed part (that you just compiled again) with the rest of the code (that is unchanged) will be far faster than rebuilding the whole thing. For example, building Bullet from bitcode to JS takes over 10 seconds, but linking Bullet statically takes less than half a second.
Code that is intended to be linked is called a "linkable module". There are two kinds, main modules and side modules. We can only link a main module with a side module, and no other combination. The output of linking is another main module (which can then be linked with another side module and so forth).
Main modules are code that is runnable. If it still has missing symbols (that should be linked in later), then it will fail when it tries to use them, of course, but otherwise it is usable. Main modules do not have any special relocation information embedded in them, they are very similar to normal Emscripten-generated code. The main difference is that some optimizations are disabled when building them, for example full dead-code elimination (which could remove things the other code to be linked would need) and function name minification (which would prevent linking from identifying which functions to link to what).
Side modules are code that is only intended to be linked to a main module; a side module cannot be run by itself. It contains relocation information, which allows us to place its globals and function pointers into the proper places during linking. Side modules disable linking of standard libraries (libc, libc++, etc.) because they expect those to be present in the main module they will be linked with. Finally, side modules, like main modules, disable some optimizations so that linking can work.
A good use case for static linking is a large codebase that you are working on, but only modifying a small number of files, and want to rapidly iterate and not wait for entire builds. A recommended workflow for that is as follows:
- Put the bulk of the project, that is not changing all the time, into the main module. That means compiling the bitcode for those files into JS using -s MAIN_MODULE=1 -o main.js.
- Put the rest of the project, the small part you are compiling a lot, into the side module. That means compiling the bitcode for its files into JS using -s SIDE_MODULE=1 -o side.js.
- Every time you do an iteration after changing some of those files, you rebuild the side module sources into a new build of the side module, as just described.
- Run emlink.py main.js side.js all.js, which links the modules and generates all.js.
- Use all.js in the same way as you would use a full rebuild of the whole project.
- When, less frequently, you want to see a fully-optimized build of minimal size, build all the code together into one big bitcode file and compile that into JS (not as a main module or a side module).
Note that we could reverse the roles of the main and side modules in the above workflow, and it would still work. However, it is a good idea to make the main module the one that changes less, since as mentioned above the standard libraries are linked in and compiled to JS in that one.
Note also that the main() function can be in either the main module or the side module. Do not be confused by the term "main module": it is the "main" module in the sense that the other module is relocated "against" it and that the system libs are in it.
It is important to use the same build flags on both the main and the side module. For example, if you build with --js-library x.js then that should be applied when building all modules (while the actual code will only show up in the main module, it is needed during compile time to generate proper calls into those functions, so the side module needs to know about it as well). Likewise if you build with assertions for example (-s ASSERTIONS=1), or any other flag, then you should do so on both the main and side modules.
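One way to keep the flags identical is to generate both compile commands from a single flag list. The following Python sketch assumes bitcode inputs named main.bc and side.bc (hypothetical names) and uses -O2 as an example optimization level, since asm.js output requires an optimized build. The helper name is also an assumption. It only builds the emcc command lines and does not run them.

```python
def module_build_commands(main_bitcode, side_bitcode, shared_flags):
    """Return (main_cmd, side_cmd): emcc command lines that share every flag.

    The two commands differ only in their input file, the MAIN_MODULE or
    SIDE_MODULE setting, and the output name.
    """
    shared = list(shared_flags)  # copy so the caller's list is never modified
    main_cmd = ["emcc", main_bitcode, "-s", "MAIN_MODULE=1"] + shared + ["-o", "main.js"]
    side_cmd = ["emcc", side_bitcode, "-s", "SIDE_MODULE=1"] + shared + ["-o", "side.js"]
    return main_cmd, side_cmd


if __name__ == "__main__":
    # Example flags taken from the text above: a JS library and assertions.
    flags = ["-O2", "--js-library", "x.js", "-s", "ASSERTIONS=1"]
    for command in module_build_commands("main.bc", "side.bc", flags):
        print(" ".join(command))
```

Keeping the flag list in one place means that adding a flag later, for example another --js-library, cannot be applied to one module and forgotten on the other.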
- GL emulation errors: The main module includes all the JS library code (and the side module includes none of it); this approach makes linking of JS library code trivial, and the downside of code size is not that big in a large project anyhow. However, it does mean we include GL emulation code, which can confuse some types of GL-using code. It is recommended to build the main module with -s DISABLE_GL_EMULATION=1 unless you specifically know that you need emulation of older GL features.
- System libs not being included: Side modules do not link in standard libraries like libc and libc++, as mentioned above. As a consequence, if for example you use libc++ in the side module but not in the main module, neither will link in libc++ and the linked code will fail. To get around this, you can build the main module with EMCC_FORCE_STDLIBS=1 to force inclusion of all standard libs. A more refined approach is to build the side module with -v in order to see which system libs are actually needed (look for including lib[...] messages) and then build the main module with something like EMCC_FORCE_STDLIBS=libcxxabi (if you need libcxxabi). Note that you only need the first library mentioned, as each depends on the ones after it so they will be auto-included anyhow.
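The refined approach to forcing system libs could be automated roughly as follows. This Python sketch rests on an assumption that is not confirmed by this page: that each relevant line of the -v output contains the literal text "including" followed by a library name starting with "lib". Check the pattern against the real output before relying on it. The helper names are hypothetical.

```python
import os
import re

# Assumption: the messages look like "including lib<name>". The page only
# describes them as "including lib[...]" messages, so adjust as needed.
INCLUDING_PATTERN = re.compile(r"including (lib\w+)")


def first_needed_stdlib(verbose_output):
    """Return the first library named in the -v output, or None if there is none.

    Only the first one matters: each library depends on the ones after it,
    so those are included automatically.
    """
    match = INCLUDING_PATTERN.search(verbose_output)
    return match.group(1) if match else None


def env_forcing_stdlibs(library=None, base_env=None):
    """Return a copy of the environment with EMCC_FORCE_STDLIBS set.

    With no library given, fall back to "1", which forces all standard libs.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["EMCC_FORCE_STDLIBS"] = library or "1"
    return env


if __name__ == "__main__":
    import sys
    # Pipe the captured -v output of the side module build into this script.
    needed = first_needed_stdlib(sys.stdin.read())
    print("EMCC_FORCE_STDLIBS=" + env_forcing_stdlibs(needed)["EMCC_FORCE_STDLIBS"])
```

The returned dictionary can be passed as the env argument of subprocess.run when building the main module, so the variable applies only to that one build.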
- Minification of function names is not done on linkable modules (neither main modules nor side modules), which increases code size. We could in principle do minification after linking of all modules, but this is not implemented yet.
- The static linker links asm.js modules, and does not have all the rich metadata available to a normal linker. As a consequence, we duplicate some code and globals that a more optimal linker could coalesce. As mentioned above, the only way to get the best results is to build all the code together and not do static linking of JS.
Initial dynamic linking support using dlopen() is present for asm.js. Similarly to static linking as discussed above, create a main module and a side module. The main module is the main program, and the side module is the shared library. You should build both with the additional flag -s DLOPEN_SUPPORT=1. The main module should then be able to dlopen the shared library, which should be accessible to it as a file (so you should preload or embed it etc.).
There are some current limitations of dlopen support for asm.js:
- C++ exceptions are not supported
- Functions returning 64-bit values will break (return incorrect values) when called across modules.
Some things to take into account when using dlopen with asm.js:
- More stack space is used than normal: each module has its own stack.
- You need to export (via EXPORTED_FUNCTIONS) anything in the main module that the side module (the one being dlopened) will use. For example, malloc is already in EXPORTED_FUNCTIONS by default; leave it there and add whatever else the side module needs.
- You need to make sure LLVM does not eliminate stuff you want to access through dlsym(). Use EMSCRIPTEN_KEEPALIVE from emscripten.h on the functions and globals you want to export from the shared library.
- Function pointer calls become slower with DLOPEN_SUPPORT, because they need to be able to cross modules. This is not much optimized yet.
As always, a good source of example code is the test suite; search for DLOPEN_SUPPORT, MAIN_MODULE, etc. Code in the test suite is always guaranteed to work.
Note: In non-asm code, there is an older option of BUILD_AS_SHARED_LIB which has fewer limitations. See the test runner for examples.
This allows caching of libraries on the client, and linking them with code that is sent from the server that updates less frequently (and the result is then cached too).
We can almost do this using the current static linking code. However, we would need to convert a little Python code to JS, figure out how to do reasonable dead code elimination despite linking, and decide what to do about minification.