Generalize Module Types to Module Linking #3
Co-Authored-By: Thomas Lively <7121787+tlively@users.noreply.github.com>
Thanks for writing this up @lukewagner, this is a pretty exciting proposal!
I'm personally wrestling a lot with how things might conventionally work out in toolchains/runtimes (e.g. precisely which tool should be responsible for doing what). There's a lot of flexibility in this proposal but I think that's a good thing. I was wondering for a bit if we should try to establish conventions in the explainer, such as "what is a final wasm expected to look like"? Should it import nothing but host functionality (e.g. wasi) to be easily instantiable? Should it import host functionality and modules? (But then is there a convention for how modules will be named?) Should it import instances and rely on the wasm runtime to do the instantiate-the-DAG-bits? (This also leads to more naming convention questions.) Overall though I think it's probably fine to define what's necessary to implement these schemes in this proposal and leave the conventions to community documentation/tooling.
proposals/module-types/Example-SharedEverythingDynamicLinking.md
@alexcrichton Yeah, I totally feel you on the remaining lack of clarity for what precisely the toolchain should produce at each stage, particularly when we get to the "bundling" stage. In this doc, I mostly just wanted to show how module imports could be useful, but I expect lots more work is necessary to establish a proper tooling convention.
I could be reading it wrong, but this doesn't have dynamic instances, right? That'd be extremely useful I think, especially for APIs like WASI, where you could get handed a reference to an instance that does file operations, for example. And a wasm module could easily sandbox another wasm module by just giving it references to virtualized/fake WASI modules.
Co-authored-by: Ms2ger <Ms2ger@igalia.com>
@Ms2ger Thanks for the great suggestions and comments! @lachlansneff That's right, this proposal is just focused on the minimal extension to enable, essentially, load-time dynamic linking / virtualization. First-class runtime instances are interesting, but for the reasons in the Explainer, I'm mostly shying away from that for now b/c it leads to a full-on GC requirement once you have first-class
Thanks for the answers! A couple more questions/comments below.
Co-authored-by: Ms2ger <Ms2ger@igalia.com>
Note: I switched the name of the proposal back to "Module Linking", which is what I called it informally for a while, rather than naming the proposal after one of its constituent features (Module Imports). It's more natural in discussion.
Is this far enough along now to present at the May 26th meeting?
@binji Yes, it feels like we're narrowing in, thanks.
LGTM!
Co-authored-by: YAMAMOTO Yuji <whosekiteneverfly@gmail.com>
export of `libc` (analogous to `malloc`, but for allocating from the global
`funcref` table) from the shared library's `start` function. Elements can
then be written into the table (using [bulk memory operations]) at the allocated
offset and their indices written into the exported `i32`s.
Interesting approach! In the current llvm + emscripten approach we take advantage of the fact that each shared library knows statically how many slots it needs, and each library can import `__table_base` and export the base-relative offset of each public function.
So I think the "the table slots must be dynamically allocated" statement is not totally true. More like "the table segment base address must be allocated dynamically".
The same thing is true for the data segment base addresses.
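The `__table_base` scheme described in this comment can be sketched roughly as follows. This is a toy Python model, not llvm or emscripten code: `Library`, `assign_table_bases`, and the sample `libzip` library are hypothetical names, and the only assumption is that each library declares its slot count up front.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    """Hypothetical stand-in for a shared library's table requirements."""
    name: str
    table_slots: int                       # slot count known statically
    # public function name -> offset relative to this library's table base
    exports: dict = field(default_factory=dict)


def assign_table_bases(libraries, first_free_slot=0):
    """Give each library a contiguous slot range; return bases and absolute indices."""
    bases, absolute = {}, {}
    next_slot = first_free_slot
    for lib in libraries:
        bases[lib.name] = next_slot        # value the library would import as __table_base
        for func, offset in lib.exports.items():
            if offset >= lib.table_slots:
                raise ValueError(f"{lib.name}.{func} is outside the declared slot range")
            absolute[(lib.name, func)] = next_slot + offset
        next_slot += lib.table_slots
    return bases, absolute


libc = Library("libc", table_slots=3, exports={"malloc": 0, "free": 1})
libzip = Library("libzip", table_slots=2, exports={"zip": 0})
bases, absolute = assign_table_bases([libc, libzip], first_free_slot=1)
print(bases)
print(absolute[("libzip", "zip")])
```

Only the base is chosen at load time in this model; the per-function offsets stay fixed inside each library, which is the distinction the comment draws.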
Ah yes, good point; I'll soften the "must" wording here.
So I was imagining the `__table_base` strategy you describe when I wrote "(In theory, more efficient schemes are possible when the main program has more static knowledge of its shared libraries.)" below because, iiuc, for this to work the main module has to know statically how many slots the library needs before instantiating it. (I suppose the main module could also probe for this dynamically (via `Module.customSections()` or something else), but the Module Linking proposal doesn't have the ability to do that from pure wasm.)
One thing I was concerned about is, at least with the static strategy, this would mean minor semver version updates could break main modules (in a rather silent way too). A dynamic probing strategy could avoid this though.
Yes, the current llvm+emscripten solution is dynamic and does involve a tiny custom section at the start of a shared library that specifies the number of slots it needs and the number of bytes of static data (along with alignment).
I just realized that in theory the custom section could be avoided by looking at the segment lengths, but then that wouldn't allow for bss / empty table slots.
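To illustrate why the declared sizes matter, here is a small Python sketch of placing a library from such metadata. The `DylinkInfo` record and its field names are illustrative assumptions and do not reproduce the actual custom section layout; the alignment is assumed to be a power of two.

```python
from dataclasses import dataclass


@dataclass
class DylinkInfo:
    """Hypothetical summary of what a library declares about itself."""
    memory_size: int     # bytes of static data, including zero-filled (bss) space
    memory_align: int    # required alignment in bytes, assumed to be a power of two
    table_size: int      # number of table slots, including initially empty ones


def align_up(address, alignment):
    """Round address up to the next multiple of a power-of-two alignment."""
    return (address + alignment - 1) & ~(alignment - 1)


def place(info, next_free_byte, next_free_slot):
    """Return (memory_base, table_base, new_free_byte, new_free_slot)."""
    memory_base = align_up(next_free_byte, info.memory_align)
    table_base = next_free_slot
    return (memory_base, table_base,
            memory_base + info.memory_size,
            next_free_slot + info.table_size)


# The initialized segment might be only 10 bytes long, but the library declares 64
# so that its bss fits; sizing from the segment length alone would under-allocate.
info = DylinkInfo(memory_size=64, memory_align=16, table_size=4)
print(place(info, next_free_byte=1030, next_free_slot=7))
```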
Cool, makes total sense. In order to use Module Linking (which doesn't have the runtime ability to probe custom sections during instantiation), do you think it'd be possible to use the "a module allocates its own elem/data space" approach described here?
Sure, I see. So in that case the static data payload would live in a passive segment and get loaded into a location returned from malloc? I guess that works! Then it can take that same address and store it in a private global that can be used to calculate load/store offsets (a non-exported internal version of `__memory_base`).
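The "module allocates its own data space" flow agreed on here can be modeled in a few lines of Python. This is only a sketch under stated assumptions: `Memory` is a toy linear memory with a bump allocator standing in for libc's `malloc`, and `SharedLibrary`, its segment contents, and its sizes are hypothetical.

```python
class Memory:
    """Toy linear memory with a bump allocator standing in for libc's malloc."""

    def __init__(self, size, heap_start):
        self.bytes = bytearray(size)
        self.heap_top = heap_start

    def malloc(self, size, align=8):
        address = (self.heap_top + align - 1) & ~(align - 1)
        if address + size > len(self.bytes):
            raise MemoryError("out of toy memory")
        self.heap_top = address + size
        return address


class SharedLibrary:
    """Hypothetical library that allocates and fills its own data region."""

    PASSIVE_SEGMENT = b"hello\x00"   # static data carried in a passive segment
    BSS_BYTES = 10                   # zero-initialized space after the payload

    def __init__(self, memory):
        self.memory = memory
        self._memory_base = None     # private, non-exported base address

    def start(self):
        payload = len(self.PASSIVE_SEGMENT)
        base = self.memory.malloc(payload + self.BSS_BYTES)
        # Comparable to memory.init: copy the passive segment to the allocated address.
        self.memory.bytes[base:base + payload] = self.PASSIVE_SEGMENT
        # malloc does not promise zeroed memory, so clear the bss range explicitly.
        self.memory.bytes[base + payload:base + payload + self.BSS_BYTES] = bytes(self.BSS_BYTES)
        self._memory_base = base

    def load_byte(self, offset):
        # Every static-data access is computed relative to the private base.
        return self.memory.bytes[self._memory_base + offset]


memory = Memory(256, heap_start=100)
lib = SharedLibrary(memory)
lib.start()
print(chr(lib.load_byte(1)))
```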
Exactly, yes! Cool.
The benefit of instance imports is that they allow potentially-large groups of
fields to be passed around as a single unit, which can be useful when linking
significant dependencies. Also, practically, instance imports allow import
strings to be factored in the text and binary formats, reducing duplication.
👍 yay!
# Shared-Everything Dynamic Linking Example
This is very cool, and I would be happy if tools like wasm-ld could one day emit modules in this format when building and using dynamic libraries.
One concern is that it seems like you have only addressed function symbols. Modules in llvm also have global data and corresponding data symbols. In the current llvm+emscripten model we deal with this in the following way: each shared library has its own data and elem segments, which are created at static link time. The key is that these segments have dynamic base addresses based on wasm globals, which are imported as `__table_base` and `__memory_base`.
Data symbols are then imported and exported just like function symbols in this proposal. When imported, data addresses are expected to be absolute. When exported, data addresses are assumed to be relative to the module's `__memory_base`.
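The import/export convention for data symbols described above amounts to a small address computation, sketched here in Python. The function names, symbol names, and base addresses are all hypothetical and chosen only for illustration.

```python
def export_address(memory_base, relative_offset):
    """An exported data symbol is stored relative to the exporter's __memory_base."""
    return memory_base + relative_offset


def resolve_data_imports(imports, providers):
    """Map each imported data symbol to an absolute address.

    imports:   iterable of symbol names the importing module needs
    providers: list of (memory_base, {symbol: relative_offset}) pairs
    """
    resolved = {}
    for symbol in imports:
        for memory_base, exports in providers:
            if symbol in exports:
                resolved[symbol] = export_address(memory_base, exports[symbol])
                break
        else:
            raise LookupError(f"undefined data symbol: {symbol}")
    return resolved


providers = [
    (1024, {"stdout_buf": 0, "errno_value": 64}),   # hypothetical libc-like library
    (4096, {"zip_version": 8}),                     # hypothetical second library
]
resolved = resolve_data_imports(["errno_value", "zip_version"], providers)
print(resolved)
```

The importer only ever sees the absolute values in `resolved`; the relative offsets stay an internal detail of each exporting library.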
In addition to this basic use of data symbols, there is also the problem of the relocations that are required in the data section. Unlike with the code section, we have not found a way to avoid these. For example:
extern int foo;
int* bar = &foo;
The result of compiling this code into a shared library is that it allocates 4 bytes of static data along with an associated relocation entry. In the current llvm+emscripten model these relocations are turned into generated code that runs during the start function, so they are effectively applied by the module itself rather than by some outside dynamic linker. This simplifies the dynamic linker at the expense of some codegen performed by wasm-ld. It also means we don't have to spec any kind of format for relocations in the shared library format, since we leave it all up to the module itself to self-relocate on startup.
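As a rough illustration of self-relocation on startup, the following Python sketch models what generated start code could do for `int* bar = &foo;`. The addresses and the helper names `apply_data_relocs` and `read_bar` are hypothetical; the only fixed facts are that a wasm32 pointer is 4 bytes and stored little-endian.

```python
import struct

memory = bytearray(128)          # toy linear memory

FOO_ADDRESS = 40                 # hypothetical absolute address of the imported `foo`
MEMORY_BASE = 64                 # hypothetical base of this library's data segment
BAR_OFFSET = 0                   # `bar` occupies the first 4 bytes of that segment


def apply_data_relocs(memory, memory_base, foo_address):
    """Stand-in for generated start code: store &foo into the slot reserved for bar."""
    struct.pack_into("<I", memory, memory_base + BAR_OFFSET, foo_address)


def read_bar(memory, memory_base):
    """Load the 4-byte pointer value currently held in bar."""
    (pointer,) = struct.unpack_from("<I", memory, memory_base + BAR_OFFSET)
    return pointer


struct.pack_into("<i", memory, FOO_ADDRESS, 7)     # foo = 7, owned by another module
apply_data_relocs(memory, MEMORY_BASE, FOO_ADDRESS)

bar = read_bar(memory, MEMORY_BASE)
(value,) = struct.unpack_from("<i", memory, bar)   # *bar
print(bar, value)
```

No relocation table is consulted at load time in this model; the fix-up is ordinary code that runs once the absolute address of `foo` is known.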
You're right, this example doesn't mention data symbols, but also, yes, they could be handled quite symmetrically to functions (particularly exported function pointer identities). Do you think it's worth adding another segment below the "Function Pointer Identity" section mentioning these cases and saying it's symmetric? Are there any hard cases you think aren't addressed by such a scheme?
I certainly think that it is worth specifying how data symbols might be imported and exported in this scheme. I think this will naturally force us to define how modules can include and use their own static data.
Perhaps we could illustrate this by adding a string constant to the example that one of the libraries exports to the main program as a data symbol?
I fear it might add a fair amount of complexity, and I don't want to block this PR if you feel like you want to get something landed and then iterate. I'm saying that partly because I am aware my comments are coming quite late in the discussion here.
Yeah, happy to add another little section with an example. I'll try to get to that tomorrow.
Merging after CG poll
As is, the Module Types proposal tweaks the spec-internal definition of module/instance types and gives them a text format so that module/instance types can be used in toolchains, but there are no actual changes implied to a wasm engine.
This PR significantly extends the proposal to put module/instance types to work in wasm engines, extending wasm with module/instance definitions, imports and exports. These features unlock a set of use cases that are described in the Explainer in the PR.
After some initial discussion in the PR, I was thinking of presenting the proposal at a biweekly CG meeting (probably not enough time before the one in two days).