Skip to content
This repository has been archived by the owner on Aug 17, 2022. It is now read-only.

Generalize Module Types to Module Linking #3

Merged
merged 33 commits into from
Jun 9, 2020
Merged

Conversation

lukewagner
Copy link
Member

As is, the Module Types proposal tweaks the spec-internal definition of module/instance types and gives them a text format so that module/instance types can be used in toolchains, but there are no actual changes implied to a wasm engine.

This PR significantly extends the proposal to put module/instance types to work in wasm engines, extending wasm with module/instance definitions, imports and exports. These features unlock a set of use cases that are described in the Explainer in the PR.

After some initial discussion in the PR, I was thinking of presenting the proposal at a biweekly CG meeting (probably not enough time before the one in two days).

Co-Authored-By: Thomas Lively <7121787+tlively@users.noreply.github.com>
Copy link
Collaborator

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up @lukewagner, this is a pretty exciting proposal!

I'm personally wrestling a lot with how things might conventionally work out in toolchains/runtimes (e.g. precisely which tool should be responsible for doing what). There's a lot of flexibility in this proposal but I think that's a good thing. I was wondering for a bit if we should try to establish conventions in the explainer, such as "what is a final wasm expected to look like"? Should it import nothing but host functionality (e.g. wasi) to be easily instantiable? Should it import host functionality and module? (but then is there a convention for how modules will be named?) Should it import instances and rely on the wasm runtime to do the instantiate-the-DAG-bits? (which also leads to more naming convention questions). Overall though I think it's probably fine to define what's necessary to implement these schemes in this proposal and leave the conventions to community documentation/tooling.

proposals/module-types/Explainer.md Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
@lukewagner
Copy link
Member Author

@alexcrichton Yeah, I totally feel you on the remaining lack of clarity for what precisely the toolchain should produce at each stage, particularly when we get to the "bundling" stage. In this doc, I mostly just wanted to show how module imports could be useful, but lots more work is necessary I expect to establish a proper tooling convention.

@lachlansneff
Copy link

I could be reading it wrong, but this doesn't have dynamic instances, right? That'd be extremely useful I think, especially for apis like WASI, where you could get handed a reference to an instance that does file operations, for example. And a wasm module could easily sandbox another wasm module by just giving it references to virtualized/fake WASI modules.

proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
@lukewagner
Copy link
Member Author

lukewagner commented May 1, 2020

@Ms2ger Thanks for the great suggestions and comments!

@lachlansneff That's right, this proposal is just focused on the minimal extension to enable, essentially, load-time dynamic linking / virtualization. First-class runtime instances are interesting, but for the reasons in the Explainer, I'm mostly shying away from that for now b/c it leads to a full-on GC requirement once you have first-class instance.instantiate.

Copy link
Contributor

@Ms2ger Ms2ger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the answers! A couple more questions/comments below.

proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
Luke Wagner and others added 2 commits May 4, 2020 16:36
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
proposals/module-types/Explainer.md Show resolved Hide resolved
proposals/module-types/Explainer.md Outdated Show resolved Hide resolved
@lukewagner
Copy link
Member Author

Note: I switched the name of the proposal back to "Module Linking"; which is what I called it informally for a while, rather than naming the proposal after one of its constituent features (Module Imports). It's more natural in discussion.

@binji
Copy link
Member

binji commented May 18, 2020

After some initial discussion in the PR, I was thinking of presenting the proposal at a biweekly CG meeting (probably not enough time before the one in two days).

Is this far enough along now to present at the May 26th meeting?

@lukewagner
Copy link
Member Author

@binji Yes, it feels like we're narrowing in, thanks.

Copy link
Member

@rossberg rossberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

export of `libc` (analogous to `malloc`, but for allocating from the global
`funcref` table) from the shared library's `start` function. Elements can
then be written into the table (using [bulk memory operations]) at the allocated
offset and their indices written into the exported `i32`s.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting approach! In the current llvm + emscripten approach we take advantage of the fact that each shared library knows statically now many slots it need, and each library can import __table_base and export base-relative offset of each public function.

So I think the the table slots must be dynamically allocated statement is not totally true. More like the table segment base address must be allocated dynamically.

The same this is true for the data segment base addresses.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, good point; I'll soften the must wording here.

So I was imagining the __table_base strategy you describe when I wrote "(In theory, more efficient schemes are possible when the main program has more static knowledge of its shared libraries.)" below because, iiuc, for this to work the main module has to know statically how many slots the library needs before instantiating it. (I suppose the main module could also probe for this dynamically (via Module.customSections() or something else), but the Module Linking proposal doesn't have the ability to do that from pure wasm.)

One thing I was concerned about is, at least with the static strategy, this would mean minor semver version updates could break main modules (in a rather silent way too). A dynamic probing strategy could avoid this though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the current llvm+emscripten solution is dynamic and does involve a tiny custom section at the start of a shared library that specifies the number of slots it needs and the number of bytes of static data (along with alignment).

I just realized that in theory the custom section could be avoided by looking that the segment lengths.. but then that wouldn't allow for bss / empty table slots.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, makes total sense. In order to use Module Linking (which doesn't have the runtime ability to probe custom sections during instantiation), do you think it'd be possible to use the "a module allocates its own elem/data space" approach described here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I see.. so in that case the static data payload would live in a passive segment and get loaded into a location returned from malloc? I guess that works! Then it can take that same address an store it in a private global that can be used to calculate load/store offsets (an non-exported internal version of __memory_base).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, yes! Cool.

The benefit of instance imports is that they allow potentially-large groups of
fields to be passed around as a single unit, which can be useful when linking
significant dependencies. Also, practically, instance imports allow import
strings to be factored in the text and binary formats, reducing duplication.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 yay!

@@ -0,0 +1,524 @@
# Shared-Everything Dynamic Linking Example

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very cool, and I would be happy if tools like wasm-ld could one day emit modules in this format when building and using dynamic libraries.

One concern is that it seems like you have only addressed function symbols. Modules in llvm also have global data and corresponding data symbols. In the current llvm+emscripten model we deal with this in the following way: Each shared library has its own data and elem segments which are created at static link time. The key is that these segments have dynamic base addresses based on wasm global which are imported as __table_base and __memory_base.

Data symbols are then imported and exported just like function symbols in this proposal. When imported, data address are expected to be absolute. When exported data addresses are assumed to be relative to the module's __memory_base.

In addition to this basic use of data symbols there is also the problem the relocations which are required in the data section. Unlike with the code section we have not found a way to avoid these. For example:

extern int foo;
int* bar = &foo;

The result of compiling this code into a shared library is that it allocates 4 bytes of static data along with an associated relocation entry. In the current llvm+emscripten model these relocations are turned into generated code that runs during the start function so they are effectively applied by the module itself, rather than some outside dynamic linker. This simplifies the dynamic linker at the expense of some codegen performed by wasm-ld. Its also means we don't have to spec any kind of format for relocations in the shared library format, since we leave it all up to the module itself to self-relocate on startup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this example doesn't mention data symbols, but also, yes, they could be handled quite symmetrically to functions (particularly exported function pointer identities). Do you think it's worth adding another segment below the "Function Pointer Identity" section mentioning these cases and saying it's symmetric? Are there any hard cases you think aren't addressed by such a scheme?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I certainly think that it worth specifying how data symbols might imported and exported in this scheme. This I think this will naturally force us to define how modules can include and use their own static data.

Perhaps we could illustrate this by adding a string constant to the example that one of the libraries exports to the main program as a data symbol?

I fear it might add a fair amount of complexity, and I don't want to block this PR if you feel like you want to get something landed and then iterate? I'm saying that partly because I am aware my comments are coming quite a late in the discussion here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, happy to add another little section with an example. I'll try to get to that tomorrow.

@lukewagner
Copy link
Member Author

Merging after CG poll

@lukewagner lukewagner merged commit 997a5d6 into master Jun 9, 2020
@lukewagner lukewagner deleted the module-imports branch June 9, 2020 17:09
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.