Generalize Module Types to Module Linking #3

lukewagner · 2020-04-26T23:26:55Z

As is, the Module Types proposal tweaks the spec-internal definition of module/instance types and gives them a text format so that module/instance types can be used in toolchains, but there are no actual changes implied to a wasm engine.

This PR significantly extends the proposal to put module/instance types to work in wasm engines, extending wasm with module/instance definitions, imports and exports. These features unlock a set of use cases that are described in the Explainer in the PR.

After some initial discussion in the PR, I was thinking of presenting the proposal at a biweekly CG meeting (probably not enough time before the one in two days).

alexcrichton

Thanks for writing this up @lukewagner, this is a pretty exciting proposal!

I'm personally wrestling a lot with how things might conventionally work out in toolchains/runtimes (e.g. precisely which tool should be responsible for doing what). There's a lot of flexibility in this proposal, but I think that's a good thing. I was wondering for a bit if we should try to establish conventions in the explainer, such as "what is a final wasm expected to look like?" Should it import nothing but host functionality (e.g. WASI) to be easily instantiable? Should it import host functionality and modules (but then is there a convention for how modules will be named)? Should it import instances and rely on the wasm runtime to do the instantiate-the-DAG-bits (which also leads to more naming convention questions)? Overall, though, I think it's probably fine to define what's necessary to implement these schemes in this proposal and leave the conventions to community documentation/tooling.

lukewagner · 2020-04-28T00:49:14Z

@alexcrichton Yeah, I totally feel you on the remaining lack of clarity for what precisely the toolchain should produce at each stage, particularly when we get to the "bundling" stage. In this doc, I mostly just wanted to show how module imports could be useful, but I expect lots more work is necessary to establish a proper tooling convention.

lachlansneff · 2020-04-30T23:59:57Z

I could be reading it wrong, but this doesn't have dynamic instances, right? That'd be extremely useful, I think, especially for APIs like WASI, where you could get handed a reference to an instance that does file operations, for example. And a wasm module could easily sandbox another wasm module by just giving it references to virtualized/fake WASI modules.

lukewagner · 2020-05-01T22:05:52Z

@Ms2ger Thanks for the great suggestions and comments!

@lachlansneff That's right, this proposal is just focused on the minimal extension to enable, essentially, load-time dynamic linking / virtualization. First-class runtime instances are interesting, but for the reasons in the Explainer, I'm mostly shying away from that for now b/c it leads to a full-on GC requirement once you have first-class instance.instantiate.

Ms2ger

Thanks for the answers! A couple more questions/comments below.

lukewagner · 2020-05-11T18:00:21Z

Note: I switched the name of the proposal back to "Module Linking", which is what I called it informally for a while, rather than naming the proposal after one of its constituent features (Module Imports). It's more natural in discussion.

binji · 2020-05-18T20:32:26Z

> After some initial discussion in the PR, I was thinking of presenting the proposal at a biweekly CG meeting (probably not enough time before the one in two days).

Is this far enough along now to present at the May 26th meeting?

lukewagner · 2020-05-18T23:01:57Z

@binji Yes, it feels like we're narrowing in, thanks.

rossberg

LGTM!

sbc100 · 2020-04-27T01:48:31Z

proposals/module-types/Example-SharedEverythingDynamicLinking.md

+export of `libc` (analogous to `malloc`, but for allocating from the global
+`funcref` table) from the shared library's `start` function. Elements can
+then be written into the table (using [bulk memory operations]) at the allocated
+offset and their indices written into the exported `i32`s.


Interesting approach! In the current llvm + emscripten approach we take advantage of the fact that each shared library knows statically how many slots it needs, and each library can import `__table_base` and export the base-relative offset of each public function.

So I think the "table slots must be dynamically allocated" statement is not totally true. It's more that the table segment base address must be allocated dynamically.

The same thing is true for the data segment base addresses.
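
A minimal sketch of the base-plus-offset idea described above, written in C purely as a model. The struct and function names are hypothetical and are not taken from llvm, emscripten, or the proposal; the only assumption is that each library's slot count is known before it is placed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one shared library's table needs. */
typedef struct {
    uint32_t table_slots;  /* funcref slots the library needs, known statically */
    uint32_t table_base;   /* value supplied for the imported __table_base */
} lib_table_info;

/* Loader side: give the library a contiguous slot range starting at *next_free. */
uint32_t assign_table_base(lib_table_info *lib, uint32_t *next_free) {
    lib->table_base = *next_free;
    *next_free += lib->table_slots;
    return lib->table_base;
}

/* The library exports base-relative offsets; an importer adds the base
   to obtain the absolute table index of a public function. */
uint32_t absolute_table_index(const lib_table_info *lib, uint32_t relative_offset) {
    assert(relative_offset < lib->table_slots);
    return lib->table_base + relative_offset;
}
```

In this model only the base is decided at load time; the slot count and the relative offsets are fixed when the library is linked.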

lukewagner · 2020-05-27

Ah yes, good point; I'll soften the "must" wording here.

So I was imagining the `__table_base` strategy you describe when I wrote "(In theory, more efficient schemes are possible when the main program has more static knowledge of its shared libraries.)" below because, iiuc, for this to work the main module has to know statically how many slots the library needs before instantiating it. (I suppose the main module could also probe for this dynamically (via `Module.customSections()` or something else), but the Module Linking proposal doesn't have the ability to do that from pure wasm.)

One thing I was concerned about is that, at least with the static strategy, minor semver version updates could break main modules (in a rather silent way, too). A dynamic probing strategy could avoid this, though.

sbc100 · 2020-05-27

Yes, the current llvm+emscripten solution is dynamic and does involve a tiny custom section at the start of a shared library that specifies the number of slots it needs and the number of bytes of static data (along with alignment).

I just realized that in theory the custom section could be avoided by looking at the segment lengths... but then that wouldn't allow for bss / empty table slots.
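
A rough sketch of how a loader might use the three quantities mentioned above (table slots, static data bytes, alignment) to place a library. The field names and the `place_library` helper are illustrative assumptions, not the actual custom section layout or any real loader API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory form of the per-library metadata. */
typedef struct {
    uint32_t mem_size;     /* bytes of static data, including bss */
    uint32_t mem_align;    /* required alignment in bytes, a power of two */
    uint32_t table_slots;  /* funcref slots, including empty ones */
} dylib_needs;

typedef struct {
    uint32_t memory_base;  /* value to supply as __memory_base */
    uint32_t table_base;   /* value to supply as __table_base */
} dylib_bases;

/* Round value up to the next multiple of align (align must be a power of two). */
uint32_t align_up(uint32_t value, uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Reserve memory and table ranges for one library and advance both cursors. */
dylib_bases place_library(const dylib_needs *needs, uint32_t *mem_top, uint32_t *table_top) {
    dylib_bases bases;
    bases.memory_base = align_up(*mem_top, needs->mem_align);
    bases.table_base = *table_top;
    *mem_top = bases.memory_base + needs->mem_size;
    *table_top += needs->table_slots;
    return bases;
}
```

Because `mem_size` and `table_slots` are stated explicitly, they can be larger than the initialized segments, which is what leaves room for bss and empty table slots.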

lukewagner · 2020-05-27

Cool, makes total sense. In order to use Module Linking (which doesn't have the runtime ability to probe custom sections during instantiation), do you think it'd be possible to use the "a module allocates its own elem/data space" approach described here?

sbc100 · 2020-05-27

Sure, I see... so in that case the static data payload would live in a passive segment and get loaded into a location returned from malloc? I guess that works! Then it can take that same address and store it in a private global that can be used to calculate load/store offsets (a non-exported internal version of `__memory_base`).

lukewagner · 2020-05-27

Exactly, yes! Cool.
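
The "a module allocates its own data space" approach agreed on above could be modeled roughly as follows. This is a C simulation under stated assumptions: `linear_memory`, `sketch_malloc`, and `library_start` are hypothetical stand-ins for the shared memory, the `malloc` imported from libc, and the library's start function; in wasm the copy would be a bulk memory operation on a passive segment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MEMORY_SIZE = 256 };
uint8_t linear_memory[MEMORY_SIZE];  /* stand-in for the shared linear memory */
uint32_t heap_top = 16;              /* stand-in for libc's allocator state */

/* Stand-in for the imported malloc: a bump allocator with 8-byte alignment. */
uint32_t sketch_malloc(uint32_t size) {
    uint32_t addr = (heap_top + 7u) & ~7u;
    assert(addr + size <= (uint32_t)MEMORY_SIZE);
    heap_top = addr + size;
    return addr;
}

/* The library's static data, playing the role of a passive data segment. */
const uint8_t passive_segment[] = { 'h', 'i', 0 };

/* Private, non-exported counterpart of __memory_base. */
uint32_t private_memory_base;

/* Start function: allocate space, then copy the segment into it. */
void library_start(void) {
    private_memory_base = sketch_malloc((uint32_t)sizeof passive_segment);
    memcpy(&linear_memory[private_memory_base], passive_segment, sizeof passive_segment);
}

/* Loads of static data are computed relative to the private base. */
uint8_t load_static_byte(uint32_t offset) {
    assert(offset < sizeof passive_segment);
    return linear_memory[private_memory_base + offset];
}
```

The point of the sketch is that nothing outside the library needs to know its data size: the library asks the allocator for space itself and keeps the resulting base private.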

sbc100 · 2020-05-23T02:39:57Z

proposals/module-types/Explainer.md

+The benefit of instance imports is that they allow potentially-large groups of
+fields to be passed around as a single unit, which can be useful when linking
+significant dependencies. Also, practically, instance imports allow import
+strings to be factored in the text and binary formats, reducing duplication.


sbc100 · 2020-05-27T21:19:01Z

proposals/module-types/Example-SharedEverythingDynamicLinking.md

@@ -0,0 +1,524 @@
+# Shared-Everything Dynamic Linking Example
+


This is very cool, and I would be happy if tools like wasm-ld could one day emit modules in this format when building and using dynamic libraries.

One concern is that it seems like you have only addressed function symbols. Modules in llvm also have global data and corresponding data symbols. In the current llvm+emscripten model we deal with this in the following way: each shared library has its own data and elem segments, which are created at static link time. The key is that these segments have dynamic base addresses based on wasm globals, which are imported as `__table_base` and `__memory_base`.

Data symbols are then imported and exported just like function symbols in this proposal. When imported, data addresses are expected to be absolute. When exported, data addresses are assumed to be relative to the module's `__memory_base`.

In addition to this basic use of data symbols, there is also the problem of the relocations that are required in the data section. Unlike with the code section, we have not found a way to avoid these. For example:

extern int foo; int* bar = &foo;

The result of compiling this code into a shared library is that it allocates 4 bytes of static data along with an associated relocation entry. In the current llvm+emscripten model these relocations are turned into generated code that runs during the start function, so they are effectively applied by the module itself rather than by some outside dynamic linker. This simplifies the dynamic linker at the expense of some codegen performed by wasm-ld. It also means we don't have to spec any kind of format for relocations in the shared library format, since we leave it all up to the module itself to self-relocate on startup.
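
As an illustration of the self-relocation described above, here is a C model of what the generated start-function code for `int* bar = &foo;` might amount to. It is a sketch under assumptions: the `memory` array, `BAR_OFFSET`, and the function names are hypothetical, and the code is not what wasm-ld actually emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MEM_BYTES = 128 };
uint8_t memory[MEM_BYTES];   /* stand-in for linear memory */

uint32_t memory_base;        /* models the imported __memory_base global */
uint32_t foo_address;        /* models the imported data symbol foo (absolute address) */

enum { BAR_OFFSET = 0 };     /* bar occupies 4 bytes at memory_base + BAR_OFFSET */

/* Exported data symbols are base-relative, so an importer adds the base. */
uint32_t exported_bar_absolute(void) {
    return memory_base + BAR_OFFSET;
}

/* Generated start-function code: write foo's absolute address into bar's slot. */
void apply_data_relocs(void) {
    uint32_t slot = memory_base + BAR_OFFSET;
    assert(slot + sizeof foo_address <= (uint32_t)MEM_BYTES);
    memcpy(&memory[slot], &foo_address, sizeof foo_address);
}

/* Read back the 32-bit pointer value stored in bar. */
uint32_t load_bar(void) {
    uint32_t value;
    memcpy(&value, &memory[memory_base + BAR_OFFSET], sizeof value);
    return value;
}
```

In this model no external linker ever patches the data section; the only inputs from outside are the two imported values.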

lukewagner · 2020-05-27

You're right, this example doesn't mention data symbols, but also, yes, they could be handled quite symmetrically to functions (particularly exported function pointer identities). Do you think it's worth adding another segment below the "Function Pointer Identity" section mentioning these cases and saying it's symmetric? Are there any hard cases you think aren't addressed by such a scheme?

sbc100 · 2020-05-27

I certainly think that it is worth specifying how data symbols might be imported and exported in this scheme. I think this will naturally force us to define how modules can include and use their own static data.

Perhaps we could illustrate this by adding a string constant to the example that one of the libraries exports to the main program as a data symbol?

I fear it might add a fair amount of complexity, and I don't want to block this PR if you feel like you want to get something landed and then iterate. I'm saying that partly because I am aware my comments are coming quite late in the discussion here.

lukewagner · 2020-05-27

Yeah, happy to add another little section with an example. I'll try to get to that tomorrow.

lukewagner · 2020-06-09T17:09:09Z

Merging after CG poll

First draft for feedback

6e8eb1f

tlively reviewed Apr 27, 2020

proposals/module-types/Explainer.md (outdated)

tlively reviewed Apr 27, 2020

proposals/module-types/Explainer.md (outdated)

tlively reviewed Apr 27, 2020

proposals/module-types/Explainer.md (outdated)

Apply suggestions from code review

d6649ad

Co-Authored-By: Thomas Lively <7121787+tlively@users.noreply.github.com>

alexcrichton reviewed Apr 27, 2020

sokra reviewed Apr 28, 2020

proposals/module-types/Explainer.md

proposals/module-types/Explainer.md

lukewagner mentioned this pull request Apr 30, 2020

References to modules/instances? #4

Closed

Ms2ger reviewed May 1, 2020

Luke Wagner and others added 4 commits May 1, 2020 16:12

Fix libimg.wat code

9a5adc5

Mention ref.memory, ref.table and ref.global

81b0e06

Apply Ms2ger's suggestions

32e815d

Co-authored-by: Ms2ger <Ms2ger@igalia.com>

Address Ms2ger's feedback

52310ca

Ms2ger reviewed May 4, 2020

Luke Wagner and others added 2 commits May 4, 2020 16:36

Address Ms2ger's second round of feedback

a179a06

Apply suggestions from Ms2ger

2c20546

Co-authored-by: Ms2ger <Ms2ger@igalia.com>

rossberg reviewed May 5, 2020

Luke Wagner added 6 commits May 5, 2020 13:14

Tweak Module Imports intro

06e287d

Reframe (exports) as 'zero-level (export)'

c7bcd1f

Add missing typedef to Summary section

86bbfc5

Nuance GC commentary

85a4cec

Reword wasm merging para

64b42fb

Refine discussion of aliases

c91c611

binji mentioned this pull request May 6, 2020

Take Module-Types proposal as a Dependency? WebAssembly/conditional-sections#22

Open

Luke Wagner added 3 commits May 6, 2020 23:02

Remove all the first-class bits

137c1b8

Switch from instance flattening to explicit aliases

9fd79ed

Split Module Section into Module/ModuleCode Sections to mirror Function/Code Sections

cb5c5cb

Luke Wagner and others added 5 commits May 11, 2020 13:24

Tweak wording of instance definition rule

4ad7309

Fix typo in instance type example

5a893f0

Merge pull request #5 from acfoltzer/patch-1

07ad0a4

Fix typo in instance type example

Imports and exports of ordered in the type, but subtyping is permissive

33367d1

Tweak wording

74bc689

Put aliases into their own section and allow flexible section interleaving

73950e7

sunfishcode mentioned this pull request May 18, 2020

Broken link in DesignPrinciples.md WebAssembly/WASI#269

Closed

rossberg approved these changes May 19, 2020

igrep reviewed May 19, 2020

proposals/module-types/Example-LinkTimeVirtualization.md (outdated)

lukewagner and others added 5 commits May 19, 2020 10:47

Include igrep's fix

3469268

Co-authored-by: YAMAMOTO Yuji <whosekiteneverfly@gmail.com>

Syntactically distinguish parent vs instance aliases

7876a71

Add zero-level exports

40e3f9c

Update the Summary to reflect recent changes

73f9952

Add subtype relation between single-level instance and two-level field imports

f607aae

sbc100 mentioned this pull request May 22, 2020

[Proposal] Extend wasi to be able to communicate with the embedder WebAssembly/WASI#280

Closed

lukewagner mentioned this pull request May 25, 2020

Add agenda item for Module Types / Linking WebAssembly/meetings#566

Merged

sunfishcode mentioned this pull request May 26, 2020

Add support for multi-call executables WebAssembly/WASI#281

Closed

Recast multi-level imports as single-level instance imports

9565517

lukewagner mentioned this pull request May 27, 2020

Consider lookahead implications of text format rules #6

Closed

sbc100 reviewed May 27, 2020

Address Sam's feedback

ff434ea

sunfishcode reviewed May 27, 2020

proposals/module-types/Example-SharedEverythingDynamicLinking.md

Add note about the linear memory stack pointer to the example

6b30a85

alexcrichton mentioned this pull request Jun 4, 2020

Order for testing subtyping between modules #7

Closed

lukewagner merged commit 997a5d6 into master Jun 9, 2020

lukewagner deleted the module-imports branch June 9, 2020 17:09
