Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a toolchain-independent ABI document, and propose _initialize #203

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

sunfishcode
Copy link
Member

The Wasm ecosystem is currently not consistent in how "constructors" such as C++ static initializers and similar features in other languages are implemented, and the result is users reporting constructs running multiple times, and other users reporting constructors not getting run when they should.

WASI has defined a convention using an exported function named _initialize, however not all users are using WASI conventions. In particular, users of what is sometimes called "wasm32-unknown-unknown" are not expecting to follow WASI conventions. However, they still have a need for constructors working in a reliable way.

To address this, I propose moving this out of WASI and defining this as a toolchain-independent ABI, here in tool-conventions. This would recognize the _initialize function as the toolchain-independent way to ensure that constructors are properly called before other exports are accessed.

Related activities

In the component model, there is a proposal to add a second initialization phase. If that's done, then component-model toolchains could arrange for this _initialize function to be called automatically by this second initialization mechanism.

Considered alternatives

It is tempting to use the Wasm start function for C++ constructors; this has been extensively discussed, and the short answer is, the Wasm start function is often called at a time when the outside environment can't access the module's exports, and C++ constructors can run arbitrary user code which may generate calls to things that need to access the module's exports.

It's also tempting to propose defining a second initialization phase in core Wasm. I'm not opposed to this, but it is more complex at the core Wasm level than at the component-model level. For example, in Emscripten, Wasm modules depend on JS code being able to run after the exports are available but before the initialization function is called, which wouldn't be possible if we simply call the initilaization function as part of the linking step.

Wasm-ld has a __wasm_call_ctors function, and in theory we could use that name instead of _initialize, but wasm-ld already does insert some initialization in addition to just constructors, so I think it makes sense to use _initialize as the exported function, which may call __wasm_call_ctors in its body.

Process

We don't have a formal process defined for tool-convention proposals, but because this is proposal has potentially wide-ranging impacts, I propose to follow the following process:

  • I'm starting by posting this PR here, and people can comment on it. If a better alternative emerges, I'll close this PR.

  • After discussion here settles, if a better alternative hasn't emerged, I plan to request a CG meeting agenda item to present this topic to the CG, and seek feedback there, to ensure that it has CG-level visibility.

  • If the CG is in favor of it, then I'd propose we merge this PR.

The Wasm ecosystem is currently not consistent in how "constructors" such
as C++ static initializers and similar features in other languages are
implemented, and the result is users reporting constructs running multiple
times, and other users reporting constructors not getting run when they
should.

WASI has [defined a convention] using an exported function named
`_initialize`, however not all users are using WASI conventions. In
particular, users of what is sometimes called "wasm32-unknown-unknown"
are not expecting to follow WASI conventions. However, they still have a
need for constructors working in a reliable way.

To address this, I propose moving this out of WASI and defining this as
a toolchain-independent ABI, here in tool-conventions. This would
recognize the `_initialize` function as the toolchain-independent way
to ensure that constructors are properly called before other exports are
accessed.

Related activities
------------------

In the component model, there is a proposal to add a
[second initialization phase]. If that's done, then component-model
toolchains could arrange for this `_initialize` function to be called
automatically by this second initialization mechanism.

Considered alternatives
-----------------------

It is tempting to use the [Wasm start function] for C++ constructors;
this has been [extensively discussed], and the short answer is, the Wasm
start function is often called at a time when the outside environment
can't access the module's exports, and C++ constructors can run
arbitrary user code which may generate calls to things that need to
access the module's exports.

It's also tempting to propose defining a second initialization phase in
core Wasm. I'm not opposed to this, but it is more complex at the core
Wasm level than at the component-model level. For example, in Emscripten,
Wasm modules depend on JS code being able to run after the exports are
available but before the initialization function is called, which
wouldn't be possible if we simply call the initilaization function as
part of the linking step.

Wasm-ld has a [`__wasm_call_ctors` function], and in theory we could use
that name instead of `_initialize`, but wasm-ld already does insert some
initialization in addition to just constructors, so I think it makes
sense to use `_initialize` as the exported function, which may call
`__wasm_call_ctors` in its body.

Process
-------

We don't have a formal process defined for tool-convention proposals,
but because this is proposal has potentially wide-ranging impacts, I
propose to follow the following process:

 - I'm starting by posting this PR here, and people can comment on it.
   If a better alternative emerges, I'll close this PR.

 - After discussion here settles, if a better alternative hasn't emerged,
   I plan to request a CG meeting agenda item to present this topic to the
   CG, and seek feedback there, to ensure that it has CG-level visibility.

 - If the CG is in favor of it, then I'd propose we merge this PR.

[defined a convention]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md
[second initialization phase]: WebAssembly/component-model#146
[Wasm start function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start
[extensively discussed]: WebAssembly/design#1160
[`__wasm_call_ctors` function]: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#start-section
@ricochet
Copy link

ricochet commented Mar 6, 2023

I am in favor of this proposal. As a separate process question, is there a list of producers that need to be updated? It may help to communicate early with those toolchains. It might also help if this proposal encourages toolchains to link back to this PR in their issue/PR to make it easier to determine when support is added.

Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with all of this.

Perhaps we can come up with a better title than "Toolchain-independent ABI"... by definition aren't all the things in this repo (e.g. BasicCABI and Linker etc) toolchain independent. Maybe BasicModuleABI?

@abrown
Copy link

abrown commented Mar 7, 2023

Great issue description! That background content is almost worth more than the actual document.

@sunfishcode
Copy link
Member Author

@ricochet I think many languages have a potential use for this. But in terms of things that need to be updated, I think the bigger coordination concern here is with embedders. In some cases we'll need engines to call _initialize automatically, and in other cases, we'll need users to call _initialize themselves.

@sbc100 BasicModuleABI works for me. Renamed.

Copy link
Member

@dschuff dschuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting question about the process. Most of the stuff in tool-conventions is here specifically to avoid the need for full CG-level process. I guess this API reflects a larger level of cross-toolchain interoperation than even what we have here already.

BasicModuleABI.md Outdated Show resolved Hide resolved
BasicModuleABI.md Outdated Show resolved Hide resolved
sunfishcode added a commit to sunfishcode/meetings that referenced this pull request Apr 6, 2023
Add an agenda item for discussing the proposed `_initialize` convention in
tool-conventions.

     - This would add a new tool-conventions document named
       BasicModuleABI.md which would define the `_initialize` convention.

     - See WebAssembly/tool-conventions#203
       for details.

I'm following the process described at the bottom of [this comment]. I'm
open to alternative approaches to solving the problem.

[this comment]: WebAssembly/tool-conventions#203 (comment)
@sunfishcode
Copy link
Member Author

I've incorporated various bits of feedback, and the discussion here now seems to have settled. Following the process I described above, I've now filed an agenda item to discuss this with the CG: WebAssembly/meetings#1253

dtig pushed a commit to WebAssembly/meetings that referenced this pull request Apr 11, 2023
Add an agenda item for discussing the proposed `_initialize` convention in
tool-conventions.

     - This would add a new tool-conventions document named
       BasicModuleABI.md which would define the `_initialize` convention.

     - See WebAssembly/tool-conventions#203
       for details.

I'm following the process described at the bottom of [this comment]. I'm
open to alternative approaches to solving the problem.

[this comment]: WebAssembly/tool-conventions#203 (comment)
@titzer
Copy link

titzer commented Apr 11, 2023

What if the toolchain injected a check for initialization status into exports? AFAICT that would make it impossible to misuse a module by not calling its initialization, as that would be done lazily. The code would be roughly:

    (if (global.get $uninitialized) (then call $_initialize))   

I count about 6-7 bytes to wrap each exported function. The dynamic cost would be 3 instructions: load, compare, branch.

If we wanted to reduce either of those costs, another option would be to add an explicit dependency to the export mechanism. Roughly, to call this export, this other one-shot function must be called. And the engine could do that lazily, or it could trap if that function hasn't called, etc.

@kripken
Copy link
Member

kripken commented Apr 11, 2023

I think this generally makes sense, and I don't have any better ideas, but I want to stress what I see as the downside here: If this is adopted by some VMs and not others, then wasm files will become less portable. Specifically, someone might notice that the same wasm file runs in one way in, say, wasmtime and wasm3 (which call _initialize automatically) but not on the Web when using new WebAssembly.Module() (as browsers do not call it automatically). And this can be surprising because the wasm itself does not indicate in any way "I can only be run in certain VMs" - all VMs can try to run it, with different results.

Now, maybe that's fine and expected. But I wonder if we can do a little better than leave it as a surprising thing for people to run into. Some vague thoughts:

  1. We could define layered concepts that make this feel natural. I mean, there could be a "vanilla wasm" that does not call _initialize, and a "fancy wasm" (all of this with better names 😄 ) that does. Then the explanation for why the wasm files run differently would be "Web VMs implement vanilla wasm, while server VMs also implement fancy wasm; these are different things, so you need different wasm files." And also we could say "to run a fancy wasm on the Web, you need custom JS, that is, fancy wasm + custom JS can run on a VM that supports JS + vanilla wasm."
  2. We could go further and actually make the wasm portable by making it possible to run it on the Web, that is, have a new API aside from new WebAssembly.Module that does call _initialize (and does not let exports/imports be entangled in the way that is the problem that causes the need for all of this).

I'm not sure either of those is a good idea! Just some thoughts.

Put another way, I think it would be good if we had a clear answer for someone that builds a new wasm VM tomorrow and asks themselves, "should I call _initialize or not?" If we don't have a rule for that then it might end up with "well, it seems like most of the wasm files people will run in my VM expect it / don't expect it, so I will / won't."

@tlively
Copy link
Member

tlively commented Apr 11, 2023

Maybe we could add a non-normative note to the Core and/or JS specs mentioning this toolchain convention and pointing out that just core spec initialization may not be enough to make the module usable as intended.

If a module exports a function named `_initialize` with no arguments and no
return values, and does not export a function named `_start`, the toolchain
that produced my assume that on any instance of the module, this `_initialize`
function is called before any other exports are accessed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the relationship with similar functions like __wasm_call_ctors?
eg. which one to call if multiple of them are exported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, and indeed, it may be too late to try to design anything, to avoid breaking existing setups.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _initialize and _start functions are in charge of calling __wasm_call_ctors.

I see __wasm_call_ctors as an llvm-internal function which is not designed to be called directly by users but instead by higher level toolchain functions like _start and _initialize (or other crt1-type functions).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. but there are existing conventions to call exported __wasm_call_ctors.
eg. https://github.com/emscripten-core/emscripten/blob/bec42dac7873903d09d713963e34020c22a8bd2d/src/library_dylink.js#L846

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that is because emscripten does it own thing here and basically implements _start / _initialize in JavaScript. I wouldn't worry about this case since emscripten only ever loads modules that is builds itself.. we may convert to using a more standards-based approach in the future, but for now emscripten-built wasm modules are fairly specific to emscripten.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, the standard-based way to go is to export either _start or _initialize. Exporting and directly calling main and/or __wasm_call_ctors (as emscripten does) is non-standard.

It would also be non-standard/undefined, for example, to call __wasm_call_ctors directly from user C/C++ code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, the standard thing suggested here is to ignore __wasm_call_ctors __post_instantiate etc even if they are exported, and only cares _initialize? (besides that, _start in case of wasi)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make sense to me yes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. it makes sense to me too.

dicej added a commit to dicej/wasm-tools that referenced this pull request Oct 3, 2023
Per WebAssembly/tool-conventions#203 and recent LLVM and
wasi-libc changes, libraries may export an `_initialize` function instead of
`__wasm_call_ctors`, and we should call whichever one we find.  Note that we
expect to find at most one of the two, and will return an error if both are
found.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
alexcrichton pushed a commit to bytecodealliance/wasm-tools that referenced this pull request Oct 3, 2023
Per WebAssembly/tool-conventions#203 and recent LLVM and
wasi-libc changes, libraries may export an `_initialize` function instead of
`__wasm_call_ctors`, and we should call whichever one we find.  Note that we
expect to find at most one of the two, and will return an error if both are
found.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants