This repository has been archived by the owner on Oct 13, 2023. It is now read-only.

Tooling issue: Components and Constructors #99

Open
pchickey opened this issue Mar 1, 2023 · 2 comments

@pchickey
Collaborator

pchickey commented Mar 1, 2023

Components and Constructors

Pat Hickey, 28 Feb 2023

Presently, there are some major problems using C/C++ constructors (ctors) with the component model.

  • Rust and C/C++ guests need to define a cabi_realloc function. cabi_realloc is defined with the same syntax as every other user-defined export function, but it has a special restriction under the component model: no component import functions may be called from inside it.
  • wasm-ld currently takes each user-defined export function in the module and synthesizes an export function which runs the ctors, then calls the user-defined function.
  • wasi-libc currently uses ctors to eagerly initialize the environment, preopens, and some bookkeeping for monotonic clocks. These ctors in turn call various WASI import functions.

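Together, these behaviors are incompatible with the component model: the cabi_realloc export that wasm-ld synthesizes runs the wasi-libc ctors, which call WASI import functions, so cabi_realloc ends up calling component import functions. See preview2-prototyping #97.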
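To make the conflict concrete, here is a toy Rust model of the three pieces above. It is not real wasm-ld output or a real runtime: `Instance`, its counters, `export_run`, and `host_call_realloc` are hypothetical. The host marks the window during which it is inside `cabi_realloc`, and any import call made in that window counts as a violation.

```rust
/// Toy model of a guest instance built by wasm-ld with wasi-libc.
/// Hypothetical: counters stand in for observable behavior.
#[derive(Default)]
struct Instance {
    in_realloc: bool, // set by the host while it is inside cabi_realloc
    violations: u32,  // import calls made while in_realloc was set
    ctor_runs: u32,   // how many times the ctors have run
}

impl Instance {
    /// Stand-in for any WASI import (environment, preopens, clocks).
    fn wasi_import(&mut self) {
        if self.in_realloc {
            self.violations += 1;
        }
    }

    /// Stand-in for wasi-libc's ctors, which call imports eagerly.
    fn run_ctors(&mut self) {
        self.ctor_runs += 1;
        self.wasi_import(); // e.g. read the environment
        self.wasi_import(); // e.g. detect preopens
    }

    /// Shape of a wasm-ld-synthesized export: ctors first, then user code.
    fn export_cabi_realloc(&mut self, new_size: usize) -> usize {
        self.run_ctors();
        new_size // placeholder for the pointer the user's allocator returns
    }

    /// Any other user export gets the same kind of wrapper.
    fn export_run(&mut self) {
        self.run_ctors();
    }
}

/// The host marks the cabi_realloc window, where imports are forbidden.
fn host_call_realloc(inst: &mut Instance, new_size: usize) -> usize {
    inst.in_realloc = true;
    let ptr = inst.export_cabi_realloc(new_size);
    inst.in_realloc = false;
    ptr
}
```

One call to `host_call_realloc` records two violations (one per ctor import call), and every later export call reruns the ctors.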

Shortest-term fix: preview 1 adapter short-circuits import function calls in ctors

The wasm-ld-synthesized export functions call the ctors on every invocation of every export function in the instance. This means ctors run many times over the lifetime of an instance, which is not desirable behavior in general.

Joel recognized that this behavior means we can get away with "short-circuit" logic: the adapter sets a global while its cabi_realloc is calling into the adaptee's cabi_realloc. Then any preview 1 import function that may be called from the adaptee's wasi-libc ctors returns a trivial value (e.g. the empty environment, the empty set of preopens) instead of calling out to a preview 2 component import function.
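A minimal sketch of that short-circuit, assuming a simplified adapter model. `Adapter`, `in_adaptee_realloc`, `preview2_get_environment`, and the closure standing in for the adaptee's `cabi_realloc` are hypothetical names, not the adapter's real code. Only `environ_sizes_get` mirrors a real preview 1 function, and here it simply returns `(count, total bytes)` instead of writing results through pointers.

```rust
/// Hypothetical model of the preview 1 adapter's short-circuit.
#[derive(Default)]
struct Adapter {
    /// Set while the adapter's cabi_realloc is calling the adaptee's.
    in_adaptee_realloc: bool,
    /// Counts calls that would go out to preview 2 component imports.
    preview2_calls: u32,
    /// Environment the host would return through preview 2.
    host_env: Vec<(String, String)>,
}

impl Adapter {
    /// Stand-in for a preview 2 component import.
    fn preview2_get_environment(&mut self) -> Vec<(String, String)> {
        self.preview2_calls += 1;
        self.host_env.clone()
    }

    /// Preview 1 environ_sizes_get, simplified to return
    /// (entry count, total bytes of "KEY=VALUE\0" strings).
    fn environ_sizes_get(&mut self) -> (usize, usize) {
        if self.in_adaptee_realloc {
            // Short-circuit: report an empty environment rather than
            // calling a component import from inside cabi_realloc.
            return (0, 0);
        }
        let env = self.preview2_get_environment();
        let bytes = env.iter().map(|(k, v)| k.len() + v.len() + 2).sum();
        (env.len(), bytes)
    }

    /// Adapter's cabi_realloc: flag the window, call the adaptee, clear it.
    fn cabi_realloc(&mut self, adaptee_realloc: impl FnOnce(&mut Adapter) -> usize) -> usize {
        self.in_adaptee_realloc = true;
        let ptr = adaptee_realloc(self);
        self.in_adaptee_realloc = false;
        ptr
    }
}
```

The wrong (empty) answer seen inside `cabi_realloc` is harmless only because, as the next paragraphs explain, the ctors run again at the start of every other export.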

The adapter is built for the wasm32-unknown-unknown target, so it does not include wasi-libc, and therefore the adapter's exports do not call any ctors of its own.

We get away with giving wasi-libc an incorrect value in the ctor because the wasi-libc ctors will be run again, correctly, at the start of every other export function besides cabi_realloc.

Short-term fix: guest bindgen behavior eliminates ctor calls in cabi_realloc

wasm-ld has logic to not synthesize ctor-calling wrapper functions iff the user's program makes calls to __wasm_call_ctors.

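The Rust guest bindgen macro generates the definitions for all export functions, and the Rust guest crate provides the definition of cabi_realloc. So bindgen can emit an explicit call to __wasm_call_ctors at the start of each generated export, which stops wasm-ld from synthesizing wrappers, while the crate's cabi_realloc never calls the ctors.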
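A toy model of this approach, under stated assumptions: `linker_wraps_exports` encodes the wasm-ld rule above, and `call_ctors_once` stands in for an explicit `__wasm_call_ctors` call. Guarding it so the ctors run only once is a design choice of this sketch, not something the issue specifies. `Guest` and `export_greet` are hypothetical.

```rust
/// The wasm-ld rule as stated above: export wrappers are synthesized
/// unless the program references __wasm_call_ctors itself.
fn linker_wraps_exports(program_calls_wasm_call_ctors: bool) -> bool {
    !program_calls_wasm_call_ctors
}

/// Hypothetical guest whose exports are generated by bindgen.
#[derive(Default)]
struct Guest {
    ctors_done: bool,
    ctor_runs: u32,
}

impl Guest {
    /// Stand-in for an explicit __wasm_call_ctors call. The run-once
    /// guard is an assumption of this sketch.
    fn call_ctors_once(&mut self) {
        if !self.ctors_done {
            self.ctors_done = true;
            self.ctor_runs += 1;
        }
    }

    /// A bindgen-generated export: run ctors explicitly, then user code.
    fn export_greet(&mut self, name: &str) -> String {
        self.call_ctors_once();
        format!("hello, {name}")
    }

    /// cabi_realloc from the guest crate: never runs ctors, so it
    /// cannot reach any import through them.
    fn cabi_realloc(&mut self, new_size: usize) -> usize {
        new_size // placeholder for a real allocation
    }
}
```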

This does not work for the command case, because the _start export function is defined by wasi-libc, and needs to run ctors before calling the user's fn main(). This case will depend on the short-circuiting behavior in the adapter described above.

Medium-term fix: wasi-libc, when used by std, no longer uses ctors

There is only one case where wasi-libc really needs to use ctors: implementing the extern char **environ symbol. This archaic feature of libc assumes that the environment has been written into the program's memory before execution begins (as execve(2) does). Since wasm has no such facility, wasi-libc uses ctors to call import functions and initialize environ before any other code runs. It does, however, contain logic to fetch the environment lazily if environ is not present in the linked executable.

Rust's std used environ until recently, when Dan switched it to a wasi-libc-specific facility that allows iterating over the entire environment while still letting it be initialized lazily. So, as long as Rust guests are compiled with rustc 1.69 or later (nightly as of this writing, and scheduled to become stable on April 20), they won't call any environment-related import functions in ctors.
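Lazy initialization of the environment can be sketched with `std::sync::OnceLock` (Rust 1.70+). `fetch_environment_from_host` is a hypothetical stand-in for the WASI `environ_sizes_get`/`environ_get` imports, and the counter exists only to show that no import is called until the environment is first read. This illustrates the idea; it is not wasi-libc's or std's actual code.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::OnceLock;

// Counts simulated import calls so laziness is observable.
static IMPORT_CALLS: AtomicU32 = AtomicU32::new(0);

// Hypothetical stand-in for the WASI environment imports.
fn fetch_environment_from_host() -> Vec<String> {
    IMPORT_CALLS.fetch_add(1, Ordering::SeqCst);
    vec!["HOME=/root".to_string(), "LANG=C".to_string()]
}

static ENVIRON: OnceLock<Vec<String>> = OnceLock::new();

/// The environment, fetched on first use rather than in a ctor.
fn environ() -> &'static [String] {
    ENVIRON.get_or_init(fetch_environment_from_host)
}

/// getenv-style lookup over the lazily initialized environment.
fn getenv(key: &str) -> Option<&'static str> {
    environ().iter().find_map(|kv| {
        let (k, v) = kv.split_once('=')?;
        (k == key).then_some(v)
    })
}
```

No import call happens before the first `getenv`, and later lookups reuse the cached copy.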

wasi-libc uses ctors in two more places: detecting preopens, and initializing a start time for the monotonic clock. The monotonic clock ctor should be trivial to remove: absolute values of the monotonic clock are undefined, so there is no need for the logic to count up from ctor-time.

Preopen ctors are trickier to remove, but we believe we can lazily initialize the preopen state at the beginning of calls to open(2) and close(2).
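The same lazy pattern applied to preopens, as a hypothetical sketch. `Libc`, `ensure_preopens`, and the prefix-matching rule are illustrative assumptions: the host's preopens are passed in rather than discovered via `fd_prestat_get`, and real path resolution in wasi-libc is more careful than the naive prefix match used here.

```rust
use std::collections::HashMap;

/// Hypothetical model of libc state with lazily detected preopens.
struct Libc {
    preopens: Option<HashMap<String, u32>>, // None until first needed
    detect_calls: u32,                      // how often detection ran
    host_preopens: Vec<(String, u32)>,      // what the host would report
}

impl Libc {
    fn new(host_preopens: Vec<(String, u32)>) -> Self {
        Libc { preopens: None, detect_calls: 0, host_preopens }
    }

    /// Detect preopens on first use instead of in a ctor. A real
    /// implementation would query the host here.
    fn ensure_preopens(&mut self) -> &HashMap<String, u32> {
        if self.preopens.is_none() {
            self.detect_calls += 1;
            self.preopens = Some(self.host_preopens.iter().cloned().collect());
        }
        self.preopens.as_ref().expect("initialized above")
    }

    /// open(2): map a path to (preopened dir fd, relative path).
    /// Naive longest-prefix match; real resolution is more careful.
    fn open(&mut self, path: &str) -> Option<(u32, String)> {
        self.ensure_preopens()
            .iter()
            .filter(|(dir, _)| path.starts_with(dir.as_str()))
            .max_by_key(|(dir, _)| dir.len())
            .map(|(dir, fd)| (*fd, path[dir.len()..].trim_start_matches('/').to_string()))
    }

    /// close(2): also triggers detection, then forgets a closed preopen.
    fn close(&mut self, fd: u32) {
        self.ensure_preopens();
        if let Some(map) = self.preopens.as_mut() {
            map.retain(|_, v| *v != fd);
        }
    }
}
```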

Once those two changes land in wasi-libc and a release is cut that can be upstreamed into std, it should be possible to compile Rust programs with no dependency on ctors at all, as long as they don't contain their own references to environ or link with C/C++ code that uses ctors or environ.

Long-term fix: C/C++ programs depending on ctors are supported by tool-conventions and component model canonical ABI support

Having worked around the unpleasantness of ctors as much as possible, we still need a long-term solution that lets C/C++ programs using ctors run in the component model.

Even outside of the component model, the behavior of ctors on all Wasm targets is entirely unspecified: they behave however LLVM happens to implement them at the moment. This situation isn't ideal for any Wasm user.

Dan and Luke have a rough plan to come up with a Wasm tool-conventions spec, which will describe a convention for constructors across all Wasm targets. The convention will, roughly, define:

  • a command as a module having an export named _start. In commands, ctors should be run only at the beginning of _start. Instances should expect to only have _start executed once, and trap on any subsequent invocations.

  • a reactor as a module having an export named _init. In reactors, ctors should be run only in _init. Instances should expect to have _init executed exactly once, before any other export functions are invoked.
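The two conventions can be modeled as small state machines. This sketches the rules as described above, not any toolchain's output: `Command::start`, `Reactor::init`, `Reactor::add`, and the `Trap` variants are hypothetical. Trapping on a second `_init`, or on an export called before `_init`, is an extra check this sketch chooses to make; the proposal only says instances should expect `_init` to run exactly once, first.

```rust
/// Hypothetical trap reasons for this sketch.
#[derive(Debug, PartialEq)]
enum Trap {
    StartCalledTwice,
    InitCalledTwice,
    NotInitialized,
}

/// A command: ctors run only at the beginning of _start.
#[derive(Default)]
struct Command {
    started: bool,
    ctor_runs: u32,
}

impl Command {
    fn start(&mut self, user_main: impl FnOnce() -> i32) -> Result<i32, Trap> {
        if self.started {
            return Err(Trap::StartCalledTwice); // _start runs once
        }
        self.started = true;
        self.ctor_runs += 1; // ctors, then main
        Ok(user_main())
    }
}

/// A reactor: ctors run only in _init.
#[derive(Default)]
struct Reactor {
    initialized: bool,
    ctor_runs: u32,
}

impl Reactor {
    fn init(&mut self) -> Result<(), Trap> {
        if self.initialized {
            return Err(Trap::InitCalledTwice); // sketch's own check
        }
        self.initialized = true;
        self.ctor_runs += 1;
        Ok(())
    }

    /// Any other export: no ctor call; relies on _init having run.
    fn add(&self, a: i32, b: i32) -> Result<i32, Trap> {
        if !self.initialized {
            return Err(Trap::NotInitialized); // sketch's own check
        }
        Ok(a + b)
    }
}
```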

Users reasonably expect to be able to generate components from both wasm32-wasi and wasm32-unknown-unknown targets, since using the component model doesn't require WASI. A convention that covers all Wasm targets serves those users, rather than a solution tailored only to WASI users.

Dan is going to take this proposal to the Wasm LLVM team (sbc100 et al.), hash out the details with them, and modify the design based on their feedback.

Once the tool-convention spec is accepted by the LLVM team and the Wasm CG, it can then be relied upon by the component model spec, and the component model's canonical ABI can guarantee that reactors get _init executed following instantiation. The usual design problems around instantiation/initialization order of components which import each other apply here, and import functions may or may not end up being legal to call during _init.
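On the host side, the convention reduces to inspecting export names. A sketch, assuming a module is classified by its exports alone and that `_start` wins if both names appear; `Kind`, `classify`, and `post_instantiation_calls` are hypothetical helpers, and the exact ordering guarantees are what the spec work described above still has to settle.

```rust
/// Module kinds under the proposed convention.
#[derive(Debug, PartialEq)]
enum Kind {
    Command, // exports _start
    Reactor, // exports _init
    Other,   // neither: the convention says nothing about it
}

/// Classify by export names alone; `_start` is checked first (an assumption).
fn classify(exports: &[&str]) -> Kind {
    if exports.contains(&"_start") {
        Kind::Command
    } else if exports.contains(&"_init") {
        Kind::Reactor
    } else {
        Kind::Other
    }
}

/// Exports a host would call right after instantiation, before any
/// other export is invoked.
fn post_instantiation_calls(kind: &Kind) -> Vec<&'static str> {
    match kind {
        // Reactors: _init exactly once, immediately after instantiation.
        Kind::Reactor => vec!["_init"],
        // Commands: _start is the program itself, invoked once by the
        // embedder rather than as an initialization step.
        Kind::Command | Kind::Other => Vec::new(),
    }
}
```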

The spec and implementation work will take a long time, but with the fixes above, Bytecode Alliance stakeholders have moved a spec solution for ctors off the critical path, so it will be OK if this takes all year to get done.

Fixes for compatibility with existing Preview 1 modules

Either of the following solutions will keep binary compatibility working with existing Preview 1 modules:

  • Depend on the preview 1 adapter short-circuiting import function calls in ctors.
  • Implement a Wasm rewriter that detects when wasm-ld has synthesized the cabi_realloc export as a wrapper that calls the ctors and then the user-defined (internal) cabi_realloc, and replaces the body of that wrapper with a direct call to the internal cabi_realloc.
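The rewriter option can be sketched over a simplified instruction list rather than real Wasm bytecode. The wrapper shape matched here (a call to `__wasm_call_ctors`, then `local.get` of each parameter in order, then one forwarding call) is an assumption about what wasm-ld emits; a real tool would parse the binary and match the exact sequence, including any trailing calls. `Instr`, `Func`, `wrapped_callee`, `strip_ctor_call`, and the internal function name used in the test are hypothetical.

```rust
/// A tiny slice of Wasm, enough to express the wrapper pattern.
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    LocalGet(u32),
    Call(String),
    Other(String), // anything else, e.g. "i32.const 0"
}

#[derive(Debug, Clone, PartialEq)]
struct Func {
    name: String,
    params: u32,
    body: Vec<Instr>,
}

/// If `f` has the (assumed) wrapper shape
///   call __wasm_call_ctors; local.get 0..params; call <callee>
/// return the callee's name.
fn wrapped_callee(f: &Func) -> Option<&str> {
    let (first, rest) = f.body.split_first()?;
    match first {
        Instr::Call(name) if name == "__wasm_call_ctors" => {}
        _ => return None,
    }
    let n = f.params as usize;
    if rest.len() != n + 1 {
        return None;
    }
    // Every parameter must be forwarded, in order.
    for (i, instr) in rest[..n].iter().enumerate() {
        if *instr != Instr::LocalGet(i as u32) {
            return None;
        }
    }
    match &rest[n] {
        Instr::Call(callee) => Some(callee.as_str()),
        _ => None,
    }
}

/// Rewrite a matched wrapper so it only forwards to the internal
/// function by dropping the leading ctor call. Returns true if rewritten.
fn strip_ctor_call(f: &mut Func) -> bool {
    if wrapped_callee(f).is_none() {
        return false;
    }
    f.body.remove(0);
    true
}
```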
@dicej
Collaborator

dicej commented Mar 3, 2023

Just wanted to chime in and say this all sounds good to me. Thanks for writing it up!

I'm happy to implement the short-circuit workaround for existing modules if nobody else has started yet.

@yowl

yowl commented Jul 25, 2023

I want to add that invoking C++ constructors through a new exported function would be a problem for .NET CoreCLR, as any exported function would go through its reverse P/Invoke mechanism, which expects the runtime to already be initialised. If the reactor's _initialize could be called, that would be enough for this use case. Thanks
