[Wasm RyuJIT] Object Writer #121143

@JulieLeeMSFT

Object Writer

To generate usable Wasm from RyuJIT we need an object writer. The object writer has multiple responsibilities, not limited to:

NativeAOT-LLVM has an existing object writer which can be found here: https://github.com/dotnet/runtimelab/blob/feature/NativeAOT-LLVM/src/coreclr/tools/aot/ILCompiler.LLVM/CodeGen/WasmObjectWriter.cs

Object model

There are two models we can adopt for RyuJIT Wasm, and the object writer should be able to support both:

  1. Single unified module. In this case, we link all of the Wasm modules generated by the build together (using wasm-ld) into a single module alongside the runtime itself, so there's just a final yourapp.wasm to be loaded by the host. This requires robust relocation information and has a mixture of upsides and downsides.
  2. Array of modules. In this case, the runtime is loaded by itself as a free-standing module - the 'main' module - and then we load a module for each managed assembly, where 1-N of these modules also contain Wasm code that was generated by RyuJIT. The runtime is responsible for orchestrating this. This model is not compatible with hosts like wasmtime, which expect a single module to AOT-compile and run.

Object composition

Function signatures

In Wasm, each module has a type section that enumerates all the function signatures (function types) used by the module. This section precedes the code section, and the function section (along with the import section) refers to function types by index.
We need to generate entries in this section for all of our compiled methods and for any method signatures that our compiled code needs to invoke; otherwise the module will be invalid. We want to avoid generating duplicate entries.
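
To illustrate the deduplication requirement, here is a minimal Python sketch of a signature table that interns each function type once and then serializes the type section. The `TypeTable` class and its method names are hypothetical and are not taken from the existing NativeAOT-LLVM writer; only the MVP numeric value types are handled.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)

# Binary encodings of the MVP value types.
VALTYPE = {"i32": 0x7F, "i64": 0x7E, "f32": 0x7D, "f64": 0x7C}

class TypeTable:
    """Hypothetical helper: interns signatures so each appears once."""

    def __init__(self):
        self._index = {}    # (params, results) -> type index
        self._entries = []  # signatures in type-index order

    def intern(self, params, results) -> int:
        """Return the type index for a signature, adding it if new."""
        key = (tuple(params), tuple(results))
        if key not in self._index:
            self._index[key] = len(self._entries)
            self._entries.append(key)
        return self._index[key]

    def encode_section(self) -> bytes:
        """Serialize the type section (section id 1)."""
        body = bytearray(uleb128(len(self._entries)))
        for params, results in self._entries:
            body.append(0x60)  # functype marker
            body += uleb128(len(params)) + bytes(VALTYPE[p] for p in params)
            body += uleb128(len(results)) + bytes(VALTYPE[r] for r in results)
        return b"\x01" + uleb128(len(body)) + bytes(body)
```

Interning the same signature twice returns the same index, so call sites and the function section can both refer to it without growing the section.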

Imports

In Wasm, each module explicitly specifies its external dependencies; in our case, most of these would be PAL APIs or helpers. We have to generate a table of these at build time and associate each import with a name and a type (from the function signatures table). This section comes after the function signatures and before functions because imports and declared functions all share the same index space, with imported functions numbered first.
We likely also want to import a function pointer table from outside, the one being used by the CoreCLR runtime.
It is probably fine to have unused imports or have a standard set of imports for 'all the PAL APIs and CLR helpers' instead of trying to specifically only import what we use.
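
As a rough sketch of how function imports and the shared index space fit together, the Python below encodes an import section containing only function imports and computes the index of a declared function. The function names, the `env` module name, and the import name `f` used in the usage line are illustrative assumptions, not a proposed import set.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)

def encode_name(text: str) -> bytes:
    """Length-prefixed UTF-8 name, as used throughout the binary format."""
    utf8 = text.encode("utf-8")
    return uleb128(len(utf8)) + utf8

def encode_import_section(func_imports) -> bytes:
    """Serialize the import section (id 2) for function imports only.

    func_imports is a list of (module, field, type_index) tuples.
    """
    body = bytearray(uleb128(len(func_imports)))
    for module, field, type_index in func_imports:
        body += encode_name(module) + encode_name(field)
        body += b"\x00" + uleb128(type_index)  # 0x00 = function import
    return b"\x02" + uleb128(len(body)) + bytes(body)

def function_index(num_func_imports: int, defined_ordinal: int) -> int:
    """Index of the Nth declared function: imports occupy the low indices."""
    return num_func_imports + defined_ordinal

section = encode_import_section([("env", "f", 0)])
```

Because imports are numbered first, adding or removing an import shifts the index of every declared function, which is one reason a fixed, standard import set can be simpler to manage.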

Functions

The function section is a sequence of function type indices (pointing into the type section) where each one corresponds with an actual function body in the code section, coming later in the module.

Table section

We want to either declare a function pointer table or import one from outside so that it can be used for call_indirect.

Memories

We want to declare one linear memory with no maximum size and a reasonable minimum size. The linker will merge it with the memories of other modules.
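
A minimal sketch of such a declaration, assuming the memory is declared rather than imported: the memory section holds one entry whose limits carry only a minimum, expressed in 64 KiB pages. The function name and the byte-based parameter are hypothetical.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)

WASM_PAGE_SIZE = 64 * 1024  # bytes per Wasm page

def encode_memory_section(min_bytes: int) -> bytes:
    """Memory section (id 5) with one memory: a minimum and no maximum."""
    min_pages = -(-min_bytes // WASM_PAGE_SIZE)  # round up to whole pages
    # Limits flag 0x00 means "minimum only"; 0x01 would add a maximum.
    body = uleb128(1) + b"\x00" + uleb128(min_pages)
    return b"\x05" + uleb128(len(body)) + body
```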

Globals

We want to declare some common globals, such as the stack top/bottom, matching what emscripten clang generates so that we can link correctly with the CoreCLR runtime module.

Exports

If we plan to export global variables or functions by name we'll need to generate an exports section which maps function indices (from the functions section + imports section) and global indices to names.
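
The mapping from names to indices can be sketched as follows. This is an illustrative encoder only; the function name, the string kind tags, and the tuple layout are assumptions made for the example.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)

# Export descriptor kinds from the binary format.
EXPORT_KIND = {"func": 0x00, "table": 0x01, "memory": 0x02, "global": 0x03}

def encode_export_section(exports) -> bytes:
    """Serialize the export section (id 7).

    exports is a list of (name, kind, index) tuples. For functions the
    index is in the combined imports + declared functions index space.
    """
    body = bytearray(uleb128(len(exports)))
    for name, kind, index in exports:
        utf8 = name.encode("utf-8")
        body += uleb128(len(utf8)) + utf8
        body += bytes([EXPORT_KIND[kind]]) + uleb128(index)
    return b"\x07" + uleb128(len(body)) + bytes(body)
```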

Element section

Any function pointer data we want to load into the function pointer table needs to be defined in the element section as a vector of function indices.
If we're using the linker, it will automatically synthesize the contents of the table for us, so we won't need to generate any elements in that scenario. We just need to make sure that we have appropriate relocations for any function that has its address taken.
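
For the case where we emit elements ourselves rather than relying on the linker, a single MVP-style active segment could look like the sketch below. The function name and parameters are hypothetical, and only the simplest segment form (table 0, constant offset, plain function indices) is shown.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)

def sleb128(value: int) -> bytes:
    """Encode a signed integer as signed LEB128 (used by i32.const)."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        done = (value == 0 and not byte & 0x40) or (value == -1 and byte & 0x40)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)

def encode_element_section(table_offset: int, function_indices) -> bytes:
    """Element section (id 9) with one active segment for table 0."""
    segment = b"\x00"                                      # flag 0: active, table 0
    segment += b"\x41" + sleb128(table_offset) + b"\x0b"  # i32.const offset; end
    segment += uleb128(len(function_indices))
    for index in function_indices:
        segment += uleb128(index)
    body = uleb128(1) + segment
    return b"\x09" + uleb128(len(body)) + body
```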

Code section

Actual method bodies live in the code section, where there is one function body for every entry in the function section above (which is where the signature of the function was specified).
Each function body specifies its locals in groups, where all the locals in a given group have the same type (e.g. 10 i32s, 5 f32s, 3 f64s), and the locals are numbered sequentially.
After the locals the actual code follows.
We need to track any relocatable values that end up in code, like function pointers or memory addresses, so we can emit relocs for them.
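
The grouped locals layout described above amounts to a run-length encoding of the local types. The Python sketch below shows one way to produce it and to wrap a function body with its size prefix; the function names are hypothetical, and relocation tracking is left out to keep the example short.

```python
from itertools import groupby

def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)

VALTYPE = {"i32": 0x7F, "i64": 0x7E, "f32": 0x7D, "f64": 0x7C}

def encode_locals(local_types) -> bytes:
    """Collapse consecutive locals of the same type into (count, type) groups."""
    runs = [(len(list(group)), t) for t, group in groupby(local_types)]
    out = bytearray(uleb128(len(runs)))
    for count, t in runs:
        out += uleb128(count) + bytes([VALTYPE[t]])
    return bytes(out)

def encode_function_body(local_types, code: bytes) -> bytes:
    """One code section entry: size prefix, local groups, code, end opcode."""
    body = encode_locals(local_types) + code + b"\x0b"  # 0x0B = end
    return uleb128(len(body)) + body

# 10 i32s, 5 f32s, 3 f64s collapse into three groups.
groups = encode_locals(["i32"] * 10 + ["f32"] * 5 + ["f64"] * 3)
```

Grouping only merges adjacent locals, so a code generator that wants the fewest groups would sort its locals by type before assigning local indices.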

Data section

String literals and other compile-time information, along with reserved locations for things like indirection cells, are all defined by 'data segments' in the data section. Each segment specifies an offset and a size along with a vector of bytes to fill that segment at load time.
We need to track any relocatable values that end up in data segments, like function pointers or memory addresses, so we can emit relocs for them.
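
One possible shape for this bookkeeping is sketched below: a builder that appends bytes to a segment and records the offset of every relocatable 32-bit value it writes. The `DataSegmentBuilder` class, its methods, and the symbol name `Foo` are hypothetical; the relocation kind strings are the names used by the WebAssembly linking conventions, and writing a zero placeholder is an assumption of this sketch rather than a requirement.

```python
import struct

def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)

def sleb128(value: int) -> bytes:
    """Encode a signed integer as signed LEB128 (used by i32.const)."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        done = (value == 0 and not byte & 0x40) or (value == -1 and byte & 0x40)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)

class DataSegmentBuilder:
    """Hypothetical helper that tracks relocation sites while writing data."""

    def __init__(self):
        self.data = bytearray()
        self.relocs = []  # (offset in segment, relocation kind, symbol name)

    def write_bytes(self, payload: bytes) -> None:
        self.data += payload

    def write_address(self, kind: str, symbol: str) -> None:
        """Reserve 4 bytes for a value the linker fills in later."""
        self.relocs.append((len(self.data), kind, symbol))
        self.data += struct.pack("<I", 0)  # placeholder (assumption)

def encode_data_segment(memory_offset: int, payload: bytes) -> bytes:
    """One active data segment for memory 0 at a constant offset."""
    init = b"\x41" + sleb128(memory_offset) + b"\x0b"  # i32.const offset; end
    return b"\x00" + init + uleb128(len(payload)) + payload

builder = DataSegmentBuilder()
builder.write_bytes(b"hi\x00")
builder.write_address("R_WASM_TABLE_INDEX_I32", "Foo")
```

The recorded offsets are relative to the start of the segment, so a real writer would still need to translate them into whatever offsets the relocation section expects.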

Names (custom) section

Contains names for each function in the module, used for debugging and profiling.
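
As an illustration, the function-names part of this section can be encoded as a custom section called `name` containing subsection 1, a map from function index to name sorted by index. The function name and the dictionary input in this sketch are assumptions; module and local names are omitted.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)

def encode_name(text: str) -> bytes:
    """Length-prefixed UTF-8 name."""
    utf8 = text.encode("utf-8")
    return uleb128(len(utf8)) + utf8

def encode_function_names(names: dict) -> bytes:
    """Custom 'name' section holding only the function names subsection."""
    entries = sorted(names.items())  # the name map is ordered by index
    namemap = bytearray(uleb128(len(entries)))
    for index, name in entries:
        namemap += uleb128(index) + encode_name(name)
    subsection = b"\x01" + uleb128(len(namemap)) + bytes(namemap)  # id 1
    payload = encode_name("name") + subsection
    return b"\x00" + uleb128(len(payload)) + payload  # section id 0 = custom
```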

Target features (custom) section

Specifies which Wasm target features we're using. Various tooling, including wasm-ld, wants to see this in order to process our code correctly. We'll want to make sure we don't use non-MVP features without specifying them here.
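
A rough sketch of this section, based on the layout described in the WebAssembly tool conventions: a custom section named `target_features` containing a count followed by entries, each a one-byte prefix and a feature name. The function name is hypothetical, only the `+` (feature is used) prefix is emitted, and the feature names in the usage line are examples rather than a decision about what RyuJIT will enable.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)

def encode_name(text: str) -> bytes:
    """Length-prefixed UTF-8 name."""
    utf8 = text.encode("utf-8")
    return uleb128(len(utf8)) + utf8

def encode_target_features(used_features) -> bytes:
    """Custom 'target_features' section marking each feature as used."""
    payload = bytearray(encode_name("target_features"))
    payload += uleb128(len(used_features))
    for feature in used_features:
        payload += b"+" + encode_name(feature)  # '+' prefix: feature is used
    return b"\x00" + uleb128(len(payload)) + bytes(payload)

section = encode_target_features(["mutable-globals", "sign-ext"])
```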

(incomplete - work in progress)

Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
