Add a new "abi" which supports the full type grammar #422

alexcrichton · 2021-04-02T15:52:36Z

This commit adds a new Abi variant called Abi::Next. The purpose of this new ABI is to support the full expressivity of the type grammar, making validation much simpler since there generally doesn't need to be a ton of validation. The goal of this ABI is to have the next WASI snapshot move to it. Old WASI snapshots will never use this ABI because it's a breaking change from the existing WASI ABI.

At a syntactical level functions are specified with the old ABI as:

(@interface func (export "foo") ...)

whereas the new ABI is recognized as:

(export "foo" (func ...))

This commit also prepares the next wasi snapshot to rely far less on @witx and custom types which are unlikely to be in interface types. To that end a few changes are made:

* Discriminants of variants are now automatically sized to be as small as possible. This doesn't affect the current WASI ABI because all discriminants are already specified.
* New in-buffer T and out-buffer T types are added. These represent input/output from the callee's perspective (e.g. a read function takes an out-buffer and a write function takes an in-buffer) and represent a slice of T in memory that the callee may consume but also may not. Having this be a first-class type instead of raw pointers allows WASI to get virtualized by wasm modules themselves given suitable host environments.
* Structs-of-bools are now automatically represented as bitflags.
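
As a rough illustration of the structs-of-bools point, here is a minimal sketch of what "represented as bitflags" could look like. The helper names `pack_flags` and `unpack_flags` are hypothetical, and the bit order (field `i` maps to bit `i`, least significant bit first) is an assumption rather than something this PR specifies.

```rust
/// Hypothetical helper: pack a struct-of-bools (given as a slice of its
/// fields, in declaration order) into one integer, one bit per field.
/// Assumption: field `i` occupies bit `i`, counting from the least
/// significant bit.
fn pack_flags(fields: &[bool]) -> u64 {
    assert!(fields.len() <= 64, "this sketch only handles up to 64 flags");
    fields
        .iter()
        .enumerate()
        .fold(0u64, |bits, (i, &set)| bits | ((set as u64) << i))
}

/// Inverse of `pack_flags`: recover `n` boolean fields from the packed bits.
fn unpack_flags(bits: u64, n: usize) -> Vec<bool> {
    (0..n).map(|i| (bits >> i) & 1 == 1).collect()
}

fn main() {
    // A three-field struct-of-bools, e.g. { read, write, append }.
    let fields = [true, false, true];
    let bits = pack_flags(&fields);
    println!("packed: {:#b}", bits);
    assert_eq!(unpack_flags(bits, fields.len()), fields);
}
```

A real implementation would also pick the smallest integer width that holds all the flags; this sketch uses `u64` throughout to stay short.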

The new ABI doesn't have a ton of documentation yet but I hope to write that up in the future as necessary. For now the details can be mostly glossed over since code generators are just receiving primitive instructions to implement and the details of the ABI are all handled by this crate. At a high level though the ABI is:

* There's a different ABI whether a function is imported or exported and whether the caller is wasm or a host. For example if wasm calls an imported function with a string it can simply pass a pointer/length. If a host calls wasm with a string, however, it needs to malloc space for the string and the wasm needs to know that it's receiving an owned allocation.
* All types can be "flattened" into primitive wasm values. If a type is in a return or parameter position this flattening is done to generate the actual function's arguments and return values.
* All types have a defined in-memory representation. This enables types like (list T) to work so the callee knows what representation the caller has.
* The ABI relies on the wasm module to implicitly export a few functions and items. For example an export called memory must be exported currently. Similarly for some types in the ABI you'd also need to export a witx_malloc and witx_free function. The exact details of how to wire all this up I hope can be more flexible in the future, but I figure this is probably at least a good starting point.

The intention of this ABI is to be a sort of "canonical ABI" for interface types. This enables WASI (and everything else using witx) to use the full type grammar as intended by interface types while assigning meaning to what the host/wasm need to do to communicate with each other. In the limit interface types will allow each module to customize its precise ABI, but for now this lets modules start communicating with each other while we wait for the customization pieces to all fall into place.

Currently I have not changed the ephemeral snapshot to use the new ABI, but depending on feedback on this that's the next thing I'd like to do. That should enable the ephemeral snapshot to have access to a much richer type grammar than what it has access to today. Furthermore I'd also like to eventually "backport" the {in,out}-buffer types to the previous ABI in a simple fashion (just a pointer/length) to ideally remove the need for @witx if possible. I haven't attempted to do this yet.

This commit implements support for a new ABI which is an evolution of the current ABI specific to WASI. The main purpose of this ABI is to support all possible types in all places (e.g. multiple results, multiple params, lists of records of variants of structs of lists, etc...).

sunfishcode

This looks great!

sunfishcode · 2021-04-08T01:27:12Z

tools/witx/src/abi.rs

+                        // bitcasts. This will go through and cast everything
+                        // to the right type to ensure all blocks produce the
+                        // same set of results.
+                        casts.truncate(0);


Here and elsewhere, is there a reason for using truncate(0) instead of clear()?

Nah that's just my age showing, I'm not sure we had clear() at Rust 1.0...

I'll switch!

sunfishcode · 2021-04-08T01:54:26Z

tools/witx/src/ast.rs

@@ -438,6 +519,39 @@ pub struct Variant {
 }

 impl Variant {
+    pub fn infer_repr(cases: usize) -> IntRepr {
+        match cases {
+            n if n < u8::max_value() as usize => IntRepr::U8,


When n is 255 or 256, it seems like we could still use a u8 variant, right? This is just an optimization, but I also wanted to make sure I'm not missing something subtle. So this could be written as n if n <= 0x100 and similar for the other types?

Oh dear thanks for catching this!
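
A minimal sketch of the bound the review suggests, for readers following along. The `IntRepr` enum here is a local stand-in and this is not the code that landed in the PR; it only illustrates that `n` cases need discriminants `0..n`, so up to 256 cases still fit in a `u8`.

```rust
/// Local stand-in for the crate's integer representation type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IntRepr {
    U8,
    U16,
    U32,
    U64,
}

/// Pick the smallest representation that can hold every discriminant.
/// With `cases` cases the largest discriminant is `cases - 1`, so the
/// inclusive upper bounds are 2^8, 2^16 and 2^32.
fn infer_repr(cases: usize) -> IntRepr {
    match cases as u64 {
        n if n <= 1 << 8 => IntRepr::U8,
        n if n <= 1 << 16 => IntRepr::U16,
        n if n <= 1 << 32 => IntRepr::U32,
        _ => IntRepr::U64,
    }
}

fn main() {
    for cases in [2usize, 255, 256, 257, 65_536, 65_537] {
        println!("{} cases -> {:?}", cases, infer_repr(cases));
    }
}
```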

jedisct1 · 2021-04-08T11:56:23Z

When using the new ABI here:

(typename $mystruct (record (field $member1 u8) (field $member2 u8) (field $member3 u8)))
(export "xyz" (func (param $a $mystruct) (param $b $mystruct) (result $error (error $errno))))

wasm_signature(CallMode::DefinedImport) returns 7 input parameters for the function signature. Is that expected? Even if the structure is flattened, this doesn't match the offsets of the members.

sunfishcode · 2021-04-08T14:47:52Z

tools/witx/src/abi.rs

+    I64ToF32,
+
+    None,
+}


Bitcasts between types with different sizes are sensitive to endianness. For example, in F32ToF64, does the F32 go in the most significant half of the F64 or the least significant half? Cross-endian configurations are a theoretical concern at this point, but I think we could at least document what should happen. Since wasm itself is little-endian, I propose this say "bitcasts between types with different sizes use little-endian byte ordering".

Also, it'd be good to mention here that widening bitcasts zero-extend.

Oh I've figured that like wasm everything is little-endian here. For conversions like f32 to f64 I'm imagining it's the same as f64::from(1.0f32) in Rust where it's not really about moving bits but the f64 value losslessly matches the f32 value.

But yeah I'll definitely clarify this and indicate that everything is zero-extended. I haven't thought too too hard about the semantics here, but I think this'll all be ok.

Ah. f64::from is lossless except it converts signaling NaN to quiet NaN, which is obscure, but surprising given that the rest of the ABI preserves NaN bit patterns. But this isn't urgent to sort out now.
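
To make the "widening bitcasts zero-extend" proposal concrete, here is a small sketch of the integer-carrier cases only. The function names are hypothetical and merely mirror the style of the `Bitcast` variant names; the f32-to-f64 case is deliberately left out because the thread above leaves its semantics open.

```rust
/// Widen an i32 to an i64 by zero-extension: the upper 32 bits are zero
/// regardless of the sign of `x`.
fn i32_to_i64(x: i32) -> i64 {
    i64::from(x as u32)
}

/// Carry an f32 in an i64: the raw 32-bit pattern is preserved in the low
/// half and the high half is zero.
fn f32_to_i64(x: f32) -> i64 {
    i64::from(x.to_bits())
}

/// Narrow back: take the low 32 bits and reinterpret them as an f32.
fn i64_to_f32(x: i64) -> f32 {
    f32::from_bits(x as u32)
}

fn main() {
    // -1i32 has all 32 bits set; zero-extension keeps the value positive.
    println!("{:#x}", i32_to_i64(-1));
    // -0.0f32 has only the sign bit set, so sign-extension would differ.
    println!("{:#x}", f32_to_i64(-0.0));
    // The round trip through the i64 carrier preserves the bit pattern.
    let x = 1.5f32;
    assert_eq!(i64_to_f32(f32_to_i64(x)).to_bits(), x.to_bits());
}
```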

alexcrichton · 2021-04-08T15:08:48Z

@jedisct1 ah yeah that's expected, the record-by-value-parameter is "splatted" into its flattened form, so each of the two arguments takes up 3 literal parameter values each in the wasm signature. The return value is also represented as multiple values, however, and because C/Rust aren't super great about multi-value returns today that's represented as a return pointer. This means that in all it turns out as 7 parameters (3 for first arg, 3 for second, 1 for ret pointer)
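
The parameter count described above can be sketched roughly as follows. The `Type` and `WasmType` enums and the `import_params` helper are hypothetical stand-ins, not the crate's API, and whether a return pointer is needed is passed in as a flag because the rules for flattening results are not spelled out in this thread.

```rust
/// Core wasm value types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WasmType {
    I32,
    I64,
    F32,
    F64,
}

/// A tiny stand-in for the witx type grammar.
enum Type {
    U8,
    U32,
    U64,
    F32,
    F64,
    Record(Vec<Type>),
}

/// "Splat" a type into core wasm values, recursing through record fields.
fn flatten(ty: &Type, out: &mut Vec<WasmType>) {
    match ty {
        // Small integers travel in an i32, as core wasm has no i8.
        Type::U8 | Type::U32 => out.push(WasmType::I32),
        Type::U64 => out.push(WasmType::I64),
        Type::F32 => out.push(WasmType::F32),
        Type::F64 => out.push(WasmType::F64),
        Type::Record(fields) => fields.iter().for_each(|f| flatten(f, out)),
    }
}

/// Flatten every parameter, then append an i32 return pointer if the
/// result is returned through memory.
fn import_params(params: &[Type], needs_ret_ptr: bool) -> Vec<WasmType> {
    let mut out = Vec::new();
    for p in params {
        flatten(p, &mut out);
    }
    if needs_ret_ptr {
        out.push(WasmType::I32);
    }
    out
}

/// The `$mystruct` record from the question: three u8 fields.
fn mystruct() -> Type {
    Type::Record(vec![Type::U8, Type::U8, Type::U8])
}

fn main() {
    let sig = import_params(&[mystruct(), mystruct()], true);
    println!("{} params: {:?}", sig.len(), sig);
}
```

Note that nothing here consults field offsets: flattening is about how values are passed, while layout only matters once a value is stored in linear memory.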

jedisct1 · 2021-04-08T16:23:27Z

Thanks for clarifying, and for your amazing work on this, Alex!

The ret pointer totally makes sense. Multi-return values also require tuples in Zig, Swift and AssemblyScript, and having a single pointer is more convenient than the previous ABI.

However, the flattening of structures in function parameters is quite a massive change. Does that mean that we will always need two completely different representations for the same type, even if properly padded structures are already used internally? That seems to make everything more complicated, including language support for using imported functions.

That flattening is certainly necessary. But if there is a way to avoid such a departure from the previous ABI and still use pointers instead, that would be immensely useful to ease the work of people maintaining languages, code generators and runtimes.

alexcrichton · 2021-04-08T17:00:10Z

It's definitely not intended to have two representations of the same type. The intention is that code generators would be based on this crate which abstracts away all the details of the ABI. This helps ensure that code generators all agree with one another and there's only one place that actually defines the ABI (this crate). In that sense the intention is to solve the problems you're mentioning, not create new problems.

This will require code generators to migrate to using this crate and the ABI definitions, which is a large change, but it's expected that the general amount of maintenance afterwards is no different from today.

jedisct1 · 2021-04-08T17:43:06Z

Thanks Alex,

Only code generators in Rust can use this crate :(

The two representations I was referring to come from the fact that, when using this crate, the offsets returned by member_layout() don't match the splatted representation for function parameters.

Should these offsets be ignored, with the splatted representation always being the correct one to store data as?

alexcrichton · 2021-04-08T18:12:05Z

Yes that's true, if you don't want to write Rust code you won't be able to use the crate. The intention is that this ABI is documented/canonicalized in documentation (like the wasm spec) so if other implementations would like to rebuild everything there's still a shared specification of what to do.

For the two representations you're talking about, I'm not sure what you mean (sorry, I haven't been sure for the past few comments but haven't addressed this point specifically). When a struct is passed as a parameter, each of its fields recursively gets expanded into individual function arguments. This has nothing to do with memory or layout, since nothing is stored in memory; everything is passed as function parameters.

Does that answer your question? Sorry I'm not entirely sure because the "splat to arguments" is not intended to be a representation, it's just an implementation detail of how you call a function with a struct argument

alexcrichton · 2021-05-03T21:16:11Z

This is now more formally specified at WebAssembly/interface-types#132 in the context of interface types. The intention is that once that's settled this will be updated to match the specification there!

sunfishcode · 2023-01-28T15:13:23Z

This work has since been subsumed by the Canonical ABI and associated tooling.

alexcrichton added 17 commits February 22, 2021 12:18

Implement reading/writing to/from memory

abc2c9e

This is necessary to implement lists-of-lists properly with translation/validation.

Add some tests for read/write memory

5e6c843

Fill out a bit and fix a few bugs

c008c8e

Update ABIs with more modes of calling

c1495cb

Born out of recent discussions and realizations that we'll need an owned/borrowed distinction for arguments where possible.

Automatically infer structs as bitflags/tuple/etc

0030d39

Even if they aren't declared as such.

Support bitflags in the Next ABI

fc59732

Mostly just updating read/write to load/store the appropriate size of the bitflags instead of decomposing into structs-of-bools.

Support @witx tag in the Next ABI

eefa46c

* Automatically size enums based on how many cases they have
* Read/write the appropriate tag size

Trim to minimize the witx ABI

0881434

Don't support export modes with call and the preview1 abi

a6835be

Add some initial fuzzing

a5b09ac

Add an option shorthand

9726d02

Move a helper function to a method on Type

439b4f8

Allows code generators to use this in their own calculations if necessary.

Tweak memory ownership in list lifting/lowering

a7fcb5f

Bump witx to 0.10.0

98a294d

Initial support for in-buffer and out-buffer types

2beb50f

Merge remote-tracking branch 'origin/main' into abi-next

00d4af8

alexcrichton mentioned this pull request Apr 2, 2021

Add an option shorthand to witx. #420

Closed

sunfishcode reviewed Apr 8, 2021

sunfishcode added 2 commits April 20, 2021 10:14

Move typename, resource, and const declarations inside of module syntax.

59e67ae

rustfmt

5651d52

rektide mentioned this pull request May 3, 2021

Phase 2 estimate WebAssembly/interface-types#130

Open

Use the parsed module name where applicable.

9013e18

Use the parsed module name for inline modules, and validate that it matches the filename for file modules.

sunfishcode and others added 4 commits May 3, 2021 17:15

Don't error if the filename "-" doesn't match the module name.

b53b089

Move use syntax inside of modules.

d31ad1b

And in witxt files, use the module name instead of giving `(witx ...)` its own name.

Say "core wasm" instead of "native wasm".

85a887b

Merge pull request #1 from sunfishcode/abi-next-more

70a8c63

Move typename, resource, and const declarations inside of module syntax.

pchickey mentioned this pull request May 28, 2021

Implementing a wasmer backend for wiggle bytecodealliance/wasmtime#2949

Closed

sunfishcode closed this Jan 28, 2023
