Add a new "abi" which supports the full type grammar #422

Closed
wants to merge 24 commits into from

alexcrichton
Contributor

This commit adds a new Abi variant called Abi::Next. The purpose of this new ABI is to support the full expressivity of the type grammar, making validation much simpler since there generally doesn't need to be a ton of validation. The goal of this ABI is to have the next WASI snapshot move to it. Old WASI snapshots will never use this ABI because it's a breaking change from the existing WASI ABI.

At a syntactical level functions are specified with the old ABI as:

(@interface func (export "foo") ...)

whereas the new ABI is recognized as:

(export "foo" (func ...))

This commit also prepares the next WASI snapshot to rely far less on @witx and custom types, which are unlikely to be in interface types. To that end a few changes are made:

  • Discriminants of variants are now automatically sized to be as small as possible. This doesn't affect the current WASI ABI because all discriminants are already specified.
  • New in-buffer T and out-buffer T types are added. These represent input/output from the callee's perspective (e.g. a read function takes an out-buffer and a write function takes an in-buffer) and represent a slice of T in memory that the callee may consume but also may not. Having this be a first-class type instead of raw pointers allows WASI to get virtualized by wasm modules themselves given suitable host environments.
  • Structs-of-bools are now automatically represented as bitflags.
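As a rough illustration of the structs-of-bools change, the sketch below packs a record of boolean fields into a single integer. It is a hypothetical sketch, not this crate's code: `pack_flags`, `unpack_flags` and `flags_size_in_bytes` are illustrative names, and it assumes field i maps to bit i and that storage uses the smallest of 1, 2, 4 or 8 bytes that fits (so at most 64 flags).

```rust
/// Hypothetical: pack a "struct of bools" into a bitmask, assuming
/// field i is stored in bit i (the PR does not specify bit order).
fn pack_flags(fields: &[bool]) -> u64 {
    assert!(fields.len() <= 64, "more than 64 flags needs a wider repr");
    fields
        .iter()
        .enumerate()
        .fold(0u64, |acc, (i, &set)| if set { acc | (1u64 << i) } else { acc })
}

/// Inverse of `pack_flags`: recover `count` bools from the bitmask.
fn unpack_flags(bits: u64, count: usize) -> Vec<bool> {
    (0..count).map(|i| (bits & (1u64 << i)) != 0).collect()
}

/// Smallest storage width in bytes for `count` flags (up to 64).
fn flags_size_in_bytes(count: usize) -> usize {
    match count {
        0..=8 => 1,
        9..=16 => 2,
        17..=32 => 4,
        _ => 8,
    }
}
```

Under these assumptions a three-field record `[true, true, false]` packs to `0b011` and is stored in a single byte.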

The new ABI doesn't have a ton of documentation yet but I hope to write that up in the future as necessary. For now the details can be mostly glossed over since code generators are just receiving primitive instructions to implement and the details of the ABI are all handled by this crate. At a high level though the ABI is:

  • The ABI differs depending on whether a function is imported or exported and whether the caller is wasm or a host. For example, if wasm calls an imported function with a string it can simply pass a pointer/length. If a host calls wasm with a string, however, it needs to malloc space for the string, and the wasm needs to know that it's receiving an owned allocation.
  • All types can be "flattened" into primitive wasm values. If a type is in a return or parameter position this flattening is done to generate the actual function's arguments and return values.
  • All types have a defined in-memory representation. This enables types like (list T) to work so the callee knows what representation the caller has.
  • The ABI relies on the wasm module to implicitly export a few functions and items. For example an export called memory must be exported currently. Similarly for some types in the ABI you'd also need to export a witx_malloc and witx_free function. The exact details of how to wire all this up I hope can be more flexible in the future, but I figure this is probably at least a good starting point.
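The host-calls-wasm string case from the first bullet can be sketched with a plain byte vector standing in for linear memory. Everything here is hypothetical: `LinearMemory`, its bump-allocating `witx_malloc` method and `lower_string` are illustrative stand-ins for the module's exported memory and allocator, not this crate's API.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a module's exported `memory` plus allocator.
struct LinearMemory {
    bytes: Vec<u8>,
    next_free: usize,
}

impl LinearMemory {
    fn new(size: usize) -> Self {
        LinearMemory { bytes: vec![0; size], next_free: 0 }
    }

    /// Hypothetical bump allocator: returns an `align`-aligned offset with
    /// room for `size` bytes, or None if memory is exhausted.
    fn witx_malloc(&mut self, size: usize, align: usize) -> Option<u32> {
        assert!(align.is_power_of_two(), "alignment must be a power of two");
        let start = self.next_free.checked_add(align - 1)? / align * align;
        let end = start.checked_add(size)?;
        if end > self.bytes.len() {
            return None;
        }
        let ptr = u32::try_from(start).ok()?;
        self.next_free = end;
        Some(ptr)
    }
}

/// Host-to-wasm string passing: allocate in the callee's memory, copy the
/// UTF-8 bytes in, and hand over (ptr, len). The callee owns the allocation.
fn lower_string(mem: &mut LinearMemory, s: &str) -> Option<(u32, u32)> {
    let len = u32::try_from(s.len()).ok()?;
    let ptr = mem.witx_malloc(s.len(), 1)?;
    let start = ptr as usize;
    mem.bytes[start..start + s.len()].copy_from_slice(s.as_bytes());
    Some((ptr, len))
}
```

With a fresh 64-byte memory, `lower_string(&mut mem, "hello")` yields `(0, 5)`; a real host would pass those two values to the export, and the callee would presumably release the buffer later (for example via the module's `witx_free`).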

The intention of this ABI is to be a sort of "canonical ABI" for interface types. This enables WASI (and everything else using witx) to use the full type grammar as intended by interface types while assigning meaning to what the host/wasm need to do to communicate with each other. In the limit interface types will allow each module to customize its precise ABI, but for now this gives the ability to today have modules start communicating while we wait for the customization pieces to all fall in place.

Currently I have not changed the ephemeral snapshot to use the new ABI, but depending on feedback on this that's the next thing I'd like to do. That should give the ephemeral snapshot access to a much richer type grammar than it has today. Furthermore, I'd also like to eventually "backport" the {in,out}-buffer types to the previous ABI in a simple fashion (just a pointer/length) to ideally remove the need for @witx if possible. I haven't attempted to do this yet.

This commit implements support for new ABI which is an evolution of the
current ABI specific to WASI. The main purpose of this ABI is to support
all possible types in all places (e.g. multiple results, multiple
params, lists of records of variants of structs of lists, etc...).
This is necessary to implement lists-of-lists properly with
translation/validation.
Born out of recent discussions and realizations that we'll need an
owned/borrowed distinction for arguments where possible.
Even if they aren't declared as such.
Mostly just updating read/write to load/store the appropriate size of
the bitflags instead of decomposing into structs-of-bools.
* Automatically size enums based on how many cases they have
* Read/write the tag using the appropriate tag size
Allows code generators to use this in their own calculations if
necessary.
Member

@sunfishcode sunfishcode left a comment

This looks great!

// bitcasts. This will go through and cast everything
// to the right type to ensure all blocks produce the
// same set of results.
casts.truncate(0);
Member

Here and elsewhere, is there a reason for using truncate(0) instead of clear()?

Contributor Author

Nah that's just my age showing, I'm not sure we had clear() at Rust 1.0...

I'll switch!
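For reference, `Vec::truncate(0)` and `Vec::clear()` behave the same: both drop every element, set the length to zero and leave the allocated capacity untouched, so the switch is purely about readability. A minimal sketch (the `Vec<u32>` element type and function names are arbitrary):

```rust
/// Old style: drop all elements by truncating to length zero.
fn reset_with_truncate(casts: &mut Vec<u32>) {
    casts.truncate(0);
}

/// Preferred style: same effect, clearer intent.
fn reset_with_clear(casts: &mut Vec<u32>) {
    casts.clear();
}
```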

@@ -438,6 +519,39 @@ pub struct Variant {
}

impl Variant {
pub fn infer_repr(cases: usize) -> IntRepr {
match cases {
n if n < u8::max_value() as usize => IntRepr::U8,
Member

When n is 255 or 256, it seems like we could still use a u8 variant, right? This is just an optimization, but I also wanted to make sure I'm not missing something subtle. So this could be written as n if n <= 0x100 and similar for the other types?

Contributor Author

Oh dear thanks for catching this!
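A sketch of the inclusive bounds suggested above, phrased in terms of the largest discriminant that must be stored (`cases - 1`), so that 256 cases still fit in a u8. The `IntRepr` enum is a simplified stand-in for the crate's type, and this is an illustration of the fix rather than the patch from this PR.

```rust
/// Simplified stand-in for the crate's `IntRepr`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IntRepr {
    U8,
    U16,
    U32,
    U64,
}

/// Smallest tag type able to hold discriminants `0..cases`.
fn infer_repr(cases: usize) -> IntRepr {
    // The largest discriminant stored is `cases - 1`
    // (zero or one case still needs a tag of 0).
    let max_tag = cases.saturating_sub(1) as u64;
    match max_tag {
        n if n <= u64::from(u8::MAX) => IntRepr::U8,
        n if n <= u64::from(u16::MAX) => IntRepr::U16,
        n if n <= u64::from(u32::MAX) => IntRepr::U32,
        _ => IntRepr::U64,
    }
}
```

With the original `n < u8::max_value()` guard, a 255-case variant would get a u16 tag; here 255 and 256 cases both map to `IntRepr::U8`, and 257 is the first to need `IntRepr::U16`.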

tools/witx/src/abi.rs (resolved)
@jedisct1
Member

jedisct1 commented Apr 8, 2021

When using the new ABI here:

(typename $mystruct (record (field $member1 u8) (field $member2 u8) (field $member3 u8)))
(export "xyz" (func (param $a $mystruct) (param $b $mystruct) (result $error (error $errno))))

wasm_signature(CallMode::DefinedImport) returns 7 input parameters for the function signature. Is that expected? Even if the structure is flattened, this doesn't match the offsets of the members.

I64ToF32,

None,
}
Member

Bitcasts between types with different sizes are sensitive to endianness. For example, in F32ToF64, does the F32 go in the most significant half of the F64 or the least significant half? Cross-endian configurations are a theoretical concern at this point, but I think we could at least document what should happen. Since wasm itself is little-endian, I propose this say "bitcasts between types with different sizes use little-endian byte ordering".

Also, it'd be good to mention here that widening bitcasts zero-extend.

Contributor Author

Oh I've figured that like wasm everything is little-endian here. For conversions like f32 to f64 I'm imagining it's the same as f64::from(1.0f32) in Rust where it's not really about moving bits but the f64 value losslessly matches the f32 value.

But yeah I'll definitely clarify this and indicate that everything is zero-extended. I haven't thought too hard about the semantics here, but I think this'll all be ok.

Member

Ah. f64::from is lossless except it converts signaling NaN to quiet NaN, which is obscure, but surprising given that the rest of the ABI preserves NaN bit patterns. But this isn't urgent to sort out now.
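The semantics proposed in this thread (little-endian ordering, zero-extension when widening) can be written out as a bit-level sketch. The function names are hypothetical. Note that a bit-level F32ToF64 under this rule differs from the value conversion `f64::from(x)` also discussed here: placing the bits of `1.0f32` in the low half of an f64 produces a tiny subnormal, not `1.0`.

```rust
/// Widening bitcast: the f32's bits occupy the low 32 bits and the high
/// 32 bits are zero (zero-extension, little-endian placement).
fn bitcast_f32_to_i64(x: f32) -> i64 {
    i64::from(x.to_bits())
}

/// Narrowing bitcast: keep only the low 32 bits.
fn bitcast_i64_to_f32(x: i64) -> f32 {
    f32::from_bits(x as u32)
}

/// A bit-level F32ToF64 under the same rule, for contrast with `f64::from`.
fn bitcast_f32_to_f64(x: f32) -> f64 {
    f64::from_bits(u64::from(x.to_bits()))
}
```

Because only `to_bits`/`from_bits` are involved, a round trip through the wider type preserves NaN payloads, which is the property the thread wants the rest of the ABI to keep.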

@alexcrichton
Contributor Author

@jedisct1 ah yeah that's expected: the record-by-value parameter is "splatted" into its flattened form, so each of the two arguments takes up 3 parameter values in the wasm signature. The return value is also represented as multiple values, however, and because C/Rust aren't super great about multi-value returns today, that's represented as a return pointer. In all this comes to 7 parameters (3 for the first arg, 3 for the second, 1 for the return pointer).
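A simplified model of the arithmetic above. The `Ty` and `WasmTy` enums, and the rule that results flattening to more than one value are returned through a single trailing i32 pointer, are assumptions drawn from this reply rather than the crate's actual `wasm_signature` logic; `ErrorResult` in particular is a hypothetical stand-in that flattens to a discriminant plus the error payload.

```rust
/// Simplified, hypothetical interface types.
enum Ty {
    U8,
    U16,
    U64,
    Record(Vec<Ty>),
    /// Hypothetical stand-in for a result whose error case carries a payload.
    ErrorResult(Box<Ty>),
}

/// Core wasm value types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WasmTy {
    I32,
    I64,
}

fn flatten(ty: &Ty, out: &mut Vec<WasmTy>) {
    match ty {
        // wasm has no 8- or 16-bit value types, so small integers become i32.
        Ty::U8 | Ty::U16 => out.push(WasmTy::I32),
        Ty::U64 => out.push(WasmTy::I64),
        // Records are splatted: each field is flattened in order.
        Ty::Record(fields) => {
            for field in fields {
                flatten(field, out);
            }
        }
        // Assumed: an i32 discriminant followed by the flattened payload.
        Ty::ErrorResult(payload) => {
            out.push(WasmTy::I32);
            flatten(payload, out);
        }
    }
}

fn wasm_params(params: &[Ty], results: &[Ty]) -> Vec<WasmTy> {
    let mut flat = Vec::new();
    for param in params {
        flatten(param, &mut flat);
    }
    let mut flat_results = Vec::new();
    for result in results {
        flatten(result, &mut flat_results);
    }
    // Assumed: multi-value results are written through a return pointer,
    // passed as one extra i32 parameter.
    if flat_results.len() > 1 {
        flat.push(WasmTy::I32);
    }
    flat
}
```

Passing two `$mystruct` values (three u8 fields each) with an error result gives 3 + 3 + 1 = 7 i32 parameters, the same count reported in the question.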

@jedisct1
Member

jedisct1 commented Apr 8, 2021

Thanks for clarifying, and for your amazing work on this, Alex!

The ret pointer totally makes sense. Multi-return values also require tuples in Zig, Swift and AssemblyScript, and having a single pointer is more convenient than the previous ABI.

However, the flattening of structures in function parameters is quite a massive change. Does that mean that we will always need two completely different representations for the same type, even if properly padded structures are already used internally? That seems to make everything more complicated, including language support for using imported functions.

That flattening is certainly necessary. But if there is a way to avoid such a departure from the previous ABI and still use pointers instead, that would be immensely useful to ease the work of people maintaining languages, code generators and runtimes.

@alexcrichton
Contributor Author

It's definitely not intended to have two representations of the same type. The intention is that code generators would be based on this crate which abstracts away all the details of the ABI. This helps ensure that code generators all agree with one another and there's only one place that actually defines the ABI (this crate). In that sense the intention is to solve the problems you're mentioning, not create new problems.

This will require code generators to migrate to using this crate and the ABI definitions, which is a large change, but it's expected that the general amount of maintenance afterwards is no different from today.

@jedisct1
Member

jedisct1 commented Apr 8, 2021

Thanks Alex,

Only code generators in Rust can use this crate :(

The two representations I was referring to are these: when using this crate, the offsets returned by member_layout() don't match the splatted representation used for function parameters.

Should these offsets be ignored, and is the splatted representation always the correct one to store data as?

@alexcrichton
Contributor Author

Yes that's true, if you don't want to write Rust code you won't be able to use the crate. The intention is that this ABI is documented/canonicalized in documentation (like the wasm spec) so if other implementations would like to rebuild everything there's still a shared specification of what to do.

For the two representations you're talking about, I'm not sure what you mean. (sorry I haven't been sure for the past few comments but haven't addressed this point specifically). When a struct is passed as a parameter each of its fields recursively get expanded into individual function arguments. This has nothing to do with memory/layout/etc since nothing is stored in memory, everything is passed as function parameters and such.

Does that answer your question? Sorry I'm not entirely sure because the "splat to arguments" is not intended to be a representation, it's just an implementation detail of how you call a function with a struct argument
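To make the distinction concrete: layout offsets only matter when a value sits in memory (inside a list, say, or behind a pointer), while a record passed directly as a parameter is splatted into one argument per field and never stored. The sketch below computes a C-style record layout; `Field` and `record_layout` are hypothetical, and the C-like placement rules are an assumption, not a description of `member_layout()`.

```rust
/// Hypothetical field description: size and alignment in bytes.
#[derive(Clone, Copy)]
struct Field {
    size: usize,
    align: usize,
}

/// Assumed C-style layout: each field goes at the next offset aligned to its
/// alignment; total size is rounded up to the record's largest alignment.
/// Returns (field offsets, record size, record alignment).
fn record_layout(fields: &[Field]) -> (Vec<usize>, usize, usize) {
    let align_to = |off: usize, a: usize| (off + a - 1) / a * a;
    let mut offsets = Vec::with_capacity(fields.len());
    let mut off = 0;
    let mut max_align = 1;
    for f in fields {
        off = align_to(off, f.align);
        offsets.push(off);
        off += f.size;
        max_align = max_align.max(f.align);
    }
    (offsets, align_to(off, max_align), max_align)
}
```

For `$mystruct` (three u8 fields) this gives offsets 0, 1, 2 with size 3 and alignment 1, which is what applies in memory; as a direct parameter the same record simply becomes three i32 arguments.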

Use the parsed module name for inline modules, and validate that it
matches the filename for file modules.
@alexcrichton
Contributor Author

This is now more formally specified at WebAssembly/interface-types#132 in the context of interface types. The intention is that once that's settled this will be updated to match the specification there!

sunfishcode and others added 4 commits May 3, 2021 17:15
And in witxt files, use the module name instead of giving `(witx ...)`
its own name.
Move typename, resource, and const declarations inside of module syntax.
@sunfishcode
Member

This work has since been subsumed by the Canonical ABI and associated tooling.

3 participants