Skip to content
This repository has been archived by the owner on Aug 17, 2022. It is now read-only.

Draft specs of idl and bindings #52

Closed
wants to merge 13 commits into from
Closed

Draft specs of idl and bindings #52

wants to merge 13 commits into from

Conversation

fgmccabe
Copy link
Contributor

Initial draft of specs for type schema and bindings

@PoignardAzur
Copy link

I haven't had the time to take an in-depth look at the PR, but I'm getting the feeling it's solving the wrong problem.

  • I'm not sure there's much of a point in having a text IDL other than wast. People could always compile the binary IDL from WebIDL or cap'n proto or C headers.
  • The abstract semantic types feel like a leaky abstraction. Eg I don't understand the point of having an abstract integer type, instead of i32, u32, i64, u64.
  • It's way too early to be thinking about generics.

Copy link
Member

@rossberg rossberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, this seems like a reasonable direction, although, obviously, many details still need working out.

However, I agree with @PoignardAzur that it is way too premature to add features like generic types, type imports, and probably exceptions. As long as they aren't even in Wasm, I don't think it makes sense to advance them into the IDL.

Things like try expressions in coercions also make me worry that the coercion language will quickly Greenspun into a fully-fledged programming language, which would kind of imply that it is the wrong abstraction.

Finally, the syntax could fit Wasm's better. But syntax is only a secondary concern at this point.


A *BindingLambda* is an expression that encapsulates either access to an
operation or provision of an operation. Binding lambdas have a function
signature; if it is an access binding then the signature is a WASM function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
signature; if it is an access binding then the signature is a WASM function
signature; if it is an access binding then the signature is a Wasm function


+ Access lambda
An _access lambda_ represents a specification of how an operation specified in
wi-IDL (typically an imported function) is accessed from WASM. It takes the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
wi-IDL (typically an imported function) is accessed from WASM. It takes the
wi-IDL (typically an imported function) is accessed from Wasm. It takes the


where Namei are identifiers and Expression is an expression involving the
various Name parameters together with appropriate lifting and lowering
operators. The type of this lambda is a WASM function type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
operators. The type of this lambda is a WASM function type.
operators. The type of this lambda is a Wasm function type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What function type does this syntax imply?

For example, the lambda:

```
(access ($B $L) ($integer-to-i32 (utf8-count (base-len-as-string $B $L))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean

Suggested change
(access ($B $L) ($integer-to-i32 (utf8-count (base-len-as-string $B $L))))
(access ($B $L) (integer-to-i32 ($utf8-count (base-len-as-string $B $L))))

(access ($B $L) ($integer-to-i32 (utf8-count (base-len-as-string $B $L))))
```

denotes an access lambda that accesses a function utf8-count (which presumably counts the number of unicode code points in a string). The sub-expression
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a user-defined Wasm function?


>Question: Should type hiding be supported? I.e., if a quantified form introduces
>a type variable that is already in scope does it have the effect of occluding
>the outer name. Convenience and composability says yes, security says no.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may misunderstand the question, but AFAICT, composability yells "no". ;)


<div class="example">
```
(type Boolean [cases true false])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may become problematic if you ever want to add subtyping to the IDL, since it would force Bool to be compatible with any variant super and subtype.


Note that we specifically do not elaborate the numeric types -- for example into
signed vs unsigned, or 8bit vs 64bit. Such elaboration can be accounted for by
appropriate operators.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it flies to have types that do not prescribes a value range. You need to make sure that two sides communicating through an IDL do not make inconsistent assumptions about what that value range is and apply inconsistent binding operators.


#### Nominal Type {#nominal-type}

A nominal type expression is used to identify types by name:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear what you mean by "nominal". Is this just a type alias, or does it introduce a fresh type? "Nominal typing" refers to the latter, but the description kinda suggests the former. Especially since true nominal types would not easily allow self-contained interface descriptions.


```
Signature => 'type' NominalType '=' Type
Signature => 'all' TypeVar .. TypeVar '.' NominalType '=' Type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you allow type parameters without restricting type recursion then type equality is undecidable.

Also, the all is odd here, since this is a lambda, not a quantifier.

@fgmccabe
Copy link
Contributor Author

fgmccabe commented Jul 23, 2019 via email

@rossberg
Copy link
Member

  1. The IDL types are abstract for a reason. In any given provision/use of
    an API the coercion operators are paired: a lifting operator and a lowering
    operator. It is the pair that is actually important; the abstract IDL in
    the middle is only there to act as a kind of discriminator: to limit the
    space of operators that have to be specified.

Anotherway of saying this is: if you have to have a lowering operator of
the form -to-i32, it makes no sense to elaborate this to
i32-to-i32, i8-to-i32 i64-to-32, i31-to-i32 and so on. Especially when you
will also have to have lifting operators of the form i32-as-i32, i32-as-i8
i32-as-i64 i32-as-i31 etc. It is simply easier to have a single concept of
integer and map to it and from it.

To be useful for programmers and tools alike, an IDL needs to enable expressing self-contained interface descriptions. With what you describe, looking at the interface of a module would not be enough -- to program against an interface correctly I'd need to know how it is implemented, i.e., what binding operators are used on the other end, so that I can match them up. That of course would miss the most important point of an IDL, which is to decouple interface from implementation.

  1. The motivation for generics in this spec come from a few different
    directions: the use of quantifiers to define scoping, the requirement to
    have collection types (think vector of points for a graphics API), the
    desire that no types are 'special', and the fact that it is easier to
    define them now. That does not imply that we need to fully elaborate them
    at the moment.

Oh, I know -- and agree with -- all the reasons for having polymorphism. And it should be included eventually. But you need to start with something simple, and in particular, you want to avoid getting ahead of the language, or you will likely create something incompatible.

  1. Exception handling is there because coercions can fail. Even simple
    coercions can fail (e.g., mathematically and physically, the idea of losing
    the most significant bits when coercing from 64 bits to 32 (or 32 to 31) is
    absurd); but more importantly, any coercions that affect memory can also
    fail.

Yes, but that does not necessarily imply the need for exception handling. It could just trap. Again, I understand that such handling may be desirable eventually. But as long as simple trapping is deemed "good enough" for Wasm itself, there is little reason why the binding layer needs to be special.

Furthermore, IMO, exceptions should be treated no differently than regular
return values: if a called function throws an exception, that exception may
also need to be coerced.

I am aware of your stance on this, but of course the IDL should match what will actually be provided by Wasm (which also happens to match most languages). Another example where it's preferable to not get ahead of the language.

(In general, I'd suggest to focus on a bindings "MVP". It often is tempting to put "all the features" into a proposal, but that doesn't tend to work well. Wasm itself is an example of incrementality by design. That often is painful, and it can be frustrating how long it takes us to add certain features to Wasm now although they had been planned from the beginning, but it improves chances of success.)

@jgravelle-google
Copy link
Contributor

Meta-level point:

but it improves chances of success

To elaborate on how I view "success", this means both 1) improve the odds that a proposal reaches consensus by minimizing moving parts (political success), and 2) incrementalism in design is insulation against mistakes, having real-world users helps validate that the design works (technical success)

I believe both successes are entirely crucial to most wasm spec-related things.


Not said to have an opinion on the current design (yet), just mentioning the strategic lens I'm thinking of here

@PoignardAzur
Copy link

  1. Exception handling is there because coercions can fail. Even simple
    coercions can fail (e.g., mathematically and physically, the idea of losing
    the most significant bits when coercing from 64 bits to 32 (or 32 to 31) is
    absurd); but more importantly, any coercions that affect memory can also
    fail.

To be blunt, the fact that these coercions can fail is a pretty good sign that the type system is a leaky abstraction.

For example, let's say you have

  • a type Integer that can be either i32 or i64
  • a module that exports a function foobar that takes an Integer

If foobar will throw/trap if you pass it an Integer that can't be converted to i32, then foobar doesn't actually take an Integer, it takes a i32, and it should announce that; the conversion/trapping should happen on the caller side.

I think that goes back to the discussion we had the other day, about how do we determine what is within the scope of snowman bindings, and what adds too much complexity. I think non-trivial coercions that introduce their own rules and can potentially fail should be the concern of compilers, not wasm hosts.

  1. The text representation is there to (a) hash out ideas in a more
    reasonable space than binary and (b) to allow people to write down APIs.
    However, there definitely is/will be a binary encoding that is a key part
    of the spec. We have already drafted a binary encoding for webIDL, but it
    looks like we are not going the webIDL route at the moment.

I'm sorry to be negative, but I really had a hard time reading the type notations; it might be that I'm not used enough to mathematical notations. Still, I would have found it easier to understand if the IDL had been something closer to eg the capnproto IDL.

@fgmccabe
Copy link
Contributor Author

fgmccabe commented Jul 23, 2019 via email

@PoignardAzur
Copy link

PoignardAzur commented Jul 24, 2019

On the other hand, having coercion expressions is also extremely useful:
for example, it allows minor differences between consumer and producer of
an API to be smoothed over. This facilitates reuse of code.

I don't think generality of the kind you're describing facilitates reuse of code.

Modular code is easiest to reuse when it follows design by contract. In particular, when writing a function's prototype / API, its parameter types should represent the function's preconditions.

A function shouldn't have to worry about its precondition being broken. Eg, if a function takes an enum of A, B and C, then it should assume that these are the only three values it's ever going to get; if a function takes a u8, it should assume it's only ever going to receive values from 0 to 255; if it takes a s8, from -128 to 127.

If the provider wants to give flexibility with the possibility of reporting an out
of range error, then they use a lowering operator that throws exceptions.
If the provider is happy truncating the original then they use a lowering
operator that truncates.

I don't think that kind of flexibility is desirable.

In general, out-of-range errors and other precondition breaks should be checked on the caller side, not the callee side.

In 99% of use cases, what you want in your IDL is to export

function someFunction : (arg1: u16, arg2: u16, arg3: string) -> (s32, s32)

On the caller side, the compiler will transcompile and/or import the IDL file, and match the above types to their equivalent in the caller's language (eg Rust's u16 or D's ushort or Java's unsigned short or C++'s uint16_t).

The compiler is the one that handles the conversion, and issues a warning / a runtime exception / whatever in case of narrowing conversions. The wasm host shouldn't worry about any of that.

@fgmccabe
Copy link
Contributor Author

fgmccabe commented Jul 24, 2019 via email

Copy link
Contributor

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First off, thanks for writing these drafts up, @fgmccabe.

My overall impression is that this seems to propose too much, and the snowman language seems too expressive for something that is ultimately only intended to lift Wasm values into IDL values and lower them back again. I agree with the others calling for something smaller and more "MVP".

Additionally, my understanding was that we are aiming to carve out a subset of Web IDL (or at least something that we can line up 1 to 1 with Web IDL constructs) so that we can easily meet our original goal: fast and direct access to DOM methods. This proposal has a bunch more stuff that Web IDL doesn't: not only quantified and existential types, but even just sum types don't exist in Web IDL. I'm concerned both with the amount of scope creep here, and that we are forgetting and failing our original goal as the scope creeps.

A few additional comments and questions inline below.

Thanks!

@@ -0,0 +1,881 @@
<!doctype html><html lang="en">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is just the output of running bikeshed on proposals/webidl-bindings/binary.bs, right?

convenient to support a binary representation of the webIDL.

This specification mirrors the official specification [[!webIDL]] in most
ways. However, the binary encoding may admit non-legal forms of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the relationship between the snowman bindings language with Web IDL and the spec for a binary representation of Web IDL?

Type => ErrorType
Type => ParameterizedType
Type => BufferSourceType
Type => CompoundType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be de-duplicating / referring to compound types by index (like my experimental implementation does) to save on code size.

In fact, I can't find where Type transitively goes to some sort of reference to an already-defined type at all, so for example a FunctionType cannot take a previously defined dictionary type as a parameter under these current rules. Is some sort of a production like Type => ReferenceToPreviouslyDefinedType missing?

W-ProvisionLambda ::= '(' 'provide' W-Tuple W-ProvisionExpression ')'
```

#### Provision Lambda Type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea is that instead of

  • mapping tuples of Web IDL values into tuples of Wasm values (incoming binding maps in the explainer) and vice versa (outgoing binding maps in the explainer),
  • and then combining an incoming and outgoing binding map into either an import binding (with an outgoing binding map for arguments and incoming binding map for return values) or an export binding (with an incoming binding map for arguments and outgoing binding map for return values)

we now have

  • access lambdas and provision lambdas that contain both lifting and lowering operators,
  • but the lifting/lowering operators are separated from each other by which side (parameter or return) of the function whose values they are coercing
  • and we have ^ to splice nested lists into the current list

correct?

The original approach has two benefits, as I see it:

  1. The incoming and outgoing binding maps contain only lowering and lifting coercions respectively, while the access and provision lambdas contain heterogeneous expression trees of coercions. The lack of heterogeneity in the original allows for reusing opcodes across lifting and lowering coercions, which means that ultimately we can keep all the opcodes in the one-byte LEB128 space longer before reaching any two-byte LEB128s, which helps code size.

  2. The original incoming/outgoing binding map formulation didn't require list splicing operators. To convert into n values, you just provided n expressions and each operator evaluated to a single value. The new approach requires an "hourglass" shape and the ^ splicer since the function being bound exists as a single W-expression but can return many values.


The three arguments to `string-to-base-ptr` are the `String` value itself, the
base address of a block of linear memory -- together with the size of the
buffer. The returned result is the address of the utf8-encoded sequence of bytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I understanding correctly that these three arguments are runtime arguments, and none are immediates? Where is the buffer supposed to come from? Are we going to allow some subset of Wasm instructions inside access/provision lambdas so that they can call allocators themselves instead of building the concept of allocators into coercion operators?

If there isn't enough room for the incoming string in the given allocation buffer, what happens? Does this try calling an allocator function? Throw? Only write a substring into the buffer?

@fgmccabe
Copy link
Contributor Author

fgmccabe commented Jul 24, 2019 via email

Refactored variants to be part of a signature rather than being a union type.
Copy link
Member

@lukewagner lukewagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the initial work! I'm not quite sure yet, but I think I like the generalization of having binding lambdas being able to contain 0..N calls to imports/exports and just using validation rules to ensure that the types line up. It does seem to add some more complexity to an optimizing trampoline compiler, but we can see how that manifests in practice.

I'd like to +1 and emphasize some comments that @rossberg has already made:

  • we shouldn't add anything to this proposal that will be coming to core wasm since (1) it absolutely must be designed to align with the core wasm feature, (2) if we do anything before the core wasm feature is stablized (or even proposed), it's very likely we'll likely make an alignment error. So in particular, I think these don't belong in the "MVP":
    • anything to do with exceptions (that should block on wasm exceptions)
    • nominal types (which should I think instead type imports and exports)
    • quantified types (those aren't even proposed yet)
  • we should have explicitly-signed integer types with fixed ranges. An infinite-range Integer type is more work, not strictly required for the initial set of use cases afaics and also introduces the need to have binding operators that specify (possibly configurable) out-of-range behavior.

Additionally:

  • I don't think there should be a special Resource<T> type bindings, core wasm reference types should be used instead.
  • If we can define a module type and its textual representation as part of core wasm (as discussed in design/#1289), then I don't think we need a separate IDL for bindings; it's sufficient just to reuse what's defined for core wasm, by simply allowing binding types in place of core wasm types. Thus, I'd very much prefer we didn't frame this proposal as defining a new IDL, but instead as defining a new set of types that can be used within the existing text format for defining module types (signatures).

@fgmccabe
Copy link
Contributor Author

fgmccabe commented Jul 24, 2019 via email

@PoignardAzur
Copy link

If we can define a module type and its textual representation as part of core wasm (as discussed in design/#1289), then I don't think we need a separate IDL for bindings; it's sufficient just to reuse what's defined for core wasm, by simply allowing binding types in place of core wasm types. Thus, I'd very much prefer we didn't frame this proposal as defining a new IDL, but instead as defining a new set of types that can be used within the existing text format for defining module types (signatures).

You mean use the existing S-expression syntax with new instruction names for new types?

@lukewagner
Copy link
Member

@PoignardAzur What's being discussed in design/#1289 is just describing a modules' type(i.e. "signature" or "interface"), and thus it only contains the types, not the binding expressions, which are implementation details of either side of the interface.

@lukewagner lukewagner closed this Sep 21, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants