19 changes: 13 additions & 6 deletions BinaryEncoding.md
@@ -35,6 +35,9 @@ A four-byte little endian unsigned integer.
### varint32
A [Signed LEB128](https://en.wikipedia.org/wiki/LEB128#Signed_LEB128) variable-length integer, limited to int32 values.
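
As an illustration of the signed encoding, here is a minimal decoding sketch. The helper name `readVarInt32`, its return shape, and the five-byte cap are assumptions of this example rather than anything the text specifies, and the sketch does not verify that the unused high bits of a fifth byte are a proper sign extension.

```javascript
// Hypothetical helper (not defined by the spec text): decode a signed
// LEB128 value from `bytes`, starting at `offset`.
function readVarInt32(bytes, offset) {
  let result = 0;
  let shift = 0;
  let pos = offset;
  let byte;
  do {
    if (pos >= bytes.length) throw new RangeError("unexpected end of input");
    // Assumption of this sketch: an int32 never needs more than 5 bytes.
    if (pos - offset >= 5) throw new RangeError("varint32 is too long");
    byte = bytes[pos++];
    // Each byte contributes its low 7 bits, least significant group first.
    result |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80); // high bit set means another byte follows
  // Sign-extend when bit 6 of the final byte is set.
  if (shift < 32 && (byte & 0x40)) result |= -1 << shift;
  return { value: result, next: pos };
}
```

For example, the single byte `0x7f` decodes to -1, while the two bytes `0x80 0x01` decode to 128.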

### varuint1
A [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer, limited to the values 0 or 1. `varuint1` values may contain leading zeros. (This type is mainly used for compatibility with potential future extensions.)

### varuint32
A [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer, limited to uint32 values. `varuint32` values may contain leading zeros.
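
The unsigned types can be decoded with one routine that takes the permitted bit width. The sketch below is illustrative only: `readVarUint`, its return shape, and the five-byte cap are assumptions of this example (the text above states no length limit). It accepts padded encodings, which is how this example reads "may contain leading zeros".

```javascript
// Hypothetical helper: decode an unsigned LEB128 value limited to
// `maxBits` bits (1 for varuint1, 32 for varuint32).
function readVarUint(bytes, offset, maxBits) {
  const MAX_BYTES = 5; // assumption of this sketch, so decoding always terminates
  let result = 0;
  let pos = offset;
  let byte;
  do {
    if (pos >= bytes.length) throw new RangeError("unexpected end of input");
    if (pos - offset >= MAX_BYTES) throw new RangeError("encoding is too long");
    byte = bytes[pos++];
    // Multiply instead of shifting so the sum stays exact beyond 31 bits.
    result += (byte & 0x7f) * 2 ** (7 * (pos - offset - 1));
  } while (byte & 0x80); // high bit set means another byte follows
  if (result >= 2 ** maxBits) throw new RangeError("value out of range");
  return { value: result, next: pos };
}
```

Under these assumptions, `readVarUint([0x81, 0x00], 0, 1)` yields the `varuint1` value 1 even though the encoding carries a padding byte, while `readVarUint([0x02], 0, 1)` is rejected as out of range.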

@@ -75,8 +78,8 @@ The module starts with a preamble of two fields:

| Field | Type | Description |
| ----- | ----- | ----- |
| magic number | `uint32` | Magic number `0x6d736100` == `'\0asm'`. |
| version | `uint32` | Version number `11` == `0x0b`. The version for MVP will be reset to `1`. |
| magic number | `uint32` | Magic number `0x6d736100` (i.e., '\0asm') |
| version | `uint32` | Version number, currently 10. The version for MVP will be reset to 1. |
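
A reader of the preamble might look like the following sketch. The function name `readPreamble` and the `expectedVersion` parameter are assumptions of this example; the version is passed in because the number is still changing between revisions.

```javascript
// Hypothetical helper: check the 8-byte preamble of a module held in a
// Uint8Array. Both fields are four-byte little endian unsigned integers.
function readPreamble(bytes, expectedVersion) {
  if (bytes.length < 8) throw new RangeError("module is shorter than the preamble");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const magic = view.getUint32(0, true);   // true selects little endian
  const version = view.getUint32(4, true);
  // The bytes 0x00 'a' 's' 'm' read as a little endian uint32 give 0x6d736100.
  if (magic !== 0x6d736100) throw new Error("bad magic number");
  if (version !== expectedVersion) throw new Error("unsupported version " + version);
  return { magic, version };
}
```

For instance, the bytes `00 61 73 6d 0a 00 00 00` pass this check when `expectedVersion` is 10.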

This preamble is followed by a sequence of sections. Each section is identified by an
immediate string. Sections whose identity is unknown to the WebAssembly
@@ -117,15 +120,19 @@ The type section declares all function signatures that will be used in the modul

| Field | Type | Description |
| ----- | ----- | ----- |
| count | `varuint32` | count of signature entries to follow |
| count | `varuint32` | count of type entries to follow |
| entries | `type_entry*` | repeated type entries as described below |

#### Type entry
| Field | Type | Description |
| ----- | ----- | ----- |
| form | `uint8` | `0x40`, indicating a function type |
Member:

We could keep the disjointness of primitive and structural type constructors by giving the former monotonically increasing integers and the latter monotonically decreasing integers (starting by giving functions a form of -1). So same aesthetic preference for avoiding arbitrary-feeling statements like "no one will ever need more than 0x3f primitive type constructors", but different meaning for the negative index than in the previous comment. If so, then the type would be a varint32.

Member Author:

I'm fine with using var(u)int. But the type space potentially indexes 3 sorts of things: primitive constructors, structural constructors, type ids. If we want to use the signedness overlay trick, then it seems much more beneficial to reserve that for distinguishing between ids and constructors in the future, so that type id references could be single byte (in which case we probably need to assign negative numbers to all constructors now). WDYT?

Member:

> But the type space potentially indexes 3 sorts of things: primitive constructors, structural constructors, type ids

I didn't understand this part of the OP: why would we want to add the complication of "inlining" certain structural constructors instead of just using a type id?

> If we want to use the signedness overlay trick, then it seems much more beneficial to reserve that for distinguishing
> between ids and constructors in the future

Yes, if we can rule out the third case as I'm asking above then the encoding could be pretty simple: if positive, it's a primitive, if negative, it's a (negated) type-id. But that'd be the encoding of a value type. `form` is the encoding of a different set: the set of compound constructors. So as observed earlier, it could completely overlap with both primitives and type-ids. I can see the argument for keeping the encoding of compound ctors disjoint from that of primitive ctors (so that you can represent the set of all constructors with an int), but that seems to work just fine with giving the compound ctors negative indices.

Member Author:

> I didn't understand this part of the OP: why would we want to add the
> complication of "inlining" certain structural constructors instead of just
> using a type id?

E.g. to reduce the overhead of one-off uses of structural types. Or imagine
you want to emit a more complicated nested struct, then you wouldn't need
to separate out and name each level.

Perhaps the other way round is a more compelling scenario: you may want to
allow primitive constructors in type definitions. Or they'll start to mix
when we introduce type import/exports some day. Also, it's hard to predict
what other thing might come along and change the story (generics? who
knows).

I'm not suggesting that there currently is a concrete reason to join the
spaces. But it doesn't seem completely unlikely to arise later, and given
that the cost of keeping it an option is zero, why preclude it?

> Yes, if we can rule out the third case as I'm asking above then the
> encoding could be pretty simple: if positive, it's a primitive, if
> negative, it's a (negated) type-id.

I'd actually invert that scheme, to avoid the negation.

Member:

What I don't understand is why all those future new things can't just be new forms of entries in the types section such that a type-id is all you need?

> I'd actually invert that scheme, to avoid the negation.

Wouldn't that give all the primitive types today (i32, etc) negative indices?

Member Author:

@titzer, ha, I knew I shouldn't have mentioned generics. This is getting OT, but let me just say that I would like to avoid their complexity as much as the next guy, while I'm also aware that "our language doesn't need generics" have become famous last words of language/VM designers. Static compilation only works for fairly weak, second-class polymorphism; in more expressive cases (which all big languages but C++ support) you'd be forced to introduce unions and lots of expensive runtime checks. Maybe that's okay for Wasm.

Member:

> In contrast, where do you see the downside of avoiding conflicts between the spaces until we
> understand the future better?

I also don't understand what is being proposed in this PR to address this: superficially this is just a question of 0x40 vs. -1. What are you proposing happens for these new-kinds-of-types?

> in more expressive cases (which all big languages but C++ support) you'd be forced to introduce unions
> and lots of expensive runtime checks

Still OT, but: yes, for Java-style. For C#-style, though, I was assuming that a C#-on-wasm runtime would actually need to ship with its own runtime machinery to do runtime generation of wasm for instantiations that only show up at runtime since I'd be surprised if we could design a feature in wasm that wasn't overly specialized to C# but that could still handle the C# use case.

Member Author:

> > In contrast, where do you see the downside of avoiding conflicts between
> > the spaces until we
> > understand the future better?
>
> I also don't understand what is being proposed in this PR to address this:
> superficially this is just a question of 0x40 vs. -1. What are you
> proposing happens for these new-kinds-of-types?

Hm, I thought we just agreed that signedness is best reserved for
distinguishing type ids. So this PR avoids clobbering the opposite sign
space, and instead just picks an arbitrary opcode for the function type
that doesn't collide with the primitive ones. 0x40 just because it
partitions the positive 1-byte signed LEB value range into two equal
halves, reserving one side for nullary, the other for non-nullary
constructors (which may or may not make sense, I'm open to better
suggestions; there are probably going to be far more nullary constructors
than others, but either way the space is comfortably large AFAICT).

Member:

> Hm, I thought we just agreed that signedness is best reserved for distinguishing type ids.

If non-nullary constructors don't show up in value types (only their type-ids), then both could have negative indices. But I guess the counterargument is: maybe not the 3 non-nullary ctors we're thinking about now (func, struct, array), but perhaps some new thing in the future and if positive indices are "reserved" for nullary and negative is "reserved" for type-ids, then we're out of luck (or, at the very least, we'd have to break the pattern). I guess I buy that, so 0x40 is fine. More than just the aesthetic -1 vs. 0x40, this has been a question of where this is going, which I think I now have a better understanding of.

Member:

Incidentally, I came across a situation which supports the current design: in Wasm.Table we want to be able to declare the types of elements in the table definition/import. The above-mentioned scheme for local types seems to fit (allowing table elements to have any type that you can put in a local, or some restriction thereof), but the question is how to say "any function". Well, since we already have this 0x40 "Function" constructor in the index space, that seems to be a good candidate (even if it's a slight abuse of logical category). Similarly, the "Struct" and "Array" constructors could mean "any struct type" / "any array type", which could make sense one day.

| param_count | `varuint32` | the number of parameters to the function |
| return_type | `value_type?` | the return type of the function, with `0` indicating no return type |
| param_types | `value_type*` | the parameter types of the function |
| return_count | `varuint1` | the number of results from the function |
| return_type | `value_type?` | the result type of the function (if return_count is 1) |

(Note: In the future, this section may contain other forms of type entries as well, which can be distinguished by the `form` field.)
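
To make the new entry layout concrete, the sketch below decodes one function type entry in the order `form`, `param_count`, `param_types`, `return_count`, `return_type`. It is illustrative only: the helper names are hypothetical, each `value_type` is assumed to occupy a single byte (its encoding is not shown in this hunk), and the byte values 1 and 2 in the usage note are opaque placeholders.

```javascript
// Hypothetical helper: unsigned LEB128 limited to `maxBits` bits.
// The 5-byte cap is an assumption of this sketch.
function readVarUint(bytes, offset, maxBits) {
  let result = 0, pos = offset, byte;
  do {
    if (pos >= bytes.length) throw new RangeError("unexpected end of input");
    if (pos - offset >= 5) throw new RangeError("encoding is too long");
    byte = bytes[pos++];
    result += (byte & 0x7f) * 2 ** (7 * (pos - offset - 1));
  } while (byte & 0x80);
  if (result >= 2 ** maxBits) throw new RangeError("value out of range");
  return { value: result, next: pos };
}

// Hypothetical helper: decode one type entry with form 0x40 (function type).
function readFuncType(bytes, offset) {
  let pos = offset;
  if (pos >= bytes.length) throw new RangeError("unexpected end of input");
  const form = bytes[pos++];
  if (form !== 0x40) throw new Error("unsupported type form " + form);

  const paramCount = readVarUint(bytes, pos, 32);
  pos = paramCount.next;
  if (pos + paramCount.value > bytes.length) throw new RangeError("unexpected end of input");
  const paramTypes = [];
  for (let i = 0; i < paramCount.value; ++i) paramTypes.push(bytes[pos++]);

  const returnCount = readVarUint(bytes, pos, 1); // varuint1: 0 or 1
  pos = returnCount.next;
  let returnType = null; // null stands for "no result" in this sketch
  if (returnCount.value === 1) {
    if (pos >= bytes.length) throw new RangeError("unexpected end of input");
    returnType = bytes[pos++];
  }
  return { form, paramTypes, returnType, next: pos };
}
```

With these assumptions, the bytes `40 02 01 02 01 01` decode to two parameters (placeholder types 1 and 2) and one result (placeholder type 1), and `40 00 00` decodes to a function with no parameters and no result.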

### Import section

@@ -216,7 +223,7 @@ The start section declares the [start function](Modules.md#module-start-function

ID: `code`

The code section assigns a body to every function in the module.
The code section contains a body for every function in the module.
The count of functions declared in the [function section](#function-section)
and function bodies defined in this section must be the same and the `i`th
declaration corresponds to the `i`th function body.
@@ -230,7 +237,7 @@ declaration corresponds to the `i`th function body.

ID: `data`

The data section declares the initialized data that should be loaded
The data section declares the initialized data that is loaded
into the linear memory.

| Field | Type | Description |
9 changes: 9 additions & 0 deletions FAQ.md
@@ -389,3 +389,12 @@ those that motivated the development of the

Even Knuth found it worthwhile to give us his opinion on this issue at point,
[a flame about 64-bit pointers](http://www-cs-faculty.stanford.edu/~uno/news08.html).

## Will I be able to access proprietary platform APIs (e.g. Android / iOS)?

Yes, but it will depend on the _WebAssembly embedder_. Inside a browser you'll
get access to the same HTML5 and other browser-specific APIs which are also
accessible through regular JavaScript. However, if a wasm VM is provided as an
[“app execution platform”](NonWeb.md) by a specific vendor, it might provide
access to [proprietary platform-specific APIs](Portability.md#api) of e.g.
Android / iOS.
29 changes: 29 additions & 0 deletions Web.md
@@ -28,6 +28,35 @@ WebAssembly's [modules](Modules.md) allow for natural [integration with
the ES6 module system](Modules.md#integration-with-es6-modules) and allow
synchronous calling to and from JavaScript.

### Function Names

A WebAssembly module imports and exports functions. WebAssembly names functions
using arbitrary-length byte sequences. Any 8-bit values are permitted in a
WebAssembly name, including the null byte and byte sequences that don't
correspond to any Unicode code point regardless of encoding. The most natural
Web representation of a mapping of function names to functions is a JS object
in which each function is a property. Property names in JS are UTF-16 encoded
strings. A WebAssembly module may fail validation on the Web if it imports or
exports functions whose names do not transcode cleanly to UTF-16 according to
the following conversion algorithm, assuming that the WebAssembly name is in a
`Uint8Array` called `array`:

```
function convertToJSString(array)
{
  var string = "";
  for (var i = 0; i < array.length; ++i)
    string += String.fromCharCode(array[i]);
  return decodeURIComponent(escape(string));
}
```

This performs the UTF-8 decoding (`decodeURIComponent(escape(string))`) using
a [common JS idiom](http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html).
Transcoding failure is detected by `decodeURIComponent`, which may throw
`URIError`. If it does, the WebAssembly module will not validate. This validation
rule is only mandatory for Web embedding.
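
As a rough illustration of how an embedder could apply this rule, the sketch below wraps the conversion in a check that reports whether a name transcodes cleanly. The wrapper name `nameValidatesOnWeb` is an assumption of this example, and `convertToJSString` is repeated only so the snippet is self-contained.

```javascript
// The conversion algorithm from the text, repeated for self-containment.
function convertToJSString(array) {
  var string = "";
  for (var i = 0; i < array.length; ++i)
    string += String.fromCharCode(array[i]);
  return decodeURIComponent(escape(string));
}

// Hypothetical wrapper: true if the name transcodes cleanly, false if
// decodeURIComponent rejects the bytes as malformed UTF-8.
function nameValidatesOnWeb(array) {
  try {
    convertToJSString(array);
    return true;
  } catch (e) {
    if (e instanceof URIError) return false;
    throw e; // anything else is unexpected, so let it propagate
  }
}
```

Under this sketch, the bytes `66 6f 6f` convert to `"foo"` and `c3 a9` to `"é"`, whereas a lone `ff` byte or a truncated sequence such as `c3` fails the check.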

## Aliasing linear memory from JS

If [allowed by the module](Modules.md#linear-memory-section), JavaScript can