Binary 0xc (#811)

* Clarify that wasm may be viewed as either an AST or a stack machine. (#686)

* Clarify that wasm may be viewed as either an AST or a stack machine.

* Reword the introductory paragraph.

* Add parens, remove "typed".

* Make opcode 0x00 `unreachable`. (#684)

Make opcode 0x00 `unreachable`, and move `nop` to a non-zero opcode.

All-zeros is one of the more common patterns of corrupted data. This
change makes it more likely that code that is accidentally zeroed, in
whole or in part, will be noticed when executed rather than silently
running through a nop slide.

Obviously, this doesn't matter when an opcode table is present, but
if there is a default opcode table, it would presumably use the
opcodes defined here.
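To make the rationale concrete, here is a minimal sketch of a toy dispatcher. It assumes opcode 0x00 is `unreachable`, as this change proposes. The value 0x01 used for `nop` is an arbitrary placeholder for the sketch, not necessarily the assignment in BinaryEncoding.md. With this layout, any run of zero bytes traps on its first byte instead of executing as a nop slide.

```python
# Assumed opcode values: 0x00 = unreachable (per this change);
# 0x01 = nop is a placeholder chosen only for this sketch.
UNREACHABLE = 0x00
NOP = 0x01


class Trap(Exception):
    """Raised when execution reaches `unreachable`."""


def run(code: bytes) -> int:
    """Toy dispatcher that understands only nop and unreachable.

    Returns the number of instructions executed before falling off the end.
    """
    pc = 0
    executed = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        executed += 1
        if op == NOP:
            continue
        if op == UNREACHABLE:
            raise Trap(f"unreachable executed at offset {pc - 1}")
        raise ValueError(f"unsupported opcode 0x{op:02x}")
    return executed


# Intact code runs normally.
print(run(bytes([NOP, NOP])))  # 2

# A zeroed region now traps immediately instead of sliding through nops.
corrupted = bytes([NOP]) + bytes(16)  # bytes(16) is sixteen 0x00 bytes
try:
    run(corrupted)
except Trap as t:
    print(t)  # unreachable executed at offset 1
```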

* BinaryEncoding.md changes implied by #682

* Fix thinko in import section

* Rename definition_kind to external_kind for precision

* Rename resizable_definition to resizable_limits

* Add  opcode delimiter to init_expr

* Add Elem section to ToC and move it before Data section to reflect Table going before Memory

* Add missing init_expr to global variables and undo the grouped representation of globals

* Note that only immutable globals can be exported

* Change the other 'mutability' flag to 'varuint1'

* Give 'anyfunc' its own opcode

* Add note about immutable global import requirement

* Remove explicit 'default' flag; make memory/table default by default

* Change (get|set)_global opcodes

* Add end opcode to functions

* Use section codes instead of section names

(rebasing onto 0xC instead of master)

This PR proposes using section codes for known sections, which are more compact and easier to check in a decoder.
It still allows user-defined sections with string names to be encoded in the same manner as before.
The scheme of using negative numbers proposed here also has the advantage of allowing a single decoder to accept the old (0xB) format and the new (0xC) format for the time being.
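A minimal sketch of how a decoder might walk sections identified by codes. It assumes the positive-code layout from the later "Use positive section code byte" change: a one-byte section code, a varuint32 payload length, and code 0 reserved for user-defined sections whose payload begins with a length-prefixed UTF-8 name. The `KNOWN_SECTIONS` table is an assumed numbering for illustration; BinaryEncoding.md is authoritative.

```python
from typing import List, Tuple

# Assumed section-code table for this sketch only.
KNOWN_SECTIONS = {
    1: "type", 2: "import", 3: "function", 4: "table", 5: "memory",
    6: "global", 7: "export", 8: "start", 9: "element", 10: "code", 11: "data",
}
USER_SECTION = 0  # assumed: code 0 marks a user-defined, string-named section


def read_varuint32(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, new_position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def read_sections(buf: bytes) -> List[Tuple[str, bytes]]:
    """Split a run of sections into (name, payload) pairs."""
    sections = []
    pos = 0
    while pos < len(buf):
        code = buf[pos]  # one-byte section code: a cheap integer comparison
        pos += 1
        size, pos = read_varuint32(buf, pos)
        payload = buf[pos:pos + size]
        pos += size
        if code == USER_SECTION:
            # User-defined sections keep a string name at the start of the payload.
            name_len, off = read_varuint32(payload, 0)
            name = payload[off:off + name_len].decode("utf-8")
            sections.append((name, payload[off + name_len:]))
        elif code in KNOWN_SECTIONS:
            sections.append((KNOWN_SECTIONS[code], payload))
        else:
            raise ValueError(f"unknown section code {code}")
    return sections


# A "type" section (code 1) with a 2-byte payload, then a user section named "name".
blob = bytes([1, 2, 0xAA, 0xBB]) + bytes([0, 6, 4]) + b"name" + bytes([0x01])
print(read_sections(blob))  # [('type', b'\xaa\xbb'), ('name', b'\x01')]
```

Known sections are recognized with a single byte comparison, while unknown user sections can still be identified, or skipped, by name.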

* Use LEB for br_table (#738)

* Describe operand order of call_indirect (#758)

* Remove arities from call/return (#748)

* Limit varint sizes in Binary Encoding. (#764)

* Global section (#771)

`global-variable` was a broken anchor, and the type of `count` was an undefined reference that was inconsistent with the rest of the sections.
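The LEB128 changes listed above ("Use LEB for br_table", "Limit varint sizes") can be sketched as follows. This is a minimal illustration, not the normative encoding: the `br_table` immediate layout (target count, each target depth, then the default depth, all varuint32) is an assumption, and the decoder rejects any varuint32 longer than ceil(32 / 7) = 5 bytes or any value that does not fit in 32 bits.

```python
from typing import List, Tuple


def encode_varuint(value: int, bits: int = 32) -> bytes:
    """Unsigned LEB128: 7 payload bits per byte, high bit set on all but the last."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{value} does not fit in varuint{bits}")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varuint(buf: bytes, pos: int, bits: int = 32) -> Tuple[int, int]:
    """Decode unsigned LEB128 at `pos`; return (value, next_pos).

    Rejects encodings longer than ceil(bits / 7) bytes (5 for varuint32)
    and values that do not fit in `bits` bits.
    """
    max_bytes = (bits + 6) // 7
    result = 0
    for i in range(max_bytes):
        if pos >= len(buf):
            raise ValueError("unexpected end of input")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            if result >= 1 << bits:
                raise ValueError(f"value does not fit in varuint{bits}")
            return result, pos
    raise ValueError(f"varuint{bits} encoding longer than {max_bytes} bytes")


def encode_br_table(targets: List[int], default: int) -> bytes:
    """Hypothetical br_table immediates: count, target depths, default depth."""
    out = bytearray(encode_varuint(len(targets)))
    for depth in targets:
        out += encode_varuint(depth)
    out += encode_varuint(default)
    return bytes(out)


def decode_br_table(buf: bytes, pos: int = 0) -> Tuple[List[int], int, int]:
    """Inverse of encode_br_table; returns (targets, default, next_pos)."""
    count, pos = decode_varuint(buf, pos)
    targets = []
    for _ in range(count):
        depth, pos = decode_varuint(buf, pos)
        targets.append(depth)
    default, pos = decode_varuint(buf, pos)
    return targets, default, pos


imm = encode_br_table([0, 1, 200], 2)
print(imm.hex())             # 030001c80102
print(decode_br_table(imm))  # ([0, 1, 200], 2, 6)
```

Because branch depths are usually small, most entries take a single byte, and the size limit lets a decoder bound how many bytes it reads for any one varuint32.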

* Make name section a user-string section.

* Update BinaryEncoding.md

* Update BinaryEncoding.md

* Use positive section code byte

* Remove specification of name strings for unknown sections

* Update BinaryEncoding.md

* Remove repetition in definition of var(u)int types (#768)

* Fix typo (#781)

* Move the element section before the code section (#779)

* Binary format identifier is out of date (#785)

* Update BinaryEncoding.md to reflect the ml-proto encoding of the memory and table sections. (#800)

* Add string back

* Block signatures (#765)

* Replace branch arities with block and if signatures.

Moving arities to blocks has the nice property of giving implementations
useful information up front; however, some anticipated uses of this
information would really want to know the types up front too.

This patch proposes replacing block arities with function signature indices,
which would provide full type information about a block up front.
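A minimal validator sketch showing why an up-front signature helps: when `block` is decoded, its result type is already known, so a single pass can check the body against it at `end`. As a simplification, each signature is reduced to an optional single result type rather than the function signature index the proposal describes, and the instruction tuples are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical instruction tuples:
#   ("block", result) where result is "i32", "f64", or None (no value)
#   ("i32.const", v), ("f64.const", v), ("drop",), ("end",)


@dataclass
class Frame:
    result: Optional[str]  # declared when the block opens, before its body
    height: int            # operand-stack height at block entry


def validate(body: List[Tuple]) -> List[Optional[str]]:
    """Single-pass check of nested blocks against their declared result types.

    Returns each block's result type in the order the blocks were opened,
    i.e. the type is known at `block`, not discovered at `end`.
    """
    operands: List[str] = []
    frames: List[Frame] = []
    known_at_open: List[Optional[str]] = []
    for ins in body:
        op = ins[0]
        if op == "block":
            frames.append(Frame(ins[1], len(operands)))
            known_at_open.append(ins[1])
        elif op == "i32.const":
            operands.append("i32")
        elif op == "f64.const":
            operands.append("f64")
        elif op == "drop":
            floor = frames[-1].height if frames else 0
            if len(operands) <= floor:
                raise TypeError("drop with no operand in the current block")
            operands.pop()
        elif op == "end":
            if not frames:
                raise TypeError("end without matching block")
            frame = frames.pop()
            produced = operands[frame.height:]
            expected = [] if frame.result is None else [frame.result]
            if produced != expected:
                raise TypeError(f"block declared {expected}, body produced {produced}")
            # Replace the body's values with the block's declared result.
            del operands[frame.height:]
            operands.extend(expected)
        else:
            raise ValueError(f"unknown instruction {op!r}")
    if frames:
        raise TypeError("unterminated block")
    return known_at_open


ok = [("block", "i32"), ("block", None), ("f64.const", 1.0), ("drop",), ("end",),
      ("i32.const", 7), ("end",)]
print(validate(ok))  # ['i32', None]
```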

* Remove the arity operand from br_table too.

* Remove mentions of "arguments".

* Make string part of the payload

* Remove references to post-order AST in BinaryEncoding.md (#801)

* Simplify loop by removing its exit label.

This removes loop's bottom label.
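With the bottom label gone, a branch that targets a `loop` always goes back to its beginning, and leaving a loop early means branching to an enclosing `block`, whose label sits at its end. A minimal sketch of those semantics as a tree-walking interpreter; the instruction tuples are hypothetical, and conditions and assignments are Python lambdas standing in for real operand expressions.

```python
from typing import Any, Dict, List, Tuple

Env = Dict[str, int]


class Branch(Exception):
    """Carries a relative label depth outward until it reaches its target."""

    def __init__(self, depth: int) -> None:
        super().__init__(depth)
        self.depth = depth


def execute(instrs: List[Tuple[Any, ...]], env: Env) -> None:
    for ins in instrs:
        op = ins[0]
        if op == "block":
            try:
                execute(ins[1], env)
            except Branch as b:
                if b.depth > 0:
                    raise Branch(b.depth - 1)
                # Depth 0 targets this block: its label is at the end, so exit.
        elif op == "loop":
            while True:
                try:
                    execute(ins[1], env)
                    break  # falling off the end leaves the loop
                except Branch as b:
                    if b.depth > 0:
                        raise Branch(b.depth - 1)
                    # Depth 0 targets this loop: its label is at the start, so repeat.
        elif op == "br":
            raise Branch(ins[1])
        elif op == "br_if":
            if ins[2](env):
                raise Branch(ins[1])
        elif op == "set":
            env[ins[1]] = ins[2](env)
        else:
            raise ValueError(f"unknown instruction {op!r}")


env = {"i": 5, "sum": 0}
program = [
    ("block", [
        ("loop", [
            ("br_if", 1, lambda e: e["i"] == 0),         # depth 1: the block, i.e. exit
            ("set", "sum", lambda e: e["sum"] + e["i"]),
            ("set", "i", lambda e: e["i"] - 1),
            ("br", 0),                                   # depth 0: the loop, i.e. continue
        ]),
    ]),
]
execute(program, env)
print(env["sum"])  # 15
```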

* Move description of `return` to correct column (#804)

* Type correction and missing close quote (#805)

* Remove more references to AST (#806)

* Remove reference to AST in JS.md

Remove a reference to AST in JS.md. Note that the ml-proto spec still uses the name `Ast.Module` and has files named `ast.ml`, etc., so those references are left intact for now.
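The shift from "AST operator" to "instruction" works because a post-order walk of an expression tree is exactly a stack-machine instruction sequence, so either view gives the same result. A minimal sketch using a hypothetical three-instruction subset (`i32.const`, `i32.add`, `i32.mul`) rather than the real opcode set evaluates one expression both ways.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

MASK32 = 0xFFFFFFFF  # i32 arithmetic wraps modulo 2**32


@dataclass
class Const:
    value: int


@dataclass
class BinOp:
    op: str  # "i32.add" or "i32.mul"
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Const, BinOp]
Instr = Tuple[str, Optional[int]]  # (opcode name, immediate or None)


def apply(op: str, a: int, b: int) -> int:
    if op == "i32.add":
        return (a + b) & MASK32
    if op == "i32.mul":
        return (a * b) & MASK32
    raise ValueError(f"unknown op {op}")


def eval_tree(e: Expr) -> int:
    """AST view: evaluate children recursively, then apply the operator."""
    if isinstance(e, Const):
        return e.value & MASK32
    return apply(e.op, eval_tree(e.lhs), eval_tree(e.rhs))


def linearize(e: Expr) -> List[Instr]:
    """Post-order walk produces the equivalent flat instruction sequence."""
    if isinstance(e, Const):
        return [("i32.const", e.value)]
    return linearize(e.lhs) + linearize(e.rhs) + [(e.op, None)]


def eval_stack(code: List[Instr]) -> int:
    """Stack-machine view: constants push; binary operators pop two, push one."""
    stack: List[int] = []
    for op, imm in code:
        if op == "i32.const":
            stack.append(imm & MASK32)
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(apply(op, a, b))
    if len(stack) != 1:
        raise ValueError("expression must leave exactly one value")
    return stack[0]


expr = BinOp("i32.add", Const(1), BinOp("i32.mul", Const(2), Const(3)))
print(eval_tree(expr))              # 7
print(eval_stack(linearize(expr)))  # 7
```

An implementation is free to use either model internally, as long as it behaves as if it followed one of them.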

* Use "instruction" instead of "AST operator"

* Update rationale for stack machine

* Update Rationale.md

* Update discussion of expression trees

* Update MVP.md

* Update Rationale.md

* Update Rationale.md

* Remove references to expressions

* Update Rationale.md

* Update Rationale.md

* Address review comments

* Address review comments

* Address review comments

* Delete h
titzer authored Sep 29, 2016
1 parent be45b5f commit 453320e
Showing 7 changed files with 308 additions and 223 deletions.
36 changes: 21 additions & 15 deletions AstSemantics.md
@@ -1,14 +1,19 @@
 # Abstract Syntax Tree Semantics

-WebAssembly code is represented as an Abstract Syntax Tree (AST) where each node
-represents an expression. Each function body consists of a list of expressions.
-All expressions and operators are typed, with no implicit conversions, subtyping, or overloading rules.
+This document describes WebAssembly semantics. The description here is written
+in terms of an Abstract Syntax Tree (AST), however it is also possible to
+understand WebAssembly semantics in terms of a stack machine. (In practice,
+implementations need not build an actual AST or maintain an actual stack; they
+need only behave [as if](https://en.wikipedia.org/wiki/As-if_rule) they did so.)

 This document explains the high-level design of the AST: its types, constructs, and
 semantics. For full details consult [the formal Specification](https://github.com/WebAssembly/spec),
 for file-level encoding details consult [Binary Encoding](BinaryEncoding.md),
 and for the human-readable text representation consult [Text Format](TextFormat.md).

+Each function body consists of a list of expressions. All expressions and
+operators are typed, with no implicit conversions or overloading rules.
+
 Verification of WebAssembly code requires only a single pass with constant-time
 type checking and well-formedness checking.

@@ -104,10 +109,9 @@ default linear memories but [new memory operators](FutureFeatures.md#multiple-ta
 may be added after the MVP which can also access non-default memories.

 Linear memories (default or otherwise) can either be [imported](Modules.md#imports)
-or [defined inside the module](Modules.md#linear-memory-section), with defaultness
-indicated by a flag on the import or definition. After import or definition,
-there is no difference when accessing a linear memory whether it was imported or
-defined internally.
+or [defined inside the module](Modules.md#linear-memory-section). After import
+or definition, there is no difference when accessing a linear memory whether it
+was imported or defined internally.

 In the MVP, linear memory cannot be shared between threads of execution.
 The addition of [threads](PostMVP.md#threads) will allow this.
@@ -267,10 +271,9 @@ host environment.
 Every WebAssembly [instance](Modules.md) has one specially-designated *default*
 table which is indexed by [`call_indirect`](#calls) and other future
 table operators. Tables can either be [imported](Modules.md#imports) or
-[defined inside the module](Modules.md#table-section), with defaultness
-indicated by a flag on the import or definition. After import or definition,
-there is no difference when calling into a table whether it was imported or
-defined internally.
+[defined inside the module](Modules.md#table-section). After import or
+definition, there is no difference when calling into a table whether it was
+imported or defined internally.

 In the MVP, the primary purpose of tables is to implement indirect function
 calls in C/C++ using an integer index as the pointer-to-function and the table
@@ -345,9 +348,12 @@ a value and may appear as children of other expressions.
 ### Branches and nesting

 The `br` and `br_if` constructs express low-level branching.
-Branches may only reference labels defined by an outer *enclosing construct*.
-This means that, for example, references to a `block`'s label can only occur
-within the `block`'s body.
+Branches may only reference labels defined by an outer *enclosing construct*,
+which can be a `block` (with a label at the `end`), `loop` (with a label at the
+beginning), `if` (with a label at the `end` or `else`), `else` (with a label at
+the `end`), or the function body (with a label at the `end`). This means that,
+for example, references to a `block`'s label can only occur within the
+`block`'s body.

 In practice, outer `block`s can be used to place labels for any given branching
 pattern, except for one restriction: one can't branch into the middle of a loop
@@ -367,7 +373,7 @@ The `nop`, `br`, `br_if`, `br_table`, and `return` constructs do not yield value
 Other control constructs may yield values if their subexpressions yield values:

 * `block`: yields either the value of the last expression in the block or the result of an inner branch that targeted the label of the block
-* `loop`: yields either the value of the last expression in the loop or the result of an inner branch that targeted the end label of the loop
+* `loop`: yields the value of the last expression in the loop
 * `if`: yields either the value of the last *then* expression or the last *else* expression or the result of an inner branch that targeted the label of one of these.

 In all constructs containing block-like sequences of expressions, all expressions but the last must not yield a value.
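The revised *Branches and nesting* rule can be checked in a single pass by counting open constructs. A minimal sketch over hypothetical instruction tuples: the function body counts as the outermost label, `block`, `loop`, and `if` each open one more, `else` stays within the frame its `if` opened, and `end` closes one. A relative branch depth is valid only if it is smaller than the number of constructs currently open, which is why a `block`'s label cannot be referenced after its `end`.

```python
from typing import List, Tuple

OPENERS = {"block", "loop", "if"}


def check_branch_depths(code: List[Tuple]) -> None:
    """Raise ValueError if any br/br_if names a label with no enclosing construct."""
    depth = 1  # the function body's own label
    for ins in code:
        op = ins[0]
        if op in OPENERS:
            depth += 1
        elif op == "else":
            pass  # continues the frame opened by the matching `if`
        elif op == "end":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched end")
        elif op in ("br", "br_if"):
            if not 0 <= ins[1] < depth:
                raise ValueError(f"{op} {ins[1]} does not name an enclosing label")
    if depth != 0:
        raise ValueError("function body not terminated by end")


# Valid: both branches target constructs that enclose them.
check_branch_depths([("block",), ("loop",), ("br_if", 1), ("br", 0),
                     ("end",), ("end",), ("end",)])

# Invalid: after the block's `end`, depth 1 no longer names an enclosing label.
try:
    check_branch_depths([("block",), ("end",), ("br", 1), ("end",)])
except ValueError as err:
    print(err)  # br 1 does not name an enclosing label
```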