diff --git a/BinaryEncoding.md b/BinaryEncoding.md index d6fdbdcb..5b154211 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -50,7 +50,7 @@ represented by _at most_ ceil(_N_/7) bytes that may contain padding `0x80` or `0 Note: Currently, the only sizes used are `varint32` and `varint64`. ### `value_type` -A single-byte unsigned integer indicating a [value type](AstSemantics.md#types). These types are encoded as: +A single-byte unsigned integer indicating a [value type](Semantics.md#types). These types are encoded as: * `1` indicating type `i32` * `2` indicating type `i64` * `3` indicating type `f32` @@ -73,7 +73,7 @@ A single-byte unsigned integer indicating the kind of definition being imported ### `resizable_limits` A packed tuple that describes the limits of a -[table](AstSemantics.md#table) or [memory](AstSemantics.md#resizing): +[table](Semantics.md#table) or [memory](Semantics.md#resizing): | Field | Type | Description | | ----- | ----- | ----- | @@ -192,7 +192,7 @@ or, if the `kind` is `Table`: | Field | Type | Description | | ----- | ---- | ----------- | -| element_type | `varuint7` | `0x20`, indicating [`anyfunc`](AstSemantics.md#table) | +| element_type | `varuint7` | `0x20`, indicating [`anyfunc`](Semantics.md#table) | | | `resizable_limits` | see [above](#resizable_limits) | or, if the `kind` is `Memory`: @@ -229,7 +229,7 @@ The encoding of a [Table section](Modules.md#table-section): | Field | Type | Description | | ----- | ---- | ----------- | -| element_type | `varuint7` | `0x20`, indicating [`anyfunc`](AstSemantics.md#table) | +| element_type | `varuint7` | `0x20`, indicating [`anyfunc`](Semantics.md#table) | | | `resizable_limits` | see [above](#resizable_limits) | In the MVP, the number of tables must be no more than 1. 
@@ -250,7 +250,7 @@ The encoding of a [Memory section](Modules.md#linear-memory-section): | | `resizable_limits` | see [above](#resizable_limits) | Note that the initial/maximum fields are specified in units of -[WebAssembly pages](AstSemantics.md#linear-memory). +[WebAssembly pages](Semantics.md#linear-memory). In the MVP, the number of memories must be no more than 1. @@ -399,7 +399,7 @@ count may be greater or less than the actual number of locals. # Function Bodies Function bodies consist of a sequence of local variable declarations followed by -[bytecode instructions](AstSemantics.md). Each function body must end with the `end` opcode. +[bytecode instructions](Semantics.md). Each function body must end with the `end` opcode. | Field | Type | Description | | ----- | ---- | ----------- | @@ -420,7 +420,7 @@ It is legal to have several entries with the same type. | type | `value_type` | type of the variables | -## Control flow operators ([described here](AstSemantics.md#control-flow-structures)) +## Control flow operators ([described here](Semantics.md#control-flow-structures)) | Name | Opcode | Immediates | Description | | ---- | ---- | ---- | ---- | @@ -454,7 +454,7 @@ The `br_table` operator implements an indirect branch. It accepts an optional va branches to the block or loop at the given offset within the `target_table`. If the input value is out of range, `br_table` branches to the default target. -## Basic operators ([described here](AstSemantics.md#constants)) +## Basic operators ([described here](Semantics.md#constants)) | Name | Opcode | Immediates | Description | | ---- | ---- | ---- | ---- | @@ -472,7 +472,7 @@ out of range, `br_table` branches to the default target. The `call_indirect` operator takes a list of function arguments and as the last operand the index into the table. 
-## Memory-related operators ([described here](AstSemantics.md#linear-memory-accesses)) +## Memory-related operators ([described here](Semantics.md#linear-memory-accesses)) | Name | Opcode | Immediate | Description | | ---- | ---- | ---- | ---- | @@ -515,7 +515,7 @@ natural alignment. The bits after the `log(memory-access-size)` least-significant bits must be set to 0. These bits are reserved for future use (e.g., for shared memory ordering requirements). -## Simple operators ([described here](AstSemantics.md#32-bit-integer-operators)) +## Simple operators ([described here](Semantics.md#32-bit-integer-operators)) | Name | Opcode | Immediate | Description | | ---- | ---- | ---- | ---- | diff --git a/CAndC++.md b/CAndC++.md index 236672ca..76866e2c 100644 --- a/CAndC++.md +++ b/CAndC++.md @@ -88,7 +88,7 @@ optimizers still assume that undefined behavior won't occur, so such bugs can still lead to surprising behavior. For example, while unaligned memory access is -[fully defined](AstSemantics.md#alignment) in WebAssembly, C and C++ compilers +[fully defined](Semantics.md#alignment) in WebAssembly, C and C++ compilers make no guarantee that a (non-packed) unaligned memory access at the source level is harmlessly translated into an unaligned memory access in WebAssembly. And in practice, popular C and C++ compilers do optimize on the assumption that @@ -116,7 +116,7 @@ rather than on the underlying platform. For those details that are dependent on the platform, on WebAssembly they follow naturally from having 8-bit bytes, 32-bit and 64-bit two's complement integers, and [32-bit and 64-bit IEEE-754-2008-style floating point support] -(AstSemantics.md#floating-point-operators). +(Semantics.md#floating-point-operators). 
## Portability of compiled code diff --git a/DynamicLinking.md b/DynamicLinking.md index aec31e9a..c4329f8e 100644 --- a/DynamicLinking.md +++ b/DynamicLinking.md @@ -2,8 +2,8 @@ WebAssembly enables load-time and run-time (`dlopen`) dynamic linking in the MVP by having multiple [instantiated modules](Modules.md) -share functions, [linear memories](AstSemantics.md#linear-memory), -[tables](AstSemantics.md#table) and [constants](AstSemantics.md#constants) +share functions, [linear memories](Semantics.md#linear-memory), +[tables](Semantics.md#table) and [constants](Semantics.md#constants) using module [imports](Modules.md#imports) and [exports](Modules.md#exports). In particular, since all (non-local) state that a module can access can be imported and exported and thus shared between separate modules' instances, toolchains diff --git a/FAQ.md b/FAQ.md index e44a595e..562cffdc 100644 --- a/FAQ.md +++ b/FAQ.md @@ -310,7 +310,7 @@ syscall in POSIX, WebAssembly unpacks this functionality into multiple operators: * the MVP starts with the ability to grow linear memory via a - [`grow_memory`](AstSemantics.md#resizing) operator; + [`grow_memory`](Semantics.md#resizing) operator; * proposed [future features](FutureFeatures.md#finer-grained-control-over-memory) would allow the application to change the protection and mappings for pages in the diff --git a/FutureFeatures.md b/FutureFeatures.md index 3f373bd8..77ff0af8 100644 --- a/FutureFeatures.md +++ b/FutureFeatures.md @@ -33,7 +33,7 @@ Provide access to safe OS-provided functionality including: performing these operators in sequence. The `addr` and `length` parameters above would be required to be multiples of -[`page_size`](AstSemantics.md#resizing). +[`page_size`](Semantics.md#resizing). The `mprotect` operator would require hardware memory protection to execute efficiently and thus may be added as an "optional" feature (requiring a @@ -51,7 +51,7 @@ can allocate noncontiguous virtual address ranges. 
See the Some platforms offer support for memory pages as large as 16GiB, which can improve the efficiency of memory management in some situations. WebAssembly -may offer programs the option to specify a larger page size than the [default] (AstSemantics.md#resizing). +may offer programs the option to specify a larger page size than the [default] (Semantics.md#resizing). ## More expressive control flow @@ -139,7 +139,7 @@ Useful properties of signature-restricted PTCs: General-purpose Proper Tail Calls would have no signature restrictions, and therefore be more broadly usable than -[Signature-restricted Proper Tail Calls](AstSemantics.md#signature-restricted-proper-tail-calls), +[Signature-restricted Proper Tail Calls](Semantics.md#signature-restricted-proper-tail-calls), though there would be some different performance characteristics. ## Asynchronous Signals @@ -301,7 +301,7 @@ quadruple precision. WebAssembly floating point conforms IEEE 754-2008 in most respects, but there are a few areas that are -[not yet covered](AstSemantics.md#floating-point-operators). +[not yet covered](Semantics.md#floating-point-operators). To support exceptions and alternate rounding modes, one option is to define an alternate form for each of `add`, `sub`, `mul`, `div`, `sqrt`, and `fma`. These @@ -382,7 +382,7 @@ pass was otherwise necessary. In the MVP, there are no global variables; C/C++ global variables are stored in linear memory and thus accessed through normal -[linear memory operators](AstSemantics.md#linear-memory-operators). +[linear memory operators](Semantics.md#linear-memory-operators). [Dynamic linking](DynamicLinking.md) will add some form of immutable global variable analogous to "symbols" in native binaries. In some cases, though, it may be useful to have a fully mutable global variable which lives outside @@ -437,7 +437,7 @@ since opaque, could be implemented as a raw function pointer). 
## More Table Operators and Types In the MVP, WebAssembly has limited functionality for operating on -[tables](AstSemantics.md#table) and the host-environment can do much more (e.g., +[tables](Semantics.md#table) and the host-environment can do much more (e.g., see [JavaScript's `WebAssembly.Table` API](JS.md#webassemblytable-objects)). It would be useful to be able to do everything from within WebAssembly so, e.g., it was possible to write a WebAssembly dynamic loader in WebAssembly. As a diff --git a/GC.md b/GC.md index f15caae4..d3330ad8 100644 --- a/GC.md +++ b/GC.md @@ -87,7 +87,7 @@ signatures. In particular: would map to exported [opaque reference types](GC.md#opaque-reference-types); * methods of WebIDL interfaces would map to exported functions where the receiver was translated into an explicit argument and WebIDL value - types were mapped to appropriate [value types](AstSemantics.md#types) + types were mapped to appropriate [value types](Semantics.md#types) (e.g., [bindTexture](https://www.khronos.org/registry/webgl/specs/latest/1.0/#5.14) would translate to `void (WebGLRenderingContextBase, int32, WebGLTexture?)`). diff --git a/JS.md b/JS.md index a08d9f52..4e5d84c0 100644 --- a/JS.md +++ b/JS.md @@ -15,7 +15,7 @@ as defined below and will be removed at some point in the future.* ## Traps -Whenever WebAssembly semantics specify a [trap](AstSemantics.md#traps), +Whenever WebAssembly semantics specify a [trap](Semantics.md#traps), a `WebAssembly.RuntimeError` object is thrown. WebAssembly code (currently) has no way to catch this exception and thus the exception will necessarily propagate to the enclosing non-WebAssembly caller (either the browser or @@ -342,7 +342,7 @@ call one with the `new` operator. 
## `WebAssembly.Memory` Objects -A `WebAssembly.Memory` object contains a single [linear memory](AstSemantics.md#linear-memory) +A `WebAssembly.Memory` object contains a single [linear memory](Semantics.md#linear-memory) which can be simultaneously referenced by multiple `Instance` objects. Each `Memory` object has two internal slots: * [[Memory]] : a [`Memory.memory`](https://github.com/WebAssembly/spec/blob/master/ml-proto/spec/memory.mli) @@ -405,7 +405,7 @@ is thrown. Let `d` be [`ToNonWrappingUint32`](#tononwrappinguint32)(`delta`). Let `ret` be the result of performing a -[`grow_memory`](AstSemantics.md#resizing) operation given delta `d`. +[`grow_memory`](Semantics.md#resizing) operation given delta `d`. If `ret` is `-1`, a `WebAssembly.RuntimeError` is thrown. @@ -429,7 +429,7 @@ is thrown. Otherwise return `M.[[BufferObject]]`. ## `WebAssembly.Table` Objects -A `WebAssembly.Table` object contains a single [table](AstSemantics.md#table) +A `WebAssembly.Table` object contains a single [table](Semantics.md#table) which can be simultaneously referenced by multiple `Instance` objects. Each `Table` object has two internal slots: * [[Table]] : a [`Table.table`](https://github.com/WebAssembly/spec/blob/master/ml-proto/spec/table.mli) diff --git a/MVP.md b/MVP.md index 38ea1677..0bc38a11 100644 --- a/MVP.md +++ b/MVP.md @@ -13,7 +13,7 @@ documents: * The distributable, loadable and executable unit of code in WebAssembly is called a [module](Modules.md). * The behavior of WebAssembly code in a module is specified in terms of - [instructions](AstSemantics.md) for a structured stack machine. + [instructions](Semantics.md) for a structured stack machine. * The WebAssembly binary format, which is designed to be natively decoded by WebAssembly implementations, is specified as a [binary encoding](BinaryEncoding.md) of a module's structure and code. 
diff --git a/Modules.md b/Modules.md index a3c4c60d..6da31b30 100644 --- a/Modules.md +++ b/Modules.md @@ -31,13 +31,13 @@ various operators and section fields in the module: A module can declare a sequence of **imports** which are provided, at instantiation time, by the host environment. There are several kinds of imports: * **function imports**, which can be called inside the module by the - [`call`](AstSemantics.md#calls) operator; + [`call`](Semantics.md#calls) operator; * **global imports**, which can be accessed inside the module by the - [global operators](AstSemantics.md#global-variables); + [global operators](Semantics.md#global-variables); * **linear memory imports**, which can be accessed inside the module by the - [memory operators](AstSemantics.md#linear-memory); and + [memory operators](Semantics.md#linear-memory); and * **table imports**, which can be accessed inside the module by - [call_indirect](AstSemantics.md#calls) and other + [call_indirect](Semantics.md#calls) and other table operators in the [future](FutureFeatures.md#more-table-operators-and-types). @@ -73,7 +73,7 @@ maximum length *less-or-equal* than the maximum length declared in the import. This ensures that separate compilation can assume: memory accesses below the declared initial length are always in-bounds, accesses above the declared maximum length are always out-of-bounds and if initial equals maximum, the -length is fixed. In the MVP, every memory is a [default memory](AstSemantics.md#linear-memory) +length is fixed. In the MVP, every memory is a [default memory](Semantics.md#linear-memory) and thus there may be at most one linear memory import or linear memory section. @@ -82,7 +82,7 @@ A *table import* includes the same set of fields defined in the length* and optional *maximum length*. 
As with the linear memory section, the host environment must ensure only WebAssembly tables are imported with exactly-matching element type, greater-or-equal initial length, and -less-or-equal maximum length. In the MVP, every table is a [default table](AstSemantics.md#table) +less-or-equal maximum length. In the MVP, every table is a [default table](Semantics.md#table) and thus there may be at most one table import or table section. Since the WebAssembly spec does not define how import names are interpreted: @@ -157,7 +157,7 @@ interchangeable with ES6 modules (ignoring [GC/Web API](FutureFeatures.md#gc/dom-integration) signature restrictions of the WebAssembly MVP) and thus it should be natural to compose a single application from both kinds of code. This goal motivates the -[semantic design](AstSemantics.md#linear-memory) of giving each WebAssembly +[semantic design](Semantics.md#linear-memory) of giving each WebAssembly module its own disjoint linear memory. Otherwise, if all modules shared a single linear memory (all modules with the same realm? origin? window?—even the scope of "all" is a nuanced question), a single app using multiple @@ -199,24 +199,24 @@ A module can: ## Global section The *global section* provides an internal definition of zero or more -[global variables](AstSemantics.md#global-variables). +[global variables](Semantics.md#global-variables). Each global variable internal definition declares its *type* -(a [value type](AstSemantics.md#types)), *mutability* (boolean flag) and +(a [value type](Semantics.md#types)), *mutability* (boolean flag) and *initializer* (an [initializer expression](#initializer-expression)). ## Linear memory section The *linear memory section* provides an internal definition of one -[linear memory](AstSemantics.md#linear-memory). In the MVP, every memory is a +[linear memory](Semantics.md#linear-memory). 
In the MVP, every memory is a default memory and thus there may be at most one linear memory import or linear memory section. -Each linear memory section declares an *initial* [memory size](AstSemantics.md#linear-memory) -(which may be subsequently increased by [`grow_memory`](AstSemantics.md#resizing)) and an +Each linear memory section declares an *initial* [memory size](Semantics.md#linear-memory) +(which may be subsequently increased by [`grow_memory`](Semantics.md#resizing)) and an optional *maximum memory size*. -[`grow_memory`](AstSemantics.md#resizing) is guaranteed to fail if attempting to +[`grow_memory`](Semantics.md#resizing) is guaranteed to fail if attempting to grow past the declared maximum. When declared, implementations *should* (non-normative) attempt to reserve virtual memory up to the maximum size. While failure to allocate the *initial* memory size is a runtime error, failure to @@ -237,7 +237,7 @@ value (defining the length of the given segment). The `offset` is an ## Table section The *table section* contains zero or more definitions of distinct -[tables](AstSemantics.md#table). In the MVP, every table is a +[tables](Semantics.md#table). In the MVP, every table is a default table and thus there may be at most one table import or table section. Each table definition declares an *element type*, *initial length*, and @@ -294,7 +294,7 @@ function definitions, assigning monotonically-increasing indices based on the order of definition in the module (as defined by the [binary encoding](BinaryEncoding.md)). The function index space is used by: -* [calls](AstSemantics.md#calls), to identify the callee of a direct call +* [calls](Semantics.md#calls), to identify the callee of a direct call ## Global Index Space @@ -303,7 +303,7 @@ global definitions, assigning monotonically-increasing indices based on the order of definition in the module (as defined by the [binary encoding](BinaryEncoding.md)). 
The global index space is used by: -* [global variable access operators](AstSemantics.md#global-variables), to +* [global variable access operators](Semantics.md#global-variables), to identify the global variable to read/write * [data segments](#data-section), to define the offset of a data segment (in linear memory) as the value of a global variable @@ -347,7 +347,7 @@ expressions. In the MVP, to keep things simple while still supporting the basic needs of [dynamic linking](DynamicLinking.md), initializer expressions are restricted to the following nullary operators: - * the four [constant operators](AstSemantics.md#constants); and + * the four [constant operators](Semantics.md#constants); and * `get_global`, where the global index must refer to an immutable import. In the future, operators like `i32.add` could be added to allow more expressive diff --git a/Portability.md b/Portability.md index 5580499c..57315894 100644 --- a/Portability.md +++ b/Portability.md @@ -27,7 +27,7 @@ characteristics: emulation thereof. * Two's complement signed integers in 32 bits and optionally 64 bits. * IEEE 754-2008 32-bit and 64-bit floating point, except for - [a few exceptions](AstSemantics.md#floating-point-operators). + [a few exceptions](Semantics.md#floating-point-operators). * Little-endian byte ordering. * Memory regions which can be efficiently addressed with 32-bit pointers or indices. diff --git a/README.md b/README.md index f75d93ea..fcf6416f 100644 --- a/README.md +++ b/README.md @@ -10,9 +10,9 @@ WebAssembly or wasm is a new, portable, size- and load-time-efficient format sui WebAssembly is currently being designed as an open standard by a [W3C Community Group](https://www.w3.org/community/webassembly/) that includes representatives from all major browsers. 
*Expect the contents of this repository to be in flux: everything is still under discussion.* -- **WebAssembly is efficient and fast**: The wasm [AST](AstSemantics.md) is designed to be encoded in a size- and load-time-efficient [binary format](BinaryEncoding.md). WebAssembly aims to execute at native speed by taking advantage of [common hardware capabilities](Portability.md#assumptions-for-efficient-execution) available on a wide range of platforms. +- **WebAssembly is efficient and fast**: Wasm [bytecode](Semantics.md) is designed to be encoded in a size- and load-time-efficient [binary format](BinaryEncoding.md). WebAssembly aims to execute at native speed by taking advantage of [common hardware capabilities](Portability.md#assumptions-for-efficient-execution) available on a wide range of platforms. -- **WebAssembly is safe**: WebAssembly describes a [memory-safe](Security.md#memory-safety), sandboxed [execution environment](AstSemantics.md#linear-memory) that may even be implemented inside existing JavaScript virtual machines. When [embedded in the web](Web.md), WebAssembly will enforce the same-origin and permissions security policies of the browser. +- **WebAssembly is safe**: WebAssembly describes a [memory-safe](Security.md#memory-safety), sandboxed [execution environment](Semantics.md#linear-memory) that may even be implemented inside existing JavaScript virtual machines. When [embedded in the web](Web.md), WebAssembly will enforce the same-origin and permissions security policies of the browser. - **WebAssembly is open and debuggable**: WebAssembly is designed to be pretty-printed in a [textual format](TextFormat.md) for debugging, testing, experimenting, optimizing, learning, teaching, and writing programs by hand. The textual format will be used when [viewing the source](FAQ.md#will-webassembly-support-view-source-on-the-web) of wasm modules on the web. 
diff --git a/Rationale.md b/Rationale.md index fd194bd0..fd64975e 100644 --- a/Rationale.md +++ b/Rationale.md @@ -10,14 +10,14 @@ ergonomics, portability, performance, security, and Getting Things Done. WebAssembly was designed incrementally, with multiple implementations being pursued concurrently. As the MVP stabilizes and we get experience from real-world codebases, we'll revisit the alternatives listed below, reevaluate the tradeoffs -and update the [design](AstSemantics.md) before the MVP is finalized. +and update the [design](Semantics.md) before the MVP is finalized. ## Why a stack machine? Why not an AST, or a register- or SSA-based bytecode? -* We started with an AST and generalized to a [structured stack machine](AstSemantics.md). ASTs allow a +* We started with an AST and generalized to a [structured stack machine](Semantics.md). ASTs allow a dense encoding and efficient decoding, compilation, and interpretation. The structured stack machine of WebAssembly is a generalization of ASTs allowed in previous versions while allowing efficiency gains in interpretation and baseline compilation, as well as a straightforward @@ -43,7 +43,7 @@ addition of multiple return values from control flow constructs and function cal ## Basic Types Only -WebAssembly only represents [a few types](AstSemantics.md#Types). +WebAssembly only represents [a few types](Semantics.md#Types). * More complex types can be formed from these basic types. It's up to the source language compiler to express its own types in terms of the basic machine @@ -69,7 +69,7 @@ WebAssembly only represents [a few types](AstSemantics.md#Types). ## Load/Store Addressing Load/store instructions include an immediate offset used for -[addressing](AstSemantics.md#Addressing). This is intended to simplify folding +[addressing](Semantics.md#Addressing). This is intended to simplify folding of offsets into complex address modes in hardware, and to simplify bounds checking optimizations. 
It offloads some of the optimization work to the compiler that targets WebAssembly, executing on the developer's machine, instead @@ -79,7 +79,7 @@ of performing that work in the WebAssembly compiler on the user's machine. ## Alignment Hints Load/store instructions contain -[alignment hints](AstSemantics.md#Alignment). This makes it easier to generate +[alignment hints](Semantics.md#Alignment). This makes it easier to generate efficient code on certain hardware architectures. Either tooling or an explicit opt-in "debug mode" in the spec could allow @@ -91,7 +91,7 @@ why it isn't the specified default. ## Out of Bounds The ideal semantics is for -[out-of-bounds accesses](AstSemantics.md#Out-of-Bounds) to trap, but the +[out-of-bounds accesses](Semantics.md#Out-of-Bounds) to trap, but the implications are not yet fully clear. There are several possible variations on this design being discussed and diff --git a/Security.md b/Security.md index 6b59f452..f3a1890a 100644 --- a/Security.md +++ b/Security.md @@ -30,7 +30,7 @@ at load time, even when [dynamic linking](DynamicLinking.md) is used. This allows implicit enforcement of [control-flow integrity][] (CFI) through structured control-flow. Since compiled code is immutable and not observable at runtime, WebAssembly programs are protected from control flow hijacking attacks. - * [Function calls](AstSemantics.md#calls) must specify the index of a target + * [Function calls](Semantics.md#calls) must specify the index of a target that corresponds to a valid entry in the [function index space](Modules.md#function-index-space) or [table index space](Modules.md#table-index-space). @@ -39,19 +39,19 @@ runtime, WebAssembly programs are protected from control flow hijacking attacks. function must match the type signature specified at the call site. * A shadow stack is used to maintain a trusted call stack that is invulnerable to buffer overflows in the module heap, ensuring safe function returns. 
- * [Branches](AstSemantics.md#branches-and-nesting) must point to valid + * [Branches](Semantics.md#branches-and-nesting) must point to valid destinations within the enclosing function. Variables in C/C++ can be lowered to two different primitives in WebAssembly, -depending on their scope. [Local variables](AstSemantics.md#local-variables) -with fixed scope and [global variables](AstSemantics.md#global-variables) are +depending on their scope. [Local variables](Semantics.md#local-variables) +with fixed scope and [global variables](Semantics.md#global-variables) are represented as fixed-type values stored by index. The former are initialized to zero by default and are stored in the protected shadow stack, whereas the latter are located in the [global index space](Modules.md#global-index-space) and can be imported from external modules. Local variables with [unclear static scope](Rationale.md#locals) (e.g. are used by the address-of operator, or are of type `struct` and returned by value) are stored in a separate -user-addressable stack in [linear memory](AstSemantics.md#linear-memory) at +user-addressable stack in [linear memory](Semantics.md#linear-memory) at compile time. This is an isolated memory region with fixed maximum size that is zero initialized by default. References to this memory are computed with infinite precision to avoid wrapping and simplify bounds checking. In the future, @@ -59,7 +59,7 @@ support for [multiple linear memory sections](Modules.md#linear-memory-section) [finer-grained memory operations](FutureFeatures.md#finer-grained-control-over-memory) (e.g. shared memory, page protection, large pages, etc.) will be implemented. -[Traps](AstSemantics.md#traps) are used to immediately terminate execution and +[Traps](Semantics.md#traps) are used to immediately terminate execution and signal abnormal behavior to the execution environment. In a browser, this is represented as a JavaScript exception. 
Support for [module-defined trap handlers](FutureFeatures.md#trappingor-non-trapping-strategies) diff --git a/AstSemantics.md b/Semantics.md similarity index 84% rename from AstSemantics.md rename to Semantics.md index 64be0f08..75d3276c 100644 --- a/AstSemantics.md +++ b/Semantics.md @@ -1,44 +1,36 @@ -# Abstract Syntax Tree Semantics - -This document describes WebAssembly semantics. The description here is written -in terms of an Abstract Syntax Tree (AST), however it is also possible to -understand WebAssembly semantics in terms of a stack machine. (In practice, -implementations need not build an actual AST or maintain an actual stack; they -need only behave [as if](https://en.wikipedia.org/wiki/As-if_rule) they did so.) - -This document explains the high-level design of the AST: its types, constructs, and -semantics. For full details consult [the formal Specification](https://github.com/WebAssembly/spec), +# Semantics + +This document explains the high-level design of WebAssembly code: its types, constructs, and +semantics. +WebAssembly code can be considered a *structured stack machine*; a machine where most computations use a stack +of values, but control flow is expressed in structured constructs such as blocks, ifs, and loops. +In practice, implementations need not maintain an actual value stack, nor actual data structures for control; they +need only behave [as if](https://en.wikipedia.org/wiki/As-if_rule) they did so. +For full details consult [the formal Specification](https://github.com/WebAssembly/spec), for file-level encoding details consult [Binary Encoding](BinaryEncoding.md), and for the human-readable text representation consult [Text Format](TextFormat.md). -Each function body consists of a list of expressions. All expressions and -operators are typed, with no implicit conversions or overloading rules. - +Each function body consists of a list of instructions which forms an implicit *block*. 
+Execution of instructions proceeds by way of a traditional *program counter* that advances +through the instructions. +Instructions fall into two categories: *control* instructions that form control constructs and *simple* instructions. +Control instructions pop their argument value(s) off the stack, may change the +program counter, and push result value(s) onto the stack. +Simple instructions pop their argument value(s) from the stack, apply an operator to the values, +and then push the result value(s) onto the stack, followed by an implicit advancement of +the program counter. + +All instructions and operators in WebAssembly are explicitly typed, with no overloading rules. Verification of WebAssembly code requires only a single pass with constant-time type checking and well-formedness checking. WebAssembly offers a set of language-independent operators that closely match operators in many programming languages and are efficiently implementable -on all modern computers. +on all modern computers. Each operator has a corresponding simple instruction. The [rationale](Rationale.md) document details why WebAssembly is designed as detailed in this document. -## Order of evaluation - -The evaluation order of child nodes is deterministic. - -All nodes other than control flow constructs need to evaluate their child nodes -in the order they appear in the serialized AST. - -For example, the s-expression presentation of the `i32.add` node -`(i32.add (set_local $x (i32.const 1)) (set_local $x (i32.const 2)))` -would first evaluate the child node `(set_local $x (i32.const 1))` and -afterwards the child node `(set_local $x (i32.const 2))`. - -The value of the local variable $x will be `2` after the `i32.add` node is fully -evaluated. - ## Traps Some operators may *trap* under some conditions, as noted below. 
In the MVP, @@ -75,7 +67,7 @@ WebAssembly has the following *value types*: * `f32`: 32-bit floating point * `f64`: 64-bit floating point -Each parameter and local variable has exactly one [value type](AstSemantics.md#types). Function signatures +Each parameter and local variable has exactly one [value type](Semantics.md#types). Function signatures consist of a sequence of zero or more parameter types and a sequence of zero or more return types. (Note: in the MVP, a function can have at most one return type). @@ -92,7 +84,7 @@ page support may be added in an opt-in manner in the [future](FutureFeatures.md#large-page-support)). The initial state of a linear memory is defined by the module's [linear memory](Modules.md#linear-memory-section) and [data](Modules.md#data-section) sections. The memory size can be dynamically -increased by the [`grow_memory`](AstSemantics.md#resizing) operator. +increased by the [`grow_memory`](Semantics.md#resizing) operator. A linear memory can be considered to be an untyped array of bytes, and it is unspecified how embedders map this array into their process' own [virtual @@ -237,7 +229,7 @@ The current size of the linear memory can be queried by the following operator: * `current_memory` : return the current memory size in units of pages. -As stated [above](AstSemantics.md#linear-memory), linear memory is contiguous, +As stated [above](Semantics.md#linear-memory), linear memory is contiguous, meaning there are no "holes" in the linear address space. After the MVP, there are [future features](FutureFeatures.md#finer-grained-control-over-memory) proposed to allow setting protection and creating mappings within the @@ -331,54 +323,71 @@ global variables will be necessary to allow sharing reference types between [threads](PostMVP.md#threads) since shared linear memory cannot load or store references. 
-## Control flow structures
+## Control constructs and instructions
-WebAssembly offers basic structured control flow with the following constructs.
-Since all AST nodes are expressions in WebAssembly, control constructs may yield
-a value and may appear as children of other expressions.
+WebAssembly offers basic structured control flow constructs such as *blocks*, *loops*, and *ifs*.
+All constructs are formed out of the following control instructions:
- * `nop`: an empty operator that does not yield a value
- * `block`: a fixed-length sequence of expressions with a label at the end
- * `loop`: a block with an additional label at the beginning which may be used to form loops
- * `if`: if expression with a list of *then* expressions and a list of *else* expressions
+ * `nop`: no operation, no effect
+ * `block`: the beginning of a block construct, a sequence of instructions with a label at the end
+ * `loop`: a block with a label at the beginning which may be used to form loops
+ * `if`: the beginning of an if construct with an implicit *then* block
+ * `else`: marks the else block of an if
  * `br`: branch to a given label in an enclosing construct
  * `br_if`: conditionally branch to a given label in an enclosing construct
  * `br_table`: a jump table which jumps to a label in an enclosing construct
  * `return`: return zero or more values from this function
+ * `end`: an instruction that marks the end of a block, loop, if, or function
+
+Blocks are composed of matched pairs of `block` ... `end` instructions, loops with matched pairs of
+`loop` ... `end` instructions, and ifs with either `if` ... `end` or `if` ... `else` ... `end` sequences.
+For each of these constructs the instructions in the ellipsis are said to be *enclosed* in the
+construct.
 ### Branches and nesting
-The `br` and `br_if` constructs express low-level branching.
-Branches may only reference labels defined by an outer *enclosing construct*,
-which can be a `block` (with a label at the `end`), `loop` (with a label at the
-beginning), `if` (with a label at the `end` or `else`), `else` (with a label at
-the `end`), or the function body (with a label at the `end`). This means that,
-for example, references to a `block`'s label can only occur within the
-`block`'s body.
+The `br`, `br_if`, and `br_table` instructions express low-level branching and are hereafter referred to simply as branches.
+Branches may only reference labels defined by a construct in which they are enclosed.
+For example, references to a `block`'s label can only occur within the `block`'s body.
 In practice, outer `block`s can be used to place labels for any given branching
-pattern, except for one restriction: one can't branch into the middle of a loop
-from outside it. This restriction ensures all control flow graphs are well-structured
-in the exact sense as in high-level languages like Java, JavaScript, Rust and Go. To
-further see the parallel, note that a `br` to a `block`'s label is functionally
-equivalent to a labeled `break` in high-level languages in that a `br` simply
-breaks out of a `block`.
+pattern, except that the nesting restriction makes it impossible to branch into the middle of a loop
+from outside the loop. This limitation ensures by construction that all control flow graphs
+are well-structured as in high-level languages like Java, JavaScript and Go.
+Notice that a branch to a `block`'s label is equivalent to a labeled `break` in
+high-level languages; branches simply break out of a `block`, and branches to a `loop`
+correspond to a "continue" statement.
+
+### Execution semantics of control instructions
+
+Executing a `return` pops return value(s) off the stack and returns from the current function.
+
+Executing a `block` or `loop` instruction has no effect on the value stack.
+
+Executing the `end` of a `block` or `loop` (including implicit blocks such as in `if` or for a function body) has no effect on the value stack.
+
+Executing the `end` of the implicit block for a function body pops the return value(s) (if any) off the stack and returns from the function.
+
+Executing the `if` instruction pops an `i32` condition off the stack and either falls through to the next instruction
+or sets the program counter to after the `else` or `end` of the `if`.
+
+Executing the `else` instruction of an `if` sets the program counter to after the corresponding `end` of the `if`.
-Branches that exit a `block`, `loop`, or `br_table` may take a subexpression
-that yields a value for the exited construct. If present, it is the first operand
-before any others.
+Branches that exit a `block` or `if` may yield value(s) for that construct.
+Branches pop result value(s) off the stack which must be the same type as the declared
+type of the construct which they target. If a conditional or unconditional branch is taken, the values pushed
+onto the stack between the beginning of the construct and the branch are discarded, the result value(s) are
+pushed back onto the stack, and the program counter is updated to the end of the construct.
-### Yielding values from control constructs
+Branches that target a `loop` do not yield a value; they pop any values pushed onto the stack since the start of the loop and set the program counter to the start of the loop.
-The `nop`, `br`, `br_if`, `br_table`, and `return` constructs do not yield values.
-Other control constructs may yield values if their subexpressions yield values:
+The `drop` operator can be used to explicitly pop a value from the stack.
-* `block`: yields either the value of the last expression in the block or the result of an inner branch that targeted the label of the block
-* `loop`: yields the value of the last expression in the loop
-* `if`: yields either the value of the last *then* expression or the last *else* expression or the result of an inner branch that targeted the label of one of these.
+The implicit popping associated with explicit branches makes compiling expression languages straightforward, even non-local
+control-flow transfer, requiring fewer drops.
-In all constructs containing block-like sequences of expressions, all expressions but the last must not yield a value.
-The `drop` operator can be used to explicitly discard unwanted expression results.
+Note that in the MVP, all control constructs and control instructions, including `return`, are
+restricted to at most one value.
 ### `br_table`
@@ -657,10 +666,7 @@ outside the range which rounds to an integer in range) traps.
 ## Unreachable
- * `unreachable`: An expression which can take on any type, and which, if
- executed, always traps. It is intended to be used for example after
- calls to functions which are known by the producer not to return (otherwise
- the producer would have to create another expression with an unused value
- to satisfy the type check). This trap is intended to be impossible for user
- code to catch or handle, even in the future when it may be possible to
+ * `unreachable`: An instruction which always traps.
+ It is intended to be used for example after calls to functions which are known by the producer not to return.
+ This trap is intended to be impossible for user code to catch or handle, even in the future when it may be possible to
 handle some other kinds of traps or exceptions.
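Reviewer note: the stack discipline that the new "Execution semantics of control instructions" text describes for `block`, `loop`, and branches can be sketched as a toy interpreter. This is only an illustration under stated assumptions, not the reference semantics: the tuple encoding of instructions, the explicit arity immediate on `block`, the `run` function, and the two example programs are all hypothetical, and `if`/`else`, `br_table`, `return`, and type checking are left out.

```python
def run(code, local_vars=()):
    """Interpret a toy instruction list; return the final value stack."""
    local_vars = list(local_vars)
    # Pre-pass: pair every `block`/`loop` with its matching `end`.
    end_of, open_pcs = {}, []
    for pc, ins in enumerate(code):
        if ins[0] in ("block", "loop"):
            open_pcs.append(pc)
        elif ins[0] == "end":
            end_of[open_pcs.pop()] = pc

    stack, labels, pc = [], [], 0
    while pc < len(code):
        op, *args = code[pc]
        taken = False
        if op == "i32.const":
            stack.append(args[0] & 0xFFFFFFFF)
        elif op == "i32.add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append((lhs + rhs) & 0xFFFFFFFF)
        elif op == "get_local":
            stack.append(local_vars[args[0]])
        elif op == "set_local":
            local_vars[args[0]] = stack.pop()
        elif op == "drop":
            stack.pop()
        elif op in ("block", "loop"):
            # Entering a construct leaves the value stack untouched; only the
            # current height is recorded. Assumption: `block` carries its arity.
            arity = args[0] if op == "block" else 0
            labels.append((op, pc, end_of[pc], len(stack), arity))
        elif op == "end":
            labels.pop()  # no effect on the value stack
        elif op == "br":
            taken = True
        elif op == "br_if":
            taken = stack.pop() != 0  # pops an i32 condition
        else:
            raise ValueError(f"unknown instruction: {op}")

        if not taken:
            pc += 1
            continue
        depth = args[0]  # 0 names the innermost enclosing construct
        kind, start, end, height, arity = labels[-1 - depth]
        if kind == "loop":
            # Discard values pushed since the loop began, jump to its start.
            del stack[height:]
            del labels[len(labels) - depth:]  # the loop's own label stays
            pc = start + 1
        else:
            # Pop the result(s), discard the rest, push the result(s) back,
            # then continue after the block's `end`.
            results = stack[len(stack) - arity:]
            del stack[height:]
            stack.extend(results)
            del labels[len(labels) - 1 - depth:]
            pc = end + 1
    return stack


BLOCK_EXAMPLE = [
    ("block", 1),
    ("i32.const", 1), ("i32.const", 2),  # leftovers, discarded by the branch
    ("i32.const", 40),                   # result value of the block
    ("br", 0),
    ("i32.const", 99),                   # never executed
    ("end",),
    ("i32.const", 2), ("i32.add",),
]
LOOP_EXAMPLE = [
    ("loop",),
    ("get_local", 0), ("i32.const", -1), ("i32.add",), ("set_local", 0),
    ("i32.const", 7),                    # leftover, discarded when branching
    ("get_local", 0), ("br_if", 0),      # "continue" while local 0 is nonzero
    ("drop",),                           # fallthrough: remove the leftover 7
    ("end",),
    ("get_local", 0),
]
print(run(BLOCK_EXAMPLE))      # [42]
print(run(LOOP_EXAMPLE, [3]))  # [0]
```

In `BLOCK_EXAMPLE` the branch keeps only the block's single result, so no explicit `drop` is needed for the two leftover constants. In `LOOP_EXAMPLE` a taken `br_if 0` behaves like a "continue" and clears the leftover `7`, while the fallthrough path has to `drop` it by hand.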
diff --git a/TextFormat.md b/TextFormat.md
index 2318a615..d6d3dc08 100644
--- a/TextFormat.md
+++ b/TextFormat.md
@@ -17,7 +17,7 @@ The text format will be standardized, but only for tooling purposes:
 implement WebAssembly semantics.
 Given that the code representation is actually an
-[Abstract Syntax Tree](AstSemantics.md), the syntax would contain nested
+[Abstract Syntax Tree](Semantics.md), the syntax would contain nested
 statements and expressions (instead of the linear list of instructions most assembly languages have).
@@ -50,7 +50,7 @@ readability will therefore factor into standardizing a text format.
 There are, however, prototype syntaxes which are used to bring up WebAssembly: it's easier to develop using a text format than it is with a binary format, even if the ultimate WebAssembly format will be binary. Most of these prototypes use [s-expressions][] because they
-can easily represent expression trees and [ASTs](AstSemantics.md) (as opposed to CFGs)
+can easily represent expression trees and [ASTs](Semantics.md) (as opposed to CFGs)
 and don't have much of a syntax to speak of (avoiding syntax bikeshed discussions).
 [s-expressions]: https://en.wikipedia.org/wiki/S-expression