From ef125d3a611a2bfbe4ab12cdfefa76fd7cb7177c Mon Sep 17 00:00:00 2001
From: Jakob Stoklund Olesen
Date: Tue, 6 Jun 2017 10:50:39 -0700
Subject: [PATCH 1/6] Remove boolean types.

The WebAssembly community group was not able to reach a consensus on whether
boolean vector types should be included or not. In order to move forward with
prototyping of the proposal, remove the boolean vector types along with
supporting instructions. Change comparison instructions to return a v128 mask
vector and replace the select instructions with a single v128.bitselect
instruction. These semantics are simple to implement on all current
instruction set architectures.
---
 proposals/simd/SIMD.md | 339 ++++++++++++++---------------------------
 1 file changed, 114 insertions(+), 225 deletions(-)

diff --git a/proposals/simd/SIMD.md b/proposals/simd/SIMD.md
index f843a5b23..4217c6209 100644
--- a/proposals/simd/SIMD.md
+++ b/proposals/simd/SIMD.md
@@ -6,45 +6,24 @@ current popular instruction set architectures.

# Types

-WebAssembly is extended with five new value types and a number of new kinds of
-immediate operands used by the SIMD instructions.
+WebAssembly is extended with a new `v128` value type and a number of new kinds
+of immediate operands used by the SIMD instructions.

-## SIMD value types
+## SIMD value type

-The `v128` type has a concrete mapping to a 128-bit representation. The boolean
-types do not have a bit-pattern representation.
-
-* `v128`: A 128-bit SIMD vector. Bits are numbered 0–127.
-* `b8x16`: A vector of 16 `boolean` lanes numbered 0–15.
-* `b16x8`: A vector of 8 `boolean` lanes numbered 0–7.
-* `b32x4`: A vector of 4 `boolean` lanes numbered 0–3.
-* `b64x2`: A vector of 2 `boolean` lanes numbered 0–1.
-
-The `v128` type corresponds to a vector register in a typical SIMD ISA. The
-interpretation of the 128 bits in the vector register is provided by the
-individual instructions. 
When a `v128` value is represented as 16 bytes, bits -0-7 go in the first byte with bit 0 as the LSB, bits 8-15 go in the second +The `v128` value type has a concrete mapping to a 128-bit representation with bits +numbered 0–127. The `v128` type corresponds to a vector register in a typical +SIMD ISA. The interpretation of the 128 bits in the vector register is provided +by the individual instructions. When a `v128` value is represented as 16 bytes, +bits 0-7 go in the first byte with bit 0 as the LSB, bits 8-15 go in the second byte, etc. -The abstract boolean vector types can be mapped to vector registers or predicate -registers by an implementation. They have a property `S.Lanes` which is used by -the pseudo-code below: - -| S | S.Lanes | -|---------|--------:| -| `b8x16` | 16 | -| `b16x8` | 8 | -| `b32x4` | 4 | -| `b64x2` | 2 | - ## Immediate operands Some of the new SIMD instructions defined here have immediate operands that are encoded as individual bytes in the binary encoding. Many have a limited valid range, and it is a validation error if the immediate operands are out of range. -* `ImmBits2`: A byte with values in the range 0-3 used to initialize a `b64x2`. -* `ImmBits4`: A byte with values in the range 0-15 used to initialize a `b32x4`. * `ImmByte`: A single unconstrained byte (0-255). * `LaneIdx2`: A byte with values in the range 0–1 identifying a lane. * `LaneIdx4`: A byte with values in the range 0–3 identifying a lane. @@ -52,14 +31,12 @@ range, and it is a validation error if the immediate operands are out of range. * `LaneIdx16`: A byte with values in the range 0–15 identifying a lane. * `LaneIdx32`: A byte with values in the range 0–31 identifying a lane. -## Interpreting SIMD value types +## Interpreting the SIMD value type The single `v128` SIMD type can represent packed data in multiple ways. Instructions specify how the bits should be interpreted through a hierarchy of *interpretations*. 
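As a non-normative illustration of the byte-numbering rule above (Python ints standing in for `v128` values; not part of the proposal text):

```python
# Non-normative sketch: model a v128 as a Python int with bits numbered
# 0..127, and check that bits 0-7 land in the first byte with bit 0 as
# the LSB, bits 8-15 in the second byte, etc.
def v128_bytes(v):
    assert 0 <= v < (1 << 128)
    return v.to_bytes(16, "little")  # little-endian byte order

assert v128_bytes(1)[0] == 0x01          # bit 0 -> LSB of byte 0
assert v128_bytes(1 << 8)[1] == 0x01     # bit 8 -> LSB of byte 1
assert v128_bytes(1 << 127)[15] == 0x80  # bit 127 -> MSB of byte 15
```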
-The boolean vector types only have the one interpretation given by their type. - ### Lane division interpretation The first level of interpretations of the `v128` type imposes a lane structure on @@ -74,12 +51,12 @@ The lane dividing interpretations don't say anything about the semantics of the bits in each lane. The interpretations have *properties* used by the semantic specification pseudo-code below: -| S | S.LaneBits | S.Lanes | S.BoolType | +| S | S.LaneBits | S.Lanes | S.MaskType | |---------|-----------:|--------:|:----------:| -| `v8x16` | 8 | 16 | `b8x16` | -| `v16x8` | 16 | 8 | `b16x8` | -| `v32x4` | 32 | 4 | `b32x4` | -| `v64x2` | 64 | 2 | `b64x2` | +| `v8x16` | 8 | 16 | `i8x16` | +| `v16x8` | 16 | 8 | `i16x8` | +| `v32x4` | 32 | 4 | `i32x4` | +| `v64x2` | 64 | 2 | `i64x2` | Since WebAssembly is little-endian, the least significant bit in each lane is the bit with the lowest number. @@ -147,35 +124,28 @@ def S.lanewise_binary(func, a, b): return result ``` -Comparison operators produce a boolean vector: +Comparison operators produce a mask vector where the bits in each lane are 0 +for false and all ones for true: ```python def S.lanewise_comparison(func, a, b): - result = S.BoolType.New() + all_ones = S.MaskType.Umax + result = S.MaskType.New() for i in range(S.Lanes): - result[i] = func(a[i], b[i]) + result[i] = all_ones if func(a[i], b[i]) else 0 return result ``` ## Constructing SIMD values -### Constants +### Constant * `v128.const(imm: ImmByte[16]) -> v128` -* `b8x16.const(imm: ImmByte[2]) -> b8x16` -* `b16x8.const(imm: ImmByte) -> b16x8` -* `b32x4.const(imm: ImmBits4) -> b32x4` -* `b64x2.const(imm: ImmBits2) -> b64x2` Materialize a constant SIMD value from the immediate operands. The `v128.const` instruction is encoded with 16 immediate bytes which provide the bits of the -vector directly. The boolean constants are encoded with one bit per lane such -that lane 0 is the LSB of the first immediate byte. +vector directly. 
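The mask-producing comparison semantics introduced above can be sketched concretely (non-normative; the `i32x4` shape is assumed, with a list of lane values standing in for a `v128`):

```python
# Non-normative sketch of S.lanewise_comparison for the i32x4 shape:
# a true lane becomes all ones (S.MaskType.Umax), a false lane becomes 0.
LANES, LANE_BITS = 4, 32
ALL_ONES = (1 << LANE_BITS) - 1  # 0xFFFFFFFF

def lanewise_comparison(func, a, b):
    return [ALL_ONES if func(x, y) else 0 for x, y in zip(a, b)]

# A greater-than comparison producing a lane mask:
mask = lanewise_comparison(lambda x, y: x > y, [5, 0, 7, 1], [3, 9, 7, 0])
assert mask == [0xFFFFFFFF, 0, 0, 0xFFFFFFFF]
```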
### Create vector with identical lanes -* `b8x16.splat(x: i32) -> b8x16` -* `b16x8.splat(x: i32) -> b16x8` -* `b32x4.splat(x: i32) -> b32x4` -* `b64x2.splat(x: i32) -> b64x2` * `i8x16.splat(x: i32) -> v128` * `i16x8.splat(x: i32) -> v128` * `i32x4.splat(x: i32) -> v128` @@ -193,17 +163,9 @@ def S.splat(x): return result ``` -The boolean vector splats will create a vector with all false lanes if `x` is -zero, all true lanes otherwise. The `i8x16.splat` and `i16x8.splat` -instructions ignore the high bits of `x`. - ## Accessing lanes ### Extract lane as a scalar -* `b8x16.extract_lane(a: b8x16, i: LaneIdx16) -> i32` -* `b16x8.extract_lane(a: b16x8, i: LaneIdx8) -> i32` -* `b32x4.extract_lane(a: b32x4, i: LaneIdx4) -> i32` -* `b64x2.extract_lane(a: b64x2, i: LaneIdx2) -> i32` * `i8x16.extract_lane_s(a: v128, i: LaneIdx16) -> i32` * `i8x16.extract_lane_u(a: v128, i: LaneIdx16) -> i32` * `i16x8.extract_lane_s(a: v128, i: LaneIdx8) -> i32` @@ -221,14 +183,9 @@ def S.extract_lane(a, i): ``` The `_s` and `_u` variants will sign-extend or zero-extend the lane value to -`i32` respectively. Boolean lanes are returned as an `i32` with the value 0 or -1. +`i32` respectively. ### Replace lane value -* `b8x16.replace_lane(a: b8x16, i: LaneIdx16, x: i32) -> b8x16` -* `b16x8.replace_lane(a: b16x8, i: LaneIdx8, x: i32) -> b16x8` -* `b32x4.replace_lane(a: b32x4, i: LaneIdx4, x: i32) -> b32x4` -* `b64x2.replace_lane(a: b64x2, i: LaneIdx2, x: i32) -> b64x2` * `i8x16.replace_lane(a: v128, i: LaneIdx16, x: i32) -> v128` * `i16x8.replace_lane(a: v128, i: LaneIdx8, x: i32) -> v128` * `i32x4.replace_lane(a: v128, i: LaneIdx4, x: i32) -> v128` @@ -249,31 +206,7 @@ def S.replace_lane(a, i, x): ``` The input lane value, `x`, is interpreted the same way as for the splat -instructions. For the boolean vectors, non-zero means true; for the `i8` and -`i16` lanes, the high bits of `x` are ignored. 
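A non-normative sketch of the splat and replace_lane semantics for the `i16x8` shape (lists of lane values stand in for `v128`; the lane-width masking models the "high bits of `x` are ignored" rule):

```python
LANES, LANE_BITS = 8, 16
LANE_MASK = (1 << LANE_BITS) - 1

def splat(x):
    # High bits of x beyond the lane width are ignored.
    return [x & LANE_MASK] * LANES

def replace_lane(a, i, x):
    assert 0 <= i < LANES  # an out-of-range lane index is a validation error
    return a[:i] + [x & LANE_MASK] + a[i + 1:]

v = splat(0x1_0005)                 # bits above the low 16 are dropped
assert v == [5] * 8
assert replace_lane(v, 3, 7) == [5, 5, 5, 7, 5, 5, 5, 5]
```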
- -### Lane-wise select -* `v8x16.select(s: b8x16, t: v128, f: v128) -> v128` -* `v16x8.select(s: b16x8, t: v128, f: v128) -> v128` -* `v32x4.select(s: b32x4, t: v128, f: v128) -> v128` -* `v64x2.select(s: b64x2, t: v128, f: v128) -> v128` - -Use a boolean vector to select lanes from two numerical vectors. - -```python -def S.select(s, t, f): - result = S.New() - for i in range(S.Lanes): - if s[i]: - result[i] = t[i] - else - result[i] = f[i] - return result -``` - -Note that the normal WebAssembly `select` instruction also works with vector -types. It selects between two whole vectors controlled by a scalar value, -rather than selecting lanes controlled by a boolean vector. +instructions. For the `i8` and `i16` lanes, the high bits of `x` are ignored. ### Swizzle lanes * `v8x16.swizzle(a: v128, s: LaneIdx16[16]) -> v128` @@ -480,14 +413,14 @@ arithmetic right shift for the `_s` variants and a logical right shift for the `_u` variants. ```python -def S.shl_s(a, y): +def S.shr_s(a, y): # Number of bits to shift: 0 .. S.LaneBits - 1. amount = y mod S.LaneBits def shift(x): return x >> amount return S.lanewise_unary(shift, S.AsSigned(a)) -def S.shl_u(a, y): +def S.shr_u(a, y): # Number of bits to shift: 0 .. S.LaneBits - 1. amount = y mod S.LaneBits def shift(x): @@ -495,107 +428,66 @@ def S.shl_u(a, y): return S.lanewise_unary(shift, S.AsUnsigned(a)) ``` -## Logical operations - -The logical operations are defined on the boolean SIMD types. See also the -[Bitwise operations](#bitwise-operations) below. 
- -### Logical and -* `b8x16.and(a: b8x16, b: b8x16) -> b8x16` -* `b16x8.and(a: b16x8, b: b16x8) -> b16x8` -* `b32x4.and(a: b32x4, b: b32x4) -> b32x4` -* `b64x2.and(a: b64x2, b: b64x2) -> b64x2` - -```python -def S.and(a, b): - def logical_and(x, y): - return x and y - return S.lanewise_binary(logical_and, a, b) -``` - -### Logical or -* `b8x16.or(a: b8x16, b: b8x16) -> b8x16` -* `b16x8.or(a: b16x8, b: b16x8) -> b16x8` -* `b32x4.or(a: b32x4, b: b32x4) -> b32x4` -* `b64x2.or(a: b64x2, b: b64x2) -> b64x2` - -```python -def S.or(a, b): - def logical_or(x, y): - return x or y - return S.lanewise_binary(logical_or, a, b) -``` - -### Logical xor -* `b8x16.xor(a: b8x16, b: b8x16) -> b8x16` -* `b16x8.xor(a: b16x8, b: b16x8) -> b16x8` -* `b32x4.xor(a: b32x4, b: b32x4) -> b32x4` -* `b64x2.xor(a: b64x2, b: b64x2) -> b64x2` - -```python -def S.xor(a, b): - def logical_xor(x, y): - return x xor y - return S.lanewise_binary(logical_xor, a, b) -``` - -### Logical not -* `b8x16.not(a: b8x16) -> b8x16` -* `b16x8.not(a: b16x8) -> b16x8` -* `b32x4.not(a: b32x4) -> b32x4` -* `b64x2.not(a: b64x2) -> b64x2` - -```python -def S.not(a): - def logical_not(x): - return not x - return S.lanewise_unary(logical_not, a) -``` ## Bitwise operations -The same logical operations defined on the boolean types are also available on -the `v128` type where they operate bitwise the same way C's `&`, `|`, `^`, and -`~` operators work on an `unsigned` type. +Bitwise operations treat a `v128` value type as a vector of 128 independent bits. +### Bitwise logic * `v128.and(a: v128, b: v128) -> v128` * `v128.or(a: v128, b: v128) -> v128` * `v128.xor(a: v128, b: v128) -> v128` * `v128.not(a: v128) -> v128` +The logical operations defined on the scalar integer types are also available +on the `v128` type where they operate bitwise the same way C's `&`, `|`, `^`, +and `~` operators work on an `unsigned` type. 
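The operations above can be modelled on Python ints truncated to 128 bits (non-normative sketch); a bit-level select can then be composed from them:

```python
MASK = (1 << 128) - 1  # truncate to 128 bits, like C's unsigned arithmetic

def v128_and(a, b): return a & b
def v128_or(a, b):  return a | b
def v128_xor(a, b): return a ^ b
def v128_not(a):    return ~a & MASK

# Bit-level select composed from the bitwise operations:
# (v1 AND c) OR (v2 AND NOT c).
def bitselect(v1, v2, c):
    return v128_or(v128_and(v1, c), v128_and(v2, v128_not(c)))

assert v128_not(0) == MASK
assert bitselect(0xFF, 0x00, 0x0F) == 0x0F  # low 4 bits from v1, rest from v2
```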
+ +### Bitwise select +* `v128.bitselect(v1: v128, v2: v128, c: v128) -> v128` + +Use the bits in the control mask `c` to select the corresponding bit from `v1` +when 1 and `v2` when 0. +This is the same as `v128.or(v128.and(v1, c), v128.and(v2, v128.not(c)))`. + +Note that the normal WebAssembly `select` instruction also works with vector +types. It selects between two whole vectors controlled by a single scalar value, +rather than selecting bits controlled by a control mask vector. + + ## Boolean horizontal reductions -These operations reduce all the lanes of a boolean vector to a single scalar -boolean value. +These operations reduce all the lanes of an integer vector to a single scalar +0 or 1 value. A lane is considered "true" if it is non-zero. ### Any lane true -* `b8x16.any_true(a: b8x16) -> i32` -* `b16x8.any_true(a: b16x8) -> i32` -* `b32x4.any_true(a: b32x4) -> i32` -* `b64x2.any_true(a: b64x2) -> i32` +* `i8x16.any_true(a: v128) -> i32` +* `i16x8.any_true(a: v128) -> i32` +* `i32x4.any_true(a: v128) -> i32` +* `i64x2.any_true(a: v128) -> i32` -These functions return 1 if any lane in `a` is true, 0 otherwise. +These functions return 1 if any lane in `a` is non-zero, 0 otherwise. ```python def S.any_true(a): for i in range(S.Lanes): - if a[i]: + if a[i] != 0: return 1 return 0 ``` ### All lanes true -* `b8x16.all_true(a: b8x16) -> i32` -* `b16x8.all_true(a: b16x8) -> i32` -* `b32x4.all_true(a: b32x4) -> i32` -* `b64x2.all_true(a: b64x2) -> i32` +* `i8x16.all_true(a: v128) -> i32` +* `i16x8.all_true(a: v128) -> i32` +* `i32x4.all_true(a: v128) -> i32` +* `i64x2.all_true(a: v128) -> i32` -These functions return 1 if all lanes in `a` are true, 0 otherwise. +These functions return 1 if all lanes in `a` are non-zero, 0 otherwise. 
```python
def S.all_true(a):
    for i in range(S.Lanes):
-    if not a[i]:
+    if a[i] == 0:
        return 0
    return 1
```
@@ -603,15 +495,15 @@
## Comparisons

The comparison operations all compare two vectors lane-wise, and produce a
-boolean vector with the same number of lanes as the input interpretation.
+mask vector with the same number of lanes as the input interpretation.

### Equality
-* `i8x16.eq(a: v128, b: v128) -> b8x16`
-* `i16x8.eq(a: v128, b: v128) -> b16x8`
-* `i32x4.eq(a: v128, b: v128) -> b32x4`
-* `i64x2.eq(a: v128, b: v128) -> b64x2`
-* `f32x4.eq(a: v128, b: v128) -> b32x4`
-* `f64x2.eq(a: v128, b: v128) -> b64x2`
+* `i8x16.eq(a: v128, b: v128) -> v128`
+* `i16x8.eq(a: v128, b: v128) -> v128`
+* `i32x4.eq(a: v128, b: v128) -> v128`
+* `i64x2.eq(a: v128, b: v128) -> v128`
+* `f32x4.eq(a: v128, b: v128) -> v128`
+* `f64x2.eq(a: v128, b: v128) -> v128`

Integer equality is independent of the signed/unsigned interpretation. Floating
point equality follows IEEE semantics, so a NaN lane compares not equal with
anything, including itself.

```python
def S.eq(a, b):
    def eq(x, y):
        return x == y
    return S.lanewise_comparison(eq, a, b)
```

### Non-equality
-* `i8x16.ne(a: v128, b: v128) -> b8x16`
-* `i16x8.ne(a: v128, b: v128) -> b16x8`
-* `i32x4.ne(a: v128, b: v128) -> b32x4`
-* `i64x2.ne(a: v128, b: v128) -> b64x2`
-* `f32x4.ne(a: v128, b: v128) -> b32x4`
-* `f64x2.ne(a: v128, b: v128) -> b64x2`
+* `i8x16.ne(a: v128, b: v128) -> v128`
+* `i16x8.ne(a: v128, b: v128) -> v128`
+* `i32x4.ne(a: v128, b: v128) -> v128`
+* `i64x2.ne(a: v128, b: v128) -> v128`
+* `f32x4.ne(a: v128, b: v128) -> v128`
+* `f64x2.ne(a: v128, b: v128) -> v128`

The `ne` operations produce the inverse of their `eq` counterparts:

```python
def S.ne(a, b):
    def ne(x, y):
        return x != y
    return S.lanewise_comparison(ne, a, b)
```

### Less than
-* `i8x16.lt_s(a: v128, b: v128) -> b8x16`
-* `i8x16.lt_u(a: v128, b: v128) -> b8x16`
-* `i16x8.lt_s(a: v128, b: v128) -> b16x8`
-* `i16x8.lt_u(a: v128, b: v128) -> b16x8`
-* `i32x4.lt_s(a: v128, b: v128) -> b32x4`
-* `i32x4.lt_u(a: v128, b: v128) -> b32x4`
-* `i64x2.lt_s(a: v128, b: 
v128) -> b64x2` -* `i64x2.lt_u(a: v128, b: v128) -> b64x2` -* `f32x4.lt(a: v128, b: v128) -> b32x4` -* `f64x2.lt(a: v128, b: v128) -> b64x2` +* `i8x16.lt_s(a: v128, b: v128) -> v128` +* `i8x16.lt_u(a: v128, b: v128) -> v128` +* `i16x8.lt_s(a: v128, b: v128) -> v128` +* `i16x8.lt_u(a: v128, b: v128) -> v128` +* `i32x4.lt_s(a: v128, b: v128) -> v128` +* `i32x4.lt_u(a: v128, b: v128) -> v128` +* `i64x2.lt_s(a: v128, b: v128) -> v128` +* `i64x2.lt_u(a: v128, b: v128) -> v128` +* `f32x4.lt(a: v128, b: v128) -> v128` +* `f64x2.lt(a: v128, b: v128) -> v128` ### Less than or equal -* `i8x16.le_s(a: v128, b: v128) -> b8x16` -* `i8x16.le_u(a: v128, b: v128) -> b8x16` -* `i16x8.le_s(a: v128, b: v128) -> b16x8` -* `i16x8.le_u(a: v128, b: v128) -> b16x8` -* `i32x4.le_s(a: v128, b: v128) -> b32x4` -* `i32x4.le_u(a: v128, b: v128) -> b32x4` -* `i64x2.le_s(a: v128, b: v128) -> b64x2` -* `i64x2.le_u(a: v128, b: v128) -> b64x2` -* `f32x4.le(a: v128, b: v128) -> b32x4` -* `f64x2.le(a: v128, b: v128) -> b64x2` +* `i8x16.le_s(a: v128, b: v128) -> v128` +* `i8x16.le_u(a: v128, b: v128) -> v128` +* `i16x8.le_s(a: v128, b: v128) -> v128` +* `i16x8.le_u(a: v128, b: v128) -> v128` +* `i32x4.le_s(a: v128, b: v128) -> v128` +* `i32x4.le_u(a: v128, b: v128) -> v128` +* `i64x2.le_s(a: v128, b: v128) -> v128` +* `i64x2.le_u(a: v128, b: v128) -> v128` +* `f32x4.le(a: v128, b: v128) -> v128` +* `f64x2.le(a: v128, b: v128) -> v128` ### Greater than -* `i8x16.gt_s(a: v128, b: v128) -> b8x16` -* `i8x16.gt_u(a: v128, b: v128) -> b8x16` -* `i16x8.gt_s(a: v128, b: v128) -> b16x8` -* `i16x8.gt_u(a: v128, b: v128) -> b16x8` -* `i32x4.gt_s(a: v128, b: v128) -> b32x4` -* `i32x4.gt_u(a: v128, b: v128) -> b32x4` -* `i64x2.gt_s(a: v128, b: v128) -> b64x2` -* `i64x2.gt_u(a: v128, b: v128) -> b64x2` -* `f32x4.gt(a: v128, b: v128) -> b32x4` -* `f64x2.gt(a: v128, b: v128) -> b64x2` +* `i8x16.gt_s(a: v128, b: v128) -> v128` +* `i8x16.gt_u(a: v128, b: v128) -> v128` +* `i16x8.gt_s(a: v128, b: v128) -> v128` +* 
`i16x8.gt_u(a: v128, b: v128) -> v128` +* `i32x4.gt_s(a: v128, b: v128) -> v128` +* `i32x4.gt_u(a: v128, b: v128) -> v128` +* `i64x2.gt_s(a: v128, b: v128) -> v128` +* `i64x2.gt_u(a: v128, b: v128) -> v128` +* `f32x4.gt(a: v128, b: v128) -> v128` +* `f64x2.gt(a: v128, b: v128) -> v128` ### Greater than or equal -* `i8x16.ge_s(a: v128, b: v128) -> b8x16` -* `i8x16.ge_u(a: v128, b: v128) -> b8x16` -* `i16x8.ge_s(a: v128, b: v128) -> b16x8` -* `i16x8.ge_u(a: v128, b: v128) -> b16x8` -* `i32x4.ge_s(a: v128, b: v128) -> b32x4` -* `i32x4.ge_u(a: v128, b: v128) -> b32x4` -* `i64x2.ge_s(a: v128, b: v128) -> b64x2` -* `i64x2.ge_u(a: v128, b: v128) -> b64x2` -* `f32x4.ge(a: v128, b: v128) -> b32x4` -* `f64x2.ge(a: v128, b: v128) -> b64x2` +* `i8x16.ge_s(a: v128, b: v128) -> v128` +* `i8x16.ge_u(a: v128, b: v128) -> v128` +* `i16x8.ge_s(a: v128, b: v128) -> v128` +* `i16x8.ge_u(a: v128, b: v128) -> v128` +* `i32x4.ge_s(a: v128, b: v128) -> v128` +* `i32x4.ge_u(a: v128, b: v128) -> v128` +* `i64x2.ge_s(a: v128, b: v128) -> v128` +* `i64x2.ge_u(a: v128, b: v128) -> v128` +* `f32x4.ge(a: v128, b: v128) -> v128` +* `f64x2.ge(a: v128, b: v128) -> v128` ## Load and store -Load and store operations are provided for `v128` vectors, but not for the -boolean vectors; we don't want to prescribe a bitwise representation of the -boolean vectors. - -The memory operations take the same arguments and have the same semantics as -the existing scalar WebAssembly load and store instructions. The difference is -that the memory access size is 16 bytes which is also the natural alignment. +Load and store operations are provided for the `v128` vectors. The memory +operations take the same arguments and have the same semantics as the existing +scalar WebAssembly load and store instructions. The difference is that the +memory access size is 16 bytes which is also the natural alignment. 
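A non-normative sketch of the 16-byte memory access (linear memory modelled as a bytearray; in WebAssembly, out-of-bounds accesses trap):

```python
def v128_load(memory, addr):
    # The access size is 16 bytes; out-of-bounds accesses trap.
    if addr < 0 or addr + 16 > len(memory):
        raise IndexError("out of bounds memory access")
    return int.from_bytes(memory[addr:addr + 16], "little")

def v128_store(memory, addr, v):
    if addr < 0 or addr + 16 > len(memory):
        raise IndexError("out of bounds memory access")
    memory[addr:addr + 16] = v.to_bytes(16, "little")

mem = bytearray(32)
v128_store(mem, 16, 0x01)
assert v128_load(mem, 16) == 0x01
assert mem[16] == 0x01  # bits 0-7 land in the first byte of the access
```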
### Load From 6673c0cd83deefd48a5ac976b9382d3442569170 Mon Sep 17 00:00:00 2001 From: Jakob Stoklund Olesen Date: Tue, 6 Jun 2017 10:57:20 -0700 Subject: [PATCH 2/6] Keep only a single v8x16.shuffle instruction. Remove all other swizzle and shuffle instructions since they can be implemented in terms of the general v8x16.shuffle. We can add more compact encodings of popular shuffles later if needed. --- proposals/simd/SIMD.md | 19 ------------------- 1 file changed, 19 deletions(-) diff --git a/proposals/simd/SIMD.md b/proposals/simd/SIMD.md index 4217c6209..26f500755 100644 --- a/proposals/simd/SIMD.md +++ b/proposals/simd/SIMD.md @@ -208,27 +208,8 @@ def S.replace_lane(a, i, x): The input lane value, `x`, is interpreted the same way as for the splat instructions. For the `i8` and `i16` lanes, the high bits of `x` are ignored. -### Swizzle lanes -* `v8x16.swizzle(a: v128, s: LaneIdx16[16]) -> v128` -* `v16x8.swizzle(a: v128, s: LaneIdx8[8]) -> v128` -* `v32x4.swizzle(a: v128, s: LaneIdx4[4]) -> v128` -* `v64x2.swizzle(a: v128, s: LaneIdx2[2]) -> v128` - -Create vector with lanes rearranged: - -```python -def S.swizzle(a, s): - result = S.New() - for i in range(S.Lanes): - result[i] = a[s[i]] - return result -``` - ### Shuffle lanes * `v8x16.shuffle(a: v128, b: v128, s: LaneIdx32[16]) -> v128` -* `v16x8.shuffle(a: v128, b: v128, s: LaneIdx16[8]) -> v128` -* `v32x4.shuffle(a: v128, b: v128, s: LaneIdx8[4]) -> v128` -* `v64x2.shuffle(a: v128, b: v128, s: LaneIdx4[2]) -> v128` Create vector with lanes selected from the lanes of two input vectors: From c04b57d72d93f4ec2caa27f345b160fa06d2493e Mon Sep 17 00:00:00 2001 From: Jakob Stoklund Olesen Date: Tue, 6 Jun 2017 11:59:43 -0700 Subject: [PATCH 3/6] Remove controversial i64x2 operations. The CG consensus is to omit the following i64x2 operations: i64x2.mul, equalities, inequalities. Other i64x2 operations remain since they were supported by CG consensus. 
--- proposals/simd/SIMD.md | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/proposals/simd/SIMD.md b/proposals/simd/SIMD.md index 26f500755..ca6dcc63d 100644 --- a/proposals/simd/SIMD.md +++ b/proposals/simd/SIMD.md @@ -271,7 +271,6 @@ def S.sub(a, b): * `i8x16.mul(a: v128, b: v128) -> v128` * `i16x8.mul(a: v128, b: v128) -> v128` * `i32x4.mul(a: v128, b: v128) -> v128` -* `i64x2.mul(a: v128, b: v128) -> v128` Lane-wise wrapping integer multiplication: @@ -482,7 +481,6 @@ mask vector with the same number of lanes as the input interpretation. * `i8x16.eq(a: v128, b: v128) -> v128` * `i16x8.eq(a: v128, b: v128) -> v128` * `i32x4.eq(a: v128, b: v128) -> v128` -* `i64x2.eq(a: v128, b: v128) -> v128` * `f32x4.eq(a: v128, b: v128) -> v128` * `f64x2.eq(a: v128, b: v128) -> v128` @@ -501,7 +499,6 @@ def S.eq(a, b): * `i8x16.ne(a: v128, b: v128) -> v128` * `i16x8.ne(a: v128, b: v128) -> v128` * `i32x4.ne(a: v128, b: v128) -> v128` -* `i64x2.ne(a: v128, b: v128) -> v128` * `f32x4.ne(a: v128, b: v128) -> v128` * `f64x2.ne(a: v128, b: v128) -> v128` @@ -521,8 +518,6 @@ def S.ne(a, b): * `i16x8.lt_u(a: v128, b: v128) -> v128` * `i32x4.lt_s(a: v128, b: v128) -> v128` * `i32x4.lt_u(a: v128, b: v128) -> v128` -* `i64x2.lt_s(a: v128, b: v128) -> v128` -* `i64x2.lt_u(a: v128, b: v128) -> v128` * `f32x4.lt(a: v128, b: v128) -> v128` * `f64x2.lt(a: v128, b: v128) -> v128` @@ -533,8 +528,6 @@ def S.ne(a, b): * `i16x8.le_u(a: v128, b: v128) -> v128` * `i32x4.le_s(a: v128, b: v128) -> v128` * `i32x4.le_u(a: v128, b: v128) -> v128` -* `i64x2.le_s(a: v128, b: v128) -> v128` -* `i64x2.le_u(a: v128, b: v128) -> v128` * `f32x4.le(a: v128, b: v128) -> v128` * `f64x2.le(a: v128, b: v128) -> v128` @@ -545,8 +538,6 @@ def S.ne(a, b): * `i16x8.gt_u(a: v128, b: v128) -> v128` * `i32x4.gt_s(a: v128, b: v128) -> v128` * `i32x4.gt_u(a: v128, b: v128) -> v128` -* `i64x2.gt_s(a: v128, b: v128) -> v128` -* `i64x2.gt_u(a: v128, b: v128) -> v128` * `f32x4.gt(a: v128, b: v128) -> v128` * 
`f64x2.gt(a: v128, b: v128) -> v128` @@ -557,8 +548,6 @@ def S.ne(a, b): * `i16x8.ge_u(a: v128, b: v128) -> v128` * `i32x4.ge_s(a: v128, b: v128) -> v128` * `i32x4.ge_u(a: v128, b: v128) -> v128` -* `i64x2.ge_s(a: v128, b: v128) -> v128` -* `i64x2.ge_u(a: v128, b: v128) -> v128` * `f32x4.ge(a: v128, b: v128) -> v128` * `f64x2.ge(a: v128, b: v128) -> v128` From 6365a6566646109bfcee819d57a4cb709dead356 Mon Sep 17 00:00:00 2001 From: Jakob Stoklund Olesen Date: Tue, 6 Jun 2017 12:08:15 -0700 Subject: [PATCH 4/6] Saturating float-to-int conversions. The CG consensus was to replace the trapping conversion instructions with saturating semantics. --- proposals/simd/SIMD.md | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/proposals/simd/SIMD.md b/proposals/simd/SIMD.md index ca6dcc63d..ab2a99b82 100644 --- a/proposals/simd/SIMD.md +++ b/proposals/simd/SIMD.md @@ -662,13 +662,14 @@ Lane-wise IEEE `squareRoot`. Lane-wise conversion from integer to floating point. Some integer values will be rounded. -### Floating point to integer -* `i32x4.trunc_s/f32x4(a: v128) -> v128` -* `i32x4.trunc_u/f32x4(a: v128) -> v128` -* `i64x2.trunc_s/f64x2(a: v128) -> v128` -* `i64x2.trunc_u/f64x2(a: v128) -> v128` - -Lane-wise conversion from floating point to integer using the IEEE -`convertToIntegerTowardZero` function. If any lane is a NaN or the rounded -integer value is outside the range of the destination type, these instructions -trap. +### Floating point to integer with saturation +* `i32x4.trunc_s/f32x4:sat(a: v128) -> v128` +* `i32x4.trunc_u/f32x4:sat(a: v128) -> v128` +* `i64x2.trunc_s/f64x2:sat(a: v128) -> v128` +* `i64x2.trunc_u/f64x2:sat(a: v128) -> v128` + +Lane-wise saturating conversion from floating point to integer using the IEEE +`convertToIntegerTowardZero` function. If any input lane is a NaN, the +resulting lane is 0. 
If the rounded integer value of a lane is outside the +range of the destination type, the result is saturated to the nearest +representable integer value. From d404f13de8d42cbd2f777b5cc5c702233814a86d Mon Sep 17 00:00:00 2001 From: Jakob Stoklund Olesen Date: Tue, 6 Jun 2017 13:08:27 -0700 Subject: [PATCH 5/6] Add a separate document for the binary encoding of SIMD. --- proposals/simd/BinarySIMD.md | 169 +++++++++++++++++++++++++++++++++++ proposals/simd/SIMD.md | 2 + 2 files changed, 171 insertions(+) create mode 100644 proposals/simd/BinarySIMD.md diff --git a/proposals/simd/BinarySIMD.md b/proposals/simd/BinarySIMD.md new file mode 100644 index 000000000..736266c34 --- /dev/null +++ b/proposals/simd/BinarySIMD.md @@ -0,0 +1,169 @@ +# Binary encoding of SIMD + +This document describes the binary encoding of the SIMD value type and +instructions. + +## SIMD value type + +The `v128` value type is encoded as 0x7b: + +``` +valtype ::= ... + | 0x7B => v128 +``` + +## SIMD instruction encodings + +All SIMD instructions are encoded as a 0xfd prefix byte followed by a +SIMD-specific opcode in LEB128 format: + +``` +instr ::= ... + | 0xFB simdop:varuint32 ... +``` + +Some SIMD instructions have additional immediate operands following `simdop`. +The `v8x16.shuffle` instruction has 16 bytes after `simdop`. 
+ +| Instruction | `simdop` | Immediate operands | +| --------------------------|---------:|--------------------| +| `v128.const` | 0 | - | +| `v128.load` | 1 | m:memarg | +| `v128.store` | 2 | m:memarg | +| `i8x16.splat` | 3 | - | +| `i16x8.splat` | 4 | - | +| `i32x4.splat` | 5 | - | +| `i64x2.splat` | 6 | - | +| `f32x4.splat` | 7 | - | +| `f64x2.splat` | 8 | - | +| `i8x16.extract_lane_s` | 9 | i:LaneIdx16 | +| `i8x16.extract_lane_u` | 10 | i:LaneIdx16 | +| `i16x8.extract_lane_s` | 11 | i:LaneIdx8 | +| `i16x8.extract_lane_u` | 12 | i:LaneIdx8 | +| `i32x4.extract_lane` | 13 | i:LaneIdx4 | +| `i64x2.extract_lane` | 14 | i:LaneIdx2 | +| `f32x4.extract_lane` | 15 | i:LaneIdx4 | +| `f64x2.extract_lane` | 16 | i:LaneIdx2 | +| `i8x16.replace_lane` | 17 | i:LaneIdx16 | +| `i16x8.replace_lane` | 18 | i:LaneIdx8 | +| `i32x4.replace_lane` | 19 | i:LaneIdx4 | +| `i64x2.replace_lane` | 20 | i:LaneIdx2 | +| `f32x4.replace_lane` | 21 | i:LaneIdx4 | +| `f64x2.replace_lane` | 22 | i:LaneIdx2 | +| `v8x16.shuffle` | 23 | s:LaneIdx32[16] | +| `i8x16.add` | 24 | - | +| `i16x8.add` | 25 | - | +| `i32x4.add` | 26 | - | +| `i64x2.add` | 27 | - | +| `i8x16.sub` | 28 | - | +| `i16x8.sub` | 29 | - | +| `i32x4.sub` | 30 | - | +| `i64x2.sub` | 31 | - | +| `i8x16.mul` | 32 | - | +| `i16x8.mul` | 33 | - | +| `i32x4.mul` | 34 | - | +| `i8x16.neg` | 35 | - | +| `i16x8.neg` | 36 | - | +| `i32x4.neg` | 37 | - | +| `i64x2.neg` | 38 | - | +| `i8x16.add_saturate_s` | 39 | - | +| `i8x16.add_saturate_u` | 40 | - | +| `i16x8.add_saturate_s` | 41 | - | +| `i16x8.add_saturate_u` | 42 | - | +| `i8x16.sub_saturate_s` | 43 | - | +| `i8x16.sub_saturate_u` | 44 | - | +| `i16x8.sub_saturate_s` | 45 | - | +| `i16x8.sub_saturate_u` | 46 | - | +| `i8x16.shl` | 47 | - | +| `i16x8.shl` | 48 | - | +| `i32x4.shl` | 49 | - | +| `i64x2.shl` | 50 | - | +| `i8x16.shr_s` | 51 | - | +| `i8x16.shr_u` | 52 | - | +| `i16x8.shr_s` | 53 | - | +| `i16x8.shr_u` | 54 | - | +| `i32x4.shr_s` | 55 | - | +| `i32x4.shr_u` | 56 | - | +| 
`i64x2.shr_s` | 57 | - | +| `i64x2.shr_u` | 58 | - | +| `v128.and` | 59 | - | +| `v128.or` | 60 | - | +| `v128.xor` | 61 | - | +| `v128.not` | 62 | - | +| `v128.bitselect` | 63 | - | +| `i8x16.any_true` | 64 | - | +| `i16x8.any_true` | 65 | - | +| `i32x4.any_true` | 66 | - | +| `i64x2.any_true` | 67 | - | +| `i8x16.all_true` | 68 | - | +| `i16x8.all_true` | 69 | - | +| `i32x4.all_true` | 70 | - | +| `i64x2.all_true` | 71 | - | +| `i8x16.eq` | 72 | - | +| `i16x8.eq` | 73 | - | +| `i32x4.eq` | 74 | - | +| `f32x4.eq` | 75 | - | +| `f64x2.eq` | 76 | - | +| `i8x16.ne` | 77 | - | +| `i16x8.ne` | 78 | - | +| `i32x4.ne` | 79 | - | +| `f32x4.ne` | 80 | - | +| `f64x2.ne` | 81 | - | +| `i8x16.lt_s` | 82 | - | +| `i8x16.lt_u` | 83 | - | +| `i16x8.lt_s` | 84 | - | +| `i16x8.lt_u` | 85 | - | +| `i32x4.lt_s` | 86 | - | +| `i32x4.lt_u` | 87 | - | +| `f32x4.lt` | 88 | - | +| `f64x2.lt` | 89 | - | +| `i8x16.le_s` | 90 | - | +| `i8x16.le_u` | 91 | - | +| `i16x8.le_s` | 92 | - | +| `i16x8.le_u` | 93 | - | +| `i32x4.le_s` | 94 | - | +| `i32x4.le_u` | 95 | - | +| `f32x4.le` | 96 | - | +| `f64x2.le` | 97 | - | +| `i8x16.gt_s` | 98 | - | +| `i8x16.gt_u` | 99 | - | +| `i16x8.gt_s` | 100 | - | +| `i16x8.gt_u` | 101 | - | +| `i32x4.gt_s` | 102 | - | +| `i32x4.gt_u` | 103 | - | +| `f32x4.gt` | 104 | - | +| `f64x2.gt` | 105 | - | +| `i8x16.ge_s` | 106 | - | +| `i8x16.ge_u` | 107 | - | +| `i16x8.ge_s` | 108 | - | +| `i16x8.ge_u` | 109 | - | +| `i32x4.ge_s` | 110 | - | +| `i32x4.ge_u` | 111 | - | +| `f32x4.ge` | 112 | - | +| `f64x2.ge` | 113 | - | +| `f32x4.neg` | 114 | - | +| `f64x2.neg` | 115 | - | +| `f32x4.abs` | 116 | - | +| `f64x2.abs` | 117 | - | +| `f32x4.min` | 118 | - | +| `f64x2.min` | 119 | - | +| `f32x4.max` | 120 | - | +| `f64x2.max` | 121 | - | +| `f32x4.add` | 122 | - | +| `f64x2.add` | 123 | - | +| `f32x4.sub` | 124 | - | +| `f64x2.sub` | 125 | - | +| `f32x4.div` | 126 | - | +| `f64x2.div` | 127 | - | +| `f32x4.mul` | 128 | - | +| `f64x2.mul` | 129 | - | +| `f32x4.sqrt` | 130 | 
- | +| `f64x2.sqrt` | 131 | - | +| `f32x4.convert_s/i32x4` | 132 | - | +| `f32x4.convert_u/i32x4` | 133 | - | +| `f64x2.convert_s/i64x2` | 134 | - | +| `f64x2.convert_u/i64x2` | 135 | - | +| `i32x4.trunc_s/f32x4:sat` | 136 | - | +| `i32x4.trunc_u/f32x4:sat` | 137 | - | +| `i64x2.trunc_s/f64x2:sat` | 138 | - | +| `i64x2.trunc_u/f64x2:sat` | 139 | - | diff --git a/proposals/simd/SIMD.md b/proposals/simd/SIMD.md index ab2a99b82..afcdc9c09 100644 --- a/proposals/simd/SIMD.md +++ b/proposals/simd/SIMD.md @@ -4,6 +4,8 @@ This specification describes a 128-bit packed *Single Instruction Multiple Data* (SIMD) extension to WebAssembly that can be implemented efficiently on current popular instruction set architectures. +See also [The binary encoding of SIMD instructions](BinarySIMD.md). + # Types WebAssembly is extended with a new `v128` value type and a number of new kinds From 2d865345ff094ded9856bb75fde6e73acef50582 Mon Sep 17 00:00:00 2001 From: Jakob Stoklund Olesen Date: Tue, 6 Jun 2017 13:12:32 -0700 Subject: [PATCH 6/6] Typo --- proposals/simd/BinarySIMD.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/simd/BinarySIMD.md b/proposals/simd/BinarySIMD.md index 736266c34..06c0774f9 100644 --- a/proposals/simd/BinarySIMD.md +++ b/proposals/simd/BinarySIMD.md @@ -19,7 +19,7 @@ SIMD-specific opcode in LEB128 format: ``` instr ::= ... - | 0xFB simdop:varuint32 ... + | 0xFD simdop:varuint32 ... ``` Some SIMD instructions have additional immediate operands following `simdop`.
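With the prefix byte corrected to 0xFD, the encoding can be exercised with a small non-normative decoder sketch (opcode numbers taken from the table above):

```python
def decode_simd_opcode(data, pos=0):
    # A SIMD instruction starts with the 0xFD prefix byte...
    assert data[pos] == 0xFD, "not a SIMD instruction"
    pos += 1
    result = shift = 0
    while True:  # ...followed by simdop as an unsigned LEB128 (varuint32)
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7

assert decode_simd_opcode(bytes([0xFD, 0x03])) == (3, 2)          # i8x16.splat
assert decode_simd_opcode(bytes([0xFD, 0x88, 0x01])) == (136, 3)  # i32x4.trunc_s/f32x4:sat
```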