diff --git a/proposals/simd/BinarySIMD.md b/proposals/simd/BinarySIMD.md new file mode 100644 index 000000000..06c0774f9 --- /dev/null +++ b/proposals/simd/BinarySIMD.md @@ -0,0 +1,169 @@ +# Binary encoding of SIMD + +This document describes the binary encoding of the SIMD value type and +instructions. + +## SIMD value type + +The `v128` value type is encoded as 0x7b: + +``` +valtype ::= ... + | 0x7B => v128 +``` + +## SIMD instruction encodings + +All SIMD instructions are encoded as a 0xfd prefix byte followed by a +SIMD-specific opcode in LEB128 format: + +``` +instr ::= ... + | 0xFD simdop:varuint32 ... +``` + +Some SIMD instructions have additional immediate operands following `simdop`. +The `v8x16.shuffle` instruction has 16 bytes after `simdop`. + +| Instruction | `simdop` | Immediate operands | +| --------------------------|---------:|--------------------| +| `v128.const` | 0 | - | +| `v128.load` | 1 | m:memarg | +| `v128.store` | 2 | m:memarg | +| `i8x16.splat` | 3 | - | +| `i16x8.splat` | 4 | - | +| `i32x4.splat` | 5 | - | +| `i64x2.splat` | 6 | - | +| `f32x4.splat` | 7 | - | +| `f64x2.splat` | 8 | - | +| `i8x16.extract_lane_s` | 9 | i:LaneIdx16 | +| `i8x16.extract_lane_u` | 10 | i:LaneIdx16 | +| `i16x8.extract_lane_s` | 11 | i:LaneIdx8 | +| `i16x8.extract_lane_u` | 12 | i:LaneIdx8 | +| `i32x4.extract_lane` | 13 | i:LaneIdx4 | +| `i64x2.extract_lane` | 14 | i:LaneIdx2 | +| `f32x4.extract_lane` | 15 | i:LaneIdx4 | +| `f64x2.extract_lane` | 16 | i:LaneIdx2 | +| `i8x16.replace_lane` | 17 | i:LaneIdx16 | +| `i16x8.replace_lane` | 18 | i:LaneIdx8 | +| `i32x4.replace_lane` | 19 | i:LaneIdx4 | +| `i64x2.replace_lane` | 20 | i:LaneIdx2 | +| `f32x4.replace_lane` | 21 | i:LaneIdx4 | +| `f64x2.replace_lane` | 22 | i:LaneIdx2 | +| `v8x16.shuffle` | 23 | s:LaneIdx32[16] | +| `i8x16.add` | 24 | - | +| `i16x8.add` | 25 | - | +| `i32x4.add` | 26 | - | +| `i64x2.add` | 27 | - | +| `i8x16.sub` | 28 | - | +| `i16x8.sub` | 29 | - | +| `i32x4.sub` | 30 | - | +| `i64x2.sub` | 
31 | - | +| `i8x16.mul` | 32 | - | +| `i16x8.mul` | 33 | - | +| `i32x4.mul` | 34 | - | +| `i8x16.neg` | 35 | - | +| `i16x8.neg` | 36 | - | +| `i32x4.neg` | 37 | - | +| `i64x2.neg` | 38 | - | +| `i8x16.add_saturate_s` | 39 | - | +| `i8x16.add_saturate_u` | 40 | - | +| `i16x8.add_saturate_s` | 41 | - | +| `i16x8.add_saturate_u` | 42 | - | +| `i8x16.sub_saturate_s` | 43 | - | +| `i8x16.sub_saturate_u` | 44 | - | +| `i16x8.sub_saturate_s` | 45 | - | +| `i16x8.sub_saturate_u` | 46 | - | +| `i8x16.shl` | 47 | - | +| `i16x8.shl` | 48 | - | +| `i32x4.shl` | 49 | - | +| `i64x2.shl` | 50 | - | +| `i8x16.shr_s` | 51 | - | +| `i8x16.shr_u` | 52 | - | +| `i16x8.shr_s` | 53 | - | +| `i16x8.shr_u` | 54 | - | +| `i32x4.shr_s` | 55 | - | +| `i32x4.shr_u` | 56 | - | +| `i64x2.shr_s` | 57 | - | +| `i64x2.shr_u` | 58 | - | +| `v128.and` | 59 | - | +| `v128.or` | 60 | - | +| `v128.xor` | 61 | - | +| `v128.not` | 62 | - | +| `v128.bitselect` | 63 | - | +| `i8x16.any_true` | 64 | - | +| `i16x8.any_true` | 65 | - | +| `i32x4.any_true` | 66 | - | +| `i64x2.any_true` | 67 | - | +| `i8x16.all_true` | 68 | - | +| `i16x8.all_true` | 69 | - | +| `i32x4.all_true` | 70 | - | +| `i64x2.all_true` | 71 | - | +| `i8x16.eq` | 72 | - | +| `i16x8.eq` | 73 | - | +| `i32x4.eq` | 74 | - | +| `f32x4.eq` | 75 | - | +| `f64x2.eq` | 76 | - | +| `i8x16.ne` | 77 | - | +| `i16x8.ne` | 78 | - | +| `i32x4.ne` | 79 | - | +| `f32x4.ne` | 80 | - | +| `f64x2.ne` | 81 | - | +| `i8x16.lt_s` | 82 | - | +| `i8x16.lt_u` | 83 | - | +| `i16x8.lt_s` | 84 | - | +| `i16x8.lt_u` | 85 | - | +| `i32x4.lt_s` | 86 | - | +| `i32x4.lt_u` | 87 | - | +| `f32x4.lt` | 88 | - | +| `f64x2.lt` | 89 | - | +| `i8x16.le_s` | 90 | - | +| `i8x16.le_u` | 91 | - | +| `i16x8.le_s` | 92 | - | +| `i16x8.le_u` | 93 | - | +| `i32x4.le_s` | 94 | - | +| `i32x4.le_u` | 95 | - | +| `f32x4.le` | 96 | - | +| `f64x2.le` | 97 | - | +| `i8x16.gt_s` | 98 | - | +| `i8x16.gt_u` | 99 | - | +| `i16x8.gt_s` | 100 | - | +| `i16x8.gt_u` | 101 | - | +| `i32x4.gt_s` | 102 
| - | +| `i32x4.gt_u` | 103 | - | +| `f32x4.gt` | 104 | - | +| `f64x2.gt` | 105 | - | +| `i8x16.ge_s` | 106 | - | +| `i8x16.ge_u` | 107 | - | +| `i16x8.ge_s` | 108 | - | +| `i16x8.ge_u` | 109 | - | +| `i32x4.ge_s` | 110 | - | +| `i32x4.ge_u` | 111 | - | +| `f32x4.ge` | 112 | - | +| `f64x2.ge` | 113 | - | +| `f32x4.neg` | 114 | - | +| `f64x2.neg` | 115 | - | +| `f32x4.abs` | 116 | - | +| `f64x2.abs` | 117 | - | +| `f32x4.min` | 118 | - | +| `f64x2.min` | 119 | - | +| `f32x4.max` | 120 | - | +| `f64x2.max` | 121 | - | +| `f32x4.add` | 122 | - | +| `f64x2.add` | 123 | - | +| `f32x4.sub` | 124 | - | +| `f64x2.sub` | 125 | - | +| `f32x4.div` | 126 | - | +| `f64x2.div` | 127 | - | +| `f32x4.mul` | 128 | - | +| `f64x2.mul` | 129 | - | +| `f32x4.sqrt` | 130 | - | +| `f64x2.sqrt` | 131 | - | +| `f32x4.convert_s/i32x4` | 132 | - | +| `f32x4.convert_u/i32x4` | 133 | - | +| `f64x2.convert_s/i64x2` | 134 | - | +| `f64x2.convert_u/i64x2` | 135 | - | +| `i32x4.trunc_s/f32x4:sat` | 136 | - | +| `i32x4.trunc_u/f32x4:sat` | 137 | - | +| `i64x2.trunc_s/f64x2:sat` | 138 | - | +| `i64x2.trunc_u/f64x2:sat` | 139 | - | diff --git a/proposals/simd/SIMD.md b/proposals/simd/SIMD.md index f843a5b23..afcdc9c09 100644 --- a/proposals/simd/SIMD.md +++ b/proposals/simd/SIMD.md @@ -4,47 +4,28 @@ This specification describes a 128-bit packed *Single Instruction Multiple Data* (SIMD) extension to WebAssembly that can be implemented efficiently on current popular instruction set architectures. -# Types - -WebAssembly is extended with five new value types and a number of new kinds of -immediate operands used by the SIMD instructions. +See also [The binary encoding of SIMD instructions](BinarySIMD.md). -## SIMD value types +# Types -The `v128` type has a concrete mapping to a 128-bit representation. The boolean -types do not have a bit-pattern representation. +WebAssembly is extended with a new `v128` value type and a number of new kinds +of immediate operands used by the SIMD instructions. 
-* `v128`: A 128-bit SIMD vector. Bits are numbered 0–127. -* `b8x16`: A vector of 16 `boolean` lanes numbered 0–15. -* `b16x8`: A vector of 8 `boolean` lanes numbered 0–7. -* `b32x4`: A vector of 4 `boolean` lanes numbered 0–3. -* `b64x2`: A vector of 2 `boolean` lanes numbered 0–1. +## SIMD value type -The `v128` type corresponds to a vector register in a typical SIMD ISA. The -interpretation of the 128 bits in the vector register is provided by the -individual instructions. When a `v128` value is represented as 16 bytes, bits -0-7 go in the first byte with bit 0 as the LSB, bits 8-15 go in the second +The `v128` value type has a concrete mapping to a 128-bit representation with bits +numbered 0–127. The `v128` type corresponds to a vector register in a typical +SIMD ISA. The interpretation of the 128 bits in the vector register is provided +by the individual instructions. When a `v128` value is represented as 16 bytes, +bits 0-7 go in the first byte with bit 0 as the LSB, bits 8-15 go in the second byte, etc. -The abstract boolean vector types can be mapped to vector registers or predicate -registers by an implementation. They have a property `S.Lanes` which is used by -the pseudo-code below: - -| S | S.Lanes | -|---------|--------:| -| `b8x16` | 16 | -| `b16x8` | 8 | -| `b32x4` | 4 | -| `b64x2` | 2 | - ## Immediate operands Some of the new SIMD instructions defined here have immediate operands that are encoded as individual bytes in the binary encoding. Many have a limited valid range, and it is a validation error if the immediate operands are out of range. -* `ImmBits2`: A byte with values in the range 0-3 used to initialize a `b64x2`. -* `ImmBits4`: A byte with values in the range 0-15 used to initialize a `b32x4`. * `ImmByte`: A single unconstrained byte (0-255). * `LaneIdx2`: A byte with values in the range 0–1 identifying a lane. * `LaneIdx4`: A byte with values in the range 0–3 identifying a lane. 
@@ -52,14 +33,12 @@ range, and it is a validation error if the immediate operands are out of range. * `LaneIdx16`: A byte with values in the range 0–15 identifying a lane. * `LaneIdx32`: A byte with values in the range 0–31 identifying a lane. -## Interpreting SIMD value types +## Interpreting the SIMD value type The single `v128` SIMD type can represent packed data in multiple ways. Instructions specify how the bits should be interpreted through a hierarchy of *interpretations*. -The boolean vector types only have the one interpretation given by their type. - ### Lane division interpretation The first level of interpretations of the `v128` type imposes a lane structure on @@ -74,12 +53,12 @@ The lane dividing interpretations don't say anything about the semantics of the bits in each lane. The interpretations have *properties* used by the semantic specification pseudo-code below: -| S | S.LaneBits | S.Lanes | S.BoolType | +| S | S.LaneBits | S.Lanes | S.MaskType | |---------|-----------:|--------:|:----------:| -| `v8x16` | 8 | 16 | `b8x16` | -| `v16x8` | 16 | 8 | `b16x8` | -| `v32x4` | 32 | 4 | `b32x4` | -| `v64x2` | 64 | 2 | `b64x2` | +| `v8x16` | 8 | 16 | `i8x16` | +| `v16x8` | 16 | 8 | `i16x8` | +| `v32x4` | 32 | 4 | `i32x4` | +| `v64x2` | 64 | 2 | `i64x2` | Since WebAssembly is little-endian, the least significant bit in each lane is the bit with the lowest number. 
@@ -147,35 +126,28 @@ def S.lanewise_binary(func, a, b): return result ``` -Comparison operators produce a boolean vector: +Comparison operators produce a mask vector where the bits in each lane are 0 +for false and all ones for true: ```python def S.lanewise_comparison(func, a, b): - result = S.BoolType.New() + all_ones = S.MaskType.Umax + result = S.MaskType.New() for i in range(S.Lanes): - result[i] = func(a[i], b[i]) + result[i] = all_ones if func(a[i], b[i]) else 0 return result ``` ## Constructing SIMD values -### Constants +### Constant * `v128.const(imm: ImmByte[16]) -> v128` -* `b8x16.const(imm: ImmByte[2]) -> b8x16` -* `b16x8.const(imm: ImmByte) -> b16x8` -* `b32x4.const(imm: ImmBits4) -> b32x4` -* `b64x2.const(imm: ImmBits2) -> b64x2` Materialize a constant SIMD value from the immediate operands. The `v128.const` instruction is encoded with 16 immediate bytes which provide the bits of the -vector directly. The boolean constants are encoded with one bit per lane such -that lane 0 is the LSB of the first immediate byte. +vector directly. ### Create vector with identical lanes -* `b8x16.splat(x: i32) -> b8x16` -* `b16x8.splat(x: i32) -> b16x8` -* `b32x4.splat(x: i32) -> b32x4` -* `b64x2.splat(x: i32) -> b64x2` * `i8x16.splat(x: i32) -> v128` * `i16x8.splat(x: i32) -> v128` * `i32x4.splat(x: i32) -> v128` @@ -193,17 +165,9 @@ def S.splat(x): return result ``` -The boolean vector splats will create a vector with all false lanes if `x` is -zero, all true lanes otherwise. The `i8x16.splat` and `i16x8.splat` -instructions ignore the high bits of `x`. 
- ## Accessing lanes ### Extract lane as a scalar -* `b8x16.extract_lane(a: b8x16, i: LaneIdx16) -> i32` -* `b16x8.extract_lane(a: b16x8, i: LaneIdx8) -> i32` -* `b32x4.extract_lane(a: b32x4, i: LaneIdx4) -> i32` -* `b64x2.extract_lane(a: b64x2, i: LaneIdx2) -> i32` * `i8x16.extract_lane_s(a: v128, i: LaneIdx16) -> i32` * `i8x16.extract_lane_u(a: v128, i: LaneIdx16) -> i32` * `i16x8.extract_lane_s(a: v128, i: LaneIdx8) -> i32` @@ -221,14 +185,9 @@ def S.extract_lane(a, i): ``` The `_s` and `_u` variants will sign-extend or zero-extend the lane value to -`i32` respectively. Boolean lanes are returned as an `i32` with the value 0 or -1. +`i32` respectively. ### Replace lane value -* `b8x16.replace_lane(a: b8x16, i: LaneIdx16, x: i32) -> b8x16` -* `b16x8.replace_lane(a: b16x8, i: LaneIdx8, x: i32) -> b16x8` -* `b32x4.replace_lane(a: b32x4, i: LaneIdx4, x: i32) -> b32x4` -* `b64x2.replace_lane(a: b64x2, i: LaneIdx2, x: i32) -> b64x2` * `i8x16.replace_lane(a: v128, i: LaneIdx16, x: i32) -> v128` * `i16x8.replace_lane(a: v128, i: LaneIdx8, x: i32) -> v128` * `i32x4.replace_lane(a: v128, i: LaneIdx4, x: i32) -> v128` @@ -249,53 +208,10 @@ def S.replace_lane(a, i, x): ``` The input lane value, `x`, is interpreted the same way as for the splat -instructions. For the boolean vectors, non-zero means true; for the `i8` and -`i16` lanes, the high bits of `x` are ignored. - -### Lane-wise select -* `v8x16.select(s: b8x16, t: v128, f: v128) -> v128` -* `v16x8.select(s: b16x8, t: v128, f: v128) -> v128` -* `v32x4.select(s: b32x4, t: v128, f: v128) -> v128` -* `v64x2.select(s: b64x2, t: v128, f: v128) -> v128` - -Use a boolean vector to select lanes from two numerical vectors. - -```python -def S.select(s, t, f): - result = S.New() - for i in range(S.Lanes): - if s[i]: - result[i] = t[i] - else - result[i] = f[i] - return result -``` - -Note that the normal WebAssembly `select` instruction also works with vector -types. 
It selects between two whole vectors controlled by a scalar value, -rather than selecting lanes controlled by a boolean vector. - -### Swizzle lanes -* `v8x16.swizzle(a: v128, s: LaneIdx16[16]) -> v128` -* `v16x8.swizzle(a: v128, s: LaneIdx8[8]) -> v128` -* `v32x4.swizzle(a: v128, s: LaneIdx4[4]) -> v128` -* `v64x2.swizzle(a: v128, s: LaneIdx2[2]) -> v128` - -Create vector with lanes rearranged: - -```python -def S.swizzle(a, s): - result = S.New() - for i in range(S.Lanes): - result[i] = a[s[i]] - return result -``` +instructions. For the `i8` and `i16` lanes, the high bits of `x` are ignored. ### Shuffle lanes * `v8x16.shuffle(a: v128, b: v128, s: LaneIdx32[16]) -> v128` -* `v16x8.shuffle(a: v128, b: v128, s: LaneIdx16[8]) -> v128` -* `v32x4.shuffle(a: v128, b: v128, s: LaneIdx8[4]) -> v128` -* `v64x2.shuffle(a: v128, b: v128, s: LaneIdx4[2]) -> v128` Create vector with lanes selected from the lanes of two input vectors: @@ -357,7 +273,6 @@ def S.sub(a, b): * `i8x16.mul(a: v128, b: v128) -> v128` * `i16x8.mul(a: v128, b: v128) -> v128` * `i32x4.mul(a: v128, b: v128) -> v128` -* `i64x2.mul(a: v128, b: v128) -> v128` Lane-wise wrapping integer multiplication: @@ -480,14 +395,14 @@ arithmetic right shift for the `_s` variants and a logical right shift for the `_u` variants. ```python -def S.shl_s(a, y): +def S.shr_s(a, y): # Number of bits to shift: 0 .. S.LaneBits - 1. amount = y mod S.LaneBits def shift(x): return x >> amount return S.lanewise_unary(shift, S.AsSigned(a)) -def S.shl_u(a, y): +def S.shr_u(a, y): # Number of bits to shift: 0 .. S.LaneBits - 1. amount = y mod S.LaneBits def shift(x): @@ -495,107 +410,66 @@ def S.shl_u(a, y): return S.lanewise_unary(shift, S.AsUnsigned(a)) ``` -## Logical operations - -The logical operations are defined on the boolean SIMD types. See also the -[Bitwise operations](#bitwise-operations) below. 
- -### Logical and -* `b8x16.and(a: b8x16, b: b8x16) -> b8x16` -* `b16x8.and(a: b16x8, b: b16x8) -> b16x8` -* `b32x4.and(a: b32x4, b: b32x4) -> b32x4` -* `b64x2.and(a: b64x2, b: b64x2) -> b64x2` - -```python -def S.and(a, b): - def logical_and(x, y): - return x and y - return S.lanewise_binary(logical_and, a, b) -``` - -### Logical or -* `b8x16.or(a: b8x16, b: b8x16) -> b8x16` -* `b16x8.or(a: b16x8, b: b16x8) -> b16x8` -* `b32x4.or(a: b32x4, b: b32x4) -> b32x4` -* `b64x2.or(a: b64x2, b: b64x2) -> b64x2` - -```python -def S.or(a, b): - def logical_or(x, y): - return x or y - return S.lanewise_binary(logical_or, a, b) -``` - -### Logical xor -* `b8x16.xor(a: b8x16, b: b8x16) -> b8x16` -* `b16x8.xor(a: b16x8, b: b16x8) -> b16x8` -* `b32x4.xor(a: b32x4, b: b32x4) -> b32x4` -* `b64x2.xor(a: b64x2, b: b64x2) -> b64x2` - -```python -def S.xor(a, b): - def logical_xor(x, y): - return x xor y - return S.lanewise_binary(logical_xor, a, b) -``` - -### Logical not -* `b8x16.not(a: b8x16) -> b8x16` -* `b16x8.not(a: b16x8) -> b16x8` -* `b32x4.not(a: b32x4) -> b32x4` -* `b64x2.not(a: b64x2) -> b64x2` - -```python -def S.not(a): - def logical_not(x): - return not x - return S.lanewise_unary(logical_not, a) -``` ## Bitwise operations -The same logical operations defined on the boolean types are also available on -the `v128` type where they operate bitwise the same way C's `&`, `|`, `^`, and -`~` operators work on an `unsigned` type. +Bitwise operations treat a `v128` value type as a vector of 128 independent bits. +### Bitwise logic * `v128.and(a: v128, b: v128) -> v128` * `v128.or(a: v128, b: v128) -> v128` * `v128.xor(a: v128, b: v128) -> v128` * `v128.not(a: v128) -> v128` +The logical operations defined on the scalar integer types are also available +on the `v128` type where they operate bitwise the same way C's `&`, `|`, `^`, +and `~` operators work on an `unsigned` type. 
+ +### Bitwise select +* `v128.bitselect(v1: v128, v2: v128, c: v128) -> v128` + +Use the bits in the control mask `c` to select the corresponding bit from `v1` +when 1 and `v2` when 0. +This is the same as `v128.or(v128.and(v1, c), v128.and(v2, v128.not(c)))`. + +Note that the normal WebAssembly `select` instruction also works with vector +types. It selects between two whole vectors controlled by a single scalar value, +rather than selecting bits controlled by a control mask vector. + + ## Boolean horizontal reductions -These operations reduce all the lanes of a boolean vector to a single scalar -boolean value. +These operations reduce all the lanes of an integer vector to a single scalar +0 or 1 value. A lane is considered "true" if it is non-zero. ### Any lane true -* `b8x16.any_true(a: b8x16) -> i32` -* `b16x8.any_true(a: b16x8) -> i32` -* `b32x4.any_true(a: b32x4) -> i32` -* `b64x2.any_true(a: b64x2) -> i32` +* `i8x16.any_true(a: v128) -> i32` +* `i16x8.any_true(a: v128) -> i32` +* `i32x4.any_true(a: v128) -> i32` +* `i64x2.any_true(a: v128) -> i32` -These functions return 1 if any lane in `a` is true, 0 otherwise. +These functions return 1 if any lane in `a` is non-zero, 0 otherwise. ```python def S.any_true(a): for i in range(S.Lanes): - if a[i]: + if a[i] != 0: return 1 return 0 ``` ### All lanes true -* `b8x16.all_true(a: b8x16) -> i32` -* `b16x8.all_true(a: b16x8) -> i32` -* `b32x4.all_true(a: b32x4) -> i32` -* `b64x2.all_true(a: b64x2) -> i32` +* `i8x16.all_true(a: v128) -> i32` +* `i16x8.all_true(a: v128) -> i32` +* `i32x4.all_true(a: v128) -> i32` +* `i64x2.all_true(a: v128) -> i32` -These functions return 1 if all lanes in `a` are true, 0 otherwise. +These functions return 1 if all lanes in `a` are non-zero, 0 otherwise. 
```python def S.all_true(a): for i in range(S.Lanes): - if not a[i]: + if a[i] == 0: return 0 return 1 ``` @@ -603,15 +477,14 @@ def S.all_true(a): ## Comparisons The comparison operations all compare two vectors lane-wise, and produce a -boolean vector with the same number of lanes as the input interpretation. +mask vector with the same number of lanes as the input interpretation. ### Equality -* `i8x16.eq(a: v128, b: v128) -> b8x16` -* `i16x8.eq(a: v128, b: v128) -> b16x8` -* `i32x4.eq(a: v128, b: v128) -> b32x4` -* `i64x2.eq(a: v128, b: v128) -> b64x2` -* `f32x4.eq(a: v128, b: v128) -> b32x4` -* `f64x2.eq(a: v128, b: v128) -> b64x2` +* `i8x16.eq(a: v128, b: v128) -> v128` +* `i16x8.eq(a: v128, b: v128) -> v128` +* `i32x4.eq(a: v128, b: v128) -> v128` +* `f32x4.eq(a: v128, b: v128) -> v128` +* `f64x2.eq(a: v128, b: v128) -> v128` Integer equality is independent of the signed/unsigned interpretation. Floating point equality follows IEEE semantics, so a NaN lane compares not equal with @@ -625,12 +498,11 @@ def S.eq(a, b): ``` ### Non-equality -* `i8x16.ne(a: v128, b: v128) -> b8x16` -* `i16x8.ne(a: v128, b: v128) -> b16x8` -* `i32x4.ne(a: v128, b: v128) -> b32x4` -* `i64x2.ne(a: v128, b: v128) -> b64x2` -* `f32x4.ne(a: v128, b: v128) -> b32x4` -* `f64x2.ne(a: v128, b: v128) -> b64x2` +* `i8x16.ne(a: v128, b: v128) -> v128` +* `i16x8.ne(a: v128, b: v128) -> v128` +* `i32x4.ne(a: v128, b: v128) -> v128` +* `f32x4.ne(a: v128, b: v128) -> v128` +* `f64x2.ne(a: v128, b: v128) -> v128` The `ne` operations produce the inverse of their `eq` counterparts: @@ -642,62 +514,51 @@ def S.ne(a, b): ``` ### Less than -* `i8x16.lt_s(a: v128, b: v128) -> b8x16` -* `i8x16.lt_u(a: v128, b: v128) -> b8x16` -* `i16x8.lt_s(a: v128, b: v128) -> b16x8` -* `i16x8.lt_u(a: v128, b: v128) -> b16x8` -* `i32x4.lt_s(a: v128, b: v128) -> b32x4` -* `i32x4.lt_u(a: v128, b: v128) -> b32x4` -* `i64x2.lt_s(a: v128, b: v128) -> b64x2` -* `i64x2.lt_u(a: v128, b: v128) -> b64x2` -* `f32x4.lt(a: v128, b:
v128) -> b32x4` -* `f64x2.lt(a: v128, b: v128) -> b64x2` +* `i8x16.lt_s(a: v128, b: v128) -> v128` +* `i8x16.lt_u(a: v128, b: v128) -> v128` +* `i16x8.lt_s(a: v128, b: v128) -> v128` +* `i16x8.lt_u(a: v128, b: v128) -> v128` +* `i32x4.lt_s(a: v128, b: v128) -> v128` +* `i32x4.lt_u(a: v128, b: v128) -> v128` +* `f32x4.lt(a: v128, b: v128) -> v128` +* `f64x2.lt(a: v128, b: v128) -> v128` ### Less than or equal -* `i8x16.le_s(a: v128, b: v128) -> b8x16` -* `i8x16.le_u(a: v128, b: v128) -> b8x16` -* `i16x8.le_s(a: v128, b: v128) -> b16x8` -* `i16x8.le_u(a: v128, b: v128) -> b16x8` -* `i32x4.le_s(a: v128, b: v128) -> b32x4` -* `i32x4.le_u(a: v128, b: v128) -> b32x4` -* `i64x2.le_s(a: v128, b: v128) -> b64x2` -* `i64x2.le_u(a: v128, b: v128) -> b64x2` -* `f32x4.le(a: v128, b: v128) -> b32x4` -* `f64x2.le(a: v128, b: v128) -> b64x2` +* `i8x16.le_s(a: v128, b: v128) -> v128` +* `i8x16.le_u(a: v128, b: v128) -> v128` +* `i16x8.le_s(a: v128, b: v128) -> v128` +* `i16x8.le_u(a: v128, b: v128) -> v128` +* `i32x4.le_s(a: v128, b: v128) -> v128` +* `i32x4.le_u(a: v128, b: v128) -> v128` +* `f32x4.le(a: v128, b: v128) -> v128` +* `f64x2.le(a: v128, b: v128) -> v128` ### Greater than -* `i8x16.gt_s(a: v128, b: v128) -> b8x16` -* `i8x16.gt_u(a: v128, b: v128) -> b8x16` -* `i16x8.gt_s(a: v128, b: v128) -> b16x8` -* `i16x8.gt_u(a: v128, b: v128) -> b16x8` -* `i32x4.gt_s(a: v128, b: v128) -> b32x4` -* `i32x4.gt_u(a: v128, b: v128) -> b32x4` -* `i64x2.gt_s(a: v128, b: v128) -> b64x2` -* `i64x2.gt_u(a: v128, b: v128) -> b64x2` -* `f32x4.gt(a: v128, b: v128) -> b32x4` -* `f64x2.gt(a: v128, b: v128) -> b64x2` +* `i8x16.gt_s(a: v128, b: v128) -> v128` +* `i8x16.gt_u(a: v128, b: v128) -> v128` +* `i16x8.gt_s(a: v128, b: v128) -> v128` +* `i16x8.gt_u(a: v128, b: v128) -> v128` +* `i32x4.gt_s(a: v128, b: v128) -> v128` +* `i32x4.gt_u(a: v128, b: v128) -> v128` +* `f32x4.gt(a: v128, b: v128) -> v128` +* `f64x2.gt(a: v128, b: v128) -> v128` ### Greater than or equal -* `i8x16.ge_s(a: v128, b: 
v128) -> b8x16` -* `i8x16.ge_u(a: v128, b: v128) -> b8x16` -* `i16x8.ge_s(a: v128, b: v128) -> b16x8` -* `i16x8.ge_u(a: v128, b: v128) -> b16x8` -* `i32x4.ge_s(a: v128, b: v128) -> b32x4` -* `i32x4.ge_u(a: v128, b: v128) -> b32x4` -* `i64x2.ge_s(a: v128, b: v128) -> b64x2` -* `i64x2.ge_u(a: v128, b: v128) -> b64x2` -* `f32x4.ge(a: v128, b: v128) -> b32x4` -* `f64x2.ge(a: v128, b: v128) -> b64x2` +* `i8x16.ge_s(a: v128, b: v128) -> v128` +* `i8x16.ge_u(a: v128, b: v128) -> v128` +* `i16x8.ge_s(a: v128, b: v128) -> v128` +* `i16x8.ge_u(a: v128, b: v128) -> v128` +* `i32x4.ge_s(a: v128, b: v128) -> v128` +* `i32x4.ge_u(a: v128, b: v128) -> v128` +* `f32x4.ge(a: v128, b: v128) -> v128` +* `f64x2.ge(a: v128, b: v128) -> v128` ## Load and store -Load and store operations are provided for `v128` vectors, but not for the -boolean vectors; we don't want to prescribe a bitwise representation of the -boolean vectors. - -The memory operations take the same arguments and have the same semantics as -the existing scalar WebAssembly load and store instructions. The difference is -that the memory access size is 16 bytes which is also the natural alignment. +Load and store operations are provided for the `v128` vectors. The memory +operations take the same arguments and have the same semantics as the existing +scalar WebAssembly load and store instructions. The difference is that the +memory access size is 16 bytes which is also the natural alignment. ### Load @@ -803,13 +664,14 @@ Lane-wise IEEE `squareRoot`. Lane-wise conversion from integer to floating point. Some integer values will be rounded. -### Floating point to integer -* `i32x4.trunc_s/f32x4(a: v128) -> v128` -* `i32x4.trunc_u/f32x4(a: v128) -> v128` -* `i64x2.trunc_s/f64x2(a: v128) -> v128` -* `i64x2.trunc_u/f64x2(a: v128) -> v128` - -Lane-wise conversion from floating point to integer using the IEEE -`convertToIntegerTowardZero` function. 
If any lane is a NaN or the rounded -integer value is outside the range of the destination type, these instructions -trap. +### Floating point to integer with saturation +* `i32x4.trunc_s/f32x4:sat(a: v128) -> v128` +* `i32x4.trunc_u/f32x4:sat(a: v128) -> v128` +* `i64x2.trunc_s/f64x2:sat(a: v128) -> v128` +* `i64x2.trunc_u/f64x2:sat(a: v128) -> v128` + +Lane-wise saturating conversion from floating point to integer using the IEEE +`convertToIntegerTowardZero` function. If any input lane is a NaN, the +resulting lane is 0. If the rounded integer value of a lane is outside the +range of the destination type, the result is saturated to the nearest +representable integer value.