diff --git a/proposals/simd/BinarySIMD.md b/proposals/simd/BinarySIMD.md
new file mode 100644
index 000000000..06c0774f9
--- /dev/null
+++ b/proposals/simd/BinarySIMD.md
@@ -0,0 +1,169 @@
+# Binary encoding of SIMD
+
+This document describes the binary encoding of the SIMD value type and
+instructions.
+
+## SIMD value type
+
+The `v128` value type is encoded as the byte `0x7B`:
+
+```
+valtype ::= ...
+          | 0x7B => v128
+```
+
+## SIMD instruction encodings
+
+All SIMD instructions are encoded as a `0xFD` prefix byte followed by a
+SIMD-specific opcode, `simdop`, encoded as an unsigned LEB128 `varuint32`:
+
+```
+instr ::= ...
+        | 0xFD simdop:varuint32 ...
+```
+
+Some SIMD instructions have additional immediate operands following `simdop`.
+The `v128.const` and `v8x16.shuffle` instructions each have 16 immediate bytes after `simdop`.
+
+| Instruction               | `simdop` | Immediate operands |
+| --------------------------|---------:|--------------------|
+| `v128.const`              |        0 | i:ImmByte[16]      |
+| `v128.load`               |        1 | m:memarg           |
+| `v128.store`              |        2 | m:memarg           |
+| `i8x16.splat`             |        3 | -                  |
+| `i16x8.splat`             |        4 | -                  |
+| `i32x4.splat`             |        5 | -                  |
+| `i64x2.splat`             |        6 | -                  |
+| `f32x4.splat`             |        7 | -                  |
+| `f64x2.splat`             |        8 | -                  |
+| `i8x16.extract_lane_s`    |        9 | i:LaneIdx16        |
+| `i8x16.extract_lane_u`    |       10 | i:LaneIdx16        |
+| `i16x8.extract_lane_s`    |       11 | i:LaneIdx8         |
+| `i16x8.extract_lane_u`    |       12 | i:LaneIdx8         |
+| `i32x4.extract_lane`      |       13 | i:LaneIdx4         |
+| `i64x2.extract_lane`      |       14 | i:LaneIdx2         |
+| `f32x4.extract_lane`      |       15 | i:LaneIdx4         |
+| `f64x2.extract_lane`      |       16 | i:LaneIdx2         |
+| `i8x16.replace_lane`      |       17 | i:LaneIdx16        |
+| `i16x8.replace_lane`      |       18 | i:LaneIdx8         |
+| `i32x4.replace_lane`      |       19 | i:LaneIdx4         |
+| `i64x2.replace_lane`      |       20 | i:LaneIdx2         |
+| `f32x4.replace_lane`      |       21 | i:LaneIdx4         |
+| `f64x2.replace_lane`      |       22 | i:LaneIdx2         |
+| `v8x16.shuffle`           |       23 | s:LaneIdx32[16]    |
+| `i8x16.add`               |       24 | -                  |
+| `i16x8.add`               |       25 | -                  |
+| `i32x4.add`               |       26 | -                  |
+| `i64x2.add`               |       27 | -                  |
+| `i8x16.sub`               |       28 | -                  |
+| `i16x8.sub`               |       29 | -                  |
+| `i32x4.sub`               |       30 | -                  |
+| `i64x2.sub`               |       31 | -                  |
+| `i8x16.mul`               |       32 | -                  |
+| `i16x8.mul`               |       33 | -                  |
+| `i32x4.mul`               |       34 | -                  |
+| `i8x16.neg`               |       35 | -                  |
+| `i16x8.neg`               |       36 | -                  |
+| `i32x4.neg`               |       37 | -                  |
+| `i64x2.neg`               |       38 | -                  |
+| `i8x16.add_saturate_s`    |       39 | -                  |
+| `i8x16.add_saturate_u`    |       40 | -                  |
+| `i16x8.add_saturate_s`    |       41 | -                  |
+| `i16x8.add_saturate_u`    |       42 | -                  |
+| `i8x16.sub_saturate_s`    |       43 | -                  |
+| `i8x16.sub_saturate_u`    |       44 | -                  |
+| `i16x8.sub_saturate_s`    |       45 | -                  |
+| `i16x8.sub_saturate_u`    |       46 | -                  |
+| `i8x16.shl`               |       47 | -                  |
+| `i16x8.shl`               |       48 | -                  |
+| `i32x4.shl`               |       49 | -                  |
+| `i64x2.shl`               |       50 | -                  |
+| `i8x16.shr_s`             |       51 | -                  |
+| `i8x16.shr_u`             |       52 | -                  |
+| `i16x8.shr_s`             |       53 | -                  |
+| `i16x8.shr_u`             |       54 | -                  |
+| `i32x4.shr_s`             |       55 | -                  |
+| `i32x4.shr_u`             |       56 | -                  |
+| `i64x2.shr_s`             |       57 | -                  |
+| `i64x2.shr_u`             |       58 | -                  |
+| `v128.and`                |       59 | -                  |
+| `v128.or`                 |       60 | -                  |
+| `v128.xor`                |       61 | -                  |
+| `v128.not`                |       62 | -                  |
+| `v128.bitselect`          |       63 | -                  |
+| `i8x16.any_true`          |       64 | -                  |
+| `i16x8.any_true`          |       65 | -                  |
+| `i32x4.any_true`          |       66 | -                  |
+| `i64x2.any_true`          |       67 | -                  |
+| `i8x16.all_true`          |       68 | -                  |
+| `i16x8.all_true`          |       69 | -                  |
+| `i32x4.all_true`          |       70 | -                  |
+| `i64x2.all_true`          |       71 | -                  |
+| `i8x16.eq`                |       72 | -                  |
+| `i16x8.eq`                |       73 | -                  |
+| `i32x4.eq`                |       74 | -                  |
+| `f32x4.eq`                |       75 | -                  |
+| `f64x2.eq`                |       76 | -                  |
+| `i8x16.ne`                |       77 | -                  |
+| `i16x8.ne`                |       78 | -                  |
+| `i32x4.ne`                |       79 | -                  |
+| `f32x4.ne`                |       80 | -                  |
+| `f64x2.ne`                |       81 | -                  |
+| `i8x16.lt_s`              |       82 | -                  |
+| `i8x16.lt_u`              |       83 | -                  |
+| `i16x8.lt_s`              |       84 | -                  |
+| `i16x8.lt_u`              |       85 | -                  |
+| `i32x4.lt_s`              |       86 | -                  |
+| `i32x4.lt_u`              |       87 | -                  |
+| `f32x4.lt`                |       88 | -                  |
+| `f64x2.lt`                |       89 | -                  |
+| `i8x16.le_s`              |       90 | -                  |
+| `i8x16.le_u`              |       91 | -                  |
+| `i16x8.le_s`              |       92 | -                  |
+| `i16x8.le_u`              |       93 | -                  |
+| `i32x4.le_s`              |       94 | -                  |
+| `i32x4.le_u`              |       95 | -                  |
+| `f32x4.le`                |       96 | -                  |
+| `f64x2.le`                |       97 | -                  |
+| `i8x16.gt_s`              |       98 | -                  |
+| `i8x16.gt_u`              |       99 | -                  |
+| `i16x8.gt_s`              |      100 | -                  |
+| `i16x8.gt_u`              |      101 | -                  |
+| `i32x4.gt_s`              |      102 | -                  |
+| `i32x4.gt_u`              |      103 | -                  |
+| `f32x4.gt`                |      104 | -                  |
+| `f64x2.gt`                |      105 | -                  |
+| `i8x16.ge_s`              |      106 | -                  |
+| `i8x16.ge_u`              |      107 | -                  |
+| `i16x8.ge_s`              |      108 | -                  |
+| `i16x8.ge_u`              |      109 | -                  |
+| `i32x4.ge_s`              |      110 | -                  |
+| `i32x4.ge_u`              |      111 | -                  |
+| `f32x4.ge`                |      112 | -                  |
+| `f64x2.ge`                |      113 | -                  |
+| `f32x4.neg`               |      114 | -                  |
+| `f64x2.neg`               |      115 | -                  |
+| `f32x4.abs`               |      116 | -                  |
+| `f64x2.abs`               |      117 | -                  |
+| `f32x4.min`               |      118 | -                  |
+| `f64x2.min`               |      119 | -                  |
+| `f32x4.max`               |      120 | -                  |
+| `f64x2.max`               |      121 | -                  |
+| `f32x4.add`               |      122 | -                  |
+| `f64x2.add`               |      123 | -                  |
+| `f32x4.sub`               |      124 | -                  |
+| `f64x2.sub`               |      125 | -                  |
+| `f32x4.div`               |      126 | -                  |
+| `f64x2.div`               |      127 | -                  |
+| `f32x4.mul`               |      128 | -                  |
+| `f64x2.mul`               |      129 | -                  |
+| `f32x4.sqrt`              |      130 | -                  |
+| `f64x2.sqrt`              |      131 | -                  |
+| `f32x4.convert_s/i32x4`   |      132 | -                  |
+| `f32x4.convert_u/i32x4`   |      133 | -                  |
+| `f64x2.convert_s/i64x2`   |      134 | -                  |
+| `f64x2.convert_u/i64x2`   |      135 | -                  |
+| `i32x4.trunc_s/f32x4:sat` |      136 | -                  |
+| `i32x4.trunc_u/f32x4:sat` |      137 | -                  |
+| `i64x2.trunc_s/f64x2:sat` |      138 | -                  |
+| `i64x2.trunc_u/f64x2:sat` |      139 | -                  |
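Review note, not part of the patch: the encoding rule in BinarySIMD.md (a `0xFD` prefix byte, then `simdop` as a LEB128 `varuint32`, then any immediate bytes) can be sketched in Python, the language SIMD.md already uses for its pseudo-code. The helper names `encode_varuint32` and `encode_simd_instr` are hypothetical, and `memarg` immediates are not modelled here; treat this as an illustration under those assumptions rather than a reference encoder.

```python
SIMD_PREFIX = 0xFD  # prefix byte shared by all SIMD instructions


def encode_varuint32(value):
    """Encode an unsigned 32-bit integer as LEB128 (7 bits per byte, low bits first)."""
    if not 0 <= value < 2**32:
        raise ValueError("value out of range for varuint32")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


def encode_simd_instr(simdop, immediates=b""):
    """Hypothetical helper: prefix byte + LEB128 simdop + raw immediate bytes."""
    return bytes([SIMD_PREFIX]) + encode_varuint32(simdop) + bytes(immediates)


# i8x16.add (simdop 24) fits in one LEB128 byte.
print(encode_simd_instr(24).hex())
# f32x4.mul (simdop 128) needs two LEB128 bytes.
print(encode_simd_instr(128).hex())
# i8x16.extract_lane_s (simdop 9) with lane index 3 as its single immediate byte.
print(encode_simd_instr(9, [3]).hex())
```

Opcodes of 128 and above (from `f32x4.mul` onwards in the table) take two bytes after the prefix, so instruction length varies even among instructions with no immediates.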
diff --git a/proposals/simd/SIMD.md b/proposals/simd/SIMD.md
index f843a5b23..afcdc9c09 100644
--- a/proposals/simd/SIMD.md
+++ b/proposals/simd/SIMD.md
@@ -4,47 +4,28 @@ This specification describes a 128-bit packed *Single Instruction Multiple
 Data* (SIMD) extension to WebAssembly that can be implemented efficiently on
 current popular instruction set architectures.
 
-# Types
-
-WebAssembly is extended with five new value types and a number of new kinds of
-immediate operands used by the SIMD instructions.
+See also [The binary encoding of SIMD instructions](BinarySIMD.md).
 
-## SIMD value types
+# Types
 
-The `v128` type has a concrete mapping to a 128-bit representation. The boolean
-types do not have a bit-pattern representation.
+WebAssembly is extended with a new `v128` value type and a number of new kinds
+of immediate operands used by the SIMD instructions.
 
-* `v128`: A 128-bit SIMD vector. Bits are numbered 0–127.
-* `b8x16`: A vector of 16 `boolean` lanes numbered 0–15.
-* `b16x8`: A vector of 8 `boolean` lanes numbered 0–7.
-* `b32x4`: A vector of 4 `boolean` lanes numbered 0–3.
-* `b64x2`: A vector of 2 `boolean` lanes numbered 0–1.
+## SIMD value type
 
-The `v128` type corresponds to a vector register in a typical SIMD ISA. The
-interpretation of the 128 bits in the vector register is provided by the
-individual instructions. When a `v128` value is represented as 16 bytes, bits
-0-7 go in the first byte with bit 0 as the LSB, bits 8-15 go in the second
+The `v128` value type has a concrete mapping to a 128-bit representation with bits
+numbered 0–127. The `v128` type corresponds to a vector register in a typical
+SIMD ISA. The interpretation of the 128 bits in the vector register is provided
+by the individual instructions. When a `v128` value is represented as 16 bytes,
+bits 0-7 go in the first byte with bit 0 as the LSB, bits 8-15 go in the second
 byte, etc.
 
-The abstract boolean vector types can be mapped to vector registers or predicate
-registers by an implementation. They have a property `S.Lanes` which is used by
-the pseudo-code below:
-
-|    S    | S.Lanes |
-|---------|--------:|
-| `b8x16` |      16 |
-| `b16x8` |       8 |
-| `b32x4` |       4 |
-| `b64x2` |       2 |
-
 ## Immediate operands
 
 Some of the new SIMD instructions defined here have immediate operands that are
 encoded as individual bytes in the binary encoding. Many have a limited valid
 range, and it is a validation error if the immediate operands are out of range.
 
-* `ImmBits2`: A byte with values in the range 0-3 used to initialize a `b64x2`.
-* `ImmBits4`: A byte with values in the range 0-15 used to initialize a `b32x4`.
 * `ImmByte`: A single unconstrained byte (0-255).
 * `LaneIdx2`: A byte with values in the range 0–1 identifying a lane.
 * `LaneIdx4`: A byte with values in the range 0–3 identifying a lane.
@@ -52,14 +33,12 @@ range, and it is a validation error if the immediate operands are out of range.
 * `LaneIdx16`: A byte with values in the range 0–15 identifying a lane.
 * `LaneIdx32`: A byte with values in the range 0–31 identifying a lane.
 
-## Interpreting SIMD value types
+## Interpreting the SIMD value type
 
 The single `v128` SIMD type can represent packed data in multiple ways.
 Instructions specify how the bits should be interpreted through a hierarchy of
 *interpretations*.
 
-The boolean vector types only have the one interpretation given by their type.
-
 ### Lane division interpretation
 
 The first level of interpretations of the `v128` type imposes a lane structure on
@@ -74,12 +53,12 @@ The lane dividing interpretations don't say anything about the semantics of the
 bits in each lane. The interpretations have *properties* used by the semantic
 specification pseudo-code below:
 
-|    S    | S.LaneBits | S.Lanes | S.BoolType |
+|    S    | S.LaneBits | S.Lanes | S.MaskType |
 |---------|-----------:|--------:|:----------:|
-| `v8x16` |          8 |      16 | `b8x16`    |
-| `v16x8` |         16 |       8 | `b16x8`    |
-| `v32x4` |         32 |       4 | `b32x4`    |
-| `v64x2` |         64 |       2 | `b64x2`    |
+| `v8x16` |          8 |      16 | `i8x16`    |
+| `v16x8` |         16 |       8 | `i16x8`    |
+| `v32x4` |         32 |       4 | `i32x4`    |
+| `v64x2` |         64 |       2 | `i64x2`    |
 
 Since WebAssembly is little-endian, the least significant bit in each lane is
 the bit with the lowest number.
@@ -147,35 +126,28 @@ def S.lanewise_binary(func, a, b):
     return result
 ```
 
-Comparison operators produce a boolean vector:
+Comparison operators produce a mask vector where the bits in each lane are 0
+for false and all ones for true:
 
 ```python
 def S.lanewise_comparison(func, a, b):
-    result = S.BoolType.New()
+    all_ones = S.MaskType.Umax
+    result = S.MaskType.New()
     for i in range(S.Lanes):
-        result[i] = func(a[i], b[i])
+        result[i] = all_ones if func(a[i], b[i]) else 0
     return result
 ```
 
 ## Constructing SIMD values
 
-### Constants
+### Constant
 * `v128.const(imm: ImmByte[16]) -> v128`
-* `b8x16.const(imm: ImmByte[2]) -> b8x16`
-* `b16x8.const(imm: ImmByte) -> b16x8`
-* `b32x4.const(imm: ImmBits4) -> b32x4`
-* `b64x2.const(imm: ImmBits2) -> b64x2`
 
 Materialize a constant SIMD value from the immediate operands. The `v128.const`
 instruction is encoded with 16 immediate bytes which provide the bits of the
-vector directly. The boolean constants are encoded with one bit per lane such
-that lane 0 is the LSB of the first immediate byte.
+vector directly.
 
 ### Create vector with identical lanes
-* `b8x16.splat(x: i32) -> b8x16`
-* `b16x8.splat(x: i32) -> b16x8`
-* `b32x4.splat(x: i32) -> b32x4`
-* `b64x2.splat(x: i32) -> b64x2`
 * `i8x16.splat(x: i32) -> v128`
 * `i16x8.splat(x: i32) -> v128`
 * `i32x4.splat(x: i32) -> v128`
@@ -193,17 +165,9 @@ def S.splat(x):
     return result
 ```
 
-The boolean vector splats will create a vector with all false lanes if `x` is
-zero, all true lanes otherwise. The `i8x16.splat` and `i16x8.splat`
-instructions ignore the high bits of `x`.
-
 ## Accessing lanes
 
 ### Extract lane as a scalar
-* `b8x16.extract_lane(a: b8x16, i: LaneIdx16) -> i32`
-* `b16x8.extract_lane(a: b16x8, i: LaneIdx8) -> i32`
-* `b32x4.extract_lane(a: b32x4, i: LaneIdx4) -> i32`
-* `b64x2.extract_lane(a: b64x2, i: LaneIdx2) -> i32`
 * `i8x16.extract_lane_s(a: v128, i: LaneIdx16) -> i32`
 * `i8x16.extract_lane_u(a: v128, i: LaneIdx16) -> i32`
 * `i16x8.extract_lane_s(a: v128, i: LaneIdx8) -> i32`
@@ -221,14 +185,9 @@ def S.extract_lane(a, i):
 ```
 
 The `_s` and `_u` variants will sign-extend or zero-extend the lane value to
-`i32` respectively. Boolean lanes are returned as an `i32` with the value 0 or
-1.
+`i32` respectively.
 
 ### Replace lane value
-* `b8x16.replace_lane(a: b8x16, i: LaneIdx16, x: i32) -> b8x16`
-* `b16x8.replace_lane(a: b16x8, i: LaneIdx8, x: i32) -> b16x8`
-* `b32x4.replace_lane(a: b32x4, i: LaneIdx4, x: i32) -> b32x4`
-* `b64x2.replace_lane(a: b64x2, i: LaneIdx2, x: i32) -> b64x2`
 * `i8x16.replace_lane(a: v128, i: LaneIdx16, x: i32) -> v128`
 * `i16x8.replace_lane(a: v128, i: LaneIdx8, x: i32) -> v128`
 * `i32x4.replace_lane(a: v128, i: LaneIdx4, x: i32) -> v128`
@@ -249,53 +208,10 @@ def S.replace_lane(a, i, x):
 ```
 
 The input lane value, `x`, is interpreted the same way as for the splat
-instructions. For the boolean vectors, non-zero means true; for the `i8` and
-`i16` lanes, the high bits of `x` are ignored.
-
-### Lane-wise select
-* `v8x16.select(s: b8x16, t: v128, f: v128) -> v128`
-* `v16x8.select(s: b16x8, t: v128, f: v128) -> v128`
-* `v32x4.select(s: b32x4, t: v128, f: v128) -> v128`
-* `v64x2.select(s: b64x2, t: v128, f: v128) -> v128`
-
-Use a boolean vector to select lanes from two numerical vectors.
-
-```python
-def S.select(s, t, f):
-    result = S.New()
-    for i in range(S.Lanes):
-        if s[i]:
-            result[i] = t[i]
-        else
-            result[i] = f[i]
-    return result
-```
-
-Note that the normal WebAssembly `select` instruction also works with vector
-types. It selects between two whole vectors controlled by a scalar value,
-rather than selecting lanes controlled by a boolean vector.
-
-### Swizzle lanes
-* `v8x16.swizzle(a: v128, s: LaneIdx16[16]) -> v128`
-* `v16x8.swizzle(a: v128, s: LaneIdx8[8]) -> v128`
-* `v32x4.swizzle(a: v128, s: LaneIdx4[4]) -> v128`
-* `v64x2.swizzle(a: v128, s: LaneIdx2[2]) -> v128`
-
-Create vector with lanes rearranged:
-
-```python
-def S.swizzle(a, s):
-    result = S.New()
-    for i in range(S.Lanes):
-        result[i] = a[s[i]]
-    return result
-```
+instructions. For the `i8` and `i16` lanes, the high bits of `x` are ignored.
 
 ### Shuffle lanes
 * `v8x16.shuffle(a: v128, b: v128, s: LaneIdx32[16]) -> v128`
-* `v16x8.shuffle(a: v128, b: v128, s: LaneIdx16[8]) -> v128`
-* `v32x4.shuffle(a: v128, b: v128, s: LaneIdx8[4]) -> v128`
-* `v64x2.shuffle(a: v128, b: v128, s: LaneIdx4[2]) -> v128`
 
 Create vector with lanes selected from the lanes of two input vectors:
 
@@ -357,7 +273,6 @@ def S.sub(a, b):
 * `i8x16.mul(a: v128, b: v128) -> v128`
 * `i16x8.mul(a: v128, b: v128) -> v128`
 * `i32x4.mul(a: v128, b: v128) -> v128`
-* `i64x2.mul(a: v128, b: v128) -> v128`
 
 Lane-wise wrapping integer multiplication:
 
@@ -480,14 +395,14 @@ arithmetic right shift for the `_s` variants and a logical right shift for the
 `_u` variants.
 
 ```python
-def S.shl_s(a, y):
+def S.shr_s(a, y):
     # Number of bits to shift: 0 .. S.LaneBits - 1.
     amount = y mod S.LaneBits
     def shift(x):
         return x >> amount
     return S.lanewise_unary(shift, S.AsSigned(a))
 
-def S.shl_u(a, y):
+def S.shr_u(a, y):
     # Number of bits to shift: 0 .. S.LaneBits - 1.
     amount = y mod S.LaneBits
     def shift(x):
@@ -495,107 +410,66 @@ def S.shl_u(a, y):
     return S.lanewise_unary(shift, S.AsUnsigned(a))
 ```
 
-## Logical operations
-
-The logical operations are defined on the boolean SIMD types. See also the
-[Bitwise operations](#bitwise-operations) below.
-
-### Logical and
-* `b8x16.and(a: b8x16, b: b8x16) -> b8x16`
-* `b16x8.and(a: b16x8, b: b16x8) -> b16x8`
-* `b32x4.and(a: b32x4, b: b32x4) -> b32x4`
-* `b64x2.and(a: b64x2, b: b64x2) -> b64x2`
-
-```python
-def S.and(a, b):
-    def logical_and(x, y):
-        return x and y
-    return S.lanewise_binary(logical_and, a, b)
-```
-
-### Logical or
-* `b8x16.or(a: b8x16, b: b8x16) -> b8x16`
-* `b16x8.or(a: b16x8, b: b16x8) -> b16x8`
-* `b32x4.or(a: b32x4, b: b32x4) -> b32x4`
-* `b64x2.or(a: b64x2, b: b64x2) -> b64x2`
-
-```python
-def S.or(a, b):
-    def logical_or(x, y):
-        return x or y
-    return S.lanewise_binary(logical_or, a, b)
-```
-
-### Logical xor
-* `b8x16.xor(a: b8x16, b: b8x16) -> b8x16`
-* `b16x8.xor(a: b16x8, b: b16x8) -> b16x8`
-* `b32x4.xor(a: b32x4, b: b32x4) -> b32x4`
-* `b64x2.xor(a: b64x2, b: b64x2) -> b64x2`
-
-```python
-def S.xor(a, b):
-    def logical_xor(x, y):
-        return x xor y
-    return S.lanewise_binary(logical_xor, a, b)
-```
-
-### Logical not
-* `b8x16.not(a: b8x16) -> b8x16`
-* `b16x8.not(a: b16x8) -> b16x8`
-* `b32x4.not(a: b32x4) -> b32x4`
-* `b64x2.not(a: b64x2) -> b64x2`
-
-```python
-def S.not(a):
-    def logical_not(x):
-        return not x
-    return S.lanewise_unary(logical_not, a)
-```
 
 ## Bitwise operations
 
-The same logical operations defined on the boolean types are also available on
-the `v128` type where they operate bitwise the same way C's `&`, `|`, `^`, and
-`~` operators work on an `unsigned` type.
+Bitwise operations treat a `v128` value as a vector of 128 independent bits.
 
+### Bitwise logic
 * `v128.and(a: v128, b: v128) -> v128`
 * `v128.or(a: v128, b: v128) -> v128`
 * `v128.xor(a: v128, b: v128) -> v128`
 * `v128.not(a: v128) -> v128`
 
+The logical operations defined on the scalar integer types are also available
+on the `v128` type where they operate bitwise the same way C's `&`, `|`, `^`,
+and `~` operators work on an `unsigned` type.
+
+### Bitwise select
+* `v128.bitselect(v1: v128, v2: v128, c: v128) -> v128`
+
+For each of the 128 bit positions, select the bit from `v1` where the
+corresponding bit of the control mask `c` is 1, and from `v2` where it is 0.
+This is the same as `v128.or(v128.and(v1, c), v128.and(v2, v128.not(c)))`.
+
+Note that the normal WebAssembly `select` instruction also works with vector
+types. It selects between two whole vectors controlled by a single scalar value,
+rather than selecting bits controlled by a control mask vector.
+
+
 ## Boolean horizontal reductions
 
-These operations reduce all the lanes of a boolean vector to a single scalar
-boolean value.
+These operations reduce all the lanes of an integer vector to a single scalar
+0 or 1 value. A lane is considered "true" if it is non-zero.
 
 ### Any lane true
-* `b8x16.any_true(a: b8x16) -> i32`
-* `b16x8.any_true(a: b16x8) -> i32`
-* `b32x4.any_true(a: b32x4) -> i32`
-* `b64x2.any_true(a: b64x2) -> i32`
+* `i8x16.any_true(a: v128) -> i32`
+* `i16x8.any_true(a: v128) -> i32`
+* `i32x4.any_true(a: v128) -> i32`
+* `i64x2.any_true(a: v128) -> i32`
 
-These functions return 1 if any lane in `a` is true, 0 otherwise.
+These functions return 1 if any lane in `a` is non-zero, 0 otherwise.
 
 ```python
 def S.any_true(a):
     for i in range(S.Lanes):
-        if a[i]:
+        if a[i] != 0:
             return 1
     return 0
 ```
 
 ### All lanes true
-* `b8x16.all_true(a: b8x16) -> i32`
-* `b16x8.all_true(a: b16x8) -> i32`
-* `b32x4.all_true(a: b32x4) -> i32`
-* `b64x2.all_true(a: b64x2) -> i32`
+* `i8x16.all_true(a: v128) -> i32`
+* `i16x8.all_true(a: v128) -> i32`
+* `i32x4.all_true(a: v128) -> i32`
+* `i64x2.all_true(a: v128) -> i32`
 
-These functions return 1 if all lanes in `a` are true, 0 otherwise.
+These functions return 1 if all lanes in `a` are non-zero, 0 otherwise.
 
 ```python
 def S.all_true(a):
     for i in range(S.Lanes):
-        if not a[i]:
+        if a[i] == 0:
             return 0
     return 1
 ```
@@ -603,15 +477,14 @@ def S.all_true(a):
 ## Comparisons
 
 The comparison operations all compare two vectors lane-wise, and produce a
-boolean vector with the same number of lanes as the input interpretation.
+mask vector with the same number of lanes as the input interpretation.
 
 ### Equality
-* `i8x16.eq(a: v128, b: v128) -> b8x16`
-* `i16x8.eq(a: v128, b: v128) -> b16x8`
-* `i32x4.eq(a: v128, b: v128) -> b32x4`
-* `i64x2.eq(a: v128, b: v128) -> b64x2`
-* `f32x4.eq(a: v128, b: v128) -> b32x4`
-* `f64x2.eq(a: v128, b: v128) -> b64x2`
+* `i8x16.eq(a: v128, b: v128) -> v128`
+* `i16x8.eq(a: v128, b: v128) -> v128`
+* `i32x4.eq(a: v128, b: v128) -> v128`
+* `f32x4.eq(a: v128, b: v128) -> v128`
+* `f64x2.eq(a: v128, b: v128) -> v128`
 
 Integer equality is independent of the signed/unsigned interpretation. Floating
 point equality follows IEEE semantics, so a NaN lane compares not equal with
@@ -625,12 +498,11 @@ def S.eq(a, b):
 ```
 
 ### Non-equality
-* `i8x16.ne(a: v128, b: v128) -> b8x16`
-* `i16x8.ne(a: v128, b: v128) -> b16x8`
-* `i32x4.ne(a: v128, b: v128) -> b32x4`
-* `i64x2.ne(a: v128, b: v128) -> b64x2`
-* `f32x4.ne(a: v128, b: v128) -> b32x4`
-* `f64x2.ne(a: v128, b: v128) -> b64x2`
+* `i8x16.ne(a: v128, b: v128) -> v128`
+* `i16x8.ne(a: v128, b: v128) -> v128`
+* `i32x4.ne(a: v128, b: v128) -> v128`
+* `f32x4.ne(a: v128, b: v128) -> v128`
+* `f64x2.ne(a: v128, b: v128) -> v128`
 
-The `ne` operations produce the inverse of their `ne` counterparts:
+The `ne` operations produce the inverse of their `eq` counterparts:
 
@@ -642,62 +514,51 @@ def S.ne(a, b):
 ```
 
 ### Less than
-* `i8x16.lt_s(a: v128, b: v128) -> b8x16`
-* `i8x16.lt_u(a: v128, b: v128) -> b8x16`
-* `i16x8.lt_s(a: v128, b: v128) -> b16x8`
-* `i16x8.lt_u(a: v128, b: v128) -> b16x8`
-* `i32x4.lt_s(a: v128, b: v128) -> b32x4`
-* `i32x4.lt_u(a: v128, b: v128) -> b32x4`
-* `i64x2.lt_s(a: v128, b: v128) -> b64x2`
-* `i64x2.lt_u(a: v128, b: v128) -> b64x2`
-* `f32x4.lt(a: v128, b: v128) -> b32x4`
-* `f64x2.lt(a: v128, b: v128) -> b64x2`
+* `i8x16.lt_s(a: v128, b: v128) -> v128`
+* `i8x16.lt_u(a: v128, b: v128) -> v128`
+* `i16x8.lt_s(a: v128, b: v128) -> v128`
+* `i16x8.lt_u(a: v128, b: v128) -> v128`
+* `i32x4.lt_s(a: v128, b: v128) -> v128`
+* `i32x4.lt_u(a: v128, b: v128) -> v128`
+* `f32x4.lt(a: v128, b: v128) -> v128`
+* `f64x2.lt(a: v128, b: v128) -> v128`
 
 ### Less than or equal
-* `i8x16.le_s(a: v128, b: v128) -> b8x16`
-* `i8x16.le_u(a: v128, b: v128) -> b8x16`
-* `i16x8.le_s(a: v128, b: v128) -> b16x8`
-* `i16x8.le_u(a: v128, b: v128) -> b16x8`
-* `i32x4.le_s(a: v128, b: v128) -> b32x4`
-* `i32x4.le_u(a: v128, b: v128) -> b32x4`
-* `i64x2.le_s(a: v128, b: v128) -> b64x2`
-* `i64x2.le_u(a: v128, b: v128) -> b64x2`
-* `f32x4.le(a: v128, b: v128) -> b32x4`
-* `f64x2.le(a: v128, b: v128) -> b64x2`
+* `i8x16.le_s(a: v128, b: v128) -> v128`
+* `i8x16.le_u(a: v128, b: v128) -> v128`
+* `i16x8.le_s(a: v128, b: v128) -> v128`
+* `i16x8.le_u(a: v128, b: v128) -> v128`
+* `i32x4.le_s(a: v128, b: v128) -> v128`
+* `i32x4.le_u(a: v128, b: v128) -> v128`
+* `f32x4.le(a: v128, b: v128) -> v128`
+* `f64x2.le(a: v128, b: v128) -> v128`
 
 ### Greater than
-* `i8x16.gt_s(a: v128, b: v128) -> b8x16`
-* `i8x16.gt_u(a: v128, b: v128) -> b8x16`
-* `i16x8.gt_s(a: v128, b: v128) -> b16x8`
-* `i16x8.gt_u(a: v128, b: v128) -> b16x8`
-* `i32x4.gt_s(a: v128, b: v128) -> b32x4`
-* `i32x4.gt_u(a: v128, b: v128) -> b32x4`
-* `i64x2.gt_s(a: v128, b: v128) -> b64x2`
-* `i64x2.gt_u(a: v128, b: v128) -> b64x2`
-* `f32x4.gt(a: v128, b: v128) -> b32x4`
-* `f64x2.gt(a: v128, b: v128) -> b64x2`
+* `i8x16.gt_s(a: v128, b: v128) -> v128`
+* `i8x16.gt_u(a: v128, b: v128) -> v128`
+* `i16x8.gt_s(a: v128, b: v128) -> v128`
+* `i16x8.gt_u(a: v128, b: v128) -> v128`
+* `i32x4.gt_s(a: v128, b: v128) -> v128`
+* `i32x4.gt_u(a: v128, b: v128) -> v128`
+* `f32x4.gt(a: v128, b: v128) -> v128`
+* `f64x2.gt(a: v128, b: v128) -> v128`
 
 ### Greater than or equal
-* `i8x16.ge_s(a: v128, b: v128) -> b8x16`
-* `i8x16.ge_u(a: v128, b: v128) -> b8x16`
-* `i16x8.ge_s(a: v128, b: v128) -> b16x8`
-* `i16x8.ge_u(a: v128, b: v128) -> b16x8`
-* `i32x4.ge_s(a: v128, b: v128) -> b32x4`
-* `i32x4.ge_u(a: v128, b: v128) -> b32x4`
-* `i64x2.ge_s(a: v128, b: v128) -> b64x2`
-* `i64x2.ge_u(a: v128, b: v128) -> b64x2`
-* `f32x4.ge(a: v128, b: v128) -> b32x4`
-* `f64x2.ge(a: v128, b: v128) -> b64x2`
+* `i8x16.ge_s(a: v128, b: v128) -> v128`
+* `i8x16.ge_u(a: v128, b: v128) -> v128`
+* `i16x8.ge_s(a: v128, b: v128) -> v128`
+* `i16x8.ge_u(a: v128, b: v128) -> v128`
+* `i32x4.ge_s(a: v128, b: v128) -> v128`
+* `i32x4.ge_u(a: v128, b: v128) -> v128`
+* `f32x4.ge(a: v128, b: v128) -> v128`
+* `f64x2.ge(a: v128, b: v128) -> v128`
 
 ## Load and store
 
-Load and store operations are provided for `v128` vectors, but not for the
-boolean vectors; we don't want to prescribe a bitwise representation of the
-boolean vectors.
-
-The memory operations take the same arguments and have the same semantics as
-the existing scalar WebAssembly load and store instructions. The difference is
-that the memory access size is 16 bytes which is also the natural alignment.
+Load and store operations are provided for `v128` vectors. The memory
+operations take the same arguments and have the same semantics as the existing
+scalar WebAssembly load and store instructions. The difference is that the
+memory access size is 16 bytes, which is also the natural alignment.
 
 ### Load
 
@@ -803,13 +664,14 @@ Lane-wise IEEE `squareRoot`.
 Lane-wise conversion from integer to floating point. Some integer values will be
 rounded.
 
-### Floating point to integer
-* `i32x4.trunc_s/f32x4(a: v128) -> v128`
-* `i32x4.trunc_u/f32x4(a: v128) -> v128`
-* `i64x2.trunc_s/f64x2(a: v128) -> v128`
-* `i64x2.trunc_u/f64x2(a: v128) -> v128`
-
-Lane-wise conversion from floating point to integer using the IEEE
-`convertToIntegerTowardZero` function. If any lane is a NaN or the rounded
-integer value is outside the range of the destination type, these instructions
-trap.
+### Floating point to integer with saturation
+* `i32x4.trunc_s/f32x4:sat(a: v128) -> v128`
+* `i32x4.trunc_u/f32x4:sat(a: v128) -> v128`
+* `i64x2.trunc_s/f64x2:sat(a: v128) -> v128`
+* `i64x2.trunc_u/f64x2:sat(a: v128) -> v128`
+
+Lane-wise saturating conversion from floating point to integer using the IEEE
+`convertToIntegerTowardZero` function. If any input lane is a NaN, the
+resulting lane is 0. If the rounded integer value of a lane is outside the
+range of the destination type, the result is saturated to the nearest
+representable integer value.
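Review note, not part of the patch: the saturating conversion rule described in the last hunk can be sketched per lane in Python. The names `trunc_sat` and `i32x4_trunc_s_sat` are hypothetical, and lanes are modelled as plain Python floats and ints rather than packed bits, so this is a reading of the prose under those assumptions, not normative pseudo-code.

```python
import math


def trunc_sat(x, lane_bits, signed):
    """Convert one floating-point lane to an integer lane with saturation."""
    if math.isnan(x):
        return 0  # NaN lanes produce 0 instead of trapping
    if signed:
        lo, hi = -(1 << (lane_bits - 1)), (1 << (lane_bits - 1)) - 1
    else:
        lo, hi = 0, (1 << lane_bits) - 1
    if x <= lo:
        return lo  # also covers -infinity
    if x >= hi:
        return hi  # also covers +infinity
    return math.trunc(x)  # in range: round toward zero


def i32x4_trunc_s_sat(lanes):
    """Hypothetical lane-wise wrapper over four f32 lanes given as floats."""
    return [trunc_sat(x, 32, True) for x in lanes]


print(i32x4_trunc_s_sat([1.9, -1.9, float("nan"), 1e20]))
```

Checking the range before calling `math.trunc` keeps infinities away from the truncation, which would otherwise raise `OverflowError` in Python.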