In the SSE and AVX instruction sets on x86, many instructions have separate integer, single-precision, and double-precision forms, e.g. MOVDQU/MOVUPS/MOVUPD. On "big" Intel and AMD cores, there is an extra penalty when a register produced by an integer SIMD op is consumed by a floating-point SIMD op, or vice versa.
However, WebAssembly SIMD does not distinguish between, e.g., integer and FP loads. Although this information can, in theory, be reconstructed from the instruction stream, such reconstruction requires expensive analysis passes, which streaming WebAssembly engines cannot afford.
Only a few classes of ops have separate integer and floating-point instructions on x86:
- Loads and stores
- Shuffles
- Broadcasts ("load-and-splat")
- Binary logic (AND, OR, XOR, ANDNOT)
- Blends
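
To make the classes above concrete, here is a rough, non-exhaustive sketch that pairs each class with one representative x86 mnemonic per domain. The table layout and the helper name `x86_form` are assumptions made for illustration only. The floating-point column uses the single-precision forms, and the paired instructions are not exact drop-in equivalents in every case (e.g. PSHUFD takes one source register, SHUFPS takes two).

```python
# Illustrative table: one representative x86 mnemonic per operation class and
# execution domain. The layout and the name `x86_form` are hypothetical.
X86_FORMS = {
    "load/store": {"int": "MOVDQU",       "float": "MOVUPS"},
    "shuffle":    {"int": "PSHUFD",       "float": "SHUFPS"},
    "broadcast":  {"int": "VPBROADCASTD", "float": "VBROADCASTSS"},
    "and":        {"int": "PAND",         "float": "ANDPS"},
    "or":         {"int": "POR",          "float": "ORPS"},
    "xor":        {"int": "PXOR",         "float": "XORPS"},
    "andnot":     {"int": "PANDN",        "float": "ANDNPS"},
    "blend":      {"int": "PBLENDVB",     "float": "BLENDVPS"},
}


def x86_form(op_class, domain):
    """Return the representative mnemonic for a class in the given domain."""
    return X86_FORMS[op_class][domain]


print(x86_form("xor", "int"))    # PXOR
print(x86_form("xor", "float"))  # XORPS
```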
I think it is worth considering splitting the corresponding WebAssembly instructions into separate integer and floating-point variants in the SIMD spec. Initially, both compilers and WebAssembly engines can treat the integer and floating-point variants identically, but having both in the spec would at least allow the problem to be fixed properly in the future. Here is the list of instructions that would need two forms:
- v128.const
- v8x16.shuffle
- v128.and
- v128.or
- v128.xor
- v128.not
- v128.andnot
- v128.bitselect (decomposed into AND, ANDNOT, and OR on x86)
- v128.load
- v8x16.load_splat
- v16x8.load_splat
- v32x4.load_splat
- v64x2.load_splat
- v128.store
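
As a minimal sketch of how an engine could use such variants in a single pass, the toy lowering below attaches a hypothetical domain hint to each domain-agnostic op. Everything here is an assumption for illustration: the hint encoding, the names `lower` and `crossings`, the choice of the integer form when hints are ignored, and the cost model, which simply counts domain changes between adjacent instructions in a straight-line chain where each op consumes the previous result.

```python
INT, FLOAT = "int", "float"

# Domain-agnostic WebAssembly ops: (x86 integer form, x86 floating-point form).
TWO_FORMS = {
    "v128.load":  ("MOVDQU", "MOVUPS"),
    "v128.xor":   ("PXOR", "XORPS"),
    "v128.store": ("MOVDQU", "MOVUPS"),
}

# Ops whose x86 domain is already fixed by their WebAssembly type.
FIXED = {
    "f32x4.mul": ("MULPS", FLOAT),
    "i32x4.add": ("PADDD", INT),
}


def lower(ops, honor_hint):
    """Lower (wasm_op, hint) pairs to (x86 mnemonic, domain) pairs in one pass.

    With honor_hint=False the hint is ignored and the integer form is chosen,
    standing in for an engine that treats both variants the same.
    """
    lowered = []
    for op, hint in ops:
        if op in FIXED:
            lowered.append(FIXED[op])
            continue
        int_form, float_form = TWO_FORMS[op]
        if honor_hint and hint == FLOAT:
            lowered.append((float_form, FLOAT))
        else:
            lowered.append((int_form, INT))
    return lowered


def crossings(lowered):
    """Count domain changes between adjacent instructions (toy cost model)."""
    return sum(1 for a, b in zip(lowered, lowered[1:]) if a[1] != b[1])


# A float-only chain: load, flip sign bits with XOR, multiply, store.
chain = [
    ("v128.load", FLOAT),
    ("v128.xor", FLOAT),
    ("f32x4.mul", None),
    ("v128.store", FLOAT),
]

print(crossings(lower(chain, honor_hint=False)))  # 2
print(crossings(lower(chain, honor_hint=True)))   # 0
```

The point of the sketch is only that the hint lets the engine pick the matching form without any analysis pass over the instruction stream. Whether each crossing actually costs anything depends on the microarchitecture.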
Note that the problem is specific to the distinction between integer and floating-point SIMD instructions on x86. ARM NEON doesn't distinguish between integer and floating-point variants at the ISA level, and as far as I know, no x86 CPU treats "double-precision" (e.g. ANDPD) and "single-precision" (e.g. ANDPS) instructions differently in this respect.