## Introduction

This is a proposal to add 64-bit variants of the existing `min_u` and `max_u` instructions. Only x86 processors with AVX512 natively support these instructions.

## Applications
## Mapping to Common Instruction Sets

This section illustrates how the new WebAssembly instructions can be lowered on common instruction sets. However, these patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.
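For reference, here is a minimal scalar sketch of the intended lane-wise semantics that the lowerings below implement. The type and function names are illustrative only and are not part of the proposal:

```c
#include <stdint.h>

/* Hypothetical reference representation: two 64-bit lanes. */
typedef struct { uint64_t lane[2]; } v128_i64x2;

static v128_i64x2 ref_i64x2_min_u(v128_i64x2 a, v128_i64x2 b) {
    v128_i64x2 y;
    for (int i = 0; i < 2; i++)  /* each lane compared as an unsigned 64-bit integer */
        y.lane[i] = a.lane[i] < b.lane[i] ? a.lane[i] : b.lane[i];
    return y;
}

static v128_i64x2 ref_i64x2_max_u(v128_i64x2 a, v128_i64x2 b) {
    v128_i64x2 y;
    for (int i = 0; i < 2; i++)
        y.lane[i] = a.lane[i] > b.lane[i] ? a.lane[i] : b.lane[i];
    return y;
}
```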
### x86/x86-64 processors with AVX512F and AVX512VL instruction sets

* `y = i64x2.min_u(a, b)` is lowered to `VPMINUQ xmm_y, xmm_a, xmm_b`
* `y = i64x2.max_u(a, b)` is lowered to `VPMAXUQ xmm_y, xmm_a, xmm_b`
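A C intrinsics sketch of this lowering, assuming compilation with AVX512F and AVX512VL enabled (e.g. `-mavx512vl`); the intrinsics map directly to `VPMINUQ`/`VPMAXUQ`, and the function names are illustrative:

```c
#include <immintrin.h>

/* Sketch only: requires AVX512F + AVX512VL. */
static inline __m128i i64x2_min_u_avx512(__m128i a, __m128i b) {
    return _mm_min_epu64(a, b);   /* VPMINUQ */
}

static inline __m128i i64x2_max_u_avx512(__m128i a, __m128i b) {
    return _mm_max_epu64(a, b);   /* VPMAXUQ */
}
```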
### x86/x86-64 processors with XOP instruction set

* `y = i64x2.min_u(a, b)` (`y` is not `a` and `y` is not `b`) is lowered to:
  - `VPCOMGTUQ xmm_y, xmm_a, xmm_b`
  - `VPBLENDVB xmm_y, xmm_a, xmm_b, xmm_y`
* `y = i64x2.max_u(a, b)` (`y` is not `a` and `y` is not `b`) is lowered to:
  - `VPCOMGTUQ xmm_y, xmm_a, xmm_b`
  - `VPBLENDVB xmm_y, xmm_b, xmm_a, xmm_y`
### x86/x86-64 processors with AVX instruction set

* `y = i64x2.min_u(a, b)` (`y` is not `a` and `y` is not `b`) is lowered to:
  - `VMOVDQA xmm_tmp, [wasm_i64x2_splat(0x8000000000000000)]`
  - `VPXOR xmm_y, xmm_tmp, xmm_a`
  - `VPXOR xmm_tmp, xmm_tmp, xmm_b`
  - `VPCMPGTQ xmm_y, xmm_y, xmm_tmp`
  - `VPBLENDVB xmm_y, xmm_a, xmm_b, xmm_y`
* `y = i64x2.max_u(a, b)` (`y` is not `a` and `y` is not `b`) is lowered to:
  - `VMOVDQA xmm_tmp, [wasm_i64x2_splat(0x8000000000000000)]`
  - `VPXOR xmm_y, xmm_tmp, xmm_a`
  - `VPXOR xmm_tmp, xmm_tmp, xmm_b`
  - `VPCMPGTQ xmm_y, xmm_y, xmm_tmp`
  - `VPBLENDVB xmm_y, xmm_b, xmm_a, xmm_y`
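A C intrinsics sketch of the technique used here (and by the SSE4.2 lowering in the next section, which differs only in register constraints): flipping the top bit of each lane turns the unsigned comparison into a signed one, for which `PCMPGTQ` exists, and the resulting mask drives a byte blend. Function names are illustrative:

```c
#include <immintrin.h>

/* Sketch only: needs SSE4.2 (_mm_cmpgt_epi64) and SSE4.1 (_mm_blendv_epi8). */
static inline __m128i i64x2_min_u_sse42(__m128i a, __m128i b) {
    const __m128i bias = _mm_set1_epi64x((long long)0x8000000000000000ULL);
    __m128i gt = _mm_cmpgt_epi64(_mm_xor_si128(a, bias),
                                 _mm_xor_si128(b, bias)); /* a > b, unsigned */
    return _mm_blendv_epi8(a, b, gt);                     /* pick b where a > b */
}

static inline __m128i i64x2_max_u_sse42(__m128i a, __m128i b) {
    const __m128i bias = _mm_set1_epi64x((long long)0x8000000000000000ULL);
    __m128i gt = _mm_cmpgt_epi64(_mm_xor_si128(a, bias),
                                 _mm_xor_si128(b, bias));
    return _mm_blendv_epi8(b, a, gt);                     /* pick a where a > b */
}
```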
### x86/x86-64 processors with SSE4.2 instruction set

* `y = i64x2.min_u(a, b)` (`y` is not `a` and `y` is not `b` and `a`/`b`/`y` is not in `xmm0`) is lowered to:
  - `MOVDQA xmm_y, [wasm_i64x2_splat(0x8000000000000000)]`
  - `MOVDQA xmm0, xmm_a`
  - `PXOR xmm0, xmm_y`
  - `PXOR xmm_y, xmm_b`
  - `PCMPGTQ xmm0, xmm_y`
  - `MOVDQA xmm_y, xmm_a`
  - `PBLENDVB xmm_y, xmm_b`
* `y = i64x2.max_u(a, b)` (`y` is not `a` and `y` is not `b` and `a`/`b`/`y` is not in `xmm0`) is lowered to:
  - `MOVDQA xmm_y, [wasm_i64x2_splat(0x8000000000000000)]`
  - `MOVDQA xmm0, xmm_a`
  - `PXOR xmm0, xmm_y`
  - `PXOR xmm_y, xmm_b`
  - `PCMPGTQ xmm0, xmm_y`
  - `MOVDQA xmm_y, xmm_b`
  - `PBLENDVB xmm_y, xmm_a`
### x86/x86-64 processors with SSE4.1 instruction set

Based on this answer by user aqrit on Stack Overflow.

* `y = i64x2.min_u(a, b)` (`y` is not `a` and `y` is not `b` and `a`/`b`/`y` is not in `xmm0`) is lowered to:
  - `MOVDQA xmm_y, xmm_b`
  - `MOVDQA xmm0, xmm_b`
  - `PSUBQ xmm_y, xmm_a`
  - `PXOR xmm0, xmm_a`
  - `PANDN xmm0, xmm_y`
  - `MOVDQA xmm_y, xmm_b`
  - `PANDN xmm_y, xmm_a`
  - `POR xmm0, xmm_y`
  - `PSRAD xmm0, 31`
  - `MOVDQA xmm_y, xmm_a`
  - `PSHUFD xmm0, xmm0, 0xF5`
  - `PBLENDVB xmm_y, xmm_b`
* `y = i64x2.max_u(a, b)` (`y` is not `a` and `y` is not `b` and `a`/`b`/`y` is not in `xmm0`) is lowered to:
  - `MOVDQA xmm_y, xmm_b`
  - `MOVDQA xmm0, xmm_b`
  - `PSUBQ xmm_y, xmm_a`
  - `PXOR xmm0, xmm_a`
  - `PANDN xmm0, xmm_y`
  - `MOVDQA xmm_y, xmm_b`
  - `PANDN xmm_y, xmm_a`
  - `POR xmm0, xmm_y`
  - `PSRAD xmm0, 31`
  - `MOVDQA xmm_y, xmm_b`
  - `PSHUFD xmm0, xmm0, 0xF5`
  - `PBLENDVB xmm_y, xmm_a`
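A C intrinsics sketch of this trick: without `PCMPGTQ`, the unsigned `a > b` mask is derived from a 64-bit subtraction plus bitwise logic, and the sign bit of each lane's high dword is then broadcast across the lane with `PSRAD`/`PSHUFD` before the blend. Function names are illustrative:

```c
#include <immintrin.h>

/* Sketch only (SSE4.1): the sign bit of each lane's high dword of `mask`
   ends up set exactly when a > b as unsigned 64-bit integers. */
static inline __m128i i64x2_min_u_sse41(__m128i a, __m128i b) {
    __m128i mask = _mm_andnot_si128(_mm_xor_si128(b, a),    /* ~(a ^ b) &  */
                                    _mm_sub_epi64(b, a));    /*   (b - a)   */
    mask = _mm_or_si128(mask, _mm_andnot_si128(b, a));       /* | (~b & a)  */
    mask = _mm_srai_epi32(mask, 31);                         /* per-dword sign   */
    mask = _mm_shuffle_epi32(mask, 0xF5);                    /* copy high dword  */
    return _mm_blendv_epi8(a, b, mask);                      /* mask ? b : a */
}

static inline __m128i i64x2_max_u_sse41(__m128i a, __m128i b) {
    __m128i mask = _mm_andnot_si128(_mm_xor_si128(b, a),
                                    _mm_sub_epi64(b, a));
    mask = _mm_or_si128(mask, _mm_andnot_si128(b, a));
    mask = _mm_srai_epi32(mask, 31);
    mask = _mm_shuffle_epi32(mask, 0xF5);
    return _mm_blendv_epi8(b, a, mask);                      /* mask ? a : b */
}
```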
### x86/x86-64 processors with SSE2 instruction set

Based on this answer by user aqrit on Stack Overflow.

* `y = i64x2.min_u(a, b)` (`y` is not `a` and `y` is not `b`) is lowered to:
  - `MOVDQA xmm_tmp, xmm_b`
  - `MOVDQA xmm_y, xmm_b`
  - `PSUBQ xmm_tmp, xmm_a`
  - `PXOR xmm_y, xmm_a`
  - `PANDN xmm_y, xmm_tmp`
  - `MOVDQA xmm_tmp, xmm_b`
  - `PANDN xmm_tmp, xmm_a`
  - `POR xmm_y, xmm_tmp`
  - `PSRAD xmm_y, 31`
  - `MOVDQA xmm_tmp, xmm_b`
  - `PSHUFD xmm_y, xmm_y, 0xF5`
  - `PAND xmm_tmp, xmm_y`
  - `PANDN xmm_y, xmm_a`
  - `POR xmm_y, xmm_tmp`
* `y = i64x2.max_u(a, b)` (`y` is not `a` and `y` is not `b`) is lowered to:
  - `MOVDQA xmm_tmp, xmm_b`
  - `MOVDQA xmm_y, xmm_b`
  - `PSUBQ xmm_tmp, xmm_a`
  - `PXOR xmm_y, xmm_a`
  - `PANDN xmm_y, xmm_tmp`
  - `MOVDQA xmm_tmp, xmm_b`
  - `PANDN xmm_tmp, xmm_a`
  - `POR xmm_y, xmm_tmp`
  - `PSRAD xmm_y, 31`
  - `MOVDQA xmm_tmp, xmm_a`
  - `PSHUFD xmm_y, xmm_y, 0xF5`
  - `PAND xmm_tmp, xmm_y`
  - `PANDN xmm_y, xmm_b`
  - `POR xmm_y, xmm_tmp`
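The mask is computed exactly as in the SSE4.1 lowering; only the final select changes, since `PBLENDVB` is unavailable and has to be composed from `PAND`/`PANDN`/`POR`. A C intrinsics sketch for the `min_u` case (for `max_u`, swap `a` and `b` in the final select); names are illustrative:

```c
#include <emmintrin.h>

/* Sketch only (SSE2): `mask` is all-ones in each 64-bit lane where a > b. */
static inline __m128i i64x2_min_u_sse2(__m128i a, __m128i b) {
    __m128i mask = _mm_andnot_si128(_mm_xor_si128(b, a),
                                    _mm_sub_epi64(b, a));
    mask = _mm_or_si128(mask, _mm_andnot_si128(b, a));
    mask = _mm_shuffle_epi32(_mm_srai_epi32(mask, 31), 0xF5);
    return _mm_or_si128(_mm_and_si128(mask, b),       /* take b where a > b */
                        _mm_andnot_si128(mask, a));   /* take a elsewhere   */
}
```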
### ARM64 processors

* `y = i64x2.min_u(a, b)` (`y` is not `a` and `y` is not `b`) is lowered to:
  - `CMHI Vy.2D, Va.2D, Vb.2D`
  - `BSL Vy.16B, Vb.16B, Va.16B`
* `y = i64x2.max_u(a, b)` (`y` is not `a` and `y` is not `b`) is lowered to:
  - `CMHI Vy.2D, Va.2D, Vb.2D`
  - `BSL Vy.16B, Va.16B, Vb.16B`
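A NEON intrinsics sketch of this lowering for AArch64 (`vcgtq_u64` maps to `CMHI`, `vbslq_u64` to `BSL`); function names are illustrative:

```c
#include <arm_neon.h>

/* Sketch only: AArch64 NEON. */
static inline uint64x2_t i64x2_min_u_aarch64(uint64x2_t a, uint64x2_t b) {
    uint64x2_t gt = vcgtq_u64(a, b);   /* CMHI: all-ones where a > b */
    return vbslq_u64(gt, b, a);        /* BSL: pick b where a > b    */
}

static inline uint64x2_t i64x2_max_u_aarch64(uint64x2_t a, uint64x2_t b) {
    uint64x2_t gt = vcgtq_u64(a, b);
    return vbslq_u64(gt, a, b);        /* pick a where a > b */
}
```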
### ARMv7 processors with NEON instruction set

* `y = i64x2.min_u(a, b)` (`y` is not `a` and `y` is not `b`) is lowered to:
  - `VQSUB.U64 Qy, Qa, Qb`
  - `VSUB.I64 Qy, Qa, Qy`
* `y = i64x2.max_u(a, b)` (`y` is not `a` and `y` is not `b`) is lowered to:
  - `VQSUB.U64 Qy, Qa, Qb`
  - `VADD.I64 Qy, Qb, Qy`
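This works because unsigned saturating subtraction gives `uqsub(a, b) = a - b` when `a > b` and `0` otherwise, so `a - uqsub(a, b) = min_u(a, b)` and `b + uqsub(a, b) = max_u(a, b)`. A NEON intrinsics sketch (`vqsubq_u64` maps to `VQSUB.U64`); function names are illustrative:

```c
#include <arm_neon.h>

/* Sketch only: ARMv7 NEON (also valid on AArch64). */
static inline uint64x2_t i64x2_min_u_neon(uint64x2_t a, uint64x2_t b) {
    /* vqsubq_u64(a, b) is (a > b) ? a - b : 0 */
    return vsubq_u64(a, vqsubq_u64(a, b));   /* a - max(a - b, 0) = min(a, b) */
}

static inline uint64x2_t i64x2_max_u_neon(uint64x2_t a, uint64x2_t b) {
    return vaddq_u64(b, vqsubq_u64(a, b));   /* b + max(a - b, 0) = max(a, b) */
}
```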