ANDNOT operation #102
The NOT operation being suboptimal on x86 and x86-64 stood out during implementation, and having a more optimal ANDNOT operation when supported across architectures would make sense to me. After preliminary research, it doesn't look like any of the hardware instructions this operation maps to have performance issues. That said, I'd like to make sure that this operation is widely used. Could you add examples where this is already in use? Anything that makes use of the equivalent XMM/Neon intrinsics would be helpful.
GitHub search shows 127K+ files using y = _mm_andnot_ps(_mm_cmpgt_ps(x, _mm_set1_ps(-0x1.9FE368p+6f)), y); Note: inverting comparison and using
Thanks @Maratyszcza! I'm in favor of merging this because it does not introduce a class of new operations, but rather a more efficient combination of operations that are already supported by this proposal. It has an exact match on relevant architectures and is widely used, as pointed out above. Are there any objections to including this in the proposal?
dtig
left a comment
It doesn't look like there are any objections to the inclusion of these operations, so change lgtm with minor nits. Could you add an entry to ImplementationStatus.md as well?
proposals/simd/BinarySIMD.md
Outdated
| `v128.load` | `0x00`| m:memarg |
| `v128.store` | `0x01`| m:memarg |
| `v128.const` | `0x02`| i:ImmByte[16] |
| `v128.andnot` | `0x03`| - |
Could you move this down to the end of the opcode space instead of here? This belongs with the other logical operations, but there isn't a good spot for it right now. It's what we've tried to do with other new opcodes when it doesn't fit into the current opcode space.
Done: added an entry in |
As specified at WebAssembly/simd#102. Also fixes bugs in the JS API for other SIMD bitwise operators.
Introduction
ANDNOT is a widely supported SIMD operation which computes `a & ~b`. ANDNOT is involved in a common idiom of zeroing out elements which don't satisfy a condition, i.e.

In the present SIMD instruction set, a vectorized version of this snippet would require two WAsm SIMD instructions, `v128.and` and `v128.not`:

Representing ANDNOT as two instructions is inefficient on all architectures:

`NOT` operation. Thus, WAsm engine would typically generate three instructions: two (`PXOR tmp_zero, tmp_zero` to zero a temporary register and `PANDNOT b, tmp_zero` (!) to emulate `v128.not`), and an extra `PAND`.

This PR introduces a combined ANDNOT instruction to enable WebAssembly engines to directly leverage architecture-specific ANDNOT instructions without doing complicated and expensive analysis of the instruction stream.
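As a rough illustration of the zeroing idiom, here is a minimal sketch in portable C that models a 128-bit vector as four 32-bit lanes. It is a scalar model for reasoning about semantics, not engine or intrinsic code; the type `v128_model` and the helpers `model_gt_u`, `model_and`, `model_not`, `model_andnot`, `zero_where_two_ops`, and `zero_where_one_op` are hypothetical names introduced only for this example.

```c
#include <stdint.h>

/* Hypothetical scalar model of a 128-bit vector: four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } v128_model;

/* Lane-wise unsigned compare: all-ones where a > b, zero elsewhere. */
static v128_model model_gt_u(v128_model a, v128_model b) {
    v128_model r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = (a.lane[i] > b.lane[i]) ? UINT32_MAX : 0u;
    return r;
}

/* Models v128.and: bitwise AND of every lane. */
static v128_model model_and(v128_model a, v128_model b) {
    v128_model r;
    for (int i = 0; i < 4; i++) r.lane[i] = a.lane[i] & b.lane[i];
    return r;
}

/* Models v128.not: bitwise complement of every lane. */
static v128_model model_not(v128_model a) {
    v128_model r;
    for (int i = 0; i < 4; i++) r.lane[i] = ~a.lane[i];
    return r;
}

/* Models the proposed v128.andnot: a & ~b in a single operation. */
static v128_model model_andnot(v128_model a, v128_model b) {
    v128_model r;
    for (int i = 0; i < 4; i++) r.lane[i] = a.lane[i] & ~b.lane[i];
    return r;
}

/* Zero the lanes of x selected by mask, using two operations. */
static v128_model zero_where_two_ops(v128_model x, v128_model mask) {
    return model_and(x, model_not(mask));
}

/* Same result using the single combined operation. */
static v128_model zero_where_one_op(v128_model x, v128_model mask) {
    return model_andnot(x, mask);
}
```

Both helpers return the same lanes for any input; the difference the PR cares about is only how many instructions an engine has to emit for each form.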
Mapping to Common Instruction Sets
This section illustrates how the new WebAssembly instruction can be lowered on common instruction sets. However, these patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.
x86/x86-64 processors with AVX instruction set
`c = v128.andnot(a, b)` maps to `VANDNPS xmm_c, xmm_b, xmm_a` (note the inverted order of operands)
x86/x86-64 processors with SSE instruction set
`c = v128.andnot(a, b)` maps to `MOVAPS xmm_c, xmm_b` + `ANDNPS xmm_c, xmm_a` (note the inverted order of operands)
`b = v128.andnot(a, b)` maps to `ANDNPS xmm_b, xmm_a` (note the inverted order of operands)
ARMv7+ processors with NEON instruction set
`c = v128.andnot(a, b)` maps to `VBIC Qc, Qa, Qb`
ARM64 processors
`c = v128.andnot(a, b)` maps to `BIC Vc.16B, Va.16B, Vb.16B`
POWER processors with Vector facility (VMX)
`c = v128.andnot(a, b)` maps to `VANDC Vc, Va, Vb`