ANDNOT operation #102

Maratyszcza · 2019-09-10T18:15:40Z

Introduction

ANDNOT is a widely supported SIMD operation which computes a & ~b. It appears in a common idiom for zeroing out the elements which satisfy a condition, i.e.

float x = ...;
const bool cond = ...;
if (cond) x = 0;

In the present SIMD instruction set, a vectorized version of this snippet would require two WAsm SIMD instructions, v128.and and v128.not:

v128 x = ...;
const v128 cond = ...;
x = v128.and(x, v128.not(cond));

Representing ANDNOT as two instructions is inefficient on all architectures:

On ARM, ARM64, and PowerPC it requires two machine instructions even though ANDNOT has an exact equivalent in their SIMD instruction sets.
On x86 and x86-64 it requires three machine instructions because x86 SIMD extensions (until AVX512) do not include a SIMD NOT operation. Thus, a WAsm engine would typically generate three instructions: two (PCMPEQD tmp_ones, tmp_ones to set every bit of a temporary register and PANDN b, tmp_ones (!) to emulate v128.not), and an extra PAND.

This PR introduces a combined ANDNOT instruction to enable WebAssembly engines to directly leverage architecture-specific ANDNOT instructions without doing complicated and expensive analysis of the instruction stream.
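As an illustration of the equivalence the introduction relies on, here is a minimal scalar sketch in C. The `v128_t` struct and the `v128_not`, `v128_and`, `v128_andnot`, and `v128_equal` helpers are hypothetical stand-ins written for this example (four 32-bit lanes); they are not the WebAssembly or any engine API.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector: four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } v128_t;

/* Lane-wise bitwise NOT, modelling v128.not. */
static v128_t v128_not(v128_t a) {
    v128_t r;
    for (int i = 0; i < 4; i++) r.lane[i] = (uint32_t)~a.lane[i];
    return r;
}

/* Lane-wise bitwise AND, modelling v128.and. */
static v128_t v128_and(v128_t a, v128_t b) {
    v128_t r;
    for (int i = 0; i < 4; i++) r.lane[i] = a.lane[i] & b.lane[i];
    return r;
}

/* Combined operation proposed in this PR: a & ~b in a single step. */
static v128_t v128_andnot(v128_t a, v128_t b) {
    v128_t r;
    for (int i = 0; i < 4; i++) r.lane[i] = a.lane[i] & (uint32_t)~b.lane[i];
    return r;
}

/* Returns 1 when every lane of a equals the matching lane of b. */
static int v128_equal(v128_t a, v128_t b) {
    for (int i = 0; i < 4; i++)
        if (a.lane[i] != b.lane[i]) return 0;
    return 1;
}
```

For any inputs, `v128_andnot(x, cond)` and `v128_and(x, v128_not(cond))` produce the same lanes; the difference the PR cares about is only how many machine instructions an engine needs to emit.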

Mapping to Common Instruction Sets

This section illustrates how the new WebAssembly instruction can be lowered on common instruction sets. However, these patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.

x86/x86-64 processors with AVX instruction set

c = v128.andnot(a, b) maps to VANDNPS xmm_c, xmm_b, xmm_a (note the inverted order of operands)

x86/x86-64 processors with SSE instruction set

c = v128.andnot(a, b) maps to MOVAPS xmm_c, xmm_b + ANDNPS xmm_c, xmm_a (note the inverted order of operands)
b = v128.andnot(a, b) maps to ANDNPS xmm_b, xmm_a (note the inverted order of operands)

ARMv7+ processors with NEON instruction set

c = v128.andnot(a, b) maps to VBIC Qc, Qa, Qb

ARM64 processors

c = v128.andnot(a, b) maps to BIC Vc.16B, Va.16B, Vb.16B

POWER processors with Vector facility (VMX)

c = v128.andnot(a, b) maps to VANDC Vc, Va, Vb
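The "inverted order of operands" notes above can be checked with a small scalar model in C. This is only a sketch: `x86_andn`, `arm_bic`, `wasm_andnot_via_x86`, and `wasm_andnot_via_arm` are hypothetical names invented here, and each models a single 64-bit slice of the corresponding 128-bit instruction rather than calling any real intrinsic.

```c
#include <stdint.h>

/* Models x86 ANDNPS/PANDN semantics on one 64-bit slice:
   the FIRST (destination) operand is the one that gets inverted. */
static uint64_t x86_andn(uint64_t dst, uint64_t src) {
    return ~dst & src;
}

/* Models ARM BIC / POWER VANDC semantics on one 64-bit slice:
   the SECOND source operand is the one that gets inverted. */
static uint64_t arm_bic(uint64_t a, uint64_t b) {
    return a & ~b;
}

/* v128.andnot(a, b) = a & ~b, lowered the x86 way: operands swapped. */
static uint64_t wasm_andnot_via_x86(uint64_t a, uint64_t b) {
    return x86_andn(b, a);
}

/* v128.andnot(a, b) = a & ~b, lowered the ARM/POWER way: operands as-is. */
static uint64_t wasm_andnot_via_arm(uint64_t a, uint64_t b) {
    return arm_bic(a, b);
}
```

Both lowerings return `a & ~b`, whereas calling `x86_andn(a, b)` without the swap computes `~a & b` instead, which is why the x86 mappings list `xmm_b` before `xmm_a`.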

dtig · 2019-09-13T21:54:00Z

The NOT operation being suboptimal on x86 and x86-64 stood out during implementation, and having a more efficient ANDNOT operation where it is supported across architectures makes sense to me. After preliminary research, it doesn't look like any of the hardware instructions these operations map to have performance issues. That said, I'd like to make sure that this operation is widely used. Could you add examples where it is already in use? Anything that makes use of the equivalent XMM/Neon intrinsics would be helpful.

Maratyszcza · 2019-09-14T01:32:41Z

GitHub search shows 127K+ files using _mm_andnot_ps (x86 intrinsic for ANDNOT) and 40K+ C++ files using vbicq_u32 (ARM NEON intrinsic for ANDNOT). From my experience, ANDNOT is useful when implementing vector mathematical functions, e.g. expf must return +0.0f when input is below -0x1.9FE368p+6f:

y = _mm_andnot_ps(_mm_cmplt_ps(x, _mm_set1_ps(-0x1.9FE368p+6f)), y);

Note: inverting the comparison and using _mm_and_ps wouldn't work, because it would change the result for NaN values.
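The NaN remark can be illustrated with a scalar sketch in C that models a single lane. The helper names (`bits_of`, `float_of`, `lane_mask`, `clamp_andnot`, `clamp_and`) and the constant `kThreshold` are assumptions made for this example, not part of any library; `lane_mask` imitates the all-ones or all-zeros result that a SIMD comparison produces per lane.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back (no value conversion). */
static uint32_t bits_of(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float float_of(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* All-ones when pred is true, all-zeros otherwise, like one SIMD compare lane. */
static uint32_t lane_mask(int pred) { return pred ? 0xFFFFFFFFu : 0u; }

/* Underflow threshold quoted in the comment above. */
static const float kThreshold = -0x1.9FE368p+6f;

/* ANDNOT form: clear y only where x < threshold.
   For NaN x the comparison is false, so y passes through unchanged. */
static float clamp_andnot(float x, float y) {
    return float_of(bits_of(y) & ~lane_mask(x < kThreshold));
}

/* Inverted comparison with plain AND: keep y only where x >= threshold.
   For NaN x the comparison is also false, so y is wrongly zeroed. */
static float clamp_and(float x, float y) {
    return float_of(bits_of(y) & lane_mask(x >= kThreshold));
}
```

For ordinary inputs the two forms agree, but with a NaN input `clamp_andnot` lets a NaN result through while `clamp_and` replaces it with +0.0f, because every ordered comparison involving NaN is false.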

dtig · 2019-09-19T19:17:00Z

Thanks @Maratyszcza! I'm in favor of merging this because it does not introduce a new class of operations, but a more efficient combination of operations that are already supported by this proposal, has an exact match on relevant architectures, and is widely used, as pointed out above. Are there any objections to including this in the proposal?

dtig

It doesn't look like there are any objections to the inclusion of these operations, so change lgtm with minor nits. Could you add an entry to ImplementationStatus.md as well?

dtig · 2019-09-23T18:05:37Z

proposals/simd/BinarySIMD.md

 | `v128.load`                |    `0x00`| m:memarg           |
 | `v128.store`               |    `0x01`| m:memarg           |
 | `v128.const`               |    `0x02`| i:ImmByte[16]      |
+| `v128.andnot`              |    `0x03`| -                  |


Could you move this down to the end of the opcode space instead of here? This belongs with the other logical operations, but there isn't a good spot for it right now. It's what we've tried to do with other new opcodes when it doesn't fit into the current opcode space.

Maratyszcza · 2019-09-24T11:19:04Z

Done: added an entry in ImplementationStatus.md and moved to the end of the opcode list.

dtig suggested changes Sep 23, 2019

ANDNOT operation

b2f7a4c

Maratyszcza force-pushed the andnot branch from 0806ea8 to b2f7a4c on September 24, 2019 11:18

Maratyszcza requested a review from dtig September 24, 2019 11:19

Maratyszcza changed the title ~~Add ANDNOT operation~~ ANDNOT operation Sep 24, 2019

tlively added a commit to tlively/binaryen that referenced this pull request Sep 24, 2019

v128.andnot instruction

67c6435

As specified at WebAssembly/simd#102. Also fixes bugs in the JS API for other SIMD bitwise operators.

tlively mentioned this pull request Sep 24, 2019

v128.andnot instruction WebAssembly/binaryen#2355

Merged

dtig approved these changes Sep 24, 2019

arunetm merged commit f31b325 into WebAssembly:master Sep 24, 2019

tlively added a commit to WebAssembly/binaryen that referenced this pull request Sep 24, 2019

v128.andnot instruction (#2355)

034ed38

As specified at WebAssembly/simd#102. Also fixes bugs in the JS API for other SIMD bitwise operators.
