Relaxed Rounding Q-format Multiplication #40

Open
Maratyszcza opened this issue Oct 1, 2021 · 1 comment
Labels
in-overview Instruction has been added to Overview.md instruction-proposal

Comments

@Maratyszcza
Collaborator

Maratyszcza commented Oct 1, 2021

What are the instructions being proposed?

I propose a relaxed version of the Saturating Rounding Q-format Multiplication i16x8.q15mulr_sat_s introduced in WebAssembly/simd#365. I suggest i16x8.q15mulr_s as the tentative name for the relaxed instruction.

What are the semantics of these instructions?

i16x8.q15mulr_sat_s implements multiplication of fixed-point numbers in Q15 format (see WebAssembly/simd#365 for details). The multiplication overflows if and only if both inputs are INT16_MIN, and the x86 SSSE3 and ARM NEON instructions differ in how they handle this case: the x86 instruction wraps around, while the ARM instruction saturates. The WebAssembly SIMD instruction i16x8.q15mulr_sat_s standardized on the ARM overflow semantics, which requires additional overflow checks on x86. However, the case of both inputs being INT16_MIN is rare, and the higher-level structure of an algorithm can often guarantee that it never happens, so a relaxed version that allows either overflow behavior would help performance on x86.

The proposed i16x8.q15mulr_s Relaxed SIMD instruction computes the lane-wise rounded multiplication of Q15 numbers, and allows for either saturation or wrap-around behavior in the overflow case (where both inputs are INT16_MIN).
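
To make the two permitted behaviors concrete, here is a minimal scalar sketch of one lane, assuming the usual Q15 rounding formula `(a * b + 0x4000) >> 15`. The function names (`q15mulr_wide`, `q15mulr_sat`, `q15mulr_wrap`, `q15mulr_relaxed_ok`) are illustrative only and are not part of any proposal or toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded Q15 product before narrowing: floor((a*b + 2^14) / 2^15).
   Written with division so it does not depend on how >> treats negatives. */
static int32_t q15mulr_wide(int16_t a, int16_t b) {
    int32_t p = (int32_t)a * (int32_t)b + 0x4000;
    return p >= 0 ? p / 32768 : -((32767 - p) / 32768);
}

/* ARM-style narrowing: the only out-of-range value, +32768,
   saturates to INT16_MAX. */
static int16_t q15mulr_sat(int16_t a, int16_t b) {
    int32_t r = q15mulr_wide(a, b);
    return (int16_t)(r > INT16_MAX ? INT16_MAX : r);
}

/* x86-style narrowing: +32768 wraps around to INT16_MIN. */
static int16_t q15mulr_wrap(int16_t a, int16_t b) {
    int32_t r = q15mulr_wide(a, b);
    return (int16_t)(r > INT16_MAX ? INT16_MIN : r);
}

/* Hypothetical helper: is y a result the relaxed instruction may return? */
static int q15mulr_relaxed_ok(int16_t a, int16_t b, int16_t y) {
    return y == q15mulr_sat(a, b) || y == q15mulr_wrap(a, b);
}

/* Small sanity check of the model. */
static void q15mulr_self_check(void) {
    assert(q15mulr_sat(16384, 16384) == 8192);            /* 0.5 * 0.5 = 0.25 */
    assert(q15mulr_sat(INT16_MIN, INT16_MIN) == INT16_MAX);
    assert(q15mulr_wrap(INT16_MIN, INT16_MIN) == INT16_MIN);
    /* Every other input pair gives the same answer in both variants. */
    assert(q15mulr_sat(INT16_MIN, INT16_MAX) == q15mulr_wrap(INT16_MIN, INT16_MAX));
}
```

In this model the two variants disagree only for `a == b == INT16_MIN`, which is the single input pair where the relaxed instruction is allowed to be nondeterministic across platforms.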

How will these instructions be implemented?

x86/x86-64 processors with AVX instruction set

  • y = i16x8.q15mulr_s(a, b) is lowered to VPMULHRSW xmm_y, xmm_a, xmm_b

x86/x86-64 processors with SSSE3 instruction set

  • y = i16x8.q15mulr_s(a, b) is lowered to MOVDQA xmm_y, xmm_a + PMULHRSW xmm_y, xmm_b

x86/x86-64 processors with SSE2 instruction set

  • y = i16x8.q15mulr_s(a, b) (y is NOT a and y is NOT b) is lowered to
    • MOVDQA xmm_y, xmm_a
    • MOVDQA xmm_tmp, xmm_a
    • PMULLW xmm_y, xmm_b
    • PMULHW xmm_tmp, xmm_b
    • PSRLW xmm_y, 14
    • PADDW xmm_tmp, xmm_tmp
    • PAVGW xmm_y, wasm_i16x8_splat(0)
    • PADDW xmm_y, xmm_tmp
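
The SSE2 sequence above can be checked lane by lane with a portable scalar emulation. The sketch below mirrors each instruction on a single 16-bit lane and compares it against the wrapping Q15 formula. It is a model of the listed steps under the documented lane semantics of those instructions, not code taken from any engine, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit lane of the SSE2 sequence, returned as a raw bit pattern. */
static uint16_t sse2_q15mulr_lane(int16_t a, int16_t b) {
    uint32_t p = (uint32_t)((int32_t)a * (int32_t)b); /* full 32-bit product bits */
    uint16_t y   = (uint16_t)(p & 0xFFFFu);           /* PMULLW: low half         */
    uint16_t tmp = (uint16_t)(p >> 16);               /* PMULHW: signed high half */
    y   = (uint16_t)(y >> 14);                        /* PSRLW y, 14              */
    tmp = (uint16_t)(tmp + tmp);                      /* PADDW tmp, tmp           */
    y   = (uint16_t)((y + 0u + 1u) >> 1);             /* PAVGW y, 0               */
    return (uint16_t)(y + tmp);                       /* PADDW y, tmp             */
}

/* Wrapping reference: low 16 bits of (a*b + 0x4000) >> 15. */
static uint16_t q15mulr_wrap_bits(int16_t a, int16_t b) {
    uint32_t p = (uint32_t)((int32_t)a * (int32_t)b);
    return (uint16_t)((p + 0x4000u) >> 15);
}

/* Count disagreements over a grid of inputs; step must be positive,
   and step == 1 is an exhaustive (slow) check of all 2^32 pairs. */
static long sse2_q15mulr_mismatches(int step) {
    long bad = 0;
    for (int32_t a = INT16_MIN; a <= INT16_MAX; a += step)
        for (int32_t b = INT16_MIN; b <= INT16_MAX; b += step)
            if (sse2_q15mulr_lane((int16_t)a, (int16_t)b) !=
                q15mulr_wrap_bits((int16_t)a, (int16_t)b))
                bad++;
    return bad;
}

/* Spot checks, including the overflow case. */
static void sse2_q15mulr_check(void) {
    assert(sse2_q15mulr_lane(INT16_MIN, INT16_MIN) == 0x8000u); /* wraps */
    assert(sse2_q15mulr_lane(16384, 16384) == 8192u);
    assert(sse2_q15mulr_mismatches(251) == 0);
}
```

The emulation returns `0x8000` (the bit pattern of INT16_MIN) when both inputs are INT16_MIN, so in this model the SSE2 fallback wraps in the same way as PMULHRSW.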

ARM64 processors

  • y = i16x8.q15mulr_s(a, b) is lowered to SQRDMULH Vy.8H, Va.8H, Vb.8H

ARMv7 processors with NEON instruction set

  • y = i16x8.q15mulr_s(a, b) is lowered to VQRDMULH.S16 Qy, Qa, Qb

Reference lowering through the WebAssembly SIMD128 instruction set

  • y = i16x8.q15mulr_s(a, b) is lowered as y = i16x8.q15mulr_sat_s(a, b)

How does behavior differ across processors? What new fingerprinting surfaces will be exposed?

When both inputs are INT16_MIN, x86/x86-64 produces INT16_MIN, while ARM/ARM64 produces INT16_MAX. x86/x86-64 can already be distinguished from ARM/ARM64 based on NaN behavior, so this instruction doesn't add any new fingerprinting surface.

What use cases are there?

@ngzhian ngzhian added the outstanding instruction proposed instructions not yet added to overview label Feb 18, 2022
@ngzhian
Member

ngzhian commented Feb 18, 2022

Instruction LGTM, please leave comments or thumbs up. I will add this to overview some time next week.

@ngzhian ngzhian added in-overview Instruction has been added to Overview.md and removed outstanding instruction proposed instructions not yet added to overview labels Mar 7, 2022
tlively added a commit to WebAssembly/binaryen that referenced this issue Apr 7, 2022