Vectorise _mm_minpos_epu16 #551

Merged
merged 1 commit into from
Nov 2, 2022

@AymenQ AymenQ commented Nov 1, 2022

Use vector instructions to find the index of the minimum element for _mm_minpos_epu16. This replaces the chain of scalar compare-and-branch steps (one per lane) with a short branchless sequence, and is generally much faster when the minimum value is not the first element.

The codegen difference with GCC is included in the commit message.
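
For readers who want to follow the idea without reading the assembly, here is a minimal scalar sketch in portable C of what the vectorised sequence appears to compute. It is a model, not the code from the commit: `minpos_result`, `horizontal_min_u16` and `minpos_epu16_model` are hypothetical names, and it assumes the constant loaded in the "After commit" listing holds the lane indices 0 to 7. Each loop stands in for one vector instruction (`uminv`, `cmeq`, `orn`).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: lane 0 and lane 1 of the returned vector. */
typedef struct {
    uint16_t min;   /* smallest of the eight unsigned 16-bit lanes */
    uint16_t index; /* index of the first lane holding that value */
} minpos_result;

/* Scalar stand-in for a horizontal unsigned minimum (uminv). */
static uint16_t horizontal_min_u16(const uint16_t v[8])
{
    uint16_t m = v[0];
    for (int i = 1; i < 8; i++) {
        if (v[i] < m)
            m = v[i];
    }
    return m;
}

/* Scalar model of the branchless approach (names are illustrative). */
static minpos_result minpos_epu16_model(const uint16_t a[8])
{
    /* Step 1: horizontal minimum of the input. */
    uint16_t min = horizontal_min_u16(a);

    uint16_t masked[8];
    for (int i = 0; i < 8; i++) {
        /* Step 2: per-lane equality mask, all ones where a[i] == min (cmeq). */
        uint16_t eq = (a[i] == min) ? 0xFFFF : 0x0000;
        /* Step 3: index OR NOT mask (orn). Matching lanes keep their
         * index; every other lane becomes 0xFFFF. */
        masked[i] = (uint16_t)((uint16_t)i | (uint16_t)~eq);
    }

    /* Step 4: a second horizontal minimum picks the lowest matching index. */
    minpos_result r = { min, horizontal_min_u16(masked) };
    return r;
}
```

Because non-matching lanes are forced to 0xFFFF, the second minimum selects the lowest matching index without any data-dependent branch, so ties resolve to the first occurrence.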

Example codegen for _mm_minpos_epu16(a) with GCC 11.2.0 (-O3)

Prior to commit:
    uminv   h1, v0.8h
    sub     sp, sp, 0x10
    umov    w1, v0.h[0]
    str     h1, [sp, 14]
    ldrh    w0, [sp, 14]
    cmp     w1, w0
    b.eq    0x400aec
    movi    v1.4s, 0x0
    ext     v0.16b, v0.16b, v1.16b, 2
    umov    w1, v0.h[0]
    cmp     w1, w0
    b.eq    0x400b10
    ext     v0.16b, v0.16b, v1.16b, 2
    umov    w1, v0.h[0]
    cmp     w1, w0
    b.eq    0x400b18
    ext     v0.16b, v0.16b, v1.16b, 2
    umov    w1, v0.h[0]
    cmp     w1, w0
    b.eq    0x400b20
    ext     v0.16b, v0.16b, v1.16b, 2
    umov    w1, v0.h[0]
    cmp     w1, w0
    b.eq    0x400b28
    ext     v0.16b, v0.16b, v1.16b, 2
    umov    w1, v0.h[0]
    cmp     w1, w0
    b.eq    0x400b30
    ext     v0.16b, v0.16b, v1.16b, 2
    umov    w1, v0.h[0]
    cmp     w1, w0
    b.eq    0x400b38
    ext     v0.16b, v0.16b, v1.16b, 2
    fmov    s1, wzr
    umov    w1, v0.h[0]
    cmp     w1, w0
    b.eq    0x400b40
    movi    v0.4s, 0x0
    add     x0, sp, 0xe
    ld1     {v0.h}[0], [x0]
    add     sp, sp, 0x10
    mov     v0.h[1], v1.h[0]
    ret
    mov     w0, 0x0
    and     w0, w0, 0xffff
    movi    v0.4s, 0x0
    fmov    s1, w0
    add     x0, sp, 0xe
    ld1     {v0.h}[0], [x0]
    add     sp, sp, 0x10
    mov     v0.h[1], v1.h[0]
    ret
    mov     w0, 0x1
    b       0x400af0
    mov     w0, 0x2
    b       0x400af0
    mov     w0, 0x3
    b       0x400af0
    mov     w0, 0x4
    b       0x400af0
    mov     w0, 0x5
    b       0x400af0
    mov     w0, 0x6
    b       0x400af0
    mov     w0, 0x7

After commit:
    mov     v2.16b, v0.16b
    uminv   h0, v0.8h
    adrp    x0, 0x400000
    movi    v3.4s, 0x0
    ldr     q4, [x0, 2864]
    dup     v1.8h, v0.h[0]
    mov     v3.h[0], v0.h[0]
    cmeq    v1.8h, v1.8h, v2.8h
    mov     v0.16b, v3.16b
    orn     v1.16b, v4.16b, v1.16b
    uminv   h1, v1.8h
    mov     v0.h[1], v1.h[0]

Co-authored-by: George Steed <george.steed@arm.com>
Co-authored-by: Aymen Qader <aymen.qader@arm.com>
@jserv jserv merged commit 923b911 into DLTcollab:master Nov 2, 2022