Possible suboptimal code generation for SIMD `any` function #72413

karwa · 2024-03-19T05:44:22Z

Description

Test code:

func test_stdlib_8(_ input: SIMD8<UInt8>) -> Bool {
    any(input .== SIMD8(repeating: 0x42))
}

Building this with -O produces:

.LCPI1_0:
        .byte   66
        .byte   66
        .byte   66
        .byte   66
        .byte   66
        .byte   66
        .byte   66
        .byte   66
        .zero   1
        .zero   1
        .zero   1
        .zero   1
        .zero   1
        .zero   1
        .zero   1
        .zero   1
output.test_stdlib_8(Swift.SIMD8<Swift.UInt8>) -> Swift.Bool:
        movq    xmm0, rdi
        pcmpeqb xmm0, xmmword ptr [rip + .LCPI1_0]
        pmovmskb        eax, xmm0
        test    al, al
        setne   al
        ret

Which is nice 👍

Unfortunately, when I widen the vector to 16+ elements, the any function becomes a massive, outlined blob of code:

func test_stdlib_16(_ input: SIMD16<UInt8>) -> Bool {
    any(input .== SIMD16(repeating: 0x42))
}

.LCPI3_0:
        .zero   16,66
output.test_stdlib_16(Swift.SIMD16<Swift.UInt8>) -> Swift.Bool:
        push    rax
        pcmpeqb xmm0, xmmword ptr [rip + .LCPI3_0]
        call    (generic specialization <Swift.SIMD16<Swift.Int8>> of (extension in Swift):Swift.SIMD< where A.Scalar: Swift.Comparable>.min() -> A.Scalar)
        shr     al, 7
        pop     rcx
        ret

generic specialization <Swift.SIMD16<Swift.Int8>> of (extension in Swift):Swift.SIMD< where A.Scalar: Swift.Comparable>.min() -> A.Scalar:
        pshufd  xmm1, xmm0, 238
        movdqa  xmm2, xmm1
        pcmpgtb xmm2, xmm0
        pand    xmm0, xmm2
        pandn   xmm2, xmm1
        por     xmm2, xmm0
        pshufd  xmm0, xmm2, 85
        movdqa  xmm1, xmm0
        pcmpgtb xmm1, xmm2
        pand    xmm2, xmm1
        pandn   xmm1, xmm0
        por     xmm1, xmm2
        movdqa  xmm0, xmm1
        psrld   xmm0, 16
        movdqa  xmm2, xmm0
        pcmpgtb xmm2, xmm1
        pand    xmm1, xmm2
        pandn   xmm2, xmm0
        por     xmm2, xmm1
        movdqa  xmm0, xmm2
        psrlw   xmm0, 8
        movdqa  xmm1, xmm0
        pcmpgtb xmm1, xmm2
        pand    xmm2, xmm1
        pandn   xmm1, xmm0
        por     xmm1, xmm2
        movd    eax, xmm1
        ret

The SIMD mask is 16 bytes, and the any function basically amounts to mask != 0, so... even though I'm not an expert at SIMD instruction sets, it feels like this is probably not optimal.
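For readers following the listing: the call to min() on SIMD16<Int8> followed by shr al, 7 suggests that any is being computed as a signed minimum across all lanes, followed by a sign-bit test (true lanes are -1, so the minimum is negative if any lane is set). The sketch below models that behavior using only public API. It is an inference from the disassembly, not the standard library's source, and anyViaSignedMin is a hypothetical name.

```swift
// Hypothetical model of what the disassembly implies; not the stdlib source.
func anyViaSignedMin(_ mask: SIMDMask<SIMD16<Int8>>) -> Bool {
    // Materialize the mask as bytes: -1 (0xFF) in true lanes, 0 elsewhere.
    let lanes = SIMD16<Int8>(repeating: 0).replacing(with: -1, where: mask)
    // A horizontal signed minimum is negative iff at least one lane is -1.
    return lanes.min() < 0
}
```

The horizontal min() is the part that expands into the long shuffle/compare/blend sequence on baseline SSE2, which has no signed byte minimum instruction.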

Even if I enable all the advanced modern instruction sets I can think of (-O -Xcc -msse -Xcc -msse2 -Xcc -mavx -Xcc -mavx2), the code generated for the any function still feels suboptimal:

.LCPI5_0:
        .zero   16,128
generic specialization <Swift.SIMD16<Swift.Int8>> of (extension in Swift):Swift.SIMD< where A.Scalar: Swift.Comparable>.min() -> A.Scalar:
        vpxor   xmm0, xmm0, xmmword ptr [rip + .LCPI5_0]
        vpsrlw  xmm1, xmm0, 8
        vpminub xmm0, xmm0, xmm1
        vphminposuw     xmm0, xmm0
        vmovd   eax, xmm0
        add     al, -128
        ret
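A note on reading this shorter listing: it appears to bias each byte by 0x80 (the vpxor against a vector of 128s) so that signed order becomes unsigned order, take unsigned minimums (vpminub, then vphminposuw across the eight 16-bit words), and undo the bias at the end (add al, -128 has the same effect on a byte as XOR with 0x80). The scalar sketch below illustrates only the bias identity; signedMinViaUnsigned is a hypothetical name introduced for illustration.

```swift
// Hypothetical scalar illustration of the bias trick seen in the listing.
func signedMinViaUnsigned(_ a: Int8, _ b: Int8) -> Int8 {
    // Flipping the top bit maps -128...127 onto 0...255 while preserving order.
    let ua = UInt8(bitPattern: a) ^ 0x80
    let ub = UInt8(bitPattern: b) ^ 0x80
    // Take the unsigned minimum, then flip the top bit back.
    return Int8(bitPattern: min(ua, ub) ^ 0x80)
}
```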

Reproduction

See above.

Also Godbolt

Expected behavior

Intuitively, I would expect any(SIMDMask<SIMD16<Int8>>) to compile down to far fewer instructions than it does. At the very least, it seems it could be implemented using two 64-bit comparisons to zero, which I have to believe is more efficient than the code we're generating today.
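A minimal sketch of the "two 64-bit comparisons" idea, written against public API only. The names anyViaTwoWords and test_two_words_16 are hypothetical, and the sketch assumes that reinterpreting a SIMD16<Int8> as SIMD2<UInt64> via unsafeBitCast is acceptable here (both are 16 bytes). Whether it actually generates better code than any would need to be checked on Godbolt.

```swift
// Hypothetical helper; not part of the standard library.
func anyViaTwoWords(_ mask: SIMDMask<SIMD16<Int8>>) -> Bool {
    // Materialize the mask as bytes: -1 (0xFF) in true lanes, 0 elsewhere.
    let bytes = SIMD16<Int8>(repeating: 0).replacing(with: -1, where: mask)
    // Reinterpret the 16 bytes as two 64-bit words (same size, so the cast is valid).
    let words = unsafeBitCast(bytes, to: SIMD2<UInt64>.self)
    // Any set lane leaves at least one nonzero bit in one of the two words.
    return (words[0] | words[1]) != 0
}

// Same shape as test_stdlib_16, but using the helper instead of any.
func test_two_words_16(_ input: SIMD16<UInt8>) -> Bool {
    anyViaTwoWords(input .== SIMD16(repeating: 0x42))
}
```

Because the two words are only ORed together, the result does not depend on lane order within the reinterpreted vector.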

Environment

Swift version 6.0-dev (LLVM d1625da873daa4c, Swift bae6450)
Target: x86_64-unknown-linux-gnu

Additional information

No response


stephentyrone · 2024-03-19T16:07:15Z

These optimizations all happen at the LLVM level; there's some work that we can maybe do in Swift so that they're not needed, however.

karwa added the bug and triage needed labels on Mar 19, 2024

hborla added the SILOptimizer and simd labels and removed the triage needed label on Apr 27, 2024
