Suspicious code generated for rcp_fast on avx512spr-x32 #2831

Open
nurmukhametov opened this issue Apr 8, 2024 · 1 comment
Labels
Performance All issues related to performance/code generation

Comments

@nurmukhametov
Collaborator

ISPC generates suspicious code for the rcp_fast function on the avx512spr-x32 and avx512spr-x64 targets.

The following example

uniform double foo(uniform double x) {
    return rcp_fast(x);
}

compiled with

ispc --target=avx512spr-x32 test.c ...

generates:

.LCPI1_0:
        .quad   0x46c8a6e32246c99c              # double 9.9999999999999995E+32
.LCPI1_1:
        .quad   0x3914c4e977ba1f5c              # double 1.0000000000000001E-33
.LCPI1_3:
        .quad   0x4000000000000000              # double 2
.LCPI1_2:
        .long   0x3f800000                      # float 1
foo___und:                              # @foo___und
        vmovsd  xmm1, qword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero
        vucomisd        xmm1, xmm0
        jb      .LBB1_3
        vucomisd        xmm0, qword ptr [rip + .LCPI1_1]
        jb      .LBB1_3
        vcvtsd2ss       xmm1, xmm0, xmm0
        vmovss  xmm2, dword ptr [rip + .LCPI1_2] # xmm2 = mem[0],zero,zero,zero
        vdivss  xmm1, xmm2, xmm1
        vcvtss2sd       xmm1, xmm1, xmm1
        vmovsd  xmm2, qword ptr [rip + .LCPI1_3] # xmm2 = mem[0],zero
        vmovapd xmm3, xmm0
        vfnmadd213sd    xmm3, xmm1, xmm2        # xmm3 = -(xmm1 * xmm3) + xmm2
        vmulsd  xmm1, xmm3, xmm1
        vfnmadd213sd    xmm0, xmm1, xmm2        # xmm0 = -(xmm1 * xmm0) + xmm2
        vmulsd  xmm0, xmm1, xmm0
        ret
.LBB1_3:
        vmovq   rax, xmm0
        movabs  rcx, 9214364837600034816
        and     rcx, rax
        movabs  rax, 9209861237972664319
        sub     rax, rcx
        vmovq   xmm1, rax
        vmulsd  xmm2, xmm1, xmm0
        vcvtsd2ss       xmm2, xmm2, xmm2
        vmovss  xmm3, dword ptr [rip + .LCPI1_2] # xmm3 = mem[0],zero,zero,zero
        vdivss  xmm2, xmm3, xmm2
        vcvtss2sd       xmm2, xmm2, xmm2
        vmulsd  xmm1, xmm1, xmm2
        vmovsd  xmm2, qword ptr [rip + .LCPI1_3] # xmm2 = mem[0],zero
        vmovapd xmm3, xmm0
        vfnmadd213sd    xmm3, xmm1, xmm2        # xmm3 = -(xmm1 * xmm3) + xmm2
        vmulsd  xmm1, xmm1, xmm3
        vfnmadd213sd    xmm0, xmm1, xmm2        # xmm0 = -(xmm1 * xmm0) + xmm2
        vmulsd  xmm0, xmm1, xmm0
        ret

Whereas for avx512spr-x16, it generates just a single instruction:

foo___und:                              # @foo___und
        vrcp14sd        xmm0, xmm0, xmm0
        ret
@dbabokin
Collaborator

dbabokin commented Apr 8, 2024

It's this TODO:

;; TODO: need to use intrinsics

Here is the corresponding code in the -x16 version:

define <16 x float> @__rcp_fast_varying_float(<16 x float>) nounwind readonly alwaysinline {

rcp_fast_* should map to plain instructions/intrinsics without extra refinement steps.

@nurmukhametov nurmukhametov added the Performance All issues related to performance/code generation label Apr 16, 2024