Suspicious code generated for rcp_fast on avx512spr-x32 #2831

Open
nurmukhametov opened this issue Apr 8, 2024 · 1 comment
Labels
Performance All issues related to performance/code generation

Comments

@nurmukhametov
Collaborator

ISPC generates suspicious code for the rcp_fast function on the avx512spr-x32 and avx512spr-x64 targets.

The following example

uniform double foo(uniform double x) {
    return rcp_fast(x);
}

compiled with

ispc --target=avx512spr-x32 test.c ...

generates:

.LCPI1_0:
        .quad   0x46c8a6e32246c99c              # double 9.9999999999999995E+32
.LCPI1_1:
        .quad   0x3914c4e977ba1f5c              # double 1.0000000000000001E-33
.LCPI1_3:
        .quad   0x4000000000000000              # double 2
.LCPI1_2:
        .long   0x3f800000                      # float 1
foo___und:                              # @foo___und
        vmovsd  xmm1, qword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero
        vucomisd        xmm1, xmm0
        jb      .LBB1_3
        vucomisd        xmm0, qword ptr [rip + .LCPI1_1]
        jb      .LBB1_3
        vcvtsd2ss       xmm1, xmm0, xmm0
        vmovss  xmm2, dword ptr [rip + .LCPI1_2] # xmm2 = mem[0],zero,zero,zero
        vdivss  xmm1, xmm2, xmm1
        vcvtss2sd       xmm1, xmm1, xmm1
        vmovsd  xmm2, qword ptr [rip + .LCPI1_3] # xmm2 = mem[0],zero
        vmovapd xmm3, xmm0
        vfnmadd213sd    xmm3, xmm1, xmm2        # xmm3 = -(xmm1 * xmm3) + xmm2
        vmulsd  xmm1, xmm3, xmm1
        vfnmadd213sd    xmm0, xmm1, xmm2        # xmm0 = -(xmm1 * xmm0) + xmm2
        vmulsd  xmm0, xmm1, xmm0
        ret
.LBB1_3:
        vmovq   rax, xmm0
        movabs  rcx, 9214364837600034816
        and     rcx, rax
        movabs  rax, 9209861237972664319
        sub     rax, rcx
        vmovq   xmm1, rax
        vmulsd  xmm2, xmm1, xmm0
        vcvtsd2ss       xmm2, xmm2, xmm2
        vmovss  xmm3, dword ptr [rip + .LCPI1_2] # xmm3 = mem[0],zero,zero,zero
        vdivss  xmm2, xmm3, xmm2
        vcvtss2sd       xmm2, xmm2, xmm2
        vmulsd  xmm1, xmm1, xmm2
        vmovsd  xmm2, qword ptr [rip + .LCPI1_3] # xmm2 = mem[0],zero
        vmovapd xmm3, xmm0
        vfnmadd213sd    xmm3, xmm1, xmm2        # xmm3 = -(xmm1 * xmm3) + xmm2
        vmulsd  xmm1, xmm1, xmm3
        vfnmadd213sd    xmm0, xmm1, xmm2        # xmm0 = -(xmm1 * xmm0) + xmm2
        vmulsd  xmm0, xmm1, xmm0
        ret

Whereas for avx512spr-x16, it generates just a single instruction:

foo___und:                              # @foo___und
        vrcp14sd        xmm0, xmm0, xmm0
        ret
@dbabokin
Collaborator

dbabokin commented Apr 8, 2024

It's this TODO:

;; TODO: need to use intrinsics

Here is the corresponding code in the -x16 version:

define <16 x float> @__rcp_fast_varying_float(<16 x float>) nounwind readonly alwaysinline {

rcp_fast_* should map to plain instructions/intrinsics without extra refinement steps.

@nurmukhametov nurmukhametov added the Performance All issues related to performance/code generation label Apr 16, 2024