Improve Math(F).FusedMultiplyAdd codegen #27060

Open. 11 commits to merge into base: master.

@EgorBo
Contributor

commented Oct 7, 2019

Fixes #25829. Currently, Math(F).FusedMultiplyAdd always emits vfmadd213ss/vfmadd213sd, and each negated argument is handled with a separate load and xor.

Test cases:

static float Test1(float a, float b, float c) => MathF.FusedMultiplyAdd( a,  b,  c);
static float Test2(float a, float b, float c) => MathF.FusedMultiplyAdd( a, -b,  c);
static float Test3(float a, float b, float c) => MathF.FusedMultiplyAdd(-a,  b,  c);
static float Test4(float a, float b, float c) => MathF.FusedMultiplyAdd(-a, -b,  c);
static float Test5(float a, float b, float c) => MathF.FusedMultiplyAdd( a,  b, -c);
static float Test6(float a, float b, float c) => MathF.FusedMultiplyAdd( a, -b, -c);
static float Test7(float a, float b, float c) => MathF.FusedMultiplyAdd(-a,  b, -c);
static float Test8(float a, float b, float c) => MathF.FusedMultiplyAdd(-a, -b, -c);

Was:

; Method FmaFTests:Test1(float,float,float):float
G_M46841_IG01:
       vzeroupper 
G_M46841_IG02:
       vfmadd213ss xmm0, xmm1, xmm2
G_M46841_IG03:
       ret      
; Total bytes of code: 9


; Method FmaFTests:Test2(float,float,float):float
G_M46842_IG01:
       vzeroupper 
G_M46842_IG02:
       vmovss   xmm3, dword ptr [reloc @RWD00]
       vxorps   xmm1, xmm3
       vfmadd213ss xmm0, xmm1, xmm2
G_M46842_IG03:
       ret      
RWD00  dd	80000000h
; Total bytes of code: 21


; Method FmaFTests:Test3(float,float,float):float
G_M46843_IG01:
       vzeroupper 
G_M46843_IG02:
       vmovss   xmm3, dword ptr [reloc @RWD00]
       vxorps   xmm0, xmm3
       vfmadd213ss xmm0, xmm1, xmm2
G_M46843_IG03:
       ret      
RWD00  dd	80000000h
; Total bytes of code: 21


; Method FmaFTests:Test4(float,float,float):float
G_M46844_IG01:
       vzeroupper 
G_M46844_IG02:
       vmovss   xmm3, dword ptr [reloc @RWD00]
       vxorps   xmm0, xmm3
       vmovss   xmm3, dword ptr [reloc @RWD00]
       vxorps   xmm1, xmm3
       vfmadd213ss xmm0, xmm1, xmm2
G_M46844_IG03:
       ret      
RWD00  dd	80000000h
; Total bytes of code: 33


; Method FmaFTests:Test5(float,float,float):float
G_M46845_IG01:
       vzeroupper 
G_M46845_IG02:
       vmovss   xmm3, dword ptr [reloc @RWD00]
       vxorps   xmm2, xmm3
       vfmadd213ss xmm0, xmm1, xmm2
G_M46845_IG03:
       ret      
RWD00  dd	80000000h
; Total bytes of code: 21


; Method FmaFTests:Test6(float,float,float):float
G_M46846_IG01:
       vzeroupper 
G_M46846_IG02:
       vmovss   xmm3, dword ptr [reloc @RWD00]
       vxorps   xmm1, xmm3
       vmovss   xmm3, dword ptr [reloc @RWD00]
       vxorps   xmm2, xmm3
       vfmadd213ss xmm0, xmm1, xmm2
G_M46846_IG03:
       ret      
RWD00  dd	80000000h
; Total bytes of code: 33


; Method FmaFTests:Test7(float,float,float):float
G_M46847_IG01:
       vzeroupper 
G_M46847_IG02:
       vmovss   xmm3, dword ptr [reloc @RWD00]
       vxorps   xmm0, xmm3
       vmovss   xmm3, dword ptr [reloc @RWD00]
       vxorps   xmm2, xmm3
       vfmadd213ss xmm0, xmm1, xmm2
G_M46847_IG03:
       ret      
RWD00  dd	80000000h
; Total bytes of code: 33


; Method FmaFTests:Test8(float,float,float):float
G_M46832_IG01:
       vzeroupper 
G_M46832_IG02:
       vmovss   xmm3, dword ptr [reloc @RWD00]
       vxorps   xmm0, xmm3
       vmovss   xmm3, dword ptr [reloc @RWD00]
       vxorps   xmm1, xmm3
       vmovss   xmm3, dword ptr [reloc @RWD00]
       vxorps   xmm2, xmm3
       vfmadd213ss xmm0, xmm1, xmm2
G_M46832_IG03:
       ret      
RWD00  dd	80000000h
; Total bytes of code: 45

Now:

; Method FmaFTests:Test1(float,float,float):float
G_M12796_IG01:
       vzeroupper 
G_M12796_IG02:
       vfmadd213ss xmm0, xmm1, xmm2
G_M12796_IG03:
       ret      
; Total bytes of code: 9


; Method FmaFTests:Test2(float,float,float):float
G_M12799_IG01:
       vzeroupper 
G_M12799_IG02:
       vfnmadd213ss xmm0, xmm1, xmm2
G_M12799_IG03:
       ret      
; Total bytes of code: 9


; Method FmaFTests:Test3(float,float,float):float
G_M12798_IG01:
       vzeroupper 
G_M12798_IG02:
       vfnmadd213ss xmm0, xmm1, xmm2
G_M12798_IG03:
       ret      
; Total bytes of code: 9


; Method FmaFTests:Test4(float,float,float):float
G_M12793_IG01:
       vzeroupper 
G_M12793_IG02:
       vfmadd213ss xmm0, xmm1, xmm2
G_M12793_IG03:
       ret      
; Total bytes of code: 9


; Method FmaFTests:Test5(float,float,float):float
G_M12792_IG01:
       vzeroupper 
G_M12792_IG02:
       vfmsub213ss xmm0, xmm1, xmm2
G_M12792_IG03:
       ret      
; Total bytes of code: 9


; Method FmaFTests:Test6(float,float,float):float
G_M12795_IG01:
       vzeroupper 
G_M12795_IG02:
       vfnmsub213ss xmm0, xmm1, xmm2
G_M12795_IG03:
       ret      
; Total bytes of code: 9


; Method FmaFTests:Test7(float,float,float):float
G_M12794_IG01:
       vzeroupper 
G_M12794_IG02:
       vfnmsub213ss xmm0, xmm1, xmm2
G_M12794_IG03:
       ret      
; Total bytes of code: 9


; Method FmaFTests:Test8(float,float,float):float
G_M12789_IG01:
       vzeroupper 
G_M12789_IG02:
       vfmsub213ss xmm0, xmm1, xmm2
G_M12789_IG03:
       ret      
; Total bytes of code: 9
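
The "Now" listings follow a simple sign rule: negating either multiplicand negates the product, so only the parity of the two multiplicand negations matters, and a negated addend turns an add into a subtract. The sketch below restates that rule in C#. `FmaSelectionSketch` and `SelectInstruction` are hypothetical names used for illustration; this is not the JIT's implementation, only a summary of the mapping visible in the eight listings above.

```csharp
using System;

static class FmaSelectionSketch
{
    // Hypothetical helper for illustration only; this is not the JIT's code.
    // Picks the scalar-float FMA mnemonic from which operands are negated.
    public static string SelectInstruction(bool negA, bool negB, bool negC)
    {
        // (-a) * b == a * (-b) == -(a * b), and (-a) * (-b) == a * b,
        // so only the parity of the two multiplicand negations matters.
        bool negProduct = negA ^ negB;

        if (negProduct)
        {
            // -(a * b) - c  or  -(a * b) + c
            return negC ? "vfnmsub213ss" : "vfnmadd213ss";
        }

        // (a * b) - c  or  (a * b) + c
        return negC ? "vfmsub213ss" : "vfmadd213ss";
    }
}
```

For example, `FmaSelectionSketch.SelectInstruction(false, true, true)` returns `"vfnmsub213ss"`, which matches the Test6 listing.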

Diff.
/cc @tannergooding

EgorBo added 2 commits Oct 7, 2019
@EgorBo

Contributor Author

commented Oct 7, 2019

This PR doesn't improve the case where the same operand is passed twice:

float t = MathF.FusedMultiplyAdd(x, y, y);

The expected codegen is:

vfmadd213ss xmm0, xmm1, xmm1

but the JIT currently emits a redundant mov:

vmovaps  xmm2, xmm1
vfmadd213ss xmm0, xmm2, xmm1

The goal was to get perfect codegen for this function:

static float Lerp(float v0, float v1, float t) =>
    MathF.FusedMultiplyAdd(t, v1, MathF.FusedMultiplyAdd(-t, v0, v0));
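
To make the arithmetic behind Lerp concrete: the inner call computes v0 - t * v0, and the outer call adds t * v1, giving v0 * (1 - t) + v1 * t. The inner call has the negated-first-operand shape of Test3 and also passes v0 twice, which is the same-operand case described above. The wrapper class name `LerpSketch` below is an assumption added so the snippet compiles on its own; the method body is the one quoted in this comment.

```csharp
using System;

static class LerpSketch
{
    // Same body as the Lerp function quoted in the comment above.
    // Inner FMA: (-t * v0) + v0  ==  v0 * (1 - t)
    // Outer FMA: (t * v1) + inner
    public static float Lerp(float v0, float v1, float t) =>
        MathF.FusedMultiplyAdd(t, v1, MathF.FusedMultiplyAdd(-t, v0, v0));
}
```

With this formulation the endpoints are reproduced exactly: `LerpSketch.Lerp(2f, 10f, 0f)` gives 2, `LerpSketch.Lerp(2f, 10f, 1f)` gives 10, and `LerpSketch.Lerp(2f, 10f, 0.5f)` gives 6.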

EgorBo added 2 commits Oct 7, 2019
Review comment on src/jit/importer.cpp (outdated)
@sandreenko

Member

commented Oct 7, 2019

Review comment on src/jit/lowerxarch.cpp (outdated)
EgorBo added 3 commits Oct 11, 2019