HWIntrinsics: FMA suboptimal codegen #12212
Comments
This shouldn't be the case. We have a check here: https://github.com/dotnet/coreclr/blob/master/src/jit/hwintrinsiccodegenxarch.cpp#L2354 Could you share a minimal repro I could look at? |
Yeah, this one matches the second condition in the |
I'm not sure why that is missing either, let me test out a fix real quick. |
Cool, I can make up a repro if that would help. I can confirm that swapping the first two arguments in my example causes |
A repro that I can directly test would be helpful 😄. Otherwise, I am left just testing the existing tests and local code. |
Ah, right. I remember why this is hard now. The code right now is choosing:
lea rbx,[rdi+10h]
vfmadd132ps xmm4,xmm1,xmmword ptr [rbx]
vmovaps xmm1,xmm4
After the
vfmadd132ps xmm4,xmm1,xmmword ptr [rdi+10h]
vmovaps xmm1,xmm4
We start off with This means we now have Given that |
Here's a quick one.
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

struct vec
{
    public float f1;
    public float f2;
    public float f3;
    public float f4;
}

class Program
{
    static unsafe float fmaTest1()
    {
        vec b;
        var a = Vector128.Create(1f);
        var c = Vector128.Create(2f);
        var d = Vector128.Create(3f);
        c = Fma.MultiplyAdd(a, Sse.LoadVector128((float*)&b), c);
        return Sse.Add(c, d).ToScalar();
    }

    static unsafe float fmaTest2()
    {
        vec b;
        var a = Vector128.Create(1f);
        var c = Vector128.Create(2f);
        var d = Vector128.Create(3f);
        c = Fma.MultiplyAdd(Sse.LoadVector128((float*)&b), a, c);
        return Sse.Add(c, d).ToScalar();
    }

    static void Main(string[] args)
    {
        Console.WriteLine(fmaTest1());
        Console.WriteLine(fmaTest2());
    }
}
vmovss xmm0,dword ptr[7FFCBB577598h]
vbroadcastss xmm0,xmm0
vmovss xmm1,dword ptr[7FFCBB57759Ch]
vbroadcastss xmm1,xmm1
vmovss xmm2,dword ptr[7FFCBB5775A0h]
vbroadcastss xmm2,xmm2
lea rax,[rsp+18h]
vfmadd132ps xmm0,xmm1,xmmword ptr[rax]
vmovaps xmm1,xmm0
vaddps xmm0,xmm1,xmm2
vmovapd xmmword ptr[rsp], xmm0
vmovss xmm0,dword ptr[rsp]

vmovss xmm0,dword ptr[7FFCBB5794B8h]
vbroadcastss xmm0,xmm0
vmovss xmm1,dword ptr[7FFCBB5794BCh]
vbroadcastss xmm1,xmm1
vmovss xmm2,dword ptr[7FFCBB5794C0h]
vbroadcastss xmm2,xmm2
lea rax,[rsp+18h]
vfmadd231ps xmm1,xmm0,xmmword ptr[rax]
vaddps xmm0,xmm1,xmm2
vmovapd xmmword ptr[rsp], xmm0
vmovss xmm0,dword ptr[rsp] |
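To make the difference between the two listings concrete, here is a hedged sketch of the kind of form selection that would avoid the extra `vmovaps`. It is not the logic in `genFMAIntrinsic`. The names `Operand`, `FmaChoice`, `chooseFmaForm`, and the `preferredDst` input are all hypothetical, and the sketch assumes something upstream can say which register operand the result should land in. For `MultiplyAdd(A, B, C)` computing `A * B + C`, it picks an encoding so that the contained (memory) operand sits in the third slot and the preferred register is the destination.

```cpp
#include <cassert>

// Operands of MultiplyAdd(A, B, C) = A * B + C. None means "no operand".
enum class Operand { A, B, C, None };

struct FmaChoice {
    int form;      // 132, 213, or 231
    Operand dst;   // register operand that is overwritten with the result
    Operand src2;  // second register operand
    Operand src3;  // third operand, the only one that may be memory
};

// Hypothetical selection: 'contained' is the operand folded into a memory
// operand (or None), 'preferredDst' is the register operand we would like
// the result to overwrite.
FmaChoice chooseFmaForm(Operand contained, Operand preferredDst) {
    if (contained == Operand::C) {
        // Addend in memory: 213 computes dst = src2 * dst + src3, and the
        // multiplication is commutative, so either multiplicand can be dst.
        if (preferredDst == Operand::B) return {213, Operand::B, Operand::A, Operand::C};
        return {213, Operand::A, Operand::B, Operand::C};
    }
    if (contained == Operand::A || contained == Operand::B) {
        // One multiplicand in memory; 'reg' is the multiplicand still in a register.
        Operand reg = (contained == Operand::A) ? Operand::B : Operand::A;
        // 231 computes dst = src2 * src3 + dst, writing the addend's register.
        if (preferredDst == Operand::C) return {231, Operand::C, reg, contained};
        // 132 computes dst = dst * src3 + src2, writing the multiplicand's register.
        return {132, reg, Operand::C, contained};
    }
    // Nothing contained: any form works, so just honor the preferred destination.
    if (preferredDst == Operand::C) return {231, Operand::C, Operand::A, Operand::B};
    if (preferredDst == Operand::B) return {213, Operand::B, Operand::A, Operand::C};
    return {213, Operand::A, Operand::B, Operand::C};
}
```

In the repro the load is a multiplicand and the result is assigned back to `c`, so under these assumptions `chooseFmaForm(Operand::B, Operand::C)` and `chooseFmaForm(Operand::A, Operand::C)` both yield the 231 form, regardless of which multiplicand position the load occupies.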
(Continued from https://github.com/dotnet/coreclr/issues/23115#issuecomment-470785697)
In the register allocator, we are already not setting a I think the problem is that we aren't really able to tell the register allocator that any of But, I'm not sure we have something tracking the former (that is, being able to say any node can be contained; but we need to ultimately decide on just Maybe @CarolEidt has some ideas here (but I would guess this is unlikely for 3.0). |
Ah, so it looks like this is a dupe of https://github.com/dotnet/coreclr/issues/20480. Not sure why I didn't see that one when I searched. |
Marking this as future... |
This was resolved by #58196 |
This code:
currently compiles to:
Assuming dotnet/coreclr#22944 would eliminate the extra lea there, I believe this should be generating:
It looks like the logic in genFMAIntrinsic is missing the fact that the two non-contained arguments could be swapped here.
cc @tannergooding
category:cq
theme:hardware-intrinsics
skill-level:expert
cost:medium