Double constants usage in a loop can be CSEed #35257
Comments
As I discussed offline with @kunalspathak, doing this without unduly pessimizing cases with high register pressure may also require rematerialization (#6264).
I guess this one is not Arm-specific, as I see exactly the same picture on x86.
Agree. I will update the title/label.
Ah, by "hoisted out of loop" I meant the way gcc saves them: https://godbolt.org/z/gToudN
FYI, for x64 it looks like we're creating multiple constant pool entries for identical values, so I added #35268 to address that.
(At least on x64) this can be worked around by BitConverter tricks*. A more realistic repro is (pre-)computing** some heavy values into a table, or computing results in batches, like:
double x = 0d;
for (nuint i = 0; i < N; ++i, x += 0.01)
{
table[i] = Math.Sin(x); // table is of type double*
}
; ...
M00_L00:
vmovaps xmm0,xmm6
call System.Math.Sin(Double)
vmovsd qword ptr [rsi+rdi*8],xmm0
inc rdi
vaddsd xmm6,xmm6,qword ptr [7FFEBB6C48F8]
cmp rdi,3E8
jb short M00_L00
; ...
It makes no difference if the increment
The workaround here is:
private static readonly long s_inc = BitConverter.DoubleToInt64Bits(0.01);
private static double Inc => BitConverter.Int64BitsToDouble(s_inc);
// ...
double x = 0d;
for (nuint i = 0; i < N; ++i, x += Inc)
{
table[i] = Math.Sin(x);
}
(one needs to set
; ...
xor edi,edi
mov rax,7AE147AE147B
vmovq xmm7,rax
M00_L00:
vmovaps xmm0,xmm6
call System.Math.Sin(Double)
vmovsd qword ptr [rsi+rdi*8],xmm0
inc rdi
vmovaps xmm0,xmm7 ; not needed
vaddsd xmm6,xmm6,xmm0
cmp rdi,3E8
jb short M00_L00
; ...
(note 1: at the comment * BitConverter uses SSE2 as workaround
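For anyone who wants to try the workaround end to end, here is a minimal self-contained sketch. It is not the exact code from the comment above: it assumes a managed `double[]` instead of the `double*` table so that no `unsafe` context is needed, and the `TableFill` class and its method names are made up for illustration. It only checks that both loops fill identical tables; whether the constant actually ends up in a register depends on the JIT version and settings, so inspect the disassembly to confirm.

```csharp
using System;
using System.Linq;

public static class TableFill
{
    // 1000 == 0x3E8, the loop bound seen in the disassembly above.
    private const int N = 1000;

    // Workaround from the comment: store the bit pattern of 0.01 as a long
    // and convert it back to a double on use.
    private static readonly long s_inc = BitConverter.DoubleToInt64Bits(0.01);
    private static double Inc => BitConverter.Int64BitsToDouble(s_inc);

    // Baseline: the literal 0.01 appears directly in the loop increment.
    public static void FillWithLiteral(double[] table)
    {
        double x = 0d;
        for (int i = 0; i < table.Length; ++i, x += 0.01)
        {
            table[i] = Math.Sin(x);
        }
    }

    // Same loop, but the increment goes through the BitConverter round trip.
    public static void FillWithBitConverter(double[] table)
    {
        double x = 0d;
        for (int i = 0; i < table.Length; ++i, x += Inc)
        {
            table[i] = Math.Sin(x);
        }
    }

    public static void Main()
    {
        var a = new double[N];
        var b = new double[N];
        FillWithLiteral(a);
        FillWithBitConverter(b);

        // The round trip through long is bit-exact, so both tables must agree.
        Console.WriteLine(a.SequenceEqual(b) ? "tables match" : "tables differ");
    }
}
```

The round trip through `long` preserves the exact bit pattern of `0.01`, so the two loops are numerically identical and only the generated code should differ.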
We do hoist these now for x86/x64. We don't for Arm64 because these are viewed as "cheap" to materialize given they are small constants that can be embedded as an immediate. They are given a
We could play with increasing the
We could also play with allowing a something like
I think either would give a good balance between ensuring loop code stays efficient and ensuring that we don't accidentally pessimize codegen for cases where hoisting a constant prevents us from optimizing.
Personally, I think the idea of allowing anything to be CSE'd is the better option. We should have already done forward sub, morph, value numbering, and CSE by the time we get to proper constant prop, so we should already have a decent view of things. One might want to have value numbering or CSE account for cases where we would try to undo a CSE, to ensure costing remains correctly tracked, but that is likely a more complex change than a limited amount of "undo" for special constants like
Double constants used in a loop are reloaded on every iteration. Instead, they could be loaded once outside the loop and reused inside it.
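To make the request concrete, here is a hypothetical sketch of the loop shape in question and of what the optimization amounts to conceptually. The `LoopConstants` class, the method names, and the constants `1.5` and `0.25` are all invented for illustration and do not come from the original report. Copying the constants into locals by hand is not a reliable workaround, since the JIT may propagate them straight back into the loop body; the second method only shows the transformation being asked of the JIT.

```csharp
using System;

public static class LoopConstants
{
    // Loop shape described in the issue: double constants used on every iteration.
    public static double SumScaled(double[] values)
    {
        double sum = 0d;
        for (int i = 0; i < values.Length; i++)
        {
            sum += values[i] * 1.5 + 0.25;
        }
        return sum;
    }

    // Conceptual result of the requested optimization: each constant is
    // set up once before the loop and reused inside it.
    public static double SumScaledHoisted(double[] values)
    {
        double scale = 1.5;   // would live in a register across the loop
        double offset = 0.25; // likewise
        double sum = 0d;
        for (int i = 0; i < values.Length; i++)
        {
            sum += values[i] * scale + offset;
        }
        return sum;
    }

    public static void Main()
    {
        double[] values = { 1.0, 2.0, 3.0 };
        // Both methods perform the same arithmetic in the same order.
        Console.WriteLine(SumScaled(values) == SumScaledHoisted(values));
    }
}
```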
category:cq
theme:cse
skill-level:expert
cost:medium
impact:medium