Use Unsafe.BitCast for Int128 ↔ UInt128 operators #104506
Conversation
Tagging subscribers to this area: @dotnet/area-system-numerics
This in general LGTM, but it might be interesting to understand why promotion didn't handle this, given it's a simple struct containing 2x ulong fields. CC @jakobbotsch
I would need some concrete cases to look at. As it is, this looks like a size-wise regression. @xtqqczze, do you have concrete benchmarks showing improvements from the change?
Is that not just because the system chose to inline more things as a result of this? If BitCast had been available when this code was first written, presumably we'd have chosen to use it then.
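For illustration, the change is in this spirit (a hedged sketch, not the actual runtime code: the struct and field names below are invented stand-ins for the real `Int128`/`UInt128` internals):

```csharp
using System;
using System.Runtime.CompilerServices;

// Two same-size, same-layout structs standing in for Int128 and UInt128.
struct MyInt128 { public ulong Lower, Upper; }
struct MyUInt128 { public ulong Lower, Upper; }

static class Conversions
{
    // Old-style pattern: rebuild the struct field by field. The JIT has to
    // see through the construction (struct promotion) to make this free.
    public static MyUInt128 FieldWise(MyInt128 value)
        => new MyUInt128 { Lower = value.Lower, Upper = value.Upper };

    // BitCast pattern: explicitly a reinterpretation of the same bits,
    // which the JIT can treat as a no-op without any field juggling.
    public static MyUInt128 BitCastStyle(MyInt128 value)
        => Unsafe.BitCast<MyInt128, MyUInt128>(value);
}
```

Both produce the same value; the difference is how much work the JIT must do to prove that. `Unsafe.BitCast<TFrom, TTo>` requires the two structs to have the same size and throws otherwise.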
Yeah, it definitely looks like there are different inlining decisions here (both new inlines we perform and, it looks like, cases where we no longer inline). Also, the second diff is a larger size-wise improvement than the first diff is a regression, so in that sense this isn't actually a size-wise regression (though, as you said, it's hard to compare in the face of different inlining decisions). Either way, I'd be happy to look at concrete cases if there are any, but I wasn't immediately able to identify anything in the diffs that looks related to deficiencies in promotion. If you and @tannergooding prefer …
That's what it looks like to me. Size diffs are always a little wonky for xarch due to the variable-sized encoding and the differing encoding cost for some sizes or registers; LSRA choosing to use R8 instead of RAX can lead to an additional byte, for example. In this case, it looks like we eliminate code, which allows inlining to kick in; that changes register preferences, and some operation sizes cause the byte count to increase even though the number of instructions often decreases. For example, in:

```diff
- mov rsi, rax
- or rsi, 1
- lzcnt rsi, rsi
- xor esi, 63
- movsxd rsi, esi
+ mov rdi, rax
+ or rdi, 1
+ lzcnt rdi, rdi
+ xor edi, 63
+ movsxd rdi, edi
+ mov rsi, 0xD1FFAB1E ; static handle
+ movzx rdi, byte ptr [rdi+rsi]
+ mov esi, edi
  mov rdx, 0xD1FFAB1E ; static handle
- movzx rsi, byte ptr [rsi+rdx]
- mov edx, esi
- mov rcx, 0xD1FFAB1E ; static handle
- cmp rax, qword ptr [rcx+8*rdx]
- setb dl
- movzx rdx, dl
- sub esi, edx
- lea eax, [rsi+0x14]
+ cmp rax, qword ptr [rdx+8*rsi]
+ setb sil
+ movzx rsi, sil
+ sub edi, esi
+ lea eax, [rdi+0x14]
  jmp SHORT G_M10567_IG08
- ;; size=106 bbWeight=0.50 PerfScore 9.25
+ ;; size=108 bbWeight=0.50 PerfScore 9.25
```

This code is 2 bytes bigger, but it hasn't fundamentally changed: if you were to replace the registers used with abstract names, the two versions would be essentially identical. We see quite a lot of diffs that are in this realm, and it would probably be beneficial to track the number of instructions in addition to the size so we can get a better view of what has actually changed. Later on in the method we get …
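The suggestion to track instruction counts alongside byte size could be approximated by counting instruction lines in a disassembly fragment, ignoring the `;;` summary comments and any diff markers (a hypothetical helper, not part of any existing jit-diff tooling):

```csharp
using System;
using System.Linq;

static class DisasmStats
{
    // Count instruction lines in a JIT disassembly fragment:
    // strip any leading diff markers ('+', '-') and whitespace, then
    // skip blank lines and ';;' summary comment lines.
    public static int CountInstructions(string listing) =>
        listing.Split('\n')
               .Select(l => l.TrimStart('+', '-', ' ', '\t'))
               .Count(l => l.Length > 0 && !l.StartsWith(";;"));
}
```

Applied to the two halves of the diff above, a metric like this would show the instruction count shrinking even though the encoded size grew by two bytes.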
I think I prefer …
Thanks.
Diffs show increased inlining and tail calls.
MihuBot/runtime-utils#478
MihuBot/runtime-utils#479